Wednesday, June 25, 2025

Dijkstra's Algorithm

 Imagine for a minute you are Google Maps (because they actually do use something very similar to what we’ll describe). Your job then is to get someone from A to B as quickly as possible. How do you find the best path?


Suppose we have a road network, seen here:

Dijkstra's Algorithm - Shortest Path Finding

The numbers represent some resource (time, fuel, distance), which is precious, and which we want to use the least amount of, to get from one place to another.

Suppose we live at A and work at F.

We can use Dijkstra’s algorithm to figure out the shortest path.

1.       Assume that everything is infinitely far away, because we haven’t calculated any distances yet

2.       Look at the graph, and see that from A, we can get to B in 4 and C in 5. Everything else is infinitely far away.

3.       Now, from B, we can get to D in an additional 9, or E in an additional 7. From C, we can get to B in an additional 11, or E in an additional 3. (We ignore going back to A for 5 as useless).

4.       We already know that the shortest path will not be ABE…, because there’s a faster way to get to E, via C, as in ACE…: ABE has a total cost of 4+7 = 11, and ACE has a total cost of 5+3 = 8; 8 < 11, so the 8-cost path is better.

5.       From E, we can go backwards to C (which we again ignore), or to B for 7, or to D for 13, or to F for 6. We choose the best (which happens to be the leg that gets us to our solution) and go to F for 6.

6.       Now, we have the whole path which minimizes the use of our precious resource: start at A, and go through C and E en route to F.

The following paths (without loops) are possible:

Path      Cost
ABEF      17
ABDF      15
ACBDF     27
ACEBDF    26
ACBEF     29
ACBDEF    46
ACEF      14

Of all the paths, the one we came up with using our strategy is the shortest between A and F.

Heuristically, the approach is twofold:

  • Choose the next location as the one that can be reached at the smallest total cost
  • Update the shortest known distance to every node, if a new shortest path has been found

This approach is Dijkstra's algorithm. Its inventor, Edsger W. Dijkstra (a professor and pioneer of computer science), devised it in about twenty minutes in 1956, while sitting at a café in Amsterdam, and still today it remains a cornerstone algorithm at the heart of so many different applications.

There is one important consideration when using Dijkstra's algorithm. Most of the time, edge weights are positive: it costs you some money, or it requires some amount of gas, or you need to spend time. But that does not need to be the case in every graph; edge weights can be negative. If a graph has any negative edge weights, though, Dijkstra's algorithm is no longer guaranteed to find the shortest paths, so it should not be used; an algorithm like Bellman-Ford is the right tool there.

Here is the pseudocode for the algorithm:

function Dijkstra(Graph, source):
    // Initialize distances and previous-node trackers
    for each vertex v in Graph:
        dist[v] := infinity
        prev[v] := undefined
    dist[source] := 0

    // Create a set of all unvisited nodes
    Q := set of all vertices in Graph

    while Q is not empty:
        // Select the unvisited node with the smallest distance
        u := vertex in Q with smallest dist[u]
        remove u from Q

        // Relax each edge out of u
        for each neighbor v of u:
            alt := dist[u] + weight(u, v)
            if alt < dist[v]:
                dist[v] := alt
                prev[v] := u

    return dist, prev


And here it is in Java:

import java.util.*;

class Edge {
    int target;
    int weight;
    public Edge(int target, int weight) {
        this.target = target;
        this.weight = weight;
    }
}

public class Dijkstra {
    /**
     * Runs Dijkstra's algorithm on a graph represented as an adjacency list.
     * @param graph adjacency list: graph.get(u) returns a list of edges from node u
     * @param source the starting node
     * @return array of shortest distances from source to every node
     */
    public static int[] dijkstra(List<List<Edge>> graph, int source) {
        int n = graph.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;

        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.offer(new int[]{source, 0});

        while (!pq.isEmpty()) {
            int[] curr = pq.poll();
            int u = curr[0];
            int d = curr[1];

            if (d > dist[u]) continue; // Already found a better path

            for (Edge edge : graph.get(u)) {
                int v = edge.target;
                int weight = edge.weight;
                if (dist[u] + weight < dist[v]) {
                    dist[v] = dist[u] + weight;
                    pq.offer(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }
}
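To see the method on the example from the walkthrough, here is a sketch of the A–F network, with edge weights inferred from the path-cost table and the step-by-step narration above (the mapping A=0 through F=5 is my own choice, and the algorithm is repeated in compact form so this example compiles on its own):

```java
import java.util.*;

public class DijkstraDemo {

    public static class Edge {
        final int target, weight;
        Edge(int target, int weight) { this.target = target; this.weight = weight; }
    }

    // Same algorithm as the dijkstra() method above, repeated so this compiles alone.
    public static int[] dijkstra(List<List<Edge>> graph, int source) {
        int[] dist = new int[graph.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.offer(new int[]{source, 0});
        while (!pq.isEmpty()) {
            int[] curr = pq.poll();
            int u = curr[0];
            if (curr[1] > dist[u]) continue; // stale entry: a better path was already found
            for (Edge e : graph.get(u)) {
                if (dist[u] + e.weight < dist[e.target]) {
                    dist[e.target] = dist[u] + e.weight;
                    pq.offer(new int[]{e.target, dist[e.target]});
                }
            }
        }
        return dist;
    }

    // Distances from a given start node in the (assumed) A-F example network.
    public static int[] exampleDistances(int source) {
        int[][] edges = { // {u, v, weight}, undirected
            {0,1,4}, {0,2,5}, {1,2,11}, {1,3,9}, {1,4,7},
            {2,4,3}, {3,4,13}, {3,5,2}, {4,5,6}
        };
        List<List<Edge>> g = new ArrayList<>();
        for (int i = 0; i < 6; i++) g.add(new ArrayList<>());
        for (int[] e : edges) {
            g.get(e[0]).add(new Edge(e[1], e[2]));
            g.get(e[1]).add(new Edge(e[0], e[2]));
        }
        return dijkstra(g, source);
    }

    public static void main(String[] args) {
        int[] dist = exampleDistances(0);
        System.out.println("A to F: " + dist[5]); // A-C-E-F = 5+3+6 = 14
    }
}
```

Running this confirms the walkthrough: the cheapest route from A to F costs 14, via C and E.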

Tuesday, June 24, 2025

Complexity basics

 From the earliest days of computing, a central question has been how we can solve a problem as efficiently as possible. That has meant several things over time to different people, but in modern times, we’ve settled on efficiency over two different parameters (which generally require tradeoffs with each other): the number of steps something takes, and the amount of memory it takes to do it.

 In computer science, we group things into complexity classes based on the fastest-growing term, ignoring any coefficients. Think about it for a second, and pull up Desmos or another graphing calculator to follow along:

  • A^2+3A-25: it looks like A^2, because eventually, the A^2 term will be so much bigger than 3A or 25 that A^2 will dominate
  • X^3+2^X-1000000: it looks like 2^X because, eventually 2^X will be much bigger than X^3 or 1000000
  • x!-10^x-x^100-1000000000: it looks like x! because, eventually, x! will dominate all of 10^x, x^100, and 1000000000

In general, we prefer functions nearer the top of this list to those nearer the bottom, in terms of describing both space and time:

  • constant is better than
  • logarithmic is better than
  • linear is better than
  • log-linear is better than
  • polynomial is better than
  • exponential is better than
  • factorial is as bad as it gets

Let’s look at each one, with an example of each

  • “Go to this place in memory” is constant
  • “Do binary search through an already-sorted array” is logarithmic
  • “Do linear search” is linear
  • “Do merge-sort” is log-linear
  • “Do traditional, grade-school, digit-by-digit multiplication” is polynomial
  • “Count the number of nodes in a binary tree of a certain height” is exponential
  • “Tell me all the ways I can form a line out of students in my class” is factorial

Another way of looking at things:
  • If a problem always requires the same number of steps, it is constant
  • If there’s a single loop in the code, that moves by i++ or i--, it’s linear
  • If there's a single loop in the code, that moves by i*=2 (or i/=2) or something similar, it's logarithmic
  • If your problem splits things in half recursively and brings them back together at the end like MergeSort, it's log-linear
  • If there are some number of nested loops: i inside j inside k…, then it’s polynomial, and the degree is the depth of the nesting: i inside j inside k is 3 deep, so this is n^3
  • If one problem causes there to be some number of problems, which creates that many sub-sub problems for each sub problem, and so on (1 problem becomes 2 becomes 4 becomes 8…), then it’s exponential by the rate at which the problems are multiplying: 1 problem into 3 into 9 into 27 is 3^n, but 1 problem into 2 into 4 into 8 into 16 is 2^n, but either way 2^n or 3^n are both exponential
  • If you have no other choice but to calculate every possible arrangement of something, in order to get to a solution, then you’re looking at n! (like, for example, the hilariously bad “bogosort”: randomize a list, check if it’s sorted; if not, randomize again; keep randomizing and checking until a shuffle results in a sorted list)
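The loop shapes above can be checked empirically by counting iterations. Here is a small sketch (the method names are mine, purely illustrative) contrasting the linear, logarithmic, and polynomial cases:

```java
public class GrowthDemo {

    // A single i++ loop: n steps, i.e., linear
    public static int linearSteps(int n) {
        int steps = 0;
        for (int i = 0; i < n; i++) steps++;
        return steps;
    }

    // A single i *= 2 loop: about log2(n) steps, i.e., logarithmic
    public static int logSteps(int n) {
        int steps = 0;
        for (int i = 1; i < n; i *= 2) steps++;
        return steps;
    }

    // Two nested loops: n^2 steps, i.e., polynomial of degree 2
    public static int quadraticSteps(int n) {
        int steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) steps++;
        return steps;
    }

    public static void main(String[] args) {
        int n = 1024;
        System.out.println(linearSteps(n));    // 1024
        System.out.println(logSteps(n));       // 10
        System.out.println(quadraticSteps(n)); // 1048576
    }
}
```

Notice the gulf at n = 1024: ten steps versus a thousand versus a million.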

Monday, June 23, 2025

Red-Black Trees

There exists a special kind of binary search tree called a red-black tree that implements certain extra rules to ensure it stays balanced. Balancing is the process of repositioning nodes to decrease the height of a tree, thereby preventing it from degenerating into a list. Because red-black trees impose these conditions to self-balance, their performance does not degrade: lookups, insertions, and deletions all stay logarithmic in the number of nodes.


Normal binary trees have 4 important relations:

  • You: wherever you are
  • Your parent: whoever you are connected to, which is one level closer to the root than you
  • Your child(ren): whoever you are connected to, which are one level further from the root than you
  • Your sibling(s): the other child(ren), if any, of your parent

Red-black tree rules force us to care about 2 more relations:

  • Your uncle: The sibling of your parent
  • Your grandparent: the parent of your parent

The following coloring rules apply:

  • The root is black
  • Null nodes are black by default
  • Red nodes’ children are not red
  • Every path from a given node down to any of its descendant null leaves passes through the same number of black nodes (in particular, every root-to-null path has the same number of black nodes)


Both inserting and deleting nodes might cause violations of these rules to occur. Here are possible violations and how to fix them:

If you insert as red:

  • And your parent is black
    • You’re fine!
  • And your parent is red
    • Keep reading and check on your uncle
    • And your uncle is red
      • Color your parent black
      • Color your uncle black
      • Color your grandparent red
      • Coloring your grandparent red may create a new violation higher up, so repeat these checks with your grandparent as the new “you”
    • And your uncle is black
      • If you’re an “inner” child (e.g., a right child whose parent is a left child)
        • Rotate about your parent (left, i.e., counter-clockwise, in this example) so that you become an “outer” child, then continue below
      • If you’re an “outer” child (e.g., a left child whose parent is a left child)
        • Rotate about your grandparent (right, i.e., clockwise, in this example) and swap the colors of your parent and grandparent


A left rotation looks like:

Before Rotation:

    x
     \
      y
     / \
    a   b

After Left Rotation:

      y
     / \
    x   b
     \
      a

And has these steps:

  • Let y be x’s right child.
  • Move y’s left subtree (a) to be x’s right subtree.
  • Update x’s parent to point to y instead of x.
  • Set y’s left child to x (and x’s parent to y).


And a right rotation looks like this:

Before Right Rotation:

      x
     /
    y
   / \
  a   b

After Right Rotation:

    y
   / \
  a   x
     /
    b

and has these steps:

  • Let y be x’s left child.
  • Move y’s right subtree (b) to be x’s left subtree.
  • Update x’s parent to point to y instead of x.
  • Set y’s right child to x (and x’s parent to y).
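Both rotations can be sketched compactly in Java. This is a simplified version that ignores colors and parent pointers, just to show how the subtrees move; the names x, y, a, and b match the diagrams:

```java
public class RotationDemo {

    public static class Node {
        final String name;
        Node left, right;
        Node(String name) { this.name = name; }
    }

    // Left rotation about x; returns the new subtree root y.
    public static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;  // y's left subtree (a) becomes x's right subtree
        y.left = x;        // x becomes y's left child
        return y;          // the caller re-attaches y where x used to be
    }

    // Right rotation is the mirror image.
    public static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;  // y's right subtree (b) becomes x's left subtree
        y.right = x;
        return y;
    }

    // Build the "before" picture for a left rotation, rotate, and
    // report the resulting shape as a string: root, root.left,
    // root.left.right, root.right.
    public static String leftRotateExample() {
        Node x = new Node("x"), y = new Node("y"), a = new Node("a"), b = new Node("b");
        x.right = y; y.left = a; y.right = b;
        Node root = rotateLeft(x);
        return root.name + root.left.name + root.left.right.name + root.right.name;
    }

    // Same idea for a right rotation: root, root.left, root.right, root.right.left.
    public static String rightRotateExample() {
        Node x = new Node("x"), y = new Node("y"), a = new Node("a"), b = new Node("b");
        x.left = y; y.left = a; y.right = b;
        Node root = rotateRight(x);
        return root.name + root.left.name + root.right.name + root.right.left.name;
    }

    public static void main(String[] args) {
        System.out.println(leftRotateExample());  // yxab: matches the left-rotation diagram
        System.out.println(rightRotateExample()); // yaxb: matches the right-rotation diagram
    }
}
```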

After inserting as red, you might need to recolor:

  • If your parent and uncle are red, make them black and your grandparent red
  • If the new node’s uncle is black and the new node is an “inner” child (a right child of a left child, or a left child of a right child), rotate about the parent first
  • If the new node’s uncle is black and the new node is an “outer” child (a right child of a right child, or a left child of a left child), rotate about the grandparent and recolor the parent and grandparent

After deleting:

  • If you end up with a “double black” (a black node was removed and its black child took its place), this can cause problems
    • If you have a red sibling
      • Recolor your parent and sibling, and rotate about your parent
    • If your black sibling has black children
      • Recolor the sibling red, and now your parent has the problem. Fix it appropriately.
    • If your black sibling has any red children
      • Rotate and recolor


Sunday, June 22, 2025

Depth- and Breadth-First Searches

 Suppose now you’re looking for a particular value in a graph. (We’ll demonstrate things with a tree, but the graph does not need to be one.) You essentially have two ways to go about this search (for example, to solve a maze).

Let’s say you have 10 paths before you, and each is 10 steps deep. You could either go 10 steps down #1, then maybe 10 steps down #2, then maybe 10 steps down #3, etc., until you find what you want or exhaust all options; or 1 step down paths 1-10, then a second step down paths 1-10, then a third step down paths 1-10, and so on, until, again, you either find what you want or you’ve exhausted all options.

If what you’re looking for is in the graph—an exit from a maze, for example—both of these methods will find the answer. Both are fundamental graph algorithms.

The algorithm by which you go as deep as you can, until you can’t anymore, is called a depth-first search: it prioritizes maximum depth on a path, before moving to another path. The alternative is the breadth-first search, by which you explore as many paths as possible at a certain level, before going to the next level for any of them.

Here's the pseudocode for depth-first:

procedure DFS(node, visited):
    mark node as visited
    for each neighbor in node.neighbors:
        if neighbor is not visited:
            DFS(neighbor, visited)

Note that DFS() is recursive, and so it implicitly uses a stack: the call stack.
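Here is a minimal Java sketch of that pseudocode, run on a small hypothetical graph (A with children B and C, and B with children D and E), recording the order in which nodes are first visited:

```java
import java.util.*;

public class DFSDemo {

    // Visit node, then recurse into each unvisited neighbor, recording the order.
    public static void dfs(Map<String, List<String>> adj, String node,
                           Set<String> visited, List<String> order) {
        visited.add(node);
        order.add(node);
        for (String neighbor : adj.getOrDefault(node, List.of()))
            if (!visited.contains(neighbor))
                dfs(adj, neighbor, visited, order);
    }

    public static List<String> dfsOrder(Map<String, List<String>> adj, String start) {
        List<String> order = new ArrayList<>();
        dfs(adj, start, new HashSet<>(), order);
        return order;
    }

    // The small example graph: A -> B, C and B -> D, E.
    public static List<String> exampleOrder() {
        Map<String, List<String>> adj = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D", "E"));
        return dfsOrder(adj, "A");
    }

    public static void main(String[] args) {
        System.out.println(exampleOrder()); // [A, B, D, E, C]
    }
}
```

Note how the search plunges all the way down through B before it ever touches C.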

And here’s the BFS pseudocode, which, this time, is iterative and uses a queue:

procedure BFS(Graph G, start_node):
    create an empty queue Q
    mark start_node as visited
    enqueue start_node into Q

    while Q is not empty:
        node = dequeue Q
        for each neighbor in node.neighbors:
            if neighbor is not visited:
                mark neighbor as visited
                enqueue neighbor into Q
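And here is a minimal Java sketch of the BFS pseudocode, on a small hypothetical graph (A with children B and C, and B with children D and E); notice that the visit order goes level by level rather than path by path:

```java
import java.util.*;

public class BFSDemo {

    // Iterative BFS using an explicit queue, recording the visit order.
    public static List<String> bfsOrder(Map<String, List<String>> adj, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> q = new ArrayDeque<>();
        visited.add(start);
        q.add(start);
        while (!q.isEmpty()) {
            String node = q.remove();
            order.add(node);
            for (String neighbor : adj.getOrDefault(node, List.of()))
                if (visited.add(neighbor)) // add() returns false if already visited
                    q.add(neighbor);
        }
        return order;
    }

    // The small example graph: A -> B, C and B -> D, E.
    public static List<String> exampleOrder() {
        Map<String, List<String>> adj = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D", "E"));
        return bfsOrder(adj, "A");
    }

    public static void main(String[] args) {
        System.out.println(exampleOrder()); // [A, B, C, D, E]
    }
}
```

Compare this with a depth-first visit of the same graph, which would finish B's whole subtree (D and E) before reaching C.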

Saturday, June 21, 2025

Full, Complete, Balanced, and Degenerate Trees

 We’ve looked at general binary trees—those graphs without cycles in a single component for which there is one root, and each node has either 0, 1 or 2 children.

Let’s now look at 3 subcategories of binary trees that impose more rules, each one getting closer to a “proper” binary tree by a stricter application of one of those rules.

  • First, full: A general binary tree in theory can have nodes with 1 child. A full tree is one in which none of these one-child nodes exist; all nodes either have 2 children or none
  • Next, complete: a general binary tree can have gaps. A complete tree is one in which every level is completely filled, except possibly the last; in the last level, all nodes sit as far to the left as possible, so any gaps are at the far right
  • Finally, balanced: a balanced tree is one in which, at every node, the heights of that node’s left and right subtrees differ by at most 1.
  • In fact, it might be good to add a fourth category as an anti-pattern: the degenerate tree. Suppose for a moment that from the root, one always moves the same direction when adding a child: add the root, move right, add a child; move right, add a child; move right, add a child; and so on. In this case, this tree is no better than a linked list (and it effectively is one, called a tree by mistake)

                                                                                                                                                               
Try to draw these four groupings of trees. 

Binary Search Trees

 We’ve covered that a binary tree is a special kind of tree, in which each node splits into exactly 0, 1, or 2 others—and never more. We’ve also initially seen binary search done on an array: go to the middle index, decide which half, discard the “wrong” one, go to the middle, decide which half, etc., and keep doing this recursively until either you can prove that you’ve found the element you want, or that it definitely isn’t in your collection.

Now, let’s take those ideas and merge them: create a tree purpose-built to execute binary search.

Remember that one of the conditions for binary search to work—as opposed to the simpler linear search—is that the contents of the array (now a tree) need to be sorted.

When something is sorted, it helps to keep an eye on 3 reference points: the first element, the median element (half are “less/before” and half are “greater/after”), and the last element. (Which of the first versus last is the biggest and which is the smallest, of course, depends on the direction of the sort—ascending or descending.)

Knowing how a binary tree works, where would these three values go?

  • The median has half the data set to its left and half the data set to its right
  • The min has all the rest of the data set to its right
  • The max has all the rest of the data set to its left

Take a moment to pause, draw out a tree, and think about where these values should go.

Let’s start with the median, because this value actually occupies a special place. Where is the “center of gravity” (this isn’t a real term) of the tree? Naturally, the root—half of the data less than it is in the left subtree, and half of the data greater than it is in the right subtree!

Where then do the min and max live? Let’s start with the min. The min occupies the place that takes a left at every fork from the root (graphically, as far to the bottom left as possible). Likewise, the max takes a right at every fork from the root (and therefore is found as far to the bottom right as possible).

Suppose this is our tree:
[Figure: a binary search tree built from 8, 3, 1, 6, 4, 7, 10, 14, 13]


How did we build it?

  • 8 becomes the root
  • Then we add 3
    • 3 < 8, so go left
  • Then we add 1
    • 1 < 8, so go left
    • 1 < 3, so go left
  • Then we add 6
    • 6 < 8, so go left
    • 6 > 3, so go right
  • Then we add 4
    • 4 < 8, so go left
    • 4 > 3, so go right
    • 4 < 6, so go left
  • Then we add 7
    • 7 < 8, so go left
    • 7 > 3, so go right
    • 7 > 6, so go right
  • Then we add 10
    • 10 > 8, so go right
  • Then we add 14
    • 14 > 8, so go right
    • 14 > 10, so go right
  • Then we add 13
    • 13 > 8, so go right
    • 13 > 10, so go right
    • 13 < 14, so go left

To add 0 and 20:

Adding 0:

  • 0 < 8, so go left
  • 0 < 3, so go left
  • 0 < 1, so go left

Adding 20:

  • 20 > 8, so go right
  • 20 > 10, so go right
  • 20 > 14, so go right

Here is that process in pseudocode:

function insert(node, key):
    if node is null:
        return new Node(key)

    if key < node.value:
        node.left = insert(node.left, key)
    else if key > node.value:
        node.right = insert(node.right, key)
    // If key == node.value, do nothing, or handle duplicates as needed

    return node
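A direct Java translation of that insert pseudocode, used to build the example tree from this post (inserting 8, 3, 1, 6, 4, 7, 10, 14, 13 in that order) and then reading it back in-order to confirm the tree keeps the data sorted:

```java
import java.util.*;

public class BSTInsertDemo {

    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Direct translation of the insert pseudocode: recurse left or right,
    // creating a new leaf when we fall off the tree.
    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.value) node.left = insert(node.left, key);
        else if (key > node.value) node.right = insert(node.right, key);
        // equal keys: do nothing
        return node;
    }

    // In-order traversal: left subtree, node, right subtree.
    static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);
        out.add(node.value);
        inorder(node.right, out);
    }

    // Build the example tree and read it back in-order.
    public static List<Integer> exampleInorder() {
        Node root = null;
        for (int key : new int[]{8, 3, 1, 6, 4, 7, 10, 14, 13})
            root = insert(root, key);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exampleInorder()); // [1, 3, 4, 6, 7, 8, 10, 13, 14]
    }
}
```

The in-order readout coming back sorted is exactly the property that makes binary search work on this structure.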

 

Adding nodes in this way is easy, but how about removing them? The caveat is that a binary search tree, as its name implies, exists to support binary search, so whatever we remove, the remaining data must stay in sorted order.

We find ourselves in one of 3 cases:

  • The node to remove is a leaf, i.e., has no children—this is trivial
  • The node to remove is an interior node, i.e., having both children and parents
  • The node to remove is the root, i.e., having no parents, but having children

 

Here is the pseudocode for the removal of an interior node:

function deleteNode(root, key):
    if root is null:
        return null

    if key < root.value:
        root.left = deleteNode(root.left, key)
    else if key > root.value:
        root.right = deleteNode(root.right, key)
    else:
        // Node found
        if root.left is null:
            temp = root.right
            free(root)
            return temp
        else if root.right is null:
            temp = root.left
            free(root)
            return temp
        else:
            // Node has two children
            successor = findMin(root.right)
            root.value = successor.value
            root.right = deleteNode(root.right, successor.value)
    return root

function findMin(node):
    while node.left is not null:
        node = node.left
    return node

and here is the pseudocode to remove the root:

function deleteRoot(root):
    if root is null:
        return null

    // Case 1: Root has no children (tree becomes empty)
    if root.left is null and root.right is null:
        free(root)
        return null

    // Case 2: Root has only one child
    if root.left is null:
        temp = root.right
        free(root)
        return temp
    if root.right is null:
        temp = root.left
        free(root)
        return temp

    // Case 3: Root has two children
    // Find the inorder successor (smallest in the right subtree)
    successor = findMin(root.right)
    root.value = successor.value
    root.right = deleteNode(root.right, successor.value)
    return root

// findMin is the same helper defined above
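Here is a Java translation of deleteNode (Java's garbage collector stands in for free()), applied to the example tree from this post: we delete 3, an interior node with two children, and then 8, the root, and check the in-order readout each time:

```java
import java.util.*;

public class BSTDeleteDemo {

    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.value) node.left = insert(node.left, key);
        else if (key > node.value) node.right = insert(node.right, key);
        return node;
    }

    // Smallest value in a subtree: keep going left.
    static Node findMin(Node node) {
        while (node.left != null) node = node.left;
        return node;
    }

    // Translation of the deleteNode pseudocode above.
    static Node deleteNode(Node root, int key) {
        if (root == null) return null;
        if (key < root.value) root.left = deleteNode(root.left, key);
        else if (key > root.value) root.right = deleteNode(root.right, key);
        else {
            if (root.left == null) return root.right;   // zero or one child
            if (root.right == null) return root.left;
            Node successor = findMin(root.right);       // two children: copy successor up
            root.value = successor.value;
            root.right = deleteNode(root.right, successor.value);
        }
        return root;
    }

    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.value);
        inorder(n.right, out);
    }

    // Build the example tree, delete an interior node (3) and the root (8),
    // and return the in-order readout.
    public static List<Integer> exampleAfterDeletes() {
        Node root = null;
        for (int k : new int[]{8, 3, 1, 6, 4, 7, 10, 14, 13}) root = insert(root, k);
        root = deleteNode(root, 3);
        root = deleteNode(root, 8);
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(exampleAfterDeletes()); // [1, 4, 6, 7, 10, 13, 14]
    }
}
```

After both deletions, the readout is still sorted, which is the whole point of the successor-swapping dance.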

Friday, June 20, 2025

Trees and Traversal Basics

 Trees are a special kind of graph—one of the most common data structures you’ll ever come across. The requirements to be a tree are quite straightforward: one single component, no cycles, and a clear sense of directional hierarchy, but in which movement is allowed in both directions through that hierarchy: up and down.

A tree has exactly one “root” node. In plain English, the root is the source, and everything branches out from it. We’ll need some more terms to create a more rigorous definition. If X is connected to Y, and the path from the root to X is shorter than the path from the root to Y, then X is the “parent” of Y, and Y is the “child” of X. Because trees cannot have cycles, a node has exactly one parent but may have several children. I will leave it as an exercise for the reader to prove that, given how trees are defined, the path from the root to any given node is always unique.

One of the most common trees you’ll see is the binary tree. (“Binary,” as with the related search algorithm, does not mean 1100010101001, but “splitting things in half,” or, in this case, “no more than 2 children.”) A node that has no children is called a leaf.

Moving through a tree—to read its values or to do something with it—is called “traversing” it, and the resulting ordering of all the nodes is a “traversal.” To explain these, let’s stick with a binary tree, which, naturally, has a root, a left half, and a right half.

There are many common types of traversal:

  • Level-order
  • In-order
  • Pre-order
  • Post-order

Let’s move one by one, starting with a level-order traversal of this tree:

[Figure: a binary tree whose levels, top to bottom, contain 5; 12, 13; 7, 14, 2, 17; and 23, 27, 3, 8, 11]

In a level-order traversal, we start at the root, then move down to the next level and read off left to right, then the next level, then the next level, and so on. A level-order traversal of this tree would read 5, 12, 13, 7, 14, 2, 17, 23, 27, 3, 8, 11.

Now let’s look at in-order, the first recursive traversal. In plain English: traverse the left subtree, visit the root, traverse the right subtree.

Let’s now look at this tree:
[Figure: a binary tree with root 2; 2’s left child is 5 (with children 6 and 7), and its right child is 10 (with children 20 and 30)]

To traverse this in-order, we look at the left side first, then the root, then the right.
The root is 2, and its left child is 5. 5 becomes the new “root” of a smaller subset of the original tree, which we call a “subtree” of the original. But 5 is now treated as a root, and the order is left-root-right, so we go left again, and we hit a leaf. (Therefore, this process is recursive.) So we traverse the left subtree as 6, 5, 7. Then we come back up to the “real” root, and get the 2. The same process occurs in the right subtree: the subtree’s root is 10, so the left side, 20, comes first, then the 10, then the 30. So, put together, the whole in-order traversal of this tree would look like 6, 5, 7, 2, 20, 10, 30. Notice the recursive nature of this algorithm, and how innately recursive trees really are. Awareness of this fact now will serve you well when looking at pre- and post-order.

To look at pre-order, we can, in fact, use the same tree. (This will be good for you, because you can then see an apples-to-apples comparison of different traversals on the same tree.) Pre-order moves in the order “root, left, right.” Therefore, the order is: 2, left subtree recursively (5, 6, 7), right subtree recursively (10, 20, 30)—giving us a complete traversal of 2, 5, 6, 7, 10, 20, 30.

Finally, post-order is also recursive, and, again, using the same tree gives us the benefit of the comparison. Post-order processes the root last, so it moves through the left and right subtrees first. The root of the whole tree is the 2, the left subtree is rooted at 5, and the right is rooted at 10. The traversal of the left subtree then is 6, 7, 5; the right subtree is 20, 30, 10; and the root is 2. Therefore, the whole post-order traversal is 6, 7, 5, 20, 30, 10, 2.
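All four traversals can be sketched in Java against the in-order example tree (root 2, with 5/6/7 on the left and 10/20/30 on the right), so the outputs can be compared side by side:

```java
import java.util.*;

public class TraversalDemo {

    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Left, root, right
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out); out.add(n.value); inorder(n.right, out);
    }

    // Root, left, right
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.value); preorder(n.left, out); preorder(n.right, out);
    }

    // Left, right, root
    static void postorder(Node n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out); postorder(n.right, out); out.add(n.value);
    }

    // Level by level, using a queue (the same shape as BFS)
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Queue<Node> q = new ArrayDeque<>();
        q.add(root);
        while (!q.isEmpty()) {
            Node n = q.remove();
            out.add(n.value);
            if (n.left != null) q.add(n.left);
            if (n.right != null) q.add(n.right);
        }
        return out;
    }

    // The tree from the in-order example above.
    static Node exampleTree() {
        return new Node(2,
            new Node(5, new Node(6), new Node(7)),
            new Node(10, new Node(20), new Node(30)));
    }

    public static String allOrders() {
        Node root = exampleTree();
        List<Integer> in = new ArrayList<>(), pre = new ArrayList<>(), post = new ArrayList<>();
        inorder(root, in); preorder(root, pre); postorder(root, post);
        return "in=" + in + " pre=" + pre + " post=" + post + " level=" + levelOrder(root);
    }

    public static void main(String[] args) {
        System.out.println(allOrders());
        // in=[6, 5, 7, 2, 20, 10, 30] pre=[2, 5, 6, 7, 10, 20, 30]
        // post=[6, 7, 5, 20, 30, 10, 2] level=[2, 5, 10, 6, 7, 20, 30]
    }
}
```

The three recursive traversals differ only in where the `out.add` line sits relative to the two recursive calls, which is a nice way to remember them.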

Switch

 Other than if/if-else/if-else if-else and the ternary operator, there is yet another common and important conditional expression in Java th...