Tuesday, June 3, 2025

The Tower of Hanoi: a Classic Problem

You’ve now been thinking about recursion for a few days, so it’s a good time to introduce one of the classic problems of early computer-science education: the Tower of Hanoi. Now, fair warning. When I learned the solution, about 2 years ago, it took me days of struggle and hours of literally playing with plates from the kitchen to figure it out. I will give the solution, but I cannot stress enough how important it is that you not read it until you, too, have played with your own plates and come to it yourself. On a per-line basis, this is the most thought-dense code I have ever written; that is, the code that required the most time thinking per line written. 

The story is the following. You’re a priest in a temple in Hanoi, and there are 3 giant pillars in the courtyard. On one of those pillars sits some number of disks, stacked in order of size: the higher you go, the smaller the disks, and the lower you go, the bigger the disks. 

Your task is the following:
  • Given where the pile is now, move it all to another (specified) rod in such a way that
  • You only move one disk at a time and
  • No bigger disk ever is on top of a smaller one and 
  • Hopefully you get there in as few moves as possible because there’s a really important festival that cannot start until you’ve finished moving the temple’s disks

After factorials and Fibonacci, this is one of the most classic, basic recursive problems out there. Because it’s so basic, there are hundreds, thousands of videos and articles out there explaining the solution (and this is another). Resist the temptation to consult them until after you’ve found the solution yourself. 

The solution comes in two parts:
  • What pseudocode solves the problem? And
  • What sequence of numbers counts the number of steps it takes, as the number of disks changes? 
The most important problem-solving skill in the beginning of tackling a problem like Hanoi is knowing how to reduce a problem to something as simple as possible. You are guaranteed to have at least one disk, so what’s the simplest problem? One disk, of course!

All the screenshots here are from Tower of Hanoi | Math Playground. I encourage you to use it (or other games like it; there are plenty online) to work through the solution. But really, if you have them, literal physical dinner plates of various sizes are even better. 
Now, the game state is such that there’s one disk on A. I want it to go to B. 
 
I can move it from A to B directly without violating any of the rules, so I will. 
 
So I’ve solved the (trivial) problem of moving one disk, when there is only one disk—just move it. 

Recursion always has a base case—a stopping condition to prevent it from going on forever. We’ve just found it. This “move one disk” case will actually be very important at a key time (always at the same time) in the more general solution. 

The thing that took me days to figure out was how the Tower problem is recursive. If the tower is n high, wouldn’t you say that moving all but one disk, then moving one disk, would solve it? Surely, right? Yes! This is the key insight into the problem—the “aha!” moment I did not want stolen earlier.
So, generally, get the top (n-1) disks out of the way, deal with the biggest (bottom) disk with the trivial move, and that should be enough, right?

Think about this. The whole pile starts on A and wants to end up on B. So… can we move the top (n-1) from A to B, and then move the big disk from A to B? No! We’d break the rules, because this would mean that the biggest disk would end up on top of the smallest. So we can’t just go A->B in one fell swoop. We need to use C as a go-between.

What we must do is in fact this (to move everyone from A to B):
  1. Move almost the whole pile from A to C. 
  2. Move the big disk directly from A to B in one move.
  3. Move the whole pile on C back to B. 
But notice how this is recursive. Steps 1 and 3 involve moving whole (slightly-smaller) piles to or from an intermediary rod, which isn’t the initial source or final destination.

We use C as a go-between to get most of the pile out of the way, allowing the biggest disk to move directly from A to B. But to move most of the pile from A to C, B (the real destination) must itself have served as a go-between: move an (n-1)-1 pile out of the way, move the biggest disk of the reduced pile, move the (n-1)-1 pile back on top, and so on. 

Then, by the same logic, once we’ve gotten to the go-between Pole C, we need to get from there to the real destination Pole B—using the original origin Pole A as the go-between this time. 

Every one of these “move the big sub-pile” sub-problems is self-similar to the original problem, until the problem becomes simple enough that the answer is “just move one disk.”

A helpful observation is that you only ever have 3 possible moves: one is correct, one is useless, and one is illegal. 

The correct one gets us closer to accomplishing at least a sub-goal (or sub-sub, or sub-sub-sub-sub… depending on how many disks there are). The useless one undoes what we just did; we just wasted a move for no reason, so it will never be in our best interest to take this route if we want to minimize the number of moves. The last option isn’t really even an option. It’s illegal because it puts a bigger disk on top of a smaller one.   
Let’s go frame by frame to solve it for 3 disks. Then, seeing this approach, you should, in theory, be able to play with any number of disks. 

We start all on A, trying to go to C (one screenshot per frame omitted here):

Move 1: Disk 1 from A to C

Move 2: Disk 2 from A to B

Move 3: Disk 1 from C to B

Move 4: Disk 3 from A to C

Move 5: Disk 1 from B to A

Move 6: Disk 2 from B to C

Move 7: Disk 1 from A to C
The game on this website actually tells us we were as efficient as we could have hoped to be, with our 7 perfect moves (the modal might make it difficult to see, but, as intended, the whole pile of 3 disks is now on Peg C).

Even the 1-disk case has this 3-part structure, but it’s a little sneakier:
  • Move *no* disks from A to B. 
  • Move 1 disk from A to C. 
  • Move *no* disks from B to C. 
0, then, really is the most trivial of base-cases.

In any case, notice when (and how often) the biggest disk moved. In a 2-disk case, it moves on move 2 of 3; in this case, it moved on move 4 of 7; in a 4-disk case, it would move on move 8 of 15; in a 5-disk case, it would move on move 16 of 31. 

We’ve done the hardest part, which is discovering that “move the top n-1 to a go-between; move the biggest disk directly; move the top n-1 to the final destination” is an efficient process. The next part is figuring out the rules for determining when the big disk moves, and how many moves there are, in a general, efficient solution. 
Do you see a pattern, first in the move numbers of the big disks? (Other than that they only move once—which is key!). 
  • The big-disk move number is always even, or 1 in the 1-disk case
  • The big-disk move number doubles with each additional disk
  • The big disk move number is as close as possible (4/7, 8/15, 16/31, 32/63, etc.—getting even closer with more disks) to ½ of the total number of moves
  • If n-1 disks can move in X total, the big disk in the n-disk case, will move on move X+1; outside the trivial 1-disk case, X is always odd, and so X+1 is always even
  • If you’ve made it this far, you like computers, so you probably know about powers of two and binary. If you do, you probably recognize the sequence 1, 2, 4, 8, 16, 32, 64, etc.—as more than just “when the big disk moves.” These are the powers of two: start with 1, and for each additional term, double it to get the next one. Related to this sequence is 1, 3, 7, 15, 31, 63, etc.
Notice that both of these sequences play important roles in counting things. The powers of two tell you when the big disk moves: with 1 disk, it moves on move 1 (2^0); with 2 disks, on move 2 (2^1); with 3 disks, on move 4 (2^2); and, in general, with n disks, the big disk moves on move 2^(n-1).

And the total number of moves is 2^(disks)-1: when there’s 1 disk, 2^1-1 is 1; when there are 2 disks, 2^2-1 is 3; when there are 3 disks (as we played out), 2^3-1 is 7; etc. For n disks, the moves of the efficient solution correspond one-to-one with the nonzero n-bit binary numbers—there are exactly 2^n-1 of each.
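The whole recipe—move the top n-1 disks to the go-between, move the biggest disk directly, move the top n-1 onto it—can be sketched in Java. This is a minimal sketch of the idea; the class name, method name, and move-string format are my own choices:

```java
import java.util.ArrayList;
import java.util.List;

public class Hanoi {
    // Moves n disks from 'from' to 'to', using 'via' as the go-between.
    // Records each move as a string like "disk 1: A -> C".
    static void solve(int n, char from, char to, char via, List<String> moves) {
        if (n == 0) return;                 // base case: moving zero disks takes zero moves
        solve(n - 1, from, via, to, moves); // step 1: get the top n-1 out of the way
        moves.add("disk " + n + ": " + from + " -> " + to); // step 2: the big disk, directly
        solve(n - 1, via, to, from, moves); // step 3: bring the n-1 pile onto the big disk
    }

    public static void main(String[] args) {
        List<String> moves = new ArrayList<>();
        solve(3, 'A', 'C', 'B', moves);
        moves.forEach(System.out::println); // 7 moves, matching the frames above
    }
}
```

Run with 3 disks from A to C (B as the go-between), this produces exactly the 7 moves we played out, with disk 3 moving on move 4.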


  

Monday, June 2, 2025

Inheritance

 Everyone agrees that, for example, a Dog is an Animal, biologically speaking. In Java, just as in biology, this is called “inheritance.”


Without giving too many specifics, let’s say we want for the Dog class to receive some behaviors from Animal, or we want to slightly tweak Animal’s behavior as it comes across in a Dog.

Before we go any further, it’s important to realize the obvious fact that all Dogs are Animals, but not necessarily all Animals are Dogs. This matters when we are talking about inheritance.

Let’s say an Animal has:

  • a name—this is a String
  • an age—this is an integer
  • a type—this is a String
  • a color—this is a String
  • a munch(int) method—this represents that the Animal can eat, and takes in a number; “munch” is then printed that many times, each time on a new line
  • and a makeSound(int) method—this represents that the animal can make a sound, and takes in the number of times that sound should happen; the default sound is “moo”

We like all that in Animal, but obviously, Dogs don’t “moo”, they “woof.” Through inheritance—linking Animal and Dog in a way that gives Dog Animal’s properties for free—we can then say “in the special case that an Animal is a Dog, change the makeSound() behavior to ‘woof’ from ‘moo.’”

We imply the “is a” relationship (as in, “a Dog is an Animal”) with the extends keyword in Java.

  • Implement Animal
  • Then start working on Dog, declared as public class Dog extends Animal
  • Then, when you create a Dog object—let’s say, by Dog lucky = new Dog();, you can call lucky.makeSound(3); and you will see
    Moo
    Moo
    Moo


But Lucky (my actual dog) is a Golden Retriever, not a cow, so he does not moo.

Inside Dog, we can then change this behavior by declaring the following method: one that has the same signature as makeSound() did in Animal, but appears exclusively in Dog.

Inside that, we can print “Woof” instead, and when lucky.makeSound(n) is called, we will see many “Woof”s instead of many “Moo”s. 
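Here is a minimal sketch of that arrangement. The Animal and Dog classes and the makeSound(int) signature come from the description above; the loop and class-naming details are my own filler:

```java
class Animal {
    // Default behavior: the generic Animal says "Moo"
    void makeSound(int times) {
        for (int i = 0; i < times; i++) {
            System.out.println("Moo");
        }
    }
}

class Dog extends Animal {
    @Override // same signature as in Animal, so this replaces the inherited behavior
    void makeSound(int times) {
        for (int i = 0; i < times; i++) {
            System.out.println("Woof");
        }
    }
}

public class Kennel {
    public static void main(String[] args) {
        Dog lucky = new Dog();
        lucky.makeSound(3); // prints "Woof" three times, not "Moo"
    }
}
```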

There are two things we can do with those method signatures: overload and override.

  • Overloading is when you use the same name with different parameters, like two versions of an add() method, one that takes in ints and another that takes in doubles
  • Overriding, on the other hand, is when Dog extends Animal and Dog’s makeSound() says “Woof” when the generic Animal makeSound()  says “Moo”—you change the behavior in the more specific class to fit your needs, from what it was in the more generic.
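The add() example from the first bullet can be sketched like this (the Adder class name is my own):

```java
public class Adder {
    // Overloading: same method name, different parameter lists.
    // The compiler picks which version to call based on the argument types.
    static int add(int a, int b) {
        return a + b;
    }

    static double add(double a, double b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));     // int version: 5
        System.out.println(add(2.0, 3.5)); // double version: 5.5
    }
}
```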
It is a fact that everything you ever create—a MathLesson, a HelloWorld, a Puppy, a Student—is a child of Object. A MathLesson is an Object; a HelloWorld is an Object; and so on. This is so universally true that it is never explicit.

Inheritance gives you access to all the public fields and methods of the parent—this is why Dog objects get the default behavior of makeSound() and the other Animal fields and methods for free. And since everything inherits from Object, every object ever has several common methods before anything is explicitly defined.

Two of the most common methods from Object are:

  • hashCode(): which maps the object in question (that is, one particular FluffyUnicorn) to an int identifier for that object; the identifier is not guaranteed to be unique, but equal objects must produce equal hash codes, and good implementations make collisions rare—this is what lets hash-based collections like HashMap find things quickly
  • toString(): which finds some way—by default, with where the object is in memory—of converting the object to a String for visual representation

If I have a CollegeStudent who extends Student (explicitly, and, of course, Object, implicitly), where CollegeStudent samJones is in memory is of next to no value to anyone. I would much rather know what samJones.major, samJones.year, samJones.college, samJones.GPA, etc., are. So, many times, people override the toString method to produce something much more useful: an opening “{”, then comma-separated name/value pairs like “GPA=” + samJones.GPA, closed out with a matching “}”—

like, for instance

    @Override
    public String toString() {
        return "CollegeStudent{" +
                "name='" + name + '\'' +
                ", idNumber=" + idNumber +
                ", age=" + age +
                ", creditHours=" + creditHours +
                ", points=" + points +
                ", GPA=" + String.format("%.2f", getGPA()) +
                ", courses=" + courses +
                '}';
    }

which might produce an output something like CollegeStudent{name='Alice Smith', idNumber=12345, age=20, creditHours=30, points=90, GPA=3.00, courses=[Math 101, English 201]}

The @Override line is called an “annotation.” It tells the compiler that you intend to replace inherited default behavior with more desirable behavior, so the compiler can flag you with an error if your method doesn’t actually override anything—say, because of a typo in the name. Annotations like this are good practice, but omitting them is not an error that will trip the compiler.  
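To see what the annotation buys you, consider a fresh, self-contained sketch (class names are mine) where a typo silently creates a new method instead of overriding:

```java
class Animal {
    void makeSound() { System.out.println("Moo"); }
}

class Dog extends Animal {
    // Typo: "makeSuond" does NOT override makeSound -- it's a brand-new method.
    // Had we written @Override above it, the compiler would have caught the mistake.
    void makeSuond() { System.out.println("Woof"); }
}

public class AnnotationDemo {
    public static void main(String[] args) {
        Animal a = new Dog();
        a.makeSound(); // still prints "Moo": the misspelled method replaced nothing
    }
}
```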

Overloading and overriding are resolved at different times: overloading at compile time (the compiler picks the version based on the parameter types) and overriding only at runtime (the version that runs depends on the object’s actual class).

Extending a parent class into a child (in “Dog extends Animal,” Animal is the parent class of Dog, since all Dogs are Animals but not vice versa) gives you access to anything public in the parent class, for free. It also means that, for instance

Animal myAnimal = new Dog();

Is valid, since every Dog is an Animal, but that

Dog myDog = new Animal() is not allowed, since not every Animal is guaranteed to be a Dog.

This (correct usage, like Animal myAnimal = new Dog()) is an example of polymorphism in Java, which is a really important concept. I cannot stress enough how critical it is to understand this relationship, and why the first of these two last code snippets works, and why the second one does not.
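A self-contained sketch of why the first line works and the second doesn’t (the class bodies here are minimal stand-ins):

```java
class Animal {
    void makeSound() { System.out.println("Moo"); }
}

class Dog extends Animal {
    @Override
    void makeSound() { System.out.println("Woof"); }
}

public class PolymorphismDemo {
    public static void main(String[] args) {
        Animal myAnimal = new Dog(); // fine: every Dog is an Animal
        myAnimal.makeSound();        // Dog's override wins at runtime: "Woof"
        // Dog myDog = new Animal(); // compile error: not every Animal is a Dog
    }
}
```

The variable’s declared type (Animal) controls what you may call; the object’s actual type (Dog) controls what actually runs.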

Finally, let’s say I have a class Percheron. There are some properties in Horse I would like to get, and some other particular properties in DraftHorse I would like to get. In some languages, you could certainly say

public class Percheron extends DraftHorse, Horse

and be totally fine.

However, this is called “multiple inheritance” and—except for the fact that everything implicitly inherits from Object—inheriting from both Horse and DraftHorse is not allowed at all in Java. You must pick one: Percheron may extend Horse or DraftHorse, but never both. 


Sunday, June 1, 2025

MergeSort

 We have come now to my favorite sorting algorithm, and, for the first time, one that will need more space than the array we want to sort. This time, the algorithm is mergeSort, which works recursively. This is, incidentally, one of the reasons I like it more than the others—the other reason being that it’s faster, in a way we’ll quantify soon.


MergeSort in plain English is as follows:

  • Split the array in half until you’re left with a bunch of individual elements
  • Now, take the old arr[0] and old arr[1] and compare them. Put them in the right order, in an array with two elements
    • Eventually, you’ll form a bunch of pairs, which, as a pair, will be sorted
    • Then, combine those pairs into 4s
    • Then combine those 4s into 8s
    •  And so on, until you have a sorted array

Again, let’s look at our example. We have:

3  7  2  4  1  5  6

Which we will now split into individual elements:

3 | 7 | 2 | 4 | 1 | 5 | 6

Now, let’s make couples.

3 and 7 are sorted, so we are fine with (3, 7).

2 and 4 are sorted, so we’re fine with (2, 4).

1 and 5 are sorted, so we’re fine with (1, 5).

6 is sorted by itself, so we’re fine with (6).


Now, let’s merge (and sort) groups of couples into 4s:

From (3, 7) and (2, 4): the minimum of 2 and 3 is 2, so 2 goes first → 2, _, _, _

From (3, 7) and (4): the minimum of 3 and 4 is 3, so 3 goes next → 2, 3, _, _

From (7) and (4): the minimum is 4, so 4 goes next → 2, 3, 4, _

From (7) and (): all we have left is the 7, so it gets the last spot → 2, 3, 4, 7



And now the other half:

From (1, 5) and (6): the minimum is 1, so 1 goes first → 1, _, _

From (5) and (6): the minimum is 5, so 5 goes next → 1, 5, _

From () and (6): all we have left is the 6, so it takes the last place → 1, 5, 6

And now we have halves of the array sorted, so we just need one more merge:

From (1, 5, 6) and (2, 3, 4, 7), the minimum is 1, so 1 goes first → 1

Then we take the 2 from the other list → 1, 2

Then we stay in the second list to get 3 → 1, 2, 3

Then we stay in the second list (again) to grab the 4 → 1, 2, 3, 4

Now we go back to the first list for the 5 → 1, 2, 3, 4, 5

The 6 also comes from there → 1, 2, 3, 4, 5, 6

And there’s nothing in one of the lists now, but the other still has just the 7, so it’s our last element, and now everything is properly sorted → 1, 2, 3, 4, 5, 6, 7


We call these extra arrays we need to hold the smaller parts of the set we’re sorting “subarrays” or “auxiliary arrays” or “helper arrays.”


Before you go on, do what I had to do when I first learned this algorithm: make a video or live demo for your family of you mergeSort-ing about that many red Solo cups marked with numbers on their bottoms. If you can correctly sort the cups, move on to the pseudocode.

function mergeSort(A):
    if length(A) <= 1:
        return A

    mid = length(A) / 2
    left = mergeSort(A[0 : mid])          // the left half, from 0 to mid-1
    right = mergeSort(A[mid : length(A)]) // the right half, from mid to length(A)-1

    return merge(left, right)

function merge(left, right):
    result = empty array
    while left is not empty and right is not empty:
        if left[0] <= right[0]:
            append left[0] to result
            remove left[0] from left
        else:
            append right[0] to result
            remove right[0] from right

    append any remaining elements of left to result
    append any remaining elements of right to result

    return result

 

Spend some time with this pseudocode and understand why it is equivalent to the example, or to what you did for your family with the cups. Once you are confident you understand the pseudocode—and only then—move on to the real Java code, found below:

public class MergeSort {

    // Main function to call mergeSort
    public static void main(String[] args) {
        int[] array = {12, 11, 13, 5, 6, 7};
        mergeSort(array, 0, array.length - 1);
        System.out.println("Sorted array:");
        for (int i : array) {
            System.out.print(i + " ");
        }
    }

    // Recursive mergeSort function
    public static void mergeSort(int[] array, int left, int right) {
        if (left < right) {
            int mid = (left + right) / 2; // midpoint (integer division)
            mergeSort(array, left, mid);
            mergeSort(array, mid + 1, right);
            merge(array, left, mid, right);
        }
    }

    // Merge function to combine two sorted halves
    public static void merge(int[] array, int left, int mid, int right) {
        int n1 = mid - left + 1;
        int n2 = right - mid;

        int[] L = new int[n1];
        int[] R = new int[n2];

        for (int i = 0; i < n1; ++i)
            L[i] = array[left + i];
        for (int j = 0; j < n2; ++j)
            R[j] = array[mid + 1 + j];

        int i = 0, j = 0, k = left;
        while (i < n1 && j < n2) {
            if (L[i] <= R[j]) {
                array[k] = L[i];
                i++;
            } else {
                array[k] = R[j];
                j++;
            }
            k++;
        }

        while (i < n1) {
            array[k] = L[i];
            i++;
            k++;
        }

        while (j < n2) {
            array[k] = R[j];
            j++;
            k++;
        }
    }
}

Saturday, May 31, 2025

SelectionSort

 Selection sort is yet another simple sorting algorithm:

  • Start with an unsorted array
  • Find the smallest element in the unsorted part
  • Swap it with the first unsorted element
  • Keep growing the sorted part of the array (and correspondingly shrinking the unsorted part) until the whole array is sorted

Again, imagine we have as an example the starting state:

3  7  2  4  1  5  6

The smallest unsorted element is 1, and the first is 3. So we swap them.

1  7  2  4  3  5  6

Next, the unsorted part has 6 elements, and the 1 is sorted. The first unsorted element is 7, and the smallest is 2. Swap them.

1  2  7  4  3  5  6

The unsorted part now has 5 elements, the smallest of which is 3, and the first of which is 7. Swap them.

1  2  3  4  7  5  6

The unsorted part now has 4 elements, the smallest of which is 4, and the first of which is (by coincidence) also 4. Swap them. (It will appear as if nothing changed, but after this step, we can call the 4 officially sorted.)

1  2  3  4  7  5  6

The unsorted part now has 3 elements, the smallest of which is 5, and the first of which is 7. Swap them.

1  2  3  4  5  7  6

The unsorted part now has 2 elements, the first of which is 7 and the smallest of which is 6. Swap them.

1  2  3  4  5  6  7

The unsorted part now has 1 element, the first and smallest of which is 7. Swap it with itself. (Nothing will change, but now the 7 is considered sorted.)

1  2  3  4  5  6  7


Before you go on, do what I had to do when I first learned this algorithm: make a video or live demo for your family of you selectionSort-ing about that many red Solo cups marked with numbers on their bottoms. If you can correctly sort the cups, move on to the pseudocode.

for i from 0 to n - 2 do
    min_index = i
    for j from i + 1 to n - 1 do
        if array[j] < array[min_index] then
            min_index = j
    if min_index ≠ i then
        swap array[i] and array[min_index]

Spend some time with this pseudocode and understand why it is equivalent to the example, or to what you did for your family with the cups. Once you are confident you understand the pseudocode—and only then—move on to the real Java code, found below:

public class SelectionSort {

    public static void selectionSort(int[] array) {
        int n = array.length;

        // Traverse through all array elements
        for (int i = 0; i < n - 1; i++) {
            // Find the minimum element in the unsorted part
            int minIndex = i;
            for (int j = i + 1; j < n; j++) {
                if (array[j] < array[minIndex]) {
                    minIndex = j;
                }
            }

            // Swap the found minimum element with the first unsorted element
            if (minIndex != i) {
                int temp = array[i];
                array[i] = array[minIndex];
                array[minIndex] = temp;
            }
        }
    }

    // Example usage
    public static void main(String[] args) {
        int[] arr = {3, 7, 2, 4, 1, 5, 6};
        selectionSort(arr);

        // Print sorted array
        for (int num : arr) {
            System.out.print(num + " ");
        }
        // Output: 1 2 3 4 5 6 7
    }
}

Switch

 Other than if/if-else/if-else if-else and the ternary operator, there is yet another common and important conditional expression in Java th...