To get any deeper into algorithms on lists of numbers like we’ve seen, we have to at some point tackle the elephant in the room that is left unexplained in the last article: how we sort. But to get to sorting, we need to undergo a fundamental shift in how we think. Normally, a teacher will say it’s very bad form to define a thing in terms of itself; that may certainly be true in English class, but not in math or computer science. In math and computer science, we define problems in terms of themselves all the time with no issues at all: no wrong answers on account of that style (although mistakes can happen), no angry teachers, nothing like that.
We call these self-referential problems “recursive.”
In math, for instance, the function:
- f(0) = 0
- f(1) = 7
- f(2) = 27
- f(n) = f(n-1) / 2 + 3 * f(n-2)
In explaining recursion, I’ll point to math for two very famous sequences which are both classically defined this way: using addition, the Fibonacci sequence; and using multiplication, the factorial.
You have probably heard of Fibonacci in the context of breeding rabbits: given that you start with a pair of newborn rabbits, and they never die; and pregnancies last one month, and rabbits can breed after their first month, how do you figure out how many rabbits will exist in a certain amount of time?
- At time 0, there is 1 pair, the original immature pair
- At time 1, there’s still 1 pair, but now they’re mature and have just bred
- At time 2, there are 2 pairs
- The original
- Their baby offspring
- At time 3, there are 3 pairs
- The original
- Their first offspring
- Their new babies
- At time 4, there are 5 pairs
- The original
- Their first offspring pair
- The offspring of this pair
- Their second offspring pair
- The offspring of this pair
- And so on
How do you formulaically state how many pairs exist?
First, you have the initial conditions that at months 0 and
1, rabbit(x) = 1; there is 1 pair in month 0, and one pair in month 1. Then, notice
that for months 2 and beyond, the number of rabbits is rabbit(x-1) + rabbit(x-2)—that
is, how many pairs there were last month, plus how many there were two months
ago. This is because a rabbit is pregnant for one month, and then baby rabbits
can breed once they’re one month old (and they never die). Classically, the problem
is about a year later. Let’s check it out!
- When we start, there’s 1 pair
- A month later, there’s 1 pair
- 2 months later, 2 pairs
- 3 months later, 3 pairs
- 4 months later, 5 pairs
- 5 months later, 8 pairs
- 6 months later, 13 pairs
- 7 months later, 21 pairs
- 8 months later, 34 pairs
- 9 months later, 55 pairs
- 10 months later, 89 pairs
- 11 months later, 144 pairs
- A year later, 233 pairs!
This method of calculating the number of pairs works, but it’s
inefficient for a reason we’ll cover in a later article.
Now, let’s do something similar, but with multiplication.
Let’s say a class includes no students. How many ways are there
to form a line including no students? One way—[]. Now add Alice; how many ways
can Alice stand in line? One way—[Alice]. Now add Bob. Two ways—[Alice, Bob] or
[Bob, Alice]. Now add Charlie. How many ways? 6—[A, B, C], [A, C, B], [B, A,
C], [B, C, A], [C, A, B], [C, B, A]. If Dave
joins the class, there are 24 ways; if Ellie joins the class, 120; if Finn
joins, 720; if George joins, 5040. By the time Jenny joins the class, there are
more than 3 million ways to stand in line. By the time Zayden joins, the class, there are
about 2000 times as many ways to stand in line as there are total cells in all people
alive.
Mathematically, the function that counts the number of
orderings is called the factorial function, written with ! (so n! is read as “n
factorial” or “the factorial of n”) and means 1 times 2 times 3 times 4 times…
up to n. We can define this as n! = n*(n-1)!
Notice how in both the additive Fibonacci case and the multiplicative factorial,
we define the problem (factorial of 7, 12th Fibonacci, etc.) in
terms of smaller versions of that same problem. Factorial of 7 is 7 times factorial
of 6; and Fibonacci of 12 is the sum of Fibonacci of 10 and Fibonacci of 11. Factorial
of 6, in turn, depends on knowing factorial of 5; and Fibonacci of 10 requires knowing
F(8) and F(9), and Fibonacci of 11 requires knowing F(9) and F(10). We keep
building and solving self-similar problems, until we get to a problem we solve
without this self-similarity because it’s small or easy enough.
For Fibonacci, it’s the initial-month conditions:
- F(0) is 1, the initial baby rabbits
- F(1) is still 1, the now-grown rabbits
For Factorial, either of the following works:
- 0! Is 1
- 1! Is 1
One place where beginning coders fail—and then get welcomed
as “real developers” because you’ve suffered through this rite of passage—is in
forgetting that all recursive problems need to have at least one stopping condition
for which the answer is known without recursion, and that each successive problem
(Fact(10) needs Fact(9) needs Fact(8) needs Fact(7) and so on, until Fact(1) or
Fact(0) is given as 1) must in some way get closer to the stopping condition(s)
than problems that came before it. Failure to both define and approach a
stopping condition will produce the ceremonial first “StackOverflowError,” which
every programmer must experience. (This is, by the way, the reason the website
has that name.)
Make sure you have a great understanding of this concept before
you ever move on to sorting. Call your friends or family into the room and
explain it. Until they understand it, don’t move on. (This is because a lot of
sorting is, at a high level, “to sort, divide, sort each piece, and put them together”—that is, the
definition of, say, “sort(10)” requires you to “sort(5)” twice and put them
together. In other words, a lot of sorting is, fundamentally, recursive.)