Wednesday, May 28, 2025

A much better search

 Imagine for a second we have an array, and we want to find a particular number, and we want to do it more efficiently than the linear search: there’s a row of lockers, and is it in locker #0? No. Ok, is it in locker #1? No. Ok, is it in locker #2? No. Ok, is it in locker #3? Yes? Ok great, you found it!


To do better, we first need to guarantee the array is sorted. There are many sorting algorithms which we’ll go over soon, but for now, let’s just be hand-wavy, and assume that somehow, the array was sorted. You should not care in which direction, ascending or descending, it was sorted.

This method that looks for something through a sorted collection is called “binary search”, not because of 100100110101101, but because it splits things in half.

You probably do this (or your parents did), if you ever use a phone book. Imagine you’re looking for someone whose last name is “Kelly.” K is kind of near the middle of the alphabet, so you open the phone book in what you think the middle is. You’ve landed on a page full of “Ingleton” records. Not far enough into the Ks, but you do know something. Your “Kelly” friend is not in the part of the book from the beginning until “Ingelton,” since “K” comes after “I.” So you flip to the next page after “Ingleton,” and treat that as “page 1”. From there, you flip forward some amount trying to find “Kelly”, and you land in “Martinez.” Oops, too far. “Kelly” lies somewhere between “Ingelton” and “Martinez,” so you know two things: you shouldn’t waste time looking earlier in the alphabet than “Ingleton,” or later than “Martinez.” From “Martinez,” you flip backward and get to “Jones.” Now, again, you’re before “Kelly,” but your region to search through keeps shrinking. Now, it only makes sense to search between “Jones” and “Martinez.” You flip one page after Jones, and you find your friend’s number.

Harvard’s Dr. David Malan, in their CS50 introductory class, does this with a physical phone book every time he teaches the class, literally tearing the book in half (as opposed to just ignoring the irrelevant parts, where the answer cannot be) in the alphabetized phone book because he wants to find someone.

Imagine you have 1000 items, and you need to find item 371 to figure out what it is. You could go to item 1, then 2, then 3, then 4… then 370, then 371, and you’d know the answer, but this other way is much quicker. You go to item 500 in the middle right away. 371 is less than 500, so you immediately give yourself permission (since items are somehow sorted) to ignore 501-1000. The middle of your new search space is 250 (from 1 to 500), so you look there. You’ve looked at 2 items.  Now, you look just between 251 and 500, at the middle of that—376. Now, you’ve looked at 3 items. Your item is somewhere between 250 and the middle, so you ignore the top half. Now, the middle of the new range is 313. Your item is after 313, but before 376, so you ignore everything else. Then, the new middle is 345. Your item is between 345 and 376, so you ignore everything else. You look at the new middle—361—and decide your item is above it, so you ignore everything else. Your new range is now just 369 to 376. The middle is 373, so your item is in the lower half just from 369 to 373. You look in the middle, and you find your item.

Look how much quicker that was—500, 250, 376,  313, 345, 361, 373, found 371—than actually examining 370 previous items and failing to find it. This is why binary search is so powerful, and almost every “find me XYZ in a huge collection” in technology these days uses it. My best friend’s phone number is saved under her last name, “Elwood.” If I didn’t know that number by heart and still needed to look her up to call her, if I were to look for “Elwood” in my phone’s contact list, then examine the code Android is using to give me back her number, I am nearly certain that it would be doing this.

Don’t start thinking about binary search in code terms just yet. First, look at the pseudocode.

function binarySearch(array, target):

    low ← 0

    high ← length(array) - 1

    while low ≤ high do:

        mid ← low + (high - low) / 2

        if array[mid] == target then:

            return mid   // Target found at index mid

        else if array[mid] < target then:

            low ← mid + 1   // Search right half

        else:

            high ← mid - 1  // Search left half

    return -1   // Target not found

Understand how the pseudocode works—match it to finding “Kelly” or “371” in those stories—and then try it with a very small (maybe 10 items, if not less) sorted array. Try it out when you both can guarantee you’ll find the item and when you can guarantee you won’t. Call a friend, a parent, a sibling, a significant other into the room, and demonstrate the algorithm to them. Only once you’re confident you can explain it, implement this Java:

public class BinarySearch {

    public static int binarySearch(int[] arr, int target) {

        int low = 0;

        int high = arr.length - 1;

 

        while (low <= high) {

            int mid = low + (high - low) / 2;

 

            if (arr[mid] == target) {

                return mid; // Target found

            } else if (arr[mid] < target) {

                low = mid + 1; // Search right half

            } else {

                high = mid - 1; // Search left half

            }

        }

        return -1; // Target not found

    }

 

    public static void main(String[] args) {

        int[] arr = {1, 3, 6, 8, 10};

        int target = 8;

        int result = binarySearch(arr, target);

 

        if (result != -1) {

            System.out.println("Value found at index: " + result);

        } else {

            System.out.println("Value not found");

        }

    }

}

No comments:

Post a Comment

Switch

 Other than if/if-else/if-else if-else and the ternary operator, there is yet another common and important conditional expression in Java th...