Monday, November 19, 2012

Coin Change Kata in Racket and Clojure

There's a Code Kata that is being discussed in the Clojure community right now -- the Coin Change Kata.  As someone who programs regularly in both Racket and Clojure, I thought it would be fun to demonstrate how to tackle this problem in both languages, highlighting the differences.  The Racket version is a little more straightforward, so let's begin with that.

The basic idea of the Coin Change Kata is that you are given a list of coin denominations and an amount of money, and you need to find the way to make that amount with the fewest coins.  For example, if you want to make 18 cents out of standard U.S. coin denominations, the best way would be one dime, one nickel, and three pennies.  The official description of the kata gives some latitude about how to express the output of your function.  I think a dictionary (aka hash table) is a perfectly sensible way to do it.

So, in Racket terminology, we're looking for a function that behaves like this:
(make-change '(1 5 10 25) 18) produces #hash((1 . 3) (5 . 1) (10 . 1))

Most beginners, when faced with this problem, immediately reason from the most obvious example they are familiar with: U.S. coin denominations.  From our experience counting out money, we know that the best strategy is to first use as many quarters as possible, then as many dimes as possible, then as many nickels as possible, and finally, finish off with pennies.

So the strategy seems straightforward.  Sort the denominations in descending order, then work your way down the list, using as many of the big denominations as possible first.  Many of the solutions that others have provided for this kata employ this strategy, known as the "greedy strategy".  Those solutions are wrong.

To see this, consider (make-change '(1 20 25) 80).  If you start by counting out quarters, you'll get to 75 cents, and then you make up the difference with pennies, for a total of 8 coins.  Clearly, you're better off just using four 20 cent coins.  So a more thorough search algorithm is required.

There's one other confounding factor that is often overlooked by beginners -- what if the problem is impossible?  For example, what should (make-change '(2 10) 13) return?  With that in mind, let's refine our contract for make-change and say that it either returns a dictionary of how to make change, or it returns false if the problem is impossible.

Before we get into the meat of the problem, there are a couple helper functions that we're going to want to have.  First, it seems clear that a big part of what we'll be doing is comparing solutions to see which one uses the fewest coins.  So we need a way to count the coins in a "change dictionary".

;; count-coins: dict -> nat
;; counts how many coins are in the change dictionary
(define (count-coins change)
  (apply + (dict-values change)))


It would also be helpful to have a way to increment the coin count of a single denomination in a change dictionary.  For example,
(change-increment #hash((10 . 1) (25 . 2)) 25) gives #hash((10 . 1) (25 . 3))

This is also straightforward:
;; change-increment: dict nat -> dict
;; Takes a change dictionary and a coin, returns the 
;; change dictionary with that coin's count incremented
(define (change-increment change coin)
  (dict-update change coin add1 0))


Now we're ready to dive in.  I am aware of two basic recursive strategies for tackling this problem.

Strategy 1:  Include the first coin denomination, or not

To illustrate this idea, let's go back to U.S. denominations.  If I want to find the best way to make 15 cents with the denominations '(1 5 10 25), I have two scenarios to consider.

The first scenario is to include the first denomination, i.e., find the best way to make 14 cents with '(1 5 10 25) and then increment the penny count of that solution by 1.  The best way to make 14 cents with '(1 5 10 25) is #hash((1 . 4) (10 . 1)).  Increment the penny count, and you get #hash((1 . 5) (10 . 1)).  So this is one of our candidate solutions.

The second scenario is that I ignore the penny completely and find the best way to make 15 cents out of '(5 10 25).  The best way to do this is #hash((5 . 1) (10 . 1)).

In this example, the second scenario yields the superior solution, so that's the answer.

An outline of this algorithm would look like this:
(define (make-change denominations amount)
  (cond
    ;; Insert base cases here 
    [else
     ;; Case 1: Use the first denomination
     (define option1 (change-increment
                      (make-change denominations 
                                   (- amount (first denominations)))
                      (first denominations)))
     ;; Case 2: Or not
     (define option2 (make-change (rest denominations) amount))
     ;; Which is best?
     (argmin count-coins (list option1 option2))]))


It's not entirely obvious what the base cases are, but if you have experience with recursive algorithms, you can see that this algorithm either reduces the length of the denomination list or reduces the target amount of money with each step.  So there are really two separate base cases to consider: either the denomination list becomes empty or the amount reaches zero (or maybe even becomes negative!).

Filling in the base cases looks like this:
(define (make-change denominations amount)
  (cond
    [(negative? amount) false]
    [(zero? amount) #hash()]
    [(empty? denominations) false]
    [else
     (define option1 (change-increment
                      (make-change denominations (- amount (first denominations)))
                      (first denominations)))
     (define option2 (make-change (rest denominations) amount))
     (argmin count-coins (list option1 option2))]))


Uh oh.  There's a problem here.  make-change can potentially return false, and the helper functions we created (i.e., change-increment and count-coins) don't handle false values.  There are several possible ways to tackle this, but I think the simplest way is to just modify the helper functions to handle the false value gracefully.

change-increment is easy to modify -- false in should mean false out.
;; change-increment: dict-or-false nat -> dict-or-false
(define (change-increment change coin)
  (if change
      (dict-update change coin add1 0)
      false))


It's a little less obvious how to modify count-coins.  However, we're using count-coins within an argmin in order to find the solution with the fewest coins.  To make this work, we just need to ensure that when count-coins is passed a false value, the output is something that can never be a "minimum".  A standard trick to accomplish this is to set the output to "infinity".

;; count-coins: dict-or-false -> nat-or-infinity
(define (count-coins change)
  (if change
      (apply + (dict-values change))
      +inf.0))


Now, we have a working make-change function that passes all test cases.  Unfortunately, as you test the make-change function with larger and larger amounts, it gets really, really sloooooow.  Functional programmers know that there's a simple trick to get around this -- memoization.  A quick and easy way to add memoization is to add the line:
(require (planet dherman/memoize:3:1))
to the top of your file and change define to define/memo in the definition of make-change.

[The first time you include the memoization library, Racket will churn for several minutes and print out hundreds of status and warning messages as it downloads and compiles the library locally on your machine.  After that, the library will be available on your system, and future inclusions will be speedy.]

The final Racket version of Strategy 1:
#lang racket
(require rackunit)
(require (planet dherman/memoize:3:1))

;; count-coins: dict-or-false -> nat-or-infinity
;; counts how many coins are in the change dictionary
;; or returns infinity if input is false
(define (count-coins change)
  (if change
      (apply + (dict-values change))
      +inf.0))
  
;; change-increment: dict-or-false nat -> dict-or-false
;; Takes a change dictionary and a coin, returns the 
;; change dictionary with that coin's count incremented
;; or false if change input is false
(define (change-increment change coin)
  (if change
      (dict-update change coin add1 0)
      false))

;; make-change: list-of-nats nat -> dict-or-false
;; Takes a list of coin denominations and an amount that
;; needs to be broken up into change.  Returns a
;; "change dictionary", i.e., a hash table that maps
;; coin denominations to counts, corresponding to the 
;; most efficient way that sums to the desired amount.
;; Returns false if impossible.
(define/memo (make-change denominations amount)
  (cond
    [(negative? amount) false]
    [(zero? amount) #hash()]
    [(empty? denominations) false]
    [else
     (define option1 (change-increment
                      (make-change denominations (- amount (first denominations)))
                      (first denominations)))
     (define option2 (make-change (rest denominations) amount))
     (argmin count-coins (list option1 option2))]))
       
(check-equal? (make-change '(1 5 10 25) 18)
              #hash((1 . 3) (5 . 1) (10 . 1)))
(check-equal? (make-change '(1 20 25) 80)
              #hash((20 . 4)))
(check-equal? (make-change '(1 24 25) 98)
              #hash((24 . 2) (25 . 2)))
(check-equal? (make-change '(2 10) 13) false)



Strategy 2: Which coin to use next?

Going back to our example of finding the best way to make 15 cents out of '(1 5 10 25), there's a completely different strategy we could take.  The first coin I use could be a penny, nickel, dime, or quarter.  So if I knew the best way to make 14 cents, 10 cents, 5 cents, and -10 cents out of '(1 5 10 25) [note that these values are 15-1, 15-5, 15-10, and 15-25, respectively], then I could take the best of those ways and increment the count of the corresponding coin to get the best way to make 15 cents.

Using the same helper functions as Strategy 1, here is the final Racket implementation of make-change, Strategy 2:
(define/memo (make-change denominations amount)
  (cond
    [(negative? amount) false]
    [(zero? amount) #hash()]
    [(empty? denominations) false]
    [else
     (define options (for/list ([coin (in-list denominations)])
                       (change-increment
                        (make-change denominations (- amount coin))
                        coin)))
     (argmin count-coins options)]))


In my benchmarking, the Racket version of Strategy 1 ran about 10x faster than the Racket version of Strategy 2 for large target amounts.


Clojure versions

Now, let's take a look at the Clojure versions of the same code.  We'll begin with Strategy 1.  In Clojure, hash maps are written using curly braces, and we can insert commas anywhere we like to increase readability, so the change dictionaries look like {1 3, 5 1, 10 1}.  In keeping with Clojure's conventions, the input list of denominations is now a vector rather than a list.  Also, in Clojure we use nil as the output when there is no solution, rather than false.  Overall, the code is almost the same (although this is not quite the final version):

(use 'clojure.test)

(defn count-coins [change]
  (if change
    (apply + (vals change))
    Double/POSITIVE_INFINITY))

(defn update [m k f default]
  (assoc m k (f (get m k default))))

(defn change-increment [change coin]
  (when change
    (update change coin inc 0)))

(defn make-change [denominations amount]
  (cond
   (neg? amount) nil
   (zero? amount) {}
   (empty? denominations) nil
   :else
   (let [option1 (change-increment
                  (make-change denominations (- amount (first denominations)))
                  (first denominations)),
         option2 (make-change (rest denominations) amount)]
     (min-key count-coins option1 option2))))

(def make-change (memoize make-change)) 

(deftest make-change-tests
  (are [x y] (= x y)
       (make-change [1 5 10 25] 18)  {1 3, 5 1, 10 1}
       (make-change [1 20 25] 80)    {20 4}
       (make-change [1 24 25] 98)    {24 2, 25 2}
       (make-change [2 10] 13)       nil))

Let's enumerate some of the differences:
  1. count-coins is essentially the same -- infinity has a longer name.
  2. Clojure does not have a built-in update function for hash-maps, so we have to roll our own.  Frankly, this is one of the most glaring omissions in Clojure's core library.  Given the richness of Clojure's core, I find it baffling that Clojure is up to version 1.5 and this function still has not been added.
  3. In change-increment, we can leverage the fact that Clojure's when returns nil when the condition is false/nil, thus saving a line relative to Racket.  This is a common idiom in Clojure for writing "nil in means nil out" functions (illustrated in the snippet after this list).
  4. Clojure has no internal define, so in the else clause we use let instead (which is similar to Racket's let*).
  5. In Racket, argmin takes a function and a list.  In Clojure, the corresponding function is min-key, which takes a function and a variable number of arguments (also illustrated below).
  6. memoize is built into Clojure.  The above technique of defining the function normally and then rebinding the function name to the memoized version is a common idiom, somewhat different from Racket's define/memo.
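To make points 3 and 5 concrete, here is a quick REPL-style illustration (using the count-coins and change-increment definitions above):
;; Point 3: when returns nil for a false/nil condition, so "nil in, nil out" is free.
(change-increment nil 25)           ;=> nil
(change-increment {10 1, 25 2} 25)  ;=> {10 1, 25 3}

;; Point 5: min-key takes its candidates as separate arguments, not as a list.
;; To apply it to a collection, spread the collection with apply.
(min-key count-coins {1 5} {5 1})          ;=> {5 1}
(apply min-key count-coins [{1 5} {5 1}])  ;=> {5 1}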
Not a whole lot of differences.  Even the unit test code looks similar.  However, there's a very big difference lurking beneath the surface: if you try make-change on a large target amount, you get a StackOverflowError.  This is a definite problem.

Racket uses a very clever technique to simulate an unlimited stack (or more to the point, limited only to the overall memory you've allocated to Racket, rather than some arbitrary stack limit).  My understanding is that Racket achieves this trick by catching the error thrown when the stack overflows, moving the stack to the heap, and continuing.  Even if those details aren't quite right, the main point here is that in Racket, you just don't spend any time worrying about stack overflows.  In Clojure, it's a very salient issue that you absolutely must contend with.

So how to deal with it?  One option is to switch the code over to memoization's bottom-up cousin, dynamic programming.  The idea is that you allocate an array large enough to store the computations of make-change for all possible values up to amount, and fill up this array in sequence using the values that have been computed before.  This will solve our stack overflow problem, but requires writing the code in a very different style.  It would be nice if we could solve the problem without changing much of our existing code.
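To see just how different, here is a rough sketch of what the bottom-up version might look like.  (This is my own sketch, not part of the final solution; it reuses count-coins and change-increment from above and assumes a non-empty denomination vector.)
(defn make-change-dp [denominations amount]
  ;; best is a vector where (best i) holds the best change dictionary
  ;; for amount i, or nil if amount i cannot be made.  We fill it in
  ;; order from 0 up to amount, so there is no deep recursion to overflow.
  (let [best (reduce
              (fn [best i]
                (conj best
                      (apply min-key count-coins
                             (for [coin denominations]
                               ;; (get best negative-index) yields nil, and the
                               ;; helpers already treat nil as "impossible"
                               (change-increment (get best (- i coin)) coin)))))
              [{}]  ; zero cents takes zero coins
              (range 1 (inc amount)))]
    (best amount)))
This works, but notice how different the shape of the code is from our recursive version.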

Any other options?  How about we do a CPS transform on the entire program so that it doesn't consume any stack space, only heap?  Blech.

Fortunately, there is a quick trick that I like to call "priming the pump".  We just call the memoized function on all numbers up to the target amount, achieving a bottom-up effect similar to the dynamic programming approach, in a way that allows us to build off our existing code.  For clarity, I've left the original function alone and created a new prime-the-pump version called make-change-fast.  This new function, make-change-fast, is the one we would expose publicly; the original make-change function would become private.

Final Clojure version of Strategy 1:
;;Basic helper functions remain unchanged

(defn count-coins [change]
  (if change
    (apply + (vals change))
    Double/POSITIVE_INFINITY))

(defn update [m k f default]
  (assoc m k (f (get m k default))))

(defn change-increment [change coin]
  (when change
    (update change coin inc 0)))

;; This is now the private helper function, don't use this directly
(defn ^:dynamic make-change [denominations amount]
  (cond
   (neg? amount) nil
   (zero? amount) {}
   (empty? denominations) nil
   :else
   (let [option1 (change-increment
                  (make-change denominations (- amount (first denominations)))
                  (first denominations)),
         option2 (make-change (rest denominations) amount)]
     (min-key count-coins option1 option2))))

;; This is the new public function we use to make change. 
(defn make-change-fast [denominations amount]
  (binding [make-change (memoize make-change)]
    (last
     (for [i (range (inc amount))]
       (make-change denominations i)))))
When writing make-change-fast, I decided to demonstrate an alternative memoization idiom in Clojure.  Rather than rebinding make-change with (def make-change (memoize make-change)), we can declare make-change to be a dynamic var and then, inside make-change-fast, achieve memoization within a binding construct.

Why do it this way?  I actually prefer this idiom for this particular use case because every call to make-change-fast creates its own memoized version of make-change with its own cache, and that cache can be garbage collected when the computation is complete.  This becomes important if you use make-change in a long-running process with very different denomination lists.  It also ensures that if you run make-change in multiple threads, the caches stay isolated from one another.  Of course, if you are always using the same denomination list, the first idiom would be better, because you'd want to keep the cache around and share it across threads.

Strategy 2:

No real surprises here, and only a few lines differ from Strategy 1, but for completeness, here is the final Clojure version of Strategy 2:
(defn count-coins [change]
  (if change
    (apply + (vals change))
    Double/POSITIVE_INFINITY))

(defn update [m k f default]
  (assoc m k (f (get m k default))))

(defn change-increment [change coin]
  (when change
    (update change coin inc 0)))

(defn ^:dynamic make-change [denominations amount]
  (cond
   (neg? amount) nil
   (zero? amount) {}
   (empty? denominations) nil
   :else
   (let [options (for [coin denominations]
                   (change-increment
                    (make-change denominations (- amount coin))
                    coin))]
     (apply min-key count-coins options))))

(defn make-change-fast [denominations amount]
  (binding [make-change (memoize make-change)]
    (last
     (for [i (range (inc amount))]
       (make-change denominations i)))))


Interestingly, in Clojure, Strategy 2 is twice as fast as Strategy 1 for large target amounts (whereas in Racket, Strategy 1 was substantially faster).


Final Thoughts

On my computer, both Clojure versions were faster than all the Racket versions I came up with (both the ones I displayed above and other ones I tried -- for example, employing the priming-the-pump strategy in Racket even though it is not needed there).  Specifically, my slowest Clojure version was twice as fast as the fastest Racket version.

I find Racket's #hash notation to be awkward to work with relative to Clojure's simple {} syntax for hash tables.

I wrote the Racket versions first, and they were error-free the first time.  I wrote the Clojure versions second, and they took me longer to write, even though I spend more time programming Clojure than Racket, in part because there were two errors I needed to debug.  First, I made a mistake when writing the update helper function (I initially tried to use Clojure's related update-in function, not realizing that it doesn't allow for a default value when the key is not found).  Second, I always forget that min-key is a variable-argument function rather than one that takes a list, like Racket's argmin -- to me it seems far more intuitive to have that kind of function operate on a list, since that is the most common use case.  In both cases, Clojure's error messages were spectacularly unhelpful, which, sadly, is par for the course in Clojure.

The other reason the Clojure code took longer to write is that I needed to figure out how I wanted to deal with the stack overflow, and that ate up some time.  I can't stress enough how freeing it is to not have to worry about those sorts of issues on Racket.

Clojure gives you better control over memoization strategies, and even more options are available in clojure.core.memoize.  Racket doesn't even have a built-in memoization library, and the one on planet is weak relative to Clojure's.
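For instance, here is a sketch of the kind of cache control clojure.core.memoize offers.  (This is my own illustration; it assumes the separate org.clojure/core.memoize library and its lru/ttl variants, which are not used elsewhere in this post.)
(require '[clojure.core.memoize :as memo])

;; Unlike clojure.core/memoize's unbounded cache, these variants evict entries.
;; Keep only the 1024 most-recently-used results:
(def make-change-lru (memo/lru make-change :lru/threshold 1024))

;; Or expire cached results 60 seconds after they are written:
(def make-change-ttl (memo/ttl make-change :ttl/threshold 60000))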

Usually, when comparing Clojure to Racket, I find that Clojure has the richer set of built-ins: more versatile data structures and functions that operate over them.  For this particular example, however, most of the built-in functions I relied on had counterparts in both languages, so the code looks nearly identical between the two languages.  Oddly enough, this time it was Clojure that was missing something I wanted, namely the hash-map update function.

Looking at this as a Clojure vs Racket battle, there's not a clear winner on this example since both allowed me to compactly express the approaches I had in mind and there were some advantages and disadvantages on both sides.  I'd probably give Clojure the nod because Clojure gave me better speed and more control over the memoization caching behavior.  (Yes, I realize that if I drop down to C, I'd get even better speed and control over caching strategies, but that's not my point.  My point is that if you can get better speed and control for a similar level of effort, why not?)  Nevertheless, with no stack overflows and better error messages, writing the code in Racket was a noticeably more pleasant experience.

For those who have not tried Code Katas, I encourage you to do so.  Code Katas are usually simple enough to exercise fundamental skills that every programmer should possess, yet just tricky enough that there are several viable solutions, which makes it interesting to compare those solutions with one another and across languages.

For those who enjoyed this particular Kata, I would recommend reading Doctor Ecco's Cyberpuzzles by Dennis E. Shasha.  The first chapter deals with a fascinating, related problem of trying to find the optimal denomination list of a given length that minimizes the average number of coins needed across a range of possible target amounts.

Thursday, November 15, 2012

Clojure makes quines way too easy

A popular programming challenge is to write a Quine in your favorite programming language -- a program/function that prints its own source code.  Usually, writing a Quine is an incredibly mindbending effort.  However, yesterday it was pointed out to me that in Clojure, this task is ridiculously simple:

(use 'clojure.repl) ; source lives in clojure.repl

(defn self-source
  "prints the source of itself"
  []
  (source self-source))

Where's the challenge in that?  :-)

Sunday, June 17, 2012

I Feel Sorry For Computer Science Departments


There's a popular meme that it takes ten years of effortful study to become an expert at something. I'm sure that in reality, the number of years varies a bit from subject to subject and from one individual to the next, but one thing is clear – expertise takes time.

Therein lies the problem. Most students who enter college and decide to take computer science have minimal, if any, prior exposure to computer science. Colleges have 4 years to try to instill some meaningful level of expertise in students, but that's simply not enough time. Compounding the problem, many students are hoping to go out and get internships after their first year. This leads to a series of unfortunate, yet inevitable compromises.

CS departments are forced to choose: do we focus on foundational skills and the big picture of what computer science is all about, or do we focus on technical training to try to produce graduates who have skills with immediate appeal to companies? Talk to any CS professor and you'll hear plenty of stories of bitter, divisive debates about this very issue within departments and across the entire community of computer science educators.

The very best schools are constantly reassessing this question and retooling their program. Matt Might published a great wishlist of topics that every computer science student should know. I also have a special admiration for Carnegie Mellon in this regard. Despite being consistently ranked as one of the very top universities for computer science in the country, they refused to rest on their laurels and recently did a complete overhaul of their introductory classes.

My own two cents on the topic is that if a choice has to be made, it is better to err on the side of teaching foundational subjects. I admire curricula such as Program by Design, which takes a bold stand, teaching introductory classes in a non-mainstream programming language (Racket, a dialect of Scheme). They do this because the language is a particularly good choice for teaching design and program construction at a deep level. Knowledge of this language isn't likely to be immediately useful for a summer internship, but colleges who use this curriculum report that down the road, their students come out much stronger and much more sought after by companies.

But the sad truth is that no matter how many times CS departments debate this issue and retool, there are no good answers. Four years is simply not enough time to become an expert in computer science. Colleges are stuck between a rock and a hard place and it's a bad situation all around. Companies are distinctly unimpressed and disappointed with the vast majority of graduates that colleges are producing. Ideally, companies want to hire someone with the exact skills for a given job (one can argue that this is a bad hiring strategy for long-term growth, but often, it's what makes the most short-term economic sense). However, there aren't enough of those to go around, so companies try to make the best of the situation by just trying to find the smartest students they can, figuring the smart ones can hopefully compensate for their lack of experience by picking things up quickly on the job. More often than not, the knowledge gained from a CS education is viewed by companies as being so insufficient as to be almost irrelevant – nevertheless, graduating from a well-known school can be seen as a kind of proxy for the kind of drive and innate smarts they really are looking for.

Flipping this around and looking at it from the perspective of students, many graduating students are finding out the hard way that they lack the real-world skills companies are seeking. If they can get that first job, sometimes they discover that they lack the background necessary to keep up with the tectonic shifts in the industry; when that first job goes away, it can be very tricky to make the transition to something new. If you don't have the exact skills companies are looking for, and you're no longer in that bucket of entry-level, fresh-out-of-school applicants that companies might be willing to take a chance on, hunting for that second job can be especially tough.

How do other departments solve this problem? Well, many domains are able to leverage the significant number of years that students have already invested in grade school in English, math, and science. For example, most students who go into mechanical engineering have already had the opportunity to learn math up through calculus and have learned physics as well. Imagine how many years it would take to become a mechanical engineer with absolutely no prior math or science instruction, and you'll begin to appreciate the problem that CS departments face. Also, many other disciplines require significant post-graduate study and apprenticeships in a way that computer science does not. Arguably, computer science has one of the greatest disparities between the demand for expertise, and the level of expertise that is actually attained before one goes into the business.

But wait a second... computer science is an engineering discipline. Shouldn't computer science benefit from kids' math and science education as much as any other science/tech subject? Unfortunately, no. Calculus, the pinnacle of grade school math education as it is currently structured, is the least relevant type of math for computer scientists. Computer scientists need a strong background in Discrete Math and these topics are poorly covered in grade school, if at all.

To further illustrate the point, there is not a single programming class offered in the elementary and middle schools near my home. At the closest high school, most of the tech ed classes are about how to use Microsoft Office and PowerPoint to write reports; programming offerings are fairly lightweight. Keep in mind that I live in a part of the country that is fairly rich with tech companies, less than 30 miles from Microsoft, Amazon, Facebook, Google, Nintendo, and Boeing. Despite my complaints about the meager offerings in my school district, there's no doubt in my mind that most places probably have it much worse in terms of providing kids with early exposure to programming.

So for the most part, computer science curricula start from scratch. However, Program by Design, the intro CS curriculum I mentioned earlier, stands out from the pack. Unlike most other approaches to teaching CS, they intentionally try to leverage students' existing math knowledge by portraying programming as a kind of executable algebra. This is a clever strategy for trying to maximize how far students can get towards expertise in just four years of college.

Once the problem has been laid bare like this – four years provides insufficient preparation for a career in computer science – it is obvious that there are only a couple of long-term solutions. One possibility is to extend the duration of CS education; another is to incorporate more CS topics and exposure to programming into the grade school curriculum.

I think a strong case can be made that our society would benefit from more CS in grade school, so that's the direction I would be inclined to go. Programming is rapidly becoming a foundational skill that has value across a wide range of disciplines. An understanding of data, functions, and algorithms can play an important role, right alongside mathematics and the scientific method, for developing problem-solving skills that are essential in our modern world.

Another helpful change would be to incorporate more discrete math into the grade school curriculum. A stronger background in discrete math would make it much more feasible for computer science students to make rapid progress in just four years of college. Why should we make a change in the math curricula that just benefits future computer science students? Well, the short answer is that it wouldn't just benefit future computer science students. Arthur Benjamin makes the case in his TED talk that discrete math (logic, statistics, etc.) is far more relevant to most walks of life than, say, calculus.

So in the abstract sense, it's relatively clear what needs to change in order to solve the problem. But we all know that implementing such a solution may well be intractable. Even if grade schools were motivated to incorporate more discrete math and programming into their classes, how do you go about finding and recruiting qualified teachers?

Therefore, this is likely to remain a problem for a long time to come. In the meantime, I feel sorry for college CS departments, I feel sorry for CS students, I feel sorry for the people who would love programming but never get exposed to it, and I feel a sense of loss that there's so much more our society could achieve if we could narrow the gap between supply and demand in computer science expertise. My hat's off to everyone who works hard at making the best out of this bad situation.