2008-09-02 14:20 in /tech/erlang
During my time off between jobs, I decided to spend some time doing fun hacking. For the last couple days, that’s meant doing a little benchmarking with Erlang. Recently, there was a challenge posted by Cedric Beust. In the wrap-up he lamented that the one Erlang solution posted was “a bit frightening” and neither concise nor fast. Part of this conclusion, I think, was due to the poor formatting of blog comments, and part because the comment submitter included a whole lot of test code and also reimplemented some library functions for no obvious reason. I modified it to use the standard libraries where appropriate, and made a couple other minor optimizations and got about a 3x speed-up. It’s still much, much slower than the top submission, which is because the top submission is clever, and this is still a brute-force implementation.
I then set out to port the “crazybob” solution to Erlang to see how much better we can do. The Java version uses a hand-rolled linked list to keep track of the available digits. I decided to use a gb_set for my version. For small N it’s a bit slower than the brute-force version, but the scaling is much better. For N=10^10, it runs in about 26s on my laptop. This is still something like 50-100 times slower than the Java version, but I’m also not using HiPE, which might close the gap a bit.
I suspect this is not the fastest possible Erlang implementation, because the heavy use of gb_sets is undoubtedly causing a lot of memory churn. It’s been pointed out that this is basically a problem of generating permutations, which ought to be possible to do with near constant memory usage.
Here are the results of my benchmarking, and the code. For each data point, I ran the code 3 times and took the middle value. For the brute force versions the timings were very consistent, while the “fast” version had 10-20% variation between the run, presumably due to vagaries of memory allocation. I did not actually run the brute force algorithms for 10^10; the values are generated by extrapolating linearly.
Update: Hynek (Pichi) Vychodil wrote a faster version, bringing us within a factor of 10 or so of Java (and still not using HiPE). Oddly, there’s a bigger speedup for this version on my machine (26s vs 5s) than on Hynek’s (18s vs 8s).
See important updates in the following post
log(Max) Original stdlib crazybob pichi 4 8ms 5ms 48ms 3ms 5 65ms 36ms 85ms 14ms 6 632ms 295ms 597ms 62ms 7 6.7s 2.6s 3.0s 272ms 8 72s 25s 8.8s 945ms 9 18m 6m 16s 2.8s 10 (3h) (1h) 26s 5.3s