-
More Erlang Beust Challenge Results (incl. HiPE)
2008-09-04 17:21 in /tech/erlang
I’ve been tinkering a little more with the Beust Challenge, following up on my previous post. There are a couple significant new developments to report.
First, Hynek (Pichi) Vychodil wrote a faster version using a permutation generator.
Second, I also ported the first “CrazyBob” solution using a bit mask to Erlang.
Third, I discovered that my overly literal ports were actually slowing things down. The CrazyBob code uses an unspecified Listener class that receives the numbers in the series, and presumably computes the actual results from there. (As an aside, I can’t actually benchmark against the Java code, because I haven’t found this Listener implementation.) I fudged this in my original ports by simply spawning a process, which then discarded all the messages it received. After I noticed that the code was pegging both of my CPUs, though, I realized that message passing might actually be the bottleneck in my code. It turns out this was the case: removing the listener process and just computing the results in the main process sped things up substantially.
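To make the difference concrete, here’s a minimal sketch of the two shapes. The module and function names are mine, and the predicate reflects my reading of the challenge (numbers with no repeated decimal digits); the real solutions compute more than a count.

```erlang
-module(listener_sketch).
-export([count_direct/1, count_with_listener/1]).

%% My reading of the series: numbers with no repeated decimal digits.
has_repeated_digit(N) ->
    Digits = integer_to_list(N),
    length(Digits) =/= length(lists:usort(Digits)).

%% What my original ports did: a listener process receives one message
%% per number in the series, so every hit costs a message send.
count_with_listener(Max) ->
    Parent = self(),
    Listener = spawn(fun() -> listen(Parent, 0) end),
    [Listener ! found || N <- lists:seq(1, Max), not has_repeated_digit(N)],
    Listener ! done,
    receive {count, Count} -> Count end.

listen(Parent, Count) ->
    receive
        found -> listen(Parent, Count + 1);
        done  -> Parent ! {count, Count}
    end.

%% The fix: no second process, just an accumulator in the main process.
count_direct(Max) ->
    length([N || N <- lists:seq(1, Max), not has_repeated_digit(N)]).
```

Each hit in the first version costs a message send and a scheduler hop between two processes; the second just threads an accumulator, which is why dropping the listener helped so much.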
Finally, I got access to a machine with HiPE enabled.
So... here’s the results. First on my MacBook Pro, without HiPE:
    log(Max)  Original  bitmask  crazybob  pichi
    4         8ms       2ms      3ms       3ms
    5         65ms      11ms     13ms      14ms
    6         632ms     52ms     69ms      62ms
    7         6.7s      253ms    303ms     272ms
    8         72s       1.0s     1.0s      945ms
    9         18m       4.7s     3.6s      2.8s
    10        (3h)      13s      7.8s      5.3s

The bitmask solution starts out the fastest, but loses out in the end to the more clever solutions. pichi edges out the crazybob solution by about a third.
Now on Linux 2.6 with HiPE:
    log(Max)  Original  bitmask  crazybob  pichi
    4         4ms       <1ms     1ms       2ms
    5         50ms      1ms      6ms       7ms
    6         608ms     7ms      34ms      37ms
    7         6.9s      35ms     160ms     174ms
    8         78s       147ms    619ms     563ms
    9         (18m)     460ms    1.8s      1.4s
    10        (3h)      1.1s     4.2s      2.4s

And our new winner is... bitmask! HiPE barely helps the original brute-force solution at all, while crazybob and pichi gain about a factor of two. bitmask, on the other hand, picks up an order of magnitude, and is now only a factor of 3 slower than the posted results for the Java crazybob solution (with unknown differences in hardware).
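For reference, enabling HiPE doesn’t require any source changes; on a HiPE-enabled runtime, native compilation is just an extra compiler option (here bitmask stands in for whichever module you’re compiling):

```erlang
%% In the shell, on a HiPE-enabled runtime:
> c(bitmask, [native]).
%% Or with more aggressive optimization:
> c(bitmask, [native, {hipe, [o3]}]).
```

From the command line, the equivalent is erlc +native bitmask.erl.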
Conclusion: Erlang doesn’t suck!
-
Erlang & the Beust Challenge
2008-09-02 14:20 in /tech/erlang
During my time off between jobs, I decided to spend some time doing fun hacking. For the last couple of days, that’s meant doing a little benchmarking with Erlang. Recently, there was a challenge posted by Cedric Beust. In the wrap-up he lamented that the one Erlang solution posted was “a bit frightening” and neither concise nor fast. Part of this conclusion, I think, was due to the poor formatting of blog comments, and part because the comment submitter included a whole lot of test code and also reimplemented some library functions for no obvious reason. I modified it to use the standard libraries where appropriate, made a couple of other minor optimizations, and got about a 3x speed-up. It’s still much, much slower than the top submission, but that’s because the top submission is clever and this is still a brute-force implementation.
I then set out to port the “crazybob” solution to Erlang to see how much better we can do. The Java version uses a hand-rolled linked list to keep track of the available digits. I decided to use a gb_set for my version. For small N it’s a bit slower than the brute-force version, but the scaling is much better. For N=10^10, it runs in about 26s on my laptop. This is still something like 50-100 times slower than the Java version, but I’m also not using HiPE, which might close the gap a bit.
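In outline, the gb_set version looks something like the sketch below. The module and function names are mine, and I’ve stripped it down to just counting numbers with no repeated digits; the real port does more bookkeeping per number found.

```erlang
-module(gbsets_sketch).
-export([count/1]).

%% Count the numbers =< Max with no repeated digits by building numbers
%% digit by digit, tracking the still-available digits in a gb_set.
count(Max) ->
    Digits = gb_sets:from_list(lists:seq(0, 9)),
    lists:sum([1 + extend(D, gb_sets:delete(D, Digits), Max)
               || D <- lists:seq(1, 9), D =< Max]).

%% N is a valid prefix; count all valid extensions of it.
extend(N, Avail, Max) ->
    lists:sum([1 + extend(N * 10 + D, gb_sets:delete(D, Avail), Max)
               || D <- gb_sets:to_list(Avail), N * 10 + D =< Max]).
```

Note that every recursive step allocates a fresh set via gb_sets:delete/2, which is exactly the memory churn I complain about below.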
I suspect this is not the fastest possible Erlang implementation, because the heavy use of gb_sets is undoubtedly causing a lot of memory churn. It’s been pointed out that this is basically a problem of generating permutations, which ought to be possible to do with near constant memory usage.
Here are the results of my benchmarking, and the code. For each data point, I ran the code 3 times and took the middle value. For the brute-force versions the timings were very consistent, while the “fast” version had 10–20% variation between runs, presumably due to vagaries of memory allocation. I did not actually run the brute-force algorithms for 10^10; those values are linear extrapolations.
Update: Hynek (Pichi) Vychodil wrote a faster version, bringing us within a factor of 10 or so of Java (and still not using HiPE). Oddly, there’s a bigger speedup for this version on my machine (26s vs 5s) than on Hynek’s (18s vs 8s).
See important updates in the following post
    log(Max)  Original  stdlib  crazybob  pichi
    4         8ms       5ms     48ms      3ms
    5         65ms      36ms    85ms      14ms
    6         632ms     295ms   597ms     62ms
    7         6.7s      2.6s    3.0s      272ms
    8         72s       25s     8.8s      945ms
    9         18m       6m      16s       2.8s
    10        (3h)      (1h)    26s       5.3s
-
Talking at PDXfunc
2008-02-12 11:50 in /tech/erlang
Last night I talked at PDXfunc a bit about Erlang. The slides were pretty minimal, so I don’t think I’m going to post them publicly at this point. We spent more time looking at code and demonstrating features that way. I showed an IRC bot very similar to the one in the talk I gave to PDX.pm on POE and Erlang, although I had actually made a change that introduced a small bug (and failed to test the code before my live demo). But this worked out to be a positive, because I got to demonstrate hot code updates and fixed the bug while keeping the bot up and running. I also showed some code I’ve been working on in the last couple of days, once again writing a web spider, this time in Erlang. Unfortunately, I found some problems in xmerl that required heinous workarounds, and I still had some bugs by the time of the talk, but I got to demonstrate the important bit, at least: by replacing a single map with pmap, I could make my sequential program concurrent, proven by the fact that it hit one of the remaining bugs and crashed much faster :-).

At some point soon, I’ll write more about the spider code and my experiences with xmerl, but I want to work out the remaining bugs first.
-
Forcing a Code Update in a Running Erlang Process
2007-11-23 23:10 in /tech/erlang
A couple days ago I wrote my first IRC bot, to echo updates from an RSS feed into an IRC channel. It seemed like a fun thing to do in Erlang, and the bot ended up being basically a splice of Jonathon Roes’ bot skeletor with Sam Ruby’s post about parsing Atom.
While I was writing the bot I put in lots of debugging statements, but once it was running nicely, they were a bit tiresome. Of course, I could have simply stopped the bot, removed the statements, and restarted it; but this is Erlang, and hot code updates are one of the heavily advertised features. The relevant docs explain how to load new code versions which will be picked up by new processes, but don’t really address how to update an already running process. I asked on #erlang and got the pointer to add a message to the main server loop for doing a code upgrade. This calls ?MODULE:loop rather than simply loop, which turns it into an “external function call” and allows the new code version to be used. So, now my main loop looks like:

    loop(Sock, Seen) ->
        receive
            code_upgrade ->
                ?MODULE:loop(Sock, Seen);
            {tcp, Sock, Data} ->
                io:format("[~w] Received: ~s", [Sock, Data]),
                parse_line(Sock, string:tokens(Data, ": ")),
                loop(Sock, Seen);
            ...
And I can update the running code by just doing:
    > c("bot.erl").
    > Bot ! code_upgrade.
(Aside: I’m a little curious about how hot code loading and tail recursion work together. When I call ?MODULE:loop after an upgrade, it seems like the normal tail-call optimization shouldn’t work. I suspect there’s something clever going on to take care of this, though.)
-
First Impressions of Erlang
2007-04-10 14:20 in /tech/erlang
Just to see if I can make my head explode, I’m also learning a bit of Erlang at the moment. In fact, I just finished writing my first Erlang program. (Or, at least, the first program I started. I did write a Hello World in the middle, just to make sure I wasn’t totally confused about some issues with the compiler and runtime.) It’s a simple URL checker that reports the status of any URL you send it. My predominant positive first impression is that it’s pretty cool that you can write something like this so easily upon just learning the language, and that the finished product is short, simple, and readable. Here it is:
    -module(urlchecker).
    -export([main_loop/2, fetch/2]).
    -import(ibrowse).

    main_loop(Seen, Pids) ->
        receive
            {url, Url} ->
                case sets:is_element(Url, Seen) of
                    true ->
                        io:fwrite("~s seen previously.~n", [Url]),
                        main_loop(Seen, Pids);
                    false ->
                        Pid = spawn(urlchecker, fetch, [self(), Url]),
                        main_loop(sets:add_element(Url, Seen), [Pid|Pids])
                end;
            {reply, Pid, Url, RC} ->
                io:fwrite("~s : ~s~n", [Url, RC]),
                main_loop(Seen, lists:delete(Pid, Pids));
            done ->
                case Pids of
                    [] ->
                        io:fwrite("Exiting~n", []);
                    _ ->
                        io:fwrite("Waiting for children to finish~n", []),
                        timer:send_after(1000, done),
                        main_loop(Seen, Pids)
                end
        end.

    fetch(Pid, Url) ->
        case ibrowse:send_req(Url, [], get, [], [], 30000) of
            {ok, Status, _, _} ->
                Pid ! {reply, self(), Url, Status};
            {error, Reason} ->
                io:fwrite("Error fetching ~s : ~w~n", [Url, Reason])
        end.
That’s a total of 36 lines, including copious debugging output, graceful shutdown, and rudimentary politeness. And, of course, concurrency. So, this is a pretty high-level language.
(Looking over the code, you might wonder why I use sets for the seen URLs and a list for the pids. The answer is, I wrote the pid handling first and figured that the list would probably stay smallish, so it didn’t need to be particularly efficient. Only when I added politeness did I go to look for an efficient data structure. Of course, for real use, you’d want something like a timeout cache, not a set. I also suspect the child pid tracking might be better done with process groups.)
So, that’s the good. The bad is that the documentation is fairly minimal, and the error messages make Haskell look good. Some examples I encountered:
{"init terminating in do_boot",{undef,[{set,new,[]},{urlchecker,main,0},{init,start_it,1},{init,start_em,1}]}} — That’s a fancy way of saying “you made a typo!”, i.e., the module is ‘sets’, not ‘set’.
Error in process <0.75.0> with exit value: {undef,[{urlchecker,fetch,[<0.73.0>,"http://kevin.scaldeferri.com/adsf"]}]} — That means, “you spawned a process using a function you didn’t export”. I find this particularly annoying because I don’t want to export “fetch”, since it’s an implementation detail.
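There is a workaround, for what it’s worth: spawn/1 takes a fun, and a fun can call local functions, so nothing extra needs to be exported. A minimal sketch (the fetch body here is a stub standing in for the real ibrowse call):

```erlang
-module(spawn_sketch).
-export([start/1]).

%% start/1 is the only exported function; fetch/2 stays private,
%% because the spawned fun closes over it.
start(Url) ->
    Parent = self(),
    spawn(fun() -> fetch(Parent, Url) end).

%% Stub standing in for the real HTTP fetch via ibrowse.
fetch(Parent, Url) ->
    Parent ! {reply, self(), Url, "200"}.
```

The trade-off is that a process spawned from a fun is tied to the version of the module the fun came from, which matters once hot code upgrades enter the picture.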
** exited: {{badmatch,<0.123.0>},[{erl_eval,expr,3}]} ** — That means, “you tried to redefine a variable you already assigned to”. I ran into this in the normal cycle of making a change in the editor, then reloading it in the REPL and running some test code. It took me a while to realize that the problem wasn’t the code in my module, it was that I was reusing the same variable in the REPL.
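The shell does have an escape hatch for this: the shell command f/1 forgets a binding (f/0 forgets all of them). For example:

```erlang
1> X = 1.
1
2> X = 2.
%% raises the badmatch error shown above
3> f(X).   %% forget the binding of X
ok
4> X = 2.
2
```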
Finally, when I downloaded ibrowse, I just got a directory of files. There was no configure script, no install script, no install directions, and no top-level Makefile. I found a Makefile in the src directory, and used it to compile the code, but it didn’t have an install target. It turns out that you can just copy the src and ebin directories into /usr/local/lib/erlang/lib by hand, but I had to go to #erlang to find that out.
I’m hoping that I’m past the nasty part of the learning curve now. We’ll see. I do really like the conciseness of the language, and how quickly you can prototype something moderately sophisticated. And, in theory, I like the idea that you can trivially distribute something like this across multiple nodes, although I haven’t gotten to seeing how that works in practice. I just wish it was a little more helpful when you mess up.