
Archive for February, 2009

Back from FOWA Miami

26 Feb

Several of us Grooveshark developers went down to FOWA Miami this year (and Barcamp Miami too!), which was a fun and educational experience. We got to see my personal hero Joel Spolsky, whom Katy and I briefly met after his talk, and we got to learn about some cool new and upcoming technologies.

There were also, of course, some great pre- and post-conference parties, so we got to meet lots of cool people and network.

Aside from meeting Joel Spolsky, my favorite experience of the entire trip was being accosted by fans of Grooveshark several times over the course of the weekend. It’s really incredibly gratifying to know that Grooveshark has fans, and they’re real people!

Over the next week or so, if I’m not too lazy, I will be posting my thoughts on the FOWA talks, and about some of the conversations I had while I was there, so stay tuned.

 
 

Pretty soon they’ll have to let gays marry

19 Feb

I’m taking a minor diversion from the normal theme of my posts, and my implicit policy of leaving controversial topics for my personal blog, because what I’m about to talk about should not be controversial, but I’m sure it will be.

One of the sillier arguments I have heard against allowing gay marriage is that it is a slippery slope, ultimately leading us to be forced to accept, among other things, marriage to animals.

Well, recently in eastern India, a boy was married to a dog in order to “ward off tigers.” Does the hypothetical slippery slope work both ways? Will the members of his tribe soon be accepting gay marriages?

 

 

The Art of Elegant Code: Eliminating special cases that aren’t

18 Feb

One of my personal pet peeves is code that contains a bunch of conditional logic to handle seemingly special cases that really aren’t.

For example, let’s say we have a page that shows a user their playlists if they are logged in, but this page also has other useful information on it, and users aren’t required to log in. In our original design for our authentication class, if authentication “failed” (i.e. the user was not logged in), a call to Auth::getUserID() would return an empty string.

With that implementation, every piece of code that might have something to display to a logged-in user needs a bunch of conditional logic checking whether the user is logged in, doing one thing if they are and another if they are not, adding unnecessary complexity and, of course, more potential for bugs to crop up.

This type of logic is completely unnecessary. I changed Auth to return a userID of 0 if the user is not logged in, and now it is not necessary to have any special handling for that case. If the user is logged in, they get playlists. If the user is not logged in, userID 0 does not have any playlists so they do not get playlists. I estimate that this simple change made over 100 lines of code obsolete. Not a whole lot of code in the grand scheme of things, but how many bugs can hide in 100 lines of code?
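A minimal sketch of the idea (the names here, like getPlaylistsForUser and the hard-coded playlist data, are illustrative, not our actual API):

```php
<?php
// Hypothetical sketch: a userID of 0 stands in for "not logged in",
// so callers never need a special case.
class Auth
{
    private static $userID = 0; // 0 = anonymous; set on successful login

    public static function logIn($userID)
    {
        self::$userID = $userID;
    }

    public static function getUserID()
    {
        return self::$userID; // always an int, never '' or null
    }
}

// Illustrative data access: userID 0 simply owns no playlists,
// so the empty result falls out naturally.
function getPlaylistsForUser($userID)
{
    $playlists = array(
        7 => array('Road Trip', 'Coding Mix'),
    );
    return isset($playlists[$userID]) ? $playlists[$userID] : array();
}

// No "is the user logged in?" branch needed at the call site:
foreach (getPlaylistsForUser(Auth::getUserID()) as $name) {
    echo $name, "\n"; // for an anonymous user, this simply never runs
}
```

The call site is identical for logged-in and anonymous users; the "special case" disappears into the data.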

Another example of the same principle involves exceptions (which, as regular readers of Raymond Chen know, are hard to deal with). Under certain circumstances, recommendations from an external source can be included with our normal set of recommendations. The author of the recommendation-supplementing code originally had it throwing exceptions whenever it ran into problems. But not having supplemental recommendations because of an error is not really a special case: as far as my code is concerned, you just don’t have supplemental recommendations. Modifying the original code to reflect that eliminated the unnecessary special-case handling and the potential for an uncaught exception to slip through.
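In code, the change amounts to catching the error at the source and returning an empty set, so callers never see it. A hedged sketch with invented names (getSupplementalRecommendations and the flaky stand-in fetch are not our real functions):

```php
<?php
// Stand-in for the real remote call; pretend the external source is down.
function fetchFromExternalSource()
{
    throw new Exception('external source unavailable');
}

// Instead of letting the exception escape, translate "it failed" into
// "there are zero supplemental recommendations" -- the same shape of
// result the caller already handles.
function getSupplementalRecommendations()
{
    try {
        return fetchFromExternalSource();
    } catch (Exception $e) {
        // Log here if desired; callers just see an empty set.
        return array();
    }
}

// The caller merges with no special handling at all:
$recommendations = array('Song A', 'Song B');
$recommendations = array_merge($recommendations, getSupplementalRecommendations());
```

Whether the external source succeeded, returned nothing, or blew up, the merge line is the same.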

A general rule of thumb to help avoid unnecessary special cases, at least with PHP, is to always return what you say you are going to return (and don’t throw exceptions). If your method processes some data which results in an array, return an array even if the processing has no results. This is the way most native functions in PHP already work, if you think about it. count(array()) doesn’t return an empty string or null or raise an exception, nor does count(null). Really, neither of these is a special case. In either case, the number of elements is zero, and developers are not required to care about the difference.
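To make the rule concrete, here is a hypothetical search function (the name and song list are made up) that promises an array and always delivers one, so "no results" behaves exactly like "some results":

```php
<?php
// Sketch: a function that says it returns an array always returns one.
function getMatchingSongs($query)
{
    $library = array('Serpentine', 'Sea Sew');
    $matches = array();
    foreach ($library as $song) {
        if (stripos($song, $query) !== false) {
            $matches[] = $song;
        }
    }
    return $matches; // an array even when nothing matched
}

// Callers treat zero results like any other count:
foreach (getMatchingSongs('zzz') as $song) {
    echo $song, "\n"; // the loop body simply never runs
}
echo count(getMatchingSongs('zzz')); // 0, not '' or null
```

(One era note: count(null) returned 0 in the PHP of the day; in PHP 8 it is a TypeError, which is all the more reason to hand callers a real array.)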

 
 

Lisa Hannigan

11 Feb

Sometimes you find out about things happening at your own company in the strangest ways. I discovered Lisa Hannigan through a friend, who discovered Lisa Hannigan because we are promoting her music. I had no idea. It’s actually really good, which is why I’m posting it here.

 
 

Optimize for concurrency, not throughput

06 Feb

When it comes to disks, RAID and otherwise, there’s a lot of confusion about how to optimize for different server configurations, and unfortunately much more emphasis seems to be placed on sustained throughput, with little mention going to IOPS or concurrency.

Here at Grooveshark we use Sun’s X4500, aka “Thumper,” for housing your MP3s. Lately we’ve been hitting some buffering issues during peak hours, where it seemed like the X4500 was not able to keep up with the requests coming in. Although we are growing rapidly, we should still be nowhere near saturating the IOPS of 48 drives, but iostat -Cx was showing that the drives dedicated to serving up content had transactions waiting 100% of the time, and the disks were busy 100% of the time. Service time was in the triple digits. Insanity. Something was obviously misconfigured.

We were using 3 zpools for our content, each configured in raidz with about 8 drives (if I remember correctly). OK, so we actually only have the IOPS of 24 drives in that configuration, but we still should not be anywhere near saturating that. After a fair amount of digging, I discovered that raidz is probably the worst configuration possible for serving up thousands of 3-5MB files concurrently. That’s because raidz causes every drive to engage in every read request, no matter how small the file. That means your theoretical transfer rate for any given file is N times the transfer rate of each drive, but your IOPS is the same as one drive, no matter how many drives you have in your pool. In other words, adding more drives to a raidz increases disk transfer bandwidth, but does nothing to alleviate the overhead associated with random access seeks. This configuration is ideal if you are transferring multi-gigabyte files frequently, but at 3-5MB, seek time makes up almost all of the overhead, and it would be much better to have N drives all seeking to N files simultaneously.

I brought up these concerns to our sysadmins, and they set up a small mirrored pool to handle our most popular content, and the performance difference is quite astounding even with just two disks in that pool. In a mirrored configuration like this, each drive can respond to requests for data separately, because each has a full copy of the data. Since our three raidz pools act like just 3 drives for IOPS purposes, the two-disk mirror took us from the equivalent of 3 drives to 5, nearly doubling our IOPS capacity with only two hard drives. Our sysadmins will be adding more disks to the mirror to give us even more breathing room. We’ll have to wait and see tomorrow if the added IOPS capacity eliminates the buffering issues that users have been running into lately, but I bet it will.
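For the curious, the two layouts look roughly like this in ZFS terms. This is a hedged sketch, not our actual commands: pool and device names are hypothetical, and a real Thumper has many more spindles to allocate.

```shell
# raidz: every drive participates in every read, so the pool delivers
# great streaming bandwidth but roughly the IOPS of a single drive.
zpool create content1 raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                            c1t4d0 c1t5d0 c1t6d0 c1t7d0

# mirror: each drive holds a full copy, so different drives can serve
# different reads at once -- IOPS scales with the number of drives.
zpool create popular mirror c2t0d0 c2t1d0
```

The trade-off is capacity: a two-way mirror gives you half the raw space, which is why it makes sense to reserve it for the hottest content.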

 
 

Serpentine by Chris Bathgate

03 Feb