Grooveshark Playlists now in MongoDB

06 Mar

As of about 5:30am last night (this morning?) Grooveshark is now using MongoDB to house playlist information.

Until now playlists have lived in MySQL, but there were some big problems that occasionally led to data loss due (mostly) to deadlocks. Needless to say, users don’t like it when you lose their data. Moving to Mongo should resolve all of these issues.

Grooveshark has been using MongoDB for sessions and feed data for a while now, so we are comfortable with the technology and know that it is capable of handling massive amounts of traffic. While it’s certainly not perfect, we are confident that it will be easy to scale out to maintain reliability as our user base continues to grow rapidly.


Grooveshark IE bug

29 Jan

I hate the idea that this blog might be turning into nothing but a journal of all the things at Grooveshark that have ever broken, but some of the most interesting challenges we face are when things go terribly awry, so I’m not going to avoid talking about it just because it involves something breaking at Grooveshark, again.

What happened was, out of the blue IE8 could no longer run the site. Users were getting a message about making sure they did not have flash block enabled, which means the swf was failing in some way. We determined that the swf was in fact loading, so why was it lying to us? There is one file that swfs need in order to talk to other domains: crossdomain.xml. If that file fails to load, the swf isn’t going to work. I suspected that was happening in this case, so I loaded the file up directly in IE, and IE complained that it wasn’t valid XML. View source showed me that IE was right: it was in fact what looked like binary garbage. Loading the same file in Firefox and Chrome worked perfectly fine, but IE8 on 4 different computers all showed the invalid XML.

Some months ago, we switched from serving pages up directly from Apache, to running Nginx in front of Apache as a reverse proxy with caching. The difference that made on our front end servers in terms of memory usage and CPU load is phenomenal. Although Nginx serves 30% of requests from cache now, the drop in server load was much more than 30%. Nginx is truly a wonderful addition to our http stack…but as you’ve guessed by now, it played a key role in the latest breakage at Grooveshark.

Force clearing the cache in IE8 and in nginx would sometimes fix the file, but not always. I then turned to wget and found the same thing: whenever the file was broken for IE8, it was identical in wget. wget was showing the exact same file size that Firebug was showing, which was the biggest clue: Firefox received the file gzipped because it supports deflate, but wget also received the file gzipped even though it doesn’t support deflate. My theory, which proved correct, was that IE8 was for some reason asking for the non-gzipped version, but receiving the gzipped version and barfing.

Why would that happen? Well, remember that we are using nginx as a reverse proxy cache. It turns out that we just recently added some auto-gzipping for certain file types to Apache. What was happening was, nginx would get a request for a file not in its cache, and forward along the request (with all headers intact) to Apache. If this request came from a client that supports deflate, Apache would respond with a gzipped file. Nginx would store that gzipped file in its cache, and the next request that came in asking for that file, with or without deflate support, would get the gzipped version served up.

The fix was relatively simple: add a variable in nginx conf tracking whether or not the current client supports deflate. Append the value of that variable to the proxy key, meaning that gzipped and non-gzipped versions of the files will be cached separately, and served appropriately depending on what the client supports.
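To make the failure and the fix concrete, here is a small simulation, sketched in Python rather than in nginx configuration. Everything in it is a stand-in: `Origin` plays the role of Apache, `CachingProxy` plays the role of nginx, and the `vary_on_gzip` flag is a hypothetical switch that decides whether compression support is part of the cache key. It is only meant to illustrate the idea, not to mirror our actual config.

```python
import gzip


class Origin:
    """Stand-in for Apache: gzips the response when the client supports it."""

    def fetch(self, path, accepts_gzip):
        body = b"<cross-domain-policy/>"
        return gzip.compress(body) if accepts_gzip else body


class CachingProxy:
    """Stand-in for the reverse proxy cache sitting in front of the origin."""

    def __init__(self, origin, vary_on_gzip):
        self.origin = origin
        self.vary_on_gzip = vary_on_gzip
        self.cache = {}

    def cache_key(self, path, accepts_gzip):
        # The fix: include compression support in the key so both variants
        # are cached separately. Without it, whichever variant was fetched
        # first is served to everybody.
        if self.vary_on_gzip:
            return (path, accepts_gzip)
        return (path,)

    def get(self, path, accepts_gzip):
        key = self.cache_key(path, accepts_gzip)
        if key not in self.cache:
            self.cache[key] = self.origin.fetch(path, accepts_gzip)
        return self.cache[key]


def is_gzipped(data):
    """Gzip streams start with the magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"
```

With `vary_on_gzip=False`, a gzip-capable client that warms the cache causes every later client to receive gzipped bytes, which is the behavior IE8 and wget were seeing. With `vary_on_gzip=True`, each kind of client gets the variant it asked for.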

What’s not clear to me at this time is why IE8 would refuse to accept gzipped content for that file, and whether that applies to all .xml files in IE8…but at least it helped us catch what would have otherwise been an extremely obscure issue!


Controlling Arduino via Serial USB + PHP

28 Dec

Paloma bought me an Arduino for Christmas, and I’ve been having lots of fun with it. I started by following some tutorials to get familiar with the platform, but of course the real excitement for me is being able to control it from my computer, so I’ve started playing around with serial communication over the USB port.

I messed around with Processing a bit, and while Processing seems pretty cool and pretty powerful for drawing and making a quick interface, I want to be able to get up and running fast communicating with all of our existing systems, so I thought PHP would be ideal.

I’ve never used PHP to communicate over a serial port before, so I did some digging and actually came across an example of using PHP to communicate with an Arduino, but it wasn’t working properly for me. My LED was blinking twice every time, and if I changed it to a value comparison (i.e. sending 1 in PHP and checking for 1 on Arduino) it was never matching. Turns out for some reason by default the com port wasn’t running in the right mode from PHP. The solution is a simple additional line:

exec("mode com3: BAUD=9600 PARITY=N data=8 stop=1 xon=off");

…but make sure you adjust to match the settings appropriate for your computer.

Of course I had to write my own little demo once I got it working. This PHP script takes command line arguments and passes them along to the Arduino, which looks for ones and zeroes to turn the LEDs on or off. Unlike the example I linked to, I’m working with a shift register to drive 8 LEDs, but you’ll get the idea, and it should be obvious how to convert it to work with a single LED.

<?php
ini_set('display_errors', 'On');
//put the com port in the mode the Arduino expects (Windows "mode" command)
exec("mode com3: BAUD=9600 PARITY=N data=8 stop=1 xon=off");
$fp = fopen("COM3", "w");
foreach ($argv as $i => $arg) {
    if ($i > 0) {//skip the first arg since it's the name of the file
        print "writing " . $arg . "\n";
        $arg = chr((int) $arg);//convert the number to the single byte with that value
        fwrite($fp, $arg);
    }
}
print "closing\n";
fclose($fp);
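One detail in the script is worth spelling out: chr() turns the number 1 into the single byte with value 1, which is not the same thing as the text character "1" (byte value 49). The Arduino side compares against the number 1, so it is the raw byte value that has to go over the wire. Here is a small illustration of the difference, sketched in Python; `args_to_bytes` is a made-up helper name, not part of the demo.

```python
def args_to_bytes(args):
    """Convert command-line style arguments ("1", "0", ...) to raw byte values.

    Mirrors what chr() does in the PHP script: each number becomes one byte,
    wrapping around at 256.
    """
    return bytes(int(arg) % 256 for arg in args)


raw = args_to_bytes(["1", "0"])   # the bytes 0x01 and 0x00
text = "1".encode("ascii")        # the byte 0x31, i.e. 49
```

If the text form were sent instead, a check like `x == 1` on the Arduino would never match, because it would be looking at 49.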

And for the Arduino:

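//Pin connected to latch pin (ST_CP) of 74HC595
const int latchPin = 4;
//Pin connected to clock pin (SH_CP) of 74HC595
const int clockPin = 3;
//Pin connected to data pin (DS) of 74HC595
const int dataPin = 2;

void setup() {
    //open the serial port; 9600 matches the BAUD setting in the PHP script
    Serial.begin(9600);
    pinMode(latchPin, OUTPUT);
    pinMode(dataPin, OUTPUT);
    pinMode(clockPin, OUTPUT);
    //reset LEDs to all off
    digitalWrite(latchPin, LOW);
    shiftOut(dataPin, clockPin, MSBFIRST, B00000000);
    digitalWrite(latchPin, HIGH);
}

void loop() {
    byte val = B11111111;
    if (Serial.available() > 0) {
        //read one byte sent from the PHP script
        byte x = Serial.read();
        if (x == 1) {
            val = B11111111;//all 8 LEDs on
        } else {
            val = B00000000;//all 8 LEDs off
        }
        //hold the latch low while shifting, then raise it to update the outputs
        digitalWrite(latchPin, LOW);
        shiftOut(dataPin, clockPin, MSBFIRST, val);
        digitalWrite(latchPin, HIGH);
    }
}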

For bonus debugging points, if you’re using a shift register like this, send x instead of val to shiftOut, and you can actually see which bytes are being sent, very handy if you’re not getting what you are expecting (like I was)!


Old sphinx bug

21 Nov

I’m posting this mostly as a note to myself.

When trying to build the pecl sphinx extension, I ran into problems because I could not build libsphinxclient.

It looks like the bug causing that problem has been around for at least a year, but at least the fix is simple:
change line 280: void sock_close ( int sock ); to static void sock_close ( int sock );

Note: this applies to sphinx 0.9.9, the last “stable” release at the time of this writing.


Google AJAX Support: Awesome but Disappointing

18 Oct

Google has added support for crawling AJAX URLs. This is great news for us and any other site that makes heavy use of AJAX or is more of a web app than a collection of individual pages.

We have long worked around the issue of AJAX URLs not being crawl-able by having two versions of our URLs, with and without the hash. Users who are actually using the site will obviously get the AJAX URLs with the hash, but if a crawler requests the same URL without the hash it will get content for that page as well, while real users will be automatically redirected to the proper URL with the hash. Crawlers aren’t smart enough to go there on their own of course, but we provide a sitemap, and all links we present to crawlers are absent of the hash. Likewise, when users post links to Facebook via the app, we automatically give them the URL without the hash so they can put up a pretty preview of the link in the user’s news feed. The problem is, users also like to share by copying URLs from the URL bar. If users post those links anywhere, crawlers don’t know how to crawl them, so they either don’t or they just count it as a link to the part of the URL before the hash, which isn’t great for us and is lousy for users too.

Google’s solution is for sites like ours to switch from using # to using #! and then opting in to having those URLs crawled. The crawler will take everything after the #! and convert the “pretty” URL into an ugly one. For example, /#!/user/jay/42 presumably becomes something like /?_escaped_fragment_=%2Fuser%2Fjay%2F42 when the crawler sends the request to us.
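Here is a rough sketch of that rewrite in Python, just to make the mapping concrete. The function name `to_escaped_fragment` is made up, and the escaping simply percent-encodes everything after the #! so that it reproduces the example above; the crawler’s real escaping rules may well differ in which characters they touch.

```python
from urllib.parse import quote


def to_escaped_fragment(url: str) -> str:
    """Rewrite a '#!' URL into the query-string form a crawler would request.

    URLs without '#!' are returned unchanged.
    """
    base, marker, fragment = url.partition("#!")
    if not marker:
        return url
    # Append to an existing query string if there is one.
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}_escaped_fragment_={quote(fragment, safe='')}"


print(to_escaped_fragment("/#!/user/jay/42"))
# prints /?_escaped_fragment_=%2Fuser%2Fjay%2F42
```

The server then has to recognize the _escaped_fragment_ parameter and serve the same content it would have rendered for the pretty URL.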

This is annoying and frustrating for several reasons:

  1. All our URLs have to change
    We have to change all URLs to have a #! instead of just a #. This not only requires developer effort but makes our URLs slightly uglier.
  2. All links that users have shared in the past will continue to be non-crawlable forever.
  3. We now have to support 3 URL formats instead of 2
  4. One of those URL formats no human will ever see; we are building a feature solely for the benefit of a robot.

Again, we greatly appreciate that Google is making an effort to crawl AJAX URLs, it’s a huge step forward. It’s just not as elegant as it could be. It seems like we could accomplish the same goals more simply by:

  1. Requiring opt-in just like the current system
  2. Using robots.txt to dictate which AJAX URLs should not be crawled
  3. Allowing webmasters to specify what the # should be replaced with
    In our case it would be replaced by nothing, just stripped out. For less sophisticated URL schemes that just use #x=y, webmasters could specify that the # should be replaced by a ?

That solution would have the same benefits as the current one, with the additional benefits of allowing all crawling permissions to be specified in the same place (robots.txt) and automatically making links already in the wild crawl-able, without requiring the addition of support for yet another URL format.


Improve Code by Removing It

17 Oct

I’ve started going through O’Reilly’s 97 Things Every Programmer Should Know, and I plan to post the best ones, the ones I think we do a great job of following at Grooveshark and the ones I wish we did better, here at random intervals.

The first one is Improve Code by Removing It.

It should come as no surprise that the fastest code is code that never has to execute. Put another way, you can usually only go faster by doing less. And of course, code that never runs also exhibits no bugs. :)

Since I started at Grooveshark, I’ve deleted a lot of code. 20% more code than I’ve ever written. More code than all but one of our developers has contributed. Despite that, I think we’ve only actually ever removed one feature, and we’ve added many more.

One of the things I see in the old code is an overenthusiastic attempt to make everything extremely flexible and adaptive. The original authors obviously made a noble effort to try to imagine every possible future scenario that the code might some day need to handle and then come up with an abstraction that could handle all of those cases. The problem is, those scenarios almost never come up. Instead, different features are requested which do not fit with the goals of the original abstraction at all, so then you end up having to work around it in weird ways that make the code more difficult to understand and less efficient.

Let me try to provide a more concrete example so you can see what I’m talking about. We have an Auth class that really only needs to handle 3 things:

  • Let users log in (something like Auth->login(username, password))
  • Let users log out (something like Auth->logout())
  • Find out who the logged in user is, if a user is logged in. (Auth->getUser())

Should be extremely straightforward, right? Well, the original author decided that the class should allow for multiple authentication strategies over various protocols in some scenarios that could never possibly arise (such as not having access to sessions) even though at the time only one was needed. Instead of ~100 lines of code to just get the job done and nothing else, we ended up with 1,176 lines spanning 5 files. The vast majority of that code was useless; our libraries are behind other front-facing code so the protocol and “strategy” for authenticating is handled at one level higher up, and we always use sessions so that no matter how a user logged in, they are logged in to all Grooveshark properties. When we finally did add support for a truly new way to log in (via Facebook Connect), none of that code was useful at all because Facebook Connect works in a way the author could never have anticipated 2 years ago. Finally, because the original author anticipated a scenario that cannot possibly arise (that we might know the user’s username but not their user ID), fetching the logged-in User’s information from the database required a less efficient lookup by username rather than a primary key lookup by ID.
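For contrast, here is roughly what the “just get the job done” version could look like. This is a sketch in Python, not our actual PHP class: the session is a plain dict, the user store is a pair of in-memory dicts, and the PBKDF2 password check is an assumption made purely so the example is complete. The point is the shape: three methods, and once the user is logged in every lookup goes by user ID.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    # Assumed hashing scheme for the sketch (standard-library PBKDF2).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Auth:
    def __init__(self, session: dict, users_by_id: dict, ids_by_username: dict):
        self.session = session                  # stand-in for the real session
        self.users_by_id = users_by_id          # stand-in for the users table
        self.ids_by_username = ids_by_username  # only consulted at login time

    def login(self, username: str, password: str) -> bool:
        user_id = self.ids_by_username.get(username)
        user = self.users_by_id.get(user_id)
        if user is None:
            return False
        candidate = hash_password(password, user["salt"])
        if not hmac.compare_digest(candidate, user["password_hash"]):
            return False
        # Store the ID, not the username, so later lookups are by primary key.
        self.session["user_id"] = user_id
        return True

    def logout(self) -> None:
        self.session.pop("user_id", None)

    def get_user(self):
        """Return the logged-in user's record, or None if nobody is logged in."""
        user_id = self.session.get("user_id")
        return self.users_by_id.get(user_id) if user_id is not None else None
```

A new login method such as Facebook Connect would only need its own way of arriving at a user ID and putting it in the session; logout() and get_user() would not have to change.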

Let’s step back a moment and pretend that the author had in fact been able to anticipate how we were going to incorporate Facebook Connect and made the class just flexible enough and just abstract enough in just the right ways to accommodate that feature that we just now got around to implementing. What would have been the benefit? Well, most of the effort of implementing that feature is handling all of the Facebook specific things, so that part would still need to be written. I’d say at best it could have saved me from having to write about 10 lines of code. In the meantime, we would have still been carrying around all of that extra code for no reason for a whole two years before it was finally needed!

Let’s apply YAGNI whenever possible, and pay the cost of adding features when they actually need to be added.

Closely related: Beauty is in Simplicity.


PHP Autoload: Put the error where the error is

13 Oct

At Grooveshark we use PHP’s __autoload function to handle automatically loading files for us when we instantiate objects. In the docs they have an example autoload method:

function __autoload($class_name) {
    require_once $class_name . '.php';
}

Until recently Grooveshark’s autoload worked this way too, but there are two issues with this code:

  1. It throws a fatal error inside the method.

    If you make a typo when instantiating an object and ask it to load something that doesn’t exist, your error logs will say something like this: require_once() [function.require]: Failed opening required 'FakeClass.php' (include_path='.') in conf.php on line 83, which gives you no clues at all about what part of your code is trying to instantiate FakeClass. Not remotely helpful.

  2. It’s not as efficient as it could be.

    include_once and require_once have slightly more overhead than include and require because they have to do an extra check to make sure the file hasn’t already been included. It’s not much extra overhead, but it’s completely unnecessary because __autoload will only trigger if a class hasn’t already been defined. If you’re inside your __autoload function, you know you haven’t included the file before, or the class would already be defined.

The better way:
function __autoload($class_name) {
    include $class_name . '.php';
}

Now if you make a typo, your logs will look like this:
PHP Warning: include() [function.include]: Failed opening 'FakeClass.php' for inclusion (include_path='.') in conf.php on line 83
PHP Fatal error: Class 'FakeClass' not found in stupidErrorPage.php on line 24

Isn’t that better?

Edit 2010-10-19: Hot Bananas! The docs have already been updated. (at least in svn)


Why You Should Always Wrap Your Package

30 Jul

Ok, the title is a bit of a stretch, but it’s a good one isn’t it?

What I really want to talk about is an example of why it’s a good idea to make wrappers for PHP extensions instead of just using them directly.

When Grooveshark started using memcached ever-so-long-ago, with the memcache pecl extension, we decided to create a GMemcache class which extends memcache. Our main reason for doing this was to add some convenience (like having the constructor register all the servers) and to add some features that the extension was missing (like key prefixes). We recently decided that it’s time to move from the stagnant memcache extension to the pecl memcached extension, which is based on libmemcached, which supports many nifty features we’ve been longing for, such as:

  • Binary protocol
  • Timeouts in milliseconds, not seconds
  • getByKey
  • CAS
  • Efficient consistent hashing
  • Buffered writes
  • Asynchronous I/O

Normally such a transition would be a nightmare. Our codebase talks to memcached in a million different places. But since we’ve been using a wrapper from day 1, I was able to make a new version of GMemcache with the same interface as the old one, that extends memcached. It handles all the minor differences between how the two work, so all the thousands of other lines in the app that talk to memcached do not have to change. That made the conversion a <1 day project, when it probably would have otherwise been a month long project. It also has the advantage that if we decide for some reason to go back to using pecl memcache, we only have to revert one file.
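The general shape of the idea, sketched in Python with entirely made-up classes: `LegacyClient` and `NewClient` stand in for two client libraries with different method names and different ways of signalling a cache miss, and `CacheV1` / `CacheV2` are two versions of a wrapper that expose the same interface to the rest of the app. Our real GMemcache extends the extension class; this sketch holds the client as a member instead, purely to keep the example short.

```python
class LegacyClient:
    """Hypothetical old library: get() returns False on a miss."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key, False)


class NewClient:
    """Hypothetical replacement: different names, raises KeyError on a miss."""

    def __init__(self):
        self._data = {}

    def store(self, key, value):
        self._data[key] = value

    def fetch(self, key):
        return self._data[key]


class CacheV1:
    """Wrapper around the old client; adds key prefixes."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._client = LegacyClient()

    def set(self, key, value):
        return bool(self._client.set(self._prefix + key, value))

    def get(self, key):
        value = self._client.get(self._prefix + key)
        return None if value is False else value


class CacheV2:
    """Same interface as CacheV1, backed by the new client."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._client = NewClient()

    def set(self, key, value):
        self._client.store(self._prefix + key, value)
        return True

    def get(self, key):
        try:
            return self._client.fetch(self._prefix + key)
        except KeyError:
            return None
```

Callers only ever see set() and get() with the same return conventions, so swapping CacheV1 for CacheV2 (or back again) is a change to one file.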


Grooveshark Keyboard Shortcuts for Linux

21 Jun

Attention Linux users: Grooveshark friend Intars Students has created a keyboard shortcut helper for Linux. He could use some help testing it, so go help him! Once a few people have tested it and we’re pretty confident that it works well for users, we will add it to the links inside the app as well.

Intars is also the author of KeySharky, a Firefox extension that allows you to use keyboard shortcuts in Firefox to control Grooveshark playback from another tab, even if you’re not VIP! Pretty cool stuff.


Keyboard Shortcuts for Grooveshark Desktop

05 Jun

In a further effort to open Grooveshark to 3rd party developers, we have added an External Player Control API. (Side note: yes, it’s a hack to be polling a file all the time, but it’s also the only option we have until AIR 2.0 is out and most users have it installed.) Right now that means that we have support for keyboard shortcuts for OSX and Windows computers. To enable:

First open up desktop options by clicking on your username in the upper-right corner and selecting Desktop Options.

Notice the new option: Enable Global Keyboard Shortcuts (requires helper application)

Check that box (checking this box turns on polling the file mentioned in the External Player Control API) and select the proper client for your machine. In my case I chose Windows, and I’ll go through setting that up now.

When you click on the link to download the keyboard helper, it should download via your browser. Once it is downloaded, run the app.

If prompted, tell Windows not to always ask before opening this file and choose Run.

The first time the application runs, it shows a list of keyboard shortcuts.

At this point the keyboard shortcuts listed should work! If they don’t, switch back to the desktop options window and make sure you click Apply or OK, then try again.

In your system tray you should notice a new icon (note: subject to change in future versions), that looks like the desktop icon with a blue bar in the lower-right corner. Rumor has it that is supposed to look like a keyboard.

If you right-click on the tray icon, you will see that there are a few options. The default configuration is pretty good, but I recommend setting it to start with Windows, so that it always is ready.

Big thanks go to James Hartig for hacking together the external player control API and the Windows keyboard shortcut helper, and to Terin Stock for the OSX helper. You can learn more about both keyboard shortcut helpers here. Hackers/developers: please feel free to extend functionality, create your own keyboard helpers (especially for Linux) or add integration to your current apps. Just show off what you’ve done so we can link to it!

Edit: Terin has supplied some Mac screenshots as well: