Archive for the ‘grooveshark’ Category

Optimize for concurrency, not throughput

06 Feb

When it comes to disks, RAID and otherwise, there’s a lot of confusion about how to optimize for different server configurations. Unfortunately, much more emphasis seems to be placed on sustained throughput, with little mention of IOPS or concurrency.

Here at Grooveshark we use Sun’s X4500, aka “Thumper,” for housing your MP3s. Lately we’ve been hitting some buffering issues during peak hours, where it seemed like the X4500 was not able to keep up with the requests coming in. Although we are growing rapidly, we should still be nowhere near saturating the IOPS of 48 drives, but iostat -Cx was showing that the drives dedicated to serving up content had transactions waiting 100% of the time, and the disks were busy 100% of the time. Service time was in the triple digits. Insanity. Something was obviously misconfigured.

We were using 3 zpools for our content, each configured in raidz with about 8 drives (if I remember correctly). OK, so we actually only have 24 drives serving content in that configuration, but we still should not be anywhere near saturating the IOPS of that many. After a fair amount of digging, I discovered that raidz is probably the worst configuration possible for serving up thousands of 3-5 MB files concurrently. That’s because raidz causes every drive in the group to engage in every read request, no matter how small the file. That means your theoretical transfer rate for any given file is N times the transfer rate of each drive, but the IOPS of each raidz group is the same as one drive, no matter how many drives are in it. In other words, adding more drives to a raidz increases disk transfer bandwidth, but does nothing to alleviate the overhead associated with random access seeks. This configuration is ideal if you are frequently transferring multi-gigabyte files, but at 3-5 MB, seek time makes up almost all of the overhead, and it would be much better to have N drives all seeking to N files simultaneously.

I brought these concerns to our sysadmins, and they set up a small mirrored pool to handle our most popular content. The performance difference is quite astounding, even with just two disks in that pool. In a mirrored configuration like this, each drive can respond to read requests separately because each one holds the full set of data. Adding 2 drives to the 24 already in production took us from the IOPS of 3 drives to the IOPS of 5, an increase of roughly two-thirds from only two hard drives. Our sysadmins will be adding more disks to the mirror to give us even more breathing room. We’ll have to wait and see tomorrow whether the added IOPS capacity eliminates the buffering issues that users have been running into lately, but I bet it will.
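
The arithmetic above can be sketched as a back-of-envelope model. It assumes, as described, that each raidz group delivers about the random-read IOPS of a single drive, while every disk in a mirror can serve reads on its own. The per-drive figure of 100 IOPS and the function names are made up for illustration; real numbers depend on the hardware and workload.

```python
# Hypothetical per-drive random-read figure, chosen only for illustration.
IOPS_PER_DRIVE = 100


def raidz_read_iops(groups, iops_per_drive=IOPS_PER_DRIVE):
    """Each raidz group behaves like a single drive for small random reads."""
    return groups * iops_per_drive


def mirror_read_iops(disks, iops_per_drive=IOPS_PER_DRIVE):
    """Every disk in a mirror holds all the data and can serve reads alone."""
    return disks * iops_per_drive


# Three raidz pools of ~8 drives each: 24 drives, but only 3 drives' worth of IOPS.
content_pools = raidz_read_iops(groups=3)

# The small two-disk mirror for popular content adds 2 more drives' worth.
hot_mirror = mirror_read_iops(disks=2)

print("raidz pools only:", content_pools)
print("with the mirror: ", content_pools + hot_mirror)
```

Under this model the two mirrored disks contribute almost as much random-read capacity as all 24 raidz drives combined, which is the whole point of optimizing for concurrency.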

 
 

Indexes Matter (or: Memcache Will Only Take You So Far)

28 Jan

About a week ago, I was doing some work on the DB in the middle of the night and noticed that my simple queries were running a bit sluggish. I dropped out of mysql, ran top, and saw that load averages were way higher than I was used to seeing. I ran SHOW FULL PROCESSLIST a bunch of times and noticed two queries popping up frequently: one was a backend processing query which did not belong on the production database, and the other was the query used to build Widget objects. My first suspect was the backend process, since it did not belong, so we moved it to a more appropriate server, which brought the load average down by 1. That was a significant improvement, and although the load averages were still pretty high, the server was usable and responsive again, so I forgot about it.

A couple of days later, I noticed our load averages were still pretty high and the main recurring query was still the widget one, so I ran an EXPLAIN on it. Although the query looked innocent enough, it was missing an index, so instead of a quick lookup it was doing a full table scan across millions of rows. Ouch.
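
To see the difference an index makes, here is a small sketch using Python’s built-in sqlite3 module as a stand-in for MySQL. The widgets table, its columns, and the index name are hypothetical, and SQLite’s EXPLAIN QUERY PLAN output looks different from MySQL’s EXPLAIN, but the distinction between a full scan and an index lookup is the same.

```python
import sqlite3

# In-memory database with a hypothetical widgets table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO widgets (user_id, name) VALUES (?, ?)",
    [(i % 1000, f"widget-{i}") for i in range(10000)],
)


def query_plan(sql):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)


query = "SELECT * FROM widgets WHERE user_id = 42"

plan_before = query_plan(query)   # no index on user_id: full table scan
conn.execute("CREATE INDEX idx_widgets_user_id ON widgets (user_id)")
plan_after = query_plan(query)    # index available: lookup via the index

print("before:", plan_before)
print("after: ", plan_after)
```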

I knew we wouldn’t have a chance to have some downtime to run the necessary ALTERs to get the indexes in there until after the weekend, so I asked Chanel to put in memcache support so that widgets would only need to be loaded once from SQL. Chanel got that done on Sunday, and on Monday night we were able to get the proper indexes added.

Because of the time span involved, and because we monitor server metrics with Zabbix, we can look back at a nice little graph of our performance before and after each of the changes.

The days with the grey background are Saturday and Sunday, before memcache was added. The next day, with memcache added, the peak load is cut in half. The day after that, with proper indexes, the peak load is barely perceptible, roughly 1/4 of what the load was with just memcache.

The lesson to be learned from this is that while memcache can help quite a bit, there’s a lot to be said for making sure your SQL queries are optimized.

 
 

Grooveshark is growing!

14 Jan

I’m probably not at liberty to speak about specific numbers, but I have to share my elation at Grooveshark’s current growth rate.
Last month I predicted that given our current growth rate, we should double our total number of users every 3.6 months.

We are one day away from hitting a big round number, so I thought I’d look back and see how long ago we were at half of that. What do you know, 3.5 months ago. A growth rate slightly better than I had projected.

This, my friends, is exponential growth, and it turns out that when you have exponential growth it is insanely easy to estimate how long it will take to double your numbers. It’s called the rule of 72, and it basically states that you simply divide 72 by your percentage growth rate per period to get your approximate doubling time. For example, 72/(20% per month) = 3.6 months.
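
For the curious, the rule of 72 is an approximation of the exact doubling time, ln 2 / ln(1 + r). A quick sketch comparing the two (the function names are just for illustration):

```python
import math


def rule_of_72(percent_per_period):
    """Approximate doubling time: 72 divided by the percentage growth rate."""
    return 72 / percent_per_period


def exact_doubling_time(percent_per_period):
    """Exact doubling time for compound growth: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + percent_per_period / 100)


for rate in (5, 8, 20):
    print(rate, round(rule_of_72(rate), 2), round(exact_doubling_time(rate), 2))
```

At 20% per month the exact figure comes out closer to 3.8 months. The rule is most accurate at growth rates around 8% per period and drifts as rates get higher, but it is still close enough for mental arithmetic.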

I must credit Dr. Albert Bartlett for teaching me about the rule of 72. According to Dr. Albert Bartlett, “the greatest shortcoming of the human race is our inability to understand the exponential function,” and I highly recommend watching his lecture on the topic. (sorry, it’s a .ram file. here is an alternate version on google video, which I haven’t watched)

 
 

Grooveshark just got more album art

20 Dec

I’m still getting used to not calling it “Grooveshark Lite.” For those who haven’t noticed, it’s been re-branded as just plain old Grooveshark. On one hand, it’s an awesome testament to just how successful the project has been. On the other hand, it’s a little harder to talk about it now. I can take credit for a large portion of Grooveshark Lite, but I can’t really take credit for a large portion of Grooveshark; Grooveshark represents so much more to me than just our player.

Anyway, thanks to a nifty little Perl script that Travis helpfully rewrote for me to be more compatible with our hosting environment, I was able to grab art for an extra 57,000 or so albums that we had not been able to get from our partners previously. What does this mean for you, the user? You should see less of the default album art in your queue and song info panels, and more real album art.

What’s the magic trick? Simple: the script looks for art embedded inside the mp3s that we have associated with an album and guesses at which piece of art is best. It’s surprising how many mp3s actually have art embedded in them.
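
How the guess is made is not covered here, so the following is only one plausible heuristic and not the actual Perl script: prefer the image that appears in the most tracks of an album, breaking ties by byte size. It assumes the embedded images have already been extracted from the mp3s as raw bytes, and pick_album_art is a hypothetical name.

```python
import hashlib
from collections import Counter


def pick_album_art(embedded_images):
    """Pick one image from a list of raw image bytes, one per track with art.

    Assumed heuristic: the most frequently embedded image wins;
    ties go to the larger image. Returns None if no track had art.
    """
    if not embedded_images:
        return None

    by_digest = {}
    counts = Counter()
    for image in embedded_images:
        # Hash the bytes so identical images count as the same candidate.
        digest = hashlib.sha256(image).hexdigest()
        by_digest[digest] = image
        counts[digest] += 1

    best = max(by_digest, key=lambda d: (counts[d], len(by_digest[d])))
    return by_digest[best]


# Fake image payloads: two tracks share one image, a third has another.
chosen = pick_album_art([b"small", b"cover-art", b"small"])
print(len(chosen))
```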

Hopefully this will make Grooveshark more usable. I know I find it very frustrating when I have a large queue filled mostly with the default album art; it’s very hard to tell where you are when everything looks the same.

 
 

How I Broke Grooveshark Lite – update

10 Dec

After some poking around and scratching our heads a bit, Katy and I were able to discover the source of the problem. Turns out what I hit is a compiler/language/optimization bug. Katy has more about it here, including the generic code we were able to distill it down to in order to cause the breakage.

We thought it was a permutation of this bug which is fixed, but apparently this is a new one, so we are going to file it.

I find it a bit disturbing that the related bug was brought to their attention in 2006 and the limited fix was only released to the general public 2 months ago. Over two years to release a fix to a language bug?!? From the comments on the bug, they do not seem to take it very seriously:

Fixed in the avmplus mainline. I’d suggest a 10.0 target date, since the change will need soaking time, there is a work-around and it is rare to encounter this problem.

Change 243008 by rreitmai@rickr-dev on 2006/09/08 12:37:23

Severity: important
Summary: Verifier error bugfix

Detailed Description:

An optimization in the verifier allows us to avoid checking for null in various circumstances. Unfortunately, we were being a little too aggressive and we missed a case. If a block is a target of a backwards branch then we need to assume upon entry of the block that no check has been performed.

The other way to fix this would be to emit null checks just prior to the branch for any values that have notNull true, but this could also create a bunch of unnecessary checks.

Aren’t obscure language bugs the absolute worst kind to have? In my case, it first went undetected because it didn’t trigger any errors/warnings with the debug version of flash (although our sample condensed code does), and then it was extremely difficult to track down because THE CODE WAS CORRECT, and when the production version of flash fails, it does so silently.

 

How I broke Grooveshark Lite

07 Dec

I am a PHP and MySQL developer, primarily, and my main responsibility is the server-side Grooveshark Lite components. A couple of days ago I got to write my first ever Actionscript for the Grooveshark Lite front end; an ad rotator to handle ads more statefully and therefore more intelligently.

First I ran into a horrible language bug that causes a nasty runtime crash wherein Flash complains that it can’t resolve a class with itself. I wasn’t doing any weird voodoo with class definitions and Katy, our primary Flex developer, couldn’t find anything wrong with my code, but rewrote some of it anyway. Poof, problem fixed.

Everything looked fine, so we thought we were ready to launch, and last night we did. Skyler complained that album art wasn’t loading for him, but he had just reinstalled 64-bit Flash on 64-bit Linux, and he’s had weird problems with that before. We were unable to reproduce it, attributed it to Flash+Linux weirdness and ignored the issue. Today, more people complained about it, and we still could not reproduce it. Eventually I figured out that the bug was only affecting non-debug versions of Flash, but I couldn’t imagine what would be causing album art to not display: my code has nothing to do with displaying images, and I could see in Firebug that the art was being fetched properly, nothing was wrong with the headers, etc. But it definitely worked fine in the debug version of Flash. Katy looked at it, and through a process of elimination was able to determine that the problem was somewhere in my code, but couldn’t tell where exactly. Everything looked valid, and from the perspective of the debug version of Flash, it was. But my code wasn’t very “actionscripty,” because I’m used to writing PHP and haven’t had any sort of training in ActionScript beyond looking at existing code and occasionally looking at a language reference book. Katy couldn’t figure out what was wrong with my code, so she rewrote it, and now everything is back to normal again.

I don’t yet know what part of my code was causing the problem, so on Monday I plan to reinstate my code, downgrade to the non-debug version of Flash, and figure out what the heck is causing the bug. My instinct tells me that it’s probably a combination of an optimization setting in non-debug Flash and my use of ActionScript in a way that is technically valid but that no ActionScript coder would normally use. For example, Katy says I was using a generic Object as a Dictionary: I was trying to replicate PHP’s associative arrays, I didn’t know about the Dictionary object, and when I had asked previously how to do that, I was told to use an Object. :P

The good news is that I don’t think many people outside the company paid much attention to this bug. They were too busy being excited about the new features that Katy introduced: broadcast to twitter, broadcast to facebook, and deep linking support: When you “byte” a song (open the info panel), your URL bar changes. You can simply copy the URL to share with a friend, *and* clicking back and forward work, so if you want to go from the song you opened back to the playlist you found it on, just click back in your browser. Freaky, but it works.

 
 

If everything is top priority, then nothing is top priority

05 Dec

One of my heroes, Raymond Chen, writes about how he dealt with the annoying habit of management to give conflicting things top priority (here).

In the past we’ve had the same problems with management at Grooveshark. I was not clever/important enough to single-handedly shape the behavior of management, however, so the end result here was that I would pick from the “top priority” list, which ended up being a list of every single thing that needed to be done, ever, and just do the ones that most interested me.

Grooveshark management is taking a new approach now, and we lowly devs are more empowered to decide what our priorities should be. Essentially, it’s a lot like the old system, but less frustrating for both parties, because there’s no pretense. :)

 

Leaky Abstractions

12 Sep

As with nearly every issue in Software Engineering worth thinking about, Joel Spolsky has written an article about leaky abstractions that is very relevant to some problems I ran into tonight.

Abstractions do not really simplify our lives as much as they were meant to. [...] all abstractions leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning [...] and all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder

In my case, I was not working with programming tools per se, more of an abstraction of an abstraction built into the Grooveshark framework that is meant to make life as a programmer easier. And normally it does. But in this case, that abstraction was wrapped in a couple more layers of abstraction away from where my code needed the information, and somewhere in there the information was, for lack of a better description, being mangled. The particular form of mangling is actually due to a lower level abstraction at the DB layer, but is not handled in the higher level of abstraction because for most uses it doesn’t matter.

From the level of abstraction where my code was sitting, the information needed in order to un-mangle the data was simply not available. I went through the various layers to find a convenient place to put the un-mangling, but by the time I found a place, everything was so abstract that I couldn’t confidently modify that code and see all potential ramifications, so I just did an end-run around many of the layers of abstraction. This is certainly not ideal, but it works. The point is, an abstraction intended to make life as a programmer slightly easier most of the time, can easily make life as a programmer significantly more difficult on edge cases. Unfortunately, once a code base has been established, most cases are edge cases: feature additions, new products built on top of the old infrastructure, etc., all or most unforeseen, and therefore unaccounted for during the original design process.

 

Quick searching in Google Chrome

02 Sep

Google released their Chrome browser today. I’ll refrain from giving it a full review, as many others have already managed to do so.

I want to comment on the ease of adding search engines and some minor tweaks you can do to make the search functionality more powerful, specifically with regard to Grooveshark Lite and TinySong.

To add a search engine to Chrome, just visit the site. If the site has properly implemented its search extensions by embedding information about them in the page (and we have), then all you have to do is visit it and Chrome will add it to its list of search engines. Each search engine has a “keyword” that you type before your search term to indicate which search engine you wish to use. The default keyword is the domain of the search engine, but you can change it by clicking on the wrench icon, and then Options | Basics | Default Search | Manage.

I changed mine so that I can search Lite by typing in “listen artistname,” so if I want to listen to Soltero I type “listen Soltero.”
Likewise, I modified the keyword for TinySong to be ts. So if I want to search for Camera Obscura, I type “ts Camera Obscura.”

Another thing about Google Chrome that is really cool as it relates to Grooveshark Lite is the built in support for making any website an application. I installed Grooveshark Lite on my start menu, and now it runs as an independent application with the lite Favicon showing up in the taskbar. Very cool.

 
 

On being a DJ

18 Aug

When I was in college (oh so long ago…) I was a DJ for our radio station, and then I was a music director. I loved being a DJ: having lots of new, interesting and unreleased music on tap, from Smashing Pumpkins to Underwater Boxer; having a channel to share that music with other people; being able to make a small band’s day by playing their stuff and reporting it to CMJ. Well, there was one part I didn’t care for so much: talking on the radio. I’m a bit shy, which is why although I loved being a DJ and music director at Eckerd College, I knew it wasn’t ever going to be a career path for me.

It’s interesting, then, that I work at Grooveshark where much of that dream is being fulfilled by participating in this movement. The one piece that is missing is having a channel to share music with other people and subsequently helping small bands by making them more discoverable. Well, now with the release of Autoplay in Grooveshark Lite, it’s kind of like I get to be everybody’s DJ. Of course a computer scientist would write a DJing program rather than doing the manual labor of DJing.

As Professor Fishman, the best professor who ever lived, was fond of saying in our classes, a computer scientist isn’t satisfied with just using computers to put other people out of a job, they won’t settle until they manage to put themselves out of a job too. To be fair, he usually talked about that in the context of AI and specifically programming languages such as LISP, where the program can rewrite itself, but I think it applies here as well.

Now I get to be everyone’s DJ, but with everyone’s help too. If the system is currently a bad DJ, keep giving it feedback and it will learn. Imagine if you got to call up your local radio station and yell at them every time they played something you didn’t like, and congratulate them every time they played something you liked. If they didn’t block your phone number, you’d end up with the ultimate radio station for you, and that’s what Grooveshark aims to be, although we admit it will take some time to get there.

Check out Autoplay, and let me know what you think.