Archive for May, 2010

Redis: Converting from RDB to AOF

27 May

This is just a quick note about a question we had that we couldn’t find an easy answer to.

We decided to switch from Redis’s default behavior of background saving (bgsave) to a .rdb file to using append-only-file (AOF) mode. We thought we could just change the conf and restart; however, that created an empty AOF, and all the data from our .rdb was missing.

Apparently the correct way to transition between the two is:
-While running in normal bgsave mode, run:

When that finishes, shut down the server, change the conf and start it back up. Now all your data should be present, and it should be using the AOF exclusively.
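
A hedged sketch of the steps above. One assumption is flagged loudly: the command that fits the description is Redis's BGREWRITEAOF, which writes a fresh AOF from the dataset currently in memory, but that is an inference and not something the text confirms. The conf change itself is the `appendonly` directive. The helper name `enable_aof` is made up for illustration, and it only edits conf text; it does not talk to Redis.

```python
import re

# Matches an uncommented "appendonly <value>" line, without touching
# neighbouring blank lines or the trailing newline.
_APPENDONLY = re.compile(r"^[ \t]*appendonly[ \t]+\S+[ \t]*$", re.MULTILINE)


def enable_aof(conf_text: str) -> str:
    """Return conf_text with the appendonly directive set to yes.

    Hypothetical helper for illustration only.
    """
    if _APPENDONLY.search(conf_text):
        return _APPENDONLY.sub("appendonly yes", conf_text)
    # Directive absent: append it, making sure it starts on its own line.
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return conf_text + sep + "appendonly yes\n"
```

Run something like this only after the AOF has been fully written and the server has been shut down, so the restarted server finds both the new setting and a populated AOF.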


Last Year

19 May

It was almost a year ago that I made a post about hitting the 500,000 user mark and talked about some of our anticipated scaling issues.

In the past year we’ve grown to over 3.5 million registered users. In the past 30 days we added nearly 500,000. It’s absolutely incredible to me that it took us several years to reach 500,000, and now we add that many without even blinking.

I won’t rehash all the capacity issues we’ve had lately, but needless to say things have been at least as bad as I worried about a year ago. Partially that is because our server capacity has not grown nearly as quickly as our user base has up until now. Fortunately all that is changing, and we are getting a bunch more servers in, and should be growing our capacity more regularly from this point on. We still have infrastructure work to do to scale horizontally better, but at least we will soon have servers to scale horizontally to, a fairly critical piece of the puzzle, I would say. :)

Oh, and here’s what the office was like a year ago:

Thanks for voting for us at Grooveshark! from ben westermann-clark on Vimeo.


Redis Saves, and Ruins, the Day

16 May

Redis saves the day

Recently, we had to make an emergency switch from MySQL to Redis for all of our PHP session handling needs. We’ve been using MySQL for sessions for years, literally, with no problems. Along the way we’ve optimized things a bit, for example by making it so that calls made by the client don’t load up a session unless it’s needed, and more recently by removing an auto increment id column to prevent the need for global table locks whenever a new session is created.

But then we started running into a brick wall. Connections would pile up on the master while hundreds of queries against sessions would sit in a state of ‘statistics’, each connection storm only lasting for a second, but long enough to cause us to run out of connections, even if we doubled or tripled the usual limits. Statistics means that the optimizer is trying to come up with an execution plan, but these are queries that interact with a single row based on the primary key, so something else was obviously going on there. As far as we’ve been able to tell, it’s not related to load in any way: iostat and load averages both show calm and steady loads when the connection storms happen, and they happen at seemingly random times, even when traffic is at the lowest points of the day.

Our master DB still runs 5.0, so we thought maybe the combination of giving sessions their own server and running on a Percona build of 5.1 would resolve whatever bizarre optimizer issues we were having, but no luck. It definitely seems like a software issue, and it may just be due to the massive size of the table combined with the high level of concurrency that just makes MySQL lose its marbles every so often. Either way, we needed to come up with a solution fast, because the site was extremely flaky while sessions were randomly crashing.

We evaluated our options: what could we get up and running as quickly as possible on our one spare server that would have a chance of handling the load? We considered Redis, Cassandra, Postgres, Drizzle and Memcached, but decided to go with Redis as a temporary solution because we have been using it successfully for some other high load situations. All the other options besides Memcached are thus far untested by us, and Memcached doesn’t have the durability that we require for sessions (we don’t want everyone to get logged out if the box needs to be rebooted).

Nate got Redis up and running while I spent 20 minutes hacking our session handler to use Redis instead of MySQL. There was no time to copy all the session data to Redis, so instead I made it check Redis for the session first, and then fall back to reading from MySQL if it’s not already in Redis. Quick tests on staging showed that it seemed to be working, so we pushed it live. Miraculously, everything just worked! Redis didn’t buckle from the load and my code was seemingly bug free. That is definitely the least time I’ve ever spent writing or testing such a critical piece of code before deploying, but desperate times call for desperate measures, right?
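
The check-Redis-first, fall-back-to-MySQL read described above can be sketched roughly like this. It is not the actual PHP session handler: the class name is invented, both stores are stand-in dict-like objects, and copying a session forward on a fallback read is an assumption about how data would migrate over time.

```python
class FallbackSessionStore:
    """Read sessions from a primary store, falling back to a legacy one."""

    def __init__(self, primary, legacy):
        # Both stores are plain dict-like objects in this sketch; in the
        # post they would be Redis (primary) and MySQL (legacy).
        self.primary = primary
        self.legacy = legacy

    def read(self, session_id):
        data = self.primary.get(session_id)
        if data is None:
            # Not migrated yet: fall back to the old store.
            data = self.legacy.get(session_id)
            if data is not None:
                # Assumption: copy forward so the next read hits the primary.
                self.primary[session_id] = data
        return data

    def write(self, session_id, data):
        # New and updated sessions only ever go to the primary store.
        self.primary[session_id] = data
```

The appeal of this shape is that no bulk copy is needed up front: active sessions move over as they are touched, and the old store only ever sees reads.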

Since the switch, we haven’t had a single session related issue, and that’s how Redis saved the day.

Redis ruins the day

As I have mentioned in previous blog posts, we have been using Redis on our stream servers for tracking stream keys before they get permanently archived on a DB server. Redis has been serving us well in this role for what seems like a couple of months now. Starting yesterday, however, our stream servers started going deeply into swap and becoming intermittently unreachable. This was especially odd because under normal circumstances we had about 10GB of memory free.

Turns out, Redis was using twice as much memory as usual every time it went to flush to disk, which is every 15 minutes with our configuration. So every 15 minutes, Redis would go from using 15GB of memory to using 30GB.

After talking to James Hartig for a little while we found out that this was a known issue with the version of Redis we were using (1.2.1), which had been fixed in the very next release. Ed upgraded to the latest version of Redis, and things have been fine since. But that’s how Redis ruined the day.


Our setup with Redis on our stream servers should continue to work for us for the foreseeable future. They provide a natural and obvious sharding mechanism, because storing information about the streams on the actual stream server that handles the request means that adding more stream servers automatically means adding more capacity.

On the other hand, Redis for sessions is a very temporary solution. We have 1-3 months before we’re out of capacity on one server for all session information because Redis currently requires that all information stored must be kept in memory. There isn’t a natural or easy way to shard something like sessions, aside from using a hashing algorithm of some sort, which will require us to shuffle data around every time we add a new server, or using another server to keep track of all of our shards. Redis is soon adding support for virtual memory so it will be possible to store more information than there is memory available, but we feel it still doesn’t adequately address the need to scale out, which will eventually come, just not as quickly as with MySQL.

The lead candidate for handling sessions long term is Cassandra, because it handles the difficult and annoying tasks of sharding, moving data around and figuring out where it lives, for you. We need to do some extensive performance testing to make sure that it’s truly going to be a good long term fit for our uses, but I am optimistic. After all, it’s working for Facebook, Digg, Twitter and Reddit. On the other hand, Reddit has run into some speed bumps with it, and I still get the Twitter fail whale regularly, so clearly Cassandra is not made entirely of magic. The clock is ticking, and we still need a permanent home for playlists, which we’re also hoping will be a good fit for Cassandra, so we will begin testing and have at least some preliminary answers in the next couple of weeks, as soon as we get some servers to run it on.
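
To make the data-shuffling concern concrete, here is a small sketch of the simplest hashing scheme (hash the session id, take it modulo the number of servers) and how many keys land on a different server when one more is added. The function names and the use of MD5 are illustrative choices, not anything we actually run.

```python
import hashlib


def shard_for(session_id: str, num_servers: int) -> int:
    """Pick a server index by hashing the id and taking it modulo the count."""
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose server changes when the server count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


if __name__ == "__main__":
    sample = [f"session-{i}" for i in range(10000)]
    # Going from 3 to 4 servers relocates roughly three quarters of the keys.
    print(fraction_moved(sample, 3, 4))
```

With a uniform hash, a key stays put only when its hash gives the same remainder for both server counts, which for 3 and 4 servers is about one key in four. That is the shuffling cost a system like Cassandra is meant to take off our hands.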


Introducing Activity Feeds

10 May

Grooveshark’s new preloader image doesn’t really have anything to do with the new Activity Feeds feature (aside from also being new), but it sure does look nice, doesn’t it?

Saturday night/Sunday morning, Grooveshark released a pretty major set of features to VIP users onto and the desktop app. The major new feature is activity feeds, and it’s certainly the most interesting one, so I’ll cover that first.

When you visit the site, on the sidebar you should notice a new item called “Community” — next to that, it displays the number of events since the last time you logged in.

The actual community activity page is pretty awesome! It shows you aggregated information about the activity of all your friends! Possible activities include:
-Obsession (playing the same song a lot)
-Being really into an artist or album (listening to a lot of songs on that album or by that artist)
-Adding songs to your favorites or library.
Essentially, the power of this feature lies in being able to find out about new music that your friends are into. This feature turns Grooveshark into the best social music discovery service I’ve ever heard of.

Each user also has a profile page, with activity displayed by default, and links to their music library, community and playlist pages.

If you’re the kind of user who doesn’t want other users to know what you’re listening to at all times, then you have a couple of options in the settings page now. You can temporarily enable an “incognito mode” style setting which turns off logging to feeds for the duration of your session. This setting is perfect for parties or if you’re a hipster but just can’t resist the urge to listen to Miley Cyrus. No one has to know.
The other option is the more extreme “nuclear option” type of setting. It permanently disables logging your feed activity, and it permanently deletes all feed information we might have already stored.

Grooveshark is now available in Portuguese! Translated by our very own Brazilian, Paulo. (Note: We will be removing country flags from languages soon, for those who are bothered by that sort of thing)

Shortcuts to playlists you are subscribed to now show up in the sidebar, below playlists you have created. The blue ones are your playlists, and the grey ones are subscribed playlists.

We now have autocomplete or “search suggest” functionality integrated into the search bar on the home screen.

Wondering if an artist is on tour? Want to buy tickets? Well now you can, thanks to our partner Songkick.

The library page has been revamped, and now playlists are contained within it. In the example pictured above, you can see that columns are collapsible: I collapsed the Albums column, while leaving the Artists column open.
Note: you can still get to your favorites by clicking on the Favorites smart playlist, or by going to My Music and then clicking on the button in the header that says “<3 Favorites”


Technology Stack

06 May

Ever wonder what technology powers Grooveshark?

Well that’s too bad, ’cause I’m going to tell you anyway. Most sites these days run on the LAMP stack (Linux, Apache, MySQL, PHP). Grooveshark runs on the LALMMRSPJG stack, more or less. Don’t try to pronounce that, you’ll only end up hurting yourself.

Linux: (CentOS primarily) for all of our servers except one lone Solaris box, which will be taken out back and shot one of these days, I hope.

Apache: All of our front-end nodes run apache. By front end node I mean everything serving up http traffic except for stream servers. For example,,,,, are all hosted on our front end nodes.

Lighttpd: Affectionately called lighty around here, it’s super efficient at serving up static content, so we use it on all of our stream servers instead of Apache.

MySQL: We have several database servers, and they all run MySQL, much to my chagrin. We’d be using PostgreSQL if it had been up to me, but it wasn’t so we stick with MySQL. Now that Drizzle is coming along nicely, we are contemplating eventually moving fully or partially onto Drizzle, meaning our stack would be LALDMRSPJG or LALMDMRSPJG. Not much more pronounceable, I’m afraid.

Memcached: Without memcached, we would certainly not be where we are today. At this point nearly everything runs through memcached, reducing database load significantly and increasing site performance at the same time.

Redis: Redis is a new addition to our stack, but a welcome one. Redis is very similar to memcached in that it’s a key-value store, and it’s almost as fast, but it has the advantage of being disk-backed, so if you have to restart the server, you haven’t lost anything when it comes back up. Where memcached helps us save reads from MySQL, Redis helps us save reads and writes, because we can actually use it to store data that we intend to keep around.

Sphinx: MySQL fulltext indexes are absolutely horrible for search, so instead we use a technology called Sphinx. Sphinx recently got moved off of the front end servers and onto its own server, significantly reducing the load on the front end servers and improving the performance of search. Win-win!

PHP: Most of the code that makes Grooveshark work is written in PHP. That covers all the websites I listed above, including the RPC services. Plenty of people hate PHP out there, including (or especially) those of us who program in it. It definitely has its warts, but it’s a language that is quick to develop in and it performs relatively well if you play to its strengths.

Java: Some of the code running on our servers, especially things that need to maintain state, is written in Java. Things that come to mind are the ad server, and some crazy stuff written in Scala for keeping our stream servers in sync.

Gearman: Gearman is an awesome piece of the puzzle that we’re just starting to harness, and it’s going to help us scale out even more in the future. Gearmand is an extremely lightweight job queuing server with support for synchronous and asynchronous jobs. Workers can live on different servers and be written in different languages from clients. Gearman is great for map/reduce jobs or for allowing things that might be slow to be processed in the background without slowing down the user experience. For example, if our ad server needs to display an ad as quickly as possible *and* it needs to log the fact that it displayed the ad, it can fire off an asynchronous Gearman job for the logging and get right to work on serving up the ad. Even if the logging portion is running incredibly slowly, nothing front-facing has to wait on it.
We have a super secret feature launching in about two weeks that would essentially not be possible without Gearman (and Redis). I’ll update in a couple of weeks to explain how Gearman makes it possible, once I can talk about what it is. :)
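
The fire-and-forget pattern from the ad server example can be sketched without Gearman at all. The snippet below is a stand-in that uses an in-process queue and a worker thread; it is not the Gearman client API, and `BackgroundJobs` and `serve_ad` are invented names. With Gearman, the queue would be the gearmand server and the worker could live on another machine.

```python
import queue
import threading


class BackgroundJobs:
    """Tiny in-process stand-in for an asynchronous job queue."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func, args = self._jobs.get()
            try:
                func(*args)
            except Exception:
                # A real worker would report the failure somewhere.
                pass
            finally:
                self._jobs.task_done()

    def submit(self, func, *args):
        # Returns immediately; the job runs later on the worker thread.
        self._jobs.put((func, args))

    def wait(self):
        # Block until every submitted job has been processed.
        self._jobs.join()


def serve_ad(ad_id, jobs, log):
    # Queue the logging and get straight back to serving the ad.
    jobs.submit(log.append, ("displayed", ad_id))
    return f"<ad {ad_id}>"
```

The caller never waits on the logging step, so a slow log write delays only the worker, not the response.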

Please note that this list only includes the backend of the stack. We also have front-end clients written in HTML+JS, Flash+Flex, J2ME, Java, Objective C and some others on the way. It also doesn’t yet include Cassandra, but I’m hoping we can add that soon.


A long series of mostly unrelated issues

02 May

If you look at my recent posting (and tweeting) history, a new pattern becomes clear: Grooveshark has been down a lot lately. This morning, things broke yet again.

I don’t think we’ve been this unreliable since the beta days. If you don’t know what that means, consider yourself lucky. The point is that this is not the level of service we are aiming to provide, and not the level of service we are used to providing. So what’s going on?

Issue #1: Servers are over capacity

We hit some major snags getting our new servers, so we have been running over capacity for a while now. That means that at best, our servers are a bit slower than they should be and at worst, things are failing intermittently. Most of the other issues on this list are at least tangentially related to this fact, either because of compromises we had to make to keep things running, or because servers literally just couldn’t handle the loads that were being thrown at them. I probably shouldn’t disclose any actual numbers, but our User|Server ratio is at least an order of magnitude bigger than the most efficient comparable services we’re aware of, and at least two orders of magnitude bigger than Facebook…so it’s basically a miracle that the site hasn’t completely fallen apart at the seams.

Status: In Progress
Some of the new servers arrived recently and will be going into production as soon as we can get them ready. We’re playing catch up now though, so we probably already need more.

Issue #2: conntrack

Conntrack is basically (from my understanding) a built in part of Linux (or at least CentOS) related to the firewall. It keeps track of connections and enables some throttling to prevent DoS attacks. Unfortunately it doesn’t seem to be able to scale with the massive number of concurrent connections each server is handling now; once the number of connections reaches a certain size, cleanup/garbage collection takes too long and the number of connections tracked just grows out of control. Raising the limits helps for a little while, but eventually the numbers grow to catch up. Once a server is over the limits, packets are dropped en masse, and from a user perspective connections just time out.

Status: Fixed
Colin was considering removing conntrack from the kernel, but that would have caused some issues for our load balancer (I don’t fully understand what it has to do with the load balancer, sorry!). Fortunately he located some obscure setting that allows us to limit what conntrack is applied to, by port, so we can keep the load balancer happy without breaking everything when the servers are under heavy load. The fix seems to work well, so it should be deployed to all servers in the next couple of days. In the meantime, it’s already on the servers with the heaviest load, so we don’t expect to be affected by this again.

Issue #3: Bad code (we ran out of integers)

Last week we found out that playlist saving was completely broken. Worse, anyone trying to save changes to an existing playlist during that 3 hour period had their playlist completely wiped out.

There were really two issues here: a surface issue that directly caused the breakage, and an underlying issue that caused the surface issue.

The surface issue: the PlaylistsSongs table has an auto_increment field for uniquely identifying each row, which was a 32 bit unsigned int. Once that field is maxed out, it’s no longer possible to insert any more rows.

Underlying issue: the playlist class is an abomination. It’s both horrible and complex, but at the same time incredibly stupid. Any time a playlist is ever modified, the entries in PlaylistsSongs are deleted, and then reinserted. That means if a user creates a playlist with 5 songs and edits it 10 times, 50 IDs are used up forever. MySQL just has no way of going back and locating and reusing the gaps. How bad are the gaps? When we ran out of IDs there were over 3.5 billion of them; under sane usage scenarios, enough to last us years even at our current incredible growth rate.
We’ve known about the horror of this class and have been wanting to rewrite it for over a year, but due to its complexity and the number of projects that use the class, it’s not a quick fix, and for better or worse the focus at Grooveshark is heavily slanted towards releasing new features as quickly as possible, with little attention given to paying down code debt.
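
The ID arithmetic is easy to check with a toy model. The sketch below is not the real Playlist class or the real schema; it is a made-up in-memory table with an auto-increment counter, and a save that deletes and reinserts every row the way the class is described above.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32 bit unsigned int can hold


class PlaylistsSongsTable:
    """Toy stand-in for a table with an auto_increment id column."""

    def __init__(self):
        self.next_id = 1
        self.rows = {}  # id -> (playlist_id, song)

    def insert(self, playlist_id, song):
        row_id = self.next_id
        self.next_id += 1  # ids are never handed out twice
        self.rows[row_id] = (playlist_id, song)
        return row_id

    def delete_playlist(self, playlist_id):
        self.rows = {
            i: row for i, row in self.rows.items() if row[0] != playlist_id
        }


def save_playlist(table, playlist_id, songs):
    # Delete everything, then reinsert: every save burns len(songs) new ids.
    table.delete_playlist(playlist_id)
    for song in songs:
        table.insert(playlist_id, song)


if __name__ == "__main__":
    table = PlaylistsSongsTable()
    songs = ["a", "b", "c", "d", "e"]
    save_playlist(table, 1, songs)      # create
    for _ in range(10):                 # ten edits
        save_playlist(table, 1, songs)
    used = table.next_id - 1
    print(used, len(table.rows), used - len(table.rows))
```

Creating the playlist and editing it ten times consumes 55 ids for 5 live rows, leaving the 50 permanently wasted ids from the example.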

Status: Temporarily fixed
We fixed the problem in the quickest way that would get things working again: by making more integers available. That is, we altered the table and made the auto increment field a 64-bit unsigned int. The Playlist class is still hugely wasteful of IDs and we’ll still run out eventually with this strategy; we’ve just bought ourselves a little bit of time. Now that disaster has struck in a major way, chances are pretty good that we’ll be able to justify spending the time to make it behave in a more sane manner. Additionally, we still haven’t had the chance to export the previous day’s backup somewhere so that people whose playlists were wiped out can have a chance to restore them. Some have argued that we should have been using a 64-bit integer in the first place, but it should be obvious that that would only have delayed the problem and in the meantime, it wastes memory and resources.

Issue #4: Script went nuts

This was today’s issue. The details still aren’t completely clear, but basically someone who shall remain nameless wrote a bash script to archive some data from a file into the master database. That script apparently didn’t make use of a lockfile and somehow got spawned hundreds or maybe even thousands of times. The end result was that it managed to completely fill the database server. It’s actually surprising how elegantly MySQL handled this. All queries hung, but the server didn’t actually crash, which is honestly what I expected would happen in that situation. Once we identified the culprit, cut off its access to the database and moved things around enough to free up some space, things went back to normal.

Status: Fixed
The server is obviously running fine now, but the script needs to be repaired. In the meantime it’s disabled. One could say that there was an underlying issue that caused this problem as well, which is that it was possible for such a misbehaving script to go into production in the first place. I agree, and we have a new policy effective immediately that no code that touches the DB can go live without a review. Honestly, that policy already existed, but now it will be taken seriously.
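
For reference, the kind of lockfile guard the script was missing looks roughly like this. The original was a bash script; this is a Python sketch of the same idea using `fcntl.flock` (Unix only), and the function name and lock path are invented for illustration.

```python
import fcntl


def try_lock(path):
    """Return an open file holding an exclusive lock on path, or None.

    The lock is released when the returned file is closed or when the
    process exits, so a crashed run cannot leave a stale lock behind.
    """
    handle = open(path, "a")
    try:
        # LOCK_NB makes this fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    lock = try_lock("/tmp/archive-job.lock")  # hypothetical path
    if lock is None:
        raise SystemExit("another copy is already running")
    # ... do the archiving work while holding the lock ...
    lock.close()
```

With a guard like this at the top, the second and every later copy of the script exits immediately instead of piling onto the database.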

Issue #5: Load Balancer crapped out

I almost forgot about this one, so I’m adding it after the fact. We were having some issues with our load balancer due to the fact that it was completely overloaded, but even once the load went down it was still acting funny. We did a reboot to restore normalcy, but after the reboot the load balancer was completely unreachable because our new switch thought it detected the routing equivalent of an infinite loop. At that point the only way to get things going was to have one of our techs make the 2 hour drive up to our data center to fix it manually.

This issue would have been annoying but not catastrophic had we remembered to reconnect the serial cable to the load balancer after everything got moved around to accommodate the new switch. It also wouldn’t have been so bad if we had someone on call near the data center who would have been able to fix the issue, but right now everyone is in Gainesville. Unless Gainesville wins the Google Fiber thing, there’s no way we can have the data center in Gainesville because there just isn’t enough bandwidth coming into the city for our needs (yes, we’re that big).

Status: Mostly fixed
We understand what happened with the switch and know how to fix the issue remotely now. We don’t yet know how to prevent the switch from incorrectly identifying an infinite loop when the load balancer boots up, but we know to expect it and how to work around it. We now also have the serial cable hooked up, and a backup load balancer in place, so if something happens again we’ll be able to get things working again remotely now. It would still be nice to not have to send someone on a 2 hour drive if there is a major issue in the future, but hopefully we have minimized the potential for such issues as much as possible.

Issue #6: Streams down

This issue popped up this week and was relatively minor compared to everything else that has gone wrong, since I believe users were affected for less than 20 minutes, and only certain streams failed. The unplanned downtime paid off in the long run because the changes that caused the downtime ultimately mean the stream servers are faster and more reliable.

We had been using MySQL to track streams, with a MySQL server running on every stream server, just for tracking streams that happen on that server. We thought this would scale out nicely, as more stream servers automatically means more write capacity. Unfortunately, due to locking issues, MySQL was ultimately unable to scale up nearly as far as we have been able to get our stream output to scale, so MySQL became a limiting factor in our stream capacity. We switched the stream servers over to Redis, which scales up much better than MySQL, has little to no locking issues, and is a perfect match for the kind of key-value storage we need for tracking streams.

Unfortunately, due to a simple oversight, some of the web servers were missing a critical component, or rather they thought they were because Apache needed to be reloaded before it would see the new component. This situation was made worse by testing that was less thorough than it should have been, so it took longer to identify the issue than would be ideal. Fortunately, the fix was extremely simple so the overall downtime or crappy user experience did not last very long.

Status: Fixed with better procedures on the way
The issue was simple to fix, but it also helps to highlight the need for better procedures both for putting new code live and for testing. These new changes should be going into effect some time this week, before any more big changes are made. In the meantime, streams should now be more reliable than they have been in the past few weeks.