Ever wonder what technology powers Grooveshark?
Well that’s too bad, ’cause I’m going to tell you anyway. Most sites these days run on the LAMP stack (Linux, Apache, MySQL, PHP). Grooveshark runs on the LALMMRSPJG stack, more or less. Don’t try to pronounce that, you’ll only end up hurting yourself.
Linux: (CentOS primarily) for all of our servers except one lone Solaris box, which will be taken out back and shot one of these days, I hope.
Apache: All of our front-end nodes run apache. By front end node I mean everything serving up http traffic except for stream servers. For example, listen.grooveshark.com, www.grooveshark.com, tinysong.com, widgets.grooveshark.com, cowbell.grooveshark.com are all hosted on our front end nodes.
Lighttpd: Affectionally called lighty around here, it’s super efficient at serving up static content, so we use it on all of our stream servers instead of apache.
MySQL: We have several database servers, and they all run MySQL, much to my chagrin. We’d be using PostgreSQL if it had been up to me, but it wasn’t so we stick with MySQL. Now that Drizzle is coming along nicely, we are contemplating eventually moving fully or partially onto Drizzle, meaning our stack would be LALDMRSPJG or LALMDMRSPJG. Not much more pronounceable, I’m afraid.
Memcached: Without memcached, we would certainly not be where we are today. At this point nearly everything runs through memcached, reducing database load significantly and increasing site performance at the same time.
Redis: Redis is a new addition to our stack, but a welcome one. Redis is very similar to memcached in that it’s a key-value store, and it’s almost as fast, but it has the advantage of being disk-backed, so if you have to restart the server, you haven’t lost anything when it comes back up. Where memcached helps us save reads from MySQL, Redis helps us save reads and writes, because we can actually use it to store data that we intend to keep around.
Sphinx: MySQL fulltext indexes are absolutely horrible for search, so instead we use a technology called Sphinx. Sphinx recently got moved off of the front end servers and onto its own server, significantly reducing the load on the front end servers and improving the performance of search. Win-win!
PHP: Most of the code that makes Grooveshark work is written in PHP. All the websites I listed above, including the RPC services. Plenty of people hate PHP out there, including (or especially) those of us who program in it. It definitely has its warts, but it’s a language that is quick to develop in and it performs relatively well if you play to its strengths.
Java: Some of the code running on our servers, especially things that need to maintain state, are written in Java. Things that come to mind are the ad server, and some crazy stuff written in scala for keeping our stream servers in sync.
Gearman: Gearman is an awesome piece of the puzzle that we’re just starting to harness, and it’s going to help us scale out even more in the future. Gearmand is an extremely lightweight job queuing server with support for syncronous and asyncronous jobs. Workers can live on different servers and be written in different languages from clients. Gearman is great for map/reduce jobs or for allowing things that might be slow to be processed in the background without slowing down the user experience. For example, if our ad server needs to display an ad as quickly as possible *and* it needs to log the fact that it displayed the ad, it can fire off an asyncronous gearman job for the logging and get right to work on serving up the ad. Even if the logging portion is running incredibly slowly, nothing front-facing has to wait on it.
We have a super secret feature launching in about two weeks that would essentially not be possible without Gearman (and Redis). I’ll update in a couple of weeks to explain how Gearman makes it possible, once I can talk about what it is. :)
Please note that this list only includes the backend of the stack. We also have front-end clients written in HTML+JS, Flash+Flex, J2ME, Java, Objective C and some others on the way. It also doesn’t yet include Cassandra, but I’m hoping we can add that soon.