
Technology Stack

06 May

Ever wonder what technology powers Grooveshark?

Well that’s too bad, ’cause I’m going to tell you anyway. Most sites these days run on the LAMP stack (Linux, Apache, MySQL, PHP). Grooveshark runs on the LALMMRSPJG stack, more or less. Don’t try to pronounce that, you’ll only end up hurting yourself.

Linux: (CentOS primarily) for all of our servers except one lone Solaris box, which will be taken out back and shot one of these days, I hope.

Apache: All of our front-end nodes run Apache. By front-end node I mean everything serving up HTTP traffic except for stream servers. For example, listen.grooveshark.com, www.grooveshark.com, tinysong.com, widgets.grooveshark.com and cowbell.grooveshark.com are all hosted on our front-end nodes.

Lighttpd: Affectionately called lighty around here, it’s super efficient at serving up static content, so we use it on all of our stream servers instead of Apache.

MySQL: We have several database servers, and they all run MySQL, much to my chagrin. We’d be using PostgreSQL if it had been up to me, but it wasn’t, so we stick with MySQL. Now that Drizzle is coming along nicely, we are contemplating eventually moving fully or partially onto Drizzle, meaning our stack would be LALDMRSPJG or LALMDMRSPJG. Not much more pronounceable, I’m afraid.

Memcached: Without memcached, we would certainly not be where we are today. At this point nearly everything runs through memcached, reducing database load significantly and increasing site performance at the same time.
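To make the "runs through memcached" idea concrete, here is a rough sketch of the usual read-through caching pattern: check the cache first, and only hit the database on a miss. This is not Grooveshark's actual code. `FakeCache` is a hypothetical in-memory stand-in for a memcached client, and `load_song_from_db`, `get_song` and the `song:<id>` key format are made up for illustration.

```python
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses, the way a real cache would.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)


db_queries = 0  # counts how often the pretend database is hit


def load_song_from_db(song_id):
    """Pretend database lookup (hypothetical); a real one would run a SQL query."""
    global db_queries
    db_queries += 1
    return {"id": song_id, "title": "Song %d" % song_id}


def get_song(cache, song_id):
    """Return the song from the cache, falling back to the database on a miss."""
    key = "song:%d" % song_id
    song = cache.get(key)
    if song is None:
        song = load_song_from_db(song_id)
        cache.set(key, song, ttl=300)  # keep it for five minutes
    return song


cache = FakeCache()
first = get_song(cache, 42)   # miss: goes to the database
second = get_song(cache, 42)  # hit: served from the cache
print(db_queries)  # prints 1
```

The second lookup never touches the database, which is where the reduced load comes from. The trade-off is that cached data can be stale until the entry expires or is overwritten.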

Redis: Redis is a new addition to our stack, but a welcome one. Redis is very similar to memcached in that it’s a key-value store, and it’s almost as fast, but it has the advantage of being disk-backed, so if you have to restart the server, you haven’t lost anything when it comes back up. Where memcached helps us save reads from MySQL, Redis helps us save reads and writes, because we can actually use it to store data that we intend to keep around.
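The difference from a pure in-memory cache can be shown with a toy disk-backed key-value store. This is only a sketch of the idea that writes survive a restart. It is not how Redis is implemented and it does not use any Redis API. `DiskBackedStore`, the log file format and the `plays:42` key are all hypothetical.

```python
import json
import os
import tempfile


class DiskBackedStore:
    """Toy key-value store: appends every write to a log file and replays it on startup."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        # On startup, rebuild the in-memory state from whatever is on disk.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self._data[key] = value

    def set(self, key, value):
        # Write to disk first, then update memory.
        with open(self._path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        self._data[key] = value

    def get(self, key, default=None):
        # Reads are served from memory, so they stay fast.
        return self._data.get(key, default)


path = os.path.join(tempfile.mkdtemp(), "store.log")

store = DiskBackedStore(path)
store.set("plays:42", 1)
store.set("plays:42", 2)

# Simulate a server restart by creating a fresh instance over the same file.
restarted = DiskBackedStore(path)
survived = restarted.get("plays:42")
print(survived)  # prints 2
```

With a plain cache, the restarted instance would come back empty and the data would have to be reloaded from MySQL, or would simply be gone if it was never written there.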

Sphinx: MySQL fulltext indexes are absolutely horrible for search, so instead we use a technology called Sphinx. Sphinx recently got moved off of the front-end servers and onto its own server, significantly reducing the load on the front-end servers and improving the performance of search. Win-win!

PHP: Most of the code that makes Grooveshark work is written in PHP. That includes all the websites I listed above, as well as the RPC services. Plenty of people hate PHP out there, including (or especially) those of us who program in it. It definitely has its warts, but it’s a language that is quick to develop in and it performs relatively well if you play to its strengths.

Java: Some of the code running on our servers, especially things that need to maintain state, is written in Java. Things that come to mind are the ad server, and some crazy stuff written in Scala for keeping our stream servers in sync.

Gearman: Gearman is an awesome piece of the puzzle that we’re just starting to harness, and it’s going to help us scale out even more in the future. Gearmand is an extremely lightweight job queuing server with support for synchronous and asynchronous jobs. Workers can live on different servers and be written in different languages from clients. Gearman is great for map/reduce jobs or for allowing things that might be slow to be processed in the background without slowing down the user experience. For example, if our ad server needs to display an ad as quickly as possible *and* it needs to log the fact that it displayed the ad, it can fire off an asynchronous Gearman job for the logging and get right to work on serving up the ad. Even if the logging portion is running incredibly slowly, nothing front-facing has to wait on it.
We have a super secret feature launching in about two weeks that would essentially not be possible without Gearman (and Redis). I’ll update in a couple of weeks to explain how Gearman makes it possible, once I can talk about what it is. :)
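
The ad-logging example can be sketched with a background job queue. This does not use Gearman itself: a standard-library queue and a worker thread stand in for gearmand and a Gearman worker, purely to show the fire-and-forget shape. `serve_ad`, `log_worker` and the job payload are hypothetical names.

```python
import queue
import threading
import time

jobs = queue.Queue()  # stand-in for the job server's queue
logged = []           # stand-in for wherever the log records end up


def log_worker():
    """Background worker: pulls jobs off the queue and 'logs' them slowly."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            jobs.task_done()
            break
        time.sleep(0.05)  # simulate slow logging
        logged.append(job)
        jobs.task_done()


def serve_ad(ad_id):
    """Queue the logging job without waiting for it, then return the ad right away."""
    jobs.put({"event": "ad_displayed", "ad_id": ad_id})
    return "<ad %d>" % ad_id


worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

html = serve_ad(7)  # returns immediately, the logging happens in the background

# Only for this demo: wait for the queue to drain, then stop the worker.
jobs.join()
jobs.put(None)
worker.join()

print(html)         # prints <ad 7>
print(len(logged))  # prints 1
```

In a real Gearman setup the worker would typically be a separate process, possibly on another server and in another language, but the calling code has the same shape: submit the job and move on.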

Please note that this list only includes the backend of the stack. We also have front-end clients written in HTML+JS, Flash+Flex, J2ME, Java and Objective-C, with some others on the way. It also doesn’t yet include Cassandra, but I’m hoping we can add that soon.

 
 
  1. Jeff

    May 12, 2010 at 1:33 am

    “Plenty of people hate PHP out there.” Everywhere, there is someone who is going to hate something, but I think in general, PHP is king. It powers the majority of websites.

    Good choice on CentOS. That is my preferred distro to use.

    Would you mind telling us how many servers you have powering GrooveShark? Maybe even a breakdown? Thanks!

     
  2. Tom

    May 15, 2010 at 8:56 am

    I love using your guys service, but I can’t help but wonder: How do you guys avoid legal troubles? You’re basically hosting and playing copyrighted music.

     
  3. Jay

    May 16, 2010 at 6:20 am

    The legal stuff is definitely not my department! But our model is just about identical to YouTube…we’re legal the same way they are.

     
  4. IT.NeverEnds

    July 5, 2010 at 11:44 am

    I was actually wondering what was driving “Grooveshark” (no i wasn’t bored just curious, hehe)
    and that’s how i ended in your blog, thanks for the explanation.

     
  5. André

    September 22, 2010 at 5:49 am

    Really nice post.

    But what about technology related to the process where you collect songs?
    Or Grooveshark relies solely on uploaded content by users?

    Cheers,
    André

     
  6. Jay

    October 13, 2010 at 4:38 am

    Grooveshark relies solely on uploaded content and content from distribution partners. We don’t crawl the web for content or anything like that.

     
  7. xeleema

    March 18, 2011 at 4:53 am

    So what’s this about a lone Solaris box? Is that actually a SPARC-based box, or is it running the x86 build? (I’m a bit of a Solaris fan, myself. But only when running on SPARC).

     
  8. Jay

    March 25, 2011 at 10:01 pm

    Ha. It’s long gone now but I am pretty sure it was running x86, sorry!

     
  9. Aymen

    April 5, 2011 at 3:31 pm

    In fact how much time did it take to build grooveshark as we use it today-2011 :-) ???

     
  10. Jay

    April 14, 2011 at 1:53 am

    I’ve been working at Grooveshark since around 2007. The product existed before that, but it was pretty crappy and invite-only. We also have a habit of throwing out a lot of the work (especially front-end work) we have done and starting over from scratch every 6 months. :P

     
  11. exfromtheleft

    February 27, 2014 at 7:18 am

    i was expecting noSQL database instead of relational mysql, and ruby or python instead of my main language php…