
Archive for the ‘grooveshark’ Category

Migrating to github

26 Jan

I needed to migrate a couple of repositories with lots of tags and branches to github, and Github’s instructions just weren’t cutting it:

Existing Git Repo?
cd existing_git_repo
git remote add origin git@github.com:[orgname]/[reponame].git
git push -u origin master

That’s great as long as all I care about is moving my master branch, but what I really wanted was just to make github have everything that my current origin did, whether or not I was tracking it. A quick Google search didn’t turn up any instructions on how to do this, so I had to figure it out the old-fashioned way, by reading git help pages. The quickest and easiest way I could find to do this after creating the repo on github is:

git clone git@[originurl]:[orgname]/[reponame].git [local temp dir] --mirror
cd [local temp dir]
git remote add github git@github.com:[orgname]/[reponame].git
git push -f --mirror github
cd ..
rm -rf [local temp dir]

That’s it! At this point github should have everything that your original origin had, including everything in /refs. It won’t have any of your local branches, since you did everything from a clean checkout. You might also want to change where origin is pointing:

cd [dir where your repo is checked out]
git remote add oldOrigin git@[oldgitserver]:[orgname]/[reponame].git
git remote set-url origin git@github.com:[orgname]/[reponame].git

Of course, this same procedure should work for moving from any git server to any other git server too.
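
If you want to double-check that the mirror really did carry everything over, one option is to compare the output of `git ls-remote` from the old server and the new one. Here is a minimal Python sketch of that comparison. The helper names `parse_ls_remote` and `missing_refs` are my own, and the sample output strings are made up for illustration:

```python
def parse_ls_remote(output: str) -> dict:
    """Parse `git ls-remote` output (one "<sha> <ref>" pair per line) into {ref: sha}."""
    refs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # git separates the two fields with a tab; split on any whitespace
        sha, ref = line.split(None, 1)
        refs[ref] = sha
    return refs


def missing_refs(old: dict, new: dict) -> list:
    """Refs that are absent on the new server or point at a different commit there."""
    return sorted(ref for ref, sha in old.items() if new.get(ref) != sha)


# Made-up sample output; in practice, paste in what `git ls-remote <url>` prints
old_output = (
    "1111111\trefs/heads/master\n"
    "2222222\trefs/heads/dev\n"
    "3333333\trefs/tags/v1.0\n"
)
new_output = (
    "1111111\trefs/heads/master\n"
    "3333333\trefs/tags/v1.0\n"
)

problems = missing_refs(parse_ls_remote(old_output), parse_ls_remote(new_output))
print(problems)
```

An empty list means every branch and tag on the old server exists on the new one at the same commit.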

 

Grooveshark is Hiring (Part 2 – Javascript Edition)

19 May

Grooveshark is looking for talented web developers. We are looking to fill a wide variety of web dev positions. The next few posts I make will be job descriptions for each of the major positions we are looking to fill. For part 2, I’m listing our Javascript developer position. If your skillset is more front-end leaning but you feel like you would be a good fit for Grooveshark, by all means apply now rather than waiting for me to post the job description. :)

Grooveshark is looking for a hardcore JavaScript developer

NOTE: Grooveshark is a web application, written with HTML/CSS, JavaScript, PHP, and ActionScript 3, not a web page or collection of webpages. This position involves working directly with the client-side application code in JavaScript: the “backend of the frontend” if you will.

Must be willing to relocate to Gainesville, FL and legally work in the US. Relocation assistance is available.

Responsibilities:
Maintaining existing client-side code, creating new features and improving existing ones
Writing good, clean, fast, secure code on tight deadlines
Ensuring that client architecture is performant without sacrificing maintainability and flexible enough for rapid feature changes
Striking a balance between optimizing client performance versus minimizing load on the backend
Integration of third-party APIs

Desired Qualities:
Enjoy writing high quality, easy to read, self-documenting code
A passion for learning about new technologies and pushing yourself
A deep understanding of writing bug-free code in an event-driven, asynchronous environment
Attention to detail
A high LOC/bug ratio
Able to follow coding standards
Good written and verbal communication skills
Well versed in best practices & security concerns for web development
Ability to work independently and on teams, with little guidance and with occasional micromanagement
More pragmatic than idealistic

Experience:
Extensive JavaScript experience, preferably in the browser environment
Experience with jQuery
Experience with jQueryMX or another MVC framework
HTML & CSS experience, though writing it will not be a primary responsibility
Some PHP experience, though you won’t be required to write it
Knowledge of cross-browser compatibility ‘gotchas’
Experience with EJS, smarty, or other templating systems
Experience with version control software (especially git or another dvcs)

Bonus points for:
Having written a client application (in any language) that relies on a remote server for data storage and retrieval
Having written a non-trivial jQuery plugin
Experience with JavaScript/Flash communication via ExternalInterface
Experience with integrating popular web APIs (OAuth, Facebook, Twitter, Google, etc) into client applications
Experience with ActionScript 3 outside the Flash Professional environment (i.e., non-timelined code, compiling with mxmlc or similar)
Experience developing on the LAMP stack (able to set up a LAMP install with multiple vhosts on your own)
Experience with profiling/debugging tools
Being well read in Software Engineering practices
Useful contributions to the open source community
Fluency in lots of different programming languages
BS or higher in Computer Science or related field
Being more of an ‘evening’ person than a ‘morning’ person
A passion for music and a desire to revolutionize the industry

Who we don’t want:
Architecture astronauts
Trolls
Complete n00bs (apply for internship or enroll in Grooveshark University instead!)
People who want to work a 9-5 job
People who would rather pretend to know everything than actually learn
Religious adherents to The Right Way To Do Software Development
Anyone who would rather use XML over JSON for RPC

Send us:
Resume
Code samples you love
Code samples you hate
Links to projects you’ve worked on
Favorite reading materials on Software Engineering (e.g. books, blogs)
What you love about JavaScript
What you hate about JavaScript (and not just the differences in browser implementations)
Prototypal vs Classical inheritance – what are the differences and how do you feel about each?
If you could change one thing about the way Grooveshark works, what would it be and how would you implement it?

 

If you want a job:  jay at groovesharkdotcom
If you want an internship: [email protected]

 

Grooveshark is Hiring (Part 1 – PHP edition)

16 May

Grooveshark is looking for talented web developers. We are looking to fill a wide variety of web dev positions. The next few posts I make will be job descriptions for each of the major positions we are looking to fill. For part 1, I’m listing our backend PHP position. If your skillset is more front-end leaning but you feel like you would be a good fit for Grooveshark, by all means apply now rather than waiting for me to post the job description. :)

 

Grooveshark is seeking awesome PHP developers.

Must be willing to relocate to Gainesville, FL and legally work in the US. Relocation assistance is available.

Responsibilities:
Maintaining existing backend code & APIs, creating new features and improving existing ones
Writing good, clean, fast, secure code on tight deadlines
Identifying and eliminating bottlenecks
Writing and optimizing queries for high-concurrency workloads in SQL, MongoDB, memcached, etc
Identifying and implementing new technologies and strategies to help us scale to the next level

Desired Qualities:
Enjoy writing high quality, easy to read, self-documenting code
A passion for learning about new technologies and pushing yourself
Attention to detail
A high LOC/bug ratio
Able to follow coding standards
Good written and verbal communication skills
Well versed in best practices & security concerns for web development
Ability to work independently and on teams, with little guidance and with occasional micromanagement
More pragmatic than idealistic

Experience:
Experience developing on the LAMP stack (able to set up a LAMP install with multiple vhosts on your own)
Extensive experience with PHP
Extensive experience with SQL
Some experience with JavaScript, HTML & CSS, though you won’t be required to write them
Some experience with lower level languages such as C/C++
Experience with version control software (especially dvcs)

Bonus points for:
Well read in Software Engineering practices
Experience with a SQL database and optimizing queries for high concurrency on large data sets.
Experience with noSQL databases like MongoDB, Redis, memcached.
Experience with Nginx
Experience creating APIs
Knowledge of Linux internals
Experience working on large scale systems with high volume of traffic
Useful contributions to the open source community
Fluency in lots of different programming languages
Experience with browser compatibility weirdness
Experience with smarty or other templating systems
BS or higher in Computer Science or related field
Experience with Gearman, RabbitMQ, ActiveMQ or some other job distribution/message passing system for distributing work
A passion for music and a desire to revolutionize the industry

Who we don’t want:
Architecture astronauts
Trolls
Complete n00bs (apply for internship or enroll in Grooveshark University instead!)
People who want to work a 9-5 job
People who would rather pretend to know everything than actually learn
Religious adherents to The Right Way To Do Software Development
Anyone who loves SOAP

Send us your:
Resume
Code samples you love
Code samples you hate
Favorite reading materials on Software Engineering (e.g. books, blogs)
Tell us when you would use a framework, and when you would avoid using a framework
ORM: Pros, cons?
Unit testing: pros, cons?
Magic: pros, cons?
When/why would you denormalize?
Thoughts on SOAP vs REST

If you want a job: jay at groovesharkdotcom

If you want an internship: [email protected]

 

Grooveshark Playlists now in MongoDB

06 Mar

As of about 5:30am last night (this morning?) Grooveshark is now using MongoDB to house playlist information.

Until now playlists have lived in MySQL, but there were some big problems that occasionally led to data loss due (mostly) to deadlocks. Needless to say, users don’t like it when you lose their data. Moving to Mongo should resolve all of these issues.

Grooveshark has been using MongoDB for sessions and feed data for a while now, so we are comfortable with the technology and know that it is capable of handling massive amounts of traffic. While it’s certainly not perfect, we are confident that it will be easy to scale out to maintain reliability as our user base continues to grow rapidly.

 
 

Why You Should Always Wrap Your Package

30 Jul

Ok, the title is a bit of a stretch, but it’s a good one isn’t it?

What I really want to talk about is an example of why it’s a good idea to make wrappers for PHP extensions instead of just using them directly.

When Grooveshark started using memcached ever-so-long-ago, with the memcache pecl extension, we decided to create a GMemcache class which extends memcache. Our main reason for doing this was to add some convenience (like having the constructor register all the servers) and to add some features that the extension was missing (like key prefixes). We recently decided that it’s time to move from the stagnant memcache extension to the pecl memcached extension, which is based on libmemcached and supports many nifty features we’ve been longing for, such as:

  • Binary protocol
  • Timeouts in milliseconds, not seconds
  • getByKey
  • CAS
  • Efficient consistent hashing
  • Buffered writes
  • Asynchronous I/O

Normally such a transition would be a nightmare. Our codebase talks to memcached in a million different places. But since we’ve been using a wrapper from day 1, I was able to make a new version of GMemcache that extends memcached and has the same interface as the old one. It handles all the minor differences between how the two work, so all the thousands of other lines in the app that talk to memcached do not have to change. That made the conversion a <1 day project, when it probably would have otherwise been a month-long project. It also has the advantage that if we decide for some reason to go back to using pecl memcache, we only have to revert one file.
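
To make the idea concrete, here is a rough sketch in Python rather than PHP. Everything in it is a hypothetical stand-in: `OldClient` and `NewClient` are toy in-memory classes, not the real pecl extensions, and `CacheV1`/`CacheV2` are not our actual GMemcache code. The two clients take their `set` arguments in different shapes, which is the kind of mismatch a wrapper can absorb:

```python
class OldClient:
    """Hypothetical stand-in for the old extension: set(key, value, flags, expire)."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, flags, expire):
        self.store[key] = value
        return True

    def get(self, key):
        return self.store.get(key)


class NewClient:
    """Hypothetical stand-in for the new extension: set(key, value, expiration)."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, expiration=0):
        self.store[key] = value
        return True

    def get(self, key):
        return self.store.get(key)


class CacheV1:
    """Wrapper around the old client; adds a key prefix and a simple set/get API."""

    def __init__(self, prefix=""):
        self._client = OldClient()
        self._prefix = prefix

    def set(self, key, value, ttl=0):
        # The old client wants a flags argument before the expiry
        return self._client.set(self._prefix + key, value, 0, ttl)

    def get(self, key):
        return self._client.get(self._prefix + key)


class CacheV2:
    """Same public interface as CacheV1, but backed by the new client."""

    def __init__(self, prefix=""):
        self._client = NewClient()
        self._prefix = prefix

    def set(self, key, value, ttl=0):
        return self._client.set(self._prefix + key, value, ttl)

    def get(self, key):
        return self._client.get(self._prefix + key)


def remember_song(cache, song_id, title):
    """Application code: identical no matter which wrapper it is handed."""
    cache.set(f"song:{song_id}", title, ttl=60)
    return cache.get(f"song:{song_id}")


results = [remember_song(cache, 42, "Cowbell Blues") for cache in (CacheV1("gs:"), CacheV2("gs:"))]
print(results)
```

`remember_song` never changes; only the wrapper class that gets constructed does, which is why going back would mean reverting a single file.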

 

Last Year

19 May

It was almost a year ago that I made a post about hitting the 500,000 user mark and talked about some of our anticipated scaling issues.

In the past year we’ve grown to over 3.5 million registered users. In the past 30 days we added nearly 500,000. It’s absolutely incredible to me that it took us several years to reach 500,000, and now we add that many without even blinking.

I won’t rehash all the capacity issues we’ve had lately, but needless to say things have been at least as bad as I worried about a year ago. Partially that is because our server capacity has not grown nearly as quickly as our user base has up until now. Fortunately all that is changing, and we are getting a bunch more servers in, and should be growing our capacity more regularly from this point on. We still have infrastructure work to do to scale horizontally better, but at least we will soon have servers to scale horizontally to, a fairly critical piece of the puzzle, I would say. :)

Oh, and here’s what the office was like a year ago:

Thanks for voting for us at Grooveshark! from ben westermann-clark on Vimeo.

 
 

Technology Stack

06 May

Ever wonder what technology powers Grooveshark?

Well that’s too bad, ’cause I’m going to tell you anyway. Most sites these days run on the LAMP stack (Linux, Apache, MySQL, PHP). Grooveshark runs on the LALMMRSPJG stack, more or less. Don’t try to pronounce that, you’ll only end up hurting yourself.

Linux: (CentOS primarily) for all of our servers except one lone Solaris box, which will be taken out back and shot one of these days, I hope.

Apache: All of our front-end nodes run apache. By front end node I mean everything serving up http traffic except for stream servers. For example, listen.grooveshark.com, www.grooveshark.com, tinysong.com, widgets.grooveshark.com, cowbell.grooveshark.com are all hosted on our front end nodes.

Lighttpd: Affectionately called lighty around here, it’s super efficient at serving up static content, so we use it on all of our stream servers instead of apache.

MySQL: We have several database servers, and they all run MySQL, much to my chagrin. We’d be using PostgreSQL if it had been up to me, but it wasn’t so we stick with MySQL. Now that Drizzle is coming along nicely, we are contemplating eventually moving fully or partially onto Drizzle, meaning our stack would be LALDMRSPJG or LALMDMRSPJG. Not much more pronounceable, I’m afraid.

Memcached: Without memcached, we would certainly not be where we are today. At this point nearly everything runs through memcached, reducing database load significantly and increasing site performance at the same time.

Redis: Redis is a new addition to our stack, but a welcome one. Redis is very similar to memcached in that it’s a key-value store, and it’s almost as fast, but it has the advantage of being disk-backed, so if you have to restart the server, you haven’t lost anything when it comes back up. Where memcached helps us save reads from MySQL, Redis helps us save reads and writes, because we can actually use it to store data that we intend to keep around.

Sphinx: MySQL fulltext indexes are absolutely horrible for search, so instead we use a technology called Sphinx. Sphinx recently got moved off of the front end servers and onto its own server, significantly reducing the load on the front end servers and improving the performance of search. Win-win!

PHP: Most of the code that makes Grooveshark work is written in PHP. That covers all the websites I listed above, including the RPC services. Plenty of people hate PHP out there, including (or especially) those of us who program in it. It definitely has its warts, but it’s a language that is quick to develop in and it performs relatively well if you play to its strengths.

Java: Some of the code running on our servers, especially things that need to maintain state, is written in Java. Things that come to mind are the ad server, and some crazy stuff written in Scala for keeping our stream servers in sync.

Gearman: Gearman is an awesome piece of the puzzle that we’re just starting to harness, and it’s going to help us scale out even more in the future. Gearmand is an extremely lightweight job queuing server with support for synchronous and asynchronous jobs. Workers can live on different servers and be written in different languages from clients. Gearman is great for map/reduce jobs or for allowing things that might be slow to be processed in the background without slowing down the user experience. For example, if our ad server needs to display an ad as quickly as possible *and* it needs to log the fact that it displayed the ad, it can fire off an asynchronous gearman job for the logging and get right to work on serving up the ad. Even if the logging portion is running incredibly slowly, nothing front-facing has to wait on it.
We have a super secret feature launching in about two weeks that would essentially not be possible without Gearman (and Redis). I’ll update in a couple of weeks to explain how Gearman makes it possible, once I can talk about what it is. :)
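
The ad-logging example can be sketched in a few lines of Python. This is only an illustration of the fire-and-forget pattern: it uses an in-process queue and a thread as a stand-in for Gearman, whereas real Gearman workers would live in separate processes or on other servers. The names `serve_ad`, `log_worker` and `ad_log` are made up for the example:

```python
import queue
import threading
import time

jobs = queue.Queue()   # stand-in for the job server
ad_log = []            # stand-in for wherever the log ends up


def log_worker():
    """Background worker: pulls ad ids off the queue and logs them, slowly."""
    while True:
        ad_id = jobs.get()
        if ad_id is None:          # sentinel: shut the worker down
            jobs.task_done()
            break
        time.sleep(0.05)           # pretend logging is slow
        ad_log.append(ad_id)
        jobs.task_done()


def serve_ad(ad_id):
    """Front-facing code: queue the logging job and return right away."""
    jobs.put(ad_id)
    return f"<ad {ad_id}>"


worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

html = [serve_ad(i) for i in range(3)]   # does not wait on the slow logging

# For the demo only: wait for the worker to drain the queue, then stop it
jobs.join()
jobs.put(None)
worker.join()
print(html, ad_log)
```

The list comprehension finishes almost immediately even though the logging takes 50 ms per ad, because `serve_ad` only enqueues the work.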

Please note that this list only includes the backend of the stack. We also have front-end clients written in HTML+JS, Flash+Flex, J2ME, Java, Objective C and some others on the way. It also doesn’t yet include Cassandra, but I’m hoping we can add that soon.

 
 

Bypassing Magic

18 Aug

In my post about how we are adding client-side caching to Grooveshark 2.0, I mentioned one of the ways we are taking advantage of the fact that thanks to using Flash, we have a full-blown stateful application.

As Grooveshark evolves and the application becomes more sophisticated, the PHP layer is more and more becoming just an interface to the DB. The application just needs the data; it knows exactly what to do with it from there. It also only needs to ask for one type of data at a time, whereas a traditional webpage would need to load dozens of different pieces of information at the same time. So for our type of application, magic methods and ORM can really just get in the way when all we really need is to run a query, fill up an array with the results of that query, and return it.

Our old libraries, employing ORM, magic methods and collections, were designed to meet the needs of a typical website and don’t necessarily make sense for a full-fledged application. On a webpage, you might only show 20 results at a time, so the overhead of having a bunch of getters and setters automatically fire whenever you load up your data is probably not noticeable. But in an application, you often load far more results than can be displayed, and allow the user to interact with them more richly. When you’re loading 500, or 5000 results as opposed to 20, the overhead of ORM and magic can start to really bog you down. I first noticed the overhead issue when testing new method calls for lite2, when in some cases fetching the data would take over 30 seconds, triggering my locally defined maximum execution time, even when the data was already cached.

Like any responsible developer considering making changes to code for performance reasons, I profiled our collections code using XDebug and KCachegrind, and then I rewrote the code to bypass collections, magic and all that stuff, loading data from the DB (or memcache) into an array and returning it. The difference? In the worst case, bypassing magic was an order of magnitude less work, and often far better than that. My > 30 second example took less than 1 second in the new code.

For Grooveshark 2.0 code, wherever it makes sense, we are bypassing magic, ORM and collections and loading data directly. This of course means that Grooveshark is faster, but it also means that we can load more data at once. In most cases we can now afford to load up entire lists of songs without having to paginate the results, which in turn means fewer calls to the backend *and* much less work for the database. Whenever you must LIMIT results, you must also ORDER BY the results so they come back in an order that makes sense. Not having to ORDER results means in many cases we save an expensive filesort which often requires a temporary table in MySQL. Returning the full data set also allows the client to do more with the data, like decide how the results should actually be sorted and displayed to the user. But that’s another post…
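
As a rough illustration of "run a query, fill up an array, return it", here is a small Python sketch. It uses an in-memory SQLite database purely so the example is self-contained (our real data lives in MySQL), and the `songs` table, its columns and the `load_songs` function are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row   # lets each row be converted to a plain dict
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, plays INTEGER)")
conn.executemany(
    "INSERT INTO songs (title, plays) VALUES (?, ?)",
    [("Cowbell Blues", 12), ("Aardvark Anthem", 40), ("Binary Ballad", 7)],
)


def load_songs(conn):
    """One query, plain dicts out: no ORM objects, no ORDER BY, no LIMIT."""
    rows = conn.execute("SELECT id, title, plays FROM songs").fetchall()
    return [dict(row) for row in rows]


songs = load_songs(conn)

# The client decides how to sort and display the full result set
by_title = sorted(songs, key=lambda s: s["title"])
by_plays = sorted(songs, key=lambda s: s["plays"], reverse=True)
print([s["title"] for s in by_title])
print([s["title"] for s in by_plays])
```

The database does no sorting at all here; both orderings are produced on the receiving side from the same result set.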

 
 

Client-side Caching

15 Aug

One of the exciting features coming to Grooveshark 2.0 (yes, VIP only at first) starting on the 24th is client-side caching.

What does that mean?
We are finally taking advantage of the fact that because we use Flash, we have a stateful application capable of remembering even dynamic data. In other words, if the client already knows something, it doesn’t have to ask the server again.

For our users this is exciting because navigating back and forth on pages they have already seen should be almost instantaneous. For the backend it’s exciting because the client now acts as another layer of cache in front of the database, and can keep data in memory even when it must be flushed from memcache. Example:
When a user edits their playlist, that data is deleted from memcache (not overwritten due to potential race conditions). The next time that playlist data is requested, it must be loaded from the database. In 2.0, the chances of the client asking for that playlist data again are very slim, because in most cases it will remain in memory, so we should still be saved a round trip to the database in most cases.
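
Here is a toy Python model of that playlist example. It is a sketch under simplifying assumptions: plain dicts stand in for the database, memcache and the client's memory, and the `Backend` and `Client` classes are invented for illustration rather than taken from our code:

```python
class Backend:
    """Server side: a database with a memcache-style cache in front of it."""

    def __init__(self):
        self.db = {"playlist:1": ["song-a", "song-b"]}
        self.cache = {}      # stand-in for memcache
        self.db_reads = 0

    def get_playlist(self, key):
        if key not in self.cache:
            self.db_reads += 1                 # cache miss: go to the database
            self.cache[key] = list(self.db[key])
        return list(self.cache[key])

    def save_playlist(self, key, songs):
        self.db[key] = list(songs)
        self.cache.pop(key, None)              # delete rather than overwrite


class Client:
    """Stateful client that remembers what it has already loaded."""

    def __init__(self, backend):
        self.backend = backend
        self.memory = {}
        self.requests = 0

    def get_playlist(self, key):
        if key not in self.memory:
            self.requests += 1                 # only ask the server when needed
            self.memory[key] = self.backend.get_playlist(key)
        return self.memory[key]

    def edit_playlist(self, key, songs):
        self.memory[key] = list(songs)         # the client already knows the new state
        self.backend.save_playlist(key, songs)


backend = Backend()
client = Client(backend)

client.get_playlist("playlist:1")                                    # first load
client.edit_playlist("playlist:1", ["song-a", "song-b", "song-c"])   # evicts server cache
current = client.get_playlist("playlist:1")                          # served from memory
print(current, client.requests, backend.db_reads)
```

After the edit the server-side cache entry is gone, but the client still answers from its own memory, so neither the request count nor the database read count goes up.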

In the slightly longer term (before Grooveshark 2.0 is available to the public), we will be taking advantage of Flash LSOs to even remember certain data between reloads, so things like browsing your library will be lightning fast and won’t require loading data from the server at all, unless you changed your library from another computer.

Client-side caching is just one of the many ways in which we are working to improve the user experience while hopefully reducing load on our servers, and I will be posting more details over the next few days, so stay tuned.

 
 

2.0 Screenshots

14 Aug

So far the VIP launch has been a resounding success, but as I mentioned previously, the real gravy for VIP users is yet to come: the ability to try out 2.0 first.

The official date that 2.0 becomes available for VIP users is the 24th. Here’s some of what they have to look forward to:

Select from themes
Choose from cool themes. My favorite is Greenify.

New home screen
Here you can see the home screen, redesigned player and brand new sidebar in action. You’ll notice the little dot on the player. Yes, you can click and drag that to skip ahead and back while a song is playing (finally)!

Song list view
Lists of songs are redesigned to be faster and more usable at the same time. No more slidey panels and awkward navigation hierarchies! Also notice that in 2.0 you can favorite artists and albums in addition to songs.