Archive for the ‘grooveshark’ Category

Client-side Caching

15 Aug

One of the exciting features coming to Grooveshark 2.0 (yes, VIP only at first, starting on the 24th) is client-side caching.

What does that mean?
We are finally taking advantage of the fact that, because we use Flash, we have a stateful application capable of remembering even dynamic data. In other words, if the client already knows something, it doesn't have to ask the server again.

For our users this is exciting because navigating back and forth on pages they have already seen should be almost instantaneous. For the backend it’s exciting because the client now acts as another layer of cache in front of the database, and can keep data in memory even when it must be flushed from memcache. Example:
When a user edits their playlist, that data is deleted from memcache (deleted rather than overwritten, to avoid potential race conditions). The next time that playlist data is requested, it must be loaded from the database. In 2.0, the client will rarely need to ask for that playlist again, because in most cases it will still have the data in its own memory, so we are usually still saved a round trip to the database.
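
For the curious, here is a rough sketch of that delete-on-write caching pattern. The function and key names are hypothetical, not our actual code:

function savePlaylist($db, Memcache $memcache, $playlistId, array $songIds)
{
    // Write the new playlist contents to the database first.
    $db->savePlaylistSongs($playlistId, $songIds); // hypothetical DB helper
    // Delete the cached copy rather than overwriting it, so a racing reader
    // can never clobber fresh data with a stale value it fetched earlier.
    $memcache->delete('playlist:' . $playlistId);
}

function getPlaylist($db, Memcache $memcache, $playlistId)
{
    $playlist = $memcache->get('playlist:' . $playlistId);
    if ($playlist === false) {
        // Cache miss: load from the database and repopulate memcache.
        $playlist = $db->loadPlaylist($playlistId); // hypothetical DB helper
        $memcache->set('playlist:' . $playlistId, $playlist);
    }
    return $playlist;
}

In 2.0, the client usually never even makes that second request, because the playlist is still sitting in its own memory.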

In the slightly longer term (before Grooveshark 2.0 is available to the public), we will be taking advantage of Flash LSOs to remember certain data even between reloads, so things like browsing your library will be lightning fast and won't require loading data from the server at all, unless you've changed your library from another computer.

Client-side caching is just one of the many ways in which we are working to improve the user experience while hopefully reducing load on our servers, and I will be posting more details over the next few days, so stay tuned.

 
 

2.0 Screenshots

14 Aug

So far the VIP launch has been a resounding success, but as I mentioned previously, the real gravy for VIP users is yet to come: the ability to try out 2.0 first.

The official date that 2.0 becomes available for VIP users is the 24th. Here’s some of what they have to look forward to:

Select from themes
Choose from cool themes. My favorite is Greenify.

New home screen
Here you can see the home screen, redesigned player and brand new sidebar in action. You’ll notice the little dot on the player. Yes, you can click and drag that to skip ahead and back while a song is playing (finally)!

Song list view
Lists of songs are redesigned to be faster and more usable at the same time. No more slidey panels and awkward navigation hierarchies! Also notice that in 2.0 you can favorite artists and albums in addition to songs.

 
 

VIP live!

12 Aug

Grooveshark has launched VIP subscriptions.

Sign up for $3/month or $30/year.
In exchange you get:
-No ads, ever (and more screen real estate because of it).
-Access to the latest and greatest features before anyone else.
-More to be announced later.
-Locked in at the low rate: if the price goes up, your cost doesn't.
-A badge next to your name that says you're VIP.

On the 24th we will be launching a beta version of 2.0 to our VIP users, and trust me, that’s where the real value lies.

Coverage:
Grooveshark Launches Subscription VIP Service


 
 

You don’t know when you’re done if you don’t know what done is

08 Aug

As I mentioned in an earlier blog post, we are working hard on a 2.0 version of the product. One of the questions our team is asked quite frequently is "when will it be ready?" This question is impossible for us to answer, because "ready" is not defined. It's one of the dangers of working without a real spec.

There are a lot of reasons why we don't use a real spec, none of which are up to me, so I won't discuss them here.

Whenever 2.0 gets close to what everyone thinks we have agreed on, people poke at it, decide they don't like things, or realize that nobody ever asked for critical feature X, or that it somehow never made it onto our bug list, and then we have to go back and get new designs, file a bunch of bugs, and set a new milestone. Repeat, repeat, repeat. It's not just 2.0 that works this way; it's anything you build without a complete spec. When the goal posts for "done" are constantly moving, the question "when will it be done?" is really the question "when will the goal posts stop moving?"

To break the cycle, we’re picking a done date, and mandating that the goal posts stop moving some time before that date. Working towards that, we’ve submitted our “last chance” milestone, meaning after this milestone any decisions/changes/designs must be final, because we’re going to call it done when those have been implemented, or when we hit our chosen date, whichever comes first.

 

Coming soon to Grooveshark

25 Jul

We've been very hard at work here at Grooveshark on version 2.0, *and* on something new: a way for some users to use the site without ever seeing ads, plus the opportunity to try version 2.0 before anyone else.

I can't say exactly when that will be, but expect more updates soon.

 
 

500,000 Users and scaling

29 May

Grooveshark surpassed the 500,000 registered user mark today.
Even ignoring the fact that many of our users never bother to register (registration isn't necessary to use the site), 500k is an absolutely phenomenal number, especially compared to where we were just a year ago: 33k. The scary thing is that at our current growth rate, we will have over a million registered users in roughly 3 months.

Can we double our capacity in just 3 months? History implies that it's possible; we've already done much more than that. In fact we've done better: with little change in infrastructure and much of the same server capacity, we've managed to make Grooveshark faster and scale at the same time.

On the other hand, much of the low-hanging scalability fruit has been picked now. We use memcached extensively, use a master/slave DB configuration with a data warehouse for logging or writes that don’t need to be processed in real time, and have begun doing some rudimentary sharding for stream-related activities.
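
For illustration only, here is roughly what that read/write split looks like in code. The class and method names are invented, not our actual code:

// Hypothetical sketch of splitting queries across a master/slave pair.
class DbPool
{
    private $master; // all writes, and reads that must be up to the second
    private $slave;  // everything else

    public function __construct(PDO $master, PDO $slave)
    {
        $this->master = $master;
        $this->slave  = $slave;
    }

    public function write($sql, array $params = array())
    {
        $stmt = $this->master->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }

    public function read($sql, array $params = array())
    {
        $stmt = $this->slave->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }
}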

What's left? Well, we aren't yet at the point where we can scale linearly simply by adding more servers, except probably for streaming servers. For that we primarily need more sharding. There are still some SQL optimizations that can be made, like bringing session IDs down to 16 bytes from 32 (32 on disk and 96 in memory, thanks to utf8) and ultimately getting them out of the database altogether, and using memcached even more heavily, but really all of those things only buy us time. Not that there is anything wrong with buying time: we also need time to work on new features like last.fm scrobbling, a super-secret redesign, and launching on half a dozen mobile platforms, all with a relatively small dev team. But ultimately some fundamental architecture changes are coming, and if we're going to keep doubling our number of users every 3 months, they're going to have to come very soon.
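
To make the session-ID optimization concrete, here is a hedged sketch (column and table names are hypothetical, and it assumes the default 32-character hex session IDs): the hex ID gets packed into 16 raw bytes so the column can be BINARY(16) instead of a utf8 CHAR(32).

// Hypothetical sketch: shrink the stored session ID from CHAR(32) utf8 to BINARY(16).
function packSessionId($hexId)
{
    // 32 hex characters -> 16 raw bytes
    return pack('H*', $hexId);
}

function unpackSessionId($binaryId)
{
    // 16 raw bytes -> 32 hex characters
    $unpacked = unpack('H*hex', $binaryId);
    return $unpacked['hex'];
}

// Hypothetical usage against a Sessions table with a BINARY(16) SessionID column:
// $stmt = $pdo->prepare('SELECT Data FROM Sessions WHERE SessionID = ?');
// $stmt->execute(array(packSessionId(session_id())));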

Update: A lot of people have been looking at this post as evidence that we are working on scrobbling support. I should point out that scrobbling support now exists for VIP users: Enable it here.

 
 

Microsoft + SeeqPod?

09 May

There's a rumor floating around that Microsoft has bought SeeqPod, fueled mainly, it seems, by the fact that they have a link to Microsoft Live Search on their home page.

I may regret saying this, but I think that link is a red herring. Microsoft is the last company I would expect to have an interest in SeeqPod, unless their search technology is incredibly impressive and Microsoft intends to apply it to other forms of search. A possibility, but it seems pretty slim. Besides being a bad fit in terms of corporate culture, SeeqPod is probably under an NDA and would most likely be in big trouble for leaking that sort of information early.

If Microsoft is buying SeeqPod for their search technology, don't expect to see the free streaming service re-launched after the acquisition, at least not until Microsoft has signed deals with the majors, which as we know is a lengthy and expensive process. Of course Microsoft can afford it, but can they profit from it?

In the meantime, Grooveshark is still running, still growing, and we have an API as well, for all those developers left out in the cold after SeeqPod shut down.

 
 

Jay does some front-end work

20 Apr

It's no secret that I am deeply entrenched in the back-end world. I can optimize the hell out of some queries and write some pretty complex PHP, but when it comes to HTML, CSS, and JavaScript, I am a barbarian. In fact, I stopped learning about HTML in the '90s, before CSS and back when JavaScript was pretty useless. I just don't care for working on visual stuff, especially if it's going to be finicky and inconsistent on different platforms.

Anyway, a longstanding complaint I've had when using Tinysong is that I can't listen to a song before I share it. How do I make sure it's the song I'm thinking of? Sure, I could copy the URL, paste it into the location bar, and load up Lite, but by then my other song options are gone, not to mention that I'm lazy.

I finally decided to take a stab at adding playback to Tinysong myself, and everything seems to be working quite well. I can't take credit for the whole thing, or even most of it: Katy wrote the streaming code and Chanel made some beautiful JavaScript wrappers for it, both for other projects. I simply took what they did and wrote the CSS and JavaScript to glue it to Tinysong. I hope I'm not the only one who appreciates this enhancement. Give it a try and let me know what you think.

 
 

Detect crawlers with PHP faster

08 Apr

At Grooveshark we use DB-based PHP sessions so they can be accessed across multiple front-end nodes. As you would expect, the sessions table is very "hot," as just about every request to do anything, ever, requires a session. We noticed that web crawlers like Google end up creating tens of thousands of sessions every day, because of course they do not carry cookies around with them.

The solution? Add a way to detect crawlers, and don’t give them sessions. Most of the solutions I’ve seen online look something like this:

function crawlerDetect($USER_AGENT)
{
    $crawlers = array(
        array('Google', 'Google'),
        array('msnbot', 'MSN'),
        array('Rambler', 'Rambler'),
        array('Yahoo', 'Yahoo'),
        array('AbachoBOT', 'AbachoBOT'),
        array('accoona', 'Accoona'),
        array('AcoiRobot', 'AcoiRobot'),
        array('ASPSeek', 'ASPSeek'),
        array('CrocCrawler', 'CrocCrawler'),
        array('Dumbot', 'Dumbot'),
        array('FAST-WebCrawler', 'FAST-WebCrawler'),
        array('GeonaBot', 'GeonaBot'),
        array('Gigabot', 'Gigabot'),
        array('Lycos', 'Lycos spider'),
        array('MSRBOT', 'MSRBOT'),
        array('Scooter', 'Altavista robot'),
        array('AltaVista', 'Altavista robot'),
        array('IDBot', 'ID-Search Bot'),
        array('eStyle', 'eStyle Bot'),
        array('Scrubby', 'Scrubby robot')
    );
    // Case-insensitive substring search against each known crawler, one at a time.
    foreach ($crawlers as $c) {
        if (stristr($USER_AGENT, $c[0])) {
            return $c[1];
        }
    }
    return false;
}

Essentially, it loops over the entire list of possible clients and searches the user agent string for each one, one at a time. That seems way too slow and inefficient for something that has to run on essentially every request to a high-volume website, so I rewrote it to look like this:
public static function getIsCrawler($userAgent)
{
    $crawlers = 'Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|' .
        'AcoiRobot|ASPSeek|CrocCrawler|Dumbot|FAST-WebCrawler|' .
        'GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';
    // One case-insensitive regex pass instead of one stristr() call per crawler.
    $isCrawler = (preg_match("/$crawlers/i", $userAgent) > 0);
    return $isCrawler;
}

In my not-very-scientific testing on my local box, my version takes 11 seconds to do 1 million comparisons, whereas looping through the array of crawlers takes 70 seconds for the same million. So there you have it: using a single regex for string matching rather than looping over an array can be roughly 7 times faster. I suspect, but have not tested, that the performance gap gets bigger the more strings you are testing against.
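
And for completeness, a rough sketch of how such a check might be wired into a session bootstrap. The class name and surrounding code are hypothetical, not our actual setup:

// Hypothetical bootstrap: crawlers never trigger a write to the hot sessions table.
// UserAgentInfo is a stand-in class name for wherever getIsCrawler() lives.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (!UserAgentInfo::getIsCrawler($userAgent)) {
    // Real users get their DB-backed session as usual.
    session_start();
}
// Crawlers fall through and are served statelessly, with no session row created.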

 

A reason to upgrade

24 Mar

I use Flash Player 9 at work, and until recently I also used it at home. Version 10 is out, but most people still have 9, so for testing purposes I wanted to stick with 9, especially since Katy has switched to 10. Someone has to catch old bugs!

Anyway, my Flash Player accidentally got upgraded to 10 at home, and I recently discovered this delicious Tom Waits song: Invitation to the Blues.

Alas, when I tried to play it at work, it sounded like a very sad, drunken donkey was wailing at me. That is usually indicative of a sample rate issue (Flash is very picky about sample rates), but close inspection of the file hasn’t turned up anything wrong with it, including sample rate. If I want to listen to that song at work, I’ll have to upgrade to Flash 10.

Tom Waits, or reliability? Hmm, a tough choice indeed.