RSS
 

Denormalizing MySQL

05 Mar

Some of you may have noticed that Grooveshark is a bit sluggish most of the time and an be extremely slow when doing certain things (searching, loading the discover page, looking at large libraries, etc.)

The biggest source of this slowdown comes from three major, related factors:

  1. We need to check whether each song is offline before displaying it.
  2. There are several different ways a song can be off- or online.
  3. Our schema is highly normalized.

The upshot of all this is that our queries to get song info, including online status of a song, require LEFT JOINing across multiple tables, onf of which a the time of this writing has over 10 million rows, another about 8 million.

As part of the major backend overhaul currently in process code named file/song, we have decided to de-normalize information about online status so that it is contained within the Songs table and updated by triggers.

The trick to keeping this fast for inserts and updates where these triggers are fired is to keep a running total for the number of files that are online for any given song, in a field called NumOnline, rather than re-calculating that value for every insert/update. To start out, we populate NumOnline for every single row in the Songs table by using a huge nasty query that aggregates the online statuses using LEFT JOINs and all that fun stuff that currently has to happen every time we want to display a song.

It takes a while to fill in that information (about 6 minutes) but once it’s there, all the hard work has been done. When a file goes on- or offline, we merely need to increment or decrement NumOnline – we can safely skip doing the long counts of everything as long as we make sure to handle every action that might cause that number to change.

That’s not quite as easy as it sounds, of course. Let’s look at the list of ways that number can change:

  • File is added to or removed from a user’s library.
  • User goes offline or comes online
  • User connects/disconnects external drive
  • File is added to the cache
  • Two songs are merged by the editing system

Each of these cases is a different update to a different table or field and we must be certain to handle each one correctly, but if we do that we gain the major advantage of being able to eliminate 3-4 LEFT JOINs from our Song-related queries and these SELECTS are blazing fast again. Definitely worth it.

 
2 Comments

Posted in SQL

 
  1. chanel

    March 5, 2008 at 1:36 pm

    ya, you database guys are always slowing down the site! in all seriousness just looking through all of those sql queries really makes me glad I don’t have to worry about all that stuff.

     
  2. Jay

    March 6, 2008 at 1:19 am

    Ha, well I’m equally glad I don’t have to mess with all that Javascript stuff!