SQL Performance

25 Jan

Skyler’s post about triggers briefly mentioned denormalization to increase SQL performance. Denormalization is just one of the many techniques we have had to resort to in order to keep SQL performance up to snuff.

The funny thing about SQL is that queries perform vastly differently depending on the size of your dataset. Often times the performance is completely non-linear. For example, a query that is blazingly fast when working on 4 million rows of data, might be unbelievably slow when working on 6 million rows. Paradoxically, you might also have a query that is slower than the first query on 4 million rows, but faster than that one on 6 million rows.

From the time I started working at Grooveshark just a couple of months ago until very recently, the number of users and songs on our system has grown fairly steadily, but the performance of our site due to SQL overhead became exponentially worse to the point where some pages were suddenly taking 30 seconds to load. The strange part was that it seemed to happen virtually overnight.

Part of the problem is that we all work on a development copy of the database when we’re experimenting with queries so’s not to risk breaking anything on production, but since our dev box has much more modest specifications than our production server, we work with a much smaller data set, so we don’t always get a good picture of how our queries are going to perform “out in the wild.”

Methods we have used to tweak the performance of our queries range from the aforementioned denormalization techniques to writing and rewriting the same queries over and over again until we find something that works (it’s often hard to predict how queries will perform ahead of time, and mySQL’s EXPLAIN often leaves much to be desired), to making sure proper indexes exist for everything and that they are being used properly (sometimes mySQL guesses wrong about which index is best) to switching certain tables to InnoDB to prevent table deadlock issues that started popping up.

It took us about a week to get everything sorted out and optimized, but it really paid off. I just opened up the page that had been taking 30 seconds to load, and it took 6 seconds, about .5 of which can be attributed to php + mySQL execution. That is a very impressive increase in performance for running on the exact same hardware with no load balancing to speak of. There are certainly more performance improvements to be had and we will soon be making significant improvements to the way our PHP utilizes mySQL. Even so, I’m sure this isn’t the last speed bump we will run into, but at least now we have a pretty good idea of what we need to do to get things running smoothly again.

1 Comment

Posted in SQL


Leave a Reply


  1. Quang

    January 28, 2008 at 2:38 am

    Ah, sexy blog indeed! I’ll be sure to add the comment feature to my tumblr so you can express your hatred for tumblr there :)

    Random observation: your 2 RSS icons give your right column rabbit ears.