Optimize for concurrency, not throughput

06 Feb

When it comes to disks, RAID and otherwise, there's a lot of confusion about how to optimize for different server configurations. Unfortunately, most of the emphasis is placed on sustained throughput, with little mention of IOPS or concurrency.

Here at Grooveshark we use Sun's X4500, aka "Thumper," for housing our MP3s. Lately we've been hitting some buffering issues during peak hours, where it seemed like the X4500 was not able to keep up with the requests coming in. Although we are growing rapidly, we should still be nowhere near saturating the IOPS of 48 drives, but iostat -Cx was showing that the drives dedicated to serving up content had transactions waiting 100% of the time, and the disks were busy 100% of the time. Service times were in the triple digits. Insanity. Something was obviously misconfigured.
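As a rough sanity check (the per-drive figure here is an illustrative ballpark for a 7,200 RPM SATA disk, not a measurement from our hardware), the raw random-read capacity of 48 spindles should be well above anything our request rate demands:

```python
# Back-of-envelope check: theoretical random-read IOPS across all spindles.
# IOPS_PER_DRIVE is an assumed ballpark, not a measured value.
IOPS_PER_DRIVE = 100   # rough figure for one 7,200 RPM SATA disk
DRIVES = 48

theoretical_iops = IOPS_PER_DRIVE * DRIVES
print(theoretical_iops)  # 4800 random reads/sec if every spindle seeks independently
```

If the workload were actually using all 48 drives independently, we would need thousands of concurrent random reads per second to saturate them, which is why the 100% busy numbers pointed to a configuration problem rather than raw capacity.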

We were using 3 zpools for our content, each configured in raidz with about 8 drives (if I remember correctly). Ok, so we actually only have the IOPS of 24 drives in that configuration, but we still should not be anywhere near saturating that. After a fair amount of digging, I discovered that raidz is probably the worst configuration possible for serving up thousands of 3-5MB files concurrently. That's because raidz causes every drive to engage in every read request, no matter how small the file. That means your theoretical transfer rate for any given file is N times the transfer rate of each drive, but your IOPS are the same as one drive's, no matter how many drives you have in your pool. In other words, adding more drives to a raidz increases disk transfer bandwidth, but does nothing to alleviate the overhead associated with random access seeks. This configuration is ideal if you are transferring multi-gigabyte files frequently, but at 3-5MB, seek time makes up almost all of the overhead, and it would be much better to have N drives all seeking to N files simultaneously.
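The trade-off above can be sketched as a toy model (both per-drive figures are assumed ballparks, not measurements): raidz scales bandwidth with spindle count but leaves IOPS stuck at one drive's worth, while independently seeking drives scale both.

```python
# Toy model of raidz vs. independent spindles for small-file reads.
# Per-drive numbers are illustrative assumptions.
IOPS_PER_DRIVE = 100   # random reads/sec for one 7,200 RPM SATA disk
MBPS_PER_DRIVE = 60    # sustained MB/s for one disk

def raidz_pool(n_drives):
    """Every read engages every drive: bandwidth scales, IOPS do not."""
    return {"iops": IOPS_PER_DRIVE,
            "throughput_mb_s": MBPS_PER_DRIVE * n_drives}

def independent_drives(n_drives):
    """Each drive seeks to a different file: IOPS scale with spindle count."""
    return {"iops": IOPS_PER_DRIVE * n_drives,
            "throughput_mb_s": MBPS_PER_DRIVE * n_drives}

print(raidz_pool(8))          # {'iops': 100, 'throughput_mb_s': 480}
print(independent_drives(8))  # {'iops': 800, 'throughput_mb_s': 480}
```

For multi-gigabyte streaming reads the two look identical; for thousands of concurrent 3-5MB requests, the 8x difference in IOPS is the whole story.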

I brought up these concerns to our sysadmins, and they set up a small mirrored pool to handle our most popular content. The performance difference is quite astounding, even with just two disks in that pool. In a mirrored configuration like this, each drive can respond to read requests separately, because each has the full set of data. Adding 2 more drives to the 24 already in production nearly doubled our IOPS capacity: the 3 raidz pools give us the IOPS of 3 drives, and the mirror brought us to the IOPS of 5. Our sysadmins will be adding more disks to the mirror to give us even more breathing room. We'll have to wait and see tomorrow whether the added IOPS capacity eliminates the buffering issues users have been running into lately, but I bet it will.
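The "3 drives to 5" arithmetic works out like this (again assuming an illustrative per-drive IOPS figure): each raidz pool reads like a single drive, while each side of a mirror can serve reads on its own.

```python
# Why two extra disks nearly doubled read IOPS capacity.
# IOPS_PER_DRIVE is an assumed ballpark, not a measured value.
IOPS_PER_DRIVE = 100

raidz_pools = 3     # each pool contributes the read IOPS of just one drive
mirror_drives = 2   # each mirror side serves reads independently

before = raidz_pools * IOPS_PER_DRIVE
after = before + mirror_drives * IOPS_PER_DRIVE
print(before, after)  # 300 500 -- two disks took us from 3 drives' IOPS to 5
```

Each additional disk added to the mirror contributes another full drive's worth of read IOPS, which is why widening the mirror is the cheap way to buy headroom here.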
