RSS
 

Archive for March, 2008

Let me drive

12 Mar

One odd quirk that I have noticed about myself as I have been building a reputation as an SQL guru at Grooveshark, and therefore being regularly pulled aside to look at queries, is that I have a hard time thinking about an SQL query when I’m just looking at it over someone’s shoulder. It’s even harder for me to think about how I would change the query.

For some reason, I need to be in the proverbial driver’s seat. Let me sit down in front of the screen, give me a gui text editor that I can use easily (vim does not qualify), and my brain is prepared to evaluate the problem. I may not even need to type anything out in the process of solving the problem, but the brain juice just won’t even start to flow if I don’t have that.

I seem to have that problem much less when looking at PHP (although I prefer to look at the code in my IDE of choice), and I’m not entirely sure why that is. The only thing I can think of is that maybe SQL requires a higher level of abstract thought than PHP does most of the time, so I am more dependent on having the right set up before I can get into the right mode?

Do any other coders out there have this problem?

‡ ‘an’ is the correct usage here because I expect you to read that as S Q L, not Sequel.

 
1 Comment

Posted in Coding

 

Hot Cache

11 Mar

One of the worst things about restarting a MySQL server is losing the cache. Your whole site might be slow for hours and hours before MySQL has a good cache and is loading data from memory instead of disc. How can you quickly get your caches hot, or at least lukewarm? Well, if you know which tables are most important, you can ask MySQL to load them up. How? By taking advantage of MySQL’s terrible query optimizer:
SELECT 1 FROM (SELECT * FROM ImportantTable) x LIMIT 1;
Any intelligent or sane optimizer would look at that query and at the very lease decide that it only needs the first row from ImportantTable, or preferably decide that it doesn’t need to select anything from ImportantTable at all. But MySQL’s query optimizer is neither intelligent nor apparently very sane, so it actually reads every row from ImportantTable, and hopefully sticks it in cache.

In our case we have a table with 10 million rows that nearly everything runs through. The explain shows the results:

EXPLAIN SELECT 1 FROM (SELECT * FROM UsersSongs) x LIMIT 1;
+----+-------------+------------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10111671 | |
| 2 | DERIVED | UsersSongs | ALL | NULL | NULL | NULL | NULL | 10149257 | |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------+

Note: if you wanted to be even more slick, you could craft your inner SELECT to load indexes into memory, which are even more helpful than having actual rows in memory, of course. So for example:
SELECT 1 FROM (SELECT UserID, SongID FROM UsersSongs) x LIMIT 1; assuming UsersSongs has a multi-column index consisting of UserID and SongID, of course.

 
1 Comment

Posted in SQL

 

It’s all about branding

10 Mar

Nearly every successful company, certainly every incredibly successful company, has a branding strategy. Each of these companies tries very hard to associate themselves with one product, one idea, one message. Everything else they do is extra.

For Starbucks, it’s quality coffee, and the feeling of belonging (being the ‘third place’ as they call it). Baristas are supposed to make an effort to learn your name, know your drink, and even chat you up. They also build that sense of belonging by having their own lingo. If you go to a ‘real’ coffee shop after patronizing Starbucks for a while, you will probably be ostracized by the employees there; they’ll recognize you as a Starbucks person as soon as you place your order, and they might even correct you so that you understand that “what Starbucks calls a macchiato isn’t a real macchiato.” Are you going to go back there next time, or are you going to return to Starbucks, where they understand you and are nice to you? Starbucks wants so much to be associated with quality coffee that they closed all of their stores for two hours nationwide, to re-train all of their employees. If you think that was anything other than a PR stunt, think again. It would have been not only easier but certainly less expensive in terms of lost business for them to offer a rotation of training sessions so some employees could always be working while others were receiving training, but the press and image that Starbucks gained from that stunt more than paid for itself.

For Apple, they try hard to associate themselves with being hip, smart and having intuitive user interfaces. Their tactics are, in my opinion, more obvious than Starbucks’ because you can tell that they are trying hard to be hip and look smart just by watching their commercials — but it still works. By making their gadgets sleek and not clunky, some would even say ‘sexy,’ and repeating the word intuitive enough times, users have bought the idea that Apple’s user interface designs are intuitive to use, even when they are not. The interface may have a shallow learning curve, but that’s not the same thing as being intuitive — but it doesn’t matter.

For YouTube, they have obviously become synonymous with online video sharing. They aren’t even completely legal, but they make video sharing incredibly easy, and it’s obvious that that’s where they focus most of their energy, because the rest of YouTube is pretty crappy, but it doesn’t matter because when people think of online video, they think of YouTube, and that’s exactly where they need to be. The same applies for Google and search, Flickr and photo sharing, etc.

At Grooveshark, we obviously need to become synonymous with online music. We want to be your one-stop-shop for everything music related, and when you think music we want you to think of Grooveshark. The first hurdle we face is the name: it’s not very memorable. When I first heard of Grooveshark, I later tried to look it up as “Media shark” and then when that didn’t work, I tried “Music shark.” The next hurdle is the competition: there are dozens of other sites with more resources than us trying very hard to become synonymous with music, too. I believe we still have a chance, because we have a stellar team and a growing resource pool, but we need to stay on track and focus on the music before we get carried away with the “extras” such as the really social-networking heavy stuff. We have to remember our brand and concentrate our effort on that part of the site first and foremost. If we do that, we’ll make it.

This long lecture is my roundabout way of announcing that I have changed the url of my blog from http://wanderr.com/grooveshark/ to http://wanderr.com/jay/. This is of little consequence to most of you because I have my server set up to automatically 301 redirect requests from the old URL to the new one for the foreseeable future. As you should be able to guess by now, I’m doing this for the sake of branding. The insights and ramblings that I share here are more for the Jay Paroline:developer brand than the Grooveshark:online music brand. I started with wanderr.com/grooveshark/ in the hopes that I could coerce my colleagues to all use that as a blogging platform, but of course we are all too independently minded for that, so we all have our own blogs. I think that actually works out better because now we can compete to see who gets the most blog hits. Right now I’m in 2nd place because Travis has been blogging much longer and he writes about Vim, one of the least intuitive tools ever created (too bad it’s so darn useful!), so there’s plenty to write about and plenty of people searching for help with the thing, and quite honestly his blog is a valuable resource in that regard. Nevertheless, I am catching up!

In closing: Now would be a great time to update your bookmarks. The longer you wait, the more likely you are to forget; Especially if you are as forgetful as I am. Losing your memory is not a fun experience, trust me.

 
 

Real-time HTML Editor

07 Mar

Have you ever wanted to try out a little piece of HTML without having to go through all the trouble of creating a file, saving it, and then opening that file?

Way back in the day certain browsers used to allow you to type in HTML in the URL field like about:this text should be bold, but I don’t think that works with any modern browsers.

Fortunately there is something that works even better: the Real-time HTML Editor. As you type HTML into the top pane, it displays in the lower pane.

I found a great use for it when showing Katy an embeddable widget from another site. We couldn’t see what the embeddable widget actually looked like without embedding it somewhere, but we didn’t actually want to embed it anywhere, so I went to the real-time editor and pasted in the code. Very handy.

 
 

Airborne Settles Lawsuit

07 Mar

PSA: Airborne has settled a $23 million class-action lawsuit. If you have bought Airborne any time before 2008 you qualify. You don’t have to have receipts, just a general idea of when you bought it and which version of the product you purchased. If you purchased more than 6 and wish to be reimbursed for each of them, you’ll have to provide receipts.

It just so happens that Grooveshark has stocked the cupboards with Airborne several times when we were all getting sick. It’s probably enough to be worth someone’s time here to file the claim.

 
 

Denormalizing MySQL

05 Mar

Some of you may have noticed that Grooveshark is a bit sluggish most of the time and an be extremely slow when doing certain things (searching, loading the discover page, looking at large libraries, etc.)

The biggest source of this slowdown comes from three major, related factors:

  1. We need to check whether each song is offline before displaying it.
  2. There are several different ways a song can be off- or online.
  3. Our schema is highly normalized.

The upshot of all this is that our queries to get song info, including online status of a song, require LEFT JOINing across multiple tables, onf of which a the time of this writing has over 10 million rows, another about 8 million.

As part of the major backend overhaul currently in process code named file/song, we have decided to de-normalize information about online status so that it is contained within the Songs table and updated by triggers.

The trick to keeping this fast for inserts and updates where these triggers are fired is to keep a running total for the number of files that are online for any given song, in a field called NumOnline, rather than re-calculating that value for every insert/update. To start out, we populate NumOnline for every single row in the Songs table by using a huge nasty query that aggregates the online statuses using LEFT JOINs and all that fun stuff that currently has to happen every time we want to display a song.

It takes a while to fill in that information (about 6 minutes) but once it’s there, all the hard work has been done. When a file goes on- or offline, we merely need to increment or decrement NumOnline – we can safely skip doing the long counts of everything as long as we make sure to handle every action that might cause that number to change.

That’s not quite as easy as it sounds, of course. Let’s look at the list of ways that number can change:

  • File is added to or removed from a user’s library.
  • User goes offline or comes online
  • User connects/disconnects external drive
  • File is added to the cache
  • Two songs are merged by the editing system

Each of these cases is a different update to a different table or field and we must be certain to handle each one correctly, but if we do that we gain the major advantage of being able to eliminate 3-4 LEFT JOINs from our Song-related queries and these SELECTS are blazing fast again. Definitely worth it.

 
2 Comments

Posted in SQL

 

New and improved Sharkbyte

05 Mar

Apparently I was not the only one who was very unhappy with the way Sharkbyte handles updates, so the Sharkbyte team has implemented a way to make sure that users always have the latest version.

It’s not perfect, but it’s a very good start.

The complaints I still have with it are relatively minor compared to silent failure, but will hopefully be addressed soon anyway:

  1. On my machine the alert dialog popped under my open windows, so I didn’t see it for a couple of hours.
  2. When the user does finally see the dialog, it’s not actually clear that it came from Sharkbyte.

I actually prefer that point 1 not interrupt my work by popping up a dialog on top of my other windows, but it would be nice if it popped up if I was trying to listen to music with an outdated version of Sharkbyte, or if it at least flashed in the taskbar to get my attention.

Complaints aside, it seems like everyone at Grooveshark is coming to realize that the user experience is paramount and that is something that I am very pleased about.

To paraphrase Skyler (since I don’t remember exactly what he said) today:

We have to remember the user; we might spend all day wrestling with SQL queries, but we have to remember that’s not what Grooveshark is about. We have to remember the users and make sure that we are making the site better for them with everything that we are doing.

Apologies to Skyler if I completely got that quote wrong. Feel free to correct me.

 
1 Comment

Posted in management

 

Search is better

04 Mar

We at Grooveshark are right in the middle of a major overhaul of the backend. As such we are all pretty loathe to work on the current trunk right now. But when things need fixing, things need fixing. That’s why Travis and I recently spent a couple of days getting search working better.

As it stands now, search results make a lot more sense than they used to, and they return more results. There was a bug relating to how insanely complicated it is to tell whether a song is actually online (more about that tomorrow), and we weren’t performing all of the checks correctly, so sometimes we were hiding a song that was not actually offline.

We had also apparently temporarily forgotten that ORDER BY field orders ascending, as in lowest to highest, so when we ordered by score, we were taking the worst results, rather than the best. No wonder our results were just downright silly sometimes.

Finally, we have completely eliminated stopwords from our fulltext indexes, and we have set our minimum word length to 1, so if you search for U2, that won’t be too short, and if you search for A Static Lullaby, the A will not be ignored.

Things are still not remotely perfect; those extra checks we weren’t doing make search slower, as do the bigger fulltext indexes; sometimes it takes 30 seconds to come back with anything, and sometimes results still don’t make the most sense ever because of the way we build our search tables. These things will all be fixed too, I promise. In the meantime, be patient and enjoy your more-correct results.

P.S. Priority #1 after file/song is done, is overhauling search completely so that it is both fast and accurate. Stay tuned.

 
 

How to keep your users happy

01 Mar

As a sort-of follow-up to the negative example how not to keep your users happy, I’d like to provide what I feel is a positive example of how to keep your users happy.

Before I delve into today’s example, if you haven’t read Joel Spelosky’s seven steps to remarkable customer service I command that you do so now. I command it! Back? Ok, good, now you should understand why good customer service is so very important.

Today, Friday night I received an alert that a customer had a problem adding funds to their account through PayPal. It’s worth noting that I did not receive a complaint from the user. Users generally do not complain about problems with a web site; they just leave. I set up the PayPal system so that all developers would receive an email when there is an error, so that someone, anyone, whoever is in the office, will know to look into it or at least call someone who can, as soon as possible. If a user wants to give us money, gosh darn it, they’d better be able to. So when I received this alert, I ran a test charge through PayPal and was able to add funds to my own account. I looked up the error message and found no meaningful explanation for the cause. Other developers had posed on the paypal forums about this error message with no resolution to their problems.

At this point I had two problems: I needed more information to make sure the problem was resolved on our end so this doesn’t happen to other users, and I needed to make sure that we were not losing a customer as a result of this error, so I looked up his information and emailed him. I apologized for the problems he was having, admitted up front that I had never seen such an error before and asked if he could give me any additional information about what might have caused the problem. He wrote back promptly and explained that he had forgotten his paypal password and had made a couple of failed login attempts before requesting a password reset from PayPal. He finally got that straightened out and authorized the payment through PayPal when we got the error. I thanked him for the explanation, promised that we would be in contact with PayPal to resolve the issue and make sure that it doesn’t happen again, and added $10 (the amount he originally tried to PayPal) to his Grooveshark account. I suggested that if he wanted to help further diagnose the problem, he could try adding funds via PayPal again, now that he knows his password. He did so, and it worked like a charm.

Now instead of a disgruntled user who couldn’t buy songs from us and probably wouldn’t have bothered trying again, we have a happy customer with $20 to spend at Grooveshark, we received the same revenue that we would have if the original payment had gone through, and we got the information we needed to work with PayPal to get this issue resolved (hopefully). Everybody wins. Now yes we will have to pay royalties on all $20 of the dollars that he spends on music, plus the portion that goes to other users who he is downloading the music from, but compared to the cost of getting new customers, that’s actually a very good deal for us.