As you can see, I’m too busy to write down everything that’s floating in my head this week. Hidden on this blog somewhere is a cryptic clue as to why that is.
In the meantime, keep busy with this great rant from Chanel: I Want My Music.
I noticed that someone found my blog by searching for “grooveshark peer offline” recently. Apparently that error message is not clear enough, so I’ll try to explain some of its causes. This may not be an all-inclusive list, because I’m not directly involved with a lot of the things that can trigger this problem, but common causes include:
We showed the file as being online even though it wasn’t really (that shouldn’t be happening anymore; it’s the only part I have control over).
The peer said it had the file, but when we tried to access it, the peer said it didn’t have the file.
The peer was flagged as being online, but when we tried to connect to it via the supernode it’s supposed to be connected to, the supernode said that the peer was not connected.
The peer refused the connection (usually happens when people have firewalls and such turned on).
The NAT type of the peer was not compatible with your NAT type. Unfortunately because of the style of communication used by the clients, it’s possible for two peers to both be online but unable to communicate with each other due to incompatible NAT types. There’s no easy fix for this, but eventually we will enhance the site so that users with incompatible NAT types show up as being offline.
If you are getting lots of “peer offline” error messages, then there might be a more serious problem. Contact me and I will get you in touch with the people who can diagnose and solve your problem.
Right now, Grooveshark is slow. There’s no getting around it; pages take a while to start loading, and then take a while to render. We are in the middle of overhauling the backend which should make a huge difference in the time it takes for a page to start loading, but we PHP/SQL devs can’t do much about render times.
Fortunately, crossover Javascript/PHP developer Chanel has been up to the task and has made major improvements to page rendering times. I can’t even begin to explain why the site can be slow to render, but I know that much of it is Javascript related, and it has to do with the number of elements on the page. But even with Chanel’s genius tricks making the site render 4-5x faster, it still feels sluggish to me.
Well, as it happens, the front-end is being overhauled as well, but I recently discovered something else that makes the site much more pleasant to use: Firefox 3.0b4. One of the improvements listed in the release notes was faster Javascript execution. They certainly weren’t kidding! It makes it hard to go back to using Firefox 2.
So far I have not been terribly impressed with the much-touted memory management improvements. Firefox still uses an amazing amount of memory: 122MB (plus 104MB of virtual memory) with my Grooveshark library page, Chanel’s blog and this wordpress edit window open. The good news is that Firefox seems to be much more aggressive about releasing unneeded memory; the bad news is that sometimes Firefox seems to hang while it’s frantically deallocating (e.g. when closing a tab).
All in all it’s shaping up to be a nice improvement over Firefox 2, and if they can polish up the memory management to give a consistently smooth user experience, it’s going to be an unbeatable browser.
Development timelines are a contract, in many ways.
Contract negotiation happens when developers sit down with management to hash out a release date for a product or feature. As with any other contract negotiation, both sides come to the table with their own demands, and there are concessions on both sides, but hopefully when the negotiation is over, both parties can be happy with the results. It is also essential that the terms of the contract are clear: if developers and management have different understandings of what feature XYZ entails, they might think they have come to an agreement when they haven’t; this will only lead to problems later, usually altered feature sets or later release dates.
It is important to keep in mind that contracts can be breached by either party, and this is certainly the case when dealing with timelines. If either side fails to hold up its end of the bargain, the timeline will slip and the contract will be broken. It’s obvious how developers can be in breach of this contract, and they are certainly usually the ones held responsible for it, but how can management be at fault? Well, it entirely depends on the negotiation process. If, for example, management assures the team that they will have a certain server in place and ready for production N days before release, and server deployment is delayed, development cannot be held accountable for the schedule slippage. Likewise if the development team asks for feature lockdown and management continues changing the specifications throughout the development cycle.
In an ideal situation, both parties of the contract are working towards the same goal: a product that will make the company more successful. What type of product that is exactly depends on the company, but goals that any team would strive for include an intuitive interface, a useful feature set, and a bug-free user experience. With a set of shared goals, if both parties are able to hold each other accountable and enforce the terms of the contract, the end result is usually an on-time release that both parties can be satisfied with.
In situations that are less ideal, where developers cannot expect management to live up to their terms of the contract, compromises will have to be made somewhere internally. A feature will silently go missing, the product will be poorly tested, or release dates will be moved back.
On 3/3/08, beta.grooveshark.com was down for several hours. It took us a few minutes to figure out what was wrong. PHP logs showed that Auth was crashing on a bind_param error. Specifically, bind_param was complaining that the number of arguments was different from the number of placeholders, which is really bind_param’s way of saying “something is broken, and I don’t know what.” I skimmed through everything Auth related to see if someone had uploaded a file to the live server recently by hand, bypassing the subversion/snapping process, but all the timestamps were from when we last snapped, a few days prior.
While I was doing that, Colin thought to check the MySQL error log since the errors were SQL related. Sure enough, MySQL had crashed and restarted itself, but it left many of our MyISAM tables in a corrupted state. I ran REPAIR TABLE on all of the tables listed in the log but the site still wasn’t working properly. I dropped into the shell* and ran myisamchk on all of the MyISAM tables‡ to see which ones were corrupted and to my surprise, some of those tables were ones I had already REPAIRed, so I ran REPAIR TABLE … EXTENDED on each of those and then, finally, the site worked again!
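For reference, the repair statements look roughly like this (the table name here is just an example, not necessarily one of ours):

REPAIR TABLE Playlists;
-- if the table still shows up as corrupted, run the slower, more thorough version:
REPAIR TABLE Playlists EXTENDED;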
It’s worth noting that all of our InnoDB tables survived the crash completely unharmed. Moral of the story: don’t use MyISAM tables unless you absolutely have to. It’s too bad that MySQL uses MyISAM by default and doesn’t have a single fully-featured storage engine available. As a result, everything needing fulltext indexing will remain on MyISAM for the time being. We still don’t know the exact cause of the crash, but it’s been smooth sailing since we moved all those tables over to InnoDB, knock on wood.
*Handy tip: If you’re in MySQL and you need to drop to the shell, ^Z (CTRL-Z) is a quick and easy way to do so. Once you’ve finished what you need to do in the shell, just run fg, and assuming you haven’t backgrounded any other tasks since dropping to the shell, you’ll be back in MySQL, exactly where you left off.
‡That was the first time I have had to run myisamchk, so on the off chance that you’ve never used it before either, here’s a tip that it took me a few minutes to figure out: run myisamchk on the folder containing your DB files, and give it *.MYI. I initially thought myisamchk would be smart enough to find the DB files — it’s not.
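In other words, something like this (the data directory path is just an example; yours may differ):

cd /var/lib/mysql/yourschemahere
myisamchk *.MYI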
A while ago I wrote about my auto-query generator project. I only just recently got around to finishing it up because other things had higher priority, and also because I wasn’t entirely convinced that I was doing things the best way, and I wanted to take some time to experiment.
Matt sat down with me and analyzed the problem, and we decided that we could use the schema to create a graph with all of the edges (our ID columns are consistently named in each table), run a shortest-path-finding algorithm over it, and then write a SQL generator that works off of the resulting path. Getting all of the IDs in our tables in MySQL is pretty easy:
SELECT DISTINCT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%ID'
Then getting all the tables for a given ID:
SELECT TABLE_NAME, COLUMN_KEY FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = '$id'
AND TABLE_SCHEMA = 'yourschemahere'
From that information, I simply built a graph which I represented with an adjacency map.
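To give a rough idea, here’s a simplified sketch of that graph-building step (this is not the actual Grooveshark code; the function name and the mysqli usage are just for illustration):

<?php
// Sketch: build an adjacency map of tables that share an ID column.
// $db is assumed to be a mysqli connection; $schema is your schema name.
function buildGraph(mysqli $db, $schema)
{
    $graph = array(); // table name => array of (neighbor table => linking column)

    // Every ID-style column in the schema
    $ids = $db->query("SELECT DISTINCT COLUMN_NAME
                       FROM INFORMATION_SCHEMA.COLUMNS
                       WHERE COLUMN_NAME LIKE '%ID'
                       AND TABLE_SCHEMA = '$schema'");

    while ($idRow = $ids->fetch_assoc()) {
        $id = $idRow['COLUMN_NAME'];

        // All tables that contain this ID column
        $tables = array();
        $res = $db->query("SELECT TABLE_NAME
                           FROM INFORMATION_SCHEMA.COLUMNS
                           WHERE COLUMN_NAME = '$id'
                           AND TABLE_SCHEMA = '$schema'");
        while ($row = $res->fetch_assoc()) {
            $tables[] = $row['TABLE_NAME'];
        }

        // Any two tables sharing the column get an edge, labeled with that column
        foreach ($tables as $a) {
            foreach ($tables as $b) {
                if ($a !== $b) {
                    $graph[$a][$b] = $id;
                }
            }
        }
    }

    return $graph;
}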
Unfortunately, that did not work. The path finding algorithm was basically too good: it found paths that were the shortest, but not necessarily the correct way to get from one table to another. For example, two tables might both contain a GenreID, but maybe they are actually linked by ArtistID. Ok, so what about only making an edge when that column is a primary key in one of the nodes representing a table? That wasn’t hard to do either, but it still gave wrong results in some cases. Sometimes it’s just more efficient (but wrong) to route through the Genres table than to go the right way.
I considered making a directed graph so that connections would only be one-way, pointing to the table where the ID is a primary key, but I realized that wouldn’t work either, because sometimes you do need to join tables on IDs that are not primary keys. Essentially, our schema does not completely represent the full complexity of the relationships that it contains.
So I went back to my original method, which was to map out the paths by hand. Tedious though it may have been, it’s still a pretty clever solution, in my opinion.
I created two maps. The first says “if you have this ID, and you are trying to get to this table, start at this other table,” for every possible ID; the second says “if you’re at this table, and you’re trying to get to this other table, here is the next table you need to go through.”
The great thing about this is that most of those steps can be reused, and I only had to define each one once. For example, to get from Users to Files you always go through UsersFiles, no matter where you started or whether you’re ultimately after all of the Songs, Albums or Artists that a user has in their library.
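To make that concrete, here’s roughly what the two maps and the lookup look like (a stripped-down sketch with made-up entries, not the real maps):

<?php
// Map 1: "if you have this ID and want to reach this table, start at this table"
$startTable = array(
    'UserID' => array('Files' => 'Users'),
);

// Map 2: "if you're at this table and want to reach that table, go through this table next"
$nextTable = array(
    'Users'      => array('Files' => 'UsersFiles'),
    'UsersFiles' => array('Files' => 'Files'),
);

// Not really path finding: just follow the only path there is.
function findPath($id, $destination, $startTable, $nextTable)
{
    $table = $startTable[$id][$destination];
    $path  = array($table);
    while ($table !== $destination) {
        $table  = $nextTable[$table][$destination];
        $path[] = $table;
    }
    return $path; // e.g. array('Users', 'UsersFiles', 'Files')
}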
Having spelled things out this way, there is no guesswork for a path finding algorithm, because there is literally only one path. In fact it hardly counts as a path finder or an algorithm; it just iterates through the map until it reaches its destination. And it works. I will post as many details as I’m allowed about exactly how the actual SQL building algorithm works, and about how I am able to merge multiple paths, so for example you can have an ArtistID and a PlaylistID to get all of the artists on a given playlist. Stay tuned.
Want to listen to one song of every artist featured at SXSW 2008, but don’t want to bother downloading the torrent?
It’s all on grooveshark in one convenient location for you.
Enjoy! Or, you know, don’t, if the music happens to suck. I haven’t actually given it a listen yet. It’s 741 songs long.
This tip may be obvious to many of you, but I just discovered it recently, and it was news to Skyler when I told him, so I’ll share it here too:
If you need to make a bunch of ALTERs to the same table, it’s much more efficient to make them all in one huge comma-delimited statement rather than running each ALTER separately.
Why? Because every time you make an ALTER, MySQL makes a new table that meets your new altered definition, copies the entire contents into it, deletes the old one and renames the new one to the name of the old one. So if you make 10 alters, MySQL will do that 10 times, but if you make one alter with 10 changes in it, MySQL will only have to copy the table once.
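For example (made-up table and columns), instead of three separate statements:

ALTER TABLE Songs ADD COLUMN Duration INT;
ALTER TABLE Songs ADD COLUMN Bitrate INT;
ALTER TABLE Songs DROP COLUMN OldFlag;

you’d combine them so the table only gets rebuilt once:

ALTER TABLE Songs
    ADD COLUMN Duration INT,
    ADD COLUMN Bitrate INT,
    DROP COLUMN OldFlag;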
Skyler estimates that trick will probably save us about 10 hours when we port the old (current) database to the new schema for file/song.