
Archive for the ‘software engineering’ Category

Always test in context

03 May

When testing your work, it’s essential to test it in the context where it will actually be used.

Last weekend, I helped a friend move her stuff down to the Orlando area. UHaul trailers are incredibly cheap compared to trucks, and my car has a hitch, so we rented one, loaded up her stuff, and headed south. Somewhere along the way, the hookup for the lights came disconnected and dragged along the ground, getting all nice and melted in the process. I noticed this when we got to her new apartment and called UHaul. Awesomely, they offer free roadside assistance whenever something happens to one of their trailers.

They sent out a guy who looked at the hookup on the trailer, rewired it, and tested it with his wire-testing doohickey before taking off. We had already disconnected the trailer from the car for parking purposes, and when he pulled in to fix it he blocked access to my car, so we never hooked the trailer up to my car to test it.

A few hours later, it’s time to go home and return the trailer and, oh, crap, the lights don’t work. I had to call UHaul and have them come fix it again. Because of the way the trailer had originally been hooked up, its tongue had ended up on top of one of the wires on my side of the connector and eventually wore through it, something I hadn’t noticed before. At some point a fuse had also blown, so my brake lights didn’t work at all even once the wiring was fixed. Fortunately, the UHaul guy was very friendly and had every part you could ever need for fixing minor car problems, so he easily re-rewired the connections and replaced my fuse.

Still, a few extra minutes of checking the trailer in the context it was going to be used in (i.e., attached to my car) would have revealed the problems and saved UHaul several hours’ worth of labor (driving to and from the apartment complex; they were far away).

That was a non-technical example of the principle, but there are plenty of technical ones as well:
When I wrote the code to handle our PayPal processing, PayPal helpfully provided a development sandbox to test against. All my code was thoroughly tested: it seemed to handle every sort of error I threw at it, processed successful charges properly, and so on. Then, when it was time to release, we pointed it at the ‘real’ PayPal servers and suddenly it didn’t work anymore. A few (real) payments later, we had uncovered some minor differences between how the real servers and the sandbox servers behaved. No big deal, but again, some up-front testing in the correct context would have kept the issue from ever appearing.

 

Back

24 Apr

I’m back from my mini-vacation, and it sounds like things are crazy at the office, but even so, I think I should have time to start writing again, and I’m looking forward to it.

My first order of business tonight is to start reading Systemantics: How Systems Work and Especially How They Fail by John Gall, which was recommended by Raymond Chen and therefore must automatically be worth reading. Because I’m sure many of you are too lazy to actually follow that link, here’s the relevant RChen review:

Systemantics is very much like The Mythical Man-Month, but with a lot more attitude. The most important lessons I learned are a reinterpretation of Le Chatelier’s Principle for complex systems (“Every complex system resists its proper functioning”) and the Fundamental Failure-Mode Theorem (“Every complex system is operating in an error mode”).

You’ve all experienced the Fundamental Failure-Mode Theorem: You’re investigating a problem and along the way you find some function that never worked. A cache has a bug that results in cache misses when there should be hits. A request for an object that should be there somehow always fails. And yet the system still worked in spite of these errors. Eventually you trace the problem to a recent change that exposed all of the other bugs. Those bugs were always there, but the system kept on working because there was enough redundancy that one component was able to compensate for the failure of another component. Sometimes this chain of errors and compensation continues for several cycles, until finally the last protective layer fails and the underlying errors are exposed.

That’s why I’m skeptical of people who look at some catastrophic failure of a complex system and say, “Wow, the odds of this happening are astronomical. Five different safety systems had to fail simultaneously!” What they don’t realize is that one or two of those systems are failing all the time, and it’s up to the other three systems to prevent the failure from turning into a disaster. You never see a news story that says “A gas refinery did not explode today because simultaneous failures in the first, second, fourth, and fifth safety systems did not lead to a disaster thanks to a correctly-functioning third system.” The role of the failure and the savior may change over time, until eventually all of the systems choose to have a bad day all on the same day, and something goes boom.
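The cache scenario in the review is easy to reproduce in miniature. What follows is a minimal Python sketch under made-up assumptions: the class names BackingStore and BuggyCache and the key-case bug are hypothetical illustrations, not taken from Gall’s book or Chen’s review. Every lookup misses the cache, yet every caller still gets the right answer because the fallback read to the backing store compensates; only the hit/miss counters reveal that one component has been failing all along.

```python
class BackingStore:
    """Hypothetical slow-but-correct data source (e.g. a database)."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0  # how many times we had to hit the slow path

    def get(self, key):
        self.reads += 1
        return self._data[key]


class BuggyCache:
    """Hypothetical cache with a latent bug: it writes entries under
    key.lower() but looks them up under the raw key, so any key that
    contains uppercase letters can never produce a hit."""

    def __init__(self, store):
        self._store = store
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:  # raw key: never matches a lowercased entry
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        # Redundancy: falling back to the store hides the cache bug.
        value = self._store.get(key)
        self._entries[key.lower()] = value  # the bug: stored under a different key
        return value


store = BackingStore({"Song42": "Artist A"})
cache = BuggyCache(store)
results = [cache.get("Song42") for _ in range(3)]
print(results, cache.hits, cache.misses, store.reads)
# prints: ['Artist A', 'Artist A', 'Artist A'] 0 3 3
```

From the caller’s point of view the system works; it is just slower than intended. Remove the fallback, or make the store unavailable, and the cache bug that was there all along suddenly becomes the visible failure.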

Time to hit the books!

 

Development Timeline as a Contract

20 Mar

In many ways, a development timeline is a contract.

Contract negotiation happens when developers sit down with management to hash out a release date for a product or feature. As in any other contract negotiation, both sides come to the table with their own demands and both make concessions, but hopefully, once the negotiation is over, both parties are happy with the results. It is also essential that the terms of the contract are clear: if developers and management have different understandings of what feature XYZ entails, they might think they have reached an agreement when they haven’t, which only leads to problems later, usually in the form of altered feature sets or later release dates.

It is important to keep in mind that contracts can be breached by either party, and timelines are no exception. If either side fails to hold up its end of the bargain, the timeline will slip and the contract will be broken. It’s obvious how developers can breach this contract, and they are usually the ones held responsible, but how can management be at fault? That depends entirely on what was agreed during the negotiation. If, for example, management assures the team that a certain server will be in place and ready for production N days before release, and server deployment is delayed, development cannot be held accountable for the schedule slippage. The same goes for a development team that asks for a feature lockdown while management keeps changing the specifications throughout the development cycle.

In an ideal situation, both parties to the contract are working toward the same goal: a product that will make the company more successful. Exactly what that product looks like depends on the company, but goals any team would strive for include an intuitive interface, a useful feature set, and a bug-free user experience. With a set of shared goals, if both parties hold each other accountable and enforce the terms of the contract, the end result is usually an on-time release that both can be satisfied with.

In less ideal situations, where developers cannot expect management to live up to its side of the contract, compromises will have to be made somewhere internally: a feature will silently go missing, the product will be poorly tested, or release dates will be moved back.

 

Music 1.0 is Dead

28 Feb

Music exec: “Music 1.0 is dead.”

Five hundred top members of the music business gathered today in New York to hear that “music 1.0 is dead.” Ted Cohen, a former EMI exec who used the phrase, opened the Digital Music Forum East by pleading with the industry to be wildly creative with new business models but not to “be desperate” during this transitional period.

Consider the statements that were made today without controversy:

  • DRM on purchased music is dead
  • A utility pricing model or flat-rate fee for music might be the way to go
  • Ad-supported streaming music sites like iMeem are legitimate players
  • Indie music accounts for upwards of 30 percent of music sales
  • Napster isn’t losing $70 million per quarter (and is breaking even)
  • The music business is a bastion of creativity and experimentation

Just within the last year, we’ve seen an array of experiments that include ad-supported streaming, “album cards” from labels like Sony BMG, and allowing Amazon to offer MP3s from all four majors. Some labels even allow user-generated content to make use of their music in return for a revenue share from sites like YouTube—unthinkable a few years ago to a business wedded to control over its music and marketing.

All of this bodes very, very well for Grooveshark, aside from the fact that we weren’t used as an example. We’ll soon be getting much more attention in that vein, but hopefully not until we’ve had a chance to improve the site in these areas:

  1. Differentiating between files and songs properly
  2. Faster loading times
  3. Better searching
  4. A more user-friendly interface
  5. Eliminating silent failures

So clearly, we have our work cut out for us in a market that is on the verge of exploding, but if we can focus our resources, I think we can be mostly there within a few weeks. Now to get management on board.

 

No Room for Ego

18 Feb

There’s no room for ego in software development, and if you are a small startup setting out to change the world, there’s no room for it on any team.

Here at Grooveshark, we have the smartest bunch of people I’ve ever worked with (and they’re not paying me to say that), but we all make mistakes, have bad brain days, and make less-than-optimal decisions sometimes. When there is no ego involved, we can point out to each other when one of us is being stupid, we can own up to our mistakes and learn from them (and make the site better), and we can ask questions when we don’t know something.

If ego is involved, it’s no longer easy to tell someone they are doing something wrong or to question their decisions, because you might hurt their feelings. Conversely, if you are maintaining an ego, you’re going to have a hard time admitting that you don’t know things you need to know.

Either way, precious time and resources end up wasted on fixing bugs, redesigning architecture, or spinning your wheels trying to figure out something that someone else probably already has the answer to.

I probably ask my co-workers hundreds of questions a day. I’m not an expert on the way every single piece of our site works, and it doesn’t hurt my ego to admit that. So when I find a bug in, say, the recommendation engine, I can go over to Travis, who seems to know that system inside and out, and ask him exactly what is happening. I don’t think Travis respects me any less for it, and now I know more about how that part works, the bug is fixed, and we have a better product. It also only took me a few minutes, because I didn’t have to analyze every line of code and trace through the object hierarchies myself.

When we have to design an important new feature or revamp the way part of our architecture works, we have a meeting about it and everyone provides their input. We usually start out with a few different ideas, and everyone argues for their own. In the process, we change our minds and reach a consensus. Because there’s no ego involved, we’re each prepared to give up our idea if a better one is presented, and we usually end up with a solution that is better than what any of us would have designed on our own.

This concept applies to other teams and departments as well. In my experience, the teams most open to constructive criticism from outside the group are the most effective, while those least receptive to feedback tend to produce lower-quality solutions.