RSS
 

Archive for the ‘software engineering’ Category

Migrating to github

26 Jan

I needed to migrate a couple of repositories with lots of tags and branches to github, and Github’s instrucitons just weren’t cutting it:

Existing Git Repo?
cd existing_git_repo
git remote add origin git@github.com:[orgname]/[branchname].git
git push -u origin master

That’s great as long as all I care about is moving my master branch, but what I really wanted was just to make github have everything that my current origin did, whether or not I was tracking it. A quick cursory Google search didn’t find any instructions how to do this, so I had to figure it out the old fashioned way, by reading git help pages. The quickest and easiest way I could find to do this after creating the repo on github is:

git clone git@[originurl]:[orgname]/[reponame].git [local temp dir] --mirror
cd [local temp dir]
git remote add github git@github.com:[orgname]/[reponame].git
git push -f --mirror github
cd ..
rm -rf [local temp dir]

that’s it! At this point github should have everything that your original origin had, including everything in /refs. It won’t have any of your local branches, since you did everything from a clean checkout. You might also want to change where origin is pointing:

cd [dir where your repo is checked out]
git remote add oldOrigin git@[oldgitserver]:[orgname]/[reponame].git
git remote set-url origin git@github.com:[orgname]/[reponame].git

Of course, this same procedure should work for moving from any git server to any other git server too.

 

Grooveshark is Hiring (Part 2 – Javascript Edition)

19 May

Grooveshark is looking for talented web developers. We are looking to fill a wide variety of web dev positions. The next few posts I make will be job descriptions for each of the major positions we are looking to fill. For part 2, I’m listing our Javascript developer position. If your skillset is more front-end leaning but you feel like you would be a good fit for Grooveshark, by all means apply now rather than waiting for me to post the job description. :)

Grooveshark is looking for a hardcore JavaScript developer

NOTE: Grooveshark is a web application, written with HTML/CSS, JavaScript, PHP, and ActionScript 3, not a web page or collection of webpages. This position involves working directly with the client-side application code in JavaScript: the “backend of the frontend” if you will.

Responsibilities:
Maintaining existing client-side code, creating new features and improving existing ones
Writing good, clean, fast, secure code on tight deadlines
Ensuring that client architecture is performant without sacrificing maintainability and flexible enough for rapid feature changes
Striking a balance between optimizing client performance versus minimizing load on the backend
Integration of third-party APIs

Desired Qualities:
Enjoy writing high quality, easy to read, self-documenting code
A passion for learning about new technologies and pushing yourself
A deep understanding of writing bug-free code in an event-driven, asynchronous environment
Attention to detail
A high LOC/bug ratio
Able to follow coding standards
Good written and verbal communication skills
Well versed in best practices & security concerns for web development
Ability to work independently and on teams, with little guidance and with occasional micromanagement
More pragmatic than idealistic

Experience:
Extensive JavaScript experience, preferably in the browser environment
Experience with jQuery
Experience with jQueryMX or another MVC framework
HTML & CSS experience, though writing it will not be a primary responsibility
Some PHP experience, though you won’t be required to write it
Knowledge of cross-browser compatibility ‘gotchas’
Experience with EJS, smarty, or other templating systems
Experience with version control software (especially git or another dvcs)

Bonus points for:
Having written a client application (in any language) that relies on a remote server for data storage and retrieval
Having written a non-trivial jQuery plugin
Experience with JavaScript/Flash communication via ExternalInterface
Experience with integrating popular web APIs (OAuth, Facebook, Twitter, Google, etc) into client applications
Experience with ActionScript 3 outside the Flash Professional environment (ie, non-timelined code, compiling with mxmlc or similar)
Experience developing on the LAMP stack (able to set up a LAMP install with multiple vhosts on your own)
Experience with profiling/debugging tools
Being well read in Software Engineering practices
Useful contributions to the open source community
Fluency in lots of different programming languages
BS or higher in Computer Science or related field
Being more of an ‘evening’ person than a ‘morning’ person
A passion for music and a desire to revolutionize the industry

Who we don’t want:
Architecture astronauts
Trolls
Complete n00bs (apply for internship or enroll in Grooveshark University instead!)
People who want to work a 9-5 job
People who would rather pretend to know everything than actually learn
Religious adherents to The Right Way To Do Software Development
Anyone who would rather use XML over JSON for RPC

Send us:
Resume
Code samples you love
Code samples you hate
Links to projects you’ve worked on
Favorite reading materials on Software Engineering (e.g. books, blogs)
What you love about JavaScript
What you hate about JavaScript (and not just the differences in browser implementations)
Prototypal vs Classical inheritance – what are the differences and how do you feel about each?
If you could change one thing about the way Grooveshark works, what would it be and how would you implement it?

 

If you want a job:  jay at groovesharkdotcom
If you want an internship: internships@grooveshark.com

 

Grooveshark is Hiring (Part 1 – PHP edition)

16 May

Grooveshark is looking for talented web developers. We are looking to fill a wide variety of web dev positions. The next few posts I make will be job descriptions for each of the major positions we are looking to fill. For part 1, I’m listing our backend PHP position. If your skillset is more front-end leaning but you feel like you would be a good fit for Grooveshark, by all means apply now rather than waiting for me to post the job description. :)

 

Grooveshark is seeking awesome PHP developers.

Responsibilities:
Maintaining existing backend code & APIs, creating new features and improving existing ones
Writing good, clean, fast, secure code on tight deadlines
Identifying and eliminating bottlenecks
Writing and optimizing queries for high-concurrency workloads in SQL, MongoDB, memcached, etc
Identifying and implementing new technologies and strategies to help us scale to the next level

Desired Qualities:
Enjoy writing high quality, easy to read, self-documenting code
A passion for learning about new technologies and pushing yourself
Attention to detail
A high LOC/bug ratio
Able to follow coding standards
Good written and verbal communication skills
Well versed in best practices & security concerns for web development
Ability to work independently and on teams, with little guidance and with occasional micromanagement
More pragmatic than idealistic

Experience:
Experience developing on the LAMP stack (able to set up a LAMP install with multiple vhosts on your own)
Extensive experience with PHP
Extensive experience with SQL
Some experience with Javascript, HTML & CSS though you won’t be required to write it
Some experience with lower level languages such as C/C++
Experience with version control software (especially dvcs)

Bonus points for:
Well read in Software Engineering practices
Experience with a SQL database and optimizing queries for high concurrency on large data sets.
Experience with noSQL databases like MongoDB, Redis, memcached.
Experience with Nginx
Experience creating APIs
Knowledge of Linux internals
Experience working on large scale systems with high volume of traffic
Useful contributions to the open source community
Fluency in lots of different programming languages
Experience with browser compatability weirdness
Experience with smarty or other templating systems
BS or higher in Computer Science or related field
Experience with Gearman, RabbitMQ, ActiveMQ or some other job distribution/message passing system for distributing work
A passion for music and a desire to revolutionize the industry

Who we don’t want:
Architecture astronauts
Trolls
Complete n00bs (apply for internship or enroll in Grooveshark University instead!)
People who want to work a 9-5 job
People who would rather pretend to know everything than actually learn
Religious adherents to The Right Way To Do Software Development
Anyone who loves SOAP

Send us your:
Resume
Code samples you love
Code samples you hate
Favorite reading materials on Software Engineering (e.g. books, blogs)
Tell us when you would use a framework, and when you would avoid using a framework
ORM: Pros, cons?
Unit testing: pros, cons?
Magic: pros, cons?
When/why would you denormalize?
Thoughts on SOAP vs REST

If you want a job: jay at groovesharkdotcom

If you want an internship: internships@grooveshark.com

 

Improve Code by Removing It

17 Oct

I’ve started going through O’Reilly’s 97 Things Every Programmer Should Know, and I plan to post the best ones, the ones I think we do a great job of following at Grooveshark and the ones I wish we did better, here at random intervals.

The first one is Improve Code by Removing It.

It should come as no surprise that the fastest code is code that never has to execute. Put another way, you can usually only go faster by doing less. And of course, code that never runs also exhibits no bugs. :)

Since I started at Groovshark, I’ve deleted a lot of code. 20% more code than I’ve ever written. More code than all but one of our developers has contributed. Despite that, I think we’ve only actually ever removed one feature, and we’ve added many more.

One of the things I see in the old code is an over enthusiastic attempt to make everything extremely flexible and adaptive. The original authors obviously made a noble effort to try to imagine every possible future scenario that the code might some day need to handle and then come up with an abstraction that could handle all of those cases. The problem is, those scenarios almost never come up. Instead, different features are requested which do not fit with the goals of the original abstraction at all, so then you end up having to work around it in weird ways that make the code more difficult to understand and less efficient.

Let me try to provide a more concrete example so you can see what I’m talking about. We have an Auth class that really only needs to handle 3 things:

  • Let users log in (something like Auth->login(username, password)
  • Let users log out (something like Auth->logout()
  • Find out who the logged in user is, if a user is logged in. (Auth->getUser())

Should be extremely straightforward, right? Well, the original author decided that the class should allow for multiple authentication strategies over various protocols in some scenarios that could never possibly arise (such as not having access to sessions) even though at the time only one was needed. Instead of ~100 lines of code to just get the job done and nothing else, we ended up with 1,176 lines spanning 5 files. The vast majority of that code was useless; our libraries are behind other front-facing code so the protocol and “strategy” for authenticating is handled at one level higher up, and we always use sessions so that no matter how a user logged in, they are logged in to all Grooveshark properties. When we finally did add support for a truly new way to log in (via Facebook Connect), none of that code was useful at all because Facebook Connect works in a way the author could never have anticipated 2 years ago. Finally, because the original author anticipated a scenario that cannot possibly arise (that we might know the user’s username but not their user ID), fetching the logged-in User’s information from the database required a less efficient lookup by username rather than a primary key lookup by ID.

Let’s step back a moment and pretend that the author had in fact been able to anticipate how we were going to incorporate Facebook Connect and made the class just flexible enough and just abstract enough in just the right ways to accommodate that feature that we just now got around to implementing. What would have been the benefit? Well, most of the effort of implementing that feature is handling all of the Facebook specific things, so that part would still need to be written. I’d say at best it could have saved me from having to write about 10 lines of code. In the meantime, we would have still been carrying around all of that extra code for no reason for a whole two years before it was finally needed!

Let’s apply YAGNI whenever possible, and pay the cost of adding features when they actually need to be added.

Edit:
Closely related: Beauty is in Simplicity.

 

Why You Should Always Wrap Your Package

30 Jul

Ok, the title is a bit of a stretch, but it’s a good one isn’t it?

What I really want to talk about is an example of why it’s a good idea to make wrappers for PHP extensions instead of just using them directly.

When Grooveshark started using memcached ever-so-long-ago, with the memcache pecl extension, we decided to create a GMemcache class which extends memcache. Our main reason for doing this was to add some convenience (like having the constructor register all the servers) and to add some features that the extension was missing (like key prefixes). We recently decided that it’s time to move from the stagnant memcache extension to the pecl memcached extension, which is based on libmemcached, which supports many nifty features we’ve been longing for, such as:

  • Binary protocol
  • Timeouts in milliseconds, not seconds
  • getByKey
  • CAS
  • Efficient consistent hashing
  • Buffered writes
  • Asyncronous I/O

Normally such a transition would be a nightmare. Our codebase talks to memcached in a million different places. But since we’ve been using a wrapper from day 1, I was able to make a new version of GMemcache with the same interface as the old one, that extends memcached. It handles all the minor differences between how the two work, so all the thousands of other lines in the app that talk to memcached do not have to change. That made the conversion a <1 day project, when it probably would have otherwise been a month long project. It also has the advantage that if we decide for some reason to go back to using pecl memcache, we only have to revert one file.

 

You don’t know when you’re done if you don’t know what done is

08 Aug

As I mentioned in an earlier blog post, we are working hard on a 2.0 version of the product. One of the questions our team is asked quite frequently is “when will it be ready?” This question is impossible for us to answer, because ready is not defined. It’s one of the dangers of working without a real spec.

There are a lot reasons why we don’t use a real spec, none of which are up to me so I’ll not discuss them here.

Whenever 2.0 gets close to what everyone thinks we have agreed on, people look and poke at it, decide they don’t like things or realize nobody ever asked for critical feature x, or it somehow didn’t make it onto our bug list, and then we have to go back and get new designs, file a bunch of bugs, and set a new milestone. Repeat, repeat, repeat. It’s not just for 2.0 that it works this way, it’s whenever you’re working without a complete spec. When the goal posts for done are constantly moving, the question of “when will it be done?” Is really a question of “when will the goal posts stop moving?”

To break the cycle, we’re picking a done date, and mandating that the goal posts stop moving some time before that date. Working towards that, we’ve submitted our “last chance” milestone, meaning after this milestone any decisions/changes/designs must be final, because we’re going to call it done when those have been implemented, or when we hit our chosen date, whichever comes first.

 

Leaky Abstractions

12 Sep

As with nearly every issue in Software Engineering worth thinking about, Joel Spolsky has written an article about leaky abstractions that is very relevant to some problems I ran into tonight.

Abstractions do not really simplify our lives as much as they were meant to. [...] all abstractions leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning [...] and all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder

In my case, I was not working with programming tools per se, more of an abstraction of an abstraction built into the Grooveshark framework that is meant to make life as a programmer easier. And normally it does. But in this case, that abstraction was wrapped in a couple more layers of abstraction away from where my code needed the information, and somewhere in there the information was, for lack of a better description, being mangled. The particular form of mangling is actually due to a lower level abstraction at the DB layer, but is not handled in the higher level of abstraction because for most uses it doesn’t matter.

From the level of abstraction where my code was sitting, the information needed in order to un-mangle the data was simply not available. I went through the various layers to find a convenient place to put the un-mangling, but by the time I found a place, everything was so abstract that I couldn’t confidently modify that code and see all potential ramifications, so I just did an end-run around many of the layers of abstraction. This is certainly not ideal, but it works. The point is, an abstraction intended to make life as a programmer slightly easier most of the time, can easily make life as a programmer significantly more difficult on edge cases. Unfortunately, once a code base has been established, most cases are edge cases: feature additions, new products built on top of the old infrastructure, etc., all or most unforeseen, and therefore unaccounted for during the original design process.

 

Who will monitor the monitors?

21 Aug

This is the second time that an error in a monitoring/testing tool that we use has caused me to waste a good deal of time trying to solve a non-problem.

The first was when we were using a tool to test the scaling performance of a new feature. According to this tool the performance was absolutely abysmal with only 50 virtual clients, and I spent a couple of days writing alternative algorithms and trying to test them with this tool. The scalability was horrible no matter what I did, so I added some logging to see if something was going on that wasn’t supposed to be. It was. The testing tool was sending completely wrong information, creating completely unrealistic scenarios. The testing implementation was done by another developer who was sure that he had set it up right, and I was not familiar with the tool so I assumed that it was a problem with my code. Having another developer write the tests was supposed to save me time.

The latest monitoring snag was caused by a monitoring tool that, for our convenience, tails the php error log and sends the last couple hundred lines to us whenever there is an error. We kept getting the same error periodically and could not track down the source; everything seemed to be working perfectly, but we still got the error. Usage of the site every day is breaking new records so we assumed that it had something to do with the unprecedented load on servers, but could never pinpoint the source of the errors. Today I looked a bit more closely at the error log only to discover that the log was from 3 days ago!

So the question is, when your testing or monitoring tools report an error, how do you ensure that the error is real and not just in the tool itself?

 

The pain of project management

30 May

For the second time now, Grooveshark is making a serious effort to utilize project management software.

I don’t like the new project management solution, even though it is a vast improvement over the old one, and I think I’ve discovered why: I don’t like project management. I don’t like project management because project management is not for me; it’s for managers. Managers and developers have vastly different interests. Developers want to know what bugs are in their code, what features they need to develop, a way to view dependencies, and a way to see which bugs/features are most important. Bugzilla fits those needs perfectly. Managers, on the other hand, need to make sure that they are maximizing productivity by making sure devs are never sitting idle, they need to know what is going to happen when; they need ship dates.

The thing about bug tracking is that the people who benefit from it are the same people who have to do the work. If you want to see a bug get fixed, you file it. If you want to find bugs to fix, use the bug tracker. Project management tools, on the other hand, require a vast amount of work from the people who don’t need them: developers. That drastically decreases the chances that they will be used consistently. It also means that project managers end up working against their own goal: they reduce productivity. This is consistent with Le Chatelier’s Principle: Complex systems tend to oppose their own proper function.

It’s actually a more pronounced effect than just wasting dev time on project management. At least for me, having multiple people assign tasks to me with various arbitrary deadlines tends to make me feel like I am being micromanaged, which increases stress and also keeps me from being able to focus on any one thing for long enough to accomplish much. (“I’ve been working on this thing for 3 hours and I have all these other things to do, I should put it on the back burner and move on…”)

What’s the solution? I don’t know, but a good start would be to combine project management with bug tracking, ala FogBugz, set overarching deadlines for projects, let your devs work out the details, and be flexible. Plan for other things coming up, because new and very important bugs will pop up all the time, servers will break, and meetings will happen (sadly). If your developers are good, hardworking employees (and Grooveshark devs are), they will strive to meet or exceed the goals you set for them, without being micromanaged.

 

Always test in context

03 May

When testing your work, it’s essential to always test it in the context that it’s going to be used in.

Last weekend, I helped a friend move her stuff down to the Orlando area. UHaul trailers are incredibly cheap compared to trucks, and my car has a hitch, so we got one of those, loaded up her stuff and headed south. Somewhere along the way, the hookup for the lights became disconnected and then dragged along the ground, getting all nice and melted in the process. I noticed this when we got to her new apartment and called UHaul. Awesomely, they have free roadside assistance for any time something happens to their trailer.

They sent out a guy, he looked at the hookup on the trailer, rewired it, and tested it with his wire testing doohickey before taking off. We had already disconnected the trailer from the car for parking purposes, and when he pulled in to fix the trailer he blocked access to my car, so we didn’t actually hook it up to my car to test it.

A few hours later, it’s time to go home and return the trailer and oh, crap, the lights don’t work. I had to call UHaul and have them come fix it again. Because of the way the trailer was hooked up originally, the tongue of the trailer managed to get on top of one of the wires on my side of the connector and eventually wore through it, something I hadn’t noticed previously. It also managed to blow a fuse at some point so my brake lights didn’t work at all even when the wiring was fixed. Fortunately, the UHaul guy was very friendly and also had all the parts one could ever need for fixing minor car problems, so he easily re-rewired the connections and fixed my fuse.

Still, a few extra minutes of checking the trailer in the context it was going to be used in (i.e. attached to my car) would have revealed the problems and saved UHaul several hours worth of labor (driving to and from the apartment complex; they were far away).

That was a non-technical example of the principle, but there are certainly plenty of technical ones as well:
When I wrote the code to handle our PayPal processing, PayPal helpfully provided a development sandbox for testing with. All my code was thoroughly tested, seemed to be handling every sort of error that I threw at it, processing successful charges properly and everything. Then when it was time to release we pointed it at the ‘real’ PayPal servers and suddenly it didn’t work anymore. A few (real) payments later and some minor differences between how the real servers worked and how the dev servers worked were revealed. No big deal, but again, some testing up front in the correct context would have prevented the issue from ever appearing in the first place.