RSS
 

Archive for the ‘software engineering’ Category

Git Tips (just a couple)

05 Jun

Below are a couple of git tips for the advanced git user. This is not meant to be remotely comprehensive in any way, just a couple of things I have figured out that I haven’t seen floating around out there yet.

Always rebase

At Grooveshark, we like to keep our history clean, such that the history of the main repo reflects when code was actually committed/accepted, regardless of when that code was written. As such, we almost always want to rebase. If you search for how to rebase by default, most of the advice you’ll find says to add the following to your ~/.gitconfig:

[branch]
    autosetuprebase = always

What this does, however, is it tells git to set up each branch to rebase as you create it. Any branch that git already knows about will not be set up to rebase; you’ll have to do that manually. Then in the future if you change your mind and decide you don’t want this functionality, you will have to remove it from each branch one by one. Instead put this in your ~/.gitconfig:

[pull]
    rebase = true

Now you’ve made the default behavior for pulling be to rebase. If you want to override that behavior for a particular branch, just set rebase = false in the .git/config file for the applicable repo, under the particular branch you want to do that for.

Github pull requests by the number

As mentioned earlier, when we do a pull request, we want it to be rebased onto the latest before it gets merged, so it’s a simple fast forward. Add the following to your ~/.gitconfig:

[alias]
    review = !sh -c 'REF=`git rev-parse --symbolic-full-name --abbrev-ref HEAD` && git fetch $1 $2:mergeReview -f && git checkout mergeReview && git rebase $REF && git checkout $REF && git diff HEAD..mergeReview' -
    pr = !git review origin refs/pull/$1/head
    approve = merge mergeReview

What’s going on here?
First we make a new command for reviewing arbitrary refs from arbitrary forks, called ‘review’. To use it, first checkout the target branch that you want to merge into, then run: ‘git review {remote} {ref}‘. Behind the scenes that fetches the remote ref into a temporary local branch called mergeReview, based on the remote ref you are reviewing. It rebases that onto the target branch you were in, switches you back to the target branch and then does a diff for you to review.
Next we make a new command for reviewing pull requests from github, called ‘pr’. This really just saves you from having to remember where github keeps pull request refs. It just calls git review and passes in the remote ref for the pull request.
Finally, ‘approve’ just merges the temporary mergeReview branch into the branch you’re currently in, which if you followed the instructions above is the target branch.

 

Migrating to github

26 Jan

I needed to migrate a couple of repositories with lots of tags and branches to github, and Github’s instrucitons just weren’t cutting it:

Existing Git Repo?
cd existing_git_repo
git remote add origin [email protected]:[orgname]/[branchname].git
git push -u origin master

That’s great as long as all I care about is moving my master branch, but what I really wanted was just to make github have everything that my current origin did, whether or not I was tracking it. A quick cursory Google search didn’t find any instructions how to do this, so I had to figure it out the old fashioned way, by reading git help pages. The quickest and easiest way I could find to do this after creating the repo on github is:

git clone git@[originurl]:[orgname]/[reponame].git [local temp dir] --mirror
cd [local temp dir]
git remote add github [email protected]:[orgname]/[reponame].git
git push -f --mirror github
cd ..
rm -rf [local temp dir]

that’s it! At this point github should have everything that your original origin had, including everything in /refs. It won’t have any of your local branches, since you did everything from a clean checkout. You might also want to change where origin is pointing:

cd [dir where your repo is checked out]
git remote add oldOrigin git@[oldgitserver]:[orgname]/[reponame].git
git remote set-url origin [email protected]:[orgname]/[reponame].git

Of course, this same procedure should work for moving from any git server to any other git server too.

 

Grooveshark is Hiring (Part 2 – Javascript Edition)

19 May

Grooveshark is looking for talented web developers. We are looking to fill a wide variety of web dev positions. The next few posts I make will be job descriptions for each of the major positions we are looking to fill. For part 2, I’m listing our Javascript developer position. If your skillset is more front-end leaning but you feel like you would be a good fit for Grooveshark, by all means apply now rather than waiting for me to post the job description. :)

Grooveshark is looking for a hardcore JavaScript developer

NOTE: Grooveshark is a web application, written with HTML/CSS, JavaScript, PHP, and ActionScript 3, not a web page or collection of webpages. This position involves working directly with the client-side application code in JavaScript: the “backend of the frontend” if you will.

Must be willing to relocate to Gainesville, FL and legally work in the US. Relocation assistance is available.
Responsibilities:
Maintaining existing client-side code, creating new features and improving existing ones
Writing good, clean, fast, secure code on tight deadlines
Ensuring that client architecture is performant without sacrificing maintainability and flexible enough for rapid feature changes
Striking a balance between optimizing client performance versus minimizing load on the backend
Integration of third-party APIs

Desired Qualities:
Enjoy writing high quality, easy to read, self-documenting code
A passion for learning about new technologies and pushing yourself
A deep understanding of writing bug-free code in an event-driven, asynchronous environment
Attention to detail
A high LOC/bug ratio
Able to follow coding standards
Good written and verbal communication skills
Well versed in best practices & security concerns for web development
Ability to work independently and on teams, with little guidance and with occasional micromanagement
More pragmatic than idealistic

Experience:
Extensive JavaScript experience, preferably in the browser environment
Experience with jQuery
Experience with jQueryMX or another MVC framework
HTML & CSS experience, though writing it will not be a primary responsibility
Some PHP experience, though you won’t be required to write it
Knowledge of cross-browser compatibility ‘gotchas’
Experience with EJS, smarty, or other templating systems
Experience with version control software (especially git or another dvcs)

Bonus points for:
Having written a client application (in any language) that relies on a remote server for data storage and retrieval
Having written a non-trivial jQuery plugin
Experience with JavaScript/Flash communication via ExternalInterface
Experience with integrating popular web APIs (OAuth, Facebook, Twitter, Google, etc) into client applications
Experience with ActionScript 3 outside the Flash Professional environment (ie, non-timelined code, compiling with mxmlc or similar)
Experience developing on the LAMP stack (able to set up a LAMP install with multiple vhosts on your own)
Experience with profiling/debugging tools
Being well read in Software Engineering practices
Useful contributions to the open source community
Fluency in lots of different programming languages
BS or higher in Computer Science or related field
Being more of an ‘evening’ person than a ‘morning’ person
A passion for music and a desire to revolutionize the industry

Who we don’t want:
Architecture astronauts
Trolls
Complete n00bs (apply for internship or enroll in Grooveshark University instead!)
People who want to work a 9-5 job
People who would rather pretend to know everything than actually learn
Religious adherents to The Right Way To Do Software Development
Anyone who would rather use XML over JSON for RPC

Send us:
Resume
Code samples you love
Code samples you hate
Links to projects you’ve worked on
Favorite reading materials on Software Engineering (e.g. books, blogs)
What you love about JavaScript
What you hate about JavaScript (and not just the differences in browser implementations)
Prototypal vs Classical inheritance – what are the differences and how do you feel about each?
If you could change one thing about the way Grooveshark works, what would it be and how would you implement it?

 

If you want a job:  jay at groovesharkdotcom
If you want an internship: [email protected]

 

Grooveshark is Hiring (Part 1 – PHP edition)

16 May

Grooveshark is looking for talented web developers. We are looking to fill a wide variety of web dev positions. The next few posts I make will be job descriptions for each of the major positions we are looking to fill. For part 1, I’m listing our backend PHP position. If your skillset is more front-end leaning but you feel like you would be a good fit for Grooveshark, by all means apply now rather than waiting for me to post the job description. :)

 

Grooveshark is seeking awesome PHP developers.

Must be willing to relocate to Gainesville, FL and legally work in the US. Relocation assistance is available.
Responsibilities:
Maintaining existing backend code & APIs, creating new features and improving existing ones
Writing good, clean, fast, secure code on tight deadlines
Identifying and eliminating bottlenecks
Writing and optimizing queries for high-concurrency workloads in SQL, MongoDB, memcached, etc
Identifying and implementing new technologies and strategies to help us scale to the next level

Desired Qualities:
Enjoy writing high quality, easy to read, self-documenting code
A passion for learning about new technologies and pushing yourself
Attention to detail
A high LOC/bug ratio
Able to follow coding standards
Good written and verbal communication skills
Well versed in best practices & security concerns for web development
Ability to work independently and on teams, with little guidance and with occasional micromanagement
More pragmatic than idealistic

Experience:
Experience developing on the LAMP stack (able to set up a LAMP install with multiple vhosts on your own)
Extensive experience with PHP
Extensive experience with SQL
Some experience with Javascript, HTML & CSS though you won’t be required to write it
Some experience with lower level languages such as C/C++
Experience with version control software (especially dvcs)

Bonus points for:
Well read in Software Engineering practices
Experience with a SQL database and optimizing queries for high concurrency on large data sets.
Experience with noSQL databases like MongoDB, Redis, memcached.
Experience with Nginx
Experience creating APIs
Knowledge of Linux internals
Experience working on large scale systems with high volume of traffic
Useful contributions to the open source community
Fluency in lots of different programming languages
Experience with browser compatability weirdness
Experience with smarty or other templating systems
BS or higher in Computer Science or related field
Experience with Gearman, RabbitMQ, ActiveMQ or some other job distribution/message passing system for distributing work
A passion for music and a desire to revolutionize the industry

Who we don’t want:
Architecture astronauts
Trolls
Complete n00bs (apply for internship or enroll in Grooveshark University instead!)
People who want to work a 9-5 job
People who would rather pretend to know everything than actually learn
Religious adherents to The Right Way To Do Software Development
Anyone who loves SOAP

Send us your:
Resume
Code samples you love
Code samples you hate
Favorite reading materials on Software Engineering (e.g. books, blogs)
Tell us when you would use a framework, and when you would avoid using a framework
ORM: Pros, cons?
Unit testing: pros, cons?
Magic: pros, cons?
When/why would you denormalize?
Thoughts on SOAP vs REST

If you want a job: jay at groovesharkdotcom

If you want an internship: [email protected]

 

Improve Code by Removing It

17 Oct

I’ve started going through O’Reilly’s 97 Things Every Programmer Should Know, and I plan to post the best ones, the ones I think we do a great job of following at Grooveshark and the ones I wish we did better, here at random intervals.

The first one is Improve Code by Removing It.

It should come as no surprise that the fastest code is code that never has to execute. Put another way, you can usually only go faster by doing less. And of course, code that never runs also exhibits no bugs. :)

Since I started at Groovshark, I’ve deleted a lot of code. 20% more code than I’ve ever written. More code than all but one of our developers has contributed. Despite that, I think we’ve only actually ever removed one feature, and we’ve added many more.

One of the things I see in the old code is an over enthusiastic attempt to make everything extremely flexible and adaptive. The original authors obviously made a noble effort to try to imagine every possible future scenario that the code might some day need to handle and then come up with an abstraction that could handle all of those cases. The problem is, those scenarios almost never come up. Instead, different features are requested which do not fit with the goals of the original abstraction at all, so then you end up having to work around it in weird ways that make the code more difficult to understand and less efficient.

Let me try to provide a more concrete example so you can see what I’m talking about. We have an Auth class that really only needs to handle 3 things:

  • Let users log in (something like Auth->login(username, password)
  • Let users log out (something like Auth->logout()
  • Find out who the logged in user is, if a user is logged in. (Auth->getUser())

Should be extremely straightforward, right? Well, the original author decided that the class should allow for multiple authentication strategies over various protocols in some scenarios that could never possibly arise (such as not having access to sessions) even though at the time only one was needed. Instead of ~100 lines of code to just get the job done and nothing else, we ended up with 1,176 lines spanning 5 files. The vast majority of that code was useless; our libraries are behind other front-facing code so the protocol and “strategy” for authenticating is handled at one level higher up, and we always use sessions so that no matter how a user logged in, they are logged in to all Grooveshark properties. When we finally did add support for a truly new way to log in (via Facebook Connect), none of that code was useful at all because Facebook Connect works in a way the author could never have anticipated 2 years ago. Finally, because the original author anticipated a scenario that cannot possibly arise (that we might know the user’s username but not their user ID), fetching the logged-in User’s information from the database required a less efficient lookup by username rather than a primary key lookup by ID.

Let’s step back a moment and pretend that the author had in fact been able to anticipate how we were going to incorporate Facebook Connect and made the class just flexible enough and just abstract enough in just the right ways to accommodate that feature that we just now got around to implementing. What would have been the benefit? Well, most of the effort of implementing that feature is handling all of the Facebook specific things, so that part would still need to be written. I’d say at best it could have saved me from having to write about 10 lines of code. In the meantime, we would have still been carrying around all of that extra code for no reason for a whole two years before it was finally needed!

Let’s apply YAGNI whenever possible, and pay the cost of adding features when they actually need to be added.

Edit:
Closely related: Beauty is in Simplicity.

 

Why You Should Always Wrap Your Package

30 Jul

Ok, the title is a bit of a stretch, but it’s a good one isn’t it?

What I really want to talk about is an example of why it’s a good idea to make wrappers for PHP extensions instead of just using them directly.

When Grooveshark started using memcached ever-so-long-ago, with the memcache pecl extension, we decided to create a GMemcache class which extends memcache. Our main reason for doing this was to add some convenience (like having the constructor register all the servers) and to add some features that the extension was missing (like key prefixes). We recently decided that it’s time to move from the stagnant memcache extension to the pecl memcached extension, which is based on libmemcached, which supports many nifty features we’ve been longing for, such as:

  • Binary protocol
  • Timeouts in milliseconds, not seconds
  • getByKey
  • CAS
  • Efficient consistent hashing
  • Buffered writes
  • Asyncronous I/O

Normally such a transition would be a nightmare. Our codebase talks to memcached in a million different places. But since we’ve been using a wrapper from day 1, I was able to make a new version of GMemcache with the same interface as the old one, that extends memcached. It handles all the minor differences between how the two work, so all the thousands of other lines in the app that talk to memcached do not have to change. That made the conversion a <1 day project, when it probably would have otherwise been a month long project. It also has the advantage that if we decide for some reason to go back to using pecl memcache, we only have to revert one file.

 

You don’t know when you’re done if you don’t know what done is

08 Aug

As I mentioned in an earlier blog post, we are working hard on a 2.0 version of the product. One of the questions our team is asked quite frequently is “when will it be ready?” This question is impossible for us to answer, because ready is not defined. It’s one of the dangers of working without a real spec.

There are a lot reasons why we don’t use a real spec, none of which are up to me so I’ll not discuss them here.

Whenever 2.0 gets close to what everyone thinks we have agreed on, people look and poke at it, decide they don’t like things or realize nobody ever asked for critical feature x, or it somehow didn’t make it onto our bug list, and then we have to go back and get new designs, file a bunch of bugs, and set a new milestone. Repeat, repeat, repeat. It’s not just for 2.0 that it works this way, it’s whenever you’re working without a complete spec. When the goal posts for done are constantly moving, the question of “when will it be done?” Is really a question of “when will the goal posts stop moving?”

To break the cycle, we’re picking a done date, and mandating that the goal posts stop moving some time before that date. Working towards that, we’ve submitted our “last chance” milestone, meaning after this milestone any decisions/changes/designs must be final, because we’re going to call it done when those have been implemented, or when we hit our chosen date, whichever comes first.

 

Leaky Abstractions

12 Sep

As with nearly every issue in Software Engineering worth thinking about, Joel Spolsky has written an article about leaky abstractions that is very relevant to some problems I ran into tonight.

Abstractions do not really simplify our lives as much as they were meant to. [...] all abstractions leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning [...] and all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder

In my case, I was not working with programming tools per se, more of an abstraction of an abstraction built into the Grooveshark framework that is meant to make life as a programmer easier. And normally it does. But in this case, that abstraction was wrapped in a couple more layers of abstraction away from where my code needed the information, and somewhere in there the information was, for lack of a better description, being mangled. The particular form of mangling is actually due to a lower level abstraction at the DB layer, but is not handled in the higher level of abstraction because for most uses it doesn’t matter.

From the level of abstraction where my code was sitting, the information needed in order to un-mangle the data was simply not available. I went through the various layers to find a convenient place to put the un-mangling, but by the time I found a place, everything was so abstract that I couldn’t confidently modify that code and see all potential ramifications, so I just did an end-run around many of the layers of abstraction. This is certainly not ideal, but it works. The point is, an abstraction intended to make life as a programmer slightly easier most of the time, can easily make life as a programmer significantly more difficult on edge cases. Unfortunately, once a code base has been established, most cases are edge cases: feature additions, new products built on top of the old infrastructure, etc., all or most unforeseen, and therefore unaccounted for during the original design process.

 

Who will monitor the monitors?

21 Aug

This is the second time that an error in a monitoring/testing tool that we use has caused me to waste a good deal of time trying to solve a non-problem.

The first was when we were using a tool to test the scaling performance of a new feature. According to this tool the performance was absolutely abysmal with only 50 virtual clients, and I spent a couple of days writing alternative algorithms and trying to test them with this tool. The scalability was horrible no matter what I did, so I added some logging to see if something was going on that wasn’t supposed to be. It was. The testing tool was sending completely wrong information, creating completely unrealistic scenarios. The testing implementation was done by another developer who was sure that he had set it up right, and I was not familiar with the tool so I assumed that it was a problem with my code. Having another developer write the tests was supposed to save me time.

The latest monitoring snag was caused by a monitoring tool that, for our convenience, tails the php error log and sends the last couple hundred lines to us whenever there is an error. We kept getting the same error periodically and could not track down the source; everything seemed to be working perfectly, but we still got the error. Usage of the site every day is breaking new records so we assumed that it had something to do with the unprecedented load on servers, but could never pinpoint the source of the errors. Today I looked a bit more closely at the error log only to discover that the log was from 3 days ago!

So the question is, when your testing or monitoring tools report an error, how do you ensure that the error is real and not just in the tool itself?

 

The pain of project management

30 May

For the second time now, Grooveshark is making a serious effort to utilize project management software.

I don’t like the new project management solution, even though it is a vast improvement over the old one, and I think I’ve discovered why: I don’t like project management. I don’t like project management because project management is not for me; it’s for managers. Managers and developers have vastly different interests. Developers want to know what bugs are in their code, what features they need to develop, a way to view dependencies, and a way to see which bugs/features are most important. Bugzilla fits those needs perfectly. Managers, on the other hand, need to make sure that they are maximizing productivity by making sure devs are never sitting idle, they need to know what is going to happen when; they need ship dates.

The thing about bug tracking is that the people who benefit from it are the same people who have to do the work. If you want to see a bug get fixed, you file it. If you want to find bugs to fix, use the bug tracker. Project management tools, on the other hand, require a vast amount of work from the people who don’t need them: developers. That drastically decreases the chances that they will be used consistently. It also means that project managers end up working against their own goal: they reduce productivity. This is consistent with Le Chatelier’s Principle: Complex systems tend to oppose their own proper function.

It’s actually a more pronounced effect than just wasting dev time on project management. At least for me, having multiple people assign tasks to me with various arbitrary deadlines tends to make me feel like I am being micromanaged, which increases stress and also keeps me from being able to focus on any one thing for long enough to accomplish much. (“I’ve been working on this thing for 3 hours and I have all these other things to do, I should put it on the back burner and move on…”)

What’s the solution? I don’t know, but a good start would be to combine project management with bug tracking, ala FogBugz, set overarching deadlines for projects, let your devs work out the details, and be flexible. Plan for other things coming up, because new and very important bugs will pop up all the time, servers will break, and meetings will happen (sadly). If your developers are good, hardworking employees (and Grooveshark devs are), they will strive to meet or exceed the goals you set for them, without being micromanaged.