GitHub for reviewing code

A couple of weeks ago we started (in my current job) to use GitHub internally for our projects. We were already using git, so it sort of make sense to use GitHub, as it is very widespread and used in the community. I had used GitHub before, but only as a remote repository and to get code, but without much interaction with the “GitHub extra features”. I must say, I was excited about using it, as I though that it will be a good step forward in making the code more visible and adding some cool features.

One of the main uses we have for GitHub is using it for code reviews. In DemonWare we peer-review all the code, which really improves the quality of the code. Of course, peer-review is different from reviewing the code in an open software situation, as it is done more often and I suspect than the average number of reviewers is lower in most open source projects. We were using Rietveld, which is a good tool, but probably not stellar when you start using it. The main process was, write code, submit it to Rietveld, ask for reviewers, check and discuss comments, update the code and repeat the process; and wait for approval for the reviewers to submit the code to the remote repo.

What can I say? I am quite disappointed with the result. It seems that the tools for code review in GitHub are not great, to put it lightly. Not great at all.

I guess that my disappointment is big because I though that, as GitHub is so used in the open source community, the review tools should be very good, as open source is all about checking and reviewing code that anyone can send. But the fact is that the review tool itself is pretty limited.

Missing features

Git compare

GitHub compare view which is similar to pull request one

The only way of looking the code is through diffs. That means colored extracts, red for code removed, green for code added. Those are very useful and up to the point, but hardly are enough, specially for big changes of code. It is a good overview, but it doesn’t allow to understand the full picture. Sometimes more context than a couple of lines is needed to understand what’s going on. That’s when side-by-side comparison comes. Surprisingly, GitHub does not allow to check code side by side. I mean, that is, IMHO, really really basic and powerful tool, and I cannot understand why it is not available in GitHub. It is available from ages ago in any IDE or in tools like Meld, WinMergeKompare or Kaleidoscope. Even VimDiff. Rietveld uses side-by-side comparison, allowing to show the full code if needed.

It is also complicated to get a view of how the code changes from the first submitted code for review, to the current status (last submitted commit), as well as to previous intermediate steps and to the original code. You can check one commit to the next, but not easily between all the different steps. After all, is not unusual that a particular review has two or three intermediate steps until everyone is happy with the resulting code.

Another problem is the flow of the page. New comments and commits get added at the bottom of the page. My way of looking at it will be to add them at the top, so I can refresh the page and check the news, but that’s not very important. What it’s important is that not everything new is added at the bottom. New general comments and commits are, but the comments that addresses parts of the code adds up on the original position, the diff parts. There is also no way of distinguish new comments, so it’s easy to skip them.

Also, there is no way of closing a comment. Something like marking as “done”, “closed” or similar. That makes it difficult to keep track of what comments are still relevant after implementing some in a new code submission. Even worse, if there is a comment related to some code that changes, the comment gets hidden (and marked as “updated”). Wait a second, just because I changed some code, it doesn’t mean that the comment is not relevant any more. It MAY be outdated, but I want someone to take that decision. Let’s say I submit some changes, someone else reviews it and thinks it should be done in a different way. If I change the code again (in a new commit), probably the reviewer wants to be sure that the new code is good. The way it’s implemented makes it very easy to just miss the new code. Sure, you can open a new comment, if you remember where to look, but the tool should point you “hey, you said before that this was interesting, maybe it still is”.

In Rietveld, the way this works is through “revisions”, so each revision has a code associated and comments. Publishing a revision publish all the comments at once, and you can always check between revisions. As you check one revision against another, you can see the comments and keep them going to the new revision if necessary. Comments can be marked as “done”.

Annoying parts

Another annoying feature is sending an individual email for each comment. I already receive too any emails every day, and most of them, created by tools, are pretty much spam. Filtering helps, but can be very tricky when the number of notifications grow… As the signal to noise ratio gets lower, is more and more difficult to get the information, so it’s easier to skip the important bits. Email as a notification system is not great. And getting one email per comment is way just too much, as there can be a significant number of comments per review, specially if the review grows. There is also no options to digest all those notifications somehow, so you can maybe submit a group of related comments as a unique review. That, at least, gives less email, with more info per notification.

I think that with a better notification system, one that will make very clear what comment and changes are new and related to which commit (maybe using colors), email could be almost completely avoided, other than an optional “something new has happen in this project, visit project page to check” that won’t be sent again until you visit the page. I know that this sounds easier that it actually is, but that will be a great improvement towards reducing the number of emails and increasing the relevance of each one.

Rietveld send only one email per revision. Even when the comments, changes, etc are present in the email, it is basically a reminder to go to the page and check the new revision, as it is clear what is new.

All those details, after using GitHub, has made me appreciate Rietveld as a tool that is probably not very impressive at first glance, but it clearly gets straight to the point for code reviews, and it has been done with focus on peer-review. I guess is one of the Guido Van Rossum (creator also of Python) talents, know what are the bits and pieces that resonate in your daily work and integrate well with your flow.

Messy git history

Last but not least, I’m not totally sold to the “pull request” way of doing things, in the sense that it forces you to submit your changes to the remote repo, and then do a merge on top to the main branch. Well, one of the things that I find more appealing of git is the way that you can create a clean, meaningful git history. Using carefully rebase and commit –amend, the git history is a very good tool to know what features and bugfixes are implemented, where and when. But the GitHub pull request way is not really encouraging that.

First of all, I think that a pull-request merge that will actually rebase the new branch on top of master is something that should be already implemented. It is a very basic thing that can easily avoid useless and redundant merge commits that only adds noise to the git history.

I’d also love to see a way of squashing together “review commits”, the ones that are only refactoring small pieces of code of the previous commit, or fixing typos; so keeping core functionality together and keeping meaningful, clean commits. This probably is more difficult to implement in a way that is both simple and does not promote abuse. I know that this can be achieved by doing it locally and doing force push, but it feels to me that it’s a very hacky way of doing something that should be natural.

GitHub, I’m counting on you to fix it

Of course, I understand that some of the features are done in a certain way because the typical workflow in GitHub is different that peer-review, but as GitHub is also promoting its way into companies (and not only open source projects) I think that it needs to improve sensibly all those tools. They have obviously the talent to achieve this and some of the stuff is pretty basic and shoulnd’t be that difficult to implement. Being honest, I find totally unbelievable that some really basic stuff like side-by-side comparison and a better email system are not available. It is not that Rietveld is perfect, though. But it seems to give the minimum and do it in a way that gets integrated in your regular workflow.

I’d also like to say that I think that GitHub is a very interesting project for many reasons. They have done a great job in promoting the use of git and in behalf of the open source community. It is also a joy to watch and it’s full of really nice small interface details. All that it is very important and I really appreciate. That’s why I am so shocked with the limiting code review tools they have….

So, GitHub, if you are reading this, go fix it.

4 thoughts on “GitHub for reviewing code

  1. I think that email notifications can be turned off and only use web notification. Unfortunately you can not selectively disable pull request notifications.

    With webservice hooks you can receive all notifications on a custom server and then implement your own notifications rules.

    This is how I plan to filter/route notifications for the whole team.

    —-

    My biggest annoyance is the lack of support for “retargeting” a pull request.

    Does your branch workflow involve “stacked”/”chained”/”dependent” topic/feature branches?

    For implementing non-trivial “user story” (or how you call a product feature) the diff will be larger than 1000 lines of diff (just a random limit).
    To help with review, I break the code into a series of branches, each delivering usable/testable ready to review code.
    When one such “sub-feature” branch is ready, I prepare the review, send it for review and then create a new branch and implement the goal feature on top of this branch.

    By doing so, I can end up with a backlog of up to 2 or 3 branches.

    Since the 3rd branch is based on the 2nd, I should pull the request to 2nd, but at some time 2nd branch will be merged into 1st or master and the pull request will no longer update it’s diff and a new pull request is required.

    I am still looking for a workaround for this issue.

    Thanks!

    • About notifications, right now is an ‘all-or-nothing’ thing. Either get each individual comment, or not receiving any.

      Using a webservice hook to filter notifications looks a little complex, and probably needs time to done right, but I’ll take a look.

      Yes, I know what you mean about the pull requests when more than one branch is involved and the code update. That’s also not very well handled, as the diffs are not updated with new code (only hidden when the code change), which can be confusing…
      I’m afraid I can’t think of a better way to do it…

  2. Hi,

    Here is a quick hack for GitHub Service Hooks Server https://github.com/chevah/txghserf

    GitHub Issue Management is too simple for my taste, so I use this serve to sync GitHub activity with Trac (my main project management tool). For example, close Trac ticket on merge, change Trac ticket state when a new pull request is created.

    Hope it helps.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s