Your project's RCS history affects ease of contribution (or: don't squash PRs)

ereyes01 · on May 21, 2016

Your commit history should be readable like your code and your tests. It's part of the final product you deliver to the rest of your team. It's easy to forget or ignore your commit history, which is probably what leads people to encouraging squashing all commits in a PR. Ugly commit history is a code smell.

I agree with the OP that it's a bad blanket policy. We are fortunate that git has powerful abilities to edit commit history (git rebase -i is my favorite). Rather than just requiring everything to be squashed as a matter of policy, maintainers should instead request that commit history be cleaned up if it's not presentable, IMO.

newjersey · on May 21, 2016

> Your commit history should be readable like your code and your tests. It's part of the final product you deliver to the rest of your team. It's easy to forget or ignore your commit history, which is probably what leads people to encouraging squashing all commits in a PR. Ugly commit history is a code smell.

> I agree with the OP that it's a bad blanket policy. We are fortunate that git has powerful abilities to edit commit history (git rebase -i is my favorite). Rather than just requiring everything to be squashed as a matter of policy, maintainers should instead request that commit history be cleaned up if it's not presentable, IMO.

For what it is worth, I think history is sacred. Never rewrite history (except in extraordinary circumstances and that too very rarely). Of course, do whatever you want on your local machine before pushing to remote but once you have pushed, stop worrying about it.

ereyes01 · on May 21, 2016

> Of course, do whatever you want on your local machine before pushing to remote but once you have pushed, stop worrying about it.

This is what I'm talking about. After you push and share with others, I think you are absolutely right. Before it leaves your computer via git push to a shared location, I advocate that you should tidy things up. Thanks for clarifying...

ikawe · on May 20, 2016

On the other hand, a couple years down the road, a generic pull-request-level comment (added feature foo) can be more useful than a bunch of noisy commits like

    - WIP: feature foo
    - fixed typo
    - whoops! 
    - working
    - reformatted

As is often the case with new contributors to open source projects.

In theory --no-ff merges give you the best of both worlds, but in practice, I traverse git history with 'git blame' or 'git log -p <filename>' neither of which do a good job of surfacing "this commit was merged as part of feature/foo".

Of course, all the info is there... But I haven't yet found a good UI for it.

geofft · on May 20, 2016

I have the following aliases in ~/.gitconfig:

    [alias]
        log0 = log --first-parent -m
        show0 = show --first-parent -m
        blame0 = blame --first-parent -m

These aliases treat history as if each merge commit introduced the changes directly, and didn't have the individual history. (Even for one-patch PRs, this means that the log/blame view will point me to a commit with a link to the pull request, which means I can see any discussion of the PR.) For instance, `git show` on a merge commit will typically show no output (unless there was a merge conflict), but `git show0` will show all changes introduced by the branch being merged in, as if they had all been squashed down.

My experience is that I've really appreciated projects with a policy of mandatory merge commits (e.g. Rust; every commit is tested and merged by the CI bot, and humans only push to master in emergencies) with these in place.

infogulch · on May 20, 2016

Aha! This is what I need to make merge-based workflows palatable! Now if only the Github UI could do the same...

fzzzy · on May 21, 2016

This is the real solution here, updating github's ui so it does it like this, but still allows viewing the individual commits if needed.

fennecfoxen · on May 20, 2016

In theory, this is why we rebase our history and make an elegant series of coherent patches. At least if "we" are the Linux dev team or something.

ikawe · on May 20, 2016

Yup, I agree that would be ideal. I think squash commits are a pragmatic way to deal with messy incoherent commits.

Plus it's easier to revert a squashed commit than a merge. (which is the index number of the parent I want?!)

nfm · on May 20, 2016

I think everyone would agree that the above commit history is bad, and should be avoided.

But solving that issue by adopting a squash-on-merge policy is like doing surgery with a chainsaw. There are better tools for the job.

Upas · on May 20, 2016

There is definite value in having more granular commits, but at the same time, there are commits that don't really belong in the commit history.

While it might take more time than squashing everything, git rebase -i really helps a ton in these situations. I can fixup all the "fixed typo" type of commits into a meaningful commit, and even reword commits if necessary.

nas · on May 21, 2016

I've had exactly this experience multiple times. Having nice logical commits makes it a heck of a lot easier to port features around to different branches later on (if you decide you want to). It makes 'git annotate' much more useful. It makes 'git bisect' more useful. It makes backing out a feature easier.

I don't want to see some massive 3 month worth of feature hacking, 5000 line modification get squashed down into a single commit. I want it broken into the smallest possible working and incremental changes that transform the old code into the new. Each step should make sense for the code maintainers who come along later and try to figure out what you did and why something unexpected broke. ;-)

I don't want to see 200 noisy commits like you say. I've worked with code bases that used both of these styles. The former is vastly superior in my experience.

MaulingMonkey · on May 21, 2016

The problem I find is that squashing those commits together will mean I can't easily figure out what typo was fixed in commit 2, because of the diff-spam reformatting of commit 5. This makes it that much harder to figure out that the typo was a variable name in a dynamic language where all use cases on-branch were fixed, but other use cases (e.g. in my own personal private branch) were not.

I care much more about the actual mechanical code change which broke my stuff, than the feature or branch that happened to contain that mechanical code change.

The same problems apply to reviewing the code in the first place. I'll gladly review 10x the code if it's split into nice tiny micro-commits, even messy ones, than try and suss out exactly all the moving parts that changed from a squashed diff to make sure all potential concerns are addressed. Commit #2 might trigger a "email your coworkers to alert them of the breaking change when submitting" review note, whereas I'm liable to miss the fact that you fixed a typo in the first place in a squashed commit.

dllthomas · on May 21, 2016

As I understood the example, the typo was meant to have been introduced in the "WIP feature foo" commit, and the reformatting was meant to be of the changed code as well.

marssaxman · on May 20, 2016

Some people clearly pay a great deal more attention to commit history than I ever do. The only difference this makes to me is that adding more hoops I have to jump through before you'll accept my patch makes it less likely that I will bother to submit one in the first place.

jonaf · on May 20, 2016

Regardless of your opinion, if you have any interest in code quality (hint: you should), then you should respect the contributing guidelines for the project you are contributing to.

Your argument / opinion as stated could have "commit history" replaced with "unit tests" and in many circles (for some reason), people are still making that argument. For a project to be healthy, a critical eye must be used for every change: from the code, to the tests, to the commit message. If you don't care about any one of these, you are saying you don't care about the quality of the software. I would deny your PR on principle. My project needs more quality than it needs more contributors.

Note that I did not suggest squashing is good or bad. My opinion on that is not related to the fact that you should care about it and treat it as just as important as your contribution, and you should respect the project enough to abide their guidelines for contribution (unless your contribution is revising those guidelines).

marssaxman · on May 22, 2016

Of course. I'm not talking about projects where I have already made a commitment to participate, though: I'm talking about the process before that, where there's some open-source project I am using or otherwise interested in, and I consider getting involved and contributing. The higher the initial barrier to entry, the more likely it is that I'll just shrug my shoulders and say "fuck it". A complex process for contributing changes, with lots of unusual fiddly steps about exactly how commits should be formatted, makes it more likely that I won't ever make that first commit - especially because the first commit tends to be something trivial where it's as much about dipping my toe in and seeing if I want to do something as about the actual change I've made.

So, yes, I certainly do respect the project enough to abide by their guidelines for contribution: but if those guidelines demand too much work of me, that respect will take the form of quietly walking away and spending my time on something else, and the maintainers of the project will likely never know I had any interest in helping to begin with. Guessing that there might be other people like me in the world, then, I suggest that having a complex and specific process for open-source contribution is likely to reduce the pool of devs willing to get involved, and it's worth considering whether the amount of work saved by requiring devs to jump through these hoops outweighs the obstacle that creates for recruitment.

richmarr · on May 20, 2016

I've let at least one PR go stale after having made the maintainer's requested changes, only to find a secondary list of demands such as squashing and 'comments too long'. Way to suck all the enthusiasm out of contributing.

ams6110 · on May 20, 2016

I've had exactly the same experience. I submit a fix for a bug, OK clean up a few cosmetic things, resubmit, then get another list of nitpicks and I just dropped it there. I remember thinking "fuck it, clean it up yourself if you like, or just keep the bug."

ealloc · on May 21, 2016

As a maintainer who sometimes requests PR submitters to make style changes or even squash their PRs: Please tell me you don't want to finish it so that I can do so.

I am reluctant to merge your PR if it has too many style issues or improper commit history. The last 10% can be important when maintaining a large project.

Because of the way git/github work, modifying someone else's PR (by pulling locally + modifying + resubmitting as a new PR) often means their authorship in the commit headers is erased, and I wouldn't want to do that without permission.

richmarr · on May 21, 2016

Totally understandable, but I don't think project owners/maintainers behave quite as rationally as you're presenting here. 'Important', 'style issues', 'improper' and 'large project' are open to interpretation and ego comes into play.

Edit: Not saying PR authors behave perfectly rationally either. Sometimes it's kids in long trousers failing to share their toys, and in the end "my ball my rules" wins.

geofft · on May 20, 2016

I do feel like this sort of thing should be the maintainer's responsibility. I'm pretty familiar with `rebase -i`, but only because I've done a bunch of work with git. Random contributors are as likely as not to be brand new to git. It should be reasonable to tell the maintainer, "Here's my branch, feel free to rebase or squash as you see fit."

Of course, I think the same is true with things like comments, obvious coding style fixes, tests (especially if the maintainer has more access to test infrastructure), etc. I can see the advantage of getting people to do it themselves if the maintainer is busy or especially if the contributor is planning on sticking around for a while, but surely the maintainer can ask "Hey, would you <...>? If not, I can do it."

panic · on May 20, 2016

IMO, comments and tests should be written alongside the original code, not left up to someone else. If you don't have access to the test infrastructure, how did you test your code before submitting it?

geofft · on May 21, 2016

In general, I test things in my environment, for the use case I have, which is the only way I can confirm that it fixes the actual problem I'm having. I'm not an expert in every test harness out there, and I generally expect the maintainer knows better than I do.

But beyond that, for most projects, tests apply to multiple environments, and I'm only guaranteed to have mine. If a project is supported on Windows, Mac, and Linux, on Python 2.7, 3.3, 3.4, and 3.5, I probably have one, maybe two, of those twelve choices. It makes more sense to reuse the project's test infrastructure than for me to track down the other eleven configurations. And as a random contributor, I might not have access to the test environments.

Of course, if the tests fail, that's a valid reason to push back on me (or for the maintainer to fix the problem and squash, if it requires access to an environment I don't have).

Manishearth · on May 21, 2016

You should squash PRs. What you shouldn't do is indiscriminately squash PRs. Having a bunch of "oops, fix" and review fix commits lying around makes it harder for everyone too. It also affects bisection much more. "don't squash PRs" is just bad advice; it's more nuanced than that.

Split your work into bite sized pieces, and amend/squash the relevant fixups into these before merging.

Since these tasks can need knowledge of rebase, I offer to do them for new contributors too, which usually fixes that issue.

ashitlerferad · on May 21, 2016

Use `git rebase -i` but don't squash PRs.

Manishearth · on May 21, 2016

git rebase -i is one of the ways you can squash PRs and there are tons of cases where you should.

geuis · on May 21, 2016

I cannot disagree with this more. Commits should be feature-based, with thorough commit notes. A feature can be a one line change all the way up to something affecting hundreds of lines (or more).

High quality code standards (good variable naming, plentiful comments, etc) tell you what's happening at the code level. The commit history tells you what's happening at the feature level.

Don't pollute history with noise.