Git/git-svn

Overview

Subversion is an extremely popular version control system, and many OSS and proprietary projects use it. Git comes with an excellent utility, git-svn, that allows a user both to track a project maintained in a Subversion repository and to participate in it. Users can generate local patches to send to a mailing list, or even commit changes directly back into the repository (given that they have commit access, of course).

Getting Started

To begin using git with a subversion-hosted project, you must create a local repository for your files, as well as configure git-svn. Typically you'll use commands much like the following:

mkdir project
cd project
git svn init <url to repository root> -T/path/to/trunk
git svn fetch -r <first rev>:HEAD

Usually when working with subversion repositories, you're given the full project URL. To determine the URL of the repository root, you can issue the following command:

svn info <full project URL>

There will be a line in the output stating the repository root. The path to trunk is simply the rest of the URL that follows. It is possible to simply give git-svn the full project URL, but doing it this way gives you greater flexibility should you end up working on subversion branches down the line.
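As a rough illustration of splitting a full project URL into the repository root and the trunk path, here is a minimal sketch. The URL and the sample svn info output are hypothetical stand-ins; in practice the text would come from running svn info yourself.

```shell
# Hypothetical values; in practice these come from "svn info <full project URL>".
full_url="http://svn.example.org/svnroot/project/trunk/project"
svn_info_output="URL: http://svn.example.org/svnroot/project/trunk/project
Repository Root: http://svn.example.org/svnroot/project
Revision: 1234"

# Pull out the value of the "Repository Root:" line.
repo_root=$(printf '%s\n' "$svn_info_output" | sed -n 's/^Repository Root: //p')

# The trunk path is whatever follows the root (and its slash) in the full URL.
trunk_path=${full_url#"$repo_root"/}

echo "root:  $repo_root"
echo "trunk: $trunk_path"
```

With these made-up values, the root would be passed to init as the URL and the remainder (trunk/project) as the -T argument.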

Also note the "first rev" argument. You could simply use "1", as that is guaranteed to work, but fetching the entire history would likely take a very long time (especially if the repository is reached over a slow network). Usually, for an entrenched project, the last 10-50 revisions are sufficient. Again, svn info will tell you what the most recent revision is.
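One way to pick the "first rev" is to subtract the desired depth from the latest revision. The sketch below assumes a made-up svn info excerpt and a depth of 50; it only prints the fetch command rather than running it.

```shell
# Hypothetical excerpt of "svn info" output; the revision number is made up.
svn_info_output="Repository Root: http://svn.example.org/svnroot/project
Revision: 1234"

depth=50   # how many recent revisions to fetch (the text suggests 10-50)

# Read the most recent revision from the "Revision:" line.
head_rev=$(printf '%s\n' "$svn_info_output" | sed -n 's/^Revision: //p')

# Never go below revision 1.
if [ "$head_rev" -gt "$depth" ]; then
    first_rev=$((head_rev - depth))
else
    first_rev=1
fi

echo "git svn fetch -r ${first_rev}:HEAD"
```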

Interacting with the repository

Chances are that if you're using git instead of svn to interact with a subversion repository, it's because you want to make use of offline commits. This is very easy to do if you keep a few caveats in mind. Most important is to never use "git pull" while in a branch from which you plan to eventually run "git svn dcommit". Merge commits have a tendency to confuse git-svn, and you can do almost anything you need without git pull.
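If you are unsure whether a branch has picked up a merge commit, you can ask git before running dcommit. The following is a sketch in a throwaway repository; the branch names "trunk", "work" and "side" are assumptions made for the demonstration, with a local "trunk" branch standing in for the ref that git-svn tracks.

```shell
set -e

# Throwaway repository; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "base" > a.txt
git add a.txt
git commit -q -m "base"
git branch trunk            # stand-in for the upstream ref

# Two branches that diverge from trunk.
git checkout -q -b work
echo "one" > b.txt
git add b.txt
git commit -q -m "work: one"

git checkout -q -b side trunk
echo "two" > c.txt
git add c.txt
git commit -q -m "side: two"

# Merging diverged branches creates a real merge commit on work.
git checkout -q work
git merge -q -m "merge side into work" side

# List merge commits that are on work but not on trunk.
merges=$(git rev-list --merges trunk..work)
if [ -n "$merges" ]; then
    echo "work contains merge commits; avoid running dcommit from it"
fi
```

An empty result from git rev-list --merges means the branch is linear relative to trunk.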

The most common task that I perform is to combine the change(s) I'm working on with the upstream subversion changes. This is the equivalent of an svn update. Here's how it's done:

git stash  # stash any changes so you have a clean tree
git svn fetch # bring down the latest changes
git rebase trunk
git stash apply

The first and last steps are unnecessary if your tree is clean to begin with. That leaves "git rebase trunk" as the primary operation. If you're unfamiliar with rebasing, you should go read the documentation for git-rebase. The gist of it is that your local commits now sit on top of svn HEAD.
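The stash, rebase, apply sequence can be tried safely without a Subversion server. In this sketch a local branch named "trunk" plays the role of the ref that the fetch step would update, and the upstream change is simulated by committing to it directly; both are assumptions made for the demonstration.

```shell
set -e

# Throwaway repository; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.org"

# First commit, then a local "trunk" branch as the upstream stand-in.
echo "base" > app.txt
git add app.txt
git commit -q -m "r1: base"
git branch trunk
git checkout -q -b work

# One local (offline) commit.
echo "local work" > local.txt
git add local.txt
git commit -q -m "local: work in progress"

# Simulate new upstream revisions arriving on trunk.
git checkout -q trunk
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "r2: upstream change"
git checkout -q work

# An uncommitted edit, so the tree is not clean.
echo "unfinished" >> local.txt

# The update sequence from the text (minus the fetch).
git stash > /dev/null
git rebase trunk > /dev/null 2>&1
git stash apply > /dev/null

git log --oneline
```

Afterwards the local commit sits directly on top of the simulated upstream change, and the unfinished edit is back in the working tree.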

Dealing with local changes

It often happens that there are some changes you want to have in a repository file that you do not want to propagate. Usually this happens with configuration files, but it could just as easily be some extra debugging statements or anything else. The danger of committing these changes is that you'll run "git svn dcommit" in the branch without weeding out your changes. On the other hand, if you leave the changes uncommitted, you lose out on git's features for those changes, and you'll have to deal with other branches clashing with the changes. A dilemma!

There are a couple of solutions to this issue. Which one works better is more a matter of taste than anything. The first approach is to keep a "local" branch for each branch in which you want to have local changes. For example, if you have local changes that you want in the branch "foo", you would create a branch "foo-local" containing the commit(s) with the changes you want to keep local. You can then use rebase to keep "foo" on top of "foo-local":

git rebase trunk foo-local
git rebase foo-local foo

As the example code implies, you'll still spend most of your time with "foo" checked out, rather than "foo-local". If you decide on a new change that you want to keep locally, you're again faced with two choices. You can check out "foo-local" and make the commit there, or you can make the commit on "foo", then check out "foo-local" and cherry-pick the commit onto it. In the latter case you would then want to use git-reset to remove the commit from "foo".
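The second choice (commit on "foo", then move the commit to "foo-local") might look like the following sketch. It runs in a throwaway repository; the file names are invented for the demonstration, and a local branch named "trunk" stands in for the upstream ref.

```shell
set -e

# Throwaway repository; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "base" > app.txt
git add app.txt
git commit -q -m "trunk: base"
git branch trunk

# foo-local holds the local-only changes; foo sits on top of it.
git checkout -q -b foo-local trunk
echo "debug=1" > local.cfg
git add local.cfg
git commit -q -m "local: config"

git checkout -q -b foo foo-local
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "foo: feature"

# A new local-only change, committed on foo out of habit.
echo "verbose=1" > debug.cfg
git add debug.cfg
git commit -q -m "local: verbose logging"
local_commit=$(git rev-parse HEAD)

# Copy the commit onto foo-local...
git checkout -q foo-local
git cherry-pick "$local_commit" > /dev/null

# ...drop it from foo, then put foo back on top of foo-local.
git checkout -q foo
git reset -q --hard HEAD~1
git rebase foo-local > /dev/null 2>&1

git log --oneline
```

Note that git reset --hard discards uncommitted work, so the tree should be clean before that step.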

As an alternative to the rebase-centric approach, there is a merge-based method. You still keep your local changes on a separate branch, as before. With this method, however, you don't have to keep "foo" on top of "foo-local" with rebase. This is an advantage, because 1) rebasing is more to type, and 2) historically, rebase has often asked you to resolve the same conflict twice if any conflicts occur during the first rebase.

So instead of using rebase, you create yet another branch. I call this the "build" branch. You start the build branch at whatever commit you want to test. You can then "git merge" the local branch, bringing all your changes into one tree. "But I thought you should avoid merge?" you ask. The reason I like to call this branch the "build" branch is to dissuade me from using "git svn dcommit" from it. As long as it's not your intention to run dcommit from the branch, the use of merge is acceptable.

This approach can actually be taken a step further, making it unnecessary to rebase your topic branch "foo" on top of trunk every day. If you have several topic branches, this frequent rebasing can become quite a chore. Instead:

git checkout build
git reset --hard trunk # Make sure you don't have any important changes
git merge foo foo-local # Octopus merges are fun

Now build contains the changes from trunk, foo, and foo-local! Often I'll keep several local branches. Perhaps one branch has your local configuration changes, and another has extra debugging statements. You can also use this approach to build with several topic branches in the tree at once:

git merge topic1 topic2 config debug...

Unfortunately, the octopus merge is rather dumb about resolving conflicts. If you get any conflicts, you'll have to perform the merges one at a time:

git merge topic1
git merge topic2
git merge local
...
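The build-branch idea can be rehearsed end to end in a scratch repository. The sketch below assumes a local branch named "trunk" as the upstream stand-in and uses invented file names; it rebuilds "build" from trunk and merges a topic branch and a local branch in one step.

```shell
set -e

# Throwaway repository; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "base" > app.txt
git add app.txt
git commit -q -m "trunk: base"
git branch trunk

# A topic branch and a branch of local-only changes, both based on trunk.
git checkout -q -b foo trunk
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "foo: feature"

git checkout -q -b foo-local trunk
echo "debug=1" > local.cfg
git add local.cfg
git commit -q -m "local: config"

# Rebuild the build branch from trunk, then merge everything into it.
git checkout -q -b build trunk
git reset -q --hard trunk
git merge -q -m "build: trunk + foo + foo-local" foo foo-local

git log --oneline --graph
```

Because "build" is thrown away and recreated each time, neither "foo" nor "foo-local" is rewritten by this procedure.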

Sending changes upstream

Eventually, you'll want your carefully crafted topic branches and patch series to be integrated upstream. If you're lucky enough to have commit access, you can run "git svn dcommit". This will take each local commit in the current branch and commit it to subversion. If you had three local commits, after dcommit there would be three new commits in subversion.

For the less fortunate, your patches will probably have to be submitted to a mailing list or bug tracker. For that, you can use git-format-patch. For example, keeping with the three-local-commits scenario above:

git format-patch HEAD~3..

The result will be three files in $PWD: 0001-commit-name.patch, 0002-commit-name.patch, and 0003-commit-name.patch. You're then free to mail these patches off or attach them to a bug in Bugzilla. If you'll be mailing the patches, however, git can help you out a little further. There is the git-send-email utility for just this situation:

git send-email *.patch

The program will ask you a few questions, most importantly where to send the patches, and then mail them off for you. Piece of cake!
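To see what format-patch produces before involving a mailing list, the three-commit scenario can be reproduced in a scratch repository. The commit messages and the "patches" output directory below are assumptions chosen for the demonstration.

```shell
set -e

# Throwaway repository; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "base"

# Three local commits, as in the scenario described in the text.
for n in 1 2 3; do
    echo "change $n" >> notes.txt
    git commit -q -a -m "change $n"
done

# Write one patch file per commit into ./patches instead of $PWD.
git format-patch -o patches HEAD~3.. > /dev/null

ls patches
```

Each file name is built from a sequence number and the commit subject. git send-email also accepts a --dry-run option, which can be useful for checking what would be sent before anything is mailed.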

Of course, this all assumes that you have your patch series in perfect working order. If this is not the case, you should read about "git rebase -i".

Examples

Get Pywikipedia:

$ git svn init http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/
Initialized empty Git repository in .../.git/
$ git svn fetch -r 1:HEAD
...
r370 = 318fb412e5d1f1136a92d079f3607ac23bde2c34 (refs/remotes/git-svn)
        D       treelang_all.py
        D       treelang.py
W: -empty_dir: treelang.py
W: -empty_dir: treelang_all.py
r371 = e8477f292b077f023e4cebad843e0d36d3765db8 (refs/remotes/git-svn)
        D       parsepopular.py
W: -empty_dir: parsepopular.py
r372 = 8803111b0411243af419868388fc8c7398e8ab9d (refs/remotes/git-svn)
        D       getlang.py
W: -empty_dir: getlang.py
r373 = ad935dd0472db28379809f150fcf53678630076c (refs/remotes/git-svn)
        A       splitwarning.py
...

Get AWB (AutoWikiBrowser):

$ git svn init svn://svn.code.sf.net/p/autowikibrowser/code/
Initialized empty Git repository in .../.git/
$ git svn fetch -r 1:HEAD
...
r15 = 086d4ff454a9ddfac92edb4013ec845f65e14ace (refs/remotes/git-svn)
        M       AWB/AWB/Main.cs
        M       AWB/WikiFunctions/WebControl.cs
r16 = 14f49de6b3c984bb8a87900e8be42a6576902a06 (refs/remotes/git-svn)
        M       AWB/AWB/ExitQuestion.Designer.cs
        M       AWB/WikiFunctions/GetLists.cs
        M       AWB/WikiFunctions/Tools.cs
r17 = 8b58f6e5b21c91f0819bea9bc9a8110c2cab540d (refs/remotes/git-svn)
        M       AWB/AWB/Main.Designer.cs
        M       AWB/AWB/Main.cs
        M       AWB/WikiFunctions/GetLists.cs
r18 = 51683925cedb8effb274fadd2417cc9b1f860e3c (refs/remotes/git-svn)
        M       AWB/AWB/specialFilter.Designer.cs
        M       AWB/AWB/specialFilter.cs
r19 = 712edb32a20d6d2ab4066acf056f14daa67a9d4b (refs/remotes/git-svn)
        M       AWB/WikiFunctions/WPEditor.cs
r20 = 3116588b52a8e27e1dc72d25b1981d181d6ba203 (refs/remotes/git-svn)
...

Beware: this download can take about an hour.