Splitting a Git repository

First, some backstory…

I had a public and a private Subversion repository on my web server, and when I started a new project I’d import it into one of them. This is nice because I get versioning and history, plus I get implicit synchronization between my various development boxen.

It’s not unusual to keep one monolithic Subversion repository for many different projects, mainly because setting up a Subversion repository takes a bit of work, especially if you serve it over HTTP. Git, however, makes creating a new repository so easy that there is no excuse not to use one repository per project. You should anyway, since Git doesn’t support checking out just a subdirectory of a repository (at least in the 1.5.x releases I’m using). With a monolithic Git repository you have to clone the whole big tree, even if you only plan on working in one subdirectory.

Since Git is so much more awesome, I plan on converting my Subversion repositories over. But what do I do with the multi-project ones? There’s no built-in mechanism for pulling out just one directory, with history, into a new repository. So I wrote one.

The usage is git-pluck src-repo dest-repo path/to/directory. The script copies the repository at src-repo to a new directory, dest-repo, and then does its magic. It rewrites all of the commits so that all of the files in path/to/directory are moved into the root of the repository, and everything else is deleted. Commits that do not introduce changes to that directory are removed from the history, potentially including the first commit to the repository. Finally, the reflogs and backups are removed and the repository is compacted, leaving you with a small, single-project repository.
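The per-commit rewrite described above (files under path/to/directory move to the root, everything else is dropped) can be sketched as a transform over the records printed by git ls-files -s, which have the form "<mode> <object> <stage><TAB><path>", one of the input formats git update-index --index-info accepts. This is a hypothetical Python helper, not the git-pluck script itself (which is bash); the name pluck_index_entries is made up for illustration, and the sketch assumes git has not C-quoted any paths (names with unusual characters would need git ls-files -z and NUL-separated handling).

```python
import sys
from typing import Iterable, List


def pluck_index_entries(lines: Iterable[str], prefix: str) -> List[str]:
    """Hypothetical helper: rewrite `git ls-files -s` records so that files
    under `prefix` move to the repository root and all other files are dropped.

    Assumes unquoted paths (no C-quoting by git) and one record per line.
    """
    prefix = prefix.rstrip("/") + "/"   # "a/b" and "a/b/" behave the same
    out: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # "<mode> <object> <stage>" and the path are separated by a TAB.
        meta, sep, path = line.partition("\t")
        if not sep:
            raise ValueError(f"malformed ls-files record: {line!r}")
        # Keep only paths inside the plucked directory, minus the prefix;
        # the trailing slash keeps "a/b2/x" from matching prefix "a/b".
        if path.startswith(prefix):
            out.append(f"{meta}\t{path[len(prefix):]}")
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    # Filter mode: read records on stdin, write rewritten records to stdout.
    for record in pluck_index_entries(sys.stdin, sys.argv[1]):
        print(record)
```

Given a directory argument, it reads records on standard input and writes the rewritten ones to standard output, so in principle it could sit in the middle of a git ls-files -s | ... | git update-index --index-info pipeline inside an --index-filter, similar to the index-filter examples in the git-filter-branch manual.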

This script was written and tested using Git 1.5.6.5. Feedback is welcome!

3 Replies to “Splitting a Git repository”

  1. If you look at the script, you will see that it is a bash script that uses git-filter-branch, among other things. It uses --index-filter and --commit-filter rather than --tree-filter and --prune-empty, for the following reasons:

    1. With --tree-filter, each commit is checked out into a temporary working tree, your filter is run, and the commit is recreated from the state of that working tree. --index-filter instead rewrites the index, which git populates from the commit’s tree without writing any files to disk, so no checkout is needed. This makes --index-filter substantially faster.

    2. The --prune-empty option was only added to git-filter-branch sometime on or after Git 1.6.0, so I use a commit-filter that does exactly the same thing.

  2. You might also be interested in Avery Pennarun’s git-subtree (git://github.com/apenwarr/git-subtree.git), which is designed for a similar workflow and is a potential future addition to git or git/contrib. It works very well in my experience.
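The pruning mentioned in the first reply (a commit-filter standing in for --prune-empty) boils down to one rule. A commit-filter receives the commit’s new tree and its already-rewritten parents, so it only has to compare the tree with the parent’s tree and call skip_commit when they match. The sketch below is a hypothetical Python model of that rule over a linear history, not the actual filter; prune_empty is an illustrative name, merge commits are out of scope, and it also treats a root commit whose tree has become empty as a no-op, matching the note that even the first commit may disappear.

```python
from typing import List, Optional, Tuple

# SHA-1 of Git's empty tree object.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def prune_empty(history: List[Tuple[str, str]]) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical model: drop commits whose rewritten tree is unchanged.

    `history` is a linear list of (commit_id, tree_id) pairs, oldest first,
    describing trees *after* the index rewrite. Returns the surviving commits
    as (commit_id, parent_id) pairs, with parents re-pointed past skipped
    commits, much as skip_commit does inside a commit-filter.
    """
    kept: List[Tuple[str, Optional[str]]] = []
    parent: Optional[str] = None   # last surviving commit (None = no parent yet)
    parent_tree = EMPTY_TREE       # a root commit is compared with the empty tree
    for commit, tree in history:
        if tree == parent_tree:
            # Nothing changed in the plucked directory: skip this commit.
            continue
        kept.append((commit, parent))
        parent, parent_tree = commit, tree
    return kept
```

For example, a history whose trees are empty, t1, t1, t2 keeps only the second and fourth commits, with the fourth re-parented onto the second.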
