Split git repositories by subfolders with history

This article describes how to turn the subfolders in a git repository into new git repositories, keeping the history.

In our case we had a large git repository (2 GB) containing the slides for many courses. Every course was a subdirectory of that repository, so even someone teaching just one course had to clone everything. That is why we decided to move every course into a separate repository. We use SCM-Manager to host our repositories and could therefore use its REST API in the script.

Step-by-Step

The whole procedure is based on the git command filter-branch (see https://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html). We use the --subdirectory-filter option to turn a subdirectory into a new project root. So here we go:

  1. Get the list of subdirectories. There are several ways to do this: clone your repository and run find or ls, or use an API if your git server provides one. This is the first block in our script. If you prefer, you can run it as a separate step and edit the directory list before continuing.
  2. For every subdirectory:
    1. clone the repository
    2. run the filter
    3. create a new repository with access rights (an owner and a group with write access here)
    4. push your changes to the new repository
    5. clean up. After every git filter-branch run we have to clone the original repository again, because the filter rewrites your local repository.

    NOTE: The subdirectory name may contain spaces or even umlauts, so we have to replace those characters to get a valid repository name. sed is our choice here.
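
As a rough illustration of the renaming described in the note, the sketch below wraps the sed call in a small helper. The function name sanitize_repo_name and the sample course names are made up for this example. It spells out one substitution per character, so it does not depend on how sed handles multibyte characters in the current locale.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): turn a directory
# name into a repository name by transliterating German umlauts and sharp s
# and replacing spaces with hyphens.
sanitize_repo_name() {
    printf '%s\n' "$1" | sed \
        -e 's/ä/ae/g' -e 's/ö/oe/g' -e 's/ü/ue/g' \
        -e 's/Ä/Ae/g' -e 's/Ö/Oe/g' -e 's/Ü/Ue/g' \
        -e 's/ß/ss/g' -e 's/ /-/g'
}

# Sample directory names, invented for the demo.
sanitize_repo_name 'Einführung in Java'   # prints Einfuehrung-in-Java
sanitize_repo_name 'Größe Übung'          # prints Groesse-Uebung
```

Other characters that are not allowed in repository names on your server would need additional substitutions.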

So the complete script is:

#!/bin/sh

#get directory list
git clone https://git.spree.de/scm-manager/git/presentations/schulungen.git schulungen.git
cd schulungen.git
ls -1d */ | sed 's#/##' > ../dirs.txt
cd ..
rm -rf schulungen.git

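# directory names may contain spaces, so read them line by line instead of using a for loop
cat dirs.txt | while IFS= read -r i
do
REPO="$i"
# transliterate umlauts and sharp s, replace spaces with hyphens
REPO_NAME=$(printf '%s\n' "$REPO" | sed -e 's/ä/ae/g;s/ö/oe/g;s/ü/ue/g' -e 's/Ä/Ae/g;s/Ö/Oe/g;s/Ü/Ue/g' -e 's/ß/ss/g;s/ /-/g')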

#filter
git clone https://git.spree.de/scm-manager/git/presentations/schulungen.git schulungen.git
cd schulungen.git
git filter-branch --subdirectory-filter "$REPO" -- --all

#create new repository
DATA="<repositories><name>presentations/Schulungen/$REPO_NAME.git</name><type>git</type><permissions><groupPermission>true</groupPermission><name>ats</name><type>WRITE</type></permissions><permissions><groupPermission>false</groupPermission><name>rmagnus</name><type>OWNER</type></permissions><description>Schulungen $REPO</description></repositories>"
curl -s -XPOST -n -H 'content-type: application/xml' -d "$DATA" https://git.spree.de/scm-manager/api/rest/repositories.xml

# add the new repository as remote
git remote add schulung "https://git.spree.de/scm-manager/git/presentations/schulungen/$REPO_NAME.git"

# and push
git push schulung

#clean up
cd ..
rm -rf schulungen.git

done
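
To try the core step before pointing the script at a real server, the subdirectory filter can be exercised on a throwaway repository. The following is only a sketch: the directory names course-a and course-b, the commit messages and the demo identity are invented. Newer git versions print a warning and pause when filter-branch starts, so the sketch sets FILTER_BRANCH_SQUELCH_WARNING to skip that.

```shell
#!/bin/sh
# Sketch: run --subdirectory-filter on a temporary repository.
# All names below (course-a, course-b, Demo User) are made up for the demo.
set -e

# Skip the warning and delay that newer git versions add to filter-branch.
FILTER_BRANCH_SQUELCH_WARNING=1
export FILTER_BRANCH_SQUELCH_WARNING

WORK=$(mktemp -d)
cd "$WORK"

git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.invalid"

# Three commits: two touch course-a, one touches course-b.
mkdir course-a course-b
echo "slides a" > course-a/slides.txt
git add course-a
git commit -q -m "add course a"

echo "slides b" > course-b/slides.txt
git add course-b
git commit -q -m "add course b"

echo "more slides a" >> course-a/slides.txt
git commit -q -am "extend course a"

# Make course-a the new project root on all branches.
git filter-branch --subdirectory-filter course-a -- --all >/dev/null

# Only the commits that touched course-a remain, and its files now sit at the root.
COMMIT_COUNT=$(git rev-list --count HEAD)
FILES=$(git ls-files)
echo "commits: $COMMIT_COUNT"   # prints commits: 2
echo "files: $FILES"            # prints files: slides.txt

# Remove the temporary repository again.
cd /
rm -rf "$WORK"
```

The commit that only touched course-b is gone from the rewritten history, which is the effect the script relies on for each course directory.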
