How to migrate from Subversion to Git with almost no downtime?
Last year I was in charge of the SVN to Git migration at the company where I work. We wanted to migrate the history as well. In our case, there were about 40,000 revisions made over the previous 8 years. To minimize developers’ downtime, I did a lot of scripting and preparation ahead of time. The actual switch from SVN to Git took less than 2 hours. Here are the steps that we took.
1. Retrieve a list of all committers
You’ll need a list of the users who have committed to the SVN repo, converted to the Git author format, because Subversion records only the committer’s username, not a name and email. To retrieve the list of users from SVN, create a new folder, right-click and select Git Bash Here to open a Git command window. Run the following command:
svn log http://url/to/svn/repository -q | awk -F '|' '/^r/ {sub("^ ", "", $2);
sub(" $", "", $2);
print $2" = "$2" <"$2">"}' | sort -u > users.txt
Note: this will take a couple of minutes to complete based on the size of your repository, number of commits, and number of committers.
The text file will have a separate line for each committer. Each line needs to be changed from
vkarpach = vkarpach <vkarpach>
to
vkarpach = Viktar Karpach <vkarpach@company.com>
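Part of that transformation can be scripted. The sketch below only fills in an email domain; the full names still have to be edited by hand. The function name add_email_domain is hypothetical, and the assumption that every committer's address is username@domain may not hold for your company.

```shell
# Hypothetical helper: append an email domain to every "<user>" entry.
# Reads "user = user <user>" lines on stdin; $1 is the domain to add.
# Assumes the domain contains no "/" (it is spliced into a sed expression).
add_email_domain() {
    sed -E "s/<([^@>]+)>\$/<\\1@$1>/"
}

printf '%s\n' 'vkarpach = vkarpach <vkarpach>' | add_email_domain company.com
# prints: vkarpach = vkarpach <vkarpach@company.com>
```

Entries that already contain an @ are left untouched, so the helper can be re-run on a partly edited file.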
2. Clone the repository using git-svn
Note: this step takes hours to complete, so run it overnight on a dedicated box. Run the following command to convert the repository to a Git repository:
git svn clone --stdlayout --no-metadata -A users.txt http://url/to/svn/repository dest_dir-tmp
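Because the clone runs for hours, it can be worth checking users.txt before starting: git svn aborts when it meets a committer that is missing from the authors file, and a malformed line is easy to overlook. The following is a minimal sketch of such a check. The function name check_authors_file and the exact pattern are assumptions, not part of git svn.

```shell
# Hypothetical pre-flight check for an authors file.
# Prints (with line numbers) every line that is not of the form
#   user = Full Name <email@domain>
# and returns non-zero if at least one such line exists.
check_authors_file() {
    ! grep -Evn '^[^=]+ = .+ <[^<>@ ]+@[^<>@ ]+>$' "$1"
}

# Possible usage, run from the folder that holds users.txt:
#   check_authors_file users.txt && echo "users.txt looks fine"
```

The check only validates the format of each line; it does not know whether every SVN committer is actually listed.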
3. Make a copy of this folder
git svn clone takes a lot of time. For our main project, it took 48 hours for about 18,000 commits. Make a copy of the cloned folder so you don’t need to run the clone again. Write scripts for the next steps so that the switch itself goes quickly when you are ready.
4. Fetch the latest commits
The team continued to use Subversion until the very last moment, so while working on the migration scripts I periodically had to fetch the latest commits:
git svn fetch
git reset --hard trunk
5. Clean up the repository
Convert the SVN tag branches into Git tags and delete the originals:
for t in `git branch -r | grep 'tags/' | sed s_tags/__` ; do
git tag $t tags/$t^
git branch -d -r tags/$t
done
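The same conversion can be sketched with git for-each-ref, which avoids parsing the human-readable output of git branch. This is an alternative under assumptions, not the script we ran: it assumes the SVN tags live under refs/remotes/tags/ (as the loop above implies), and the function name convert_svn_tags is hypothetical.

```shell
# Hypothetical variant of the tag loop above.
# For every ref under refs/remotes/tags/, create a Git tag on the parent
# of the SVN "tag commit", then delete the remote-tracking ref.
convert_svn_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/tags |
    while read -r ref; do
        name=${ref#refs/remotes/tags/}
        git tag "$name" "$ref^" &&
        git update-ref -d "$ref"
    done
}
```

Run convert_svn_tags from inside the cloned repository. As in the original loop, a ref is only removed after its tag has been created successfully.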
Delete trunk, since we will use master from now on.
git branch -d -r trunk
Remove SVN references
git config --remove-section svn-remote.svn
rm -rf .git/svn .git/{logs/,}refs/remotes/svn/
And finally, convert the remaining remote branches to local branches
git config remote.origin.url .
git config --add remote.origin.fetch +refs/remotes/*:refs/heads/*
git fetch
Remove remote branches:
for t in `git branch -r` ; do
git branch -d -r $t
done
Git doesn’t support spaces in branch names, so git svn fetch replaced spaces with %20. I think it is more aesthetic to use an underscore instead of %20:
for t in `git branch -a|grep '%20'` ; do
newName=`echo $t | sed 's/%20/_/g'`
git branch -m $t $newName
done
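A dry run can make the rename easier to review before it touches any branch. The sketch below only prints the commands it would run. The function name plan_branch_renames is hypothetical, and it expects one bare branch name per line on stdin.

```shell
# Hypothetical dry run: print a "git branch -m" command for every branch
# name on stdin that contains %20, replacing each %20 with an underscore.
plan_branch_renames() {
    while IFS= read -r old; do
        new=$(printf '%s\n' "$old" | sed 's/%20/_/g')
        if [ "$old" != "$new" ]; then
            printf 'git branch -m %s %s\n' "$old" "$new"
        fi
    done
}

printf '%s\n' 'release%201.0' 'master' | plan_branch_renames
# prints: git branch -m release%201.0 release_1.0
```

Names without %20 produce no output, so the printed list is exactly the set of renames that would happen.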
You might want to delete some unused branches:
for t in `cat ../list_of_branches_for_deletion.txt`; do
git branch -D $t
done
Where list_of_branches_for_deletion.txt contains the names of the branches to delete. Use the following command to populate this file:
git branch -a > ../list_of_branches_for_deletion.txt
Manually edit the list_of_branches_for_deletion.txt file. Leave only those branches that you want to delete.
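One thing to watch while editing: git branch prefixes the current branch with "* ", and an unquoted * inside a for loop is expanded by the shell into the file names of the current directory. A small filter can strip the marker column before the list is used. This is a sketch, and the function name strip_branch_markers is hypothetical.

```shell
# Hypothetical filter: remove the two-character marker column ("* ", "+ "
# or two spaces) that git branch prints in front of each name, and drop
# empty lines.
strip_branch_markers() {
    sed -e 's/^[*+ ] //' -e '/^$/d'
}

printf '* master\n  old-feature\n' | strip_branch_markers
# prints:
# master
# old-feature
```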
6. Replace any svn externals with git submodules
git submodule add ssh://git@git.company.com:7999/ProjectName/external_repo.git ExternalFolderName
git commit -m "Added submodules"
Only use git submodules for external projects that don’t change very often. We had to combine our internal projects in one git repository since it is hard to maintain submodules for rapidly changing projects. Each project gets its own directory in the git repository:
Before migration:
svn_main_project
    external_1
        external_1_folder_1
        external_1_folder_2
    external_2
        external_2_folder_1
        external_2_folder_2
    svn_main_project_folder_1
    svn_main_project_folder_2
Where svn_main_project has two externals, external_1 and external_2.
After migration:
git
    svn_main_project
        svn_main_project_folder_1
        svn_main_project_folder_2
    external_1
        external_1_folder_1
        external_1_folder_2
    external_2
        external_2_folder_1
        external_2_folder_2
You can use the following bash script to move everything into sub_folder so that you can combine the repositories later. The script rewrites the commit history as well.
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t\"*-&sub_folder/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" || true' HEAD
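To see what the sed expression in that filter does, it helps to run it on a single line in the format that git ls-files -s prints (mode, object hash, stage, a tab, then the path). The sketch below is an illustration only: the function name prefix_index_paths is hypothetical, the sample line is made up, and a literal tab is used instead of \t so that it does not depend on a particular sed implementation.

```shell
# Hypothetical demo of the path rewrite used in the index filter.
# Reads "mode hash stage<TAB>path" lines on stdin and inserts "$1/" right
# after the tab (and after an opening quote, if the path is quoted).
# Assumes $1 contains no "-", which is used as the sed delimiter.
prefix_index_paths() {
    tab=$(printf '\t')
    sed "s-${tab}\"*-&$1/-"
}

printf '100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tsrc/main.c\n' |
    prefix_index_paths sub_folder
# The path field becomes sub_folder/src/main.c; the rest of the line is unchanged.
```

In the real filter, the rewritten lines are fed to git update-index --index-info, which builds a new index in which every file sits under sub_folder.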
7. Get your repository onto the server
Create a repository on your git server.
Init local repository:
git init
Use the following if you are combining repositories:
git remote add external_1 ../external_1/
git pull external_1 master
git remote rm external_1
Add gitignore
cp ../gitignore.txt .gitignore
git add .
git commit -m "Added .gitignore"
Push all branches in one shot:
git remote add origin ssh://git@git.company.com:7999/repo.git
git push --all origin