quick hack how to move a part of a Mercurial (hg) repo to git
Saturday, 10 July 2010
My quick dirty hack to move a (small) part of a Mercurial (hg) repo to Git. In my case this was for moving my single file “testlib.py” from here on bitbucket to here on github
-
Dump the log of that part of the hg repo to a file.
export WORKDIR=$HOME/tmp/migrate cd HGREPO/foo hg log -pv . > $WORKDIR/full.patch
-
Create the starter git repo
cd $WORKDIR git init foo
-
Break up the log into a number of “changesetNNN.patch” files with this Python code:
# parse.py import codecs changeset = [] i = 0 def write_changeset(): global changeset if not changeset: return codecs.open("changeset%05d.patch" % i, 'w', "utf-8").write(''.join(changeset)) i += 1 changeset = [] for line in open("full.patch"): if line.startswith("changeset:"): write_changeset() changeset.append(line) write_changeset()
and then run:
python parse.py
-
Apply and commit each patch with this Python script
# Usage: python apply.py TARGET-REPO-BASE-DIR import os from os.path import * from glob import glob import subprocess from pprint import pprint def apply_patch(target_repo_base_dir, patch_path): content = open(patch_path).read() header, diff = content.split('\n\n\n', 1) assert diff.startswith("diff --git a") fields = {} lines = header.splitlines(False) for i, line in enumerate(lines): key, value = line.split(':', 1) if key == "description": fields[key] = '\n'.join(lines[i+1:]) break value = value.strip() fields[key] = value # Do any path renamings here. For example, I wanted to move from # "testlib/testlib.py" in the old repo to "lib/testlib.py" in the new. diff = diff.replace('a/testlib/testlib.py', 'a/lib/testlib.py') diff = diff.replace('b/testlib/testlib.py', 'b/lib/testlib.py') f = open(patch_path+".diff", 'w') f.write(diff) f.close() f = open(patch_path+".msg", 'w') f.write(fields["description"]) f.close() cwd = target_repo_base_dir subprocess.check_call(['git', 'apply', '--whitespace=nowarn', abspath(patch_path+".diff")], cwd=cwd) subprocess.check_call(['git', 'add', 'lib/testlib.py'], cwd=cwd) subprocess.check_call(['git', 'commit', '--date', fields["date"], '-F', abspath(patch_path+".msg")], cwd=cwd) if __name__ == "__main__": target_repo_base_dir = sys.argv[1] patches = list(sorted(glob("changeset*.patch"))) for patch_path in patches: print "--", patch_path apply_patch(target_repo_base_dir, patch_path)
Then run:
python apply.py foo # apply all changeset*.patch files to "foo" git repo
Now you can push this Git repo to github or whereever.
To improve on
- You currently need to manually create the dir structure first.
- This doesn’t currently used the parse “user” field from the hg commit log for the “git commit -a AUTHOR” command. Mainly this is because I didn’t need that, but also because the “user” value in the hg log isn’t the configured full user name and email, but just the short username. Maybe that was just me.