Extracting parts of a Git repository while preserving history

If you have multiple modules within a git repository, it’s only a matter of time before it gets bulky and cluttered enough that you want to refactor it and move some modules into their own separate repositories (for better maintainability). If that bulky repository has several contributors, it’s also critical to preserve the contribution history (i.e. commits and changes), even when you split it into several new repositories.

Recently I ran into exactly this situation: I had to move several extensions from a parent repo into their own separate repos, as shown in the figure. Here’s how I did it:

  1. Fork the original repository into your account. Here, I forked the wso2/siddhi repo into grainier/siddhi.
  2. Clone the forked repo onto your local machine. Here, I cloned it into a new directory named “siddhi-execution-time” (I’m going to extract the time extension first).
  3. Go into the cloned directory and remove all remote tracking from the cloned git repository.
  4. Now it’s time to filter and prune. Doing this removes all the other directories along with their git history, keeping only the files and history of the filtered subdirectory.

    The result looks something like this: [screenshot]
  5. Now fix the project structure as you want (fix the pom file, add a .gitignore, add other required modules, etc.).
  6. Add the new files to git and commit those changes (remember, you still cannot push, since we haven’t set the remote tracking location yet).
  7. Now create a new git repo on GitHub (or any git provider), and copy its remote tracking location (mine is https://github.com/grainier/siddhi-execution-time.git). It doesn’t matter whether this is an empty repo or an existing repo (with files).
  8. Now add that link as the remote tracking location to our local git repo.
  9. [optional] If it’s not an empty repo, do a git pull and merge the content with our repository.
  10. Finally, push the restructured local repo to the remote-tracking branch.

    That’s it, you’re all done 🙂
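
The Git side of the steps above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact commands used for the Siddhi extensions: it assumes the branch being rewritten is master, and the function names extract_subdir and publish_repo are made up for illustration. Steps 1, 5, 6 and 7 (forking, fixing the project structure, committing, creating the new repo) are still done by hand.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 and 8-10. Function names and the default branch
# ("master") are assumptions made for this example.

# extract_subdir SRC SUBDIR DEST [BRANCH]
#   SRC    - URL or path of the forked repository
#   SUBDIR - directory inside SRC to keep; it becomes the new repository root
#   DEST   - local directory to clone into
extract_subdir() {
  local src="$1" subdir="$2" dest="$3" branch="${4:-master}"

  # Step 2: clone the fork into a directory named after the new repository.
  git clone --branch "$branch" "$src" "$dest" || return 1
  (
    cd "$dest" || exit 1
    # Step 3: remove remote tracking so nothing can be pushed back by accident.
    git remote rm origin &&
    # Step 4: rewrite history so that only SUBDIR and its commits remain.
    # --prune-empty drops commits that no longer touch any file.
    # Newer Git versions print a warning here that recommends git filter-repo.
    git filter-branch --prune-empty --subdirectory-filter "$subdir" -- "$branch"
  )
}

# publish_repo DEST REMOTE_URL [BRANCH]
publish_repo() {
  local dest="$1" remote_url="$2" branch="${3:-master}"
  (
    cd "$dest" || exit 1
    # Step 8: point the local repo at the newly created remote.
    git remote add origin "$remote_url" &&
    # Step 9 (optional, only if the remote already has commits):
    #   git pull origin "$branch" --allow-unrelated-histories   # flag needs Git 2.9+
    # Step 10: push the rewritten history and set the upstream branch.
    git push -u origin "$branch"
  )
}
```

For the time extension this might be called as `extract_subdir https://github.com/grainier/siddhi.git modules/siddhi-extensions/time siddhi-execution-time`, followed (after steps 5 to 7) by `publish_repo siddhi-execution-time https://github.com/grainier/siddhi-execution-time.git`. The subdirectory path in that call is only a guess at the parent repo’s layout, so substitute the real path of the module you are extracting.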

Create tar archive per subdirectory

The following bash script generates a tar archive for each subdirectory found within a given directory.
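
A minimal sketch of such a script is given here, under a few assumptions: the archives are plain uncompressed .tar files written next to the subdirectories, only immediate (non-hidden) subdirectories are archived, and the function name archive_subdirs is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: create one tar archive per immediate subdirectory of a root directory.

# archive_subdirs [ROOT]
#   ROOT - directory whose subdirectories are archived (defaults to the
#          current directory)
archive_subdirs() {
  local root="${1:-.}" dir name

  for dir in "$root"/*/; do
    # If there are no subdirectories the glob stays unexpanded; skip it.
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    # -C makes the paths inside the archive relative to ROOT, so each
    # archive unpacks into a single top-level directory named after it.
    tar -cf "$root/$name.tar" -C "$root" "$name"
  done
}
```

Appending the line `archive_subdirs "${1:-.}"` to the file makes it process the directory it is run from, or a directory passed as the first argument. Swapping `-cf` for `-czf` and `.tar` for `.tar.gz` produces gzip-compressed archives instead.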

Save this code in a .sh file (e.g. archive.sh) and put it in your desired root directory. Make it executable with chmod +x archive.sh, then run it with the ./archive.sh command.