git-partial-submodule
September 4, 2021
Have you ever thought about adding a submodule to your git project, but you didn’t want to bear the burden of downloading and storing the submodule’s entire history, or you only needed a handful of files out of the submodule?
Git provides partial clone and sparse checkout features that can make this happen for top-level repositories, but so far they aren’t available for submodules. That’s a hole I aimed to fill with this project. git-partial-submodule is a tool for setting up submodules with blobless clones. It can also save sparse-checkout patterns in your .gitmodules file, allowing them to be managed by version control and automatically applied when the submodules are cloned.
As a motivating example, a fresh clone of Dear ImGui consumes about 80 MB (of which 75 MB is in the .git directory) and takes about 10 seconds to clone on a fast connection. It also brings in roughly 200 files, including numerous examples and backends and various other ancillary files. The actual ImGui implementation (the part you need for your app) is in 11 files totaling 2.5 MB.
In contrast, a blobless, sparse clone of Dear ImGui requires only about 7 MB (4.5 MB in the .git directory), takes ~2 seconds to clone, and checks out only the files you want.
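For reference, here is a minimal sketch of how a blobless, sparse clone of a top-level repository can be requested with stock git. It only builds the command lines and does not run them. The URL, destination, and patterns are hypothetical placeholders, and the function name is mine, not part of git-partial-submodule.

```python
import shlex
from typing import List, Sequence


def blobless_sparse_clone_commands(
    url: str, dest: str, patterns: Sequence[str]
) -> List[List[str]]:
    """Build the git commands for a blobless, sparse clone of `url` into `dest`.

    --filter=blob:none asks the server to omit file contents (blobs) up front;
    git fetches them on demand later. --sparse starts the clone with a minimal
    sparse checkout, which the second command then narrows to `patterns`.
    """
    clone = ["git", "clone", "--filter=blob:none", "--sparse", url, dest]
    narrow = ["git", "-C", dest, "sparse-checkout", "set", *patterns]
    return [clone, narrow]


if __name__ == "__main__":
    # Hypothetical repository and directory names, for illustration only.
    for cmd in blobless_sparse_clone_commands(
        "https://example.com/some/lib.git", "lib", ["src"]
    ):
        print(shlex.join(cmd))
```

To actually perform the clone, each command could be passed to subprocess.run(cmd, check=True). How the patterns are interpreted (cone mode vs. full patterns) depends on your git version and settings, which this sketch leaves alone.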
(This is not to pick on Dear ImGui at all! These issues arise with any healthy, long-lived project, and the history bloat in particular is an artifact of git’s design.)
One way developers might address this is by “vendoring”, or copying the ImGui files they need into their own repository and checking them in. That can be a legitimate solution, but it has various downsides.
Another solution supported out of the box by git is “shallow” clones, which essentially only download the latest commit and no history. Submodules can be configured to be cloned shallowly. This works, and is useful in some cases such as cloning on a build machine where you’re not going to be manipulating the repository at all. However, shallow clones make it difficult to do normal development workflows with the submodule. In contrast, a blobless clone functions normally with most workflows, as it can download missing data on demand.
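As a small illustration of the shallow option, git reads a submodule.<name>.shallow setting from .gitmodules and, when it is true, clones that submodule with a history depth of 1 unless told otherwise. The sketch below only builds the git config command that records the setting; the submodule name is a hypothetical placeholder.

```python
from typing import List


def shallow_submodule_command(name: str) -> List[str]:
    """Build the command that marks submodule `name` as shallow in .gitmodules."""
    return [
        "git", "config", "-f", ".gitmodules",
        f"submodule.{name}.shallow", "true",
    ]


if __name__ == "__main__":
    # "third_party/lib" is a made-up submodule name.
    print(" ".join(shallow_submodule_command("third_party/lib")))
```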
Since git’s own submodule commands do not (yet) allow specifying blobless mode or sparse checkout, I built git-partial-submodule to work around this. It’s a single-file Python script that you use just for the initial setup of submodules. Instead of git submodule add, you do git-partial-submodule.py add. When cloning a repository with existing submodules, you use git-partial-submodule.py clone instead of recursively cloning or running git submodule update --init.
It works by manually calling git clone with the blobless/sparse options, setting up the submodule repo in your .git/modules directory, and hooking everything up so git sees it as a legit submodule. Afterward, ordinary submodule operations such as fetches and updates should work normally, although I haven’t done super extensive testing on this, and I’ve been warned that blobless/sparse are still experimental git features that may have sharp edges.
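To give a feel for the general idea, here is one way to get a similar result by hand with stock git commands. This is a sketch under my own assumptions, not necessarily the exact sequence the script runs. It relies on two documented behaviors: git submodule add stages an existing repository at the given path without cloning it again, and git submodule absorbgitdirs moves a submodule's embedded .git directory into the superproject's .git/modules. The URL and path are hypothetical, and the commands are only built, not executed.

```python
from typing import List


def partial_submodule_commands(url: str, path: str) -> List[List[str]]:
    """Build commands that register a blobless clone as a submodule.

    Intended to be run from the top of the superproject's working tree.
    """
    return [
        # 1. Clone without blobs; git downloads file contents on demand.
        ["git", "clone", "--filter=blob:none", url, path],
        # 2. Register the already-cloned repository as a submodule.
        ["git", "submodule", "add", url, path],
        # 3. Move the submodule's .git directory under .git/modules.
        ["git", "submodule", "absorbgitdirs", path],
    ]


if __name__ == "__main__":
    # Made-up URL and path, for illustration only.
    for cmd in partial_submodule_commands(
        "https://example.com/some/lib.git", "third_party/lib"
    ):
        print(" ".join(cmd))
```

The partial-clone settings live in the cloned repository's own config, so they travel with the git directory when it is moved.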
The other thing git-partial-submodule does is to save and restore sparse-checkout patterns in your .gitmodules for each submodule. When you only need a subset of the submodule’s file tree, this lets you manage those patterns under version control in the superproject, so that others who clone the project (and are also using git-partial-submodule) will automatically get the right set of files. You can configure this using the ordinary git sparse-checkout commands, but currently you have to remember to do the extra step of saving the patterns to .gitmodules when changing them, or restoring the patterns from .gitmodules after pulling/merging. This could probably be automated further with git hooks, but I haven’t looked into it yet.
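The save/restore idea can be sketched roughly as follows. The .gitmodules key name used here (sparse-checkout) and the space-separated, shell-quoted encoding are assumptions for illustration, so check the project's README for the format the script really uses. The sketch only builds command lines: one that writes the patterns into .gitmodules with git config -f, and one that applies a stored value to the submodule with git sparse-checkout set.

```python
import shlex
from typing import List, Sequence

# Hypothetical key name; the real tool may store patterns differently.
KEY = "sparse-checkout"


def encode_patterns(patterns: Sequence[str]) -> str:
    """Pack patterns into one config value, quoting any that need it."""
    return shlex.join(patterns)


def decode_patterns(value: str) -> List[str]:
    """Inverse of encode_patterns."""
    return shlex.split(value)


def save_command(name: str, patterns: Sequence[str]) -> List[str]:
    """Build the command that records the patterns in .gitmodules."""
    return [
        "git", "config", "-f", ".gitmodules",
        f"submodule.{name}.{KEY}", encode_patterns(patterns),
    ]


def restore_command(path: str, value: str) -> List[str]:
    """Build the command that applies a stored value to the submodule at `path`."""
    return ["git", "-C", path, "sparse-checkout", "set", *decode_patterns(value)]


if __name__ == "__main__":
    # Made-up submodule name and patterns.
    print(save_command("third_party/lib", ["src", "include/my lib"]))
    print(restore_command("third_party/lib", "src 'include/my lib'"))
```

The current patterns of a submodule can be read with git sparse-checkout list, which would feed the save step.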
I’m excited to try out this workflow for some of my own projects, replacing vendored projects with partial submodules, and I hope it will be helpful to some others out there as well. Issues and PRs are open on GitHub, and contributions are welcome. If you end up trying this, let me know if it works for you!