Git Submodules vs. Subtrees vs. Monorepos

A Git repository is one object database and one commit graph. Once a codebase needs another project's code inside it (a shared library, a vendored dependency, an internal service), there are three common structural answers: keep the history external and reference it (submodules), keep the history external and merge its files in (subtrees), or never split the history at all (monorepo). The three differ in where the history boundary sits, how dependency versions get pinned, and how much tooling the resulting workflow requires.

Submodules

A submodule is a pointer, not a copy. The parent repository stores a tree entry with mode 160000 — a gitlink — that records the exact commit SHA of the submodule's HEAD at the time it was added or last updated. The submodule remains a fully independent repository with its own object database, refs, and remote; the parent only ever stores a 40-character reference to one of its commits.

.gitmodules
[submodule "vendor/widgets"]
    path = vendor/widgets
    url = https://github.com/example/widgets.git
    branch = main

Two consequences follow from the gitlink model:

  • Cloning the parent does not fetch submodule content by default — the directory exists but is empty until explicitly initialized.
  • The submodule's own .git is normally a one-line gitfile pointing into the parent's .git/modules/<name>, so submodule metadata for the whole tree lives in one place rather than scattered across nested .git directories.
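Both consequences can be checked directly. The following is a minimal sketch, assuming a throwaway temp directory and a local stand-in for the widgets remote (the repo names, the demo identity, and the protocol.file.allow override are assumptions made only so the demo can use a local path).

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# a stand-in "upstream" library with one commit
git init -q widgets
git -C widgets commit -q --allow-empty -m "widgets: initial commit"

# a parent repo that adds it as a submodule
git init -q app
cd app
# protocol.file.allow is only needed because the demo URL is a local path
git -c protocol.file.allow=always submodule --quiet add "$work/widgets" vendor/widgets
git commit -q -m "Add widgets submodule"

# the parent's tree entry for the submodule: mode, object type, pinned SHA, path
entry=$(git ls-tree HEAD vendor/widgets)
echo "$entry"

# the submodule's .git is a one-line gitfile, not a directory
gitfile=$(cat vendor/widgets/.git)
echo "$gitfile"
```

The first line printed should start with 160000 commit, and the second with gitdir:, pointing into the parent's .git/modules directory.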

Workflow

bash
# add a submodule and commit the gitlink
git submodule add https://github.com/example/widgets.git vendor/widgets
git commit -m "Add widgets submodule"

# clone a repo that already has submodules
git clone --recurse-submodules https://github.com/example/app.git

# or, after a plain clone
git submodule update --init --recursive

# fetch each submodule's configured remote branch and merge it into the submodule's checkout
git submodule update --remote --merge

# run an arbitrary command inside every submodule
git submodule foreach 'git checkout main && git pull'

Pitfall

Updating a submodule's working tree does not update the parent. The parent only tracks a pinned commit, so after pulling new submodule commits you still need to stage and commit the new gitlink in the parent repo — otherwise collaborators stay pinned to the old commit on their next checkout.
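One way to spot a stale pointer is git submodule status, which prefixes a path with + when the checked-out commit differs from the one the parent records. A minimal sketch, assuming local throwaway repositories in place of real remotes (all names here are hypothetical):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q -b main widgets
git -C widgets commit -q --allow-empty -m "widgets: v1"

git init -q -b main app
cd app
# protocol.file.allow is only needed because the demo URL is a local path
git -c protocol.file.allow=always submodule --quiet add "$work/widgets" vendor/widgets
git commit -q -m "Add widgets submodule"

# upstream moves on, and the submodule working tree pulls the new commit
git -C "$work/widgets" commit -q --allow-empty -m "widgets: v2"
git -C vendor/widgets pull -q --ff-only origin main

# the parent still records v1; a leading "+" marks the mismatch
status=$(git submodule status vendor/widgets)
echo "$status"

pinned=$(git ls-tree HEAD vendor/widgets | awk '{print $3}')
checked_out=$(git -C vendor/widgets rev-parse HEAD)
echo "parent pins:  $pinned"
echo "checked out:  $checked_out"
```

Until the new gitlink is staged and committed in the parent, the two SHAs stay different and every fresh checkout of the parent resolves to the older one.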

bash — committing a pointer bump
cd vendor/widgets && git checkout main && git pull
cd ../..
git add vendor/widgets
git commit -m "Bump widgets submodule to latest main"

Failure mode

If a parent commit references a submodule commit that was never pushed to the submodule's own remote, the gitlink is unresolvable for anyone else who clones the parent. Push the submodule before pushing the parent pointer that references it.
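Git can guard against this itself: git push --recurse-submodules=check refuses to push the parent while a referenced submodule commit is missing from the submodule's remotes. The sketch below assumes two local bare repositories standing in for the hosted remotes (widgets.git and app.git are hypothetical names).

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# bare repos stand in for the two hosted remotes
git init -q --bare -b main widgets.git
git init -q --bare -b main app.git

# seed the widgets remote with one commit
git init -q -b main seed
git -C seed commit -q --allow-empty -m "widgets: v1"
git -C seed push -q "$work/widgets.git" main

git init -q -b main app
cd app
git remote add origin "$work/app.git"
# protocol.file.allow is only needed because the demo URL is a local path
git -c protocol.file.allow=always submodule --quiet add "$work/widgets.git" vendor/widgets
git commit -q -m "Add widgets submodule"
git push -q origin main

# commit inside the submodule but do NOT push it, then bump the pointer
git -C vendor/widgets commit -q --allow-empty -m "widgets: local-only change"
git add vendor/widgets
git commit -q -m "Bump widgets to an unpushed commit"

# check mode aborts the parent push because the gitlink target is unpublished
if git push -q --recurse-submodules=check origin main 2>/dev/null; then
  check_result=pushed
else
  check_result=refused
fi
echo "parent push: $check_result"
```

Setting git config push.recurseSubmodules check makes this the default for a clone; the on-demand value goes further and pushes the submodule before the parent.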

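Removing a submodule cleanly takes three steps because its state is spread across several places: deinit clears the working tree and the submodule's section in .git/config, git rm removes the gitlink from the index and the entry from .gitmodules, and the final rm deletes the repository stored under .git/modules: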

bash — removal
git submodule deinit -f vendor/widgets
git rm vendor/widgets
rm -rf .git/modules/vendor/widgets

A checked-out submodule sits in detached HEAD state by default, pinned at the recorded commit rather than tracking a branch. CI pipelines need explicit recursive-checkout handling, private submodule URLs need their own credential configuration separate from the parent repo, and contributors who forget --recurse-submodules or submodule update --init end up with empty directories and confusing build failures.
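The difference between a plain and a recursive clone, and the detached HEAD that results, can be reproduced locally. A minimal sketch, assuming throwaway local repositories (names are hypothetical, and protocol.file.allow is set only because the demo URLs are local paths):

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q -b main widgets
git -C widgets commit -q --allow-empty -m "widgets: v1"

git init -q -b main app
git -C app -c protocol.file.allow=always submodule --quiet add "$work/widgets" vendor/widgets
git -C app commit -q -m "Add widgets submodule"

# plain clone: the submodule directory exists but holds nothing
git clone -q "$work/app" plain
plain_entries=$(ls -A plain/vendor/widgets | wc -l)
echo "entries after plain clone: $plain_entries"

# recursive clone, as a CI job or new contributor would run it
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/app" full

# symbolic-ref fails when HEAD is detached, so fall back to a label
head_state=$(git -C full/vendor/widgets symbolic-ref -q --short HEAD || echo detached)
echo "submodule HEAD: $head_state"
```

Commits made in that detached state belong to no branch, so check out a branch inside the submodule before committing there.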

Subtrees

A subtree import has no gitlink and no .gitmodules. The subproject's files are merged directly into a subdirectory of the parent's tree using Git's subtree merge strategy. To anyone who clones the parent afterward, vendor/widgets/ is an ordinary tracked directory in ordinary history — no special clone flags, no separate object database, no pointer to resolve.
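To make the "ordinary tracked directory" claim concrete, the sketch below performs a subtree merge by hand with core Git commands, following the recipe in Git's subtree-merge how-to. It is an illustration of the strategy, not a transcript of what git subtree add runs internally, and the repository names are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# a stand-in upstream project with one file
git init -q -b main widgets
echo "widget code" > widgets/widget.txt
git -C widgets add widget.txt
git -C widgets commit -q -m "widgets: initial commit"

git init -q -b main app
cd app
git commit -q --allow-empty -m "app: initial commit"

# fetch upstream history, then graft its tree under the prefix in one merge commit
git remote add widgets "$work/widgets"
git fetch -q widgets
git merge -q -s ours --no-commit --allow-unrelated-histories widgets/main
git read-tree --prefix=vendor/widgets/ -u widgets/main
git commit -q -m "Merge widgets into vendor/widgets"

# the prefix is a normal tree entry, and no .gitmodules file was created
entry=$(git ls-tree HEAD vendor/widgets)
echo "$entry"
```

The entry printed has mode 040000 and type tree, in contrast to the 160000 commit entry a submodule leaves behind.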

bash — import
git subtree add --prefix=vendor/widgets https://github.com/example/widgets.git main --squash

--squash

Without --squash, every commit from the upstream project's entire history becomes part of the parent's history, growing it permanently. With --squash, the import collapses into one squashed commit holding the upstream snapshot, joined to the parent's history by a merge commit. Upstream history is still reachable from the upstream remote, just not carried along locally.

bash — sync with upstream
# pull upstream changes into the subdirectory
git subtree pull --prefix=vendor/widgets https://github.com/example/widgets.git main --squash

# push local changes made under the prefix back upstream
git subtree push --prefix=vendor/widgets https://github.com/example/widgets.git contribution-branch

# extract the prefix's history onto its own branch
git subtree split --prefix=vendor/widgets -b widgets-only

Cost of push and split

subtree push and subtree split walk the parent's entire commit history, discard everything outside the prefix, and rewrite what remains into a synthetic history that contains only the prefix's changes. On a large, long-lived repository this is CPU-bound and can take minutes, regardless of how small the actual change being pushed is.

Conflict resolution during subtree pull is an ordinary three-way merge on file content, familiar to anyone who has resolved a regular Git merge conflict. A submodule conflict, by contrast, is a disagreement between two gitlink SHAs, which Git resolves automatically only when one commit is a fast-forward of the other. The tradeoff is repository hygiene: without disciplined use of --squash, subtree imports bloat the parent repo and fill git log with upstream commits (git log --first-parent hides them by following only the parent's own line of history).
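The effect on git log is easy to measure. The sketch below assumes an unsquashed import done with a manual subtree merge in throwaway repositories (names are hypothetical) and compares the full commit count with the first-parent count.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# upstream with three commits of its own
git init -q -b main widgets
for n in 1 2 3; do
  echo "revision $n" > widgets/widget.txt
  git -C widgets add widget.txt
  git -C widgets commit -q -m "widgets: revision $n"
done

# parent with one commit, then an unsquashed import under vendor/widgets
git init -q -b main app
cd app
git commit -q --allow-empty -m "app: initial commit"
git remote add widgets "$work/widgets"
git fetch -q widgets
git merge -q -s ours --no-commit --allow-unrelated-histories widgets/main
git read-tree --prefix=vendor/widgets/ -u widgets/main
git commit -q -m "Merge widgets into vendor/widgets"

all=$(git rev-list --count HEAD)
mainline=$(git rev-list --count --first-parent HEAD)
echo "all commits: $all, first-parent only: $mainline"
# prints "all commits: 5, first-parent only: 2"
```

The three upstream commits are reachable only through the merge's second parent, which is why --first-parent leaves just the parent's initial commit and the merge.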

Advantage over submodules

Downstream consumers need nothing special. A plain git clone gets full content immediately — no --recurse-submodules, no separate init step, no risk of an unresolvable pointer.

Monorepos

A monorepo keeps every project in one history from the start: no external repo, no merge boundary, no pinned-commit pointer to keep in sync. A commit that touches a shared library and every consumer of that library is just one ordinary commit — there's no window where the library and its consumers reference different versions of each other, and a single CI run can validate the whole change.

Scaling

The tradeoff is working-tree and clone size, which grows with the entire organization's code rather than one project's. Microsoft's pre-2017 Windows source repository held more than 3.5 million files and exceeded 270 GB. A plain git clone took over 12 hours and git status took close to 10 minutes, which is what forced the development of a virtual filesystem layer (GVFS) and, later, scaling features now in core Git such as partial clone and cone-mode sparse-checkout.

bash — partial clone + sparse-checkout
# fetch commit and tree objects only; defer file content until needed
git clone --filter=blob:none --sparse https://github.com/example/monorepo.git
cd monorepo

# populate the working tree with only the directories this checkout needs
git sparse-checkout init --cone
git sparse-checkout set services/billing libs/protobuf-defs

--filter=blob:none defers downloading file content until it's actually checked out or read; cone-mode sparse-checkout then restricts which directories populate the working tree at all. Combined, a clone of a multi-hundred-gigabyte monorepo can be reduced to only the paths a given contributor touches, while the full object history remains fetchable on demand.
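The combination can be tried without a hosted monorepo. This sketch assumes a small local repository served over file:// (the directory layout and the monorepo-src name are hypothetical). The two uploadpack settings are server-side opt-ins that the local demo has to set for itself.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# a tiny stand-in monorepo with three project directories
git init -q -b main monorepo-src
cd monorepo-src
mkdir -p services/billing services/search libs/protobuf-defs
echo billing > services/billing/main.txt
echo search > services/search/main.txt
echo proto > libs/protobuf-defs/defs.txt
git add .
git commit -q -m "seed"
# allow filtered fetches, and by-SHA wants for older protocol versions
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
cd "$work"

# file:// forces the normal transport so the filter applies to a local path
git clone -q --filter=blob:none --sparse "file://$work/monorepo-src" monorepo
cd monorepo
git sparse-checkout init --cone
git sparse-checkout set services/billing libs/protobuf-defs

ls services

# objects the clone knows about but has not downloaded
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?')
echo "objects not yet downloaded: $missing"
```

Only services/billing and libs/protobuf-defs are populated; the blob under services/search stays on the server until something reads it.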

bash — scalar (merged into core Git 2.38)
scalar clone https://github.com/example/monorepo.git

Note

scalar is a standalone executable installed alongside git, not a git subcommand. It wraps partial clone, cone-mode sparse-checkout, and background maintenance (commit-graph updates, repacking) behind one entry point with opinionated defaults, configuring the same primitives shown above rather than replacing them.

Tooling layer

Git itself has no concept of independently buildable packages, so a monorepo needs a build orchestration layer on top: something that maps file changes to affected packages, caches build/test results per package, and parallelizes work across the dependency graph. Common choices: Bazel and Buck2 (language-agnostic, hermetic builds), Nx and Turborepo (JS/TS-focused task graphs and remote caching), and workspace-native package managers (pnpm workspaces, Yarn workspaces) for dependency linking without a full build-graph tool.
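The first of those jobs, mapping file changes to affected packages, can be approximated with plain Git for illustration. This is a deliberately naive sketch: it assumes a hypothetical two-level layout (services/NAME and libs/NAME) and treats a package as affected only if its own files changed, whereas the tools named above also walk the dependency graph to rebuild consumers.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q -b main monorepo
cd monorepo
mkdir -p services/billing services/search libs/protobuf-defs
echo v1 > services/billing/main.txt
echo v1 > services/search/main.txt
echo v1 > libs/protobuf-defs/defs.txt
git add .
git commit -q -m "seed"
base=$(git rev-parse HEAD)

# one commit touches a shared library and one of its consumers
echo v2 > libs/protobuf-defs/defs.txt
echo v2 > services/billing/main.txt
git commit -q -am "Change shared defs and billing"

# reduce each changed path to its two-level package directory
affected=$(git diff --name-only "$base" HEAD | cut -d/ -f1-2 | sort -u)
echo "$affected"
```

Here the result is libs/protobuf-defs and services/billing. A real orchestrator would add services/search as well if it depends on the changed library.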

No native path ACLs

Git has no built-in mechanism to restrict read access to part of a repository: anyone with clone access sees the entire history. CODEOWNERS files route review requests by path but do not enforce read restrictions. Teams that need hard access boundaries between projects should either treat that as a reason not to use a single repository or layer enforcement at the hosting platform.

Comparison

| Dimension | Submodules | Subtrees | Monorepo |
| --- | --- | --- | --- |
| History boundary | Separate repo, separate object DB | Merged into parent's history | None; single history |
| Dependency pinning | Explicit gitlink to one commit SHA | Implicit: whatever was last merged in | N/A; always at current commit |
| Consumer clone | Needs --recurse-submodules or init step | Plain git clone works | Plain clone, but may need partial clone / sparse-checkout at scale |
| Cross-project atomic commits | No: two repos, two commits | No: two repos, two commits | Yes: one commit |
| Repo size impact | None; content stays external | Adds the vendored files, plus full upstream history unless --squash is used | Grows with every project combined |
| Sync/update cost | Cheap: update pointer, commit | Expensive on push/split (history filtering) | N/A; no sync step exists |
| Tooling required | None beyond core Git | git subtree (a contrib script, not included in every Git package) | Build graph tool (Bazel, Nx, Turborepo, etc.) at any real scale |
| Access control granularity | Per-repo, natively | Per-repo for upstream, none for merged copy | None native: whole repo or nothing |

Choosing

Use submodules when

The dependency has its own release cadence and ownership boundary, you need an explicit, auditable pin to one exact commit, and contributors are comfortable with the two-step commit-and-push workflow.

Use subtrees when

You want vendored code physically present with zero extra steps for downstream clones, updates from upstream are infrequent, and occasional slow push/split operations are acceptable.

Use a monorepo when

Projects change together often enough that cross-project atomic commits matter more than independent versioning, and you're willing to adopt partial clone, sparse-checkout, and a build orchestration tool before the repo outgrows plain Git.