I recently had a repository that felt strangely heavy. The source files were not large, but every clone took longer than expected. After checking the repository size, the reason was obvious: some large files had been committed in the past and were still living in Git history.
Confirm the Size Problem
The first thing I checked was the .git directory:
$ du -sh .git
2.5G .git
For that project, 2.5 GB made no sense. The current working tree was much smaller, so the extra weight had to be in historical objects.
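du counts everything under .git, including the index, hooks, and logs. Git's own count-objects reports the size of the object database directly, which helps confirm that history is the culprit. This is a small sketch wrapped in a hypothetical helper named pack_size. It is not a Git command.

```shell
# Hypothetical helper (not part of Git): print the size of the packed
# object database for a repository, defaulting to the current directory.
# count-objects -v prints one "key: value" line per statistic; -H switches
# sizes to human-readable units (bytes, KiB, MiB, GiB).
pack_size() {
  local repo="${1:-.}"
  # "size-pack" covers packed objects; loose objects are reported separately as "size".
  git -C "$repo" count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

Running pack_size in the project root prints a single value. If much of the history has never been packed, the "size" line from git count-objects -vH (loose objects) is worth checking too.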
List the Largest Blobs
This pipeline prints the ten largest blobs (stored file versions) reachable from any ref in the repository, largest first, with human-readable sizes:
$ git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
sed -n 's/^blob //p' | \
sort --numeric-sort --key=2,2 --reverse | \
head -n 10 | \
awk '{
split("B KB MB GB TB", unit);
v=$2; i=1;
while (v>=1024 && i<5) { v/=1024; i++ }
path=$0; sub(/^[^ ]+ [^ ]+ /, "", path);
printf "%-12s %7.2f %s\t%s\n", substr($1, 1, 12), v, unit[i], path
}'
The output looked like this:
a1b2c3d4e5f6 125.50 MB backend/apiserver/logs/app.log
f6e5d4c3b2a1 45.20 MB backend/apiserver/data/test.db
9876543210ab 12.30 MB assets/screenshots/debug-full.png
The main problem was not one large source file. It was a directory that had collected logs and test databases during local debugging.
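Before rewriting anything, it helps to see which commits touched the offending path across every branch and tag. This is a minimal sketch using a hypothetical helper name. The same command also works as a check after the rewrite: empty output means no commit reachable from a ref touches that path anymore.

```shell
# Hypothetical helper: list every commit, on any ref, that touched PATH.
# --all walks all refs (branches, tags, remote-tracking branches).
# The "--" separator tells git that what follows is a path, not a revision,
# so it also works for files that no longer exist in the working tree.
commits_touching() {
  git log --all --oneline -- "$1"
}
```

For example, commits_touching backend/apiserver/logs/app.log lists the commits that added, changed, or deleted that log file.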
Remove the Path from History
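I used git-filter-repo for the rewrite. On macOS it can be installed with Homebrew (it is also published on PyPI, so pip install git-filter-repo works on other systems):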
$ brew install git-filter-repo
Then I removed the directory from all history:
$ git filter-repo --path backend/apiserver --invert-paths
--invert-paths means "remove the matching path and keep everything else". If you only need to remove one file, pass the exact file path instead:
$ git filter-repo --path backend/apiserver/logs/app.log --invert-paths
As with every history rewrite, a fresh clone is the safer place to do this. If you run it in an existing working copy, git filter-repo refuses to proceed unless you pass --force, and I try not to make that a habit.
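If the only copy at hand is an existing local checkout, one way to get a fresh clone to run the rewrite in is git clone --no-local. That option makes Git use its normal transport instead of hardlinking the source's object files. This is a sketch with a hypothetical helper name. The original checkout stays untouched as a fallback if the rewrite goes wrong.

```shell
# Hypothetical helper: make an independent clone to run the rewrite in.
# --no-local forces the regular Git transport for a local source path,
# so objects are copied instead of hardlinked from the source repository.
fresh_clone() {
  local src="$1" dest="$2"
  # Refuse to reuse an existing directory, so the rewrite always starts clean.
  if [ -e "$dest" ]; then
    echo "fresh_clone: $dest already exists" >&2
    return 1
  fi
  git clone --quiet --no-local "$src" "$dest"
}
```

For example (paths are placeholders): fresh_clone ~/src/project /tmp/project-rewrite && cd /tmp/project-rewrite, then run the git filter-repo command there.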
Repack and Check Again
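After the rewrite, run garbage collection. git filter-repo already expires reflogs and runs gc by default, but an aggressive gc repacks what is left more thoroughly: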
$ git gc --prune=now --aggressive
Then check the size again:
$ du -sh .git
450M .git
In this case, the repository dropped from about 2.5 GB to 450 MB. That was still not tiny, but it matched the real project size much better.
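The before-and-after measurement can be scripted. This hypothetical helper expires reflogs before running gc, because reflog entries still reference pre-rewrite commits and would otherwise keep their blobs alive. git filter-repo does this by default, so the extra step should be a no-op there; it matters mainly for other kinds of history rewrites.

```shell
# Hypothetical helper: report the .git size in KiB before and after cleanup.
gc_and_report() {
  local repo="${1:-.}" before after
  before=$(du -sk "$repo/.git" | cut -f1)
  # Expire every reflog entry so old commits are no longer referenced.
  git -C "$repo" reflog expire --expire=now --all
  # Delete unreachable objects immediately and repack the rest.
  git -C "$repo" gc --prune=now --quiet
  after=$(du -sk "$repo/.git" | cut -f1)
  echo "before=${before}KiB after=${after}KiB"
}
```

Adding --aggressive to the gc line trades a much slower run for a tighter repack.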
Push Carefully
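Because the commit hashes changed, the remote history needs a force push. git filter-repo also removes the origin remote by default, to make an accidental push less likely, so you may need to add it back first with git remote add origin <url>: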
$ git push --force --all origin
$ git push --force --tags origin
Do this only after the team knows what is happening. Anyone who has the old history locally will need to clean up their clone, reset their branches, or clone again; until they do, a push or merge from an old clone can bring the removed files back.
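For a collaborator with no local work worth keeping, one way to line a branch up with the rewritten remote is to fetch and hard-reset. This is a sketch with a hypothetical helper name. The --tags --force fetch also moves local tags that still point at pre-rewrite commits, since recent Git versions will not overwrite an existing tag on fetch without --force. reset --hard discards local commits and uncommitted changes on that branch, so anything unpushed should be saved elsewhere first.

```shell
# Hypothetical helper: point BRANCH at its rewritten copy on REMOTE.
# WARNING: reset --hard drops local commits and uncommitted changes on BRANCH.
resync_branch() {
  local remote="${1:-origin}" branch="$2"
  # Default fetch refspecs already force-update remote-tracking branches;
  # --tags --force additionally updates tags that were rewritten.
  git fetch --quiet --tags --force "$remote" || return 1
  git checkout --quiet "$branch" || return 1
  git reset --quiet --hard "$remote/$branch"
}
```

Usage, assuming the branch is called main: resync_branch origin main.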
Preventing the Same Problem
The practical fixes are simple:
- Add generated logs, local databases, dump files, and temporary outputs to .gitignore.
- Use Git LFS for large binary files that really do belong in the repository.
- Review git status before committing from a directory that contains build or debug output.
- Run the large-blob command occasionally for repositories that receive many assets or test fixtures.
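The last point can be automated, for example as a CI step, by turning the large-blob pipeline into a check that fails when any blob in history exceeds a size limit. This is a sketch; the helper name and the example limit are assumptions, not conventions from this project.

```shell
# Hypothetical guard: return non-zero if any blob reachable from any ref
# is larger than LIMIT bytes, printing each offender as "<size> <path>".
check_blob_limit() {
  local limit="$1"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$limit" '
      $1 == "blob" && $2 > limit {
        path = $0
        sub(/^[^ ]+ [^ ]+ /, "", path)   # keep paths that contain spaces
        print $2, path
        found = 1
      }
      END { exit found }'
}
```

For example, check_blob_limit $((5 * 1024 * 1024)) fails if any file version in history is larger than 5 MiB.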
The most useful lesson for me was that deleting a file in a later commit does not remove it from history. If a 100 MB file was committed once, every clone still pays for it until the history is rewritten.
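A small, self-contained way to see this: commit a file, delete it in the next commit, and its blob is still reachable from history. The script below builds a throwaway repository in a temporary directory; the file name and size are arbitrary.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

head -c 1048576 /dev/urandom > big.bin   # 1 MiB of incompressible data
git add big.bin
git commit -q -m "add big.bin"
git rm -q big.bin
git commit -q -m "remove big.bin"

# The working tree no longer has the file...
[ -e big.bin ] || echo "big.bin is gone from the working tree"
# ...but its blob is still listed among the objects reachable from history.
git rev-list --objects --all | grep ' big.bin$'
```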