I recently had a repository that felt strangely heavy. The source files were not large, but every clone took longer than expected. After checking the repository size, the reason was obvious: some large files had been committed in the past and were still living in Git history.

Confirm the Size Problem

The first thing I checked was the .git directory:

$ du -sh .git
2.5G    .git

For that project, 2.5 GB made no sense. The current working tree was much smaller, so the extra weight had to be in historical objects.
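
To put the two numbers side by side, a small helper like the one below can be used. This is only a sketch: repo_size_report is a name made up for this post, and it assumes it is pointed at the top-level directory of a normal (non-bare) clone.

```shell
# Print the size of .git and of the checked-out files, in KiB.
# Hypothetical helper; takes the top-level directory of a clone (default ".").
repo_size_report() {
    repo=${1:-.}
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    git_kb=$(du -sk "$repo/.git" | awk '{print $1}')
    total_kb=$(du -sk "$repo" | awk '{print $1}')
    echo "git_dir_kb=$git_kb"
    # Everything that is not .git is (roughly) the working tree.
    echo "worktree_kb=$((total_kb - git_kb))"
}

# Example:
#   repo_size_report .
```

If git_dir_kb is many times larger than worktree_kb, history is the likely culprit. git count-objects -vH reports similar information from Git's side.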

List the Largest Blobs

This pipeline prints the ten largest blobs reachable from any ref, including files that no longer exist in the current working tree:

$ git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sed -n 's/^blob //p' | \
  sort --numeric-sort --key=2 --reverse | \
  head -n 10 | \
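  awk '{
      split("B KB MB GB TB", unit);
      v=$2; i=1;
      while (v>=1024 && i<5) { v/=1024; i++ }
      # Rebuild the path from the whole line so names with spaces survive.
      path=$0; sub(/^[^ ]+ [^ ]+ /, "", path);
      # Show only the first 12 characters of the object hash.
      printf "%-12s %7.2f %s\t%s\n", substr($1,1,12), v, unit[i], path
  }'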

The output looked like this:

a1b2c3d4e5f6   125.50 MB    backend/apiserver/logs/app.log
f6e5d4c3b2a1    45.20 MB    backend/apiserver/data/test.db
9876543210ab    12.30 MB    assets/screenshots/debug-full.png

The main problem was not one large source file. It was a directory that had collected logs and test databases during local debugging.
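
When the weight is spread over many files in one directory, a per-directory total is easier to read than a top-ten list. The function below is a rough sketch of that idea, not something from the original cleanup: sum_by_dir is a hypothetical name, and it assumes input lines of the form "<hash> <size> <path>", which is what the pipeline above produces after its sed step.

```shell
# Sum blob sizes per directory prefix.
# stdin:  lines of "<hash> <size-in-bytes> <path>"
# $1:     how many leading path components to group by (default 2)
# stdout: "<total-bytes><TAB><directory>", largest first
sum_by_dir() {
    depth=${1:-2}
    awk -v depth="$depth" '
    {
        size = $2
        path = $0
        sub(/^[^ ]+ [^ ]+ /, "", path)      # drop "<hash> <size> "
        n = split(path, parts, "/")
        key = "."                           # files in the repository root
        if (n > 1) {
            key = parts[1]
            # Append directory components only, never the file name itself.
            for (i = 2; i <= depth && i < n; i++) key = key "/" parts[i]
        }
        total[key] += size
    }
    END { for (k in total) printf "%.0f\t%s\n", total[k], k }' |
    sort -k1,1nr
}

# Example:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     sed -n 's/^blob //p' | sum_by_dir 2 | head -n 10
```

The totals count every stored version of every file, so a log file that was committed repeatedly shows up with its full accumulated cost.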

Remove the Path from History

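I used git-filter-repo for the rewrite. It is a separate tool rather than a built-in Git command, so it has to be installed first (here with Homebrew):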

$ brew install git-filter-repo

Then I removed the directory from all history:

$ git filter-repo --path backend/apiserver --invert-paths

On its own, --path keeps only the matching paths and drops everything else. --invert-paths flips the selection: "remove the matching path and keep everything else". If you only need to remove one file, pass the exact file path instead:

$ git filter-repo --path backend/apiserver/logs/app.log --invert-paths

As with every history rewrite, a fresh clone is the safer place to do this. git filter-repo checks for that and refuses to run on a repository that does not look freshly cloned unless you pass --force, but I try not to make that the default habit.
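
One way to get such a clone from a repository that is already on disk is sketched below. The function name and both paths are placeholders invented for this example. The --no-local flag makes Git copy objects over its normal transport instead of hard-linking them, so the scratch copy is fully independent of the original.

```shell
# Clone SRC into WORK as an independent copy to run the rewrite in.
# Hypothetical helper; both arguments are placeholder paths.
fresh_clone_for_rewrite() {
    src=$1
    work=$2
    if [ -e "$work" ]; then
        echo "refusing to overwrite existing path: $work" >&2
        return 1
    fi
    git clone --quiet --no-local "$src" "$work"
}

# Example (paths are made up):
#   fresh_clone_for_rewrite ~/code/myproject /tmp/myproject-rewrite
#   cd /tmp/myproject-rewrite
#   git filter-repo --path backend/apiserver --invert-paths
```

If something goes wrong, the scratch directory can simply be deleted and the original checkout is untouched.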

Repack and Check Again

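git filter-repo already repacks the repository when it finishes, so this step is often close to a no-op. I still run an explicit garbage collection afterwards to make sure no unreachable objects are left behind: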

$ git gc --prune=now --aggressive

Then check the size again:

$ du -sh .git
450M    .git

In this case, the repository dropped from about 2.5 GB to 450 MB. That was still not tiny, but it matched the real project size much better.
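
Size alone does not prove the path is gone, so it is worth asking Git directly whether any commit still touches it. The check below is a minimal sketch; path_in_history is a hypothetical helper name, and it must be run inside the repository.

```shell
# Exit 0 if any commit reachable from any ref still touches the given path,
# non-zero otherwise. Hypothetical helper for this post.
path_in_history() {
    [ -n "$(git log --all --format=%H -- "$1" | head -n 1)" ]
}

# Example:
#   if path_in_history backend/apiserver; then
#       echo "still present in history"
#   else
#       echo "gone from history"
#   fi
```

Before the rewrite this reports the path as present even if the files were deleted from the working tree long ago; after a successful rewrite it should report nothing.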

Push Carefully

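Because the commit hashes changed, the remote history needs a force push. git filter-repo also removes the origin remote as a safety measure, so add it back first (the URL below is a placeholder):

$ git remote add origin <repository-url>
$ git push --force --all origin
$ git push --force --tags origin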

Do this only after the team knows what is happening. Anyone who has the old history locally will need to clean up their clone, reset their branches, or clone again.
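
For a teammate who wants to keep an existing clone, resetting a branch could look roughly like the sketch below. The function name is made up, and it assumes the remote is called origin and that the current branch also exists there. It throws away uncommitted changes and local-only commits, so anything worth keeping should be saved elsewhere first.

```shell
# Point the current local branch at the rewritten remote branch.
# WARNING: reset --hard discards uncommitted changes and local-only commits.
# Hypothetical helper; assumes a remote named "origin".
realign_current_branch() {
    branch=$(git symbolic-ref --short HEAD) || return 1
    git fetch --quiet --prune origin || return 1
    git reset --quiet --hard "origin/$branch"
}

# Example (run inside the existing clone, once per branch):
#   realign_current_branch
```

Branches that are still based on the old history should not be merged or pushed afterwards, because that would bring the removed files back. When in doubt, cloning again is the simpler option.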

Preventing the Same Problem

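The practical fix is simple: keep debugging by-products such as logs, test databases, and throwaway screenshots out of the repository in the first place.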
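
Ignoring the known offenders in .gitignore (for this project, the log and data directories under backend/apiserver) covers the cases you already know about. For everything else, a size check in a pre-commit hook is one possible guard. The function below is only a sketch: reject_large_files and the 5 MB limit are assumptions made for this example, and it checks the size of the file on disk rather than the staged content.

```shell
# Read file names on stdin (one per line) and fail if any of them is larger
# than the limit given in bytes as $1. Hypothetical helper for a pre-commit hook.
reject_large_files() {
    limit=$1
    status=0
    while IFS= read -r f; do
        [ -f "$f" ] || continue                  # skip deleted or missing paths
        size=$(wc -c < "$f" | tr -d ' ')         # size of the file on disk
        if [ "$size" -gt "$limit" ]; then
            echo "too large: $f ($size bytes, limit $limit)" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use inside .git/hooks/pre-commit (the limit is an arbitrary choice):
#   git diff --cached --name-only --diff-filter=AM |
#       reject_large_files 5242880 || exit 1
```

A hook like this lives in each developer's clone and is not shared automatically, so it is a convenience rather than an enforcement mechanism.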

The most useful lesson for me was that deleting a file in a later commit does not remove it from history. If a 100 MB file was committed once, every clone still pays for it until the history is rewritten.