Git history rewrite detection — recovering squashed and force-pushed states
Wiki route
This entry sits under the security domain. It is the source-code analogue of Wayback Machine as a forensic tool (which recovers de-published web content), and it feeds the people-layer work in Forensic identity anchor chain by recovering committer emails and timestamps that a rewrite tried to erase.
[!info] TL;DR A force-push or squash moves a branch pointer, but Git does not immediately delete the old commits: they survive as unreachable (“dangling”) objects until garbage collection, and any clone, fork, or CI checkout taken before the rewrite still holds them. On GitHub, the Events API independently logs the pre-rewrite commit SHAs. So “we cleaned that up” is a checkable claim: enumerate dangling objects, replay the reflog, and read PushEvent history. The rewrite itself is a signal that someone wanted a prior state gone.
Why rewritten history is recoverable
| Mechanism | What it preserves | Public command / endpoint |
|---|---|---|
| Dangling objects | Commits unreferenced after a rewrite stay in the object DB until gc | git fsck --lost-found / git fsck --unreachable |
| Reflog | A local log of where HEAD/branches pointed; by default entries expire after 90 days, or after 30 days once they are unreachable from the current tip | git reflog (and git reflog show <branch>) |
| Pre-rewrite clones/forks/CI | Any copy taken before the push has the old graph in full | inspect that copy’s git log --all |
| GitHub Events API | PushEvent records the SHAs of pushed commits, independent of later rewrites | GET /repos/{owner}/{repo}/events |
A force-push moves the remote branch pointer to the new history; the displaced commits become orphans but persist until the next GC cycle. Locally, git reflog still records the exact prior commit hash, and git fsck enumerates the unreachable objects (pass --no-reflogs to include commits that only a reflog entry still references, which fsck otherwise treats as reachable). Remotely, the GitHub Events API returns recent PushEvents carrying the before/after SHAs (the payload's before and head fields), effectively a remote reflog you do not control.
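The local side of this can be sketched in Python. This is a minimal sketch, not a definitive tool: `parse_fsck` and `unreachable_commits` are hypothetical helper names, and the parser assumes fsck's usual one-object-per-line output (`dangling commit <sha>`, `unreachable blob <sha>`).

```python
import re
import subprocess

# Matches lines such as "dangling commit <sha>" or "unreachable blob <sha>".
# 40 hex chars covers SHA-1 repositories, 64 covers SHA-256 ones.
_FSCK_LINE = re.compile(
    r"^(dangling|unreachable) (commit|blob|tree|tag) ([0-9a-f]{40,64})$"
)


def parse_fsck(output: str) -> dict[str, list[str]]:
    """Group the object IDs reported by `git fsck` by object type."""
    found: dict[str, list[str]] = {}
    for line in output.splitlines():
        match = _FSCK_LINE.match(line.strip())
        if match:
            found.setdefault(match.group(2), []).append(match.group(3))
    return found


def unreachable_commits(repo: str) -> list[str]:
    """Run fsck in `repo` (a path to a clone you hold) and return commit IDs.

    --no-reflogs makes fsck report commits that only a reflog entry
    still references; without it those count as reachable.
    """
    result = subprocess.run(
        ["git", "-C", repo, "fsck", "--unreachable", "--no-reflogs"],
        capture_output=True,
        text=True,
        check=False,  # fsck exits non-zero when it finds problems
    )
    return parse_fsck(result.stdout).get("commit", [])
```

Each returned ID can then be inspected with git show <sha>. Keeping the parser separate from the subprocess call means it can also be run over fsck output saved from a machine you no longer have access to.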
Detection procedure (public, reproducible)
- Look for force-push fingerprints. Non-linear timestamps, a sudden drop in commit count, or a single squashed “Initial commit” replacing a known longer history are tells. A “for <upstream-project>” string surviving in a commit message is the kind of self-incrimination noted in module path confusion supply chain attack.
- Enumerate dangling objects. In any clone you hold: git fsck --lost-found and inspect dangling commit <sha> entries with git show <sha>.
- Replay the reflog. git reflog / git reflog show <branch> reveals prior tips even after an “overwrite,” within the retention window.
- Query the remote event log. GET /repos/{owner}/{repo}/events and read PushEvent payloads for pre-rewrite SHAs; cross-reference against what the current branch shows. (See GitHub event types for the PushEvent shape.)
- Diff recovered vs current. Treat any removed file (LICENSE, an earlier README, a deleted contract, leaked config) as evidence: the removal, plus its timestamp, is the finding.
- Preserve. Snapshot recovered objects to an out-of-band store; a publisher can later GC the remote or rewrite again.
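The remote-event step of the procedure can be sketched as follows. This is an illustration under stated assumptions, not a reference client: `fetch_events`, `push_tips`, and `missing_from_history` are hypothetical names, the payload fields `ref`, `before`, and `head` reflect my reading of the documented PushEvent shape, and `reachable` is assumed to be the set of SHAs from something like git rev-list --all run in a current clone.

```python
import json
import urllib.request
from typing import Iterable


def fetch_events(owner: str, repo: str) -> list[dict]:
    """Fetch one page of recent public events for a repository (unauthenticated)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/events"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def push_tips(events: Iterable[dict]) -> list[dict]:
    """Keep only PushEvents and pull out the ref plus before/head SHAs."""
    tips = []
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        payload = event.get("payload", {})
        tips.append({
            "ref": payload.get("ref"),
            "before": payload.get("before"),
            "head": payload.get("head"),
            "created_at": event.get("created_at"),
        })
    return tips


def missing_from_history(tips: Iterable[dict], reachable: set[str]) -> list[dict]:
    """Flag pushes whose before/head SHA is absent from the current history."""
    flagged = []
    for tip in tips:
        gone = [
            sha
            for sha in (tip["before"], tip["head"])
            # An all-zero SHA marks branch creation, not a real commit.
            if sha and set(sha) != {"0"} and sha not in reachable
        ]
        if gone:
            flagged.append({**tip, "missing": gone})
    return flagged
```

A flagged entry is a lead, not a conclusion: it says a SHA the remote once logged is no longer in the history you can see, which is what a force-push leaves behind. The event feed only covers recent activity, so an empty result does not show that no rewrite happened.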
Companion to Wayback
| Layer | Tool | Recovers |
|---|---|---|
| Published web pages / docs / PDFs | [[security/wayback-machine-as-forensic-tool\|Wayback Machine]] | De-published web content |
| Source repository state | this entry (git fsck / reflog / Events API) | Squashed/force-pushed commits, deleted files, committer metadata |
Run both: the web layer recovers the claim, the git layer recovers the code and authorship behind it.
When to use
- A project’s repo shows a suspiciously short or squashed history while marketing implies long development.
- A LICENSE, audit report, or sensitive contract “was never there” but a prior clone/fork suggests otherwise.
- Recovering committer emails / author names erased by a rewrite, to feed identity anchor chain cross-checks.
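For the identity-recovery case, a recovered commit's author and committer lines can be read from the raw object printed by git cat-file -p <sha>. The sketch below is a minimal example with a hypothetical helper name (`parse_identities`); it assumes the standard header layout of `author Name <email> <epoch> <±hhmm>` followed by a blank line before the message.

```python
import re
from datetime import datetime, timedelta, timezone

# "author Jane Doe <jane@example.org> 1700000000 +0100"
_IDENT = re.compile(
    r"^(author|committer) (.*) <([^>]*)> (\d+) ([+-])(\d{2})(\d{2})$"
)


def parse_identities(raw_commit: str) -> dict[str, dict[str, str]]:
    """Extract author/committer name, email, and timestamp from a raw commit."""
    identities: dict[str, dict[str, str]] = {}
    for line in raw_commit.splitlines():
        if line == "":
            break  # the first blank line ends the header; the rest is the message
        match = _IDENT.match(line)
        if not match:
            continue
        role, name, email, epoch, sign, hours, minutes = match.groups()
        offset = timedelta(hours=int(hours), minutes=int(minutes))
        if sign == "-":
            offset = -offset
        when = datetime.fromtimestamp(int(epoch), tz=timezone(offset))
        identities[role] = {"name": name, "email": email, "when": when.isoformat()}
    return identities
```

Author and committer are kept separate on purpose: a rebase or squash typically rewrites the committer line while leaving the author line alone, so a mismatch between the two is itself worth recording.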
When NOT to use
- No pre-rewrite copy exists, the remote has already GC’d, and the reflog window has expired: recovery may be impossible (note the gap rather than overclaim).
- Legitimate, disclosed history rewrites (e.g. removing an accidentally committed secret) — a rewrite is a signal, not proof of bad intent; corroborate before attributing.
- Non-Git VCS or fully closed-source projects — use a different evidence route.
Related
- Security domain index
- Wayback Machine as a forensic tool
- Forensic identity anchor chain
- Module path confusion supply chain attack
- Fork-and-Rebrand five-layer audit framework
Sources
- git-scm — git-fsck (unreachable / dangling object enumeration) — https://git-scm.com/docs/git-fsck
- git-scm — git-reflog (local history of branch tips, retention window) — https://git-scm.com/docs/git-reflog
- GitHub Docs — REST API endpoints for events (PushEvent before/after SHAs) — https://docs.github.com/en/rest/activity/events
- GitHub Docs — GitHub event types (PushEvent payload) — https://docs.github.com/en/rest/using-the-rest-api/github-event-types