Release-recovery runbook¶
This runbook restores Gad360/apothem to a working state when the
release cutover sequence fails mid-flight. The cutover is a
four-stage atomic sequence (Stage 1 verify-private-snapshot; Stage 2
delete-existing; Stage 3 create-fresh; Stage 4 force-push fresh
history); failure at Stage 2, 3, or 4 leaves the repository in a
recoverable but inconsistent state. The recovery path is governed by
the snapshot captured at Stage 1: the snapshot is the retreat path, and
without it recovery means manual reconstruction from the operator's
local machine and downstream caches.
The runbook is written for the maintainer wielding gh, git, and
the recovery-snapshot archive at _inputs/recovery-snapshot.tar.gz.
A competent new contributor follows it without prior context, provided
the recovery snapshot exists.
1. Stage taxonomy and failure surfaces¶
The four cutover stages and their failure modes:
| Stage | Action | Failure mode | Recovery path |
|---|---|---|---|
| 1 | Verify private snapshot | Snapshot capture fails or is incomplete | Re-run capture script before proceeding to Stage 2; document the gap in the snapshot manifest |
| 2 | Delete existing repository | gh repo delete returns non-204 OR the deletion partially completed (some metadata persists) | §2 Stage-2 recovery |
| 3 | Create fresh repository | gh repo create returns non-201 OR the new repository creates with wrong scope / topics / homepage | §3 Stage-3 recovery |
| 4 | Force-push fresh history | The push fails OR the push lands but post-push verification (Verified badge, branch protection, Pages, DNS) does not pass | §4 Stage-4 recovery |
Each stage's recovery path is idempotent — running it on a clean
state is a no-op; running it on a partial-failure state restores the
expected post-stage condition. The recovery paths assume the snapshot
at _inputs/recovery-snapshot.tar.gz is present and valid.
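The idempotency contract above can be sketched as a small wrapper. This is a hedged illustration, not part of the runbook's tooling: the helper name ensure and the demo paths are hypothetical; CHECK stands for the post-stage verification and APPLY for the recovery action.

```shell
# Sketch of the idempotent-recovery contract, with hypothetical names:
# "ensure CHECK... -- APPLY..." runs APPLY only when CHECK fails, then
# re-runs CHECK to confirm the expected post-stage condition holds.
ensure() {
  check=()
  while [ "$1" != "--" ]; do check+=("$1"); shift; done
  shift
  if "${check[@]}" >/dev/null 2>&1; then
    return 0                      # clean state: the recovery path is a no-op
  fi
  "$@"                            # partial-failure state: restore it
  "${check[@]}" >/dev/null 2>&1   # verify the post-stage condition
}

# Demo with a local stand-in: re-creating a directory that already exists
# is a no-op, mirroring a recovery path re-run on clean state.
ensure test -d /tmp/apothem-idem-demo -- mkdir /tmp/apothem-idem-demo
ensure test -d /tmp/apothem-idem-demo -- mkdir /tmp/apothem-idem-demo
echo "both runs succeeded"
```

Note that the second call never reaches mkdir; the check short-circuits it, which is exactly the no-op behaviour the recovery paths promise.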
2. Stage-2 recovery — Delete-existing failed¶
Stage 2 invokes gh repo delete Gad360/apothem --yes. The expected
response is HTTP 204 (No Content). Failure modes:
- The call returns 403 / 404 / 5xx.
- The call returns 204 but a follow-up gh api repos/Gad360/apothem still returns 200 (deletion did not propagate or was reverted).
- The deletion partially completed (the repository is gone, but a webhook / app-installation / outside-collaborator binding persists on the GitHub side).
2.1 Diagnose¶
gh api repos/Gad360/apothem 2>&1 | head -3
Three outcomes:
- HTTP 404. The repository is fully deleted; Stage 2 succeeded. Proceed to Stage 3 — no recovery needed.
- HTTP 200. The deletion did not land; the repository is still there. Re-run gh repo delete Gad360/apothem --yes; if that fails again, fall back to the GitHub UI ("Delete this repository" at the bottom of the Settings page) or escalate to GitHub Support if the UI fails the same way.
- HTTP 5xx. GitHub-side outage; wait 5 minutes and re-check. The deletion call's 204 may have queued the action; verify after the outage clears.
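The three-way branch above can be expressed as a small classifier. A sketch with a hypothetical helper (classify_stage2 is illustrative; in practice the status code comes from the gh api probe, here it is passed in directly):

```shell
# Hypothetical classifier for the three Stage-2 diagnosis outcomes.
classify_stage2() {
  case "$1" in
    404) echo "deleted: proceed to Stage 3" ;;
    200) echo "still present: re-run gh repo delete" ;;
    5??) echo "outage: wait 5 minutes and re-check" ;;
    *)   echo "unexpected status $1: stop and investigate" ;;
  esac
}

classify_stage2 404   # prints "deleted: proceed to Stage 3"
classify_stage2 503   # prints "outage: wait 5 minutes and re-check"
```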
2.2 Restore from snapshot (deletion landed but the cutover is aborted)¶
When the diagnosis at §2.1 shows a 404 (the repository is deleted) but the cutover sequence cannot continue (the maintainer aborts), restore the pre-cutover state from the snapshot:
# Extract the snapshot
mkdir -p /tmp/apothem-recovery
tar -xzf plans/apothem-release/_inputs/recovery-snapshot.tar.gz -C /tmp/apothem-recovery
# Re-create the repository from the snapshot's metadata
gh repo create Gad360/apothem --private \
--description "$(cat /tmp/apothem-recovery/repo-description.txt)" \
--homepage "$(cat /tmp/apothem-recovery/repo-homepage.txt)"
# Push the snapshot's git bundle as the initial history
cd /tmp/apothem-recovery
git clone repo.bundle apothem
cd apothem
git remote add origin git@github.com:Gad360/apothem.git
git push --all origin
git push --tags origin
# Re-apply the snapshot's branch protection
gh api -X PUT /repos/Gad360/apothem/branches/main/protection \
--input /tmp/apothem-recovery/branch-protection.json
# Re-apply the snapshot's Pages config
gh api -X POST /repos/Gad360/apothem/pages \
--input /tmp/apothem-recovery/pages-config.json
# Re-apply topics
gh api -X PUT /repos/Gad360/apothem/topics \
--input /tmp/apothem-recovery/topics.json
The snapshot carries every per-repo binding the cutover sequence intended to preserve; the recovery script applies them in the order the snapshot manifest declares. Verify after each step:
gh api repos/Gad360/apothem --jq '.private, .description, .homepage'
gh api repos/Gad360/apothem/pages --jq '.cname, .https_enforced'
gh api repos/Gad360/apothem/branches/main/protection --jq '.required_status_checks.contexts | length'
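The verify-after-each-step discipline can be wrapped fail-fast. A minimal sketch under stated assumptions: run_checks is a hypothetical helper, and in practice each check string would be one of the gh api commands above (the demo substitutes local commands so the sketch is self-contained).

```shell
# Minimal fail-fast wrapper for the post-restore checks: stop the recovery
# cycle at the first verification that does not pass.
run_checks() {
  i=0
  for check in "$@"; do
    i=$((i + 1))
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "check $i failed: $check"
      return 1
    fi
  done
  echo "all $i checks passed"
}

run_checks "test 1 -eq 1" "true"            # prints "all 2 checks passed"
run_checks "true" "test 1 -eq 2" || true    # prints "check 2 failed: test 1 -eq 2"
```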
3. Stage-3 recovery — Create-fresh failed¶
Stage 3 invokes gh repo create Gad360/apothem --private
--description ... --homepage .... The expected response is HTTP 201.
Failure modes:
- The call returns 422 (the repository name is taken — Stage 2 did not complete).
- The call returns 201 but the repository creates with wrong visibility, missing description, missing homepage, or missing topics.
- The call returns 5xx (GitHub-side outage).
3.1 Diagnose¶
gh api repos/Gad360/apothem --jq '.private, .description, .homepage, .topics'
Four outcomes:
- HTTP 404. The repository was not created; re-run the Stage 3 command after verifying Stage 2 fully completed.
- private: true AND description / homepage / topics match the spec. Stage 3 succeeded; proceed to Stage 4.
- private: false. The repository was created public; recover via gh api -X PATCH /repos/Gad360/apothem -F private=true (-F sends a typed boolean, where -f would send the string "true"). Re-verify the visibility flipped.
- Description / homepage / topics empty or mismatched. Apply patches:
gh api -X PATCH /repos/Gad360/apothem \
-f description="<spec-grade description>" \
-f homepage="https://apothem.ahmedgad.com"
gh api -X PUT /repos/Gad360/apothem/topics \
-f "names[]=agents" -f "names[]=claude-code" \
-f "names[]=conformity-gates" -f "names[]=framework"
(Replace the names array with the spec-ratified topic set.)
3.2 Idempotent re-creation¶
If the repository state is sufficiently broken that patching is more work than recreating, the operator deletes and re-creates:
gh repo delete Gad360/apothem --yes # Stage 2 again
sleep 5
gh repo create Gad360/apothem --private \
--description "<spec-grade description>" \
--homepage "https://apothem.ahmedgad.com"
The 5-second sleep gives GitHub's deletion queue time to settle before the create call lands.
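A fixed sleep is a guess; a hedged alternative is to poll until the probe stops succeeding. wait_until_gone is a hypothetical helper, and the real probe would be gh api repos/Gad360/apothem (which fails once the name is free); the demo uses a local file as a stand-in so the sketch runs anywhere.

```shell
# Poll until the probe command fails, up to a bounded number of tries.
wait_until_gone() {
  tries="$1"; shift
  while [ "$tries" -gt 0 ]; do
    if ! "$@" >/dev/null 2>&1; then
      return 0                 # probe failed: the repository is gone
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1                     # still present after the allotted polls
}

# Demo with a local stand-in: a file plays the part of the repository.
touch /tmp/apothem-gone-demo
(sleep 2; rm -f /tmp/apothem-gone-demo) &
wait_until_gone 10 test -e /tmp/apothem-gone-demo && echo "name free"
```

The create call then fires as soon as the deletion settles, rather than after an arbitrary delay.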
4. Stage-4 recovery — Force-push or post-push verification failed¶
Stage 4 invokes git push --force-with-lease origin main against the
fresh repository, then applies branch protection, re-enables Pages,
and verifies the commit's GPG signature surfaces as Verified. Failure
modes:
- The push fails (network error, GPG signing failure, force-lease rejection).
- The push lands but the post-push verification fails — Verified badge absent, branch protection not applied, Pages not enabled, DNS CNAME absent.
4.1 Diagnose push¶
gh api repos/Gad360/apothem/commits/main --jq '.commit.message, .commit.verification.verified, .commit.verification.reason'
Three outcomes:
- HTTP 404. The push did not land; the repository is empty. Re-run the Stage 4 push from the operator's local machine, and verify the local main ref points at the signed initial commit before pushing.
- verified: true AND the commit message matches the spec's initial commit. The Stage 4 push succeeded; proceed to §4.2 binding verification.
- verified: false. The push landed but the GPG signature did not verify. Inspect .commit.verification.reason for the failure class:
  - unsigned. The commit was not signed at the push source. Re-sign locally (git commit --amend -S --no-edit) and force-push again.
  - unknown_key. The signing key is not registered with the operator's GitHub account. Add it via Settings → SSH and GPG keys → New GPG key.
  - bad_email. The committer email does not match the GPG key's UID. Fix the local git config (git config user.email me@ahmedgad.com) and re-sign.
4.2 Apply branch protection, Pages, DNS¶
# Branch protection — extract from snapshot
gh api -X PUT /repos/Gad360/apothem/branches/main/protection \
--input plans/apothem-release/_inputs/branch-protection.json
# Pages — the Pages-enablement runbook is the canonical procedure;
# this stage re-applies the snapshot's Pages config
gh api -X POST /repos/Gad360/apothem/pages \
--input plans/apothem-release/_inputs/pages-config.json
# DNS — the Pages-enablement runbook governs the registrar-side step;
# at this stage the maintainer confirms the CNAME still resolves
dig apothem.ahmedgad.com CNAME +short
Expected DNS output:
gad360.github.io.
If the DNS record is absent, it was removed outside the cutover (the registrar is unaffected by GitHub-side operations); the maintainer follows the Pages-enablement runbook's provider appendix to re-add it.
4.3 Recover from a fully botched Stage 4¶
When Stage 4 partially landed and the post-push state is too divergent to patch, the recovery path is:
- Delete the repository again (§2 recovery).
- Re-create the repository (§3 recovery).
- Re-push the fresh history from a known-clean local clone:
cd /tmp/apothem-fresh
git clone --bare $(pwd)/<source-of-truth> apothem.git
cd apothem.git
git remote add origin git@github.com:Gad360/apothem.git
git push --mirror origin
- Re-apply protection + Pages + DNS per §4.2.
The mirror push lands the full commit graph + tags + refs in one operation; the protection / Pages / DNS apply afterward.
5. Snapshot integrity¶
The recovery snapshot at _inputs/recovery-snapshot.tar.gz is the
recovery path's binding constraint. A missing or corrupted snapshot
forfeits §2.2's restore path; the operator falls back to manual
reconstruction. Verify integrity before any recovery cycle:
tar -tzf plans/apothem-release/_inputs/recovery-snapshot.tar.gz | head -20
Expected entries (sample):
recovery-snapshot/
recovery-snapshot/repo.bundle
recovery-snapshot/repo-description.txt
recovery-snapshot/repo-homepage.txt
recovery-snapshot/branch-protection.json
recovery-snapshot/pages-config.json
recovery-snapshot/topics.json
recovery-snapshot/release-assets/
recovery-snapshot/issues.json
If any expected entry is missing, the snapshot is incomplete; the maintainer re-runs the Stage 1 capture script before proceeding with recovery. Re-running Stage 1 against a deleted repository (§2 already landed) produces an empty snapshot — the snapshot must be captured before Stage 2 fires.
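The missing-entry check can be scripted against the listing. A sketch under stated assumptions: verify_snapshot is a hypothetical helper, the entry names come from the sample listing above, and the demo builds a throwaway archive locally in place of the real snapshot.

```shell
# Confirm each expected manifest entry appears in the archive listing;
# report every missing entry and fail if any is absent.
verify_snapshot() {
  archive="$1"; shift
  listing=$(tar -tzf "$archive") || return 1
  status=0
  for entry in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qxF "$entry"; then
      echo "missing: $entry"
      status=1
    fi
  done
  return "$status"
}

# Demo against a locally built archive standing in for the real snapshot.
mkdir -p /tmp/snap-demo/recovery-snapshot
touch /tmp/snap-demo/recovery-snapshot/repo.bundle
tar -czf /tmp/snap-demo/snap.tar.gz -C /tmp/snap-demo recovery-snapshot
verify_snapshot /tmp/snap-demo/snap.tar.gz \
  recovery-snapshot/ recovery-snapshot/repo.bundle && echo "snapshot complete"
```

Running it with the full expected-entry list from the sample listing gives a yes/no gate before any recovery cycle starts.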
6. Snapshot retention¶
The recovery snapshot is retained for 7 days post-cutover. After 7 days, the snapshot is retired via the per-file destructive-op confirmation surface:
ls -lh plans/apothem-release/_inputs/recovery-snapshot.tar.gz
The maintainer routes the retirement through the canonical confirmation
channel (the AskUserQuestion invocation at
/plan-execute Cluster 8 closure). The 7-day window covers DNS
propagation maxima, CDN cache lifetimes, and the empirical window
within which a downstream consumer surfaces a cutover-induced break.
7. Failure recovery — meta¶
When the recovery cycle itself fails (the snapshot is corrupted, the GitHub API is offline, the maintainer's local clone is stale), the last-resort path is manual reconstruction from:
- The maintainer's local main branch (assumed to mirror the public state at cutover time).
- Downstream package-manager caches (PyPI, Homebrew, Scoop, and AUR all retain prior versions).
- The maintainer's GPG key archive (signed tags can be recreated locally).
- The DNS registrar's record history (most registrars retain a rollback window).
This path is escape-hatch only; the snapshot path at §2.2 + §4.2 is the canonical recovery surface. Manual reconstruction is logged as a high-severity finding in the post-cutover audit.
8. Cross-references¶
- Spec source. Specification §2.7 enumerates the four cutover stages and the snapshot manifest contents.
- Pages flow. The Pages-enablement runbook (docs/runbooks/pages-enablement.md) is the canonical procedure for the registrar-side and Pages-side configuration this runbook references during §4.2 recovery.
- Release flow. The release-cycle runbook (docs/runbooks/release-cycle.md) governs the per-version release procedure after the cutover settles; this runbook governs the cutover itself.
- Decisions. D-9 (cutover atomicity), D-29 (snapshot retention), D-31 (Pages re-application), D-32 (DNS continuity) are the decisions this runbook operationalises.
docs/runbooks/ release-cycle.md) governs the per-version release procedure after the cutover settles; this runbook governs the cutover itself. - Decisions. D-9 (cutover atomicity), D-29 (snapshot retention), D-31 (Pages re-application), D-32 (DNS continuity) are the decisions this runbook operationalises.