CI/CD for a Docs Site: ADR-004
How we built a deployment pipeline that stays fresh without manual intervention.
The Problem
We needed a CI/CD pipeline that could:
- Deploy on merge to main
- Aggregate docs from upstream repos daily
- Allow manual rebuilds with cache bypass
- Run security scanning without slowing deploys
Why GitHub Pages?
We considered three options:
| Platform | Cost | PR Previews | HTTPS | Vendor Count |
|---|---|---|---|---|
| GitHub Pages | Free | No | Auto (*.github.io) | 1 |
| Netlify/Vercel | Free tier | Yes | Auto | 2 |
| Railway | ~$5/mo | Yes | Auto | 2 |
Cost wasn't the deciding factor: every option is free or, in Railway's case, about $5 a month. What mattered:
- Vendor consolidation - secrets, permissions, and logs in one place
- No external OAuth - a smaller attack surface
- Workflow simplicity - the deploy-pages action needs little configuration
The trade-off: No PR preview deployments. We accepted this because our site is documentation—reviewing markdown diffs is sufficient. For a React app with visual changes, we'd choose differently.
Note: Custom domains need DNS configuration and propagation time. The *.github.io subdomain gets HTTPS immediately.
The Pipeline
graph LR
A[Push to main] --> B[deploy.yml]
A --> S[security.yml]
B --> C[Validate Schema]
C --> D[Restore Cache]
D --> E[Aggregate Docs]
E --> F[MkDocs Build]
F --> G[Deploy to Pages]
S --> H[Gitleaks]
S --> I[YAML Lint]
J[Daily Schedule] --> B
K[Manual + Force] --> L[Clear Cache]
L --> C
Key insight: security.yml runs in parallel with deploy.yml. A linting failure doesn't block deployment—but it does show up as a failed check on the commit.
Three triggers, one pipeline:
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * *' # Daily at 6 AM UTC
  workflow_dispatch:
    inputs:
      force_refresh:
        type: boolean
        default: false
Caching Strategy
Template aggregation fetches docs from GitHub repos. Without caching, every build would re-fetch everything.
Our approach:
- Cache key includes hashFiles('templates.yaml') - config changes invalidate
- Restore keys allow partial cache hits
- Manifest tracking in aggregation script compares commit SHAs
- name: Restore template cache
  if: ${{ github.event.inputs.force_refresh != 'true' }}
  uses: actions/cache@v5
  with:
    path: .cache/templates
    key: templates-${{ hashFiles('templates.yaml') }}-${{ github.run_id }}
    restore-keys: |
      templates-${{ hashFiles('templates.yaml') }}-
      templates-
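Because the key ends in the run ID, a new run never gets an exact hit and always falls through to the restore keys. The lookup order can be modelled roughly as below. This is a simplified sketch, not the cache action's implementation: it ignores branch scoping and cache versions, and `CacheEntry`, `resolve_cache`, and the sample keys are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # larger means newer; a stand-in for a real timestamp


def resolve_cache(
    entries: Sequence[CacheEntry], key: str, restore_keys: Sequence[str]
) -> Optional[CacheEntry]:
    """Return the entry a restore step would pick, or None on a full miss."""
    # 1. An exact match on the primary key wins outright.
    for entry in entries:
        if entry.key == key:
            return entry
    # 2. Otherwise treat the key, then each restore key in order, as a prefix
    #    and prefer the most recently created entry for the first prefix that hits.
    for prefix in [key, *restore_keys]:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)
    return None


# Hypothetical cache contents: "abc" and "def" stand in for hashFiles() output.
entries = [
    CacheEntry("templates-abc-100", created_at=1),
    CacheEntry("templates-abc-101", created_at=2),
    CacheEntry("templates-def-090", created_at=0),
]

# Run 102 has no exact match, so it falls back to the newest cache
# that was built from the same templates.yaml.
hit = resolve_cache(entries, "templates-abc-102", ["templates-abc-", "templates-"])
```

In this model, editing templates.yaml changes the hash, so only the last, broadest restore key can match. That is the "partial cache hit" case: some cached content is reused and the aggregation script decides what is stale.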
The force refresh option clears the cache entirely:
- name: Clear cache (if force refresh)
  if: ${{ github.event.inputs.force_refresh == 'true' }}
  run: rm -rf .cache/templates
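The manifest tracking mentioned in the approach list can be sketched as follows. This is a minimal illustration rather than our actual aggregation script: it assumes the manifest is a JSON file mapping each upstream repo to the commit SHA it was last fetched at, and the function names, file layout, and repo names are all hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_manifest(path: Path) -> Dict[str, str]:
    """Read the repo -> commit SHA mapping saved by the previous build."""
    if not path.exists():
        return {}  # cold cache or force refresh: nothing is known yet
    return json.loads(path.read_text(encoding="utf-8"))


def repos_to_fetch(manifest: Dict[str, str], upstream: Dict[str, str]) -> List[str]:
    """Return repos whose upstream SHA differs from the cached one."""
    return sorted(repo for repo, sha in upstream.items() if manifest.get(repo) != sha)


def save_manifest(path: Path, upstream: Dict[str, str]) -> None:
    """Record the SHAs that the cache now reflects."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(upstream, indent=2, sort_keys=True), encoding="utf-8")
```

With a cleared cache directory the manifest is missing, `load_manifest` returns an empty mapping, and every repo is refetched, which is why a force refresh behaves like a cold build.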
Security Scanning
Separate workflow, parallel execution:
# security.yml
jobs:
  gitleaks:
    # Secret scanning on every push
  dependency-review:
    # License and vulnerability check on PRs
  yaml-lint:
    # Configuration validation
This keeps security checks from blocking deploys while still catching issues.
The yamllint War Story
Our first security run failed spectacularly:
##[error]mkdocs.yml:88:5 [indentation] wrong indentation: expected 6 but found 4
##[error]templates.yaml:45:121 [line-length] line too long (156 > 120 characters)
##[warning].github/workflows/deploy.yml:3:1 [truthy] truthy value should be one of [false, true]
The investigation revealed three conflicts:
- on: is not a boolean - GitHub Actions uses on: as a keyword, but yamllint sees it as a truthy value
- MkDocs doesn't require --- - yamllint's document-start rule expects it
- Description fields are long - template descriptions exceed 120 characters
The fix: .yamllint.yml configuration that respects ecosystem conventions:
rules:
  # GitHub Actions uses `on:` as a keyword
  truthy:
    allowed-values: ['true', 'false', 'on']
  # MkDocs files don't need document start
  document-start: disable
  # Allow longer lines for descriptions
  line-length:
    max: 200
Lesson: Linting tools need per-ecosystem configuration. Default rules assume vanilla YAML.
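To see why the workflow keyword trips the truthy rule, it helps to model the check. The snippet below is a simplified sketch of a truthy-style rule, not yamllint's actual implementation: it assumes the rule flags any plain scalar from the YAML 1.1 boolean spellings that is not in the allowed list, and `truthy_violations` and the sample scalars are hypothetical.

```python
from typing import Iterable, List

# Plain scalars that YAML 1.1 parsers may read as booleans.
YAML_1_1_TRUTHY = {
    "YES", "Yes", "yes", "NO", "No", "no",
    "TRUE", "True", "true", "FALSE", "False", "false",
    "ON", "On", "on", "OFF", "Off", "off",
}


def truthy_violations(scalars: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return boolean-like scalars that are not in the allowed set."""
    allowed_set = set(allowed)
    return [s for s in scalars if s in YAML_1_1_TRUTHY and s not in allowed_set]


# Plain scalars from the top of a workflow file, keys and values alike.
workflow_scalars = ["name", "Deploy", "on", "push", "false"]

# Only true/false allowed: the `on` key is flagged.
default_result = truthy_violations(workflow_scalars, ["true", "false"])

# With `on` added to the allowed values, nothing is flagged.
relaxed_result = truthy_violations(workflow_scalars, ["true", "false", "on"])
```

Allowing `on` everywhere is the blunt option; it also lets `on` through as a value in other files, which we considered an acceptable trade for this repo.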
Build Times
| Scenario | Time |
|---|---|
| Cold build (no cache) | ~45s |
| Warm build (cached) | ~20s |
| Force refresh | ~45s |
Most deploys hit the cache. Daily scheduled builds may be slower if upstream repos changed.
What We Learned
- Separate security from deploy - don't let linting failures block urgent content fixes
- Cache aggressively, invalidate precisely - manifest-based tracking beats time-based expiry
- Make force refresh easy - when caching goes wrong, you need an escape hatch
What's Next
- ADR-005: DevSecOps implementation (the security.yml details)
Links: