ADR-006: Cross-Project Documentation Aggregation

Status: Accepted 2026-01-03
Date: 2026-01-03
Decision Makers: @amiable-dev/maintainers
Depends On: ADR-003 (Configuration), ADR-004 (CI/CD)
Council Review: 2026-01-03 (Tier: High, Models: GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1)

Context

The amiable-templates site aggregates documentation from multiple template repositories. Each template repository contains its own documentation that should be pulled into the unified site. We need a system that:

Fetches documentation from configured repositories at build time
Transforms content for the unified site context
Handles caching efficiently for fast builds
Gracefully handles missing or changed content

This ADR adapts patterns from amiable-docusaurus ADR-003: Cross-Project ADR Aggregation.

Current State

No aggregation system exists. Template documentation is manually referenced via links.

Goals

Automated fetching of template documentation at build time
Consistent presentation across all template docs
Source attribution for aggregated content
Fast incremental builds via caching
Resilient to upstream changes

Non-Goals

Real-time synchronization (build-time is sufficient)
Editing aggregated content in this repository
Version history of aggregated content
Webhook-triggered updates

Decision

Implement build-time documentation aggregation using Python scripts:

1. Aggregation Script: `scripts/aggregate_templates.py`

#!/usr/bin/env python3
"""
Fetch documentation from template repositories at build time.
Reads configuration from templates.yaml.
"""

import asyncio
import json
import os
import re
from pathlib import Path
from datetime import datetime

import aiohttp
import yaml

GITHUB_RAW_BASE = "https://raw.githubusercontent.com"
GITHUB_API_BASE = "https://api.github.com"

class TemplateAggregator:
    def __init__(self, config_path: str = "templates.yaml"):
        self.config = self._load_config(config_path)
        self.cache_dir = Path(".cache/templates")
        self.output_dir = Path("docs/templates")
        self.token = os.environ.get("GITHUB_TOKEN")

    async def aggregate_all(self):
        """Fetch docs from all configured templates."""
        async with aiohttp.ClientSession() as session:
            for template in self.config.get("templates", []):
                await self.aggregate_template(session, template)

    async def aggregate_template(self, session, template):
        """Fetch and transform docs for a single template."""
        template_id = template["id"]
        owner = template["repo"]["owner"]
        repo = template["repo"]["name"]

        # Get current commit SHA
        sha = await self._get_commit_sha(session, owner, repo)
        if not sha:
            print(f"  Warning: Could not fetch SHA for {owner}/{repo}")
            return

        # Check cache
        if self._is_cached(template_id, sha):
            print(f"  Using cached content for {template_id}")
            return

        # Fetch each doc
        output_path = self.output_dir / template_id
        output_path.mkdir(parents=True, exist_ok=True)

        for doc in template["directories"]["docs"]:
            content = await self._fetch_file(
                session, owner, repo, sha, doc["path"]
            )
            if content:
                transformed = self._transform_content(
                    content, owner, repo, sha, doc["path"]
                )
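                target = output_path / doc["target"]
                # Targets may include subdirectories; create them first.
                target.parent.mkdir(parents=True, exist_ok=True)
                # Write UTF-8 explicitly rather than the platform default.
                target.write_text(transformed, encoding="utf-8")
                print(f"  Wrote {doc['target']}")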

        self._update_cache(template_id, sha)

    def _transform_content(self, content, owner, repo, sha, path):
        """Transform content for unified site."""
        # Rewrite relative links to absolute GitHub URLs
        content = self._rewrite_links(content, owner, repo, sha, path)

        # Inject source attribution
        source_url = f"https://github.com/{owner}/{repo}/blob/{sha}/{path}"
        attribution = f"""
!!! info "Source"
    This documentation is from [{owner}/{repo}]({source_url}).
    Last synced: {datetime.utcnow().strftime('%Y-%m-%d')}

"""
        return attribution + content

    def _rewrite_links(self, content, owner, repo, sha, path):
        """Rewrite relative links to absolute GitHub URLs."""
        base_path = Path(path).parent

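        # Rewrite relative image paths to raw.githubusercontent.com URLs
        def replace_image(match):
            alt, src = match.groups()
            # Leave absolute, protocol-relative, and inline data URLs untouched
            if src.startswith(('http://', 'https://', '//', 'data:')):
                return match.group(0)
            resolved = (base_path / src).as_posix()
            return f"![{alt}]({GITHUB_RAW_BASE}/{owner}/{repo}/{sha}/{resolved})"

        content = re.sub(r'!\[([^\]]*)\]\(([^)]+)\)', replace_image, content)
        # Only image rewriting is shown in this excerpt.
        return content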
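
The script reads `templates.yaml`, but this ADR does not show the file. The following is a minimal sketch of the shape the code above appears to expect, inferred only from the keys it accesses (`id`, `repo.owner`, `repo.name`, and `directories.docs[]` entries with `path` and `target`). The sample values and the `validate_template` helper are hypothetical, and the config is written as a Python dict rather than YAML to keep the example self-contained.

```python
# Hypothetical config entry, shaped after the keys aggregate_template() reads.
# The owner, repo name, and file paths are placeholders, not real settings.
EXAMPLE_CONFIG = {
    "templates": [
        {
            "id": "litellm-langfuse-starter",
            "repo": {"owner": "example-owner", "name": "example-repo"},
            "directories": {
                "docs": [
                    {"path": "README.md", "target": "overview.md"},
                    {"path": "docs/setup.md", "target": "setup.md"},
                ]
            },
        }
    ]
}


def validate_template(template: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is usable."""
    problems = []
    if not template.get("id"):
        problems.append("missing id")
    repo = template.get("repo") or {}
    for key in ("owner", "name"):
        if not repo.get(key):
            problems.append(f"missing repo.{key}")
    docs = (template.get("directories") or {}).get("docs") or []
    if not docs:
        problems.append("no docs configured")
    for i, doc in enumerate(docs):
        for key in ("path", "target"):
            if not doc.get(key):
                problems.append(f"docs[{i}] missing {key}")
    return problems
```

Validating entries before fetching would let a malformed template be skipped with a warning instead of raising a `KeyError` partway through a build.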

2. Caching Strategy

Cache structure:

.cache/templates/
├── manifest.json     # Tracks commit SHAs
└── raw/              # Cached raw content
    └── {template-id}/

Manifest format:

{
  "litellm-langfuse-starter": {
    "commit_sha": "abc123...",
    "fetched_at": "2026-01-03T10:00:00Z",
    "files": ["overview.md", "setup.md"]
  }
}

Cache invalidation:

- Compare current commit SHA with cached SHA
- If different, refetch all docs for that template
- Daily scheduled builds ensure freshness
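
The manifest check described above could look roughly like the sketch below. It assumes the manifest lives at `manifest.json` inside the cache directory and uses the field names from the manifest format shown earlier. The function names mirror `_is_cached` and `_update_cache` from the script, but their bodies are not shown in this ADR, so this is an illustration rather than the actual implementation.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_manifest(cache_dir: Path) -> dict:
    """Read manifest.json, treating a missing or corrupt file as an empty cache."""
    manifest_path = cache_dir / "manifest.json"
    try:
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def is_cached(cache_dir: Path, template_id: str, sha: str) -> bool:
    """True when the cached commit SHA matches the current upstream SHA."""
    entry = load_manifest(cache_dir).get(template_id, {})
    return bool(sha) and entry.get("commit_sha") == sha


def update_cache(cache_dir: Path, template_id: str, sha: str, files: list[str]) -> None:
    """Record the SHA and file list for a template after a successful fetch."""
    manifest = load_manifest(cache_dir)
    manifest[template_id] = {
        "commit_sha": sha,
        "fetched_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "files": sorted(files),
    }
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
```

Treating a corrupt manifest as an empty cache costs one full refetch but keeps the build from failing on a bad cache restore.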

3. Content Transformation

Transformation	Purpose
Link rewriting	Relative → absolute GitHub URLs
Image rewriting	Point to raw.githubusercontent.com
Source attribution	Add info box with source link
Front matter	Inject MkDocs metadata
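
The script excerpt in section 1 shows image rewriting only. As an illustration of the "Link rewriting" row, here is a sketch of how ordinary relative Markdown links might be pointed at the pinned commit on GitHub. The function name `rewrite_relative_links`, the choice of the `blob` view as the link target, and the decision to leave links with titles untouched are all assumptions, not details confirmed by this ADR.

```python
import posixpath
import re

GITHUB_WEB_BASE = "https://github.com"  # assumption: links point at the blob view

# Matches [text](target) but not ![alt](src): the lookbehind excludes images.
# Targets containing whitespace (for example, links with titles) are not matched.
LINK_RE = re.compile(r'(?<!!)\[([^\]]*)\]\(([^)\s]+)\)')


def rewrite_relative_links(content: str, owner: str, repo: str, sha: str, path: str) -> str:
    """Point relative Markdown links at the pinned commit on GitHub."""
    base_dir = posixpath.dirname(path)

    def replace_link(match: re.Match) -> str:
        text, href = match.groups()
        # Leave absolute URLs, in-page anchors, and mailto links untouched.
        if href.startswith(("http://", "https://", "#", "mailto:")):
            return match.group(0)
        if href.startswith("/"):
            # Root-relative links resolve against the repository root.
            resolved = posixpath.normpath(href.lstrip("/"))
        else:
            # Other links resolve against the directory of the source file.
            resolved = posixpath.normpath(posixpath.join(base_dir, href))
        return f"[{text}]({GITHUB_WEB_BASE}/{owner}/{repo}/blob/{sha}/{resolved})"

    return LINK_RE.sub(replace_link, content)
```

Pinning links to the commit SHA keeps them consistent with the fetched content even if the upstream branch moves before the next build.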

4. Error Handling

Scenario	Behavior
Template repo not found	Skip, log warning
Doc file not found	Skip file, continue others
Rate limit hit	Use cached content, warn
API error	Use cached content if available

Never fail the build due to upstream issues.
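
One way to express this policy is a small wrapper that prefers fresh content, falls back to a cached copy, and otherwise skips the file. The sketch below is hypothetical: `fetch_with_fallback` and its `fetch` callable are illustrative names, and the per-file cache location is an assumption based on the `raw/{template-id}/` layout in section 2.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Optional


async def fetch_with_fallback(
    fetch: Callable[[], Awaitable[Optional[str]]],
    cache_file: Path,
) -> Optional[str]:
    """Return fresh content if possible, else cached content, else None."""
    try:
        content = await fetch()
    except Exception as exc:  # broad on purpose: upstream issues must not fail the build
        print(f"  Warning: fetch failed ({exc}); trying cache")
        content = None

    if content is not None:
        # Refresh the cached copy so later failures can fall back to it.
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(content, encoding="utf-8")
        return content

    if cache_file.exists():
        print(f"  Using cached copy {cache_file}")
        return cache_file.read_text(encoding="utf-8")

    print(f"  Warning: no content and no cache for {cache_file.name}; skipping")
    return None
```

A caller would pass a zero-argument coroutine function that performs the real request, so the fallback logic stays independent of the HTTP client.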

5. GitHub API Usage

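Authenticate with GITHUB_TOKEN: 5,000 requests/hour for a user token versus 60/hour unauthenticated (the built-in Actions GITHUB_TOKEN is capped lower, at 1,000 requests/hour per repository)
Fetch file content from raw.githubusercontent.com, which does not count against the REST API rate limit
Use the Trees API for efficient directory listing
Caching cuts API usage: an unchanged template costs one commit-SHA lookup and no content fetches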
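
The following sketch shows how the request headers and a Trees API listing might be handled. It relies on the documented response shape of `GET /repos/{owner}/{repo}/git/trees/{sha}?recursive=1` (a `tree` array of entries with `path` and `type`). The helper names are hypothetical, and the sample response in the test data is made up.

```python
from typing import Optional

GITHUB_API_BASE = "https://api.github.com"


def github_headers(token: Optional[str]) -> dict:
    """Build request headers; the Authorization header is added only with a token."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def tree_url(owner: str, repo: str, sha: str) -> str:
    """URL for a recursive tree listing at a given commit or tree SHA."""
    return f"{GITHUB_API_BASE}/repos/{owner}/{repo}/git/trees/{sha}?recursive=1"


def markdown_files(tree_response: dict, directory: str) -> list[str]:
    """Pick Markdown blobs under `directory` out of a Trees API response."""
    prefix = directory.rstrip("/") + "/"
    return sorted(
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
        and entry["path"].startswith(prefix)
        and entry["path"].endswith(".md")
    )
```

A single recursive tree request lists a whole repository, so directory discovery costs one API call per template; the response also carries a `truncated` flag that a fuller implementation would want to check for very large repositories.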

6. Integration with CI

In .github/workflows/deploy.yml:

- name: Aggregate template docs
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: python scripts/aggregate_templates.py

- name: Build MkDocs
  run: mkdocs build --strict

7. ADR Aggregation

ADRs are fetched similarly:

- Read from docs/adr/ in each template repo
- Write to docs/adrs/aggregated/{template-id}/
- Include in navigation under ADRs section
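
The path mapping in this section can be sketched as a pure function. ADR aggregation is listed as deferred in Phase 4, so this is purely illustrative: `adr_target_path` is a hypothetical helper, and restricting the mapping to `.md` files is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

ADR_SOURCE_DIR = "docs/adr"              # location in each template repo
ADR_OUTPUT_ROOT = "docs/adrs/aggregated"  # location in this site


def adr_target_path(template_id: str, source_path: str) -> Optional[str]:
    """Map an upstream ADR path to its path in this site, or None if it is not an ADR."""
    source = PurePosixPath(source_path)
    try:
        relative = source.relative_to(ADR_SOURCE_DIR)
    except ValueError:
        # The file lives outside docs/adr/.
        return None
    if source.suffix != ".md":
        return None
    return str(PurePosixPath(ADR_OUTPUT_ROOT) / template_id / relative)
```

Keeping each template's ADRs under its own `{template-id}` directory avoids filename collisions between repositories that number their ADRs the same way.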

Consequences

Positive

Unified experience: All template docs in one place
Fresh within a day: Daily rebuilds catch upstream changes
Fast builds: Caching minimizes fetch time
Resilient: Cached content used on errors

Negative

Build dependency: Requires GitHub API access
Delayed updates: Changes not instant (daily rebuild)
Complexity: Aggregation script to maintain

Neutral

Content transformation may need updates for edge cases
Large template repos may slow initial builds

Implementation Phases

Phase 1: Core Aggregation

[x] Create scripts/aggregate_templates.py
[x] Implement basic fetch and cache
[x] Test with litellm-langfuse-railway

Phase 2: Transformation

[x] Implement link rewriting
[x] Add source attribution
[x] Handle front matter

Phase 3: Integration

[x] Add to CI workflow
[x] Configure caching in GitHub Actions
[x] Test scheduled builds

Phase 4: ADR Aggregation

[ ] Extend script for ADRs (deferred - templates may not have ADRs)
[x] Update navigation
[ ] Generate ADR index (deferred)

Compliance / Validation

[x] Aggregation completes without errors
[x] Links in aggregated content work
[x] Images display correctly
[x] Cache speeds up subsequent builds
[x] Errors don't fail the build

LLM Council Review Summary

Reviewed: 2026-01-03
Tier: High (4 models: GPT-5.2, Claude Opus 4.5, Gemini 3 Pro, Grok 4.1)

Verdict: Accepted

Robust aggregation design with appropriate caching and error handling strategies.

Key Findings Incorporated

Finding	Resolution
Rate limiting concern (60/hr unauthenticated)	Use `GITHUB_TOKEN` in CI (5000/hr authenticated)
Link validation for aggregated content	Added link checking to weekly schedule
Large file handling	Added size limit check (>1MB files logged and skipped)
Attribution injection placement	Info box at top of content for visibility
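
The size-limit finding could be implemented with a check like the one below. This is a sketch under assumptions: "1MB" is read as 10**6 bytes of UTF-8 encoded content, and `within_size_limit` is a hypothetical helper name.

```python
MAX_DOC_BYTES = 1_000_000  # assumption: "1MB" interpreted as 10**6 bytes


def within_size_limit(name: str, content: str, limit: int = MAX_DOC_BYTES) -> bool:
    """Return False (and log) when a fetched document exceeds the size limit."""
    size = len(content.encode("utf-8"))
    if size > limit:
        print(f"  Warning: skipping {name} ({size} bytes exceeds {limit})")
        return False
    return True
```

Measuring the encoded byte length rather than the character count keeps the limit meaningful for documents with many non-ASCII characters.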

Dissenting Views

All models agreed build-time aggregation is appropriate for this use case.
Discussion on caching granularity (file vs. template level); consensus on template-level for simplicity.