This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
build-docs.sh Orchestrator
Purpose and Scope
The build-docs.sh script is the main orchestration layer for the DeepWiki-to-mdBook conversion system. It coordinates all components of the three-phase pipeline, manages configuration, handles environment variable processing, and produces the final output artifacts. This document covers the script’s responsibilities, execution flow, configuration management, and integration with other system components.
For details on the components orchestrated by this script, see deepwiki-scraper.py, Template System, and mdBook Integration. For information on the three-phase architecture, see Three-Phase Pipeline.
Role and Responsibilities
The orchestrator serves as the single entry point for the documentation build process. It is invoked as the Docker container’s default command and coordinates all system components in a sequential, deterministic manner.
Key Responsibilities:
| Responsibility | Implementation |
|---|---|
| Configuration Management | Validates and sets defaults for all environment variables |
| Auto-detection | Discovers repository information from Git remotes |
| Component Coordination | Invokes deepwiki-scraper.py, process-template.py, mdbook, and mdbook-mermaid |
| Error Handling | Uses set -e for fail-fast behavior on any component failure |
| Output Management | Organizes all artifacts into /output directory structure |
| Mode Selection | Supports standard and markdown-only execution modes |
| Template Processing | Coordinates header/footer injection into all markdown files |
Sources: scripts/build-docs.sh:1-310
Architecture Overview
The following diagram maps the orchestrator’s workflow to actual code entities and directory paths used in the script:
Diagram: Orchestrator Component Integration
graph TB
Entry["Entry Point\nbuild-docs.sh"]
subgraph "Configuration Phase"
AutoDetect["Git Auto-detection\nlines 8-19"]
EnvVars["Environment Variables\nREPO, BOOK_TITLE, etc.\nlines 21-26"]
Defaults["Default Generation\nBOOK_AUTHORS, GIT_REPO_URL\nlines 44-51"]
Validate["Validation\nREPO check\nlines 33-38"]
end
subgraph "Execution Phase"
Step1["Step 1: deepwiki-scraper.py\n$REPO → $WIKI_DIR\nlines 61-65"]
Decision{{"MARKDOWN_ONLY\n?\nline 68"}}
MarkdownExit["Copy to /output/markdown\nlines 69-93"]
Step2["Step 2: mkdir $BOOK_DIR\nCreate book.toml\nlines 95-119"]
Step3["Step 3: Generate SUMMARY.md\nDiscover structure\nlines 124-188"]
Step4["Step 4: process-template.py\nInject headers/footers\nlines 190-261"]
Step5["Step 5: mdbook-mermaid install\nlines 263-266"]
Step6["Step 6: mdbook build\nlines 268-271"]
Step7["Step 7: Copy to /output\nlines 273-295"]
end
Entry --> AutoDetect
AutoDetect --> EnvVars
EnvVars --> Defaults
Defaults --> Validate
Validate --> Step1
Step1 --> Decision
Decision -->|true| MarkdownExit
Decision -->|false| Step2
Step2 --> Step3
Step3 --> Step4
Step4 --> Step5
Step5 --> Step6
Step6 --> Step7
MarkdownExit --> OutputMarkdown["/output/markdown/"]
MarkdownExit --> OutputRaw["/output/raw_markdown/"]
Step7 --> OutputBook["/output/book/"]
Step7 --> OutputMarkdown
Step7 --> OutputRaw
Step7 --> OutputConfig["/output/book.toml"]
Sources: scripts/build-docs.sh:1-310
Configuration Management
The script resolves its configuration in three layers: automatic detection from Git metadata, environment variable overrides, and built-in defaults.
Auto-Detection Logic
The script attempts to automatically detect the repository from Git metadata if REPO is not explicitly set:
Diagram: Repository Auto-Detection Flow
flowchart TD
Start["Check if REPO\nenvironment variable set"]
Start -->|Not set| CheckGit["Check if .git directory exists\ngit rev-parse --git-dir"]
Start -->|Set| UseProvided["Use provided REPO value"]
CheckGit -->|Yes| GetRemote["Get remote.origin.url\ngit config --get remote.origin.url"]
CheckGit -->|No| RequireManual["Require manual REPO setting"]
GetRemote -->|Found| ExtractOwnerRepo["Extract owner/repo using sed\nPattern: github.com[:/]owner/repo"]
GetRemote -->|Not found| RequireManual
ExtractOwnerRepo --> SetRepo["Set REPO variable"]
UseProvided --> SetRepo
SetRepo --> Validate["Validate REPO is not empty"]
RequireManual --> Validate
Validate -->|Empty| Error["Exit with error\nlines 34-37"]
Validate -->|Valid| Continue["Continue execution"]
The regex pattern at scripts/build-docs.sh:16 handles multiple GitHub URL formats:
- https://github.com/owner/repo.git
- git@github.com:owner/repo.git
- https://github.com/owner/repo
Sources: scripts/build-docs.sh:8-19 scripts/build-docs.sh:33-38
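The detection flow above can be sketched as follows. This is a minimal illustration rather than the script's actual code: the helper name extract_repo and the exact sed expression are assumptions, chosen so that all three URL formats listed above reduce to owner/repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a GitHub remote URL to "owner/repo".
# Prints nothing when the URL does not point at github.com.
extract_repo() {
  printf '%s\n' "$1" \
    | sed -nE 's#.*github\.com[:/]([^/]+/[^/]+)$#\1#p' \
    | sed -E 's#\.git$##'
}

# Only consult Git when REPO was not supplied by the caller.
if [ -z "${REPO:-}" ] && git rev-parse --git-dir >/dev/null 2>&1; then
  remote_url=$(git config --get remote.origin.url || true)
  REPO=$(extract_repo "$remote_url")
fi
```

An explicitly provided REPO always wins in this sketch; Git is only queried as a fallback, and an empty result is left for the later validation step to reject.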
Configuration Variables
The following table documents all configuration variables managed by the orchestrator:
| Variable | Default | Derivation | Line Reference |
|---|---|---|---|
| REPO | Auto-detected | Extracted from git remote.origin.url | scripts/build-docs.sh:9-19 |
| BOOK_TITLE | "Documentation" | None | scripts/build-docs.sh:23 |
| BOOK_AUTHORS | Repository owner | Extracted from $REPO (first segment) | scripts/build-docs.sh:45 |
| GIT_REPO_URL | GitHub URL | Constructed from $REPO | scripts/build-docs.sh:46 |
| MARKDOWN_ONLY | "false" | None | scripts/build-docs.sh:26 |
| WORK_DIR | "/workspace" | Fixed | scripts/build-docs.sh:27 |
| WIKI_DIR | "/workspace/wiki" | Fixed | scripts/build-docs.sh:28 |
| RAW_DIR | "/workspace/raw_markdown" | Fixed | scripts/build-docs.sh:29 |
| OUTPUT_DIR | "/output" | Fixed | scripts/build-docs.sh:30 |
| BOOK_DIR | "/workspace/book" | Fixed | scripts/build-docs.sh:31 |
Computed variables derived from $REPO:
| Variable | Computation | Line Reference |
|---|---|---|
| REPO_OWNER | `echo "$REPO" \| cut -d'/' -f1` | |
| REPO_NAME | `echo "$REPO" \| cut -d'/' -f2` | |
| DEEPWIKI_URL | "https://deepwiki.com/$REPO" | scripts/build-docs.sh:48 |
| DEEPWIKI_BADGE_URL | "https://deepwiki.com/badge.svg" | scripts/build-docs.sh:49 |
| REPO_BADGE_LABEL | URL-encoded with dash escaping | scripts/build-docs.sh:50 |
| GITHUB_BADGE_URL | Shields.io badge URL | scripts/build-docs.sh:51 |
Sources: scripts/build-docs.sh:21-51
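As a rough illustration of how the defaults and computed variables relate, the sketch below wraps them in a hypothetical derive_defaults function. The `${VAR:-default}` form is an assumption about how overrides are honored; the cut commands and the DeepWiki URL follow the tables above, and the GitHub URL shape is inferred from the "Constructed from $REPO" description.

```shell
#!/usr/bin/env bash
# derive_defaults is a hypothetical wrapper used for illustration only.
derive_defaults() {
  REPO=$1

  # Plain defaults: an existing environment value is kept.
  BOOK_TITLE="${BOOK_TITLE:-Documentation}"
  MARKDOWN_ONLY="${MARKDOWN_ONLY:-false}"

  # Values computed from the owner/repo pair.
  REPO_OWNER=$(echo "$REPO" | cut -d'/' -f1)
  REPO_NAME=$(echo "$REPO" | cut -d'/' -f2)

  # Defaults that fall back to values derived from REPO.
  BOOK_AUTHORS="${BOOK_AUTHORS:-$REPO_OWNER}"
  GIT_REPO_URL="${GIT_REPO_URL:-https://github.com/$REPO}"
  DEEPWIKI_URL="https://deepwiki.com/$REPO"
}

derive_defaults "jzombie/deepwiki-to-mdbook"
echo "$BOOK_TITLE by $BOOK_AUTHORS ($GIT_REPO_URL)"
```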
Execution Flow
The orchestrator follows a seven-step execution sequence, with conditional branching for markdown-only mode:
Diagram: Step-by-Step Execution Sequence
sequenceDiagram
participant Script as build-docs.sh
participant Scraper as deepwiki-scraper.py
participant FileSystem as File System
participant Templates as process-template.py
participant MDBook as mdbook
participant Mermaid as mdbook-mermaid
Note over Script: Configuration Phase
Script->>Script: Auto-detect REPO
Script->>Script: Set defaults
Script->>Script: Validate configuration
Note over Script: Step 1: Scraping
Script->>FileSystem: rm -rf $RAW_DIR
Script->>Scraper: python3 deepwiki-scraper.py $REPO $WIKI_DIR
Scraper-->>FileSystem: Write markdown to $WIKI_DIR
Scraper-->>FileSystem: Write raw snapshots to $RAW_DIR
alt MARKDOWN_ONLY == true
Note over Script: Markdown-Only Exit Path
Script->>FileSystem: cp $WIKI_DIR to /output/markdown
Script->>FileSystem: cp $RAW_DIR to /output/raw_markdown
Script->>Script: Exit (skip HTML build)
else MARKDOWN_ONLY == false
Note over Script: Step 2: mdBook Initialization
Script->>FileSystem: mkdir -p $BOOK_DIR/src
Script->>FileSystem: Create book.toml
Note over Script: Step 3: SUMMARY.md Generation
Script->>FileSystem: Scan $WIKI_DIR for .md files
Script->>FileSystem: Generate src/SUMMARY.md
Note over Script: Step 4: Template Processing
Script->>Templates: process-template.py $HEADER_TEMPLATE
Templates-->>Script: Processed HEADER_HTML
Script->>Templates: process-template.py $FOOTER_TEMPLATE
Templates-->>Script: Processed FOOTER_HTML
Script->>FileSystem: cp $WIKI_DIR/* to src/
Script->>FileSystem: Inject header/footer into all .md files
Note over Script: Step 5: Mermaid Installation
Script->>Mermaid: mdbook-mermaid install $BOOK_DIR
Mermaid-->>FileSystem: Install mermaid.js assets
Note over Script: Step 6: Build
Script->>MDBook: mdbook build
MDBook-->>FileSystem: Generate book/ directory
Note over Script: Step 7: Output Collection
Script->>FileSystem: cp book to /output/book
Script->>FileSystem: cp $WIKI_DIR to /output/markdown
Script->>FileSystem: cp $RAW_DIR to /output/raw_markdown
Script->>FileSystem: cp book.toml to /output/book.toml
end
Sources: scripts/build-docs.sh:61-310
Step Details
Step 1: Wiki Scraping
Lines: scripts/build-docs.sh:61-65
Invokes the Python scraper to fetch and convert DeepWiki content (python3 /usr/local/bin/deepwiki-scraper.py $REPO $WIKI_DIR).
The scraper writes output to two locations:
- $WIKI_DIR (/workspace/wiki): Enhanced markdown with injected diagrams
- $RAW_DIR (/workspace/raw_markdown): Pre-enhancement markdown snapshots for debugging
For details on the scraper’s operation, see deepwiki-scraper.py.
Sources: scripts/build-docs.sh:61-65
Step 2: mdBook Structure Initialization
Lines: scripts/build-docs.sh:95-119
Skipped if: MARKDOWN_ONLY=true
Creates the mdBook directory structure and generates book.toml configuration:
$BOOK_DIR/
├── book.toml
└── src/
The book.toml file is generated using a heredoc with variable substitution:
| Configuration Section | Variables Used | Purpose |
|---|---|---|
| [book] | $BOOK_TITLE, $BOOK_AUTHORS | Book metadata |
| [output.html] | $GIT_REPO_URL | Repository link in UI |
| [preprocessor.mermaid] | N/A | Enable mermaid diagrams |
| [output.html.fold] | N/A | Enable section folding |
Sources: scripts/build-docs.sh:95-119
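A heredoc of this kind might look like the sketch below. The section names come from the table above; the individual keys (title, authors, git-repository-url, command, enable) are standard mdBook and mdbook-mermaid settings but are an assumption about what the script actually writes, and write_book_toml is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create the book skeleton and write book.toml.
# Expects BOOK_TITLE, BOOK_AUTHORS and GIT_REPO_URL to be set already.
write_book_toml() {
  local book_dir=$1
  mkdir -p "$book_dir/src"
  # Unquoted EOF delimiter, so the shell substitutes the variables.
  cat > "$book_dir/book.toml" <<EOF
[book]
title = "$BOOK_TITLE"
authors = ["$BOOK_AUTHORS"]

[output.html]
git-repository-url = "$GIT_REPO_URL"

[preprocessor.mermaid]
command = "mdbook-mermaid"

[output.html.fold]
enable = true
EOF
}
```

Note that values are interpolated verbatim, so a title containing a double quote would produce invalid TOML in this sketch.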
Step 3: SUMMARY.md Generation
Lines: scripts/build-docs.sh:124-188
Dynamically generates the table of contents by discovering the file structure in $WIKI_DIR. This step implements numeric sorting and hierarchical organization.
Diagram: SUMMARY.md Generation Algorithm
flowchart TD
Start["Start SUMMARY.md Generation"]
Start --> WriteHeader["Write '# Summary' header"]
WriteHeader --> FindOverview["Find overview file\ngrep -Ev '^[0-9]'"]
FindOverview -->|Found| WriteOverview["Write overview entry\nExtract title from first line"]
FindOverview -->|Not found| ListMain
WriteOverview --> ListMain["List all main pages\nls $WIKI_DIR/*.md"]
ListMain --> FilterOverview["Filter out overview file"]
FilterOverview --> NumericSort["Sort numerically\nsort -t- -k1 -n"]
NumericSort --> ProcessLoop["For each file"]
ProcessLoop --> ExtractTitle["Extract title\nhead -1 /sed 's/^# //'"]
ExtractTitle --> GetSectionNum["Extract section number grep -oE '^[0-9]+'"]
GetSectionNum --> CheckSubdir{"Subsection directory section-N exists?"}
CheckSubdir -->|Yes|WriteSection["Write section entry - [title] filename"]
WriteSection --> ListSubs["List subsection files ls section-N/*.md"]
ListSubs --> SortSubs["Sort numerically sort -t- -k1 -n"]
SortSubs --> WriteSubLoop["For each subsection: - [subtitle] section-N/file"]
WriteSubLoop --> NextFile
CheckSubdir -->|No|WriteStandalone["Write standalone entry - [title] filename"]
WriteStandalone --> NextFile{"More files?"}
NextFile -->|Yes|ProcessLoop
NextFile -->|No| Complete["Complete src/SUMMARY.md"]
Key implementation details:
Overview Page Detection: scripts/build-docs.sh:136-144
- Searches for files without numeric prefix
- Typically matches Overview.md or similar
Numeric Sorting: scripts/build-docs.sh:147-155
- Uses sort -t- -k1 -n to sort by numeric prefix
- Handles formats like 1-Title.md, 2.1-Subtopic.md
Hierarchy Detection: scripts/build-docs.sh:165-180
- Checks for section-N/ directories for each numeric section
- Creates indented entries for subsections
Sources: scripts/build-docs.sh:124-188
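The algorithm described above can be approximated with the sketch below. It is not the script's code: generate_summary and page_title are hypothetical names, and the exact SUMMARY.md entry format (an unbulleted overview link followed by bulleted, two-space-indented entries) is an assumption. The sort, grep, head and sed invocations mirror the ones named in the diagram.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first line of a page with the leading "# " stripped.
page_title() { head -1 "$1" | sed 's/^# //'; }

# Hypothetical sketch: print a SUMMARY.md for the pages found in $1.
generate_summary() {
  local wiki_dir=$1 f name num sub child
  printf '# Summary\n\n'

  # Overview page: the first top-level file without a numeric prefix.
  for f in "$wiki_dir"/*.md; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    case $name in [0-9]*) continue ;; esac
    printf '[%s](%s)\n\n' "$(page_title "$f")" "$name"
    break
  done

  # Numbered pages, ordered by numeric prefix rather than lexically.
  (cd "$wiki_dir" && printf '%s\n' [0-9]*.md) | sort -t- -k1 -n |
  while IFS= read -r name; do
    [ -e "$wiki_dir/$name" ] || continue
    num=$(printf '%s\n' "$name" | grep -oE '^[0-9]+')
    printf -- '- [%s](%s)\n' "$(page_title "$wiki_dir/$name")" "$name"

    # Subsections live in section-N/ and become indented entries.
    sub="$wiki_dir/section-$num"
    [ -d "$sub" ] || continue
    (cd "$sub" && printf '%s\n' *.md) | sort -t- -k1 -n |
    while IFS= read -r child; do
      [ -e "$sub/$child" ] || continue
      printf '  - [%s](%s)\n' "$(page_title "$sub/$child")" "section-$num/$child"
    done
  done
}
```

A caller would redirect the output, for example generate_summary "$WIKI_DIR" > "$BOOK_DIR/src/SUMMARY.md". The numeric sort is what places 2-Second.md before 10-Tenth.md, which a plain lexical sort would not do.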
Step 4: Template Processing and File Copying
Lines: scripts/build-docs.sh:190-261
Processes header and footer templates and injects them into all markdown files.
Template Processing Flow:
Diagram: Template Processing Pipeline
flowchart LR
subgraph "Template Loading"
HeaderPath["$HEADER_TEMPLATE\n/workspace/templates/header.html"]
FooterPath["$FOOTER_TEMPLATE\n/workspace/templates/footer.html"]
GenDate["GENERATION_DATE\ndate -u command"]
end
subgraph "Variable Substitution"
ProcessH["process-template.py\n$HEADER_TEMPLATE"]
ProcessF["process-template.py\n$FOOTER_TEMPLATE"]
Vars["Variables passed:\nDEEPWIKI_URL\nDEEPWIKI_BADGE_URL\nGIT_REPO_URL\nGITHUB_BADGE_URL\nREPO\nBOOK_TITLE\nBOOK_AUTHORS\nGENERATION_DATE"]
end
subgraph "Injection"
CopyFiles["cp $WIKI_DIR/* to src/"]
InjectLoop["For each .md file:\nsrc/*.md src/*/*.md"]
CreateTemp["Create temp file:\nHEADER + content + FOOTER"]
Replace["mv temp to original"]
end
HeaderPath --> ProcessH
FooterPath --> ProcessF
GenDate --> Vars
Vars --> ProcessH
Vars --> ProcessF
ProcessH --> HeaderHTML["HEADER_HTML variable"]
ProcessF --> FooterHTML["FOOTER_HTML variable"]
CopyFiles --> InjectLoop
HeaderHTML --> InjectLoop
FooterHTML --> InjectLoop
InjectLoop --> CreateTemp
CreateTemp --> Replace
The template processor is invoked with all configuration variables as arguments: scripts/build-docs.sh:205-213 scripts/build-docs.sh:222-230
File injection pattern: scripts/build-docs.sh:243-257
- Processes all .md files in src/ and src/*/
- Creates temporary file with header + original content + footer
- Replaces original with modified version
For details on the template system and variable substitution, see Template System.
Sources: scripts/build-docs.sh:190-261
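The injection pattern can be illustrated with the following sketch. The function name inject_header_footer and the blank-line spacing around the content are assumptions; the globbing over src/*.md and src/*/*.md and the temp-file-then-mv replacement follow the description above. Whether the real script special-cases any file (such as SUMMARY.md) is not stated in this document, so the sketch treats every match the same way.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: wrap every markdown file in header and footer HTML.
inject_header_footer() {
  local src_dir=$1 header=$2 footer=$3 f tmp
  for f in "$src_dir"/*.md "$src_dir"/*/*.md; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    tmp=$(mktemp)
    {
      printf '%s\n\n' "$header"
      cat "$f"
      printf '\n%s\n' "$footer"
    } > "$tmp"
    # Replace the original only after the new content is fully written.
    mv "$tmp" "$f"
  done
}
```

It would be called with the already-processed HTML strings, for example inject_header_footer "$BOOK_DIR/src" "$HEADER_HTML" "$FOOTER_HTML".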
Step 5: Mermaid Installation
Lines: scripts/build-docs.sh:263-266
Installs mdbook-mermaid preprocessor assets into the book directory by running mdbook-mermaid install $BOOK_DIR.
This command installs the mermaid.js library and initialization code required for client-side diagram rendering in the final HTML output.
Sources: scripts/build-docs.sh:263-266
Step 6: Book Build
Lines: scripts/build-docs.sh:268-271
Executes the mdBook build process by running mdbook build in $BOOK_DIR.
Build Process:
- Reads book.toml configuration
- Processes src/SUMMARY.md to determine structure
- Applies mermaid preprocessor to all markdown files
- Converts markdown to HTML with search indexing
- Outputs to $BOOK_DIR/book/ directory
For more information on mdBook integration, see mdBook Integration.
Sources: scripts/build-docs.sh:268-271
Step 7: Output Collection
Lines: scripts/build-docs.sh:273-295
Copies all build artifacts to the /output volume mount for persistence:
| Source | Destination | Description |
|---|---|---|
| $BOOK_DIR/book/ | /output/book/ | Built HTML documentation |
| $WIKI_DIR/ | /output/markdown/ | Enhanced markdown files |
| $RAW_DIR/ | /output/raw_markdown/ | Pre-enhancement markdown (if exists) |
| $BOOK_DIR/book.toml | /output/book.toml | Book configuration reference |
The script ensures clean output by removing existing directories before copying: scripts/build-docs.sh:282-290
Sources: scripts/build-docs.sh:273-295
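The clean-then-copy behavior might be sketched as below. collect_outputs is a hypothetical function name and the exact rm and cp flags are assumptions; the source and destination pairs follow the table above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy build artifacts into a clean output directory.
collect_outputs() {
  local book_dir=$1 wiki_dir=$2 raw_dir=$3 out=$4
  mkdir -p "$out"

  # Remove results of earlier runs so stale pages cannot linger.
  rm -rf "$out/book" "$out/markdown" "$out/raw_markdown"

  cp -r "$book_dir/book" "$out/book"
  cp -r "$wiki_dir" "$out/markdown"

  # Raw snapshots are optional, so copy them only when present.
  if [ -d "$raw_dir" ]; then
    cp -r "$raw_dir" "$out/raw_markdown"
  fi

  cp "$book_dir/book.toml" "$out/book.toml"
}
```

Removing the destination first also matters for cp -r itself: copying into an existing directory would nest the source inside it instead of replacing its contents.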
Markdown-Only Mode
When MARKDOWN_ONLY=true, the orchestrator follows a shortened execution path that skips HTML generation:
Execution Path:
- Step 1: Scrape wiki (normal)
- Copy $WIKI_DIR to /output/markdown/
- Copy $RAW_DIR to /output/raw_markdown/ (if exists)
- Exit with success
Use Cases:
- Debugging the scraper output without full build overhead
- Extracting markdown for alternative processing pipelines
- CI/CD test workflows that only validate markdown generation
- Custom post-processing before HTML generation
Implementation: scripts/build-docs.sh:68-93
Sources: scripts/build-docs.sh:68-93
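A simplified view of the branch is sketched below. Both function names are hypothetical, the real script exits rather than returning from a function, and clearing the destination directories before copying is an assumption carried over from the Step 7 description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export scraper output without building the book.
export_markdown() {
  local wiki_dir=$1 raw_dir=$2 out=$3
  mkdir -p "$out"
  rm -rf "$out/markdown" "$out/raw_markdown"   # assumption: clean first
  cp -r "$wiki_dir" "$out/markdown"
  if [ -d "$raw_dir" ]; then
    cp -r "$raw_dir" "$out/raw_markdown"
  fi
}

# Hypothetical driver showing where the branch sits, right after Step 1.
run_after_scrape() {
  if [ "${MARKDOWN_ONLY:-false}" = "true" ]; then
    export_markdown "$1" "$2" "$3"
    echo "markdown-only: skipping mdBook build"
    return 0
  fi
  echo "continuing with Steps 2-7"
}
```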
Error Handling
The orchestrator implements fail-fast error handling:
Error Handling Mechanisms:
| Mechanism | Implementation | Line Reference |
|---|---|---|
| Exit on any error | set -e | scripts/build-docs.sh:2 |
| Configuration validation | Explicit REPO check with error message | scripts/build-docs.sh:33-38 |
| Component failures | Automatic propagation due to set -e | All component invocations |
| Template warnings | Non-fatal warnings if templates not found | scripts/build-docs.sh:215-216 scripts/build-docs.sh:232-233 |
The script does not use explicit error trapping; instead, it relies on Bash’s set -e behavior to immediately exit if any command returns a non-zero status. This ensures that failures in any component (scraper, template processor, mdBook) halt execution and propagate to the Docker container exit code.
Sources: scripts/build-docs.sh:2 scripts/build-docs.sh:33-38 scripts/build-docs.sh:215-216 scripts/build-docs.sh:232-233
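The three mechanisms can be contrasted in a short sketch. The function names and message wording are hypothetical; only the behaviors (fatal REPO check, non-fatal template warning, and set -e stopping at the first failing command) come from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fatal check, fails when no repository is known.
require_repo() {
  if [ -z "${1:-}" ]; then
    echo "Error: REPO must be set or detectable from the Git remote" >&2
    return 1
  fi
}

# Hypothetical helper: non-fatal check, warns and yields empty output.
load_template() {
  if [ -f "$1" ]; then
    cat "$1"
  else
    echo "Warning: template not found: $1" >&2
  fi
}

# Fail-fast demo in a child shell: the echo after `false` never runs.
bash -c 'set -e; false; echo "unreachable"' || echo "child exited with status $?"
```

In a script running under set -e, a bare call to require_repo "$REPO" would end the script when it returns non-zero, while load_template always returns success and lets the build continue without a header or footer.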
Integration Points
The orchestrator integrates with multiple system components through well-defined interfaces:
Diagram: Component Integration Interfaces
graph TB
Orchestrator["build-docs.sh"]
subgraph "Python Components"
Scraper["deepwiki-scraper.py\nArgs: REPO, WIKI_DIR"]
Templates["process-template.py\nArgs: template_path, var1=val1, ..."]
end
subgraph "Build Tools"
MDBook["mdbook build\nWorking dir: $BOOK_DIR"]
Mermaid["mdbook-mermaid install\nArgs: $BOOK_DIR"]
end
subgraph "File System"
Input["Input:\n/workspace/templates/"]
Working["Working:\n$WIKI_DIR\n$RAW_DIR\n$BOOK_DIR"]
Output["Output:\n/output/"]
end
subgraph "Environment"
EnvVars["Environment Variables:\nREPO, BOOK_TITLE,\nBOOK_AUTHORS, etc."]
end
EnvVars --> Orchestrator
Input --> Templates
Orchestrator -->|python3| Scraper
Orchestrator -->|python3| Templates
Orchestrator -->|mdbook| MDBook
Orchestrator -->|mdbook-mermaid| Mermaid
Scraper --> Working
Templates --> Orchestrator
Orchestrator --> Working
MDBook --> Working
Mermaid --> Working
Orchestrator --> Output
Interface Specifications:
deepwiki-scraper.py:
- Invocation: python3 /usr/local/bin/deepwiki-scraper.py $REPO $WIKI_DIR
- Input: Repository identifier (e.g., "jzombie/deepwiki-to-mdbook")
- Output: Markdown files in $WIKI_DIR, raw snapshots in $RAW_DIR
- Documentation: deepwiki-scraper.py
process-template.py:
- Invocation: python3 /usr/local/bin/process-template.py $TEMPLATE_PATH var1=val1 var2=val2 ...
- Input: Template file path and variable assignments
- Output: Processed HTML string to stdout
- Documentation: Template System
mdbook:
- Invocation: mdbook build (in $BOOK_DIR)
- Input: book.toml and src/ directory structure
- Output: HTML in book/ subdirectory
- Documentation: mdBook Integration
mdbook-mermaid:
- Invocation: mdbook-mermaid install $BOOK_DIR
- Input: Book directory path
- Output: Mermaid assets installed in book directory
- Documentation: mdBook Integration
Sources: scripts/build-docs.sh:65 scripts/build-docs.sh:205-213 scripts/build-docs.sh:222-230 scripts/build-docs.sh:266 scripts/build-docs.sh:271
Output Artifacts
The orchestrator produces a structured output directory with multiple artifact types:
/output/
├── book/ # Searchable HTML documentation (Step 7)
│ ├── index.html
│ ├── searchindex.js
│ ├── mermaid.min.js
│ └── ...
├── markdown/ # Enhanced markdown files (Step 7 or markdown-only)
│ ├── Overview.md
│ ├── 1-First-Section.md
│ ├── section-1/
│ │ └── 1.1-Subsection.md
│ └── ...
├── raw_markdown/ # Pre-enhancement snapshots (if available)
│ ├── Overview.md
│ ├── 1-First-Section.md
│ └── ...
└── book.toml # Book configuration reference (Step 7)
Artifact Generation Timeline:
| Artifact | Generated By | When | Purpose |
|---|---|---|---|
| raw_markdown/ | deepwiki-scraper.py | Step 1 | Debug: pre-enhancement state |
| markdown/ | deepwiki-scraper.py | Step 1 | Final markdown with diagrams |
| book.toml | build-docs.sh | Step 2 | Book configuration reference |
| book/ | mdbook | Step 6 | Final HTML documentation |
Sources: scripts/build-docs.sh:273-295
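For callers that want to confirm a full build produced the layout shown above, a check along these lines could be run against the mounted output directory. verify_output is a hypothetical helper and is not part of build-docs.sh; it covers only the artifacts a standard (non-markdown-only) run always produces.

```shell
#!/usr/bin/env bash
# Hypothetical post-build check: report missing artifacts, fail if any.
verify_output() {
  local out=$1 missing=0 path
  for path in book/index.html markdown book.toml; do
    if [ ! -e "$out/$path" ]; then
      echo "missing: $out/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

raw_markdown/ is deliberately left out of the list because it is only present when the scraper produced raw snapshots.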