build-docs.sh Orchestrator
Purpose and Scope
This page documents the build-docs.sh shell script, which serves as the central orchestrator for the entire documentation build process. This script is the container's entry point and coordinates all phases of the system: configuration parsing, scraper invocation, mdBook configuration generation, and output management.
For details about the Python scraping component that this orchestrator calls, see deepwiki-scraper.py. For information about the mdBook integration and configuration format, see mdBook Integration.
Overview
The build-docs.sh script is a Bash orchestration layer that implements the three-phase pipeline described in Three-Phase Pipeline. It has the following core responsibilities:
| Responsibility | Lines | Description |
|---|---|---|
| Auto-detection | build-docs.sh:8-19 | Detects repository from Git remote if not provided |
| Configuration | build-docs.sh:21-53 | Parses environment variables and applies defaults |
| Phase 1 orchestration | build-docs.sh:55-58 | Invokes Python scraper |
| Markdown-only exit | build-docs.sh:60-76 | Implements fast-path for debugging |
| Phase 3 orchestration | build-docs.sh:78-191 | Generates configs, builds mdBook, copies outputs |
Sources: build-docs.sh:1-206
Script Workflow
Complete Execution Flow
The following diagram shows the complete control flow through the orchestrator, including all decision points and phase transitions:
flowchart TD
Start[["build-docs.sh entry"]]
Start --> AutoDetect["Auto-detect repository\nfrom git config"]
AutoDetect --> ValidateRepo{"REPO variable\nset?"}
ValidateRepo -->|No| Error[["Exit with error"]]
ValidateRepo -->|Yes| ExtractParts["Extract REPO_OWNER\nand REPO_NAME"]
ExtractParts --> SetDefaults["Set defaults:\nBOOK_AUTHORS=REPO_OWNER\nGIT_REPO_URL=github.com/REPO"]
SetDefaults --> PrintConfig["Print configuration\nto stdout"]
PrintConfig --> Phase1["Execute Phase 1:\npython3 deepwiki-scraper.py"]
Phase1 --> CheckMode{"MARKDOWN_ONLY\n= true?"}
CheckMode -->|Yes| CopyMd["Copy WIKI_DIR to\nOUTPUT_DIR/markdown"]
CopyMd --> ExitMd[["Exit: markdown-only"]]
CheckMode -->|No| InitBook["Create BOOK_DIR\nand book.toml"]
InitBook --> GenSummary["Generate SUMMARY.md\nfrom file structure"]
GenSummary --> CopySrc["Copy WIKI_DIR to\nBOOK_DIR/src"]
CopySrc --> InstallMermaid["mdbook-mermaid install"]
InstallMermaid --> BuildBook["mdbook build"]
BuildBook --> CopyOutputs["Copy outputs:\nbook/, markdown/, book.toml"]
CopyOutputs --> Success[["Exit: build complete"]]
style Start fill:#f9f9f9
style Phase1 fill:#e8f5e9
style CheckMode fill:#fff9c4
style ExitMd fill:#f9f9f9
style Success fill:#f9f9f9
style Error fill:#ffebee
Sources: build-docs.sh:1-206
Key Decision Point: MARKDOWN_ONLY Mode
The MARKDOWN_ONLY environment variable creates two distinct execution paths in the orchestrator. When set to "true", the script bypasses mdBook configuration generation and building (Phase 3), providing a fast path for debugging content extraction and diagram placement.
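A minimal sketch of this branch, assuming the check is a plain string comparison against "true" and that the copy is done with cp -r. The temporary directories stand in for /workspace/wiki and /output, and select_build_path is a hypothetical helper name; the real script presumably inlines the check and calls exit 0 on the fast path.

```shell
# Stand-in directories so the sketch runs outside the container (assumption).
WIKI_DIR="$(mktemp -d)"
OUTPUT_DIR="$(mktemp -d)"
printf '# Overview\n' > "$WIKI_DIR/1-overview.md"

# Hypothetical helper: take the markdown-only fast path or fall through to Phase 3.
select_build_path() {
    if [ "${MARKDOWN_ONLY:-false}" = "true" ]; then
        mkdir -p "$OUTPUT_DIR/markdown"
        cp -r "$WIKI_DIR"/. "$OUTPUT_DIR/markdown/"
        echo "markdown-only"
    else
        echo "full-build"   # Phase 3 would start here
    fi
}

MARKDOWN_ONLY=true
BUILD_PATH="$(select_build_path)"
echo "$BUILD_PATH"
```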
Sources: build-docs.sh:26 build-docs.sh:60-76
Configuration Handling
Auto-Detection System
When REPO is not provided, the script attempts to detect it from the Git remote of the repository it is running in:
flowchart LR
Start["REPO variable"] --> Check{"REPO set?"}
Check -->|Yes| UseProvided["Use provided value"]
Check -->|No| GitCheck{"Inside Git\nrepository?"}
GitCheck -->|No| RequireManual["REPO remains empty"]
GitCheck -->|Yes| GetRemote["git config --get\nremote.origin.url"]
GetRemote --> Extract["Extract owner/repo\nusing sed regex"]
Extract --> SetRepo["Set REPO variable"]
UseProvided --> Validate["Validation check"]
SetRepo --> Validate
RequireManual --> Validate
Validate --> ValidCheck{"REPO\nis set?"}
ValidCheck -->|No| ExitError[["Exit with error:\nREPO must be set"]]
ValidCheck -->|Yes| Continue["Continue execution"]
The regular expression used for extraction handles multiple GitHub URL formats:
- https://github.com/owner/repo.git
- git@github.com:owner/repo.git
- https://github.com/owner/repo
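The exact expression is not reproduced on this page. The sketch below shows one sed pattern that handles the three formats listed above; extract_repo is a hypothetical helper name, and the git rev-parse check is an assumption about how "inside a Git repository" is tested.

```shell
# Hypothetical helper: reduce a GitHub remote URL to "owner/repo".
extract_repo() {
    printf '%s\n' "$1" |
        sed -E -e 's#^.*github\.com[:/]([^/]+/[^/]+)$#\1#' -e 's#\.git$##'
}

# Only consult Git when REPO was not supplied (the repository check is an assumption).
if [ -z "${REPO:-}" ] && git rev-parse --git-dir >/dev/null 2>&1; then
    REPO="$(extract_repo "$(git config --get remote.origin.url)")"
fi

extract_repo "https://github.com/owner/repo.git"   # owner/repo
extract_repo "git@github.com:owner/repo.git"       # owner/repo
extract_repo "https://github.com/owner/repo"       # owner/repo
```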
Sources: build-docs.sh:8-19 build-docs.sh:32-37
Configuration Variable Flow
The script manages five primary configuration variables, with the following precedence and default logic:
| Variable | Source | Default Derivation | Code Reference |
|---|---|---|---|
| REPO | Environment or Git auto-detect | (required) | build-docs.sh:8-22 |
| BOOK_TITLE | Environment | "Documentation" | build-docs.sh:23 |
| BOOK_AUTHORS | Environment | $REPO_OWNER | build-docs.sh:40-44 |
| GIT_REPO_URL | Environment | https://github.com/$REPO | build-docs.sh:40-45 |
| MARKDOWN_ONLY | Environment | "false" | build-docs.sh:26 |
The script extracts REPO_OWNER and REPO_NAME from the REPO variable using shell string manipulation:
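The original snippet is not reproduced here. One way to do the split and apply the defaults from the table, assuming POSIX parameter expansion is used (the real script may use cut or another tool instead):

```shell
REPO="owner/repo"            # example value

# Split "owner/repo" with parameter expansion.
REPO_OWNER="${REPO%%/*}"     # drop everything from the first "/" onward
REPO_NAME="${REPO#*/}"       # drop everything up to and including the first "/"

# Environment values win; otherwise fall back to the documented defaults.
BOOK_TITLE="${BOOK_TITLE:-Documentation}"
BOOK_AUTHORS="${BOOK_AUTHORS:-$REPO_OWNER}"
GIT_REPO_URL="${GIT_REPO_URL:-https://github.com/$REPO}"
MARKDOWN_ONLY="${MARKDOWN_ONLY:-false}"

echo "owner=$REPO_OWNER name=$REPO_NAME authors=$BOOK_AUTHORS url=$GIT_REPO_URL"
```

The `${VAR:-default}` form treats an empty variable the same as an unset one, so exporting an empty BOOK_AUTHORS still yields the repository owner.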
Sources: build-docs.sh:39-45
Working Directory Structure
The orchestrator uses four primary directory paths:
- WORK_DIR="/workspace": Temporary workspace for all build operations
- WIKI_DIR="$WORK_DIR/wiki": Scraper output location
- BOOK_DIR="$WORK_DIR/book": mdBook project directory
- OUTPUT_DIR="/output": Volume-mounted final output location
Sources: build-docs.sh:27-30
Phase Orchestration
Phase 1: Scraper Invocation
The orchestrator invokes the Python scraper with exactly two positional arguments:
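The command line itself is missing from this page. A sketch of its likely shape, assuming the two arguments are the repository and the output directory, in that order, and that the script is reachable as deepwiki-scraper.py (the install path inside the container is not stated here). The sketch only assembles and prints the command so that it can run without the scraper present.

```shell
REPO="owner/repo"            # example value
WIKI_DIR="/workspace/wiki"

# Assumed argument order: repository first, output directory second.
set -- python3 deepwiki-scraper.py "$REPO" "$WIKI_DIR"
echo "Step 1 would run: $*"

# Inside the container, "$@" would execute the command assembled above.
# "$@"
```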
This command executes the complete Phase 1 and Phase 2 pipeline as documented in Phase 1: Markdown Extraction and Phase 2: Diagram Enhancement. The scraper writes all output to $WIKI_DIR.
Sources: build-docs.sh:55-58
Phase 3: mdBook Configuration and Build
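Phase 3 is implemented through six distinct steps in the orchestrator, reported on stdout as Step 2 through Step 7: initializing the mdBook structure, generating SUMMARY.md, copying the Markdown files into the book, installing the mdbook-mermaid assets, building the book, and copying the outputs to /output.
Note: Step numbers in stdout messages do not match phase numbers. The scraper run is printed as "Step 1," so the first Phase 3 step is printed as "Step 2."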
Sources: build-docs.sh:78-191
Configuration File Generation
flowchart LR
EnvVars["Environment variables:\nBOOK_TITLE\nBOOK_AUTHORS\nGIT_REPO_URL"]
Template["Heredoc template\nat line 85-103"]
BookToml["BOOK_DIR/book.toml"]
EnvVars --> Template
Template --> BookToml
BookToml --> MdBook["mdbook build"]
book.toml Generation
The orchestrator dynamically generates the book.toml configuration file for mdBook using a heredoc:
The generated book.toml includes:
- [book] section: title, authors, language, multilingual, src
- [output.html] section: default-theme, git-repository-url
- [preprocessor.mermaid] section: command
- [output.html.fold] section: enable, level
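The git-repository-url setting adds a repository link icon to the mdBook menu bar, pointing at the GitHub repository specified in $GIT_REPO_URL. A per-page edit button is controlled by a separate mdBook option, edit-url-template, which is not among the keys listed above.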
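The heredoc itself is not reproduced on this page. The sketch below generates a file with the sections and keys listed above. The key names come from that list; the values for language, multilingual, src, default-theme, enable, and level are placeholder assumptions, and the temporary directory stands in for $BOOK_DIR.

```shell
BOOK_DIR="$(mktemp -d)"      # stand-in for /workspace/book
BOOK_TITLE="Documentation"
BOOK_AUTHORS="owner"
GIT_REPO_URL="https://github.com/owner/repo"

# Unquoted EOF delimiter, so the shell expands the variables inside the heredoc.
cat > "$BOOK_DIR/book.toml" <<EOF
[book]
title = "$BOOK_TITLE"
authors = ["$BOOK_AUTHORS"]
language = "en"
multilingual = false
src = "src"

[output.html]
default-theme = "rust"
git-repository-url = "$GIT_REPO_URL"

[preprocessor.mermaid]
command = "mdbook-mermaid"

[output.html.fold]
enable = true
level = 1
EOF

cat "$BOOK_DIR/book.toml"
```

Values are interpolated verbatim, so a title containing a double quote would produce invalid TOML; a sketch like this assumes simple values.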
Sources: build-docs.sh:84-103
flowchart TD
Start["Begin SUMMARY.md generation"]
Start --> FindFirst["Find first .md file\nin WIKI_DIR root"]
FindFirst --> ExtractTitle1["Extract title from\nfirst line (# Title)"]
ExtractTitle1 --> WriteIntro["Write as Introduction link"]
WriteIntro --> IterateMain["Iterate *.md files\nin WIKI_DIR root"]
IterateMain --> SkipFirst{"Is this\nfirst file?"}
SkipFirst -->|Yes| NextFile["Skip to next file"]
SkipFirst -->|No| ExtractTitle2["Extract title\nfrom first line"]
ExtractTitle2 --> GetSectionNum["Extract section number\nusing grep regex"]
GetSectionNum --> CheckSubdir{"section-N/\ndirectory exists?"}
CheckSubdir -->|No| WriteStandalone["Write as standalone:\n- [Title](file.md)"]
CheckSubdir -->|Yes| WriteSection["Write section header:\n# Title"]
WriteSection --> WriteMainLink["Write main page link:\n- [Title](file.md)"]
WriteMainLink --> IterateSubs["Iterate section-N/*.md"]
IterateSubs --> WriteSubLinks["Write indented sub-links:\n - [SubTitle](section-N/file.md)"]
WriteStandalone --> NextFile
WriteSubLinks --> NextFile
NextFile --> MoreFiles{"More\nfiles?"}
MoreFiles -->|Yes| IterateMain
MoreFiles -->|No| WriteSummary["Write to BOOK_DIR/src/SUMMARY.md"]
WriteSummary --> Done["Generation complete"]
SUMMARY.md Generation Algorithm
The orchestrator generates the table of contents (SUMMARY.md) by scanning the actual file structure in $WIKI_DIR. This dynamic generation ensures the table of contents always matches the scraped content.
The algorithm extracts titles by reading the first line of each Markdown file and removing the # prefix using sed:
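The sed command is not reproduced on this page. A minimal sketch, assuming each page starts with a level-one heading of the form "# Title"; extract_title is a hypothetical helper name.

```shell
# Hypothetical helper: print a file's first line without the leading "# ".
extract_title() {
    head -n 1 "$1" | sed 's/^# *//'
}

page="$(mktemp)"
printf '# Overview\n\nBody text.\n' > "$page"

TITLE="$(extract_title "$page")"
echo "$TITLE"
```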
Section numbers are extracted using grep with a regex pattern:
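The pattern is not reproduced on this page. The sketch below assumes file names begin with the section number (as in 2-section.md) and that subsections live in a matching section-N/ directory, as the flowchart above describes; section_number is a hypothetical helper name and the temporary directory stands in for $WIKI_DIR.

```shell
# Hypothetical helper: leading digits of a file name such as "2-section.md".
section_number() {
    basename "$1" | grep -oE '^[0-9]+'
}

WIKI_DIR="$(mktemp -d)"        # stand-in for /workspace/wiki
mkdir -p "$WIKI_DIR/section-2"
touch "$WIKI_DIR/1-overview.md" "$WIKI_DIR/2-section.md"

# Decide per top-level page whether it has a subsection directory.
for file in "$WIKI_DIR"/*.md; do
    n="$(section_number "$file")"
    if [ -d "$WIKI_DIR/section-$n" ]; then
        echo "$(basename "$file"): has subsections in section-$n/"
    else
        echo "$(basename "$file"): standalone page"
    fi
done
```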
For detailed information about how the file structure is organized, see Wiki Structure Discovery.
Sources: build-docs.sh:108-159
File Operations
Copy Operations Mapping
The orchestrator moves data through the pipeline with five copy operations:
| Source | Destination | Purpose | Code Reference |
|---|---|---|---|
| $WIKI_DIR/* | $OUTPUT_DIR/markdown/ | Markdown-only mode output | build-docs.sh:65 |
| $WIKI_DIR/* | $BOOK_DIR/src/ | Source files for mdBook | build-docs.sh:166 |
| $BOOK_DIR/book | $OUTPUT_DIR/book/ | Final HTML output | build-docs.sh:184 |
| $WIKI_DIR/* | $OUTPUT_DIR/markdown/ | Markdown reference copy | build-docs.sh:188 |
| $BOOK_DIR/book.toml | $OUTPUT_DIR/book.toml | Configuration reference | build-docs.sh:191 |
The final output structure in $OUTPUT_DIR is:
/output/
├── book/ # HTML documentation (from BOOK_DIR/book)
│ ├── index.html
│ ├── *.html
│ └── ...
├── markdown/ # Source Markdown files (from WIKI_DIR)
│ ├── 1-overview.md
│ ├── 2-section.md
│ ├── section-2/
│ └── ...
└── book.toml # Configuration copy (from BOOK_DIR)
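The final copy step can be sketched as follows. The cp flags are an assumption (the script's exact commands are not shown here), and temporary directories with placeholder files stand in for /workspace and /output so the sketch runs without a real build.

```shell
WORK_DIR="$(mktemp -d)"        # stand-in for /workspace
OUTPUT_DIR="$(mktemp -d)"      # stand-in for /output
WIKI_DIR="$WORK_DIR/wiki"
BOOK_DIR="$WORK_DIR/book"

# Fake a finished build so there is something to copy.
mkdir -p "$WIKI_DIR" "$BOOK_DIR/book"
printf '# Overview\n' > "$WIKI_DIR/1-overview.md"
printf '<html></html>\n' > "$BOOK_DIR/book/index.html"
printf '[book]\n' > "$BOOK_DIR/book.toml"

# The three copies performed at the end of a full build.
mkdir -p "$OUTPUT_DIR/markdown"
cp -r "$BOOK_DIR/book" "$OUTPUT_DIR/"
cp -r "$WIKI_DIR"/. "$OUTPUT_DIR/markdown/"
cp "$BOOK_DIR/book.toml" "$OUTPUT_DIR/book.toml"

ls "$OUTPUT_DIR"
```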
Sources: build-docs.sh:178-191
Atomic Output Management
The orchestrator uses a two-stage directory strategy to keep incomplete builds out of the output directory:
- Working stage: All operations occur in /workspace (ephemeral)
- Output stage: Final artifacts are copied to /output (volume-mounted)
Because the set -e directive at build-docs.sh:2 terminates the script on the first failing step, a failure during scraping or building stops the run before anything is copied to /output. The copies themselves are ordinary file copies rather than an atomic rename, so a failure during the final copy step can still leave /output partially populated.
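The effect of set -e on the output stage can be seen in a small stand-alone sketch. The miniature "pipeline" below is hypothetical and runs in a child shell so that its early exit does not end the calling shell; a temporary directory stands in for /output.

```shell
OUTPUT_DIR="$(mktemp -d)"      # stand-in for /output
STATUS=0

# Child shell with set -e: the simulated failure stops it before the copy step.
OUTPUT_DIR="$OUTPUT_DIR" sh -c '
    set -e
    echo "Step 1: scraping"
    false
    echo "Step 7: copying outputs"
    touch "$OUTPUT_DIR/book.toml"
' || STATUS=$?

echo "exit status: $STATUS"
ls -A "$OUTPUT_DIR"            # lists nothing: the copy step never ran
```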
Sources: build-docs.sh:2 build-docs.sh:27-30 build-docs.sh:178-191
Tool Invocations
External Command Execution
The orchestrator invokes three external tools during execution: the Python scraper (deepwiki-scraper.py, run with python3), mdbook-mermaid, and mdbook.
Each tool is invoked with specific working directories and arguments:
Python scraper invocation (build-docs.sh:58): python3 deepwiki-scraper.py, with the two positional arguments described under Phase 1: Scraper Invocation.
mdbook-mermaid installation (build-docs.sh:171): mdbook-mermaid install
This installs the necessary JavaScript and CSS assets for Mermaid diagram rendering into the mdBook project.
mdBook build (build-docs.sh:176): mdbook build
Executed from within $BOOK_DIR due to cd "$BOOK_DIR" at build-docs.sh:82.
Sources: build-docs.sh:58 build-docs.sh:82 build-docs.sh:171 build-docs.sh:176
Error Handling
Validation and Exit Conditions
The script implements minimal but critical validation:
The set -e directive at build-docs.sh:2 terminates the script as soon as a command exits with a non-zero status. Commands tested in an if condition or combined with && or || are exempt from this rule. The failures it catches include:
- HTTP failures in the Python scraper
- File system errors during copy operations
- mdBook build failures
- mdbook-mermaid installation failures
The only explicit validation check is for the REPO variable at build-docs.sh:32-37, which prints usage instructions and exits with code 1 if REPO is not set.
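A minimal sketch of such a check. require_repo is a hypothetical helper and the message text is illustrative; the real script presumably inlines the test and calls exit 1 rather than returning.

```shell
# Hypothetical helper mirroring the REPO check.
require_repo() {
    if [ -z "${REPO:-}" ]; then
        echo "Error: REPO must be set (for example REPO=owner/repo)" >&2
        return 1               # the script itself exits with code 1 here
    fi
}

REPO=""
if require_repo 2>/dev/null; then CHECK_EMPTY=ok; else CHECK_EMPTY=failed; fi

REPO="owner/repo"
if require_repo; then CHECK_SET=ok; else CHECK_SET=failed; fi

echo "empty: $CHECK_EMPTY, set: $CHECK_SET"
```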
Sources: build-docs.sh:2 build-docs.sh:32-37
Stdout Output Format
The orchestrator provides structured console output for monitoring build progress:
================================================================================
DeepWiki Documentation Builder
================================================================================
Configuration:
Repository: owner/repo
Book Title: Documentation Title
Authors: Author Name
Git Repo URL: https://github.com/owner/repo
Markdown Only: false
Step 1: Scraping wiki from DeepWiki...
[scraper output...]
Step 2: Initializing mdBook structure...
Step 3: Generating SUMMARY.md from scraped content...
Generated SUMMARY.md with N entries
Step 4: Copying markdown files to book...
Step 5: Installing mdbook-mermaid assets...
Step 6: Building mdBook...
Step 7: Copying outputs to /output...
================================================================================
✓ Documentation build complete!
================================================================================
Outputs:
- HTML book: /output/book/
- Markdown files: /output/markdown/
- Book config: /output/book.toml
To serve the book locally:
cd /output && python3 -m http.server --directory book 8000
Each step is clearly labeled with progress indicators. The configuration block is printed before processing begins to aid in debugging.
Sources: build-docs.sh:4-6 build-docs.sh:47-53 build-docs.sh:55-205