build-docs.sh Orchestrator


Purpose and Scope

This page documents the build-docs.sh shell script, which serves as the central orchestrator for the entire documentation build process. This script is the container's entry point and coordinates all phases of the system: configuration parsing, scraper invocation, mdBook configuration generation, and output management.

For details about the Python scraping component that this orchestrator calls, see deepwiki-scraper.py. For information about the mdBook integration and configuration format, see mdBook Integration.

Overview

The build-docs.sh script is a Bash orchestration layer that implements the three-phase pipeline described in Three-Phase Pipeline. It has the following core responsibilities:

Responsibility	Lines	Description
Auto-detection	build-docs.sh:8-19	Detects repository from Git remote if not provided
Configuration	build-docs.sh:21-53	Parses environment variables and applies defaults
Phase 1 orchestration	build-docs.sh:55-58	Invokes Python scraper
Markdown-only exit	build-docs.sh:60-76	Implements fast-path for debugging
Phase 3 orchestration	build-docs.sh:78-191	Generates configs, builds mdBook, copies outputs

Sources: build-docs.sh:1-206

Script Workflow

Complete Execution Flow

The following diagram shows the complete control flow through the orchestrator, including all decision points and phase transitions:

flowchart TD
    Start[["build-docs.sh entry"]]
    Start --> AutoDetect["Auto-detect repository\nfrom git config"]
    AutoDetect --> ValidateRepo{"REPO variable\nset?"}
    ValidateRepo -->|No| Error[["Exit with error"]]
    ValidateRepo -->|Yes| ExtractParts["Extract REPO_OWNER\nand REPO_NAME"]
    ExtractParts --> SetDefaults["Set defaults:\nBOOK_AUTHORS=REPO_OWNER\nGIT_REPO_URL=github.com/REPO"]
    SetDefaults --> PrintConfig["Print configuration\nto stdout"]
    PrintConfig --> Phase1["Execute Phase 1:\npython3 deepwiki-scraper.py"]
    Phase1 --> CheckMode{"MARKDOWN_ONLY\n= true?"}
    CheckMode -->|Yes| CopyMd["Copy WIKI_DIR to\nOUTPUT_DIR/markdown"]
    CopyMd --> ExitMd[["Exit: markdown-only"]]
    CheckMode -->|No| InitBook["Create BOOK_DIR\nand book.toml"]
    InitBook --> GenSummary["Generate SUMMARY.md\nfrom file structure"]
    GenSummary --> CopySrc["Copy WIKI_DIR to\nBOOK_DIR/src"]
    CopySrc --> InstallMermaid["mdbook-mermaid install"]
    InstallMermaid --> BuildBook["mdbook build"]
    BuildBook --> CopyOutputs["Copy outputs:\nbook/, markdown/, book.toml"]
    CopyOutputs --> Success[["Exit: build complete"]]

    style Start fill:#f9f9f9
    style Phase1 fill:#e8f5e9
    style CheckMode fill:#fff9c4
    style ExitMd fill:#f9f9f9
    style Success fill:#f9f9f9
    style Error fill:#ffebee

Sources: build-docs.sh:1-206

Key Decision Point: MARKDOWN_ONLY Mode

The MARKDOWN_ONLY environment variable creates two distinct execution paths in the orchestrator. When it is set to "true", the script copies the scraped Markdown from $WIKI_DIR to $OUTPUT_DIR/markdown and exits, skipping mdBook configuration generation and building (Phase 3). This provides a fast path for debugging content extraction and diagram placement.
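A minimal sketch of this branch, assuming the WIKI_DIR and OUTPUT_DIR paths described under Working Directory Structure. The function name markdown_only_fast_path is hypothetical; the real script may simply inline the check and call exit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 (after copying) when MARKDOWN_ONLY is "true", 1 otherwise,
# so the caller decides whether to stop before Phase 3.
markdown_only_fast_path() {
  [ "${MARKDOWN_ONLY:-false}" = "true" ] || return 1
  mkdir -p "$OUTPUT_DIR/markdown"
  # "$WIKI_DIR/." copies the directory contents, including dotfiles.
  cp -r "$WIKI_DIR/." "$OUTPUT_DIR/markdown/"
}

# In the orchestrator this would be used as:
#   if markdown_only_fast_path; then exit 0; fi
```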

Sources: build-docs.sh:26 build-docs.sh:60-76

Configuration Handling

Auto-Detection System

When REPO is not provided and the script is running inside a Git repository, it auto-detects the value from the origin remote:

flowchart LR
    Start["REPO variable"] --> Check{"REPO set?"}
    Check -->|Yes| UseProvided["Use provided value"]
    Check -->|No| GitCheck{"Inside Git\nrepository?"}
    GitCheck -->|No| RequireManual["REPO remains empty"]
    GitCheck -->|Yes| GetRemote["git config --get\nremote.origin.url"]
    GetRemote --> Extract["Extract owner/repo\nusing sed regex"]
    Extract --> SetRepo["Set REPO variable"]
    UseProvided --> Validate["Validation check"]
    SetRepo --> Validate
    RequireManual --> Validate
    Validate --> ValidCheck{"REPO\nis set?"}
    ValidCheck -->|No| ExitError[["Exit with error:\nREPO must be set"]]
    ValidCheck -->|Yes| Continue["Continue execution"]

The regular expression used for extraction handles multiple GitHub URL formats:

https://github.com/owner/repo.git
git@github.com:owner/repo.git
https://github.com/owner/repo
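The following sketch shows one way to implement the auto-detection flow in Bash. The sed expression is an assumption written to handle the three formats listed above; the regex in build-docs.sh may differ, and extract_repo and autodetect_repo are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Strip a github.com HTTPS or SSH prefix and an optional ".git" suffix,
# leaving "owner/repo". Hypothetical helper; the real sed regex may differ.
extract_repo() {
  echo "$1" | sed -E 's#^(https://|git@)github\.com[/:]##; s#\.git$##'
}

# Fill in REPO from the origin remote when it was not provided.
autodetect_repo() {
  REPO="${REPO:-}"
  # An explicitly provided value always wins.
  [ -n "$REPO" ] && return 0
  # Outside a Git work tree (or without git installed) REPO stays empty.
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 0
  local url
  url=$(git config --get remote.origin.url || true)
  if [ -n "$url" ]; then
    REPO=$(extract_repo "$url")
  fi
}
```

Remote URLs for other hosts pass through extract_repo unchanged, so this sketch only yields a usable owner/repo value for GitHub remotes.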

Sources: build-docs.sh:8-19 build-docs.sh:32-37

Configuration Variable Flow

The script manages five primary configuration variables, with the following precedence and default logic:

Variable	Source	Default Derivation	Code Reference
`REPO`	Environment or Git auto-detect	(required)	build-docs.sh:8-22
`BOOK_TITLE`	Environment	`"Documentation"`	build-docs.sh:23
`BOOK_AUTHORS`	Environment	`$REPO_OWNER`	build-docs.sh:40-44
`GIT_REPO_URL`	Environment	`https://github.com/$REPO`	build-docs.sh:40-45
`MARKDOWN_ONLY`	Environment	`"false"`	build-docs.sh:26

The script derives REPO_OWNER and REPO_NAME from the REPO variable (format owner/repo) using shell string manipulation.
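As a sketch (not necessarily the script's exact code), the split and the defaults from the table above can be expressed with Bash parameter expansion. The apply_config_defaults function is a hypothetical name, and REPO is assumed to have passed validation already.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper; assumes REPO ("owner/repo") was already validated.
apply_config_defaults() {
  # Plain defaults from the table.
  BOOK_TITLE="${BOOK_TITLE:-Documentation}"
  MARKDOWN_ONLY="${MARKDOWN_ONLY:-false}"

  # Split REPO with parameter expansion:
  #   ${REPO%%/*} removes the first "/" and everything after it -> owner
  #   ${REPO#*/}  removes everything up to and including the first "/" -> repo
  REPO_OWNER="${REPO%%/*}"
  REPO_NAME="${REPO#*/}"

  # Derived defaults; ${VAR:-x} also replaces an empty value.
  BOOK_AUTHORS="${BOOK_AUTHORS:-$REPO_OWNER}"
  GIT_REPO_URL="${GIT_REPO_URL:-https://github.com/$REPO}"
}
```

Because ${VAR:-default} substitutes the default for both unset and empty variables, an explicitly empty BOOK_AUTHORS also falls back to $REPO_OWNER.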

Sources: build-docs.sh:39-45

Working Directory Structure

The orchestrator uses four primary directory paths:

WORK_DIR="/workspace": Temporary workspace for all build operations
WIKI_DIR="$WORK_DIR/wiki": Scraper output location
BOOK_DIR="$WORK_DIR/book": mdBook project directory
OUTPUT_DIR="/output": Volume-mounted final output location

Sources: build-docs.sh:27-30

Phase Orchestration

Phase 1: Scraper Invocation

The orchestrator invokes the Python scraper with exactly two positional arguments: the repository ($REPO) and the output directory ($WIKI_DIR).

This single invocation executes both Phase 1 and Phase 2 of the pipeline, as documented in Phase 1: Markdown Extraction and Phase 2: Diagram Enhancement. The scraper writes all output to $WIKI_DIR.
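A minimal sketch of the invocation. The scraper path in SCRAPER and the argument order (repository first, output directory second) are assumptions; build-docs.sh:58 holds the actual command.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed install location of the scraper inside the container.
SCRAPER="${SCRAPER:-/usr/local/bin/deepwiki-scraper.py}"

# Runs Phases 1 and 2; all Markdown output lands in $WIKI_DIR.
run_scraper() {
  echo "Step 1: Scraping wiki from DeepWiki..."
  python3 "$SCRAPER" "$REPO" "$WIKI_DIR"
}
```

With set -e in effect, a non-zero exit status from the scraper stops the orchestrator before any mdBook step runs.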

Sources: build-docs.sh:55-58

Phase 3: mdBook Configuration and Build

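Phase 3 is implemented through six distinct steps in the orchestrator:

Initialize the mdBook structure ($BOOK_DIR and book.toml)
Generate SUMMARY.md from the scraped content
Copy the Markdown files into $BOOK_DIR/src
Install the mdbook-mermaid assets
Build the book with mdbook build
Copy the outputs to /output

Note: The stdout messages number these steps 2 through 7, because the scraper invocation is printed as "Step 1."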

Sources: build-docs.sh:78-191

Configuration File Generation

flowchart LR
    EnvVars["Environment variables:\nBOOK_TITLE\nBOOK_AUTHORS\nGIT_REPO_URL"]
    Template["Heredoc template\nat line 85-103"]
    BookToml["BOOK_DIR/book.toml"]
    EnvVars --> Template
    Template --> BookToml
    BookToml --> MdBook["mdbook build"]

book.toml Generation

The orchestrator generates mdBook's book.toml configuration file at build time using a shell heredoc populated from BOOK_TITLE, BOOK_AUTHORS, and GIT_REPO_URL.

The generated book.toml includes:

[book] section: title, authors, language, multilingual, src
[output.html] section: default-theme, git-repository-url
[preprocessor.mermaid] section: command
[output.html.fold] section: enable, level
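The heredoc approach can be sketched as follows. Only the section and key names come from the list above; the literal values for language, multilingual, default-theme, command, and the fold settings are assumptions, as is the write_book_toml function name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write $1/book.toml from environment variables (sketch).
# The unquoted EOF delimiter lets the shell expand $BOOK_TITLE etc.
write_book_toml() {
  local book_dir="$1"
  mkdir -p "$book_dir"
  cat > "$book_dir/book.toml" <<EOF
[book]
title = "$BOOK_TITLE"
authors = ["$BOOK_AUTHORS"]
language = "en"
multilingual = false
src = "src"

[output.html]
default-theme = "rust"
git-repository-url = "$GIT_REPO_URL"

[preprocessor.mermaid]
command = "mdbook-mermaid"

[output.html.fold]
enable = true
level = 1
EOF
}
```

Quoting the delimiter (<<'EOF') would disable variable expansion. Because values are pasted into TOML strings verbatim, a title or author containing a double quote would produce an invalid file; this sketch assumes plain values.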

The git-repository-url setting makes mdBook add a link to the repository given in $GIT_REPO_URL to the book's menu bar. Per-page edit links would require mdBook's separate edit-url-template setting, which the generated file does not include.

Sources: build-docs.sh:84-103

flowchart TD
    Start["Begin SUMMARY.md generation"]
    Start --> FindFirst["Find first .md file\nin WIKI_DIR root"]
    FindFirst --> ExtractTitle1["Extract title from\nfirst line (# Title)"]
    ExtractTitle1 --> WriteIntro["Write as Introduction link"]
    WriteIntro --> IterateMain["Iterate *.md files\nin WIKI_DIR root"]
    IterateMain --> SkipFirst{"Is this\nfirst file?"}
    SkipFirst -->|Yes| NextFile["Skip to next file"]
    SkipFirst -->|No| ExtractTitle2["Extract title\nfrom first line"]
    ExtractTitle2 --> GetSectionNum["Extract section number\nusing grep regex"]
    GetSectionNum --> CheckSubdir{"section-N/\ndirectory exists?"}
    CheckSubdir -->|No| WriteStandalone["Write as standalone:\n- [Title](file.md)"]
    CheckSubdir -->|Yes| WriteSection["Write section header:\n# Title"]
    WriteSection --> WriteMainLink["Write main page link:\n- [Title](file.md)"]
    WriteMainLink --> IterateSubs["Iterate section-N/*.md"]
    IterateSubs --> WriteSubLinks["Write indented sub-links:\n - [SubTitle](section-N/file.md)"]
    WriteStandalone --> NextFile
    WriteSubLinks --> NextFile
    NextFile --> MoreFiles{"More\nfiles?"}
    MoreFiles -->|Yes| IterateMain
    MoreFiles -->|No| WriteSummary["Write to BOOK_DIR/src/SUMMARY.md"]
    WriteSummary --> Done["Generation complete"]

SUMMARY.md Generation Algorithm

The orchestrator generates the table of contents (SUMMARY.md) by scanning the actual file structure in $WIKI_DIR. This dynamic generation ensures the table of contents always matches the scraped content.

The algorithm extracts each title by reading the first line of a Markdown file and stripping the leading # with sed.

Section numbers are extracted with grep and a regular expression.
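The algorithm and both helpers can be sketched in Bash as follows. The function names, the "# Summary" title line (a common mdBook convention), and the assumption that the section number is the numeric prefix of the file name (as with 2-section.md and section-2/ in the output layout shown later) are illustrative, not taken from build-docs.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Title: first line of the file with the leading "#" (and spaces) removed.
get_title() {
  head -n 1 "$1" | sed 's/^# *//'
}

# Section number: leading digits of the file name ("2-section.md" -> "2").
# Prints nothing when the name has no numeric prefix.
get_section_num() {
  basename "$1" | grep -oE '^[0-9]+' || true
}

# generate_summary WIKI_DIR OUT_FILE (hypothetical helper following the flowchart)
generate_summary() {
  local wiki_dir="$1" out="$2" file title num sub
  local files=("$wiki_dir"/*.md)       # glob order is lexicographic
  [ -e "${files[0]}" ] || return 1     # no Markdown files found
  {
    echo "# Summary"
    echo
    # First file becomes an unnumbered Introduction (prefix chapter).
    echo "[$(get_title "${files[0]}")]($(basename "${files[0]}"))"
    echo
    for file in "${files[@]:1}"; do
      title=$(get_title "$file")
      num=$(get_section_num "$file")
      if [ -n "$num" ] && [ -d "$wiki_dir/section-$num" ]; then
        # Section with sub-pages: header, main page link, indented children.
        echo "# $title"
        echo
        echo "- [$title]($(basename "$file"))"
        for sub in "$wiki_dir/section-$num"/*.md; do
          [ -e "$sub" ] || continue
          echo "  - [$(get_title "$sub")](section-$num/$(basename "$sub"))"
        done
        echo
      else
        echo "- [$title]($(basename "$file"))"
      fi
    done
  } > "$out"
}
```

Bash expands the *.md glob in lexicographic order, so 10-x.md would sort before 2-x.md; a version sort (for example GNU sort -V) would be needed for wikis with ten or more top-level pages.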

For detailed information about how the file structure is organized, see Wiki Structure Discovery.

Sources: build-docs.sh:108-159

File Operations

Copy Operations Mapping

The orchestrator performs the following copy operations to move data through the pipeline:

Source	Destination	Purpose	Code Reference
`$WIKI_DIR/*`	`$OUTPUT_DIR/markdown/`	Markdown-only mode output	build-docs.sh:65
`$WIKI_DIR/*`	`$BOOK_DIR/src/`	Source files for mdBook	build-docs.sh:166
`$BOOK_DIR/book`	`$OUTPUT_DIR/book/`	Final HTML output	build-docs.sh:184
`$WIKI_DIR/*`	`$OUTPUT_DIR/markdown/`	Markdown reference copy	build-docs.sh:188
`$BOOK_DIR/book.toml`	`$OUTPUT_DIR/book.toml`	Configuration reference	build-docs.sh:191

The final output structure in $OUTPUT_DIR is:

/output/
├── book/              # HTML documentation (from BOOK_DIR/book)
│   ├── index.html
│   ├── *.html
│   └── ...
├── markdown/          # Source Markdown files (from WIKI_DIR)
│   ├── 1-overview.md
│   ├── 2-section.md
│   ├── section-2/
│   └── ...
└── book.toml          # Configuration copy (from BOOK_DIR)
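Step 7 can be sketched as a single function that reproduces the three final copies from the copy table above (the markdown-only copy is handled separately). The function name is hypothetical, and copying "dir/." rather than "dir/*" is a choice made here so that hidden files are included and an empty directory does not leave an unmatched glob.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Copy the finished artifacts from the workspace to the output volume.
copy_outputs() {
  echo "Step 7: Copying outputs to $OUTPUT_DIR..."
  mkdir -p "$OUTPUT_DIR/book" "$OUTPUT_DIR/markdown"
  cp -r "$BOOK_DIR/book/." "$OUTPUT_DIR/book/"      # rendered HTML
  cp -r "$WIKI_DIR/." "$OUTPUT_DIR/markdown/"       # Markdown reference copy
  cp "$BOOK_DIR/book.toml" "$OUTPUT_DIR/book.toml"  # configuration copy
}
```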

Sources: build-docs.sh:178-191

Atomic Output Management

The orchestrator uses a two-stage directory strategy for atomic outputs:

Working stage : All operations occur in /workspace (ephemeral)
Output stage : Final artifacts are copied to /output (volume-mounted)

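This keeps partial builds out of the output directory: artifacts are copied to /output only after every earlier step has succeeded. If any step fails, the set -e directive at build-docs.sh:2 terminates the script immediately, before the copy step runs. The copy step itself is a series of ordinary cp commands rather than a single atomic operation, so a failure during copying could still leave /output incomplete.

Sources: build-docs.sh:2 build-docs.sh:27-30 build-docs.sh:178-191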

Tool Invocations

External Command Execution

The orchestrator invokes three external tools during execution: python3 (running deepwiki-scraper.py), mdbook-mermaid, and mdbook. Each is invoked with specific working directories and arguments:

Python scraper invocation (build-docs.sh:58): runs with two positional arguments and writes its output to $WIKI_DIR.

mdbook-mermaid installation (build-docs.sh:171): installs the JavaScript and CSS assets needed for Mermaid diagram rendering into the mdBook project.

mdBook build (build-docs.sh:176): executed from within $BOOK_DIR, because of the cd "$BOOK_DIR" at build-docs.sh:82.
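Steps 5 and 6 can be sketched as below. The build_book wrapper is hypothetical, and it assumes $BOOK_DIR already contains book.toml and src/. Passing $BOOK_DIR to mdbook-mermaid install is redundant after the cd but harmless, since the subcommand accepts an optional book directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Install Mermaid assets, then render the book to $BOOK_DIR/book.
# Like the script's cd at line 82, this changes the caller's working directory.
build_book() {
  cd "$BOOK_DIR"
  echo "Step 5: Installing mdbook-mermaid assets..."
  mdbook-mermaid install "$BOOK_DIR"
  echo "Step 6: Building mdBook..."
  mdbook build
}
```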

Sources: build-docs.sh:58 build-docs.sh:82 build-docs.sh:171 build-docs.sh:176

Error Handling

Validation and Exit Conditions

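The script implements minimal but critical validation.

The set -e directive at build-docs.sh:2 makes the script terminate as soon as a command exits with a non-zero status, except in contexts where Bash suspends set -e, such as if conditions and && or || lists. Failures that stop the build this way include: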

HTTP failures in the Python scraper
File system errors during copy operations
mdBook build failures
mdbook-mermaid installation failures

The only explicit validation check is for the REPO variable at build-docs.sh:32-37, which prints usage instructions and exits with code 1 if REPO is not set.
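A sketch of the check. The message text is an assumption (the real script prints fuller usage instructions), and validate_repo is a hypothetical name that returns 1 so the caller can exit with that code.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 1 (after printing a hint to stderr) when REPO is unset or empty.
validate_repo() {
  if [ -z "${REPO:-}" ]; then
    echo "Error: REPO must be set, e.g. REPO=owner/repo" >&2
    return 1
  fi
}

# In the orchestrator this would be used as:
#   validate_repo || exit 1
```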

Sources: build-docs.sh:2 build-docs.sh:32-37

Stdout Output Format

The orchestrator provides structured console output for monitoring build progress:

================================================================================
DeepWiki Documentation Builder
================================================================================

Configuration:
  Repository:    owner/repo
  Book Title:    Documentation Title
  Authors:       Author Name
  Git Repo URL:  https://github.com/owner/repo
  Markdown Only: false

Step 1: Scraping wiki from DeepWiki...
[scraper output...]

Step 2: Initializing mdBook structure...

Step 3: Generating SUMMARY.md from scraped content...
Generated SUMMARY.md with N entries

Step 4: Copying markdown files to book...

Step 5: Installing mdbook-mermaid assets...

Step 6: Building mdBook...

Step 7: Copying outputs to /output...

================================================================================
✓ Documentation build complete!
================================================================================

Outputs:
  - HTML book:       /output/book/
  - Markdown files:  /output/markdown/
  - Book config:     /output/book.toml

To serve the book locally:
  cd /output && python3 -m http.server --directory book 8000

Each step prints a numbered "Step N:" progress message, and the configuration block is printed before processing begins to aid debugging.
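The aligned configuration block can be reproduced with printf field widths: every label is left-justified in a 15-character field, which yields the spacing shown in the sample output. This is a sketch; print_config is a hypothetical name and the 80-character rule width is an assumption.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the banner and configuration summary (sketch).
print_config() {
  local rule
  rule=$(printf '=%.0s' {1..80})   # a row of 80 "=" characters
  echo "$rule"
  echo "DeepWiki Documentation Builder"
  echo "$rule"
  echo
  echo "Configuration:"
  # printf reuses the format for each label/value pair.
  printf '  %-15s%s\n' \
    "Repository:" "$REPO" \
    "Book Title:" "$BOOK_TITLE" \
    "Authors:" "$BOOK_AUTHORS" \
    "Git Repo URL:" "$GIT_REPO_URL" \
    "Markdown Only:" "$MARKDOWN_ONLY"
}
```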

Sources: build-docs.sh:4-6 build-docs.sh:47-53 build-docs.sh:55-205


deepwiki-to-mdbook Documentation