This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

build-docs.sh Orchestrator

Purpose and Scope

The build-docs.sh script is the main orchestration layer for the DeepWiki-to-mdBook conversion system. It coordinates all components of the three-phase pipeline, manages configuration, handles environment variable processing, and produces the final output artifacts. This document covers the script’s responsibilities, execution flow, configuration management, and integration with other system components.

For details on the components orchestrated by this script, see deepwiki-scraper.py, Template System, and mdBook Integration. For information on the three-phase architecture, see Three-Phase Pipeline.

Role and Responsibilities

The orchestrator serves as the single entry point for the documentation build process. It is invoked as the Docker container’s default command and coordinates all system components in a sequential, deterministic manner.

Key Responsibilities:

| Responsibility | Implementation |
|---|---|
| Configuration Management | Validates and sets defaults for all environment variables |
| Auto-detection | Discovers repository information from Git remotes |
| Component Coordination | Invokes deepwiki-scraper.py, process-template.py, mdbook, and mdbook-mermaid |
| Error Handling | Uses set -e for fail-fast behavior on any component failure |
| Output Management | Organizes all artifacts into the /output directory structure |
| Mode Selection | Supports standard and markdown-only execution modes |
| Template Processing | Coordinates header/footer injection into all markdown files |

Sources: scripts/build-docs.sh:1-310

Architecture Overview

The following diagram maps the orchestrator’s workflow to actual code entities and directory paths used in the script:

Diagram: Orchestrator Component Integration

graph TB
    Entry["Entry Point\nbuild-docs.sh"]

    subgraph "Configuration Phase"
        AutoDetect["Git Auto-detection\nlines 8-19"]
        EnvVars["Environment Variables\nREPO, BOOK_TITLE, etc.\nlines 21-26"]
        Defaults["Default Generation\nBOOK_AUTHORS, GIT_REPO_URL\nlines 44-51"]
        Validate["Validation\nREPO check\nlines 33-38"]
    end

    subgraph "Execution Phase"
        Step1["Step 1: deepwiki-scraper.py\n$REPO → $WIKI_DIR\nlines 61-65"]
        Decision{{"MARKDOWN_ONLY\n?\nline 68"}}
        MarkdownExit["Copy to /output/markdown\nlines 69-93"]
        Step2["Step 2: mkdir $BOOK_DIR\nCreate book.toml\nlines 95-119"]
        Step3["Step 3: Generate SUMMARY.md\nDiscover structure\nlines 124-188"]
        Step4["Step 4: process-template.py\nInject headers/footers\nlines 190-261"]
        Step5["Step 5: mdbook-mermaid install\nlines 263-266"]
        Step6["Step 6: mdbook build\nlines 268-271"]
        Step7["Step 7: Copy to /output\nlines 273-295"]
    end

    Entry --> AutoDetect
    AutoDetect --> EnvVars
    EnvVars --> Defaults
    Defaults --> Validate
    Validate --> Step1
    Step1 --> Decision
    Decision -->|true| MarkdownExit
    Decision -->|false| Step2
    Step2 --> Step3
    Step3 --> Step4
    Step4 --> Step5
    Step5 --> Step6
    Step6 --> Step7

    MarkdownExit --> OutputMarkdown["/output/markdown/"]
    MarkdownExit --> OutputRaw["/output/raw_markdown/"]
    Step7 --> OutputBook["/output/book/"]
    Step7 --> OutputMarkdown
    Step7 --> OutputRaw
    Step7 --> OutputConfig["/output/book.toml"]

Sources: scripts/build-docs.sh:1-310

Configuration Management

The script resolves configuration in layers: Git-based auto-detection for the repository, then environment variable overrides, then defaults for anything left unset.

Auto-Detection Logic

The script attempts to automatically detect the repository from Git metadata if REPO is not explicitly set:

Diagram: Repository Auto-Detection Flow

flowchart TD
    Start["Check if REPO\nenvironment variable set"]
    Start -->|Not set| CheckGit["Check if .git directory exists\ngit rev-parse --git-dir"]
    Start -->|Set| UseProvided["Use provided REPO value"]
    CheckGit -->|Yes| GetRemote["Get remote.origin.url\ngit config --get remote.origin.url"]
    CheckGit -->|No| RequireManual["Require manual REPO setting"]
    GetRemote -->|Found| ExtractOwnerRepo["Extract owner/repo using sed\nPattern: github.com[:/]owner/repo"]
    GetRemote -->|Not found| RequireManual

    ExtractOwnerRepo --> SetRepo["Set REPO variable"]
    UseProvided --> SetRepo
    SetRepo --> Validate["Validate REPO is not empty"]
    RequireManual --> Validate

    Validate -->|Empty| Error["Exit with error\nlines 34-37"]
    Validate -->|Valid| Continue["Continue execution"]

The regex pattern at scripts/build-docs.sh:16 handles multiple GitHub URL formats:

  • https://github.com/owner/repo.git
  • git@github.com:owner/repo.git
  • https://github.com/owner/repo

Sources: scripts/build-docs.sh:8-19 scripts/build-docs.sh:33-38
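The detection and validation steps can be sketched in Python for experimentation outside the container. This is an illustration under stated assumptions, not the script's code. The script itself uses sed, and the regular expression below is an assumed equivalent written to accept the three URL forms listed above. The helper names repo_from_remote, origin_url, and resolve_repo are hypothetical.

```python
import re
import subprocess
from typing import Callable, Mapping, Optional

# Assumed equivalent of the sed pattern at scripts/build-docs.sh:16 (not copied from it):
# capture owner/repo after "github.com/" (HTTPS) or "github.com:" (SSH), drop a trailing ".git".
GITHUB_REMOTE = re.compile(r"github\.com[:/]([^/]+)/([^/]+?)(?:\.git)?/?$")


def repo_from_remote(url: str) -> Optional[str]:
    """Return "owner/repo" for a GitHub remote URL, or None if the URL does not match."""
    match = GITHUB_REMOTE.search(url.strip())
    return f"{match.group(1)}/{match.group(2)}" if match else None


def origin_url() -> Optional[str]:
    """Mirror `git config --get remote.origin.url`; None if git is missing or no origin is set."""
    try:
        result = subprocess.run(
            ["git", "config", "--get", "remote.origin.url"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git is not installed
        return None
    return result.stdout.strip() or None


def resolve_repo(env: Mapping[str, str],
                 get_remote: Callable[[], Optional[str]] = origin_url) -> str:
    """An explicit REPO wins; only otherwise consult the Git remote; fail if both are empty."""
    repo = env.get("REPO")
    if not repo:
        remote = get_remote()
        repo = repo_from_remote(remote) if remote else None
    if not repo:
        # Like the script's error branch: stop with a message and a non-zero status.
        raise SystemExit("Error: REPO is not set and could not be auto-detected from Git")
    return repo
```

Called as resolve_repo(os.environ), the sketch consults git only when REPO is empty, matching the flow in the diagram.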

Configuration Variables

The following table documents all configuration variables managed by the orchestrator:

| Variable | Default | Derivation | Line Reference |
|---|---|---|---|
| REPO | Auto-detected | Extracted from git remote.origin.url | scripts/build-docs.sh:9-19 |
| BOOK_TITLE | "Documentation" | None | scripts/build-docs.sh:23 |
| BOOK_AUTHORS | Repository owner | Extracted from $REPO (first segment) | scripts/build-docs.sh:45 |
| GIT_REPO_URL | GitHub URL | Constructed from $REPO | scripts/build-docs.sh:46 |
| MARKDOWN_ONLY | "false" | None | scripts/build-docs.sh:26 |
| WORK_DIR | "/workspace" | Fixed | scripts/build-docs.sh:27 |
| WIKI_DIR | "/workspace/wiki" | Fixed | scripts/build-docs.sh:28 |
| RAW_DIR | "/workspace/raw_markdown" | Fixed | scripts/build-docs.sh:29 |
| OUTPUT_DIR | "/output" | Fixed | scripts/build-docs.sh:30 |
| BOOK_DIR | "/workspace/book" | Fixed | scripts/build-docs.sh:31 |

Computed variables derived from $REPO:

| Variable | Computation | Line Reference |
|---|---|---|
| REPO_OWNER | echo "$REPO" \| cut -d'/' -f1 | |
| REPO_NAME | echo "$REPO" \| cut -d'/' -f2 | |
| DEEPWIKI_URL | "https://deepwiki.com/$REPO" | scripts/build-docs.sh:48 |
| DEEPWIKI_BADGE_URL | "https://deepwiki.com/badge.svg" | scripts/build-docs.sh:49 |
| REPO_BADGE_LABEL | URL-encoded with dash escaping | scripts/build-docs.sh:50 |
| GITHUB_BADGE_URL | Shields.io badge URL | scripts/build-docs.sh:51 |

Sources: scripts/build-docs.sh:21-51
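The following is a minimal sketch of how the computed variables follow from $REPO. It rests on several assumptions:

- BOOK_AUTHORS and GIT_REPO_URL are defaulted only when unset or empty.
- The badge label follows the Shields.io static-badge convention of doubling literal dashes before URL-encoding.

The function name derive_config is hypothetical. The exact GITHUB_BADGE_URL format is not reproduced because it is not shown here.

```python
from typing import Dict, Mapping
from urllib.parse import quote


def derive_config(repo: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Compute the $REPO-derived values from the tables above (illustrative only)."""
    parts = repo.split("/")
    owner = parts[0]                              # like: echo "$REPO" | cut -d'/' -f1
    name = parts[1] if len(parts) > 1 else ""     # like: echo "$REPO" | cut -d'/' -f2
    return {
        "REPO_OWNER": owner,
        "REPO_NAME": name,
        # Assumed: defaults apply only when the variable is unset or empty.
        "BOOK_AUTHORS": env.get("BOOK_AUTHORS") or owner,
        "GIT_REPO_URL": env.get("GIT_REPO_URL") or f"https://github.com/{repo}",
        "DEEPWIKI_URL": f"https://deepwiki.com/{repo}",
        "DEEPWIKI_BADGE_URL": "https://deepwiki.com/badge.svg",
        # Shields.io treats "-" as a field separator, so literal dashes become "--"
        # before percent-encoding (safe="" also encodes the "/").
        "REPO_BADGE_LABEL": quote(repo.replace("-", "--"), safe=""),
    }
```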

Execution Flow

The orchestrator follows a seven-step execution sequence, with conditional branching for markdown-only mode:

Diagram: Step-by-Step Execution Sequence

sequenceDiagram
    participant Script as build-docs.sh
    participant Scraper as deepwiki-scraper.py
    participant FileSystem as File System
    participant Templates as process-template.py
    participant MDBook as mdbook
    participant Mermaid as mdbook-mermaid

    Note over Script: Configuration Phase
    Script->>Script: Auto-detect REPO
    Script->>Script: Set defaults
    Script->>Script: Validate configuration

    Note over Script: Step 1: Scraping
    Script->>FileSystem: rm -rf $RAW_DIR
    Script->>Scraper: python3 deepwiki-scraper.py $REPO $WIKI_DIR
    Scraper-->>FileSystem: Write markdown to $WIKI_DIR
    Scraper-->>FileSystem: Write raw snapshots to $RAW_DIR

    alt MARKDOWN_ONLY == true
        Note over Script: Markdown-Only Exit Path
        Script->>FileSystem: cp $WIKI_DIR to /output/markdown
        Script->>FileSystem: cp $RAW_DIR to /output/raw_markdown
        Script->>Script: Exit (skip HTML build)
    else MARKDOWN_ONLY == false
        Note over Script: Step 2: mdBook Initialization
        Script->>FileSystem: mkdir -p $BOOK_DIR/src
        Script->>FileSystem: Create book.toml

        Note over Script: Step 3: SUMMARY.md Generation
        Script->>FileSystem: Scan $WIKI_DIR for .md files
        Script->>FileSystem: Generate src/SUMMARY.md

        Note over Script: Step 4: Template Processing
        Script->>Templates: process-template.py $HEADER_TEMPLATE
        Templates-->>Script: Processed HEADER_HTML
        Script->>Templates: process-template.py $FOOTER_TEMPLATE
        Templates-->>Script: Processed FOOTER_HTML
        Script->>FileSystem: cp $WIKI_DIR/* to src/
        Script->>FileSystem: Inject header/footer into all .md files

        Note over Script: Step 5: Mermaid Installation
        Script->>Mermaid: mdbook-mermaid install $BOOK_DIR
        Mermaid-->>FileSystem: Install mermaid.js assets

        Note over Script: Step 6: Build
        Script->>MDBook: mdbook build
        MDBook-->>FileSystem: Generate book/ directory

        Note over Script: Step 7: Output Collection
        Script->>FileSystem: cp book to /output/book
        Script->>FileSystem: cp $WIKI_DIR to /output/markdown
        Script->>FileSystem: cp $RAW_DIR to /output/raw_markdown
        Script->>FileSystem: cp book.toml to /output/book.toml
    end

Sources: scripts/build-docs.sh:61-310

Step Details

Step 1: Wiki Scraping

Lines: scripts/build-docs.sh:61-65

Invokes the Python scraper (python3 /usr/local/bin/deepwiki-scraper.py $REPO $WIKI_DIR) to fetch and convert DeepWiki content.

The scraper writes output to two locations:

  • $WIKI_DIR (/workspace/wiki): Enhanced markdown with injected diagrams
  • $RAW_DIR (/workspace/raw_markdown): Pre-enhancement markdown snapshots for debugging

For details on the scraper’s operation, see deepwiki-scraper.py.

Sources: scripts/build-docs.sh:61-65

Step 2: mdBook Structure Initialization

Lines: scripts/build-docs.sh:95-119

Skipped if: MARKDOWN_ONLY=true

Creates the mdBook directory structure and generates book.toml configuration:

$BOOK_DIR/
├── book.toml
└── src/

The book.toml file is generated using a heredoc with variable substitution:

| Configuration Section | Variables Used | Purpose |
|---|---|---|
| [book] | $BOOK_TITLE, $BOOK_AUTHORS | Book metadata |
| [output.html] | $GIT_REPO_URL | Repository link in UI |
| [preprocessor.mermaid] | N/A | Enable mermaid diagrams |
| [output.html.fold] | N/A | Enable section folding |

Sources: scripts/build-docs.sh:95-119
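The heredoc can be approximated in Python as below. The section names come from the table above. The individual keys (title, authors, git-repository-url, command, enable) are standard mdBook and mdbook-mermaid settings, but whether the script sets exactly these keys is an assumption. The function name render_book_toml is hypothetical. Unlike a plain heredoc, this sketch escapes string values, so a title containing quotes still yields valid TOML.

```python
import json
from typing import List


def toml_str(value: str) -> str:
    # For ordinary text, a JSON string literal is also a valid TOML basic string.
    return json.dumps(value, ensure_ascii=False)


def render_book_toml(title: str, authors: List[str], git_repo_url: str) -> str:
    """Build a book.toml containing the four sections the orchestrator configures."""
    author_list = ", ".join(toml_str(a) for a in authors)
    return "\n".join([
        "[book]",
        f"title = {toml_str(title)}",
        f"authors = [{author_list}]",
        "",
        "[output.html]",
        f"git-repository-url = {toml_str(git_repo_url)}",
        "",
        "[preprocessor.mermaid]",
        'command = "mdbook-mermaid"',
        "",
        "[output.html.fold]",
        "enable = true",
        "",  # trailing newline at end of file
    ])
```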

Step 3: SUMMARY.md Generation

Lines: scripts/build-docs.sh:124-188

Dynamically generates the table of contents by discovering the file structure in $WIKI_DIR. This step implements numeric sorting and hierarchical organization.

Diagram: SUMMARY.md Generation Algorithm

flowchart TD
    Start["Start SUMMARY.md Generation"]
    Start --> WriteHeader["Write '# Summary' header"]
    WriteHeader --> FindOverview["Find overview file\ngrep -Ev '^[0-9]'"]
    FindOverview -->|Found| WriteOverview["Write overview entry\nExtract title from first line"]
    FindOverview -->|Not found| ListMain
    WriteOverview --> ListMain["List all main pages\nls $WIKI_DIR/*.md"]
    ListMain --> FilterOverview["Filter out overview file"]
    FilterOverview --> NumericSort["Sort numerically\nsort -t- -k1 -n"]
    NumericSort --> ProcessLoop["For each file"]
    ProcessLoop --> ExtractTitle["Extract title\nhead -1 | sed 's/^# //'"]
    ExtractTitle --> GetSectionNum["Extract section number\ngrep -oE '^[0-9]+'"]
    GetSectionNum --> CheckSubdir{"Subsection directory\nsection-N exists?"}
    CheckSubdir -->|Yes| WriteSection["Write section entry\n- [title](filename)"]
    WriteSection --> ListSubs["List subsection files\nls section-N/*.md"]
    ListSubs --> SortSubs["Sort numerically\nsort -t- -k1 -n"]
    SortSubs --> WriteSubLoop["For each subsection:\n- [subtitle](section-N/file)"]
    WriteSubLoop --> NextFile
    CheckSubdir -->|No| WriteStandalone["Write standalone entry\n- [title](filename)"]
    WriteStandalone --> NextFile{"More files?"}
    NextFile -->|Yes| ProcessLoop
    NextFile -->|No| Complete["Complete src/SUMMARY.md"]

Key implementation details:

Overview Page Detection: scripts/build-docs.sh:136-144

  • Searches for files without numeric prefix
  • Typically matches Overview.md or similar

Numeric Sorting: scripts/build-docs.sh:147-155

  • Uses sort -t- -k1 -n to sort by numeric prefix
  • Handles formats like 1-Title.md, 2.1-Subtopic.md

Hierarchy Detection: scripts/build-docs.sh:165-180

  • Checks for section-N/ directories for each numeric section
  • Creates indented entries for subsections

Sources: scripts/build-docs.sh:124-188
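For readers who want to experiment with the ordering rules, the algorithm above can be re-expressed as a Python sketch. It mirrors the diagram: overview detection, sort -t- -k1 -n ordering, and section-N/ lookup. It is not the script's code, however. The overview entry format is an assumption; it is written here as an mdBook prefix chapter, [Title](file.md), without a bullet. The names generate_summary, sort_key, and title_of are hypothetical.

```python
import re
from pathlib import Path


def sort_key(path: Path) -> float:
    """Mimic `sort -t- -k1 -n`: compare the text before the first '-' as a decimal."""
    prefix = path.name.split("-", 1)[0]
    try:
        return float(prefix)
    except ValueError:
        return 0.0  # sort -n treats a non-numeric key as zero


def title_of(path: Path) -> str:
    """Mimic `head -1 | sed 's/^# //'`: the first line with a leading '# ' removed."""
    lines = path.read_text(encoding="utf-8").splitlines()
    first = lines[0] if lines else ""
    return first[2:] if first.startswith("# ") else first


def generate_summary(wiki_dir: Path) -> str:
    out = ["# Summary", ""]
    pages = sorted(wiki_dir.glob("*.md"))

    # Overview page: the first file whose name does not start with a digit.
    overview = next((p for p in pages if not p.name[:1].isdigit()), None)
    if overview is not None:
        out += [f"[{title_of(overview)}]({overview.name})", ""]

    for page in sorted((p for p in pages if p != overview), key=sort_key):
        out.append(f"- [{title_of(page)}]({page.name})")
        number = re.match(r"[0-9]+", page.name)
        subdir = wiki_dir / f"section-{number.group()}" if number else None
        if subdir is not None and subdir.is_dir():
            # Subsections are indented one level under their parent section.
            for sub in sorted(subdir.glob("*.md"), key=sort_key):
                out.append(f"  - [{title_of(sub)}]({subdir.name}/{sub.name})")
    return "\n".join(out) + "\n"
```

Because the key is parsed as a decimal, 10-Tenth.md sorts after 2-Second.md, which plain lexical ls order would get wrong. Like sort -n, though, it reads 1.10 as the decimal 1.1, so a section 1.10 would sort before 1.2.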

Step 4: Template Processing and File Copying

Lines: scripts/build-docs.sh:190-261

Processes header and footer templates and injects them into all markdown files.

Diagram: Template Processing Pipeline

flowchart LR
    subgraph "Template Loading"
        HeaderPath["$HEADER_TEMPLATE\n/workspace/templates/header.html"]
        FooterPath["$FOOTER_TEMPLATE\n/workspace/templates/footer.html"]
        GenDate["GENERATION_DATE\ndate -u command"]
    end

    subgraph "Variable Substitution"
        ProcessH["process-template.py\n$HEADER_TEMPLATE"]
        ProcessF["process-template.py\n$FOOTER_TEMPLATE"]
        Vars["Variables passed:\nDEEPWIKI_URL\nDEEPWIKI_BADGE_URL\nGIT_REPO_URL\nGITHUB_BADGE_URL\nREPO\nBOOK_TITLE\nBOOK_AUTHORS\nGENERATION_DATE"]
    end

    subgraph "Injection"
        CopyFiles["cp $WIKI_DIR/* to src/"]
        InjectLoop["For each .md file:\nsrc/*.md src/*/*.md"]
        CreateTemp["Create temp file:\nHEADER + content + FOOTER"]
        Replace["mv temp to original"]
    end

    HeaderPath --> ProcessH
    FooterPath --> ProcessF
    GenDate --> Vars
    Vars --> ProcessH
    Vars --> ProcessF
    ProcessH --> HeaderHTML["HEADER_HTML variable"]
    ProcessF --> FooterHTML["FOOTER_HTML variable"]
    CopyFiles --> InjectLoop
    HeaderHTML --> InjectLoop
    FooterHTML --> InjectLoop
    InjectLoop --> CreateTemp
    CreateTemp --> Replace

The template processor is invoked with all configuration variables as arguments: scripts/build-docs.sh:205-213 scripts/build-docs.sh:222-230

File injection pattern: scripts/build-docs.sh:243-257

  • Processes all .md files in src/ and src/*/
  • Creates temporary file with header + original content + footer
  • Replaces original with modified version

For details on the template system and variable substitution, see Template System.

Sources: scripts/build-docs.sh:190-261
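The header/footer injection loop reduces to read, wrap, and replace for each page. This is a minimal Python sketch under three assumptions:

- The processed header and footer are plain strings.
- They are joined to the page with blank lines; the script's exact separators are not shown here.
- src/SUMMARY.md is left unwrapped. The source does not say how the script treats it, so the sketch skips it by default.

The name inject_templates is hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def inject_templates(src_dir: Path, header_html: str, footer_html: str,
                     skip: Iterable[str] = ("SUMMARY.md",)) -> int:
    """Wrap each .md file in src/ and src/*/ with header and footer; return the count."""
    skipped = set(skip)
    targets = [p for p in sorted(src_dir.glob("*.md")) + sorted(src_dir.glob("*/*.md"))
               if p.name not in skipped]
    for path in targets:
        body = path.read_text(encoding="utf-8")
        # Write the wrapped page to a temp file in the same directory, then swap it in,
        # so an interrupted run does not leave a half-written page behind.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(f"{header_html}\n\n{body}\n\n{footer_html}\n")
        shutil.copymode(path, tmp_name)  # mkstemp creates 0600 files; keep the original mode
        os.replace(tmp_name, path)
    return len(targets)
```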

Step 5: Mermaid Installation

Lines: scripts/build-docs.sh:263-266

Installs the mdbook-mermaid preprocessor assets into the book directory by running mdbook-mermaid install $BOOK_DIR.

This command installs the mermaid.js library and initialization code required for client-side diagram rendering in the final HTML output.

Sources: scripts/build-docs.sh:263-266

Step 6: Book Build

Lines: scripts/build-docs.sh:268-271

Runs mdbook build from within $BOOK_DIR.

Build Process:

  1. Reads book.toml configuration
  2. Processes src/SUMMARY.md to determine structure
  3. Applies mermaid preprocessor to all markdown files
  4. Converts markdown to HTML with search indexing
  5. Outputs to $BOOK_DIR/book/ directory

For more information on mdBook integration, see mdBook Integration.

Sources: scripts/build-docs.sh:268-271

Step 7: Output Collection

Lines: scripts/build-docs.sh:273-295

Copies all build artifacts to the /output volume mount for persistence:

| Source | Destination | Description |
|---|---|---|
| $BOOK_DIR/book/ | /output/book/ | Built HTML documentation |
| $WIKI_DIR/ | /output/markdown/ | Enhanced markdown files |
| $RAW_DIR/ | /output/raw_markdown/ | Pre-enhancement markdown (if exists) |
| $BOOK_DIR/book.toml | /output/book.toml | Book configuration reference |

The script ensures clean output by removing existing directories before copying: scripts/build-docs.sh:282-290

Sources: scripts/build-docs.sh:273-295
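The remove-then-copy pattern keeps stale files from earlier runs out of /output. The Python sketch below assumes that each destination is replaced wholesale. It also assumes that a missing $RAW_DIR is skipped rather than treated as an error, matching the "(if exists)" note in the table above. The names copy_fresh and collect_outputs are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def copy_fresh(src: Path, dst: Path) -> None:
    """Replace dst with a copy of src (a directory tree or a single file)."""
    if dst.is_dir():
        shutil.rmtree(dst)
    elif dst.exists():
        dst.unlink()
    if src.is_dir():
        shutil.copytree(src, dst)  # creates missing parent directories
    else:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)


def collect_outputs(book_dir: Path, wiki_dir: Path, raw_dir: Path, output_dir: Path) -> List[str]:
    """Copy the Step 7 artifacts into output_dir; return the names that were written."""
    plan = [
        (book_dir / "book", "book"),
        (wiki_dir, "markdown"),
        (raw_dir, "raw_markdown"),  # optional: copied only if the scraper produced it
        (book_dir / "book.toml", "book.toml"),
    ]
    written = []
    for src, name in plan:
        if not src.exists():
            if name == "raw_markdown":
                continue
            raise FileNotFoundError(src)
        copy_fresh(src, output_dir / name)
        written.append(name)
    return written
```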

Markdown-Only Mode

When MARKDOWN_ONLY=true, the orchestrator follows a shortened execution path that skips HTML generation:

Execution Path:

  1. Step 1: Scrape wiki (normal)
  2. Copy $WIKI_DIR to /output/markdown/
  3. Copy $RAW_DIR to /output/raw_markdown/ (if exists)
  4. Exit with success

Use Cases:

  • Debugging the scraper output without full build overhead
  • Extracting markdown for alternative processing pipelines
  • CI/CD test workflows that only validate markdown generation
  • Custom post-processing before HTML generation

Implementation: scripts/build-docs.sh:68-93

Sources: scripts/build-docs.sh:68-93

Error Handling

The orchestrator implements fail-fast error handling:

Error Handling Mechanisms:

| Mechanism | Implementation | Line Reference |
|---|---|---|
| Exit on any error | set -e | scripts/build-docs.sh:2 |
| Configuration validation | Explicit REPO check with error message | scripts/build-docs.sh:33-38 |
| Component failures | Automatic propagation due to set -e | All component invocations |
| Template warnings | Non-fatal warnings if templates not found | scripts/build-docs.sh:215-216, scripts/build-docs.sh:232-233 |

The script does not use explicit error trapping. Instead, it relies on Bash’s set -e behavior to exit as soon as a command returns a non-zero status. Failures in any component (scraper, template processor, mdBook) therefore halt execution and propagate to the Docker container’s exit code.

Note that set -e does not fire for:

- commands tested in an if condition;
- commands in the non-final part of an && or || list;
- any command in a pipeline other than the last, unless set -o pipefail is also enabled.

Sources: scripts/build-docs.sh:2 scripts/build-docs.sh:33-38 scripts/build-docs.sh:215-216 scripts/build-docs.sh:232-233
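In Python terms, set -e behaves roughly like running every component with subprocess.run(..., check=True). The first non-zero exit stops the sequence, later steps never run, and the status can be handed back as the process exit code. A minimal sketch follows; run_pipeline is a hypothetical name.

```python
import subprocess
import sys
from typing import List, Sequence


def run_pipeline(steps: Sequence[List[str]]) -> int:
    """Run each command in order and stop at the first failure, as `set -e` would."""
    for argv in steps:
        try:
            subprocess.run(argv, check=True)
        except FileNotFoundError:
            # Bash reports a missing command with status 127.
            print(f"command not found: {argv[0]}", file=sys.stderr)
            return 127
        except subprocess.CalledProcessError as err:
            print(f"step failed with status {err.returncode}: {' '.join(argv)}", file=sys.stderr)
            return err.returncode  # later steps are never started
    return 0
```

A caller would write sys.exit(run_pipeline([...])) so that the first failing step's status becomes the process exit status. This is analogous to how the container's exit code reflects the failing component.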

Integration Points

The orchestrator integrates with multiple system components through well-defined interfaces:

Diagram: Component Integration Interfaces

graph TB
    Orchestrator["build-docs.sh"]

    subgraph "Python Components"
        Scraper["deepwiki-scraper.py\nArgs: REPO, WIKI_DIR"]
        Templates["process-template.py\nArgs: template_path, var1=val1, ..."]
    end

    subgraph "Build Tools"
        MDBook["mdbook build\nWorking dir: $BOOK_DIR"]
        Mermaid["mdbook-mermaid install\nArgs: $BOOK_DIR"]
    end

    subgraph "File System"
        Input["Input:\n/workspace/templates/"]
        Working["Working:\n$WIKI_DIR\n$RAW_DIR\n$BOOK_DIR"]
        Output["Output:\n/output/"]
    end

    subgraph "Environment"
        EnvVars["Environment Variables:\nREPO, BOOK_TITLE,\nBOOK_AUTHORS, etc."]
    end

    EnvVars --> Orchestrator
    Input --> Templates

    Orchestrator -->|python3| Scraper
    Orchestrator -->|python3| Templates
    Orchestrator -->|mdbook| MDBook
    Orchestrator -->|mdbook-mermaid| Mermaid

    Scraper --> Working
    Templates --> Orchestrator
    Orchestrator --> Working
    MDBook --> Working
    Mermaid --> Working

    Orchestrator --> Output

Interface Specifications:

deepwiki-scraper.py:

  • Invocation: python3 /usr/local/bin/deepwiki-scraper.py $REPO $WIKI_DIR
  • Input: Repository identifier (e.g., "jzombie/deepwiki-to-mdbook")
  • Output: Markdown files in $WIKI_DIR, raw snapshots in $RAW_DIR
  • Documentation: deepwiki-scraper.py

process-template.py:

  • Invocation: python3 /usr/local/bin/process-template.py $TEMPLATE_PATH var1=val1 var2=val2 ...
  • Input: Template file path and variable assignments
  • Output: Processed HTML string to stdout
  • Documentation: Template System

mdbook:

  • Invocation: mdbook build (in $BOOK_DIR)
  • Input: book.toml and src/ directory structure
  • Output: HTML in book/ subdirectory
  • Documentation: mdBook Integration

mdbook-mermaid:

  • Invocation: mdbook-mermaid install $BOOK_DIR
  • Input: Book directory path
  • Output: Mermaid assets installed in book directory
  • Documentation: mdBook Integration

Sources: scripts/build-docs.sh:65 scripts/build-docs.sh:205-213 scripts/build-docs.sh:222-230 scripts/build-docs.sh:266 scripts/build-docs.sh:271
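Each process-template.py call passes the template path followed by one VAR=value argument per variable. A value containing spaces (BOOK_TITLE, for instance) must therefore remain a single argument, which in Bash means quoting every assignment. The Python sketch below builds such an argument list. The names template_argv and as_shell_command are hypothetical, and in practice the variables would be the set shown in the Step 4 diagram.

```python
import shlex
from typing import List, Mapping

# Script path taken from the invocation documented above.
TEMPLATE_SCRIPT = "/usr/local/bin/process-template.py"


def template_argv(template_path: str, variables: Mapping[str, str]) -> List[str]:
    """Build [python3, script, TEMPLATE, VAR=value, ...] with one element per assignment."""
    assignments = [f"{name}={value}" for name, value in variables.items()]
    return ["python3", TEMPLATE_SCRIPT, template_path, *assignments]


def as_shell_command(argv: List[str]) -> str:
    """Render the argument list as a safely quoted Bash command line (for logging)."""
    return shlex.join(argv)
```

Passing the list straight to subprocess.run sidesteps shell word-splitting entirely. as_shell_command only renders the equivalent, correctly quoted Bash line.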

Output Artifacts

The orchestrator produces a structured output directory with multiple artifact types:

/output/
├── book/                    # Searchable HTML documentation (Step 7)
│   ├── index.html
│   ├── searchindex.js
│   ├── mermaid.min.js
│   └── ...
├── markdown/                # Enhanced markdown files (Step 7 or markdown-only)
│   ├── Overview.md
│   ├── 1-First-Section.md
│   ├── section-1/
│   │   └── 1.1-Subsection.md
│   └── ...
├── raw_markdown/            # Pre-enhancement snapshots (if available)
│   ├── Overview.md
│   ├── 1-First-Section.md
│   └── ...
└── book.toml                # Book configuration reference (Step 7)

Artifact Generation Timeline:

| Artifact | Generated By | When | Purpose |
|---|---|---|---|
| raw_markdown/ | deepwiki-scraper.py | Step 1 | Debug: pre-enhancement state |
| markdown/ | deepwiki-scraper.py | Step 1 | Final markdown with diagrams |
| book.toml | build-docs.sh | Step 2 | Book configuration reference |
| book/ | mdbook | Step 6 | Final HTML documentation |

Sources: scripts/build-docs.sh:273-295
