This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

System Architecture

This page describes the architecture of the DeepWiki-to-mdBook converter: its component organization, execution model, and data flow. The system is a containerized pipeline that transforms DeepWiki content into searchable mdBook documentation in three processing phases.

For detailed information about the three-phase transformation pipeline, see Three-Phase Pipeline. For Docker-specific implementation details, see Docker Multi-Stage Build.

Architectural Overview

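The system follows a pipeline architecture with three sequential phases, orchestrated by a shell script and executed within a Docker container. All components are stateless and communicate through the filesystem. At runtime the container needs no software beyond what the image provides; its only external interactions are HTTP requests to deepwiki.com and optional auto-detection of the git remote.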

graph TB
    subgraph Docker["Docker Container (python:3.12-slim)"]
        subgraph Executables["/usr/local/bin/"]
            BuildScript["build-docs.sh"]
            Scraper["deepwiki-scraper.py"]
            TemplateProc["process-template.py"]
            mdBook["mdbook"]
            mdBookMermaid["mdbook-mermaid"]
        end

        subgraph Workspace["/workspace/"]
            Templates["templates/\nheader.html\nfooter.html"]
            WorkingDirs["wiki/\nraw_markdown/\nbook/"]
        end

        subgraph Output["/output/ (Volume Mount)"]
            BookHTML["book/"]
            MarkdownSrc["markdown/"]
            RawMarkdown["raw_markdown/"]
            BookConfig["book.toml"]
        end
    end

    subgraph External["External Dependencies"]
        DeepWiki["deepwiki.com"]
        GitRemote["git remote"]
    end

    BuildScript -->|executes| Scraper
    BuildScript -->|executes| TemplateProc
    BuildScript -->|executes| mdBook
    BuildScript -->|executes| mdBookMermaid

    Scraper -->|writes| WorkingDirs
    TemplateProc -->|reads| Templates
    TemplateProc -->|outputs HTML| BuildScript
    mdBook -->|reads| WorkingDirs
    mdBook -->|writes| BookHTML

    BuildScript -->|copies to| Output

    Scraper -->|HTTP GET| DeepWiki
    BuildScript -->|auto-detect| GitRemote

    style Executables fill:#f0f0f0
    style Workspace fill:#f0f0f0
    style Output fill:#e8f5e9

System Composition

Diagram: Container Internal Structure and Component Relationships

The container is structured into three main areas: executables in /usr/local/bin/, working files in /workspace/, and outputs in /output/. The build-docs.sh orchestrator coordinates all components, with persistent results written to the mounted volume.

Sources: Dockerfile:1-34 scripts/build-docs.sh:1-310

Core Components

The system consists of five primary components, each with a specific responsibility in the documentation generation pipeline.

| Component | Type | Location | Primary Responsibility |
|---|---|---|---|
| build-docs.sh | Shell Script | /usr/local/bin/ | Pipeline orchestration and configuration management |
| deepwiki-scraper.py | Python Script | /usr/local/bin/ | Wiki content extraction and markdown conversion |
| process-template.py | Python Script | /usr/local/bin/ | Template variable substitution |
| mdbook | Rust Binary | /usr/local/bin/ | HTML documentation generation |
| mdbook-mermaid | Rust Binary | /usr/local/bin/ | Mermaid diagram rendering |

Component Interaction Map

Diagram: Component Interaction and Data Flow

This diagram shows how build-docs.sh coordinates the three processing components sequentially, with data flowing through working directories before final output to the mounted volume.

Sources: scripts/build-docs.sh:1-310 Dockerfile:20-33

Execution Flow

The system follows a strictly sequential execution model, with each step depending on the output of the previous step. This design simplifies error handling and allows for debugging at intermediate stages.
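
The fail-fast behaviour that this model implies can be sketched in Python. The real orchestrator is a shell script; `run_pipeline` and the two step functions below are hypothetical stand-ins for illustration, not code from the repository.

```python
from typing import Callable, Dict, List, Tuple

# A step mutates the shared context and raises an exception on failure.
Step = Callable[[Dict[str, str]], None]


def run_pipeline(steps: List[Tuple[str, Step]], context: Dict[str, str]) -> List[str]:
    """Run steps in order and return the names of the steps that completed.

    Execution stops at the first step that raises, so a later step never
    runs against missing or partial input from an earlier one.
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            step(context)
        except Exception as exc:
            # Report which step failed, then stop (fail-fast).
            print(f"step '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed


# Hypothetical stand-ins for the real scraping and build steps.
def scrape(ctx: Dict[str, str]) -> None:
    ctx["wiki"] = "scraped"


def build(ctx: Dict[str, str]) -> None:
    if "wiki" not in ctx:
        raise RuntimeError("no scraped content to build from")
    ctx["book"] = "built"
```

Because each step records its result in the shared context (the filesystem, in the real system), the list of completed steps shows exactly where a run stopped.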

Build Script Orchestration

The build-docs.sh script orchestrates the entire pipeline through eight distinct steps:

  1. Configuration & Validation scripts/build-docs.sh:8-59

    • Auto-detects REPO from git remote if not provided
    • Sets defaults for BOOK_TITLE, BOOK_AUTHORS, GIT_REPO_URL
    • Validates required configuration
    • Computes derived URLs (DEEPWIKI_URL, badge URLs)
  2. Wiki Scraping scripts/build-docs.sh:61-65

    • Executes deepwiki-scraper.py with repository identifier
    • Writes to /workspace/wiki/ and /workspace/raw_markdown/
  3. Early Exit (Markdown-Only Mode) scripts/build-docs.sh:67-93

    • Optional: skip HTML build if MARKDOWN_ONLY=true
    • Copies markdown directly to output volume
  4. mdBook Initialization scripts/build-docs.sh:95-122

    • Creates /workspace/book/ structure
    • Generates book.toml configuration
    • Initializes src/ directory
  5. SUMMARY.md Generation scripts/build-docs.sh:124-188

    • Scans wiki directory for .md files
    • Sorts numerically by page number prefix
    • Builds hierarchical table of contents
    • Handles subsections in section-N/ directories
  6. Template Processing & Injection scripts/build-docs.sh:190-261

    • Processes header.html and footer.html with variable substitution
    • Injects processed HTML into every markdown file
    • Copies enhanced markdown to book/src/
  7. mdBook Build scripts/build-docs.sh:263-271

    • Installs mermaid assets via mdbook-mermaid install
    • Executes mdbook build
    • Generates searchable HTML in book/
  8. Output Copying scripts/build-docs.sh:273-309

    • Copies all artifacts to /output/ volume mount
    • Preserves intermediate outputs for debugging

Diagram: build-docs.sh Sequential Execution Flow

Sources: scripts/build-docs.sh:1-310
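
Step 5 (SUMMARY.md generation) is the most algorithmic part of the script, so a rough Python sketch may help. It assumes file names carry a numeric prefix such as `2-setup.md` and that the children of page N live in `section-N/`; both naming details, and the helper names, are assumptions made for illustration rather than confirmed behaviour of build-docs.sh.

```python
import re
from pathlib import Path
from typing import List, Tuple


def page_key(name: str) -> Tuple[int, ...]:
    """Turn a leading numeric prefix like '2-1-' into (2, 1) for sorting."""
    match = re.match(r"(\d+(?:-\d+)*)", name)
    if not match:
        return (10**9,)  # files without a prefix sort last
    return tuple(int(part) for part in match.group(1).split("-"))


def title_from(path: Path) -> str:
    """Use the first '# ' heading as the title, else fall back to the file stem."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_summary(wiki_dir: Path) -> str:
    """Build mdBook SUMMARY.md text from top-level pages and section-N/ children."""
    lines: List[str] = ["# Summary", ""]
    for page in sorted(wiki_dir.glob("*.md"), key=lambda p: page_key(p.name)):
        lines.append(f"- [{title_from(page)}]({page.name})")
        # Assumed layout: children of page N live in section-N/.
        section = wiki_dir / f"section-{page_key(page.name)[0]}"
        if section.is_dir():
            for child in sorted(section.glob("*.md"), key=lambda p: page_key(p.name)):
                rel = child.relative_to(wiki_dir).as_posix()
                lines.append(f"  - [{title_from(child)}]({rel})")
    return "\n".join(lines) + "\n"
```

Sorting on the parsed integer tuple rather than on the raw file name is what keeps `10-api.md` after `2-setup.md`.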

Three-Phase Pipeline Architecture

The core transformation happens in three distinct phases, each with specific inputs, processing logic, and outputs. This separation allows for independent testing and debugging of each phase.

Phase Overview

| Phase | Primary Component | Input | Output | Key Operations |
|---|---|---|---|---|
| Phase 1: Extraction | deepwiki-scraper.py | DeepWiki HTML | Markdown files | Structure discovery, HTML→Markdown conversion, raw diagram extraction |
| Phase 2: Enhancement | deepwiki-scraper.py | Raw markdown + diagrams | Enhanced markdown | Diagram normalization, fuzzy matching, template injection |
| Phase 3: Build | mdbook + mdbook-mermaid | Enhanced markdown | Searchable HTML | SUMMARY generation, mermaid rendering, search index |

Diagram: Three-Phase Pipeline with Key Functions

For detailed documentation of each phase, see Three-Phase Pipeline.

Sources: scripts/build-docs.sh:61-271 README.md:72-77

graph TB
    subgraph Stage1["Stage 1: Builder (rust:latest)"]
        RustToolchain["Rust Toolchain\ncargo, rustc"]
        CargoInstall["cargo install mdbook\ncargo install mdbook-mermaid"]
        Binaries["/usr/local/cargo/bin/\nmdbook\nmdbook-mermaid"]
        RustToolchain --> CargoInstall
        CargoInstall --> Binaries
    end

    subgraph Stage2["Stage 2: Runtime (python:3.12-slim)"]
        PythonBase["Python 3.12 Runtime"]
        PipInstall["pip install requirements"]
        CopyBinaries["COPY --from=builder\nmdbook binaries"]
        CopyScripts["COPY Python scripts\nCOPY Shell scripts\nCOPY Templates"]
        FinalImage["Final Image\n~500MB"]
        PythonBase --> PipInstall
        PipInstall --> CopyBinaries
        CopyBinaries --> CopyScripts
        CopyScripts --> FinalImage
    end

    Binaries -.->|copy only| CopyBinaries

    style Stage1 fill:#ffebee
    style Stage2 fill:#e8f5e9
    style FinalImage fill:#c8e6c9

Docker Multi-Stage Build

The Docker architecture uses a multi-stage build pattern to minimize final image size while compiling Rust-based tools from source. This approach separates build-time dependencies from runtime dependencies.

Stage Architecture

Diagram: Multi-Stage Build Process

The builder stage (approximately 2GB) is discarded after compilation. Only the compiled binaries (approximately 50MB) are copied into the final image, which keeps the runtime image at roughly 500MB.

Container Filesystem Layout

/usr/local/bin/
├── mdbook                    # Rust binary from builder stage
├── mdbook-mermaid            # Rust binary from builder stage
├── deepwiki-scraper.py       # Python script (executable)
├── process-template.py       # Python script (executable)
└── build-docs.sh             # Shell script (executable)

/workspace/
├── templates/
│   ├── header.html           # Default header template
│   └── footer.html           # Default footer template
├── wiki/                     # Created at runtime
├── raw_markdown/             # Created at runtime
└── book/                     # Created at runtime

/output/                      # Volume mount point
└── (user-provided volume)

For detailed Docker implementation information, see Docker Multi-Stage Build.

Sources: Dockerfile:1-34

Configuration Architecture

The system is configured entirely through environment variables and volume mounts, following the Twelve-Factor App methodology. No configuration files are required; all settings have sensible defaults.

Configuration Layers

  1. Auto-Detection scripts/build-docs.sh:8-19

    • Extracts REPO from git remote get-url origin
    • Supports GitHub URLs in multiple formats (HTTPS, SSH)
  2. Environment Variables scripts/build-docs.sh:21-26

    • User-provided overrides
    • Takes precedence over auto-detection
  3. Computed Defaults scripts/build-docs.sh:40-51

    • Derives BOOK_AUTHORS from REPO owner
    • Constructs GIT_REPO_URL from REPO
    • Generates badge URLs

Diagram: Configuration Resolution Order
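
The three layers can be read as a small resolution function. The Python sketch below is illustrative only: the real logic lives in build-docs.sh, the function names are hypothetical, and the `https://github.com/<owner>/<repo>` form of GIT_REPO_URL is an assumption about how the URL is constructed.

```python
import re
from typing import Dict, Mapping, Optional


def repo_from_remote(url: str) -> Optional[str]:
    """Extract 'owner/repo' from an HTTPS or SSH GitHub remote URL."""
    match = re.search(r"github\.com[:/]([^/\s]+)/([^/\s]+?)(?:\.git)?/?$", url.strip())
    return f"{match.group(1)}/{match.group(2)}" if match else None


def resolve_config(env: Mapping[str, str], remote_url: Optional[str] = None) -> Dict[str, str]:
    """Resolve settings: explicit environment first, then auto-detection, then derived defaults."""
    # An explicit REPO wins; otherwise fall back to the git remote.
    repo = env.get("REPO") or (repo_from_remote(remote_url) if remote_url else None)
    if not repo:
        raise ValueError("REPO is not set and could not be auto-detected")
    owner = repo.split("/", 1)[0]
    # Defaults derived from REPO, each overridable through the environment.
    return {
        "REPO": repo,
        "BOOK_TITLE": env.get("BOOK_TITLE") or "Documentation",
        "BOOK_AUTHORS": env.get("BOOK_AUTHORS") or owner,
        # Assumption: the repository URL is built from the GitHub host plus REPO.
        "GIT_REPO_URL": env.get("GIT_REPO_URL") or f"https://github.com/{repo}",
        "MARKDOWN_ONLY": env.get("MARKDOWN_ONLY") or "false",
    }
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, keeps the precedence rules easy to exercise in isolation.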

Key Configuration Variables

| Variable | Default | Source | Description |
|---|---|---|---|
| REPO | (auto-detected) | scripts/build-docs.sh:9-19 | GitHub repository (owner/repo) |
| BOOK_TITLE | “Documentation” | scripts/build-docs.sh:23 | Title in book.toml |
| BOOK_AUTHORS | (derived from REPO) | scripts/build-docs.sh:45 | Author metadata |
| GIT_REPO_URL | (derived from REPO) | scripts/build-docs.sh:46 | Link in generated docs |
| MARKDOWN_ONLY | “false” | scripts/build-docs.sh:26 | Skip HTML build |
| GENERATION_DATE | (current UTC time) | scripts/build-docs.sh:200 | Timestamp in templates |

For complete configuration documentation, see Configuration Reference.

Sources: scripts/build-docs.sh:8-59 README.md:31-51

Output Artifacts

The system produces four distinct output artifacts, each serving a specific purpose in the documentation workflow:

| Artifact | Location | Purpose | Generated By |
|---|---|---|---|
| book/ | /output/book/ | Searchable HTML documentation | mdbook build |
| markdown/ | /output/markdown/ | Enhanced markdown source | deepwiki-scraper.py + templates |
| raw_markdown/ | /output/raw_markdown/ | Pre-enhancement markdown (debug) | deepwiki-scraper.py (raw output) |
| book.toml | /output/book.toml | mdBook configuration | build-docs.sh generation |

The multi-artifact design allows users to inspect intermediate stages, debug transformation issues, or use the markdown files for alternative processing workflows.

For detailed output structure documentation, see Output Structure.

Sources: scripts/build-docs.sh:273-309 README.md:53-58
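
When debugging a run, it can be useful to confirm which of these artifacts actually landed in the mounted directory. The helper below is a minimal sketch for a full (non-markdown-only) build; `missing_artifacts` is a hypothetical name and is not part of the project.

```python
from pathlib import Path
from typing import List

# Artifact names as listed in this section; directories end with "/".
EXPECTED_ARTIFACTS = ["book/", "markdown/", "raw_markdown/", "book.toml"]


def missing_artifacts(output_dir: Path) -> List[str]:
    """Return the expected artifacts that are absent from output_dir."""
    missing: List[str] = []
    for name in EXPECTED_ARTIFACTS:
        path = output_dir / name.rstrip("/")
        # Directories and files are checked differently on purpose.
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty list means every artifact is present; anything else names what to look for in the build log.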

Extensibility Points

The architecture provides three primary extension mechanisms:

1. Custom Templates

Users can override the default header and footer templates by mounting a custom directory.

The process-template.py script performs variable substitution on any mounted templates, supporting custom branding and layout.

Sources: scripts/build-docs.sh:195-234 Dockerfile:26
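
To make "variable substitution" concrete, here is a minimal Python sketch. The `{{NAME}}` placeholder syntax is an assumption chosen for illustration, as is the choice to put the header before and the footer after each page; the actual syntax and placement used by process-template.py and build-docs.sh are not shown on this page and may differ.

```python
import re
from typing import Mapping

# Assumed placeholder syntax: {{NAME}}, with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_][A-Z0-9_]*)\s*\}\}")


def render_template(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {{NAME}} with its value; leave unknown names untouched."""
    def substitute(match: "re.Match[str]") -> str:
        return variables.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def inject(markdown: str, header: str, footer: str) -> str:
    """Wrap one markdown page with rendered header and footer HTML (assumed placement)."""
    return f"{header}\n\n{markdown.strip()}\n\n{footer}\n"
```

Leaving unknown placeholders in place, instead of replacing them with an empty string, makes a misspelled variable name visible in the generated page.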

2. Environment Variable Configuration

All behavior can be modified through environment variables without rebuilding the Docker image. This includes metadata, URLs, and operational modes.

Sources: scripts/build-docs.sh:8-59
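
As a rough illustration, the sketch below assembles a `docker run` argument list that passes settings with `-e` and mounts the output directory at `/output`. It only builds the list and runs nothing. The function name and the image name in the usage note are hypothetical placeholders; substitute the tag you built or pulled.

```python
from typing import Dict, List


def docker_run_args(image: str, output_dir: str, env: Dict[str, str]) -> List[str]:
    """Build an argv list for `docker run`; nothing is executed here."""
    args = ["docker", "run", "--rm"]
    # Sort the variables so the resulting command is deterministic.
    for name, value in sorted(env.items()):
        args += ["-e", f"{name}={value}"]
    # The container writes its artifacts to /output.
    args += ["-v", f"{output_dir}:/output"]
    args.append(image)
    return args
```

For example, `docker_run_args("example/deepwiki-to-mdbook", "/tmp/out", {"REPO": "acme/widgets"})` yields a list that can be handed to `subprocess.run`.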

3. Markdown-Only Mode

Setting MARKDOWN_ONLY=true allows users to skip the HTML build entirely, enabling alternative processing pipelines or custom mdBook configurations.

Sources: scripts/build-docs.sh:67-93

For advanced customization patterns, see Advanced Topics.

Summary

The DeepWiki-to-mdBook converter implements a pipeline architecture with three distinct phases (extraction, enhancement, build), orchestrated by a shell script within a multi-stage Docker container. The system is stateless, configuration-driven, and produces multiple output artifacts for different use cases. All components communicate through the filesystem, with no runtime dependencies beyond the container image.

Key architectural principles:

  • Sequential processing: Each phase depends on the previous phase’s output
  • Stateless execution: No persistent state between runs
  • Configuration through environment: No config files required
  • Multi-stage build: Minimized runtime image size
  • Multiple outputs: Debugging and alternative workflows supported

Sources: Dockerfile:1-34 scripts/build-docs.sh:1-310 README.md:1-95
