This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
System Architecture
This page provides a comprehensive overview of the DeepWiki-to-mdBook converter’s architecture, including its component organization, execution model, and data flow patterns. The system is designed as a containerized pipeline that transforms DeepWiki content into searchable mdBook documentation through three distinct processing phases.
For detailed information about the three-phase transformation pipeline, see Three-Phase Pipeline. For Docker-specific implementation details, see Docker Multi-Stage Build.
Architectural Overview
The system follows a pipeline architecture with three sequential phases, orchestrated by a shell script and executed within a Docker container. All components are stateless and communicate through the filesystem. Apart from network access to deepwiki.com during scraping, nothing outside the container image is required at runtime.
System Composition
```mermaid
graph TB
subgraph Docker["Docker Container (python:3.12-slim)"]
subgraph Executables["/usr/local/bin/"]
BuildScript["build-docs.sh"]
Scraper["deepwiki-scraper.py"]
TemplateProc["process-template.py"]
mdBook["mdbook"]
mdBookMermaid["mdbook-mermaid"]
end
subgraph Workspace["/workspace/"]
Templates["templates/\nheader.html\nfooter.html"]
WorkingDirs["wiki/\nraw_markdown/\nbook/"]
end
subgraph Output["/output/ (Volume Mount)"]
BookHTML["book/"]
MarkdownSrc["markdown/"]
RawMarkdown["raw_markdown/"]
BookConfig["book.toml"]
end
end
subgraph External["External Dependencies"]
DeepWiki["deepwiki.com"]
GitRemote["git remote"]
end
BuildScript -->|executes| Scraper
BuildScript -->|executes| TemplateProc
BuildScript -->|executes| mdBook
BuildScript -->|executes| mdBookMermaid
Scraper -->|writes| WorkingDirs
TemplateProc -->|reads| Templates
TemplateProc -->|outputs HTML| BuildScript
mdBook -->|reads| WorkingDirs
mdBook -->|writes| BookHTML
BuildScript -->|copies to| Output
Scraper -->|HTTP GET| DeepWiki
BuildScript -->|auto-detect| GitRemote
style Executables fill:#f0f0f0
style Workspace fill:#f0f0f0
style Output fill:#e8f5e9
```
Diagram: Container Internal Structure and Component Relationships
The container is structured into three main areas: executables in /usr/local/bin/, working files in /workspace/, and outputs in /output/. The build-docs.sh orchestrator coordinates all components, with persistent results written to the mounted volume.
Sources: Dockerfile:1-34 scripts/build-docs.sh:1-310
Core Components
The system consists of five primary components, each with a specific responsibility in the documentation generation pipeline.
| Component | Type | Location | Primary Responsibility |
|---|---|---|---|
| build-docs.sh | Shell Script | /usr/local/bin/ | Pipeline orchestration and configuration management |
| deepwiki-scraper.py | Python Script | /usr/local/bin/ | Wiki content extraction and markdown conversion |
| process-template.py | Python Script | /usr/local/bin/ | Template variable substitution |
| mdbook | Rust Binary | /usr/local/bin/ | HTML documentation generation |
| mdbook-mermaid | Rust Binary | /usr/local/bin/ | Mermaid diagram rendering |
Component Interaction Map
Diagram: Component Interaction and Data Flow
This diagram shows how build-docs.sh coordinates the three processing components sequentially, with data flowing through working directories before final output to the mounted volume.
Sources: scripts/build-docs.sh:1-310 Dockerfile:20-33
Execution Flow
The system follows a strictly sequential execution model, with each step depending on the output of the previous step. This design simplifies error handling and allows for debugging at intermediate stages.
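The strictly sequential, fail-fast execution model can be illustrated with a small Python sketch. The real orchestrator is the shell script build-docs.sh. The run_pipeline helper, the step names, and the way the markdown-only early exit is modelled here are hypothetical. They only show the control flow.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step type: a name plus an action that receives the resolved config.
Step = Tuple[str, Callable[[Dict[str, str]], None]]


def run_pipeline(config: Dict[str, str], steps: List[Step], log: List[str]) -> str:
    """Run steps strictly in order and return "markdown-only" or "complete".

    An exception raised by any step propagates immediately, so later steps
    never run on missing or partial input (fail-fast, like `set -e` in shell).
    """
    markdown_only = config.get("MARKDOWN_ONLY", "false") == "true"
    for name, action in steps:
        action(config)
        log.append(name)
        # Markdown-only mode copies the scraped markdown out and stops early.
        if name == "scrape" and markdown_only:
            log.append("copy-markdown-only")
            return "markdown-only"
    return "complete"


def noop(_config: Dict[str, str]) -> None:
    """Placeholder for real work (running the scraper, mdbook, and so on)."""


STEPS: List[Step] = [
    ("configure", noop),
    ("scrape", noop),
    ("init-book", noop),
    ("summary", noop),
    ("templates", noop),
    ("mdbook-build", noop),
    ("copy-output", noop),
]
```

For example, run_pipeline({"MARKDOWN_ONLY": "true"}, STEPS, log) returns "markdown-only" and leaves log as ["configure", "scrape", "copy-markdown-only"]. Because steps share data only through files, each intermediate directory can be inspected after a failure.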
Build Script Orchestration
The build-docs.sh script orchestrates the entire pipeline through eight steps. Step 3 is an early exit that is taken only in markdown-only mode:
1. Configuration & Validation: scripts/build-docs.sh:8-59
   - Auto-detects REPO from the git remote if not provided
   - Sets defaults for BOOK_TITLE, BOOK_AUTHORS, GIT_REPO_URL
   - Validates required configuration
   - Computes derived URLs (DEEPWIKI_URL, badge URLs)
2. Wiki Scraping: scripts/build-docs.sh:61-65
   - Executes deepwiki-scraper.py with the repository identifier
   - Writes to /workspace/wiki/ and /workspace/raw_markdown/
3. Early Exit (Markdown-Only Mode): scripts/build-docs.sh:67-93
   - Optional: skips the HTML build if MARKDOWN_ONLY=true
   - Copies markdown directly to the output volume
4. mdBook Initialization: scripts/build-docs.sh:95-122
   - Creates the /workspace/book/ structure
   - Generates the book.toml configuration
   - Initializes the src/ directory
5. SUMMARY.md Generation: scripts/build-docs.sh:124-188
   - Scans the wiki directory for .md files
   - Sorts numerically by page number prefix
   - Builds a hierarchical table of contents
   - Handles subsections in section-N/ directories
6. Template Processing & Injection: scripts/build-docs.sh:190-261
   - Processes header.html and footer.html with variable substitution
   - Injects the processed HTML into every markdown file
   - Copies the enhanced markdown to book/src/
7. mdBook Build: scripts/build-docs.sh:263-271
   - Installs mermaid assets via mdbook-mermaid install
   - Executes mdbook build
   - Generates searchable HTML in book/
8. Output Copying: scripts/build-docs.sh:273-309
   - Copies all artifacts to the /output/ volume mount
   - Preserves intermediate outputs for debugging
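Step 4 writes a book.toml. A minimal generator might look like the sketch below. The [book] keys, [output.html] git-repository-url, and the [preprocessor.mermaid] / additional-js entries are standard mdBook and mdbook-mermaid settings. However, the exact file that build-docs.sh writes is not shown on this page, so the field selection and the render_book_toml helper are assumptions. mdbook-mermaid install normally adds equivalent mermaid entries itself; they are written out here only to show the end result.

```python
from typing import List


def toml_string(value: str) -> str:
    """Quote a value as a TOML basic string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_book_toml(title: str, authors: List[str], repo_url: str) -> str:
    """Build a minimal book.toml with the mermaid preprocessor enabled."""
    author_list = ", ".join(toml_string(author) for author in authors)
    return "\n".join([
        "[book]",
        f"title = {toml_string(title)}",
        f"authors = [{author_list}]",
        'language = "en"',
        'src = "src"',
        "",
        "[preprocessor.mermaid]",
        'command = "mdbook-mermaid"',
        "",
        "[output.html]",
        f"git-repository-url = {toml_string(repo_url)}",
        'additional-js = ["mermaid.min.js", "mermaid-init.js"]',
        "",
    ])
```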
Diagram: build-docs.sh Sequential Execution Flow
Sources: scripts/build-docs.sh:1-310
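Step 5 (SUMMARY.md generation) can be sketched as follows. Sorting uses a numeric key, so page 10 follows page 9 rather than page 1. Some details are assumptions made for illustration; build-docs.sh may derive titles and paths differently:

- the file-naming convention (N-title.md at the top level, N.M-title.md inside section-N/);
- using each page's first "# " heading as its title;
- the build_summary helper itself.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed (hypothetical) naming convention: top-level pages look like
# "3-some-page.md"; subpages of page 3 live in "section-3/3.1-child.md".
PREFIX = re.compile(r"^(\d+(?:\.\d+)*)-")


def page_key(path: Path) -> Tuple[int, ...]:
    """Numeric sort key from the page-number prefix ("10-..." sorts after "9-...")."""
    match = PREFIX.match(path.name)
    if match is None:
        return (10**9,)  # Unnumbered pages go last.
    return tuple(int(part) for part in match.group(1).split("."))


def page_title(path: Path) -> str:
    """Use the first '# ' heading as the title, falling back to the file stem."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_summary(wiki_dir: Path) -> str:
    """Return SUMMARY.md text with subsections nested under their parent page."""
    lines: List[str] = ["# Summary", ""]
    for page in sorted(wiki_dir.glob("*.md"), key=page_key):
        lines.append(f"- [{page_title(page)}]({page.name})")
        match = PREFIX.match(page.name)
        section_dir = wiki_dir / f"section-{match.group(1)}" if match else None
        if section_dir is not None and section_dir.is_dir():
            for child in sorted(section_dir.glob("*.md"), key=page_key):
                link = f"{section_dir.name}/{child.name}"
                lines.append(f"    - [{page_title(child)}]({link})")
    return "\n".join(lines) + "\n"
```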
Three-Phase Pipeline Architecture
The core transformation happens in three distinct phases, each with specific inputs, processing logic, and outputs. This separation allows for independent testing and debugging of each phase.
Phase Overview
| Phase | Primary Component | Input | Output | Key Operations |
|---|---|---|---|---|
| Phase 1: Extraction | deepwiki-scraper.py | DeepWiki HTML | Markdown files | Structure discovery, HTML→Markdown conversion, raw diagram extraction |
| Phase 2: Enhancement | deepwiki-scraper.py | Raw markdown + diagrams | Enhanced markdown | Diagram normalization, fuzzy matching, template injection |
| Phase 3: Build | mdbook + mdbook-mermaid | Enhanced markdown | Searchable HTML | SUMMARY generation, mermaid rendering, search index |
Diagram: Three-Phase Pipeline with Key Functions
For detailed documentation of each phase, see Three-Phase Pipeline.
Sources: scripts/build-docs.sh:61-271 README.md:72-77
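Phase 2 lists fuzzy matching among its operations. One plausible way to place an extracted diagram back into a page is to compare the text that preceded it in the source against each markdown paragraph. The diagram is then inserted after the closest match, or skipped when nothing is similar enough. The sketch below uses the standard library's difflib.SequenceMatcher as a stand-in. The helper names, the 0.6 threshold, and the context-matching idea are assumptions, not a description of what deepwiki-scraper.py actually does.

```python
from difflib import SequenceMatcher
from typing import List, Optional

FENCE = "`" * 3  # Built indirectly so this snippet can sit inside a Markdown fence.


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences matter less."""
    return " ".join(text.lower().split())


def best_anchor(context: str, paragraphs: List[str], threshold: float = 0.6) -> Optional[int]:
    """Index of the paragraph most similar to `context`, or None below `threshold`."""
    target = normalize(context)
    scores = [SequenceMatcher(None, target, normalize(p)).ratio() for p in paragraphs]
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


def insert_diagram(paragraphs: List[str], context: str, diagram: str) -> List[str]:
    """Insert a fenced mermaid block after the best-matching paragraph, if any."""
    index = best_anchor(context, paragraphs)
    if index is None:
        return list(paragraphs)  # No confident match: leave the page unchanged.
    block = f"{FENCE}mermaid\n{diagram.strip()}\n{FENCE}"
    return paragraphs[: index + 1] + [block] + paragraphs[index + 1 :]
```

Leaving a page unchanged when no paragraph clears the threshold trades missing diagrams for never placing one in an unrelated spot.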
Docker Multi-Stage Build
The Docker architecture uses a multi-stage build pattern to minimize final image size while compiling Rust-based tools from source. This approach separates build-time dependencies from runtime dependencies.
Stage Architecture
```mermaid
graph TB
subgraph Stage1["Stage 1: Builder (rust:latest)"]
RustToolchain["Rust Toolchain\ncargo, rustc"]
CargoInstall["cargo install mdbook\ncargo install mdbook-mermaid"]
Binaries["/usr/local/cargo/bin/\nmdbook\nmdbook-mermaid"]
RustToolchain --> CargoInstall
CargoInstall --> Binaries
end
subgraph Stage2["Stage 2: Runtime (python:3.12-slim)"]
PythonBase["Python 3.12 Runtime"]
PipInstall["pip install requirements"]
CopyBinaries["COPY --from=builder\nmdbook binaries"]
CopyScripts["COPY Python scripts\nCOPY Shell scripts\nCOPY Templates"]
FinalImage["Final Image\n~500MB"]
PythonBase --> PipInstall
PipInstall --> CopyBinaries
CopyBinaries --> CopyScripts
CopyScripts --> FinalImage
end
Binaries -.->|copy only| CopyBinaries
style Stage1 fill:#ffebee
style Stage2 fill:#e8f5e9
style FinalImage fill:#c8e6c9
```
Diagram: Multi-Stage Build Process
The builder stage (approximately 2GB) is discarded after compilation, and only the compiled binaries (approximately 50MB) are copied to the final image, resulting in a significantly smaller runtime image.
Container Filesystem Layout
```
/usr/local/bin/
├── mdbook # Rust binary from builder stage
├── mdbook-mermaid # Rust binary from builder stage
├── deepwiki-scraper.py # Python script (executable)
├── process-template.py # Python script (executable)
└── build-docs.sh # Shell script (executable)
/workspace/
├── templates/
│ ├── header.html # Default header template
│ └── footer.html # Default footer template
├── wiki/ # Created at runtime
├── raw_markdown/ # Created at runtime
└── book/ # Created at runtime
/output/ # Volume mount point
└── (user-provided volume)
```
For detailed Docker implementation information, see Docker Multi-Stage Build.
Sources: Dockerfile:1-34
Configuration Architecture
The system is configured entirely through environment variables and volume mounts, following the Twelve-Factor App methodology. No configuration files are required. Every setting either has a default or, in the case of REPO, can be auto-detected from the git remote.
Configuration Layers
1. Auto-Detection: scripts/build-docs.sh:8-19
   - Extracts REPO from git remote get-url origin
   - Supports GitHub URLs in multiple formats (HTTPS, SSH)
2. Environment Variables: scripts/build-docs.sh:21-26
   - User-provided overrides
   - Take precedence over auto-detection
3. Computed Defaults: scripts/build-docs.sh:40-51
   - Derives BOOK_AUTHORS from the REPO owner
   - Constructs GIT_REPO_URL from REPO
   - Generates badge URLs
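The auto-detection layer can be approximated as follows. The sketch reads the origin URL with git remote get-url origin (a standard git command), then extracts owner/repo from the HTTPS or SSH form. The regular expression, the set of accepted URL shapes, and the helper names are hypothetical. build-docs.sh does this in shell and may accept different URL forms.

```python
import re
import subprocess
from typing import Optional

# Assumed set of GitHub remote shapes (not exhaustive):
#   https://github.com/owner/repo(.git)
#   git@github.com:owner/repo(.git)
#   ssh://git@github.com/owner/repo(.git)
GITHUB_REMOTE = re.compile(
    r"github\.com[:/](?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def repo_from_remote(url: str) -> Optional[str]:
    """Return "owner/repo" for a GitHub remote URL, or None if it does not match."""
    match = GITHUB_REMOTE.search(url.strip())
    if match is None:
        return None
    return f"{match.group('owner')}/{match.group('repo')}"


def detect_repo() -> Optional[str]:
    """Read the origin URL of the current git checkout, if there is one."""
    try:
        result = subprocess.run(
            ["git", "remote", "get-url", "origin"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, not a repository, or no "origin" remote
    return repo_from_remote(result.stdout)
```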
Diagram: Configuration Resolution Order
Key Configuration Variables
| Variable | Default | Source | Description |
|---|---|---|---|
| REPO | (auto-detected) | scripts/build-docs.sh:9-19 | GitHub repository (owner/repo) |
| BOOK_TITLE | "Documentation" | scripts/build-docs.sh:23 | Title in book.toml |
| BOOK_AUTHORS | (derived from REPO) | scripts/build-docs.sh:45 | Author metadata |
| GIT_REPO_URL | (derived from REPO) | scripts/build-docs.sh:46 | Link in generated docs |
| MARKDOWN_ONLY | "false" | scripts/build-docs.sh:26 | Skip HTML build |
| GENERATION_DATE | (current UTC time) | scripts/build-docs.sh:200 | Timestamp in templates |
For complete configuration documentation, see Configuration Reference.
Sources: scripts/build-docs.sh:8-59 README.md:31-51
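The resolution order described in this section might look like this in Python:

1. explicit environment variables;
2. auto-detection;
3. values derived from REPO.

Several details are assumptions rather than confirmed behavior of build-docs.sh:

- treating an empty variable as unset (like the shell idiom ${VAR:-default});
- the GitHub and DeepWiki URL patterns;
- the date format;
- the resolve_config helper itself.

```python
from datetime import datetime, timezone
from typing import Dict, Mapping, Optional


def setting(env: Mapping[str, str], name: str, default: str) -> str:
    """Explicit value wins; an unset or empty variable falls back to the default."""
    return env.get(name) or default


def resolve_config(env: Mapping[str, str], detected_repo: Optional[str]) -> Dict[str, str]:
    """Resolve settings: environment first, then auto-detection, then derived defaults."""
    repo = env.get("REPO") or detected_repo
    if not repo:
        raise ValueError("REPO is not set and could not be auto-detected")
    owner = repo.split("/", 1)[0]
    return {
        "REPO": repo,
        "BOOK_TITLE": setting(env, "BOOK_TITLE", "Documentation"),
        "BOOK_AUTHORS": setting(env, "BOOK_AUTHORS", owner),
        # Assumed URL patterns for GitHub and DeepWiki.
        "GIT_REPO_URL": setting(env, "GIT_REPO_URL", f"https://github.com/{repo}"),
        "DEEPWIKI_URL": f"https://deepwiki.com/{repo}",
        "MARKDOWN_ONLY": setting(env, "MARKDOWN_ONLY", "false"),
        # Computed at build time; the exact format is an assumption.
        "GENERATION_DATE": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"),
    }
```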
Output Artifacts
The system produces four distinct output artifacts, each serving a specific purpose in the documentation workflow:
| Artifact | Location | Purpose | Generated By |
|---|---|---|---|
| book/ | /output/book/ | Searchable HTML documentation | mdbook build |
| markdown/ | /output/markdown/ | Enhanced markdown source | deepwiki-scraper.py + templates |
| raw_markdown/ | /output/raw_markdown/ | Pre-enhancement markdown (debug) | deepwiki-scraper.py (raw output) |
| book.toml | /output/book.toml | mdBook configuration | build-docs.sh generation |
The multi-artifact design allows users to inspect intermediate stages, debug transformation issues, or use the markdown files for alternative processing workflows.
For detailed output structure documentation, see Output Structure.
Sources: scripts/build-docs.sh:273-309 README.md:53-58
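Copying the artifacts in the table to the output volume amounts to a few filesystem operations. The sketch below is a hypothetical Python equivalent of the final step of build-docs.sh. The copy_artifacts helper, the replace-on-rerun behavior, and the skipping of missing sources are assumptions. Missing sources occur, for example, in markdown-only mode, where there is no HTML book.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def copy_artifacts(sources: Dict[str, Path], output: Path) -> List[str]:
    """Copy each existing artifact to output/<name>; return the names copied.

    Missing sources are skipped and earlier copies are replaced, so running
    the pipeline again against the same output volume succeeds.
    """
    output.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for name, src in sources.items():
        if not src.exists():
            continue
        dest = output / name
        if dest.is_dir():
            shutil.rmtree(dest)
        elif dest.exists():
            dest.unlink()
        if src.is_dir():
            shutil.copytree(src, dest)
        else:
            shutil.copy2(src, dest)
        copied.append(name)
    return copied
```

In the pipeline, sources would map the four names in the table (book, markdown, raw_markdown, book.toml) to their locations under /workspace/, and output would be Path("/output").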
Extensibility Points
The architecture provides three primary extension mechanisms:
1. Custom Templates
Users can override the default header/footer templates by mounting a custom template directory into the container.
The process-template.py script performs variable substitution on any mounted templates, supporting custom branding and layout.
Sources: scripts/build-docs.sh:195-234 Dockerfile:26
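Variable substitution and injection can be sketched as below. The blank lines around the injected HTML matter in CommonMark: an HTML block such as a div ends at the first blank line, so markdown that follows it is parsed normally. Several details are assumptions; process-template.py may use a different placeholder syntax:

- the {{VARIABLE}} placeholder syntax;
- leaving unknown placeholders untouched;
- the render_template and inject helpers.

```python
import re
from typing import Mapping

# Assumed placeholder syntax: {{VARIABLE_NAME}} (uppercase letters, digits, underscores).
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z][A-Z0-9_]*)\s*\}\}")


def render_template(template: str, variables: Mapping[str, str]) -> str:
    """Replace known placeholders; leave unknown ones untouched so they are easy to spot."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


def inject(markdown: str, header_html: str, footer_html: str) -> str:
    """Wrap a page's markdown with rendered header and footer HTML.

    Blank lines separate the HTML from the markdown so that the page's own
    headings are not swallowed into the surrounding HTML block.
    """
    return f"{header_html.strip()}\n\n{markdown.strip()}\n\n{footer_html.strip()}\n"
```

A header rendered with a dictionary of resolved settings (for example REPO and GENERATION_DATE) would then be passed to inject for every page before the files are copied to book/src/.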
2. Environment Variable Configuration
All behavior can be modified through environment variables without rebuilding the Docker image. This includes metadata, URLs, and operational modes.
Sources: scripts/build-docs.sh:8-59
3. Markdown-Only Mode
Setting MARKDOWN_ONLY=true allows users to skip the HTML build entirely, enabling alternative processing pipelines or custom mdBook configurations.
Sources: scripts/build-docs.sh:67-93
For advanced customization patterns, see Advanced Topics.
Summary
The DeepWiki-to-mdBook converter implements a pipeline architecture with three distinct phases (extraction, enhancement, build), orchestrated by a shell script within a multi-stage Docker container. The system is stateless, configuration-driven, and produces multiple output artifacts for different use cases. All components communicate through the filesystem. Apart from network access to deepwiki.com for scraping, nothing beyond the container image is needed at runtime.
Key architectural principles:
- Sequential processing: Each phase depends on the previous phase's output
- Stateless execution: No persistent state between runs
- Configuration through environment: No config files required
- Multi-stage build: Minimized runtime image size
- Multiple outputs: Debugging and alternative workflows supported
Sources: Dockerfile:1-34 scripts/build-docs.sh:1-310 README.md:1-95