
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Component Reference


Purpose and Scope

This page provides a high-level overview of the major components in the DeepWiki-to-mdBook converter system and their responsibilities. Each component is introduced with its primary function, key files, and relationships to other components.

For detailed information about specific components, see the deepwiki-scraper.py, Template System, and mdBook Integration pages.

System Component Map

The following diagram shows all major components and their organizational relationships:

```mermaid
graph TB
    subgraph "Entry Points"
        Dockerfile["Dockerfile\n(multi-stage build)"]
        ActionYAML["action.yml\n(GitHub Action)"]
    end

    subgraph "Orchestration Layer"
        BuildScript["build-docs.sh\n(main orchestrator)"]
    end

    subgraph "Python Components"
        Scraper["deepwiki-scraper.py\n(content extraction)"]
        TemplateProc["process-template.py\n(template rendering)"]
    end

    subgraph "Build Tools"
        mdBook["mdbook\n(HTML generator)"]
        mdBookMermaid["mdbook-mermaid\n(diagram renderer)"]
    end

    subgraph "Configuration Assets"
        HeaderTemplate["templates/header.html"]
        FooterTemplate["templates/footer.html"]
        BookToml["book.toml\n(generated)"]
        SummaryMd["SUMMARY.md\n(generated)"]
    end

    subgraph "Data Directories"
        WikiDir["/workspace/wiki\n(enhanced markdown)"]
        RawDir["/workspace/raw_markdown\n(pre-enhancement)"]
        BookSrc["/workspace/book/src\n(mdBook input)"]
        OutputDir["/output\n(final artifacts)"]
    end

    Dockerfile -->|builds| BuildScript
    Dockerfile -->|installs| Scraper
    Dockerfile -->|installs| TemplateProc
    Dockerfile -->|compiles| mdBook
    Dockerfile -->|compiles| mdBookMermaid
    ActionYAML -->|invokes| Dockerfile

    BuildScript -->|executes| Scraper
    BuildScript -->|executes| TemplateProc
    BuildScript -->|executes| mdBook
    BuildScript -->|executes| mdBookMermaid
    BuildScript -->|generates| BookToml
    BuildScript -->|generates| SummaryMd

    Scraper -->|writes to| WikiDir
    Scraper -->|writes to| RawDir
    TemplateProc -->|reads| HeaderTemplate
    TemplateProc -->|reads| FooterTemplate
    TemplateProc -->|outputs HTML| BuildScript

    BuildScript -->|copies| WikiDir
    WikiDir -->|to| BookSrc
    BuildScript -->|injects templates into| BookSrc

    mdBook -->|reads| BookToml
    mdBook -->|reads| SummaryMd
    mdBook -->|reads| BookSrc
    mdBook -->|builds to| OutputDir
    mdBookMermaid -->|preprocesses| BookSrc
```

Sources: scripts/build-docs.sh:1-310 README.md:84-88

Core Components

build-docs.sh

Type: Shell script orchestrator
Location: scripts/build-docs.sh:1-310
Entry Point: Docker container CMD instruction

The main orchestration script that coordinates the entire build process. It performs seven sequential steps:

| Step | Line Range | Description |
|---|---|---|
| Configuration | scripts/build-docs.sh:8-60 | Auto-detect repository, set environment defaults |
| Scraping | scripts/build-docs.sh:61-65 | Invoke deepwiki-scraper.py to fetch content |
| Optional Exit | scripts/build-docs.sh:67-93 | If MARKDOWN_ONLY=true, copy outputs and exit |
| mdBook Init | scripts/build-docs.sh:95-122 | Create book.toml and directory structure |
| SUMMARY Generation | scripts/build-docs.sh:124-188 | Discover files and build table of contents |
| Template Processing | scripts/build-docs.sh:190-261 | Process header/footer and inject into markdown |
| Build & Copy | scripts/build-docs.sh:263-309 | Run mdBook build and copy artifacts to /output |
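The SUMMARY Generation step is where numeric sorting matters: a plain lexical sort would put `10-faq.md` before `2-setup.md`. The following Python sketch illustrates the idea only. The real step is bash, and the naming scheme it assumes (a numeric prefix such as `2.1-docker.md`, with dotted prefixes treated as sub-pages) and the title rule (first `# ` heading, else the file stem) are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: "<n>[.<m>...]-<slug>.md", e.g. "2.1-docker.md".
PREFIX = re.compile(r"^(\d+(?:\.\d+)*)-")


def numeric_key(name: str) -> tuple:
    """Compare numeric prefixes as integers (2 < 10); unnumbered files sort last."""
    match = PREFIX.match(name)
    if not match:
        return (1, (), name)
    return (0, tuple(int(part) for part in match.group(1).split(".")), name)


def page_title(path: Path) -> str:
    """Use the first '# ' heading as the title, falling back to the file stem."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_summary(src_dir: Path) -> str:
    """Render a SUMMARY.md body; dotted prefixes such as 2.1 are indented one level."""
    lines = ["# Summary", ""]
    pages = [p for p in src_dir.glob("*.md") if p.name != "SUMMARY.md"]
    for page in sorted(pages, key=lambda p: numeric_key(p.name)):
        match = PREFIX.match(page.name)
        depth = match.group(1).count(".") if match else 0
        lines.append(f"{'  ' * depth}- [{page_title(page)}]({page.name})")
    return "\n".join(lines) + "\n"
```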

Key Responsibilities:

  • Environment variable validation and default assignment
  • Git repository auto-detection from remote URLs
  • Orchestrating execution order of Python scripts
  • Dynamic SUMMARY.md generation with numeric sorting
  • Template injection into all markdown files
  • Output directory management
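Repository auto-detection comes down to turning a git remote URL into the `owner/repo` form that REPO expects. A minimal Python sketch, assuming GitHub remotes in HTTPS or SSH form; the function names `repo_from_remote` and `detect_repo` are hypothetical, and the project's own logic lives in bash (scripts/build-docs.sh:8-60).

```python
import re
import subprocess
from typing import Optional

# Matches https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git)
_GITHUB_REMOTE = re.compile(r"github\.com[:/]([^/]+)/([^/]+?)(?:\.git)?/?$")


def repo_from_remote(url: str) -> Optional[str]:
    """Return 'owner/repo' for a GitHub remote URL, or None if it is not one."""
    match = _GITHUB_REMOTE.search(url.strip())
    if not match:
        return None
    return f"{match.group(1)}/{match.group(2)}"


def detect_repo() -> Optional[str]:
    """Read the 'origin' URL of the current checkout; None if git or the remote is missing."""
    try:
        url = subprocess.run(
            ["git", "config", "--get", "remote.origin.url"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return repo_from_remote(url)
```

An explicitly set REPO would take precedence; `detect_repo()` is only the fallback, matching the "git remote origin" fallback edge in the dependency diagram.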

Sources: scripts/build-docs.sh:1-310

deepwiki-scraper.py

Type: Python script
Location: python/deepwiki-scraper.py
Invocation: scripts/build-docs.sh:65

The content extraction component that scrapes DeepWiki wiki pages and converts them to markdown with embedded diagrams.

Key Responsibilities:

  • Fetch wiki HTML from https://deepwiki.com/{REPO}
  • Parse Next.js data payload to discover wiki structure
  • Convert HTML to markdown using html2text library
  • Extract Mermaid diagrams from JavaScript payload
  • Normalize diagrams for Mermaid 11 compatibility (7-step pipeline)
  • Match diagrams to pages using fuzzy text matching
  • Write enhanced markdown to /workspace/wiki
  • Write pre-enhancement snapshot to /workspace/raw_markdown
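Fuzzy matching lets the scraper attach a diagram to a page even when the text surrounding the diagram in the JavaScript payload does not exactly match the converted markdown. The sketch below shows the general idea only. According to the dependency diagram on this page the scraper uses rapidfuzz; this sketch swaps in the standard library's `difflib.SequenceMatcher` so it stays self-contained, and the context-snippet input, the paragraph-level comparison, the 0.6 threshold and the name `best_page_for` are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def _normalize(text: str) -> str:
    """Lower-case and collapse whitespace so formatting differences do not count."""
    return " ".join(text.lower().split())


def best_page_for(context: str, pages: Dict[str, str], threshold: float = 0.6) -> Optional[str]:
    """Return the page whose most similar paragraph best matches `context`.

    `context` is the text found next to a diagram in the source payload and
    `pages` maps page file names to converted markdown. Returns None when no
    paragraph scores at least `threshold` (a ratio between 0 and 1).
    """
    target = _normalize(context)
    best_name: Optional[str] = None
    best_score = threshold
    for name, markdown in pages.items():
        for paragraph in markdown.split("\n\n"):
            score = SequenceMatcher(None, target, _normalize(paragraph)).ratio()
            if score >= best_score:
                best_name, best_score = name, score
    return best_name
```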

The scraper is covered in detail on the deepwiki-scraper.py page.

Sources: scripts/build-docs.sh:65 README.md:74

process-template.py

Type: Python script
Location: python/process-template.py
Invocation: scripts/build-docs.sh:205-213 scripts/build-docs.sh:222-230

A template rendering utility that processes HTML template files with variable substitution.

Key Responsibilities:

  • Read template file from path argument
  • Parse variable assignments from command-line arguments (format: KEY=value)
  • Substitute {{VARIABLE}} placeholders with values
  • Handle conditional rendering with {{#if VARIABLE}}...{{/if}} blocks
  • Output processed HTML to stdout

Template Variables Supported:

  • REPO - Repository identifier (e.g., “owner/repo”)
  • BOOK_TITLE - Documentation title
  • BOOK_AUTHORS - Author names
  • GIT_REPO_URL - Full GitHub repository URL
  • DEEPWIKI_URL - DeepWiki page URL
  • DEEPWIKI_BADGE_URL - Badge image URL
  • GITHUB_BADGE_URL - GitHub badge URL
  • GENERATION_DATE - Build timestamp
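The substitution and conditional rules above are simple enough to sketch. The following Python approximation is not the actual process-template.py: it assumes `{{#if VAR}}` blocks do not nest, treats an unset or empty variable as false, and replaces unknown `{{VAR}}` placeholders with an empty string. All three behaviors are assumptions.

```python
import re
from typing import Dict, List

# {{#if VAR}}...{{/if}}; non-greedy, so blocks must not be nested.
IF_BLOCK = re.compile(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", re.DOTALL)
# {{VAR}} placeholders (does not match {{#if ...}} or {{/if}}).
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def parse_assignments(args: List[str]) -> Dict[str, str]:
    """Turn ['KEY=value', ...] into a dict, splitting on the first '=' only."""
    return dict(arg.split("=", 1) for arg in args if "=" in arg)


def render(template: str, variables: Dict[str, str]) -> str:
    """Apply conditional blocks first, then substitute placeholders."""
    # Keep a block's body only when its variable is set and non-empty.
    text = IF_BLOCK.sub(
        lambda m: m.group(2) if variables.get(m.group(1)) else "", template
    )
    # Unknown placeholders are replaced with an empty string (assumption).
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), ""), text)
```

A command-line wrapper would read the template file named by its first argument, call `render(template, parse_assignments(sys.argv[2:]))`, and print the result so that build-docs.sh can capture it with command substitution. Splitting on the first `=` keeps values such as URLs with query strings intact.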

See Template System for comprehensive documentation.

Sources: scripts/build-docs.sh:195-234 README.md:51

Template Files

Type: HTML configuration files
Location: templates/header.html, templates/footer.html
Default Path: /workspace/templates/
Custom Mount: -v "$(pwd)/my-templates:/workspace/templates"

Static HTML template files that are processed by process-template.py and injected into every markdown file.

File Responsibilities:

| File | Purpose | Injection Point |
|---|---|---|
| header.html | Top-of-page content (badges, navigation) | Before markdown content |
| footer.html | Bottom-of-page content (metadata, links) | After markdown content |

Injection Logic:

[Header HTML]
<blank line>
[Original Markdown Content]
<blank line>
[Footer HTML]

Templates are injected at scripts/build-docs.sh:240-261 after all markdown files are copied to the book source directory.

Sources: scripts/build-docs.sh:195-234 scripts/build-docs.sh:240-261 README.md:39-51
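The injection logic amounts to wrapping each page's content. A Python sketch of the same layout, assuming the rendered header and footer are already available as strings and that SUMMARY.md is skipped so mdBook can still parse the table of contents. The skip and the function name `inject_templates` are assumptions; the actual step is bash at scripts/build-docs.sh:240-261.

```python
from pathlib import Path


def inject_templates(src_dir: Path, header: str, footer: str) -> int:
    """Rewrite each page as header, blank line, content, blank line, footer.

    Returns the number of files rewritten.
    """
    count = 0
    for page in sorted(src_dir.rglob("*.md")):  # list is built before any writes
        if page.name == "SUMMARY.md":
            continue  # assumed: leave the table of contents untouched
        body = page.read_text(encoding="utf-8")
        # Strip surrounding newlines so exactly one blank line separates the parts.
        parts = [header.strip("\n"), body.strip("\n"), footer.strip("\n")]
        page.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
        count += 1
    return count
```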

mdBook and mdbook-mermaid

Type: External build tools (Rust binaries)
Location: /usr/local/bin/mdbook, /usr/local/bin/mdbook-mermaid
Compilation: Dockerfile multi-stage build

Pre-compiled tools that generate the final HTML output.

mdBook Responsibilities:

  • Read configuration from book.toml (generated at scripts/build-docs.sh:102-119)
  • Parse SUMMARY.md to build navigation structure
  • Convert markdown files to HTML with search index
  • Apply theme (rust theme by default)
  • Generate table of contents sidebar
  • Create chapter navigation links
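For orientation, a book.toml for this pipeline combines standard mdBook keys with the entries mdbook-mermaid needs. The Python sketch below is an assumption about the shape of the file, not a copy of what scripts/build-docs.sh:102-119 writes. `title`, `authors`, `default-theme`, `git-repository-url`, `additional-js` and `[preprocessor.mermaid]` are real mdBook / mdbook-mermaid settings (the last two are what `mdbook-mermaid install` adds), but which ones the script sets, and with what values, is not shown on this page. The name `render_book_toml` is hypothetical.

```python
import json
from typing import List


def toml_str(value: str) -> str:
    """Quote a string for TOML using JSON escapes, which TOML basic strings accept."""
    # ensure_ascii=False avoids JSON surrogate pairs, which TOML does not allow;
    # DEL must be escaped in TOML but json.dumps leaves it raw.
    return json.dumps(value, ensure_ascii=False).replace("\x7f", "\\u007f")


def render_book_toml(title: str, authors: List[str], repo_url: str) -> str:
    author_list = ", ".join(toml_str(a) for a in authors)
    return "\n".join([
        "[book]",
        f"title = {toml_str(title)}",
        f"authors = [{author_list}]",
        'src = "src"',
        "",
        "[output.html]",
        'default-theme = "rust"',
        f"git-repository-url = {toml_str(repo_url)}",
        'additional-js = ["mermaid.min.js", "mermaid-init.js"]',
        "",
        "[preprocessor.mermaid]",
        'command = "mdbook-mermaid"',
        "",
    ])
```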

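mdbook-mermaid Responsibilities:

  • Install the Mermaid JavaScript library into the book (mdbook-mermaid install)
  • Preprocess the book source so Mermaid diagrams render in the generated HTML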

See mdBook Integration for detailed integration documentation.

Sources: scripts/build-docs.sh:113-114 scripts/build-docs.sh:266 scripts/build-docs.sh:271

Component Execution Flow

This diagram shows the runtime execution sequence and data flow between components:

```mermaid
sequenceDiagram
    participant Docker
    participant BuildScript as "build-docs.sh"
    participant Scraper as "deepwiki-scraper.py"
    participant TemplateProc as "process-template.py"
    participant mdBook as "mdbook"
    participant FileSystem as "/output"

    Docker->>BuildScript: Execute CMD

    BuildScript->>BuildScript: Validate REPO env var
    BuildScript->>BuildScript: Auto-detect from git remote
    BuildScript->>BuildScript: Set defaults (BOOK_AUTHORS, etc)

    BuildScript->>Scraper: Execute with REPO arg
    Scraper->>Scraper: Fetch DeepWiki HTML
    Scraper->>Scraper: Extract wiki structure
    Scraper->>Scraper: Convert HTML to markdown
    Scraper->>Scraper: Process Mermaid diagrams
    Scraper->>FileSystem: Write /workspace/wiki/*.md
    Scraper->>FileSystem: Write /workspace/raw_markdown/*.md
    Scraper-->>BuildScript: Exit 0

    alt MARKDOWN_ONLY=true
        BuildScript->>FileSystem: Copy markdown to /output
        BuildScript->>Docker: Exit 0
    end

    BuildScript->>BuildScript: Create /workspace/book/
    BuildScript->>BuildScript: Generate book.toml
    BuildScript->>BuildScript: Scan wiki files
    BuildScript->>BuildScript: Generate SUMMARY.md

    BuildScript->>TemplateProc: Process header.html
    TemplateProc-->>BuildScript: Return HTML string
    BuildScript->>TemplateProc: Process footer.html
    TemplateProc-->>BuildScript: Return HTML string

    BuildScript->>BuildScript: Copy wiki/*.md to book/src/
    BuildScript->>BuildScript: Inject header into each .md
    BuildScript->>BuildScript: Inject footer into each .md

    BuildScript->>mdBook: mdbook-mermaid install
    BuildScript->>mdBook: mdbook build
    mdBook->>mdBook: Parse SUMMARY.md
    mdBook->>mdBook: Convert markdown to HTML
    mdBook->>mdBook: Build search index
    mdBook->>FileSystem: Write /workspace/book/book/
    mdBook-->>BuildScript: Exit 0

    BuildScript->>FileSystem: Copy book/ to /output/book/
    BuildScript->>FileSystem: Copy wiki/ to /output/markdown/
    BuildScript->>FileSystem: Copy raw_markdown/ to /output/raw_markdown/
    BuildScript->>FileSystem: Copy book.toml to /output/

    BuildScript-->>Docker: Exit 0
```

Sources: scripts/build-docs.sh:1-310
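The final copy steps of the sequence, and the MARKDOWN_ONLY shortcut, are plain directory copies. A Python sketch of that stage, assuming the container and output paths listed on this page; the function name `publish` and the choice to copy both markdown directories in markdown-only mode are assumptions, since the page only says "Copy markdown to /output".

```python
import shutil
from pathlib import Path


def publish(workspace: Path, output: Path, markdown_only: bool = False) -> None:
    """Copy build artifacts from the workspace into the output directory."""
    output.mkdir(parents=True, exist_ok=True)
    # Markdown is published in both modes.
    shutil.copytree(workspace / "wiki", output / "markdown", dirs_exist_ok=True)
    shutil.copytree(workspace / "raw_markdown", output / "raw_markdown", dirs_exist_ok=True)
    if markdown_only:
        return  # MARKDOWN_ONLY=true: stop before any mdBook artifacts are needed
    shutil.copytree(workspace / "book" / "book", output / "book", dirs_exist_ok=True)
    shutil.copy2(workspace / "book" / "book.toml", output / "book.toml")
```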

File System Organization

The following table maps logical component names to their physical locations in the Docker container and output directory:

| Component | Container Path | Output Path | Description |
|---|---|---|---|
| Main orchestrator | /usr/local/bin/build-docs.sh | - | Shell script entry point |
| Scraper | /usr/local/bin/deepwiki-scraper.py | - | Python extraction script |
| Template processor | /usr/local/bin/process-template.py | - | Python template engine |
| mdBook binary | /usr/local/bin/mdbook | - | Rust-compiled tool |
| mdbook-mermaid binary | /usr/local/bin/mdbook-mermaid | - | Rust-compiled preprocessor |
| Default templates | /workspace/templates/*.html | - | Header/footer HTML files |
| Working wiki dir | /workspace/wiki/ | /output/markdown/ | Enhanced markdown files |
| Raw markdown dir | /workspace/raw_markdown/ | /output/raw_markdown/ | Pre-enhancement snapshot |
| Book workspace | /workspace/book/ | - | Temporary build directory |
| Book source files | /workspace/book/src/ | - | mdBook input directory |
| Generated config | /workspace/book/book.toml | /output/book.toml | mdBook configuration |
| Generated TOC | /workspace/book/src/SUMMARY.md | - | Navigation structure |
| Built HTML | /workspace/book/book/ | /output/book/ | Final documentation site |

Sources: scripts/build-docs.sh:27-31 scripts/build-docs.sh:274-294

Component Dependencies

This diagram maps the dependency relationships between components, showing which components require which other components:

```mermaid
graph TD
    subgraph "Docker Build Stage 1"
        RustBase["rust:latest base image"]
        CargoInstall["cargo install"]
        RustBase --> CargoInstall
        CargoInstall --> mdBookBin["mdbook binary"]
        CargoInstall --> mdBookMermaidBin["mdbook-mermaid binary"]
    end

    subgraph "Docker Build Stage 2"
        PythonBase["python:3.12-slim base image"]
        PipInstall["pip install"]
        PythonBase --> PipInstall
        PipInstall --> RequestsLib["requests library"]
        PipInstall --> Html2TextLib["html2text library"]
        PipInstall --> RapidFuzzLib["rapidfuzz library"]
    end

    subgraph "Runtime Dependencies"
        BuildScript["build-docs.sh"]
        ScraperPy["deepwiki-scraper.py"]
        TemplatePy["process-template.py"]
        BuildScript --> ScraperPy
        BuildScript --> TemplatePy
        BuildScript --> mdBookBin
        BuildScript --> mdBookMermaidBin
        ScraperPy --> RequestsLib
        ScraperPy --> Html2TextLib
        ScraperPy --> RapidFuzzLib
    end

    subgraph "Environment Inputs"
        EnvREPO["REPO env var"]
        EnvBOOK_TITLE["BOOK_TITLE env var"]
        EnvMARKDOWN_ONLY["MARKDOWN_ONLY env var"]
        GitRemote["git remote origin"]
        GitRemote -.fallback.-> EnvREPO
        EnvREPO --> BuildScript
        EnvBOOK_TITLE --> BuildScript
        EnvMARKDOWN_ONLY --> BuildScript
    end

    mdBookBin -.copied from.-> RustBase
    mdBookMermaidBin -.copied from.-> RustBase
```

Sources: scripts/build-docs.sh:8-19 README.md:14-27

Component Communication Patterns

Inter-Process Communication

All component communication uses standard Unix patterns:

| Pattern | Components | Mechanism |
|---|---|---|
| Parent-child execution | build-docs.sh → Python scripts | python3 /usr/local/bin/script.py args |
| Parent-child execution | build-docs.sh → mdBook tools | mdbook build, mdbook-mermaid install |
| Output capture | build-docs.sh ← process-template.py | Command substitution: VAR=$(python3 ...) |
| Exit status | All → build-docs.sh | Standard exit codes (0 = success) |
| Error propagation | All | set -e in bash (exit on the first failing command) |

Sources: scripts/build-docs.sh:2 scripts/build-docs.sh:65 scripts/build-docs.sh:205-213
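For readers more familiar with Python than bash, these patterns map onto `subprocess.run`: `check=True` plays the role of `set -e` by raising on a non-zero exit status, and `capture_output=True` mirrors `VAR=$(...)` command substitution. This is an illustrative translation, not code from the project; the helper names `run` and `capture` are hypothetical.

```python
import subprocess
import sys
from typing import List


def run(cmd: List[str]) -> None:
    """Parent-child execution: behaves like a bare command under `set -e`."""
    subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit


def capture(cmd: List[str]) -> str:
    """Output capture: like VAR=$(cmd), which also strips trailing newlines."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # Stand-in for: HEADER=$(python3 process-template.py header.html REPO=owner/repo)
    header = capture([sys.executable, "-c", "print('<div>owner/repo</div>')"])
    print(header)
```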

File System Communication

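Components communicate via shared file system locations:

  • deepwiki-scraper.py writes /workspace/wiki/ and /workspace/raw_markdown/
  • build-docs.sh copies /workspace/wiki/ into /workspace/book/src/, generates book.toml and SUMMARY.md, and injects the rendered templates
  • mdbook reads /workspace/book/ and writes the HTML site to /workspace/book/book/
  • build-docs.sh copies the results into /output/

Sources: scripts/build-docs.sh:27-31 scripts/build-docs.sh:237 scripts/build-docs.sh:274-294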

Configuration Communication

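Configuration flows in one direction, from the container entry point to the components. User-facing settings arrive as environment variables (docker run -e) and are read by build-docs.sh, which derives the remaining values and passes them to process-template.py as KEY=value arguments:

| Variable | Set By | Read By | Usage |
|---|---|---|---|
| REPO | Docker -e flag | build-docs.sh | DeepWiki URL construction |
| BOOK_TITLE | Docker -e flag | build-docs.sh | book.toml generation |
| BOOK_AUTHORS | Docker -e flag | build-docs.sh | book.toml generation |
| MARKDOWN_ONLY | Docker -e flag | build-docs.sh | Build mode selection |
| GENERATION_DATE | build-docs.sh | process-template.py | Template variable |
| GIT_REPO_URL | build-docs.sh (derived) | process-template.py | Template variable |
| DEEPWIKI_URL | build-docs.sh (derived) | process-template.py | Template variable |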

Sources: scripts/build-docs.sh:8-60 scripts/build-docs.sh:200-230
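The derived rows of this table follow directly from REPO. A Python sketch of that derivation under stated assumptions: the DeepWiki URL pattern comes from the scraper section of this page, while the GitHub URL form, the BOOK_TITLE and BOOK_AUTHORS defaults, the ISO date format and the function names are hypothetical stand-ins for what scripts/build-docs.sh:8-60 actually does. The last function formats the values as the KEY=value arguments process-template.py accepts.

```python
from datetime import date
from typing import Dict, List, Mapping, Optional


def derive_config(env: Mapping[str, str], today: Optional[date] = None) -> Dict[str, str]:
    """Read user settings from `env` and derive the template variables."""
    repo = env.get("REPO", "").strip()
    if not repo:
        raise ValueError("REPO must be set (or auto-detected from the git remote)")
    owner = repo.split("/", 1)[0]
    return {
        "REPO": repo,
        "BOOK_TITLE": env.get("BOOK_TITLE") or "Documentation",  # hypothetical default
        "BOOK_AUTHORS": env.get("BOOK_AUTHORS") or owner,       # hypothetical default
        "MARKDOWN_ONLY": env.get("MARKDOWN_ONLY", "false"),
        "GIT_REPO_URL": f"https://github.com/{repo}",            # assumed URL form
        "DEEPWIKI_URL": f"https://deepwiki.com/{repo}",
        "GENERATION_DATE": (today or date.today()).isoformat(),  # assumed format
    }


def template_args(config: Mapping[str, str]) -> List[str]:
    """Format the template variables as KEY=value arguments."""
    keys = ["REPO", "BOOK_TITLE", "BOOK_AUTHORS", "GIT_REPO_URL", "DEEPWIKI_URL", "GENERATION_DATE"]
    return [f"{key}={config[key]}" for key in keys]
```

In a real run the mapping would be `os.environ`; passing it in as a parameter keeps the derivation easy to test.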

Component Responsibilities Matrix

The following table summarizes what each component is and is not responsible for:

| Component | Responsible For | Not Responsible For |
|---|---|---|
| build-docs.sh | Orchestration, environment validation, SUMMARY generation, template injection, output copying | Content extraction, HTML rendering, diagram normalization |
| deepwiki-scraper.py | HTTP requests, HTML parsing, markdown conversion, diagram extraction/normalization/matching | File system orchestration, mdBook integration, template processing |
| process-template.py | Variable substitution, conditional rendering in templates | File discovery, output management, HTML generation |
| mdbook | Markdown to HTML conversion, search index, navigation, theming | Content extraction, diagram processing, template injection |
| mdbook-mermaid | Mermaid library installation, diagram rendering configuration | Diagram extraction, diagram normalization, markdown conversion |
| Templates (*.html) | Define header/footer structure and variables | Variable substitution, file injection, content generation |

Sources: scripts/build-docs.sh:1-310 README.md:72-77
