This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Component Reference
Purpose and Scope
This page provides a high-level overview of the major components in the DeepWiki-to-mdBook converter system and their responsibilities. Each component is introduced with its primary function, key files, and relationships to other components.
For detailed information about specific components:
- Shell orchestration logic: see build-docs.sh Orchestrator
- Content extraction and diagram processing: see deepwiki-scraper.py
- Header and footer customization: see Template System
- Final HTML generation: see mdBook Integration
System Component Map
The following diagram shows all major components and their organizational relationships:
Sources: scripts/build-docs.sh:1-310 README.md:84-88
graph TB
subgraph "Entry Points"
Dockerfile["Dockerfile\n(multi-stage build)"]
ActionYAML["action.yml\n(GitHub Action)"]
end
subgraph "Orchestration Layer"
BuildScript["build-docs.sh\n(main orchestrator)"]
end
subgraph "Python Components"
Scraper["deepwiki-scraper.py\n(content extraction)"]
TemplateProc["process-template.py\n(template rendering)"]
end
subgraph "Build Tools"
mdBook["mdbook\n(HTML generator)"]
mdBookMermaid["mdbook-mermaid\n(diagram renderer)"]
end
subgraph "Configuration Assets"
HeaderTemplate["templates/header.html"]
FooterTemplate["templates/footer.html"]
BookToml["book.toml\n(generated)"]
SummaryMd["SUMMARY.md\n(generated)"]
end
subgraph "Data Directories"
WikiDir["/workspace/wiki\n(enhanced markdown)"]
RawDir["/workspace/raw_markdown\n(pre-enhancement)"]
BookSrc["/workspace/book/src\n(mdBook input)"]
OutputDir["/output\n(final artifacts)"]
end
Dockerfile -->|builds| BuildScript
Dockerfile -->|installs| Scraper
Dockerfile -->|installs| TemplateProc
Dockerfile -->|compiles| mdBook
Dockerfile -->|compiles| mdBookMermaid
ActionYAML -->|invokes| Dockerfile
BuildScript -->|executes| Scraper
BuildScript -->|executes| TemplateProc
BuildScript -->|executes| mdBook
BuildScript -->|executes| mdBookMermaid
BuildScript -->|generates| BookToml
BuildScript -->|generates| SummaryMd
Scraper -->|writes to| WikiDir
Scraper -->|writes to| RawDir
TemplateProc -->|reads| HeaderTemplate
TemplateProc -->|reads| FooterTemplate
TemplateProc -->|outputs HTML| BuildScript
BuildScript -->|copies| WikiDir
WikiDir -->|to| BookSrc
BuildScript -->|injects templates into| BookSrc
mdBook -->|reads| BookToml
mdBook -->|reads| SummaryMd
mdBook -->|reads| BookSrc
mdBook -->|builds to| OutputDir
mdBookMermaid -->|preprocesses| BookSrc
Core Components
build-docs.sh
Type: Shell script orchestrator
Location: scripts/build-docs.sh:1-310
Entry Point: Docker container CMD instruction
The main orchestration script that coordinates the entire build process. It performs seven sequential steps:
| Step | Line Range | Description |
|---|---|---|
| Configuration | scripts/build-docs.sh:8-60 | Auto-detect repository, set environment defaults |
| Scraping | scripts/build-docs.sh:61-65 | Invoke deepwiki-scraper.py to fetch content |
| Optional Exit | scripts/build-docs.sh:67-93 | If MARKDOWN_ONLY=true, copy outputs and exit |
| mdBook Init | scripts/build-docs.sh:95-122 | Create book.toml and directory structure |
| SUMMARY Generation | scripts/build-docs.sh:124-188 | Discover files and build table of contents |
| Template Processing | scripts/build-docs.sh:190-261 | Process header/footer and inject into markdown |
| Build & Copy | scripts/build-docs.sh:263-309 | Run mdBook build and copy artifacts to /output |
Key Responsibilities:
- Environment variable validation and default assignment
- Git repository auto-detection from remote URLs
- Orchestrating execution order of Python scripts
- Dynamic SUMMARY.md generation with numeric sorting
- Template injection into all markdown files
- Output directory management
Sources: scripts/build-docs.sh:1-310
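The numeric sorting used for SUMMARY.md generation can be illustrated with a short Python sketch. The real logic lives in the shell script, so everything here is an assumption made for illustration: the function names `numeric_key` and `build_summary`, the file naming pattern with a leading number, and the way a title is derived from the file name.

```python
import re
from pathlib import Path


def numeric_key(name: str) -> tuple:
    """Sort key: leading integer first, then the name itself."""
    match = re.match(r"(\d+)", name)
    # Files without a leading number sort after numbered ones (assumption).
    return (int(match.group(1)) if match else float("inf"), name)


def build_summary(filenames: list[str]) -> str:
    """Return SUMMARY.md text listing the given markdown files in numeric order."""
    lines = ["# Summary", ""]
    for name in sorted(filenames, key=numeric_key):
        stem = Path(name).stem
        # Hypothetical title derivation: drop the numeric prefix, turn dashes into spaces.
        title = re.sub(r"^\d+[-_]?", "", stem).replace("-", " ") or stem
        lines.append(f"- [{title}]({name})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_summary(["10-Advanced.md", "2-Quick-Start.md", "1-Overview.md"]))
```

A plain lexicographic sort would place `10-Advanced.md` before `2-Quick-Start.md`; comparing the leading number as an integer avoids that.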
deepwiki-scraper.py
Type: Python script
Location: python/deepwiki-scraper.py
Invocation: scripts/build-docs.sh:65
The content extraction component that scrapes DeepWiki wiki pages and converts them to markdown with embedded diagrams.
Key Responsibilities:
- Fetch wiki HTML from https://deepwiki.com/{REPO}
- Parse Next.js data payload to discover wiki structure
- Convert HTML to markdown using the html2text library
- Extract Mermaid diagrams from JavaScript payload
- Normalize diagrams for Mermaid 11 compatibility (7-step pipeline)
- Match diagrams to pages using fuzzy text matching
- Write enhanced markdown to /workspace/wiki
- Write pre-enhancement snapshot to /workspace/raw_markdown
The scraper is covered in detail in deepwiki-scraper.py.
Sources: scripts/build-docs.sh:65 README.md:74
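To make the idea of matching diagrams to pages by fuzzy text matching more concrete, here is a minimal sketch. It does not reproduce the scraper's actual code: it uses the standard library's `difflib` as a stand-in, and the function name `best_page_for_diagram`, the 0.6 threshold, and the idea of comparing a diagram's surrounding context against each line of a page are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_page_for_diagram(context: str, pages: dict, threshold: float = 0.6) -> Optional[str]:
    """Return the page name whose text best matches the diagram's context, or None."""
    best_name, best_score = None, 0.0
    needle = " ".join(context.lower().split())  # normalize case and whitespace
    for name, text in pages.items():
        for line in text.splitlines():
            candidate = " ".join(line.lower().split())
            if not candidate:
                continue
            # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
            score = SequenceMatcher(None, needle, candidate).ratio()
            if score > best_score:
                best_name, best_score = name, score
    # Reject weak matches so a diagram is not attached to an unrelated page.
    return best_name if best_score >= threshold else None
```

The threshold trades missed matches against wrong ones; a real pipeline would likely tune it against known pages.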
process-template.py
Type: Python script
Location: python/process-template.py
Invocation: scripts/build-docs.sh:205-213 scripts/build-docs.sh:222-230
A template rendering utility that processes HTML template files with variable substitution.
Key Responsibilities:
- Read template file from path argument
- Parse variable assignments from command-line arguments (format: KEY=value)
- Substitute {{VARIABLE}} placeholders with values
- Handle conditional rendering with {{#if VARIABLE}}...{{/if}} blocks
- Output processed HTML to stdout
Template Variables Supported:
- REPO - Repository identifier (e.g., “owner/repo”)
- BOOK_TITLE - Documentation title
- BOOK_AUTHORS - Author names
- GIT_REPO_URL - Full GitHub repository URL
- DEEPWIKI_URL - DeepWiki page URL
- DEEPWIKI_BADGE_URL - Badge image URL
- GITHUB_BADGE_URL - GitHub badge URL
- GENERATION_DATE - Build timestamp
See Template System for comprehensive documentation.
Sources: scripts/build-docs.sh:195-234 README.md:51
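The substitution and conditional rules described above can be sketched in a few lines of Python. This is not the source of process-template.py: the function names `render_template` and `parse_assignments` are hypothetical, and the handling of nested conditionals and unknown variables is an assumption noted in the comments.

```python
import re


def parse_assignments(args: list) -> dict:
    """Turn ["KEY=value", ...] into a dict, splitting on the first '=' only."""
    return dict(arg.split("=", 1) for arg in args if "=" in arg)


def render_template(template: str, variables: dict) -> str:
    """Resolve {{#if VAR}}...{{/if}} blocks, then substitute {{VAR}} placeholders."""

    def keep_if_set(match):
        # Keep the block body only when the variable is present and non-empty.
        return match.group(2) if variables.get(match.group(1)) else ""

    # Non-nested conditionals only (an assumption of this sketch).
    rendered = re.sub(
        r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", keep_if_set, template, flags=re.DOTALL
    )
    # Unknown variables become empty strings (also an assumption).
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables.get(m.group(1), ""), rendered)
```

Splitting on the first `=` only keeps values such as URLs with query strings intact.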
Template Files
Type: HTML configuration files
Location: templates/header.html, templates/footer.html
Default Path: /workspace/templates/
Custom Mount: -v "$(pwd)/my-templates:/workspace/templates"
Static HTML template files that are processed by process-template.py and injected into every markdown file.
File Responsibilities:
| File | Purpose | Injection Point |
|---|---|---|
| header.html | Top-of-page content (badges, navigation) | Before markdown content |
| footer.html | Bottom-of-page content (metadata, links) | After markdown content |
Injection Logic:
[Header HTML]
<blank line>
[Original Markdown Content]
<blank line>
[Footer HTML]
Templates are injected at scripts/build-docs.sh:240-261 after all markdown files are copied to the book source directory.
Sources: scripts/build-docs.sh:195-234 scripts/build-docs.sh:240-261 README.md:39-51
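The injection layout (header, blank line, original content, blank line, footer) can be expressed as a small Python sketch. The actual injection is done in the shell script; the function name `inject_templates` is hypothetical, and skipping SUMMARY.md is an assumption made here because mdBook parses that file as the table of contents.

```python
from pathlib import Path


def inject_templates(src_dir: Path, header_html: str, footer_html: str) -> int:
    """Wrap every .md file under src_dir with header and footer; return the file count."""
    count = 0
    for md_file in sorted(src_dir.rglob("*.md")):
        # Assumption: leave SUMMARY.md untouched so mdBook can still parse it.
        if md_file.name == "SUMMARY.md":
            continue
        body = md_file.read_text(encoding="utf-8")
        # Blank lines keep the HTML blocks separate from the markdown content.
        md_file.write_text(
            f"{header_html}\n\n{body}\n\n{footer_html}\n", encoding="utf-8"
        )
        count += 1
    return count
```

Running the function twice would wrap the files twice, so a caller should invoke it once on a freshly copied source directory.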
mdBook and mdbook-mermaid
Type: External build tools (Rust binaries)
Location: /usr/local/bin/mdbook, /usr/local/bin/mdbook-mermaid
Compilation: Dockerfile multi-stage build
Pre-compiled tools that generate the final HTML output.
mdBook Responsibilities:
- Read configuration from book.toml (scripts/build-docs.sh:102-119)
- Parse SUMMARY.md to build navigation structure
- Convert markdown files to HTML with search index
- Apply theme (rust theme by default)
- Generate table of contents sidebar
- Create chapter navigation links
mdbook-mermaid Responsibilities:
- Act as mdBook preprocessor (scripts/build-docs.sh:113-114)
- Detect mermaid code blocks in markdown
- Install JavaScript rendering libraries (scripts/build-docs.sh:266)
- Configure client-side diagram rendering
See mdBook Integration for detailed integration documentation.
Sources: scripts/build-docs.sh:113-114 scripts/build-docs.sh:266 scripts/build-docs.sh:271
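For orientation, the sketch below renders a minimal book.toml that registers mdbook-mermaid as a preprocessor and selects the rust theme. It is an illustration only: the exact contents generated by build-docs.sh are not shown on this page, so the chosen keys, the function name `render_book_toml`, and its parameters are assumptions. The keys themselves (`[book]`, `[output.html]` with `default-theme` and `git-repository-url`, and `[preprocessor.mermaid]`) are standard mdBook configuration.

```python
import json


def render_book_toml(title: str, authors: list, git_repo_url: str = "") -> str:
    """Return a minimal book.toml as text (hypothetical helper, not the project's code)."""
    # json.dumps produces double-quoted, escaped strings that are also valid TOML.
    lines = [
        "[book]",
        f"title = {json.dumps(title)}",
        f"authors = {json.dumps(authors)}",
        "",
        "[output.html]",
        'default-theme = "rust"',
    ]
    if git_repo_url:
        lines.append(f"git-repository-url = {json.dumps(git_repo_url)}")
    # Registering the preprocessor lets mdbook-mermaid handle mermaid code blocks.
    lines += ["", "[preprocessor.mermaid]", 'command = "mdbook-mermaid"']
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_book_toml("Docs", ["zenOSmosis"], "https://github.com/owner/repo"))
```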
Component Execution Flow
This diagram shows the runtime execution sequence and data flow between components:
Sources: scripts/build-docs.sh:1-310
sequenceDiagram
participant Docker
participant BuildScript as "build-docs.sh"
participant Scraper as "deepwiki-scraper.py"
participant TemplateProc as "process-template.py"
participant mdBook as "mdbook"
participant FileSystem as "/output"
Docker->>BuildScript: Execute CMD
BuildScript->>BuildScript: Validate REPO env var
BuildScript->>BuildScript: Auto-detect from git remote
BuildScript->>BuildScript: Set defaults (BOOK_AUTHORS, etc)
BuildScript->>Scraper: Execute with REPO arg
Scraper->>Scraper: Fetch DeepWiki HTML
Scraper->>Scraper: Extract wiki structure
Scraper->>Scraper: Convert HTML to markdown
Scraper->>Scraper: Process Mermaid diagrams
Scraper->>FileSystem: Write /workspace/wiki/*.md
Scraper->>FileSystem: Write /workspace/raw_markdown/*.md
Scraper-->>BuildScript: Exit 0
alt MARKDOWN_ONLY=true
BuildScript->>FileSystem: Copy markdown to /output
BuildScript->>Docker: Exit 0
end
BuildScript->>BuildScript: Create /workspace/book/
BuildScript->>BuildScript: Generate book.toml
BuildScript->>BuildScript: Scan wiki files
BuildScript->>BuildScript: Generate SUMMARY.md
BuildScript->>TemplateProc: Process header.html
TemplateProc-->>BuildScript: Return HTML string
BuildScript->>TemplateProc: Process footer.html
TemplateProc-->>BuildScript: Return HTML string
BuildScript->>BuildScript: Copy wiki/*.md to book/src/
BuildScript->>BuildScript: Inject header into each .md
BuildScript->>BuildScript: Inject footer into each .md
BuildScript->>mdBook: mdbook-mermaid install
BuildScript->>mdBook: mdbook build
mdBook->>mdBook: Parse SUMMARY.md
mdBook->>mdBook: Convert markdown to HTML
mdBook->>mdBook: Build search index
mdBook->>FileSystem: Write /workspace/book/book/
mdBook-->>BuildScript: Exit 0
BuildScript->>FileSystem: Copy book/ to /output/book/
BuildScript->>FileSystem: Copy wiki/ to /output/markdown/
BuildScript->>FileSystem: Copy raw_markdown/ to /output/raw_markdown/
BuildScript->>FileSystem: Copy book.toml to /output/
BuildScript-->>Docker: Exit 0
File System Organization
The following table maps logical component names to their physical locations in the Docker container and output directory:
| Component | Container Path | Output Path | Description |
|---|---|---|---|
| Main orchestrator | /usr/local/bin/build-docs.sh | - | Shell script entry point |
| Scraper | /usr/local/bin/deepwiki-scraper.py | - | Python extraction script |
| Template processor | /usr/local/bin/process-template.py | - | Python template engine |
| mdBook binary | /usr/local/bin/mdbook | - | Rust-compiled tool |
| mdbook-mermaid binary | /usr/local/bin/mdbook-mermaid | - | Rust-compiled preprocessor |
| Default templates | /workspace/templates/*.html | - | Header/footer HTML files |
| Working wiki dir | /workspace/wiki/ | /output/markdown/ | Enhanced markdown files |
| Raw markdown dir | /workspace/raw_markdown/ | /output/raw_markdown/ | Pre-enhancement snapshot |
| Book workspace | /workspace/book/ | - | Temporary build directory |
| Book source files | /workspace/book/src/ | - | mdBook input directory |
| Generated config | /workspace/book/book.toml | /output/book.toml | mdBook configuration |
| Generated TOC | /workspace/book/src/SUMMARY.md | - | Navigation structure |
| Built HTML | /workspace/book/book/ | /output/book/ | Final documentation site |
Sources: scripts/build-docs.sh:27-31 scripts/build-docs.sh:274-294
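The container-to-output mapping in the table above can be read as a simple copy plan. The sketch below is a hedged illustration in Python rather than the shell script's actual copy step: the `ARTIFACTS` mapping restates the rows that have an output path, while the function name `copy_artifacts` and the choice to silently skip missing sources are assumptions.

```python
import shutil
from pathlib import Path

# Workspace-relative source -> output-relative destination, taken from the table above.
ARTIFACTS = {
    "book/book": "book",
    "wiki": "markdown",
    "raw_markdown": "raw_markdown",
    "book/book.toml": "book.toml",
}


def copy_artifacts(workspace: Path, output: Path) -> list:
    """Copy each artifact that exists; return the destination names that were written."""
    copied = []
    output.mkdir(parents=True, exist_ok=True)
    for src_rel, dst_rel in ARTIFACTS.items():
        src, dst = workspace / src_rel, output / dst_rel
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        elif src.is_file():
            shutil.copy2(src, dst)
        else:
            continue  # assumption: missing sources (e.g. no built book) are skipped
        copied.append(dst_rel)
    return copied
```

With the paths from the table, a call would look like `copy_artifacts(Path("/workspace"), Path("/output"))`.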
Component Dependencies
This diagram maps the dependency relationships between components, showing which components require which other components:
Sources: scripts/build-docs.sh:8-19 README.md:14-27
graph TD
subgraph "Docker Build Stage 1"
RustBase["rust:latest base image"]
CargoInstall["cargo install"]
RustBase --> CargoInstall
CargoInstall --> mdBookBin["mdbook binary"]
CargoInstall --> mdBookMermaidBin["mdbook-mermaid binary"]
end
subgraph "Docker Build Stage 2"
PythonBase["python:3.12-slim base image"]
PipInstall["pip install"]
PythonBase --> PipInstall
PipInstall --> RequestsLib["requests library"]
PipInstall --> Html2TextLib["html2text library"]
PipInstall --> RapidFuzzLib["rapidfuzz library"]
end
subgraph "Runtime Dependencies"
BuildScript["build-docs.sh"]
ScraperPy["deepwiki-scraper.py"]
TemplatePy["process-template.py"]
BuildScript --> ScraperPy
BuildScript --> TemplatePy
BuildScript --> mdBookBin
BuildScript --> mdBookMermaidBin
ScraperPy --> RequestsLib
ScraperPy --> Html2TextLib
ScraperPy --> RapidFuzzLib
end
subgraph "Environment Inputs"
EnvREPO["REPO env var"]
EnvBOOK_TITLE["BOOK_TITLE env var"]
EnvMARKDOWN_ONLY["MARKDOWN_ONLY env var"]
GitRemote["git remote origin"]
GitRemote -.fallback.-> EnvREPO
EnvREPO --> BuildScript
EnvBOOK_TITLE --> BuildScript
EnvMARKDOWN_ONLY --> BuildScript
end
mdBookBin -.copied from.-> RustBase
mdBookMermaidBin -.copied from.-> RustBase
Component Communication Patterns
Inter-Process Communication
All component communication uses standard Unix patterns:
| Pattern | Components | Mechanism |
|---|---|---|
| Parent-child execution | build-docs.sh → Python scripts | python3 /usr/local/bin/script.py args |
| Parent-child execution | build-docs.sh → mdBook tools | mdbook build, mdbook-mermaid install |
| Output capture | build-docs.sh ← process-template.py | Command substitution: VAR=$(python3 ...) |
| Exit status | All → build-docs.sh | Standard exit codes (0 = success) |
| Error propagation | All | set -e in bash (exit on any error) |
Sources: scripts/build-docs.sh:2 scripts/build-docs.sh:65 scripts/build-docs.sh:205-213
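The same patterns (parent-child execution, output capture, and exit-status propagation) have direct equivalents in Python's `subprocess` module. The sketch below is only an analogy to the shell behavior described in the table; the function name `run_and_capture` is hypothetical, and the child command is a stand-in rather than a real call to process-template.py.

```python
import subprocess
import sys


def run_and_capture(argv: list) -> str:
    """Run a child process, return its stdout, and raise if it exits non-zero."""
    # check=True plays the role of `set -e`: a failing child aborts the caller.
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in child; in the shell script this is the VAR=$(python3 ...) pattern.
    header = run_and_capture([sys.executable, "-c", "print('<header/>')"])
    print(header.strip())
```

A non-zero exit raises `subprocess.CalledProcessError`, whose `returncode` attribute carries the child's exit status.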
File System Communication
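Components communicate via shared file system locations. deepwiki-scraper.py writes to /workspace/wiki and /workspace/raw_markdown, build-docs.sh copies the wiki files into /workspace/book/src, mdBook builds to /workspace/book/book, and build-docs.sh copies the results to /output.
Sources: scripts/build-docs.sh:27-31 scripts/build-docs.sh:237 scripts/build-docs.sh:274-294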
Configuration Communication
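Configuration flows unidirectionally from the container entry point to the components. Docker -e flags set environment variables that build-docs.sh reads, and build-docs.sh passes derived values on to process-template.py as KEY=value arguments: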
| Variable | Set By | Read By | Usage |
|---|---|---|---|
| REPO | Docker -e flag | build-docs.sh | DeepWiki URL construction |
| BOOK_TITLE | Docker -e flag | build-docs.sh | book.toml generation |
| BOOK_AUTHORS | Docker -e flag | build-docs.sh | book.toml generation |
| MARKDOWN_ONLY | Docker -e flag | build-docs.sh | Build mode selection |
GENERATION_DATE | build-docs.sh | process-template.py | Template variable |
GIT_REPO_URL | build-docs.sh (derived) | process-template.py | Template variable |
DEEPWIKI_URL | build-docs.sh (derived) | process-template.py | Template variable |
Sources: scripts/build-docs.sh:8-60 scripts/build-docs.sh:200-230
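The derived variables can be illustrated with a short Python sketch that extracts an owner/repo identifier from a git remote URL and builds the two URLs from it. The shell script's real parsing is not shown on this page, so the function names `repo_from_remote` and `derive_template_vars`, the regular expression, and the https://github.com/{REPO} URL shape are assumptions; the https://deepwiki.com/{REPO} shape follows the scraper description.

```python
import re
from typing import Optional


def repo_from_remote(remote_url: str) -> Optional[str]:
    """Extract "owner/repo" from an HTTPS or SSH GitHub remote URL, else None."""
    match = re.search(
        r"github\.com[:/]([^/\s]+)/([^/\s]+?)(?:\.git)?/?$", remote_url.strip()
    )
    return f"{match.group(1)}/{match.group(2)}" if match else None


def derive_template_vars(repo: str) -> dict:
    """Build the derived template variables from the REPO identifier."""
    return {
        "REPO": repo,
        "GIT_REPO_URL": f"https://github.com/{repo}",  # assumed URL shape
        "DEEPWIKI_URL": f"https://deepwiki.com/{repo}",
    }


if __name__ == "__main__":
    repo = repo_from_remote("git@github.com:owner/repo.git")
    if repo:
        print(derive_template_vars(repo))
```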
Component Responsibilities Matrix
The following table summarizes what each component is and is not responsible for:
| Component | Responsible For | Not Responsible For |
|---|---|---|
| build-docs.sh | Orchestration, environment validation, SUMMARY generation, template injection, output copying | Content extraction, HTML rendering, diagram normalization |
| deepwiki-scraper.py | HTTP requests, HTML parsing, markdown conversion, diagram extraction/normalization/matching | File system orchestration, mdBook integration, template processing |
| process-template.py | Variable substitution, conditional rendering in templates | File discovery, output management, HTML generation |
| mdbook | Markdown to HTML conversion, search index, navigation, theming | Content extraction, diagram processing, template injection |
| mdbook-mermaid | Mermaid library installation, diagram rendering configuration | Diagram extraction, diagram normalization, markdown conversion |
| Templates (*.html) | Define header/footer structure and variables | Variable substitution, file injection, content generation |
Sources: scripts/build-docs.sh:1-310 README.md:72-77