Component Reference
This page provides an overview of the three major components that make up the DeepWiki-to-mdBook Converter system and their responsibilities. Each component operates at a different layer of the technology stack (Shell, Python, Rust) and handles a specific phase of the documentation transformation pipeline.
For detailed documentation of each component's internal implementation, see:
- Shell orchestration logic: build-docs.sh Orchestrator
- Python scraping and enhancement algorithms: deepwiki-scraper.py
- Rust documentation building integration: mdBook Integration
System Component Architecture
The system consists of three primary executable components that work together in sequence, coordinated through file system operations and process execution.
Component Architecture Diagram
```mermaid
graph TB
subgraph "Shell Layer"
buildsh["build-docs.sh\nOrchestrator"]
end
subgraph "Python Layer"
scraper["deepwiki-scraper.py\nContent Processor"]
bs4["BeautifulSoup4\nHTML Parser"]
html2text["html2text\nMarkdown Converter"]
requests["requests\nHTTP Client"]
end
subgraph "Rust Layer"
mdbook["mdbook\nBinary"]
mermaid["mdbook-mermaid\nBinary"]
end
subgraph "Configuration Files"
booktoml["book.toml"]
summarymd["SUMMARY.md"]
end
subgraph "File System"
wikidir["$WIKI_DIR\nTemp Storage"]
outputdir["$OUTPUT_DIR\nFinal Output"]
end
buildsh -->|executes python3| scraper
buildsh -->|generates| booktoml
buildsh -->|generates| summarymd
buildsh -->|executes mdbook-mermaid| mermaid
buildsh -->|executes mdbook| mdbook
scraper -->|uses| bs4
scraper -->|uses| html2text
scraper -->|uses| requests
scraper -->|writes .md files| wikidir
mdbook -->|integrates| mermaid
mdbook -->|reads config| booktoml
mdbook -->|reads TOC| summarymd
mdbook -->|reads sources| wikidir
mdbook -->|writes HTML| outputdir
buildsh -->|copies files| outputdir
```
Sources: build-docs.sh:1-206 tools/deepwiki-scraper.py:1-920 Dockerfile:1-33
Component Execution Flow
This diagram shows the actual execution sequence with specific function calls and file operations that occur during a complete documentation build.
Execution Flow with Code Entities
```mermaid
sequenceDiagram
participant User
participant buildsh as "build-docs.sh"
participant scraper as "deepwiki-scraper.py::main()"
participant extract as "extract_wiki_structure()"
participant content as "extract_page_content()"
participant enhance as "extract_and_enhance_diagrams()"
participant mdbook as "mdbook binary"
participant mermaid as "mdbook-mermaid binary"
participant fs as "File System"
User->>buildsh: docker run -e REPO=...
buildsh->>buildsh: Parse $REPO, $BOOK_TITLE, etc
buildsh->>buildsh: Set WIKI_DIR=/workspace/wiki
buildsh->>scraper: python3 deepwiki-scraper.py $REPO $WIKI_DIR
scraper->>extract: extract_wiki_structure(repo, session)
extract->>extract: BeautifulSoup4 parsing
extract-->>scraper: pages[] array
loop For each page
scraper->>content: extract_page_content(url, session)
content->>content: convert_html_to_markdown()
content->>content: clean_deepwiki_footer()
content->>fs: Write to $WIKI_DIR/*.md
end
scraper->>enhance: extract_and_enhance_diagrams(repo, temp_dir)
enhance->>enhance: Extract diagrams from JavaScript
enhance->>enhance: Fuzzy match with progressive chunks
enhance->>fs: Update $WIKI_DIR/*.md with diagrams
scraper-->>buildsh: Exit 0
alt MARKDOWN_ONLY=true
buildsh->>fs: cp $WIKI_DIR/* $OUTPUT_DIR/
buildsh-->>User: Exit (skip mdBook)
else Full build
buildsh->>buildsh: Generate book.toml
buildsh->>buildsh: Generate SUMMARY.md from files
buildsh->>fs: mkdir $BOOK_DIR/src
buildsh->>fs: cp $WIKI_DIR/* $BOOK_DIR/src/
buildsh->>mermaid: mdbook-mermaid install $BOOK_DIR
mermaid->>fs: Install mermaid.js assets
buildsh->>mdbook: mdbook build
mdbook->>mdbook: Parse SUMMARY.md
mdbook->>mdbook: Process markdown files
mdbook->>mdbook: Render HTML with rust theme
mdbook->>fs: Write to $BOOK_DIR/book/
buildsh->>fs: cp $BOOK_DIR/book $OUTPUT_DIR/
buildsh->>fs: cp $WIKI_DIR $OUTPUT_DIR/markdown/
buildsh-->>User: Build complete
end
```
Sources: build-docs.sh:55-206 tools/deepwiki-scraper.py:790-916 tools/deepwiki-scraper.py:596-789
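The entire sequence above is triggered by a single container invocation. A minimal usage sketch (the image tag deepwiki-to-mdbook is illustrative; use whatever tag you built the image with):

```bash
# Full build: scrape DeepWiki, generate config, render HTML into ./output
docker run --rm \
  -e REPO=owner/repo \
  -e BOOK_TITLE="My Project Docs" \
  -v "$(pwd)/output:/output" \
  deepwiki-to-mdbook
```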
Component Responsibility Matrix
The following table details the specific responsibilities and capabilities of each component.
| Component | Type | Primary Responsibility | Key Functions/Operations | Input | Output |
|---|---|---|---|---|---|
| build-docs.sh | Shell Script | Orchestration and configuration | Parse environment variables; auto-detect Git repository; execute scraper; generate book.toml; generate SUMMARY.md; execute mdBook tools; copy outputs | Environment variables | Complete documentation site |
| deepwiki-scraper.py | Python Script | Content extraction and enhancement | extract_wiki_structure(); extract_page_content(); convert_html_to_markdown(); extract_and_enhance_diagrams(); clean_deepwiki_footer() | DeepWiki URL | Enhanced Markdown files |
| mdbook | Rust Binary | HTML generation | Parse SUMMARY.md; process Markdown; apply theme; generate navigation; enable search | Markdown + config | HTML documentation |
| mdbook-mermaid | Rust Binary | Diagram rendering | Install mermaid.js; install CSS assets; process mermaid code blocks | Markdown with mermaid | HTML with rendered diagrams |
Sources: build-docs.sh:1-206 tools/deepwiki-scraper.py:1-920 README.md:146-156
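Condensed to its essentials, the matrix above describes a four-phase pipeline. The following is a simplified sketch of that sequence, not the actual build-docs.sh, which adds validation, the markdown-only branch, and error handling:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Phase 1: extraction (Python layer)
python3 /usr/local/bin/deepwiki-scraper.py "$REPO" "$WIKI_DIR"

# Phase 2: configuration (Shell layer writes book.toml and SUMMARY.md)
mkdir -p "$BOOK_DIR/src"
cp -r "$WIKI_DIR"/. "$BOOK_DIR/src/"
# ... generate book.toml and SUMMARY.md here ...

# Phase 3: build (Rust layer)
mdbook-mermaid install "$BOOK_DIR"
mdbook build "$BOOK_DIR"

# Phase 4: publish
cp -r "$BOOK_DIR/book" "$OUTPUT_DIR/"
```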
Component File Locations
Each component resides in a specific location within the repository and Docker container, with distinct installation methods.
File System Layout
```mermaid
graph TB
subgraph "Repository Structure"
repo["/"]
buildscript["build-docs.sh\nOrchestrator script"]
dockerfile["Dockerfile\nMulti-stage build"]
toolsdir["tools/"]
scraper_py["deepwiki-scraper.py\nMain scraper"]
requirements["requirements.txt\nPython deps"]
repo --> buildscript
repo --> dockerfile
repo --> toolsdir
toolsdir --> scraper_py
toolsdir --> requirements
end
subgraph "Docker Container"
container["/"]
usrbin["/usr/local/bin/"]
buildsh_installed["build-docs.sh"]
scraper_installed["deepwiki-scraper.py"]
mdbook_bin["mdbook"]
mermaid_bin["mdbook-mermaid"]
workspace["/workspace"]
wikidir["/workspace/wiki"]
bookdir["/workspace/book"]
outputvol["/output"]
container --> usrbin
container --> workspace
container --> outputvol
usrbin --> buildsh_installed
usrbin --> scraper_installed
usrbin --> mdbook_bin
usrbin --> mermaid_bin
workspace --> wikidir
workspace --> bookdir
end
buildscript -.->|COPY| buildsh_installed
scraper_py -.->|COPY| scraper_installed
style buildsh_installed fill:#fff9c4
style scraper_installed fill:#e8f5e9
style mdbook_bin fill:#f3e5f5
style mermaid_bin fill:#f3e5f5
```
Sources: Dockerfile:1-33 build-docs.sh:27-30
Component Dependencies
Each component has specific external dependencies that must be available at runtime.
| Component | Runtime | Dependencies | Installation Method |
|---|---|---|---|
| build-docs.sh | bash | Git (optional, for auto-detection); Python 3.12+; mdbook binary; mdbook-mermaid binary | Bundled in Docker image |
| deepwiki-scraper.py | Python 3.12 | requests (HTTP client); beautifulsoup4 (HTML parsing); html2text (Markdown conversion) | uv pip install -r requirements.txt |
| mdbook | Native Binary | None at runtime (compiled from Rust source) | cargo install mdbook |
| mdbook-mermaid | Native Binary | None at runtime (compiled from Rust source) | cargo install mdbook-mermaid |
Sources: Dockerfile:1-33 tools/requirements.txt README.md:154-156
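Given the table above, tools/requirements.txt presumably lists just the three Python packages (any version pins are omitted here):

```
requests
beautifulsoup4
html2text
```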
Component Communication Protocol
Components communicate exclusively through the file system and process exit codes, with no direct API calls or shared memory.
Inter-Component Communication
```mermaid
graph LR
subgraph "Phase 1: Extraction"
buildsh1["build-docs.sh"]
scraper1["deepwiki-scraper.py"]
env["Environment:\n$REPO\n$WIKI_DIR"]
wikidir1["$WIKI_DIR/\n*.md files"]
buildsh1 -->|sets| env
env -->|python3 scraper.py $REPO $WIKI_DIR| scraper1
scraper1 -->|writes| wikidir1
scraper1 -.->|exit 0| buildsh1
end
subgraph "Phase 2: Configuration"
buildsh2["build-docs.sh"]
booktoml2["book.toml"]
summarymd2["SUMMARY.md"]
wikidir2["$WIKI_DIR/\nfile scan"]
buildsh2 -->|reads structure| wikidir2
buildsh2 -->|cat > book.toml| booktoml2
buildsh2 -->|generates from files| summarymd2
end
subgraph "Phase 3: Build"
buildsh3["build-docs.sh"]
mermaid3["mdbook-mermaid"]
mdbook3["mdbook"]
config3["book.toml\nSUMMARY.md\nsrc/*.md"]
output3["$OUTPUT_DIR/\nbook/"]
buildsh3 -->|mdbook-mermaid install| mermaid3
mermaid3 -->|writes assets| config3
buildsh3 -->|mdbook build| mdbook3
mdbook3 -->|reads| config3
mdbook3 -->|writes| output3
mdbook3 -.->|exit 0| buildsh3
end
wikidir1 -->|same files| wikidir2
```
Sources: build-docs.sh:55-206
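Because the only upstream signals are exit codes, the orchestrator can rely on ordinary shell error handling. A hedged sketch of Phase 1, including the MARKDOWN_ONLY short-circuit shown in the execution flow:

```bash
# Abort the whole build if the scraper exits non-zero.
if ! python3 /usr/local/bin/deepwiki-scraper.py "$REPO" "$WIKI_DIR"; then
    echo "ERROR: scraping failed" >&2
    exit 1
fi

# Markdown-only mode copies the files and skips the Rust layer entirely.
if [ "${MARKDOWN_ONLY:-false}" = "true" ]; then
    cp -r "$WIKI_DIR"/. "$OUTPUT_DIR/"
    exit 0
fi
```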
Environment Variable Interface
The orchestrator component accepts configuration through environment variables, which control all aspects of system behavior.
| Variable | Purpose | Default | Used By | Set At |
|---|---|---|---|---|
| $REPO | GitHub repository identifier | Auto-detected | build-docs.sh, deepwiki-scraper.py | build-docs.sh:9-19 |
| $BOOK_TITLE | Documentation title | "Documentation" | build-docs.sh (book.toml) | build-docs.sh:23 |
| $BOOK_AUTHORS | Author name(s) | Extracted from $REPO | build-docs.sh (book.toml) | build-docs.sh:24-44 |
| $GIT_REPO_URL | Source repository URL | Constructed from $REPO | build-docs.sh (book.toml) | build-docs.sh:25-45 |
| $MARKDOWN_ONLY | Skip mdBook build | "false" | build-docs.sh | build-docs.sh:26-76 |
| $WORK_DIR | Working directory | "/workspace" | build-docs.sh | build-docs.sh:27 |
| $WIKI_DIR | Temp Markdown storage | "$WORK_DIR/wiki" | build-docs.sh, deepwiki-scraper.py | build-docs.sh:28 |
| $OUTPUT_DIR | Final output location | "/output" | build-docs.sh | build-docs.sh:29 |
| $BOOK_DIR | mdBook workspace | "$WORK_DIR/book" | build-docs.sh | build-docs.sh:30 |
Sources: build-docs.sh:8-30 build-docs.sh:43-45
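The defaults in this table map directly onto shell parameter expansion. A sketch of how such an interface is conventionally declared (the real script may differ in detail):

```bash
# Each variable falls back to its default; all can be overridden with -e at docker run.
BOOK_TITLE="${BOOK_TITLE:-Documentation}"
MARKDOWN_ONLY="${MARKDOWN_ONLY:-false}"
WORK_DIR="${WORK_DIR:-/workspace}"
WIKI_DIR="${WIKI_DIR:-$WORK_DIR/wiki}"
BOOK_DIR="${BOOK_DIR:-$WORK_DIR/book}"
OUTPUT_DIR="${OUTPUT_DIR:-/output}"
# REPO, BOOK_AUTHORS, and GIT_REPO_URL are auto-detected or derived when unset.
```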
Python Module Structure
The deepwiki-scraper.py component is organized as a single-file script with a clear functional hierarchy.
Python Function Call Graph
```mermaid
graph TD
main["main()\nEntry point"]
extract_struct["extract_wiki_structure()\nDiscover pages"]
extract_content["extract_page_content()\nProcess single page"]
enhance["extract_and_enhance_diagrams()\nAdd diagrams"]
fetch["fetch_page()\nHTTP with retries"]
sanitize["sanitize_filename()\nClean filenames"]
convert["convert_html_to_markdown()\nHTML→MD"]
clean["clean_deepwiki_footer()\nRemove UI"]
extract_mermaid["extract_mermaid_from_nextjs_data()\nParse JS payload"]
main --> extract_struct
main --> extract_content
main --> enhance
extract_struct --> fetch
extract_content --> fetch
extract_content --> convert
convert --> clean
enhance --> fetch
enhance --> extract_mermaid
extract_content --> sanitize
```
Sources: tools/deepwiki-scraper.py:790-919 tools/deepwiki-scraper.py:78-125 tools/deepwiki-scraper.py:453-594 tools/deepwiki-scraper.py:596-789
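As the execution flow shows, the script takes two positional arguments: the repository identifier and the output directory. Invoked standalone (paths illustrative):

```bash
# Usage: deepwiki-scraper.py <owner/repo> <output-dir>
python3 tools/deepwiki-scraper.py owner/repo /workspace/wiki
```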
Shell Script Structure
The build-docs.sh orchestrator follows a linear execution model with conditional branching for markdown-only mode.
Shell Script Execution Blocks
```mermaid
graph TB
start["Start"]
detect["Auto-detect Git repository\nlines 9-19"]
validate["Validate configuration\nlines 32-53"]
step1["Step 1: Execute scraper\nline 58:\npython3 deepwiki-scraper.py"]
check{"MARKDOWN_ONLY\n== true?"}
markdown_exit["Copy markdown only\nlines 64-76\nexit 0"]
step2["Step 2: Initialize mdBook\nlines 79-106:\nmkdir, cat > book.toml"]
step3["Step 3: Generate SUMMARY.md\nlines 109-159:\nscan files, generate TOC"]
step4["Step 4: Copy sources\nlines 164-166:\ncp wiki/* src/"]
step5["Step 5: Install mermaid\nlines 169-171:\nmdbook-mermaid install"]
step6["Step 6: Build book\nlines 174-176:\nmdbook build"]
step7["Step 7: Copy outputs\nlines 179-191:\ncp to /output"]
done["Done"]
start --> detect
detect --> validate
validate --> step1
step1 --> check
check -->|yes| markdown_exit
markdown_exit --> done
check -->|no| step2
step2 --> step3
step3 --> step4
step4 --> step5
step5 --> step6
step6 --> step7
step7 --> done
```
Sources: build-docs.sh:1-206
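Step 2 generates book.toml with a heredoc (cat > book.toml). A minimal sketch consistent with the sections listed in the data-format table below; the exact keys in the real script may differ:

```bash
# Write mdBook configuration from environment variables.
cat > "$BOOK_DIR/book.toml" <<EOF
[book]
title = "$BOOK_TITLE"
authors = ["$BOOK_AUTHORS"]

[output.html]
git-repository-url = "$GIT_REPO_URL"

[preprocessor.mermaid]
command = "mdbook-mermaid"
EOF
```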
Cross-Component Data Formats
Data passes between components in well-defined formats through the file system.
| Data Format | Producer | Consumer | Location | Structure |
|---|---|---|---|---|
| Enhanced Markdown | deepwiki-scraper.py | mdbook | $WIKI_DIR/*.md | UTF-8 text, optional front matter, mermaid code blocks |
| book.toml | build-docs.sh | mdbook | $BOOK_DIR/book.toml | TOML; sections: [book], [output.html], [preprocessor.mermaid] |
| SUMMARY.md | build-docs.sh | mdbook | $BOOK_DIR/src/SUMMARY.md | Markdown list of relative file paths |
| File hierarchy | deepwiki-scraper.py | build-docs.sh | $WIKI_DIR/ and $WIKI_DIR/section-*/ | Root: N-title.md; subsections: section-N/N-M-title.md |
| HTML output | mdbook | User | $OUTPUT_DIR/book/ | Complete static site with search index |
Sources: build-docs.sh:84-103 build-docs.sh:112-159 tools/deepwiki-scraper.py:849-868
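For example, the file hierarchy the scraper writes and the SUMMARY.md generator scans might look like this (titles illustrative):

```
$WIKI_DIR/
├── 1-overview.md
├── 2-architecture.md
└── section-2/
    ├── 2-1-components.md
    └── 2-2-data-flow.md
```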
Component Installation in Docker
The multi-stage Docker build process installs each component using its native tooling, then combines them in a minimal runtime image.
Docker Build Process
```mermaid
graph TB
subgraph "Stage 1: rust:latest"
rust_base["rust:latest base\n~1.5 GB"]
cargo["cargo install"]
mdbook_build["mdbook binary\ncompilation"]
mermaid_build["mdbook-mermaid binary\ncompilation"]
rust_base --> cargo
cargo --> mdbook_build
cargo --> mermaid_build
end
subgraph "Stage 2: python:3.12-slim"
py_base["python:3.12-slim base\n~150 MB"]
uv_install["Install uv package manager"]
pip_install["uv pip install\nrequirements.txt"]
copy_rust["COPY --from=builder\nRust binaries"]
copy_scripts["COPY Python + Shell scripts"]
py_base --> uv_install
uv_install --> pip_install
pip_install --> copy_rust
copy_rust --> copy_scripts
end
subgraph "Final Image Contents"
final["/usr/local/bin/"]
build_sh["build-docs.sh"]
scraper_py["deepwiki-scraper.py"]
mdbook_final["mdbook"]
mermaid_final["mdbook-mermaid"]
final --> build_sh
final --> scraper_py
final --> mdbook_final
final --> mermaid_final
end
mdbook_build -.->|extract| copy_rust
mermaid_build -.->|extract| copy_rust
copy_scripts --> build_sh
copy_scripts --> scraper_py
copy_rust --> mdbook_final
copy_rust --> mermaid_final
```
Sources: Dockerfile:1-33
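Building the image and exercising the markdown-only branch can be sketched as follows (the tag deepwiki-to-mdbook is illustrative):

```bash
# Multi-stage build: Stage 1 compiles the Rust binaries, Stage 2 assembles the slim runtime image.
docker build -t deepwiki-to-mdbook .

# Skip the Rust layer and emit only the enhanced Markdown files.
docker run --rm \
  -e REPO=owner/repo \
  -e MARKDOWN_ONLY=true \
  -v "$(pwd)/output:/output" \
  deepwiki-to-mdbook
```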
Next Steps
For detailed implementation documentation of each component, see:
- build-docs.sh Orchestrator : Environment variable parsing, Git auto-detection, configuration file generation, subprocess execution, error handling
- deepwiki-scraper.py : Wiki structure discovery, HTML parsing, Markdown conversion, diagram extraction algorithms, fuzzy matching implementation
- mdBook Integration : Configuration schema, SUMMARY.md generation algorithm, mdbook-mermaid preprocessor integration, theme customization