Component Reference

This page provides an overview of the three major components that comprise the DeepWiki-to-mdBook Converter system and their responsibilities. Each component operates at a different layer of the technology stack (Shell, Python, Rust) and handles a specific phase of the documentation transformation pipeline.

For detailed documentation of each component's internal implementation, see the component pages listed under Next Steps at the end of this page.

System Component Architecture

The system consists of three primary executable components that work together in sequence, coordinated through file system operations and process execution.

Component Architecture Diagram

```mermaid
graph TB
    subgraph "Shell Layer"
        buildsh["build-docs.sh\nOrchestrator"]
    end

    subgraph "Python Layer"
        scraper["deepwiki-scraper.py\nContent Processor"]
        bs4["BeautifulSoup4\nHTML Parser"]
        html2text["html2text\nMarkdown Converter"]
        requests["requests\nHTTP Client"]
    end

    subgraph "Rust Layer"
        mdbook["mdbook\nBinary"]
        mermaid["mdbook-mermaid\nBinary"]
    end

    subgraph "Configuration Files"
        booktoml["book.toml"]
        summarymd["SUMMARY.md"]
    end

    subgraph "File System"
        wikidir["$WIKI_DIR\nTemp Storage"]
        outputdir["$OUTPUT_DIR\nFinal Output"]
    end

    buildsh -->|executes python3| scraper
    buildsh -->|generates| booktoml
    buildsh -->|generates| summarymd
    buildsh -->|executes mdbook-mermaid| mermaid
    buildsh -->|executes mdbook| mdbook

    scraper -->|uses| bs4
    scraper -->|uses| html2text
    scraper -->|uses| requests
    scraper -->|writes .md files| wikidir

    mdbook -->|integrates| mermaid
    mdbook -->|reads config| booktoml
    mdbook -->|reads TOC| summarymd
    mdbook -->|reads sources| wikidir
    mdbook -->|writes HTML| outputdir

    buildsh -->|copies files| outputdir
```
Sources: build-docs.sh:1-206 tools/deepwiki-scraper.py:1-920 Dockerfile:1-33

Component Execution Flow

This diagram shows the actual execution sequence with specific function calls and file operations that occur during a complete documentation build.

Execution Flow with Code Entities

```mermaid
sequenceDiagram
    participant User
    participant buildsh as "build-docs.sh"
    participant scraper as "deepwiki-scraper.py::main()"
    participant extract as "extract_wiki_structure()"
    participant content as "extract_page_content()"
    participant enhance as "extract_and_enhance_diagrams()"
    participant mdbook as "mdbook binary"
    participant mermaid as "mdbook-mermaid binary"
    participant fs as "File System"

    User->>buildsh: docker run -e REPO=...
    buildsh->>buildsh: Parse $REPO, $BOOK_TITLE, etc
    buildsh->>buildsh: Set WIKI_DIR=/workspace/wiki

    buildsh->>scraper: python3 deepwiki-scraper.py $REPO $WIKI_DIR
    scraper->>extract: extract_wiki_structure(repo, session)
    extract->>extract: BeautifulSoup4 parsing
    extract-->>scraper: pages[] array

    loop For each page
        scraper->>content: extract_page_content(url, session)
        content->>content: convert_html_to_markdown()
        content->>content: clean_deepwiki_footer()
        content->>fs: Write to $WIKI_DIR/*.md
    end

    scraper->>enhance: extract_and_enhance_diagrams(repo, temp_dir)
    enhance->>enhance: Extract diagrams from JavaScript
    enhance->>enhance: Fuzzy match with progressive chunks
    enhance->>fs: Update $WIKI_DIR/*.md with diagrams
    scraper-->>buildsh: Exit 0

    alt MARKDOWN_ONLY=true
        buildsh->>fs: cp $WIKI_DIR/* $OUTPUT_DIR/
        buildsh-->>User: Exit (skip mdBook)
    else Full build
        buildsh->>buildsh: Generate book.toml
        buildsh->>buildsh: Generate SUMMARY.md from files
        buildsh->>fs: mkdir $BOOK_DIR/src
        buildsh->>fs: cp $WIKI_DIR/* $BOOK_DIR/src/

        buildsh->>mermaid: mdbook-mermaid install $BOOK_DIR
        mermaid->>fs: Install mermaid.js assets

        buildsh->>mdbook: mdbook build
        mdbook->>mdbook: Parse SUMMARY.md
        mdbook->>mdbook: Process markdown files
        mdbook->>mdbook: Render HTML with rust theme
        mdbook->>fs: Write to $BOOK_DIR/book/

        buildsh->>fs: cp $BOOK_DIR/book $OUTPUT_DIR/
        buildsh->>fs: cp $WIKI_DIR $OUTPUT_DIR/markdown/
        buildsh-->>User: Build complete
    end
```

Sources: build-docs.sh:55-206 tools/deepwiki-scraper.py:790-916 tools/deepwiki-scraper.py:596-789

Component Responsibility Matrix

The following table details the specific responsibilities and capabilities of each component.

| Component | Type | Primary Responsibility | Key Functions/Operations | Input | Output |
|---|---|---|---|---|---|
| build-docs.sh | Shell script | Orchestration and configuration | Parse environment variables; auto-detect Git repository; execute scraper; generate book.toml; generate SUMMARY.md; execute mdBook tools; copy outputs | Environment variables | Complete documentation site |
| deepwiki-scraper.py | Python script | Content extraction and enhancement | extract_wiki_structure(); extract_page_content(); convert_html_to_markdown(); extract_and_enhance_diagrams(); clean_deepwiki_footer() | DeepWiki URL | Enhanced Markdown files |
| mdbook | Rust binary | HTML generation | Parse SUMMARY.md; process Markdown; apply theme; generate navigation; enable search | Markdown + config | HTML documentation |
| mdbook-mermaid | Rust binary | Diagram rendering | Install mermaid.js; install CSS assets; process mermaid code blocks | Markdown with mermaid | HTML with rendered diagrams |

Sources: build-docs.sh:1-206 tools/deepwiki-scraper.py:1-920 README.md:146-156
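As a concrete illustration of the orchestrator's "generate book.toml" responsibility, the generated configuration plausibly looks like the fragment below. The section names match the Cross-Component Data Formats table; the individual key values here are assumptions, not copied from build-docs.sh.

```toml
# Illustrative sketch of the generated book.toml.
# Title, authors, and URL come from $BOOK_TITLE, $BOOK_AUTHORS, $GIT_REPO_URL.
[book]
title = "Documentation"
authors = ["example-user"]
src = "src"

[output.html]
git-repository-url = "https://github.com/example-user/example-repo"

[preprocessor.mermaid]
command = "mdbook-mermaid"
```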

Component File Locations

Each component resides in a specific location within the repository and Docker container, with distinct installation methods.

File System Layout

```mermaid
graph TB
    subgraph "Repository Structure"
        repo["/"]
        buildscript["build-docs.sh\nOrchestrator script"]
        dockerfile["Dockerfile\nMulti-stage build"]
        toolsdir["tools/"]
        scraper_py["deepwiki-scraper.py\nMain scraper"]
        requirements["requirements.txt\nPython deps"]

        repo --> buildscript
        repo --> dockerfile
        repo --> toolsdir
        toolsdir --> scraper_py
        toolsdir --> requirements
    end

    subgraph "Docker Container"
        container["/"]
        usrbin["/usr/local/bin/"]
        buildsh_installed["build-docs.sh"]
        scraper_installed["deepwiki-scraper.py"]
        mdbook_bin["mdbook"]
        mermaid_bin["mdbook-mermaid"]
        workspace["/workspace"]
        wikidir["/workspace/wiki"]
        bookdir["/workspace/book"]
        outputvol["/output"]

        container --> usrbin
        container --> workspace
        container --> outputvol

        usrbin --> buildsh_installed
        usrbin --> scraper_installed
        usrbin --> mdbook_bin
        usrbin --> mermaid_bin

        workspace --> wikidir
        workspace --> bookdir
    end

    buildscript -.->|COPY| buildsh_installed
    scraper_py -.->|COPY| scraper_installed

    style buildsh_installed fill:#fff9c4
    style scraper_installed fill:#e8f5e9
    style mdbook_bin fill:#f3e5f5
    style mermaid_bin fill:#f3e5f5
```

Sources: Dockerfile:1-33 build-docs.sh:27-30

Component Dependencies

Each component has specific external dependencies that must be available at runtime.

| Component | Runtime | Dependencies | Installation Method |
|---|---|---|---|
| build-docs.sh | bash | Git (optional, for auto-detection); Python 3.12+; mdbook binary; mdbook-mermaid binary | Bundled in Docker |
| deepwiki-scraper.py | Python 3.12 | requests (HTTP client); beautifulsoup4 (HTML parsing); html2text (Markdown conversion) | uv pip install -r requirements.txt |
| mdbook | Native binary | Compiled from Rust source; no runtime dependencies | cargo install mdbook |
| mdbook-mermaid | Native binary | Compiled from Rust source; no runtime dependencies | cargo install mdbook-mermaid |

Sources: Dockerfile:1-33 tools/requirements.txt README.md:154-156
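Based on the dependency table above, tools/requirements.txt plausibly lists just the three libraries; any version pins are not shown on this page.

```
requests
beautifulsoup4
html2text
```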

Component Communication Protocol

Components communicate exclusively through the file system and process exit codes, with no direct API calls or shared memory.

Inter-Component Communication

```mermaid
graph LR
    subgraph "Phase 1: Extraction"
        buildsh1["build-docs.sh"]
        scraper1["deepwiki-scraper.py"]
        env["Environment:\n$REPO\n$WIKI_DIR"]
        wikidir1["$WIKI_DIR/\n*.md files"]

        buildsh1 -->|sets| env
        env -->|python3 scraper.py $REPO $WIKI_DIR| scraper1
        scraper1 -->|writes| wikidir1
        scraper1 -.->|exit 0| buildsh1
    end

    subgraph "Phase 2: Configuration"
        buildsh2["build-docs.sh"]
        booktoml2["book.toml"]
        summarymd2["SUMMARY.md"]
        wikidir2["$WIKI_DIR/\nfile scan"]

        buildsh2 -->|reads structure| wikidir2
        buildsh2 -->|cat > book.toml| booktoml2
        buildsh2 -->|generates from files| summarymd2
    end

    subgraph "Phase 3: Build"
        buildsh3["build-docs.sh"]
        mermaid3["mdbook-mermaid"]
        mdbook3["mdbook"]
        config3["book.toml\nSUMMARY.md\nsrc/*.md"]
        output3["$OUTPUT_DIR/\nbook/"]

        buildsh3 -->|mdbook-mermaid install| mermaid3
        mermaid3 -->|writes assets| config3
        buildsh3 -->|mdbook build| mdbook3
        mdbook3 -->|reads| config3
        mdbook3 -->|writes| output3
        mdbook3 -.->|exit 0| buildsh3
    end

    wikidir1 -->|same files| wikidir2
```

Sources: build-docs.sh:55-206
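The exit-code contract between phases can be sketched in a few lines. The real orchestrator is shell code; run_step below is an illustrative Python helper, not code from build-docs.sh.

```python
import subprocess


def run_step(cmd: list[str]) -> None:
    """Run one pipeline step as a child process.

    Components communicate only via the file system and exit codes:
    0 means the step succeeded, anything else aborts the build.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise SystemExit(f"{cmd[0]} failed with exit code {result.returncode}")
```

For example, Phase 1 amounts to run_step(["python3", "deepwiki-scraper.py", repo, wiki_dir]), and Phase 3 to run_step(["mdbook", "build"]).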

Environment Variable Interface

The orchestrator component accepts configuration through environment variables, which control all aspects of system behavior.

| Variable | Purpose | Default | Used By | Set At |
|---|---|---|---|---|
| $REPO | GitHub repository identifier | Auto-detected | build-docs.sh, deepwiki-scraper.py | build-docs.sh:9-19 |
| $BOOK_TITLE | Documentation title | "Documentation" | build-docs.sh (book.toml) | build-docs.sh:23 |
| $BOOK_AUTHORS | Author name(s) | Extracted from $REPO | build-docs.sh (book.toml) | build-docs.sh:24-44 |
| $GIT_REPO_URL | Source repository URL | Constructed from $REPO | build-docs.sh (book.toml) | build-docs.sh:25-45 |
| $MARKDOWN_ONLY | Skip mdBook build | "false" | build-docs.sh | build-docs.sh:26-76 |
| $WORK_DIR | Working directory | "/workspace" | build-docs.sh | build-docs.sh:27 |
| $WIKI_DIR | Temp markdown storage | "$WORK_DIR/wiki" | build-docs.sh, deepwiki-scraper.py | build-docs.sh:28 |
| $OUTPUT_DIR | Final output location | "/output" | build-docs.sh | build-docs.sh:29 |
| $BOOK_DIR | mdBook workspace | "$WORK_DIR/book" | build-docs.sh | build-docs.sh:30 |

Sources: build-docs.sh:8-30 build-docs.sh:43-45
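The environment interface above can be sketched as a small config loader. Variable names and defaults come from the table; load_config itself is illustrative (the real parsing is shell code in build-docs.sh).

```python
import os


def load_config() -> dict:
    """Illustrative sketch of the orchestrator's environment interface."""
    work_dir = os.environ.get("WORK_DIR", "/workspace")
    return {
        "REPO": os.environ.get("REPO", ""),  # auto-detected when empty
        "BOOK_TITLE": os.environ.get("BOOK_TITLE", "Documentation"),
        "MARKDOWN_ONLY": os.environ.get("MARKDOWN_ONLY", "false") == "true",
        "WORK_DIR": work_dir,
        # Derived paths default relative to $WORK_DIR, as in the table.
        "WIKI_DIR": os.environ.get("WIKI_DIR", f"{work_dir}/wiki"),
        "BOOK_DIR": os.environ.get("BOOK_DIR", f"{work_dir}/book"),
        "OUTPUT_DIR": os.environ.get("OUTPUT_DIR", "/output"),
    }
```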

Python Module Structure

The deepwiki-scraper.py component is organized as a single-file script with a clear functional hierarchy.

Python Function Call Graph

```mermaid
graph TD
    main["main()\nEntry point"]
    extract_struct["extract_wiki_structure()\nDiscover pages"]
    extract_content["extract_page_content()\nProcess single page"]
    enhance["extract_and_enhance_diagrams()\nAdd diagrams"]
    fetch["fetch_page()\nHTTP with retries"]
    sanitize["sanitize_filename()\nClean filenames"]
    convert["convert_html_to_markdown()\nHTML→MD"]
    clean["clean_deepwiki_footer()\nRemove UI"]
    extract_mermaid["extract_mermaid_from_nextjs_data()\nParse JS payload"]

    main --> extract_struct
    main --> extract_content
    main --> enhance

    extract_struct --> fetch
    extract_content --> fetch
    extract_content --> convert
    extract_content --> sanitize

    convert --> clean

    enhance --> fetch
    enhance --> extract_mermaid
```
Sources: tools/deepwiki-scraper.py:790-919 tools/deepwiki-scraper.py:78-125 tools/deepwiki-scraper.py:453-594 tools/deepwiki-scraper.py:596-789
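The call graph describes fetch_page() as "HTTP with retries". A hedged sketch of that retry shape, with the HTTP call abstracted to a plain callable for illustration (the real script uses a requests session, and its retry count and backoff may differ):

```python
import time


def fetch_page(fetch, url, retries=3, backoff=1.0):
    """Call fetch(url), retrying on failure with linear backoff (assumed)."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as exc:  # the real code likely narrows to HTTP errors
            last_error = exc
            time.sleep(backoff * (attempt + 1))
    raise last_error
```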

Shell Script Structure

The build-docs.sh orchestrator follows a linear execution model with conditional branching for markdown-only mode.

Shell Script Execution Blocks

```mermaid
graph TB
    start["Start"]
    detect["Auto-detect Git repository\nlines 9-19"]
    validate["Validate configuration\nlines 32-53"]
    step1["Step 1: Execute scraper\nline 58:\npython3 deepwiki-scraper.py"]
    check{"MARKDOWN_ONLY\n== true?"}
    markdown_exit["Copy markdown only\nlines 64-76\nexit 0"]
    step2["Step 2: Initialize mdBook\nlines 79-106:\nmkdir, cat > book.toml"]
    step3["Step 3: Generate SUMMARY.md\nlines 109-159:\nscan files, generate TOC"]
    step4["Step 4: Copy sources\nlines 164-166:\ncp wiki/* src/"]
    step5["Step 5: Install mermaid\nlines 169-171:\nmdbook-mermaid install"]
    step6["Step 6: Build book\nlines 174-176:\nmdbook build"]
    step7["Step 7: Copy outputs\nlines 179-191:\ncp to /output"]
    done["Done"]

    start --> detect
    detect --> validate
    validate --> step1
    step1 --> check

    check -->|yes| markdown_exit
    markdown_exit --> done

    check -->|no| step2
    step2 --> step3
    step3 --> step4
    step4 --> step5
    step5 --> step6
    step6 --> step7
    step7 --> done
```

Sources: build-docs.sh:1-206

Cross-Component Data Formats

Data passes between components in well-defined formats through the file system.

| Data Format | Producer | Consumer | Location | Structure |
|---|---|---|---|---|
| Enhanced Markdown | deepwiki-scraper.py | mdbook | $WIKI_DIR/*.md | UTF-8 text, optional front matter, mermaid code blocks |
| book.toml | build-docs.sh | mdbook | $BOOK_DIR/book.toml | TOML format; sections: [book], [output.html], [preprocessor.mermaid] |
| SUMMARY.md | build-docs.sh | mdbook | $BOOK_DIR/src/SUMMARY.md | Markdown list format, relative file paths |
| File hierarchy | deepwiki-scraper.py | build-docs.sh | $WIKI_DIR/ and $WIKI_DIR/section-*/ | Root: N-title.md; subsections: section-N/N-M-title.md |
| HTML output | mdbook | User | $OUTPUT_DIR/book/ | Complete static site with search index |

Sources: build-docs.sh:84-103 build-docs.sh:112-159 tools/deepwiki-scraper.py:849-868
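Given the file-hierarchy format in the table above (root pages N-title.md, subsections in section-N/N-M-title.md), SUMMARY.md generation can be sketched as follows. generate_summary is illustrative; the real logic is shell code in build-docs.sh:109-159 and may derive titles differently.

```python
import re
from pathlib import Path


def generate_summary(wiki_dir: Path) -> str:
    """Sketch: build an mdBook SUMMARY.md from the scraper's file layout."""

    def title_of(path: Path) -> str:
        stem = re.sub(r"^\d+(-\d+)?-", "", path.stem)  # drop numeric prefix
        return stem.replace("-", " ").title()

    lines = ["# Summary", ""]
    # Lexicographic sort for simplicity; the real script may sort numerically.
    for page in sorted(wiki_dir.glob("*.md")):
        lines.append(f"- [{title_of(page)}]({page.name})")
        num = page.name.split("-", 1)[0]
        sub = wiki_dir / f"section-{num}"
        if sub.is_dir():
            for child in sorted(sub.glob("*.md")):
                lines.append(f"  - [{title_of(child)}](section-{num}/{child.name})")
    return "\n".join(lines) + "\n"
```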

Component Installation in Docker

The multi-stage Docker build process installs each component using its native tooling, then combines them in a minimal runtime image.

Docker Build Process

```mermaid
graph TB
    subgraph "Stage 1: rust:latest"
        rust_base["rust:latest base\n~1.5 GB"]
        cargo["cargo install"]
        mdbook_build["mdbook binary\ncompilation"]
        mermaid_build["mdbook-mermaid binary\ncompilation"]

        rust_base --> cargo
        cargo --> mdbook_build
        cargo --> mermaid_build
    end

    subgraph "Stage 2: python:3.12-slim"
        py_base["python:3.12-slim base\n~150 MB"]
        uv_install["Install uv package manager"]
        pip_install["uv pip install\nrequirements.txt"]
        copy_rust["COPY --from=builder\nRust binaries"]
        copy_scripts["COPY Python + Shell scripts"]

        py_base --> uv_install
        uv_install --> pip_install
        pip_install --> copy_rust
        copy_rust --> copy_scripts
    end

    subgraph "Final Image Contents"
        final["/usr/local/bin/"]
        build_sh["build-docs.sh"]
        scraper_py["deepwiki-scraper.py"]
        mdbook_final["mdbook"]
        mermaid_final["mdbook-mermaid"]

        final --> build_sh
        final --> scraper_py
        final --> mdbook_final
        final --> mermaid_final
    end

    mdbook_build -.->|extract| copy_rust
    mermaid_build -.->|extract| copy_rust

    copy_scripts --> build_sh
    copy_scripts --> scraper_py
    copy_rust --> mdbook_final
    copy_rust --> mermaid_final
```

Sources: Dockerfile:1-33
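The two-stage structure described above can be sketched as a Dockerfile fragment. This is a hedged reconstruction from the diagram, not the repository's actual Dockerfile; cargo binary paths and the uv invocation are assumptions.

```dockerfile
# Stage 1: compile the Rust tools
FROM rust:latest AS builder
RUN cargo install mdbook mdbook-mermaid

# Stage 2: minimal Python runtime with the compiled binaries copied in
FROM python:3.12-slim
COPY --from=builder /usr/local/cargo/bin/mdbook /usr/local/bin/
COPY --from=builder /usr/local/cargo/bin/mdbook-mermaid /usr/local/bin/
COPY tools/requirements.txt /tmp/requirements.txt
RUN pip install uv && uv pip install --system -r /tmp/requirements.txt
COPY build-docs.sh tools/deepwiki-scraper.py /usr/local/bin/
ENTRYPOINT ["build-docs.sh"]
```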

Next Steps

For detailed implementation documentation of each component, see:

  • build-docs.sh Orchestrator: Environment variable parsing, Git auto-detection, configuration file generation, subprocess execution, error handling
  • deepwiki-scraper.py: Wiki structure discovery, HTML parsing, Markdown conversion, diagram extraction algorithms, fuzzy matching implementation
  • mdBook Integration: Configuration schema, SUMMARY.md generation algorithm, mdbook-mermaid preprocessor integration, theme customization