Development Guide

This page provides guidance for developers who want to modify, extend, or contribute to the DeepWiki-to-mdBook Converter system. It covers the development environment setup, local workflow, testing procedures, and key considerations when working with the codebase.

For detailed information about the repository structure, see Project File Structure. For instructions on building the Docker image, see Building the Docker Image. For Python dependency details, see Python Dependencies.

Development Environment Requirements

The system is designed to run entirely within Docker, but local development requires the following tools:

| Tool | Purpose | Version |
|------|---------|---------|
| Docker | Container runtime | Latest stable |
| Git | Version control | 2.x or later |
| Text editor/IDE | Code editing | Any (VS Code recommended) |
| Python | Local testing (optional) | 3.12+ |
| Rust toolchain | Local testing (optional) | Latest stable |

The Docker image handles all runtime dependencies, so local installation of Python and Rust is optional and only needed for testing individual components outside the container.

Sources: Dockerfile:1-33

Development Workflow Architecture

The following diagram shows the typical development cycle and how different components interact during development:

Development Workflow Diagram: Shows the cycle from editing code, to building the Docker image, to testing with a mounted output volume.

graph TB
    subgraph "Development Environment"
        Editor["Code Editor"]
        GitRepo["Local Git Repository"]
    end

    subgraph "Docker Build Process"
        BuildCmd["docker build -t deepwiki-scraper ."]
        Stage1["Rust Builder Stage\nCompiles mdbook binaries"]
        Stage2["Python Runtime Stage\nAssembles final image"]
        FinalImage["deepwiki-scraper:latest"]
    end

    subgraph "Testing & Validation"
        RunCmd["docker run with test params"]
        OutputMount["Volume mount: ./output"]
        Validation["Manual inspection of output"]
    end

    subgraph "Key Development Files"
        Dockerfile["Dockerfile"]
        BuildScript["build-docs.sh"]
        Scraper["tools/deepwiki-scraper.py"]
        Requirements["tools/requirements.txt"]
    end

    Editor -->|Edit| GitRepo
    GitRepo --> Dockerfile
    GitRepo --> BuildScript
    GitRepo --> Scraper
    GitRepo --> Requirements

    BuildCmd --> Stage1
    Stage1 --> Stage2
    Stage2 --> FinalImage

    FinalImage --> RunCmd
    RunCmd --> OutputMount
    OutputMount --> Validation

    Validation -.->|Iterate| Editor

Sources: Dockerfile:1-33 build-docs.sh:1-206

Component Development Map

This diagram bridges system concepts to actual code entities, showing which files implement which functionality:

Code Entity Mapping Diagram: Maps system functionality to specific code locations, file paths, and binaries.

graph LR
    subgraph "Entry Point Layer"
        CMD["CMD in Dockerfile:32"]
        BuildDocs["build-docs.sh"]
    end

    subgraph "Configuration Layer"
        EnvVars["Environment Variables\nREPO, BOOK_TITLE, etc."]
        AutoDetect["Auto-detect logic\nbuild-docs.sh:8-19"]
        Validation["Validation\nbuild-docs.sh:33-37"]
    end

    subgraph "Processing Scripts"
        ScraperPy["deepwiki-scraper.py"]
        MdBookBin["/usr/local/bin/mdbook"]
        MermaidBin["/usr/local/bin/mdbook-mermaid"]
    end

    subgraph "Configuration Generation"
        BookToml["book.toml generation\nbuild-docs.sh:85-103"]
        SummaryMd["SUMMARY.md generation\nbuild-docs.sh:113-159"]
    end

    subgraph "Dependency Management"
        ReqTxt["requirements.txt"]
        UvInstall["uv pip install\nDockerfile:17"]
        CargoInstall["cargo install\nDockerfile:5"]
    end

    CMD --> BuildDocs
    BuildDocs --> EnvVars
    EnvVars --> AutoDetect
    AutoDetect --> Validation

    Validation --> ScraperPy
    BuildDocs --> BookToml
    BuildDocs --> SummaryMd

    BuildDocs --> MdBookBin
    MdBookBin --> MermaidBin

    ReqTxt --> UvInstall
    UvInstall --> ScraperPy
    CargoInstall --> MdBookBin
    CargoInstall --> MermaidBin

Sources: Dockerfile:1-33 build-docs.sh:8-19 build-docs.sh:85-103 build-docs.sh:113-159

Local Development Workflow

1. Clone and Setup

The repository has a minimal structure focused on the essential build files. The .gitignore (.gitignore:1-2) excludes the output/ directory so that generated files are not committed.

2. Make Changes

Key files for common modifications:

| Modification Type | Primary File | Related Files |
|-------------------|--------------|---------------|
| Scraping logic | tools/deepwiki-scraper.py | - |
| Build orchestration | build-docs.sh | - |
| Python dependencies | tools/requirements.txt | Dockerfile:16-17 |
| Docker build process | Dockerfile | - |
| Output structure | build-docs.sh | Lines 179-191 |

3. Build Docker Image

After making changes, rebuild the Docker image:
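A minimal sketch of the rebuild step, assuming the deepwiki-scraper image tag shown in the workflow diagram. The IMAGE_TAG override is a hypothetical convenience and not part of the project:

```shell
# Rebuild the image from the repository root.
# IMAGE_TAG is a hypothetical override; the default tag comes from the workflow diagram.
build_image() {
  local tag="${IMAGE_TAG:-deepwiki-scraper}"
  docker build -t "$tag" .
}
```

Call build_image from the directory that contains the Dockerfile.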

The multi-stage build first compiles the Rust binaries in a rust:latest builder stage (Dockerfile:1-7), then assembles the final python:3.12-slim image from the copied binaries and the Python dependencies (Dockerfile:8-33).

4. Test Changes

Test with a real repository:

Setting MARKDOWN_ONLY=true (build-docs.sh:61-76) bypasses the mdBook build phase, which allows faster iteration when testing changes to the scraping logic.
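The test run could look roughly like the following sketch. It assumes the image tag deepwiki-scraper, the REPO and MARKDOWN_ONLY variables, and an /output mount point inside the container, all of which appear elsewhere on this page; the function name and its "markdown-only" argument are illustrative only:

```shell
# Run the image against a repository and write results to ./output.
# Usage: run_scraper owner/repo [markdown-only]
run_scraper() {
  local repo="$1"
  if [ "${2:-}" = "markdown-only" ]; then
    # Fast path: only the markdown extraction is needed.
    docker run --rm -e REPO="$repo" -e MARKDOWN_ONLY=true \
      -v "$PWD/output:/output" deepwiki-scraper
  else
    # Full pipeline, including the mdBook build.
    docker run --rm -e REPO="$repo" \
      -v "$PWD/output:/output" deepwiki-scraper
  fi
}
```

For example, run_scraper owner/repo markdown-only exercises the scraper alone, while run_scraper owner/repo runs the whole pipeline.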

5. Validate Output

Inspect the generated files:

Sources: .gitignore:1-2 Dockerfile:1-33 build-docs.sh:61-76 build-docs.sh:179-191

Testing Strategies

Fast Iteration with Markdown-Only Mode

The MARKDOWN_ONLY environment variable enables a fast path for testing scraping changes:

This mode executes only Phase 1 (Markdown Extraction) and skips Phase 2 (Diagram Enhancement) and Phase 3 (mdBook Build). See Phase 1: Markdown Extraction for details on what this phase includes.

The conditional logic (build-docs.sh:61-76) checks the MARKDOWN_ONLY variable and exits early after copying the markdown files to /output/markdown/.
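As an illustration of that early-exit pattern, here is a hedged sketch. The function name and its two directory arguments are assumptions made for the example; the actual variable names in build-docs.sh may differ:

```shell
# Copy scraped markdown to the output directory and signal an early finish
# when MARKDOWN_ONLY=true. Returns 0 if the fast path was taken, 1 otherwise.
finish_if_markdown_only() {
  local wiki_dir="$1"
  local output_dir="$2"
  if [ "${MARKDOWN_ONLY:-false}" = "true" ]; then
    mkdir -p "$output_dir/markdown"
    cp -r "$wiki_dir"/. "$output_dir/markdown/"
    echo "Markdown-only mode: skipping mdBook build"
    return 0
  fi
  return 1
}
```

A caller would use it as finish_if_markdown_only /workspace/wiki /output && exit 0, placed before the mdBook steps.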

Testing Auto-Detection

The repository auto-detection logic (build-docs.sh:8-19) attempts to extract the GitHub repository from the Git remotes if REPO is not explicitly set:

The script reads git config --get remote.origin.url and extracts the owner/repo portion with a sed pattern match (build-docs.sh:16).
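The extraction can be tried in isolation with a sketch like the one below. The sed expression is an illustrative pattern covering the common HTTPS and SSH remote forms, not necessarily the one used in build-docs.sh, and both function names are hypothetical:

```shell
# Extract "owner/repo" from a GitHub remote URL (HTTPS or SSH form).
extract_repo() {
  printf '%s\n' "$1" |
    sed -E 's#^(https://github\.com/|git@github\.com:)##; s#\.git$##'
}

# Fall back to the origin remote only when REPO is not already set.
detect_repo() {
  if [ -z "${REPO:-}" ]; then
    REPO="$(extract_repo "$(git config --get remote.origin.url 2>/dev/null)")"
  fi
  printf '%s\n' "$REPO"
}
```

Passing sample URLs to extract_repo checks the pattern without needing a Git checkout, and setting REPO beforehand confirms that an explicit value is never overridden.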

Testing Configuration Generation

To test book.toml and SUMMARY.md generation without a full build:

The book.toml template (build-docs.sh:85-103) uses shell variable substitution to inject environment variables into the TOML structure.
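The substitution technique can be sketched with an unquoted heredoc. The keys below are standard mdBook options, but which keys build-docs.sh actually writes is an assumption, as are the function name and the fallback title:

```shell
# Write a book.toml using an unquoted heredoc so that shell variables expand.
# BOOK_TITLE and REPO are read from the environment.
write_book_toml() {
  local dest="$1"
  cat > "$dest" <<EOF
[book]
title = "${BOOK_TITLE:-Documentation}"
language = "en"
src = "src"

[output.html]
git-repository-url = "https://github.com/${REPO}"
EOF
}
```

Writing to a temporary file, for example write_book_toml /tmp/book.toml, lets the generated TOML be inspected without a full build. Values containing double quotes would need escaping before they are substituted.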

Sources: build-docs.sh:8-19 build-docs.sh:61-76 build-docs.sh:85-103

Debugging Techniques

Inspecting Intermediate Files

The build process creates temporary files in /workspace inside the container. To inspect them:

This allows inspection of:

  • Scraped markdown files in /workspace/wiki/
  • Generated book.toml in /workspace/book/
  • Generated SUMMARY.md in /workspace/book/src/
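One way to get at those paths is to start an interactive shell in the image instead of the default command. This sketch assumes the image defines only a CMD (so a trailing command replaces it) and that /bin/bash is available in the python:3.12-slim base; the function name is hypothetical:

```shell
# Start an interactive shell in the image instead of the default CMD.
# Usage: debug_shell owner/repo
debug_shell() {
  docker run --rm -it \
    -e REPO="$1" \
    -v "$PWD/output:/output" \
    deepwiki-scraper /bin/bash
}
```

The intermediate files presumably exist only after the build script has run in that same container, so run it by hand from the shell before looking under /workspace.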

Adding Debug Output

For progress tracking, build-docs.sh (build-docs.sh:1-206) uses echo statements and deepwiki-scraper.py uses print(). Add additional debug output:
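For the shell side, a small helper keeps extra output switchable. The DEBUG variable and the debug function are hypothetical additions, not something the script is known to define:

```shell
# Print a debug line to stderr only when DEBUG=1.
# The if-form always returns 0, so it is safe under "set -e".
debug() {
  if [ "${DEBUG:-0}" = "1" ]; then
    echo "[debug] $*" >&2
  fi
}
```

A call such as debug "REPO resolved to $REPO" then stays silent in normal runs and appears in the logs when the container is started with -e DEBUG=1.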

Testing Python Script Independently

To test the scraper without Docker:

This is useful for rapid iteration on scraping logic without rebuilding the Docker image.
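A sketch of such a local run, assuming the packages from tools/requirements.txt are already installed in the active Python environment and that the script takes the repository and an output directory as its two arguments. The function name and default output path are illustrative:

```shell
# Run the scraper directly on the host, from the repository root.
# Usage: run_scraper_locally owner/repo [output-dir]
run_scraper_locally() {
  local repo="$1"
  local out_dir="${2:-./output/markdown}"
  mkdir -p "$out_dir"
  python3 tools/deepwiki-scraper.py "$repo" "$out_dir"
}
```

Using a virtual environment for the dependencies keeps the host Python installation untouched.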

Sources: build-docs.sh:1-206 tools/requirements.txt:1-4

Build Optimization Considerations

Multi-Stage Build Rationale

The Dockerfile uses a separate Rust builder stage (Dockerfile:1-7) to:

  1. Compile mdbook and mdbook-mermaid with a full Rust toolchain
  2. Discard the ~1.5 GB builder stage after compilation
  3. Copy only the compiled binaries to the final image (Dockerfile:20-21)

This reduces the final image size from ~1.5 GB to ~300-400 MB while still providing both Python and Rust tools. See Docker Multi-Stage Build for architectural details.

Dependency Management with uv

The Dockerfile copies uv from the official Astral image (Dockerfile:13) and uses it to install the Python dependencies with the --no-cache flag (Dockerfile:17):

This approach:

  • Provides faster dependency resolution than pip
  • Reduces layer size with --no-cache
  • Installs system-wide with --system flag

Image Layer Ordering

The Dockerfile orders operations to maximize layer caching:

  1. Copy uv binary (rarely changes)
  2. Install Python dependencies (changes with requirements.txt)
  3. Copy Rust binaries (changes when rebuilding Rust stage)
  4. Copy Python scripts (changes frequently during development)

With this ordering, modifying deepwiki-scraper.py invalidates only the final layers (Dockerfile:24-29), not the dependency installation layers.

Sources: Dockerfile:1-33

Common Development Tasks

Adding a New Environment Variable

To add a new configuration option:

  1. Define default in build-docs.sh:21-30:

  2. Add to configuration display build-docs.sh:47-53:

  3. Use in downstream processing as needed

  4. Document in Configuration Reference
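Steps 1 and 2 might look like the following sketch. MAX_PAGES is a hypothetical option invented for illustration, and print_config stands in for whatever the configuration display in build-docs.sh actually looks like:

```shell
# Step 1: define a default that the caller's environment can override.
MAX_PAGES="${MAX_PAGES:-100}"

# Step 2: include the new option in the configuration display.
print_config() {
  echo "Configuration:"
  echo "  Repository: ${REPO:-<not set>}"
  echo "  Max pages:  ${MAX_PAGES}"
}
```

The "${VAR:-default}" form keeps the option optional: docker run -e MAX_PAGES=10 overrides it, and omitting the flag falls back to the default.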

Modifying SUMMARY.md Generation

The table of contents generation logic build-docs.sh:113-159 uses bash loops and file discovery:

To modify the structure:

  1. Adjust the file pattern matching
  2. Modify the section detection logic
  3. Update the markdown output format
  4. Test with repositories that have different hierarchical structures
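To experiment with the structure outside the container, a loop-based generator along these lines can help. It is a sketch under stated assumptions, not the logic in build-docs.sh: top-level .md files become chapters, each section-* directory becomes an mdBook part heading, and titles are taken from the first line of each file:

```shell
# Derive a page title from the first line of a markdown file.
page_title() {
  head -n 1 "$1" | sed 's/^#* *//'
}

# Print a SUMMARY.md for the markdown files under the given directory.
generate_summary() {
  local src="$1"
  local file
  local dir
  echo "# Summary"
  echo ""
  for file in "$src"/*.md; do
    [ -f "$file" ] || continue
    [ "$(basename "$file")" = "SUMMARY.md" ] && continue
    echo "- [$(page_title "$file")]($(basename "$file"))"
  done
  for dir in "$src"/section-*/; do
    [ -d "$dir" ] || continue
    dir="${dir%/}"
    echo ""
    echo "# $(basename "$dir")"
    echo ""
    for file in "$dir"/*.md; do
      [ -f "$file" ] || continue
      echo "- [$(page_title "$file")]($(basename "$dir")/$(basename "$file"))"
    done
  done
}
```

Running generate_summary against a copy of output/markdown shows how a change to the pattern matching or section detection affects the resulting table of contents.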

Adding New Python Dependencies

  1. Add the package to tools/requirements.txt:1-4 with a version constraint:
     new-package>=1.0.0
  2. Rebuild the Docker image (triggers Dockerfile:17)
  3. Update the Python Dependencies documentation
  4. Import and use the package in deepwiki-scraper.py

Sources: build-docs.sh:21-30 build-docs.sh:113-159 tools/requirements.txt:1-4 Dockerfile:17

File Modification Guidelines

Modifying build-docs.sh

The orchestrator script uses several idioms:

| Pattern | Purpose | Example |
|---------|---------|---------|
| set -e | Exit on error | build-docs.sh:2 |
| "${VAR:-default}" | Default values | build-docs.sh:22-26 |
| $(command) | Command substitution | build-docs.sh:12 |
| echo "" | Visual spacing | build-docs.sh:47 |
| mkdir -p | Safe directory creation | build-docs.sh:64 |

Maintain these patterns for consistency. The script is designed to be readable and self-documenting, with clear step labels (build-docs.sh:4-6).

Modifying Dockerfile

Key considerations:

Modifying Python Scripts

When editing tools/deepwiki-scraper.py:

  • The script is executed by build-docs.sh (build-docs.sh:58) with two arguments: REPO and the output directory
  • It must be compatible with Python 3.12 (Dockerfile:8)
  • It has access to dependencies from tools/requirements.txt:1-4
  • It should write output to the specified directory argument
  • It should use print() for progress output that appears in build logs

Sources: build-docs.sh:2 build-docs.sh:58 Dockerfile:1-33 tools/requirements.txt:1-4

Integration Testing

End-to-End Test

Validate the complete pipeline:

Testing Configuration Variants

Test different repository configurations:

Sources: build-docs.sh:8-19 build-docs.sh:61-76

Contributing Guidelines

When submitting changes:

  1. Test locally: Build and run the Docker image with multiple test repositories
  2. Validate output: Ensure markdown files are properly formatted and the HTML site builds correctly
  3. Check backwards compatibility: Existing repositories should continue to work
  4. Update documentation: Modify relevant wiki pages if changing behavior
  5. Follow existing patterns: Match the coding style in build-docs.sh:1-206

The system is designed to be "fully generic": it should work with any DeepWiki repository without modification. Test that your changes maintain this property.

Sources: build-docs.sh:1-206

Troubleshooting Development Issues

Build Failures

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| Rust compilation fails | Network issues, incompatible versions | Check rust:latest image availability |
| Python package install fails | Version conflicts in requirements.txt | Verify package versions are compatible |
| mdbook not found | Binary copy failed | Check Dockerfile:20-21 paths |
| Permission denied on scripts | Missing chmod +x | Verify Dockerfile:25-29 |

Runtime Failures

| Symptom | Likely Cause | Solution |
|---------|--------------|----------|
| "REPO must be set" error | Auto-detection failed, no REPO env var | Check build-docs.sh:33-36 validation logic |
| Scraper crashes | DeepWiki site structure changed | Debug deepwiki-scraper.py with local testing |
| SUMMARY.md is empty | No markdown files found | Verify scraper output in /workspace/wiki/ |
| mdBook build fails | Invalid markdown syntax | Inspect markdown files for issues |

Output Validation Checklist

After a successful build, verify:

  • output/markdown/ contains .md files
  • Section directories exist (e.g., output/markdown/section-4/)
  • output/book/index.html exists and opens in browser
  • Navigation menu appears in generated site
  • Search functionality works
  • Mermaid diagrams render correctly
  • Links between pages work
  • "Edit this file" links point to correct GitHub URLs
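The file-level items of this checklist can be automated with a short script; the browser-dependent items (navigation, search, diagram rendering, links) still need a manual look. The function below is a hypothetical helper, not part of the repository:

```shell
# Check the file-level items of the checklist for a given output directory.
# Returns 0 when the required files exist, 1 otherwise.
validate_output() {
  local out="${1:-output}"
  local status=0
  if ! find "$out/markdown" -type f -name '*.md' 2>/dev/null | grep -q .; then
    echo "FAIL: no .md files under $out/markdown" >&2
    status=1
  fi
  if ! find "$out/markdown" -type d -name 'section-*' 2>/dev/null | grep -q .; then
    # Not every wiki has subsections, so this is only a warning.
    echo "WARN: no section directories under $out/markdown" >&2
  fi
  if [ ! -f "$out/book/index.html" ]; then
    echo "FAIL: $out/book/index.html is missing" >&2
    status=1
  fi
  return "$status"
}
```

Run validate_output output after a full build. After a MARKDOWN_ONLY run the index.html check is expected to fail, since no HTML site is produced.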

Sources: build-docs.sh:33-36 Dockerfile:20-21 Dockerfile:25-29