Development Guide
This page provides guidance for developers who want to modify, extend, or contribute to the DeepWiki-to-mdBook Converter system. It covers the development environment setup, local workflow, testing procedures, and key considerations when working with the codebase.
For detailed information about the repository structure, see Project File Structure. For instructions on building the Docker image, see Building the Docker Image. For Python dependency details, see Python Dependencies.
Development Environment Requirements
The system is designed to run entirely within Docker, but local development requires the following tools:
| Tool | Purpose | Version |
|---|---|---|
| Docker | Container runtime | Latest stable |
| Git | Version control | 2.x or later |
| Text editor/IDE | Code editing | Any (VS Code recommended) |
| Python | Local testing (optional) | 3.12+ |
| Rust toolchain | Local testing (optional) | Latest stable |
The Docker image handles all runtime dependencies, so local installation of Python and Rust is optional and only needed for testing individual components outside the container.
Sources: Dockerfile:1-33
Development Workflow Architecture
The following diagram shows the typical development cycle and how different components interact during development:
Development Workflow Diagram: shows the cycle from editing code, to building the Docker image, to testing with a mounted output volume.
graph TB
subgraph "Development Environment"
Editor["Code Editor"]
GitRepo["Local Git Repository"]
end
subgraph "Docker Build Process"
BuildCmd["docker build -t deepwiki-scraper ."]
Stage1["Rust Builder Stage\nCompiles mdbook binaries"]
Stage2["Python Runtime Stage\nAssembles final image"]
FinalImage["deepwiki-scraper:latest"]
end
subgraph "Testing & Validation"
RunCmd["docker run with test params"]
OutputMount["Volume mount: ./output"]
Validation["Manual inspection of output"]
end
subgraph "Key Development Files"
Dockerfile["Dockerfile"]
BuildScript["build-docs.sh"]
Scraper["tools/deepwiki-scraper.py"]
Requirements["tools/requirements.txt"]
end
Editor -->|Edit| GitRepo
GitRepo --> Dockerfile
GitRepo --> BuildScript
GitRepo --> Scraper
GitRepo --> Requirements
BuildCmd --> Stage1
Stage1 --> Stage2
Stage2 --> FinalImage
FinalImage --> RunCmd
RunCmd --> OutputMount
OutputMount --> Validation
Validation -.->|Iterate| Editor
Sources: Dockerfile:1-33 build-docs.sh:1-206
Component Development Map
This diagram bridges system concepts to actual code entities, showing which files implement which functionality:
Code Entity Mapping Diagram: maps system functionality to specific code locations, file paths, and binaries.
graph LR
subgraph "Entry Point Layer"
CMD["CMD in Dockerfile:32"]
BuildDocs["build-docs.sh"]
end
subgraph "Configuration Layer"
EnvVars["Environment Variables\nREPO, BOOK_TITLE, etc."]
AutoDetect["Auto-detect logic\nbuild-docs.sh:8-19"]
Validation["Validation\nbuild-docs.sh:33-37"]
end
subgraph "Processing Scripts"
ScraperPy["deepwiki-scraper.py"]
MdBookBin["/usr/local/bin/mdbook"]
MermaidBin["/usr/local/bin/mdbook-mermaid"]
end
subgraph "Configuration Generation"
BookToml["book.toml generation\nbuild-docs.sh:85-103"]
SummaryMd["SUMMARY.md generation\nbuild-docs.sh:113-159"]
end
subgraph "Dependency Management"
ReqTxt["requirements.txt"]
UvInstall["uv pip install\nDockerfile:17"]
CargoInstall["cargo install\nDockerfile:5"]
end
CMD --> BuildDocs
BuildDocs --> EnvVars
EnvVars --> AutoDetect
AutoDetect --> Validation
Validation --> ScraperPy
BuildDocs --> BookToml
BuildDocs --> SummaryMd
BuildDocs --> MdBookBin
MdBookBin --> MermaidBin
ReqTxt --> UvInstall
UvInstall --> ScraperPy
CargoInstall --> MdBookBin
CargoInstall --> MermaidBin
Sources: Dockerfile:1-33 build-docs.sh:8-19 build-docs.sh:85-103 build-docs.sh:113-159
Local Development Workflow
1. Clone and Setup
The repository has a minimal structure focused on the essential build artifacts. The .gitignore:1-2 excludes the output/ directory to prevent committing generated files.
2. Make Changes
Key files for common modifications:
| Modification Type | Primary File | Related Files |
|---|---|---|
| Scraping logic | tools/deepwiki-scraper.py | - |
| Build orchestration | build-docs.sh | - |
| Python dependencies | tools/requirements.txt | Dockerfile:16-17 |
| Docker build process | Dockerfile | - |
| Output structure | build-docs.sh | Lines 179-191 |
3. Build Docker Image
After making changes, rebuild the Docker image:
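A minimal sketch of the rebuild step, assuming it is run from the repository root where the Dockerfile lives. The image tag deepwiki-scraper comes from the workflow diagram; the DOCKER override and the build_image name are illustrative and not part of the project.

```shell
# DOCKER is an illustrative override: set DOCKER=echo to preview the
# arguments, or point it at a compatible container CLI.
DOCKER="${DOCKER:-docker}"

# Rebuild the image from the current directory (the repository root).
build_image() {
  "$DOCKER" build -t deepwiki-scraper .
}
```

Calling build_image after each change rebuilds only the layers that the change invalidated.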
The multi-stage build process Dockerfile:1-7 first compiles Rust binaries in a rust:latest builder stage, then Dockerfile:8-33 assembles the final python:3.12-slim image with copied binaries and Python dependencies.
4. Test Changes
Test with a real repository:
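One way to sketch the test run, assuming the image was tagged deepwiki-scraper and that results are written to the /output mount described on this page. The run_converter name, the mode argument, and the DOCKER override are hypothetical; owner/repo is a placeholder for a real repository.

```shell
# DOCKER is an illustrative override (DOCKER=echo previews the arguments).
DOCKER="${DOCKER:-docker}"

# $1: repository in owner/repo form
# $2: "markdown-only" to stop after markdown extraction (optional)
run_converter() {
  repo="$1"
  mode="${2:-full}"
  if [ "$mode" = "markdown-only" ]; then
    "$DOCKER" run --rm -e REPO="$repo" -e MARKDOWN_ONLY=true \
      -v "$PWD/output:/output" deepwiki-scraper
  else
    "$DOCKER" run --rm -e REPO="$repo" \
      -v "$PWD/output:/output" deepwiki-scraper
  fi
}
```

For example, run_converter owner/repo markdown-only exercises only the scraping path.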
Setting MARKDOWN_ONLY=true build-docs.sh:61-76 bypasses the mdBook build phase, allowing faster iteration when testing scraping logic changes.
5. Validate Output
Inspect the generated files:
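A small sketch for inspecting a run, assuming the output/markdown/ layout described in the validation checklist. The list_markdown helper is hypothetical; it prints a sorted list of pages so that two runs can be compared with diff.

```shell
# Print the relative path of every generated markdown page, sorted so the
# listing is stable between runs. $1 is the host output directory.
list_markdown() {
  out_dir="${1:-output}"
  (cd "$out_dir/markdown" && find . -type f -name '*.md' | sort)
}
```

Saving the listing before and after a scraper change (for example list_markdown > before.txt) makes missing or renamed pages easy to spot.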
Sources: .gitignore:1-2 Dockerfile:1-33 build-docs.sh:61-76 build-docs.sh:179-191
Testing Strategies
Fast Iteration with Markdown-Only Mode
The MARKDOWN_ONLY environment variable enables a fast path for testing scraping changes:
This mode executes only Phase 1 (Markdown Extraction) and skips Phase 2 (Diagram Enhancement) and Phase 3 (mdBook Build). See Phase 1: Markdown Extraction for details on what this phase includes.
The conditional logic build-docs.sh:61-76 checks the MARKDOWN_ONLY variable and exits early after copying markdown files to /output/markdown/.
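The early exit can be sketched roughly as follows. This is an illustration of the pattern, not the contents of build-docs.sh: the function name and messages are assumptions, and the real script exits the process where this sketch returns from a function.

```shell
# $1: directory holding scraped markdown (stands in for /workspace/wiki)
# $2: output root (stands in for /output)
finish_markdown_phase() {
  wiki_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir/markdown"
  cp -r "$wiki_dir/." "$out_dir/markdown/"
  if [ "${MARKDOWN_ONLY:-false}" = "true" ]; then
    echo "Markdown-only mode: skipping mdBook build"
    return 0
  fi
  echo "Continuing to mdBook build"
}
```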
Testing Auto-Detection
The repository auto-detection logic build-docs.sh:8-19 attempts to extract the GitHub repository from Git remotes if REPO is not explicitly set:
The script checks git config --get remote.origin.url and extracts the owner/repo portion using sed pattern matching build-docs.sh:16.
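The extraction can be approximated as below. The sed expressions are an assumption about what such a pattern might look like, not a copy of the one in build-docs.sh; extract_repo and detect_repo are hypothetical names.

```shell
# Extract owner/repo from a GitHub remote URL (HTTPS or SSH form).
# Prints nothing when the URL does not point at github.com.
extract_repo() {
  printf '%s\n' "$1" |
    sed -nE 's#.*github\.com[:/]([^/]+/[^/]+)$#\1#p' |
    sed 's#\.git$##'
}

# Prefer an explicit REPO; otherwise fall back to the origin remote.
detect_repo() {
  if [ -n "${REPO:-}" ]; then
    echo "$REPO"
    return 0
  fi
  extract_repo "$(git config --get remote.origin.url 2>/dev/null)"
}
```

To exercise the fallback, run detect_repo inside a clone with REPO unset, then again with REPO set to confirm that the explicit value wins.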
Testing Configuration Generation
To test book.toml and SUMMARY.md generation without a full build:
The book.toml template build-docs.sh:85-103 uses shell variable substitution to inject environment variables into the TOML structure.
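The substitution technique can be tried in isolation with a heredoc. This sketch assumes a reduced template: the keys shown are standard mdBook settings, but the exact fields, the default title, and the write_book_toml name are illustrative rather than taken from build-docs.sh.

```shell
# Write a minimal book.toml into $1, filling values from the environment.
write_book_toml() {
  book_dir="$1"
  mkdir -p "$book_dir"
  # Unquoted EOF: the shell expands ${...} inside the template.
  cat > "$book_dir/book.toml" <<EOF
[book]
title = "${BOOK_TITLE:-Documentation}"
language = "en"
src = "src"

[output.html]
git-repository-url = "https://github.com/${REPO:-}"
EOF
}
```

Running write_book_toml in a scratch directory with different BOOK_TITLE values shows the generated TOML without scraping or building anything.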
Sources: build-docs.sh:8-19 build-docs.sh:61-76 build-docs.sh:85-103
Debugging Techniques
Inspecting Intermediate Files
The build process creates temporary files in /workspace inside the container. To inspect them:
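One possible approach is to start an interactive shell in place of the default CMD. This sketch assumes the image was tagged deepwiki-scraper and that /bin/bash is available in it; the open_shell name and DOCKER override are hypothetical.

```shell
# DOCKER is an illustrative override (DOCKER=echo previews the arguments).
DOCKER="${DOCKER:-docker}"

# Start an interactive shell instead of the default CMD. REPO is passed
# through so the build script can be run by hand inside the container.
open_shell() {
  "$DOCKER" run --rm -it -e REPO="$1" \
    -v "$PWD/output:/output" deepwiki-scraper /bin/bash
}
```

Note that /workspace is only populated once the build script has run inside that shell, so invoke it manually before looking for intermediate files.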
This allows inspection of:
- Scraped markdown files in /workspace/wiki/
- Generated book.toml in /workspace/book/
- Generated SUMMARY.md in /workspace/book/src/
Adding Debug Output
Both build-docs.sh:1-206 and deepwiki-scraper.py print progress messages (echo in the shell script, print() in Python). Add additional debug output:
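For the shell side, a sketch of an opt-in debug helper follows. The DEBUG variable and the debug function are hypothetical additions, not something the script currently defines.

```shell
# Print a message to stderr only when DEBUG=true, so normal build logs
# stay unchanged.
debug() {
  if [ "${DEBUG:-false}" = "true" ]; then
    echo "[debug] $*" >&2
  fi
}

debug "REPO=${REPO:-<unset>}"
```

Passing -e DEBUG=true to docker run would then enable the extra output for a single run.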
Testing Python Script Independently
To test the scraper without Docker:
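A sketch of a local run, assuming a Python 3.12+ interpreter (ideally inside a virtual environment) and the two-argument invocation described under Modifying Python Scripts. The scrape_locally name, the PYTHON override, and the default output path are illustrative.

```shell
# PYTHON is an illustrative override (PYTHON=echo previews the arguments).
PYTHON="${PYTHON:-python3}"

# $1: repository in owner/repo form
# $2: directory to write markdown into (optional)
scrape_locally() {
  "$PYTHON" -m pip install -r tools/requirements.txt || return 1
  "$PYTHON" tools/deepwiki-scraper.py "$1" "${2:-./output/markdown}"
}
```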
This is useful for rapid iteration on scraping logic without rebuilding the Docker image.
Sources: build-docs.sh:1-206 tools/requirements.txt:1-4
Build Optimization Considerations
Multi-Stage Build Rationale
The Dockerfile:1-7 uses a separate Rust builder stage to:
- Compile mdbook and mdbook-mermaid with a full Rust toolchain
- Discard the ~1.5 GB builder stage after compilation
- Copy only the compiled binaries Dockerfile:20-21 to the final image
This reduces the final image size from ~1.5 GB to ~300-400 MB while still providing both Python and Rust tools. See Docker Multi-Stage Build for architectural details.
Dependency Management with uv
The Dockerfile:13 copies uv from the official Astral image and uses it Dockerfile:17 to install Python dependencies with the --no-cache flag:
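The install step presumably wraps a command along these lines. The flags are the ones named on this page; the requirements path, the install_python_deps name, and the UV override are assumptions.

```shell
# UV is an illustrative override (UV=echo previews the arguments).
UV="${UV:-uv}"

# Install dependencies into the system interpreter without keeping a cache.
# $1: path to the requirements file (optional)
install_python_deps() {
  "$UV" pip install --system --no-cache -r "${1:-tools/requirements.txt}"
}
```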
This approach:
- Provides faster dependency resolution than pip
- Reduces layer size with --no-cache
- Installs system-wide with the --system flag
Image Layer Ordering
The Dockerfile orders operations to maximize layer caching:
1. Copy uv binary (rarely changes)
2. Install Python dependencies (changes with requirements.txt)
3. Copy Rust binaries (changes when rebuilding Rust stage)
4. Copy Python scripts (changes frequently during development)
This ordering means that modifying deepwiki-scraper.py only invalidates the final layers Dockerfile:24-29, not the entire dependency installation.
Sources: Dockerfile:1-33
Common Development Tasks
Adding a New Environment Variable
To add a new configuration option:
1. Define a default in build-docs.sh:21-30.
2. Add it to the configuration display build-docs.sh:47-53.
3. Use it in downstream processing as needed.
4. Document it in Configuration Reference.
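The first two steps might look like the sketch below. EXAMPLE_OPTION, its default value, and the show_config wrapper are hypothetical; only the "${VAR:-default}" idiom is taken from the existing script.

```shell
# Step 1: default value, overridable via docker run -e EXAMPLE_OPTION=...
EXAMPLE_OPTION="${EXAMPLE_OPTION:-default-value}"

# Step 2: include the new option in the configuration display.
show_config() {
  echo "Configuration:"
  echo "  Repository:     ${REPO:-<unset>}"
  echo "  Example option: $EXAMPLE_OPTION"
}

show_config
```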
Modifying SUMMARY.md Generation
The table of contents generation logic build-docs.sh:113-159 uses bash loops and file discovery:
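A simplified sketch of this kind of loop, for experimenting outside the container. It assumes top-level pages named N-title.md with subpages in section-N/, and that each page starts with a heading line; the actual logic in build-docs.sh may differ, and both function names are hypothetical.

```shell
# Use the first line of a page, minus leading '#' characters, as its title.
page_title() {
  head -n 1 "$1" | sed 's/^#* *//'
}

# Write $1/SUMMARY.md listing top-level pages and their section subpages.
generate_summary() {
  src_dir="$1"
  {
    echo "# Summary"
    echo
    for file in "$src_dir"/*.md; do
      [ -f "$file" ] || continue
      name=$(basename "$file")
      if [ "$name" = "SUMMARY.md" ]; then continue; fi
      echo "- [$(page_title "$file")]($name)"
      num="${name%%-*}"   # leading page number, e.g. 4 for 4-architecture.md
      for sub in "$src_dir/section-$num"/*.md; do
        [ -f "$sub" ] || continue
        echo "  - [$(page_title "$sub")](section-$num/$(basename "$sub"))"
      done
    done
  } > "$src_dir/SUMMARY.md"
}
```

Pointing generate_summary at a copy of output/markdown/ gives a quick preview of how a structural change affects the table of contents.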
To modify the structure:
- Adjust the file pattern matching
- Modify the section detection logic
- Update the markdown output format
- Test with repositories that have different hierarchical structures
Adding New Python Dependencies
1. Add to tools/requirements.txt:1-4 with a version constraint, for example new-package>=1.0.0.
2. Rebuild the Docker image (triggers Dockerfile:17).
3. Update the Python Dependencies documentation.
4. Import and use in deepwiki-scraper.py.
Sources: build-docs.sh:21-30 build-docs.sh:113-159 tools/requirements.txt:1-4 Dockerfile:17
File Modification Guidelines
Modifying build-docs.sh
The orchestrator script uses several idioms:
| Pattern | Purpose | Example |
|---|---|---|
| set -e | Exit on error | build-docs.sh:2 |
| "${VAR:-default}" | Default values | build-docs.sh:22-26 |
| $(command) | Command substitution | build-docs.sh:12 |
| echo "" | Visual spacing | build-docs.sh:47 |
| mkdir -p | Safe directory creation | build-docs.sh:64 |
Maintain these patterns for consistency. The script is designed to be readable and self-documenting, with clear step labels build-docs.sh:4-6.
Modifying Dockerfile
Key considerations:
- Keep stages separate Dockerfile:1-2 vs Dockerfile:8
- Use COPY --from=builder Dockerfile:20-21 for cross-stage artifact copying
- Set executable permissions Dockerfile:25-29 for scripts
- Use WORKDIR Dockerfile:10 to establish a consistent working directory
- Keep CMD Dockerfile:32 as the default entrypoint
Modifying Python Scripts
When editing tools/deepwiki-scraper.py:
- The script is executed via build-docs.sh:58 with two arguments: REPO and output directory
- It must be Python 3.12 compatible Dockerfile:8
- It has access to dependencies from tools/requirements.txt:1-4
- It should write output to the specified directory argument
- It should use print() for progress output that appears in build logs
Sources: build-docs.sh:2 build-docs.sh:58 Dockerfile:1-33 tools/requirements.txt:1-4
Integration Testing
End-to-End Test
Validate the complete pipeline:
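A sketch of an end-to-end check that builds the image, runs the full pipeline, and confirms that the HTML entry point was produced. It assumes the output/book/index.html location from the validation checklist; the end_to_end name and DOCKER override are hypothetical.

```shell
# DOCKER is an illustrative override (DOCKER=echo previews the arguments).
DOCKER="${DOCKER:-docker}"

# $1: repository in owner/repo form
# $2: absolute host directory to mount at /output (optional)
end_to_end() {
  repo="$1"
  out_dir="${2:-$PWD/output}"
  "$DOCKER" build -t deepwiki-scraper . || return 1
  "$DOCKER" run --rm -e REPO="$repo" \
    -v "$out_dir:/output" deepwiki-scraper || return 1
  # Succeed only if the generated site has its entry page.
  [ -f "$out_dir/book/index.html" ]
}
```

The function's exit status can be used directly in a script or CI step, for example end_to_end owner/repo && echo "pipeline OK".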
Testing Configuration Variants
Test different repository configurations:
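One way to cover several repositories in a single pass is sketched below. Each repository gets its own output directory so results can be compared side by side; markdown-only mode keeps the loop fast. The run_variants name, the directory naming scheme, and the DOCKER override are assumptions.

```shell
# DOCKER is an illustrative override (DOCKER=echo previews the arguments).
DOCKER="${DOCKER:-docker}"

# $1: absolute directory under which per-repository output is written
# remaining arguments: repositories in owner/repo form
run_variants() {
  out_root="$1"
  shift
  for repo in "$@"; do
    slug=$(echo "$repo" | tr '/' '_')   # owner/repo -> owner_repo
    "$DOCKER" run --rm -e REPO="$repo" -e MARKDOWN_ONLY=true \
      -v "$out_root/$slug:/output" deepwiki-scraper || return 1
  done
}
```

Choosing repositories with and without nested sections helps confirm that changes stay generic.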
Sources: build-docs.sh:8-19 build-docs.sh:61-76
Contributing Guidelines
When submitting changes:
- Test locally: Build and run the Docker image with multiple test repositories
- Validate output: Ensure markdown files are properly formatted and the HTML site builds correctly
- Check backwards compatibility: Existing repositories should continue to work
- Update documentation: Modify relevant wiki pages if changing behavior
- Follow existing patterns: Match the coding style in build-docs.sh:1-206
The system is designed to be "fully generic" - it should work with any DeepWiki repository without modification. Test that your changes maintain this property.
Sources: build-docs.sh:1-206
Troubleshooting Development Issues
Build Failures
| Symptom | Likely Cause | Solution |
|---|---|---|
| Rust compilation fails | Network issues, incompatible versions | Check rust:latest image availability |
| Python package install fails | Version conflicts in requirements.txt | Verify package versions are compatible |
| mdbook not found | Binary copy failed | Check Dockerfile:20-21 paths |
| Permission denied on scripts | Missing chmod +x | Verify Dockerfile:25-29 |
Runtime Failures
| Symptom | Likely Cause | Solution |
|---|---|---|
| "REPO must be set" error | Auto-detection failed, no REPO env var | Check build-docs.sh:33-36 validation logic |
| Scraper crashes | DeepWiki site structure changed | Debug deepwiki-scraper.py with local testing |
| SUMMARY.md is empty | No markdown files found | Verify scraper output in /workspace/wiki/ |
| mdBook build fails | Invalid markdown syntax | Inspect markdown files for issues |
Output Validation Checklist
After a successful build, verify:
- output/markdown/ contains .md files
- Section directories exist (e.g., output/markdown/section-4/)
- output/book/index.html exists and opens in browser
- Navigation menu appears in generated site
- Search functionality works
- Mermaid diagrams render correctly
- Links between pages work
- "Edit this file" links point to correct GitHub URLs
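The file-level items in this checklist can be automated with a sketch like the following; the remaining items (navigation, search, diagram rendering, links) still need a browser. The check_output name and the messages are hypothetical.

```shell
# Check the parts of the checklist that are visible on disk.
# $1: host output directory (optional). Returns non-zero on any failure.
check_output() {
  out_dir="${1:-output}"
  failures=0
  if ! find "$out_dir/markdown" -maxdepth 1 -name '*.md' 2>/dev/null | grep -q .; then
    echo "FAIL: no .md files in $out_dir/markdown/"
    failures=$((failures + 1))
  fi
  if ! ls -d "$out_dir"/markdown/section-*/ >/dev/null 2>&1; then
    echo "FAIL: no section directories in $out_dir/markdown/"
    failures=$((failures + 1))
  fi
  if [ ! -f "$out_dir/book/index.html" ]; then
    echo "FAIL: $out_dir/book/index.html is missing"
    failures=$((failures + 1))
  fi
  [ "$failures" -eq 0 ]
}
```

A repository without nested pages will legitimately fail the section-directory check, so treat that message as informational in such cases.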
Sources: build-docs.sh:33-36 Dockerfile:20-21 Dockerfile:25-29