Quick Start
This page provides practical instructions for running the DeepWiki-to-mdBook Converter using Docker. It covers building the image, running basic conversions, and accessing the output. For detailed configuration options, see Configuration Reference. For understanding what happens internally, see System Architecture.
Prerequisites
The following must be available on your system:
| Requirement | Purpose |
|---|---|
| Docker | Runs the containerized conversion system |
| Internet connection | Required to fetch content from DeepWiki.com |
| Disk space | ~500MB for Docker image, variable for output |
Sources: README.md:17-20
Building the Docker Image
The system is distributed as a Dockerfile that must be built before use. The build process compiles Rust tools (mdBook, mdbook-mermaid) and installs Python dependencies.
The build process uses multi-stage Docker builds and takes approximately 5-10 minutes on first run. Subsequent builds use Docker layer caching for faster completion.
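A minimal sketch of the build step, wrapped in a function so it can be pasted into a shell session and called from the repository root. The image tag deepwiki-scraper is an assumption taken from the container name used on this page; substitute whatever tag you prefer by setting IMAGE.

```shell
# Build the converter image from the Dockerfile in the current directory.
# Assumption: the tag "deepwiki-scraper" is illustrative; override with IMAGE=...
build_image() {
  docker build -t "${IMAGE:-deepwiki-scraper}" .
}

# Usage (from the repository root):
#   build_image
```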
Note: For detailed information about the Docker build architecture, see Docker Multi-Stage Build.
Sources: README.md:29-31 build-docs.sh:1-5
Basic Usage Pattern
The converter runs as a Docker container that takes environment variables as input and produces output files in a mounted volume.
sequenceDiagram
participant User
participant Docker
participant Container as "deepwiki-scraper\ncontainer"
participant DeepWiki as "deepwiki.com"
participant OutputVol as "/output volume"
User->>Docker: docker run --rm -e REPO=...
Docker->>Container: Start with env vars
Container->>Container: build-docs.sh orchestrates
Container->>DeepWiki: HTTP requests for wiki pages
DeepWiki-->>Container: HTML content + JS payload
Container->>Container: deepwiki-scraper.py extracts
Container->>Container: mdbook build (unless MARKDOWN_ONLY)
Container->>OutputVol: Write markdown/ and book/
Container-->>Docker: Exit (status 0)
Docker-->>User: Container removed (--rm)
User->>OutputVol: Access generated files
User Interaction Flow
Sources: README.md:24-39 build-docs.sh:1-206
Minimal Command
The absolute minimum command requires only the REPO environment variable:
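A sketch of such a command, written as a small helper function. The image tag deepwiki-scraper and the function name are assumptions; the flags themselves are standard docker run options.

```shell
# Convert the DeepWiki pages for one repository into ./output.
# $1: repository in owner/repo form.
# Assumption: the image was tagged "deepwiki-scraper" at build time.
convert_minimal() {
  docker run --rm \
    -e REPO="$1" \
    -v "$(pwd)/output:/output" \
    "${IMAGE:-deepwiki-scraper}"
}

# Usage:
#   convert_minimal "owner/repo"
```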
This command:
- Uses -e REPO="owner/repo" to specify which GitHub repository's wiki to extract
- Mounts the current directory's output/ subdirectory to /output in the container
- Uses --rm to automatically remove the container after completion
- Generates default values for BOOK_TITLE, BOOK_AUTHORS, and GIT_REPO_URL
Sources: README.md:34-38 build-docs.sh:22-26
Environment Variable Configuration
Sources: build-docs.sh:8-53 README.md:42-51
The following table describes each environment variable:
| Variable | Required | Default Behavior | Example |
|---|---|---|---|
| REPO | Yes* | Auto-detected from Git remote if available | facebook/react |
| BOOK_TITLE | No | "Documentation" | "React Internals" |
| BOOK_AUTHORS | No | Extracted from REPO owner | "Meta Open Source" |
| GIT_REPO_URL | No | Constructed as https://github.com/{REPO} | Custom fork URL |
| MARKDOWN_ONLY | No | "false" (build full HTML) | "true" for debugging |

\* REPO is required unless running from a Git repository with a GitHub remote, in which case it is auto-detected via build-docs.sh:8-19.
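The defaults in the table can be pictured with ordinary shell parameter expansion. This is an illustrative sketch only, not the code in build-docs.sh; the function name resolve_defaults is hypothetical.

```shell
# Fill in unset variables using the defaults described in the table.
# Hypothetical helper; the real logic lives in build-docs.sh.
resolve_defaults() {
  if [ -z "${REPO:-}" ]; then
    echo "REPO is not set" >&2
    return 1
  fi
  BOOK_TITLE="${BOOK_TITLE:-Documentation}"
  # Owner is the part of REPO before the first slash.
  BOOK_AUTHORS="${BOOK_AUTHORS:-${REPO%%/*}}"
  GIT_REPO_URL="${GIT_REPO_URL:-https://github.com/${REPO}}"
  MARKDOWN_ONLY="${MARKDOWN_ONLY:-false}"
}
```

With REPO=facebook/react and nothing else set, this sketch yields the owner facebook as the author and https://github.com/facebook/react as the repository URL.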
Sources: README.md:42-51 build-docs.sh:8-53
Common Usage Patterns
Pattern 1: Complete Documentation Build
Generate both Markdown source and HTML documentation:
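A sketch of a full build with a custom title and author list, using the example values from the table above. The image tag deepwiki-scraper and the function name are assumptions.

```shell
# Full build: Markdown plus HTML book.
# $1: owner/repo, $2: book title, $3: book authors.
# Assumption: the image was tagged "deepwiki-scraper" at build time.
convert_full() {
  docker run --rm \
    -e REPO="$1" \
    -e BOOK_TITLE="$2" \
    -e BOOK_AUTHORS="$3" \
    -v "$(pwd)/output:/output" \
    "${IMAGE:-deepwiki-scraper}"
}

# Usage:
#   convert_full "facebook/react" "React Internals" "Meta Open Source"
```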
Produces:
- /output/markdown/ - Source Markdown files with diagrams
- /output/book/ - Complete HTML site with search and navigation
- /output/book.toml - mdBook configuration
Use when: You want a deployable documentation website.
Sources: README.md:74-87 build-docs.sh:178-192
Pattern 2: Markdown-Only Mode (Fast Iteration)
Extract only Markdown files, skipping the HTML build phase:
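A sketch of a Markdown-only run. The only difference from a full build is the MARKDOWN_ONLY=true variable; the image tag deepwiki-scraper and the function name are assumptions.

```shell
# Markdown-only run: skips the mdBook HTML build.
# $1: owner/repo.
# Assumption: the image was tagged "deepwiki-scraper" at build time.
convert_markdown_only() {
  docker run --rm \
    -e REPO="$1" \
    -e MARKDOWN_ONLY=true \
    -v "$(pwd)/output:/output" \
    "${IMAGE:-deepwiki-scraper}"
}

# Usage:
#   convert_markdown_only "owner/repo"
```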
Produces:
- /output/markdown/ - Source Markdown files only
Use when:
- Debugging diagram placement
- Testing content extraction
- You only need Markdown files
- Faster iteration cycles (~3-5x faster than full build)
Skips: Phase 3 (mdBook build), as controlled by build-docs.sh:61-76
Sources: README.md:55-72 build-docs.sh:61-76
Pattern 3: Custom Output Directory
Mount a different output location:
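A sketch that takes the host directory as an argument and binds it to /output. The repository rust-lang/rust in the usage line is a hypothetical example, and the image tag deepwiki-scraper is an assumption.

```shell
# Write output to an arbitrary host directory instead of ./output.
# $1: owner/repo, $2: absolute path of the host output directory.
# Assumption: the image was tagged "deepwiki-scraper" at build time.
convert_to_dir() {
  docker run --rm \
    -e REPO="$1" \
    -v "$2:/output" \
    "${IMAGE:-deepwiki-scraper}"
}

# Usage (repository name is illustrative):
#   convert_to_dir "rust-lang/rust" "/home/user/docs/rust"
```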
This writes output to /home/user/docs/rust instead of ./output.
Sources: README.md:200-207
Pattern 4: Minimal Configuration with Auto-Detection
If running from a Git repository directory, REPO can be omitted from the docker run command.
The system extracts the repository from git config --get remote.origin.url via build-docs.sh:8-19. This only works when running the Docker command from within a Git repository with a GitHub remote configured.
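The extraction can be pictured as a small string transformation on the remote URL. The sketch below is an illustration under that assumption, not the code from build-docs.sh; the function name repo_from_remote is hypothetical.

```shell
# Derive owner/repo from a GitHub remote URL (HTTPS or SSH form).
# Hypothetical helper for illustration only.
repo_from_remote() {
  # Drop everything up to "github.com" and its ":" or "/" separator,
  # then drop a trailing ".git" if present.
  printf '%s\n' "$1" | sed -E -e 's#^.*github\.com[:/]##' -e 's#\.git$##'
}

# Usage:
#   REPO="$(repo_from_remote "$(git config --get remote.origin.url)")"
```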
Sources: build-docs.sh:8-19 README.md:53
Output Structure
Complete Build Output
When MARKDOWN_ONLY=false (default), the output structure is:
output/
├── markdown/ # Source Markdown files
│ ├── 1-overview.md
│ ├── 2-quick-start.md
│ ├── section-3/
│ │ ├── 3-1-subsection.md
│ │ └── 3-2-subsection.md
│ └── ...
├── book/ # Generated HTML documentation
│ ├── index.html
│ ├── 1-overview.html
│ ├── searchindex.js
│ ├── mermaid/ # Diagram rendering assets
│ └── ...
└── book.toml # mdBook configuration
Sources: README.md:89-120 build-docs.sh:178-192
File Naming Convention
Files follow the pattern {number}-{title}.md where:
- {number} is the hierarchical page number (e.g., 1, 2-1, 3-2)
- {title} is a URL-safe version of the page title
Subsection files are organized in section-{N}/ subdirectories, where {N} is the parent section number.
Examples from README.md:115-119:
- 1-overview.md - Top-level page 1
- 2-1-workspace-and-crates.md - Subsection 1 of section 2
- section-4/4-1-logical-planning.md - Subsection 1 of section 4, stored in subdirectory
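The naming rule can be sketched as a function that maps a page number and title to a relative path. The slug rule used here (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) is an assumption about what "URL-safe" means, and page_path is a hypothetical name, not part of the converter.

```shell
# Map a hierarchical page number and a title to an output path.
# $1: page number such as 1 or 4-1, $2: page title.
# Assumption: the slug rule below approximates the converter's behavior.
page_path() {
  slug=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' \
    | sed -E -e 's/[^a-z0-9]+/-/g' -e 's/^-+//' -e 's/-+$//')
  case "$1" in
    # Subsection: stored under section-{N}/, where N is the parent section.
    *-*) printf 'section-%s/%s-%s.md\n' "${1%%-*}" "$1" "$slug" ;;
    # Top-level page.
    *)   printf '%s-%s.md\n' "$1" "$slug" ;;
  esac
}
```

Under these assumptions, page_path 4-1 "Logical Planning" prints section-4/4-1-logical-planning.md.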
Sources: README.md:115-119
Viewing the Output
Serving HTML Documentation Locally
After a complete build, serve the HTML site using Python's built-in HTTP server:
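A sketch using the http.server module with its --directory option (available in Python 3.7 and later). The default path output/book assumes the default ./output mount; serve_book is a hypothetical helper name.

```shell
# Serve the generated HTML book on port 8000.
# $1 (optional): path to the book directory; defaults to output/book.
serve_book() {
  python3 -m http.server 8000 --directory "${1:-output/book}"
}

# Usage:
#   serve_book            # serves ./output/book
#   serve_book /home/user/docs/rust/book
```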
Then open http://localhost:8000 in your browser.
The generated site includes:
- Full-text search via searchindex.js
- Responsive navigation sidebar with page hierarchy
- Rendered Mermaid diagrams
- "Edit this page" links to the GitHub repository
- Dark/light theme toggle
Sources: README.md:83-86 build-docs.sh:203-204
Accessing Markdown Files
Markdown files can be read directly or used with other tools:
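For example, the extracted pages can be listed with find. The default path output/markdown assumes the default ./output mount, and list_pages is a hypothetical helper name.

```shell
# List every extracted Markdown page, including those in section-N/ subdirectories.
# $1 (optional): path to the markdown directory; defaults to output/markdown.
list_pages() {
  find "${1:-output/markdown}" -type f -name '*.md' | sort
}

# Usage:
#   list_pages
#   cat output/markdown/1-overview.md
```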
Sources: README.md:100-113
Execution Flow
Sources: build-docs.sh:1-206 README.md:121-145
Quick Troubleshooting
"REPO must be set"
Error message: ERROR: REPO must be set or run from within a Git repository
Cause: The REPO environment variable was not provided and could not be auto-detected.
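Solution: Pass the repository explicitly by adding -e REPO="owner/repo" to the docker run command, or run the command from within a Git repository that has a GitHub remote.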
Sources: build-docs.sh:32-37
"No wiki pages found"
Cause: The repository may not be indexed by DeepWiki.
Solution: Verify the wiki exists by visiting https://deepwiki.com/owner/repo in a browser. Not all GitHub repositories have DeepWiki documentation.
Sources: README.md:160-161
Connection Timeouts
Cause: Network issues or DeepWiki service unavailable.
Solution: The scraper includes automatic retries (3 attempts per page). Wait and retry the command. Check your internet connection.
Sources: README.md:171-172
mdBook Build Fails
Error symptoms: Build completes Phases 1 and 2 but fails during Phase 3.
Solutions:
- Ensure Docker has sufficient memory (2GB+ recommended)
- Try MARKDOWN_ONLY=true to verify that extraction works independently of the mdBook build
- Check Docker logs for Rust compilation errors
Sources: README.md:174-177
Diagrams Not Appearing
Cause: Fuzzy matching may not find appropriate placement context for some diagrams.
Debugging approach:
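One way to check, after a Markdown-only run, is to list which output files contain a diagram. This sketch assumes diagrams are written as fenced mermaid blocks in the Markdown output and that the default ./output mount was used; files_with_diagrams is a hypothetical helper name.

```shell
# List Markdown files that contain at least one fenced mermaid block.
# $1 (optional): path to the markdown directory; defaults to output/markdown.
# Assumption: diagrams appear as fences opening with three backticks + "mermaid".
files_with_diagrams() {
  grep -rl '^`\{3\}mermaid' "${1:-output/markdown}" | sort
}

# Usage:
#   files_with_diagrams
```

Pages missing from this list received no diagram, which helps narrow down where placement failed.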
Not all diagrams can be matched: typically only ~48 out of ~461 diagrams have sufficient context for accurate placement.
Sources: README.md:166-169 README.md:132-135
Next Steps
After successfully generating documentation:
- Review Configuration Reference for advanced configuration options
- Explore System Architecture to understand the three-phase processing model
- See Output Structure for detailed information about generated files
- Read Markdown-Only Mode for debugging and iteration workflows
Sources: README.md:1-233