Building the Docker Image
This page provides instructions for building the Docker image locally from source. It covers the build process, multi-stage build architecture, verification steps, and troubleshooting common build issues.
For information about the architectural rationale behind the multi-stage build strategy, see Docker Multi-Stage Build. For information about running the pre-built image, see Quick Start.
Overview
The DeepWiki-to-mdBook converter is packaged as a Docker image that combines Python runtime components with Rust-compiled binaries. Building the image locally requires Docker and typically takes 5-15 minutes depending on network speed and CPU performance. The build process compiles two Rust applications (mdbook and mdbook-mermaid) from source, then creates a minimal Python-based runtime image with these compiled binaries.
Basic Build Command
To build the Docker image, run `docker build -t deepwiki-scraper .` from the repository root.
This command reads the Dockerfile at the repository root and produces an image tagged deepwiki-scraper. The build process automatically executes both stages defined in the Dockerfile.
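The build can be followed by a quick check that the tagged image exists. The sketch below assumes it is run from the repository root; the function name `build_image` and the `IMAGE_NAME` variable are illustrative and not part of the repository.

```shell
# Assumption: run from the repository root, where the Dockerfile lives.
IMAGE_NAME="deepwiki-scraper"

build_image() {
  # Build both stages and tag the result; list the image only if the build succeeded.
  docker build -t "$IMAGE_NAME" . && docker image ls "$IMAGE_NAME"
}
```

Calling `build_image` prints the build log and then one row describing the new image.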
Sources: Dockerfile:1-33
Build Process Architecture
The following diagram shows the complete build workflow, mapping from natural language concepts to the actual Docker commands and files involved:
Sources: Dockerfile:1-33
graph TD
User[Developer] -->|docker build -t deepwiki-scraper .| DockerCLI["Docker CLI"]
DockerCLI -->|Reads| Dockerfile["Dockerfile\n(repository root)"]
Dockerfile -->|Stage 1: FROM rust:latest AS builder| Stage1["Stage 1: Rust Builder\nImage: rust:latest"]
Dockerfile -->|Stage 2: FROM python:3.12-slim| Stage2["Stage 2: Final Assembly\nImage: python:3.12-slim"]
Stage1 -->|RUN cargo install mdbook| CargoBuildMdBook["cargo install mdbook\n→ /usr/local/cargo/bin/mdbook"]
Stage1 -->|RUN cargo install mdbook-mermaid| CargoBuildMermaid["cargo install mdbook-mermaid\n→ /usr/local/cargo/bin/mdbook-mermaid"]
Stage2 -->|COPY --from=ghcr.io/astral-sh/uv:latest| UVCopy["Copy /uv and /uvx\n→ /bin/"]
Stage2 -->|COPY tools/requirements.txt| ReqCopy["Copy requirements.txt\n→ /tmp/requirements.txt"]
Stage2 -->|RUN uv pip install --system| PythonDeps["Install Python packages:\nrequests, beautifulsoup4, html2text"]
CargoBuildMdBook -->|COPY --from=builder| BinaryCopy1["Copy to /usr/local/bin/mdbook"]
CargoBuildMermaid -->|COPY --from=builder| BinaryCopy2["Copy to /usr/local/bin/mdbook-mermaid"]
Stage2 -->|COPY tools/deepwiki-scraper.py| ScraperCopy["Copy to /usr/local/bin/deepwiki-scraper.py"]
Stage2 -->|COPY build-docs.sh| BuildScriptCopy["Copy to /usr/local/bin/build-docs.sh"]
Stage2 -->|RUN chmod +x| MakeExecutable["Set execute permissions"]
BinaryCopy1 --> FinalImage["Final Image:\ndeepwiki-scraper"]
BinaryCopy2 --> FinalImage
PythonDeps --> FinalImage
ScraperCopy --> FinalImage
BuildScriptCopy --> FinalImage
MakeExecutable --> FinalImage
FinalImage -->|CMD| DefaultEntrypoint["/usr/local/bin/build-docs.sh"]
Stage-by-Stage Build Details
Stage 1: Rust Builder
Stage 1 uses the rust:latest base image (approximately 1.5 GB) to compile the Rust applications. This stage is ephemeral and discarded after binary extraction.
graph LR
subgraph "Stage 1 Build Context"
BaseImage["rust:latest\n~1.5 GB"]
CargoEnv["Cargo toolchain\nPre-installed"]
BaseImage --> CargoEnv
CargoEnv -->|cargo install mdbook| BuildMdBook["Compile mdbook\nfrom crates.io"]
CargoEnv -->|cargo install mdbook-mermaid| BuildMermaid["Compile mdbook-mermaid\nfrom crates.io"]
BuildMdBook --> Binary1["/usr/local/cargo/bin/mdbook\n(~20-30 MB)"]
BuildMermaid --> Binary2["/usr/local/cargo/bin/mdbook-mermaid\n(~10-20 MB)"]
end
subgraph "Extracted Artifacts"
Binary1 -.->|Copied to Stage 2| FinalBin1["/usr/local/bin/mdbook"]
Binary2 -.->|Copied to Stage 2| FinalBin2["/usr/local/bin/mdbook-mermaid"]
end
The cargo install commands download source code from crates.io, compile with optimization flags, and place the resulting binaries in /usr/local/cargo/bin/. This compilation typically takes 3-8 minutes depending on CPU performance.
Key Dockerfile directives:
- Line 2: `FROM rust:latest AS builder` - Establishes the builder stage
- Line 5: `RUN cargo install mdbook mdbook-mermaid` - Compiles both tools in a single command
Sources: Dockerfile:1-5
Stage 2: Final Image Assembly
Stage 2 creates the production image using python:3.12-slim (approximately 150 MB) as the base and layers in all necessary runtime components:
| Layer | Purpose | Size Impact | Dockerfile Lines |
|---|---|---|---|
| Base image | Python 3.12 runtime | ~150 MB | Line 8 |
| uv package manager | Fast Python dependency installation | ~10 MB | Line 13 |
| Python dependencies | requests, beautifulsoup4, html2text | ~20 MB | Lines 16-17 |
| Rust binaries | mdbook and mdbook-mermaid executables | ~30-50 MB | Lines 20-21 |
| Python scripts | deepwiki-scraper.py | ~10 KB | Lines 24-25 |
| Shell scripts | build-docs.sh orchestrator | ~5 KB | Lines 28-29 |
| Total | Final image size | ~300-400 MB | - |
Key Dockerfile directives:
- Line 8: `FROM python:3.12-slim` - Establishes the final stage base
- Line 13: `COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/` - Imports uv from an external image
- Lines 20-21: `COPY --from=builder` - Extracts the Rust binaries from Stage 1
- Line 32: `CMD ["/usr/local/bin/build-docs.sh"]` - Sets the default command run when the container starts
Sources: Dockerfile:8-33
Python Dependency Installation
The image uses uv instead of pip for faster and more reliable dependency installation. The dependencies are defined in tools/requirements.txt:
requests>=2.31.0
beautifulsoup4>=4.12.0
html2text>=2020.1.16
The installation command uses these flags:
- `--system`: Installs packages system-wide (not in a virtual environment)
- `--no-cache`: Avoids caching to reduce image size
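Put together, the installation step might look like the sketch below. It assumes `requirements.txt` has been copied to `/tmp/requirements.txt`, as the build diagram on this page shows; the function name `install_python_deps` is hypothetical, and in the Dockerfile the command would appear on a `RUN` line.

```shell
# Assumption: the requirements file sits at /tmp/requirements.txt inside the image.
REQUIREMENTS_FILE="/tmp/requirements.txt"

install_python_deps() {
  # --system  : install into the system interpreter instead of a virtual environment
  # --no-cache: do not keep a download cache, which keeps the image layer smaller
  uv pip install --system --no-cache -r "$REQUIREMENTS_FILE"
}
```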
Sources: Dockerfile:13-17 tools/requirements.txt:1-4
Build Verification
After building the image, verify that all components are correctly installed:
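A minimal sketch of such checks, assuming the image was tagged deepwiki-scraper and that `which`, `python`, and `ls` are available inside it. The function name `verify_image` is hypothetical.

```shell
IMAGE_NAME="deepwiki-scraper"

verify_image() {
  # Arguments after the image name replace the default CMD for this one run.
  # 1. The Rust binaries should resolve under /usr/local/bin.
  docker run --rm "$IMAGE_NAME" which mdbook mdbook-mermaid || return 1
  # 2. The Python dependencies should import cleanly.
  docker run --rm "$IMAGE_NAME" python -c "import requests, bs4, html2text; print('Dependencies OK')" || return 1
  # 3. The orchestrator script should carry execute permissions.
  docker run --rm "$IMAGE_NAME" ls -l /usr/local/bin/build-docs.sh
}
```

Run `verify_image` after the build completes; it stops at the first check that fails.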
Expected outputs:
- `which` commands should return `/usr/local/bin/<binary-name>`
- Python import test should print `Dependencies OK`
- Script permissions should show `-rwxr-xr-x` (executable)
Sources: Dockerfile:20-29
File and Binary Locations in Final Image
The following diagram maps the repository structure to the final image filesystem layout:
graph TB
subgraph "Repository Files"
RepoRoot["Repository Root"]
Dockerfile_Src["Dockerfile"]
BuildScript["build-docs.sh"]
ToolsDir["tools/"]
Scraper["tools/deepwiki-scraper.py"]
Reqs["tools/requirements.txt"]
RepoRoot --> Dockerfile_Src
RepoRoot --> BuildScript
RepoRoot --> ToolsDir
ToolsDir --> Scraper
ToolsDir --> Reqs
end
subgraph "Stage 1 Build Products"
CargoOutput["/usr/local/cargo/bin/"]
MdBookBin["mdbook binary"]
MermaidBin["mdbook-mermaid binary"]
CargoOutput --> MdBookBin
CargoOutput --> MermaidBin
end
subgraph "Final Image Filesystem"
UsrBin["/usr/local/bin/"]
BinDir["/bin/"]
TmpDir["/tmp/"]
MdBookFinal["/usr/local/bin/mdbook"]
MermaidFinal["/usr/local/bin/mdbook-mermaid"]
BuildFinal["/usr/local/bin/build-docs.sh"]
ScraperFinal["/usr/local/bin/deepwiki-scraper.py"]
UVFinal["/bin/uv"]
UVXFinal["/bin/uvx"]
ReqsFinal["/tmp/requirements.txt"]
UsrBin --> MdBookFinal
UsrBin --> MermaidFinal
UsrBin --> BuildFinal
UsrBin --> ScraperFinal
BinDir --> UVFinal
BinDir --> UVXFinal
TmpDir --> ReqsFinal
end
MdBookBin -.->|COPY --from=builder| MdBookFinal
MermaidBin -.->|COPY --from=builder| MermaidFinal
BuildScript -.->|COPY| BuildFinal
Scraper -.->|COPY| ScraperFinal
Reqs -.->|COPY| ReqsFinal
Sources: Dockerfile:13-28
Common Build Issues and Solutions
Issue: Cargo Installation Timeout
Symptom: Build fails during Stage 1 with network timeout errors:
error: failed to download `mdbook`
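Solution: Retry the build; the crates.io registry occasionally experiences high load. If timeouts persist, raise Cargo's network timeout in the builder stage, for example by setting the `CARGO_HTTP_TIMEOUT` environment variable.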
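Retrying can be automated with a small loop. The sketch below is one possible approach rather than part of the repository; the function name `retry_build`, the default of three attempts, and the ten-second pause are all assumptions.

```shell
retry_build() {
  max_attempts="${1:-3}"   # hypothetical default; pass a number to override
  attempt=1
  # Re-run the build until it succeeds or the attempt limit is reached.
  until docker build -t deepwiki-scraper .; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "build failed after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "build failed; retrying (attempt $attempt of $max_attempts)" >&2
    sleep 10   # brief pause before the next attempt
  done
}
```

For example, `retry_build 5` allows up to five attempts before giving up with a non-zero exit status.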
Issue: Out of Disk Space
Symptom: Build fails with "no space left on device" error.
Solution: The Rust builder stage requires approximately 2-3 GB of temporary space. Free space by removing unused Docker resources such as old build cache and dangling images, then rebuild.
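One way to reclaim space, sketched with standard Docker CLI subcommands. The function name `free_docker_space` is hypothetical, and the `-f` flags skip the confirmation prompts, so review what will be deleted before using it on a shared machine.

```shell
free_docker_space() {
  # Show how much space images, containers, and build cache currently use.
  docker system df
  # Remove unused build cache without prompting.
  docker builder prune -f
  # Remove dangling (untagged) images without prompting.
  docker image prune -f
}
```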
Issue: Platform Mismatch
Symptom: Built image doesn't run on target platform (e.g., building on ARM Mac but running on x86_64 Linux).
Solution: Specify the target platform explicitly with the `--platform` flag of `docker build`.
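A minimal sketch of a platform-specific build, assuming the x86_64 Linux target from the symptom above. The function name `build_for_platform` and its default value are illustrative.

```shell
build_for_platform() {
  # Default to x86_64 Linux; pass another value such as linux/arm64 to override.
  platform="${1:-linux/amd64}"
  docker build --platform "$platform" -t deepwiki-scraper .
}
```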
Note: Cross-platform builds require QEMU emulation and will be significantly slower.
Issue: Python Dependency Installation Fails
Symptom: Stage 2 fails during uv pip install:
error: Failed to download distribution
Solution: Check network connectivity and retry. If issues persist, rebuild with the `--no-cache` flag of `docker build` so that no previously cached layer is reused.
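A cache-free rebuild might look like the following sketch. The function name `rebuild_without_cache` is hypothetical. Note that `--no-cache` applies to every stage, so the Rust tools are recompiled as well.

```shell
rebuild_without_cache() {
  # Ignore all cached layers and rebuild every stage from scratch.
  docker build --no-cache -t deepwiki-scraper .
}
```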
Sources: Dockerfile:16-17
Build Customization Options
Building with Different Python Version
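To use a different Python version, change the base image tag on line 8 of the Dockerfile (`FROM python:3.12-slim`) to the desired version, for example `python:3.11-slim`, then rebuild the image with the same `docker build` command.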
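As an alternative to editing the Dockerfile in place, the sketch below writes a modified copy and leaves the original untouched. This is a swapped-in approach, not the repository's own procedure; the function name `make_python_variant`, the output file naming, and the example version are assumptions.

```shell
make_python_variant() {
  py_version="$1"                      # e.g. 3.11 (hypothetical choice)
  src="${2:-Dockerfile}"               # Dockerfile to copy from
  out="Dockerfile.py${py_version}"     # name of the generated variant
  # Rewrite only the final-stage base image line; everything else is copied as-is.
  sed "s|^FROM python:3\.12-slim|FROM python:${py_version}-slim|" "$src" > "$out"
  echo "$out"
}
```

The variant can then be built with the `-f` flag, for example `docker build -f "$(make_python_variant 3.11)" -t deepwiki-scraper:py3.11 .`.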
Building with Specific mdBook Versions
To pin specific versions of the Rust tools, add version constraints to the `cargo install` command on line 5 of the Dockerfile.
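The pinned command could take the shape sketched below, using Cargo's `crate@version` syntax. The function name `install_pinned_tools` is hypothetical, and `--locked` is an optional addition for reproducibility rather than something the Dockerfile is known to use. In the Dockerfile the `cargo install` line would follow `RUN`.

```shell
install_pinned_tools() {
  mdbook_version="$1"
  mermaid_version="$2"
  # crate@version selects an exact release; --locked uses each crate's published Cargo.lock.
  cargo install --locked "mdbook@${mdbook_version}" "mdbook-mermaid@${mermaid_version}"
}
```

A call such as `install_pinned_tools 0.4.40 0.14.0` uses placeholder version numbers; check crates.io for releases of the two tools that are compatible with each other.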
Reducing Build Time for Development
During development, you can cache the Rust builder stage by building it separately with the `--target builder` option of `docker build`, so that later full builds reuse its layers.
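A sketch of the two-step build, assuming the stage name `builder` from the Dockerfile. The tag `deepwiki-scraper-builder` and the function name are hypothetical.

```shell
build_with_cached_builder() {
  # Build and tag only Stage 1 (named "builder" in the Dockerfile).
  docker build --target builder -t deepwiki-scraper-builder . || return 1
  # Build the full image; unchanged builder layers should come from the local build cache.
  docker build -t deepwiki-scraper .
}
```

As long as the builder stage's instructions do not change, later full builds should skip the Rust compilation.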
Sources: Dockerfile:2-8
Image Size Analysis
The following table breaks down the final image size by component:
| Component | Approximate Size | Optimization Notes |
|---|---|---|
| python:3.12-slim base | 150 MB | Minimal Python distribution |
| System libraries (libc, etc.) | 20 MB | Required by Python and binaries |
| Python packages | 15-20 MB | requests, beautifulsoup4, html2text |
| uv package manager | 8-10 MB | Faster than pip |
| mdbook binary | 20-30 MB | Compiled Rust binary; Rust dependencies are linked in statically |
| mdbook-mermaid binary | 10-20 MB | Compiled Rust binary; Rust dependencies are linked in statically |
| Python scripts | 50-100 KB | deepwiki-scraper.py |
| Shell scripts | 5-10 KB | build-docs.sh |
| Total | ~300-400 MB | Multi-stage build discards ~1.5 GB |
The multi-stage build reduces the image size by approximately 75% compared to a single-stage build that would include the entire Rust toolchain.
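The figures in the table are approximate, so it can be useful to measure a local build. The sketch below uses two standard Docker CLI subcommands; the function name `inspect_image_size` is hypothetical.

```shell
inspect_image_size() {
  # Total size of the tagged image.
  docker image ls deepwiki-scraper
  # Per-layer sizes, newest layer first, for comparison with the table.
  docker history deepwiki-scraper
}
```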
Sources: Dockerfile:2-8
Building for Production
For production deployments, consider these additional steps:
- Tag with version numbers
- Scan for vulnerabilities
- Push to registry
- Generate SBOM (Software Bill of Materials)
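The four steps could be chained as in the sketch below. The choice of tools is an assumption: `docker scout` is used here for scanning and SBOM generation and requires the Docker Scout plugin, but other scanners can be substituted. The function name `release_image`, the version number, and the registry path are hypothetical.

```shell
release_image() {
  version="$1"     # e.g. 1.0.0 (hypothetical)
  registry="$2"    # e.g. registry.example.com/team (hypothetical)
  remote="${registry}/deepwiki-scraper:${version}"
  # 1. Tag the local image with a version number under the registry path.
  docker tag deepwiki-scraper "$remote" || return 1
  # 2. Scan for known vulnerabilities (assumes the Docker Scout plugin is installed).
  docker scout cves "$remote" || return 1
  # 3. Push the versioned tag.
  docker push "$remote" || return 1
  # 4. Generate an SBOM for the image (also via Docker Scout).
  docker scout sbom "$remote"
}
```

For example, `release_image 1.0.0 registry.example.com/team` tags, scans, pushes, and prints an SBOM for that version.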
Sources: Dockerfile:1-33