This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Docker Multi-Stage Build

Purpose and Scope

This document explains the Docker multi-stage build strategy used to create the deepwiki-scraper container image. It details how the system combines a Rust toolchain for compiling documentation tools with a Python runtime for web scraping, while optimizing the final image size.

For information about how the container orchestrates the build process, see build-docs.sh Orchestrator. For details on the Python scraper implementation, see deepwiki-scraper.py.

Multi-Stage Build Strategy

The Dockerfile implements a two-stage build pattern that separates compilation from runtime. Stage 1 uses a full Rust development environment to compile mdBook binaries from source. Stage 2 creates a minimal Python runtime and extracts only the compiled binaries, discarding the build toolchain.

Build Stages Flow

graph TD
    subgraph Stage1["Stage 1: Rust Builder (rust:latest)"]
        RustBase["rust:latest base image\n~1.5 GB with toolchain"]
        CargoInstall["cargo install mdbook\ncargo install mdbook-mermaid"]
        Binaries["/usr/local/cargo/bin/\nmdbook\nmdbook-mermaid"]
        RustBase --> CargoInstall
        CargoInstall --> Binaries
    end
    subgraph Stage2["Stage 2: Python Runtime (python:3.12-slim)"]
        PyBase["python:3.12-slim base\n~150 MB"]
        UVInstall["Copy uv from\nghcr.io/astral-sh/uv:latest"]
        PipInstall["uv pip install --system\nrequirements.txt"]
        CopyBinaries["COPY --from=builder\n/usr/local/cargo/bin/"]
        CopyScripts["COPY tools/ and\nbuild-docs.sh"]
        PyBase --> UVInstall
        UVInstall --> PipInstall
        PipInstall --> CopyBinaries
        CopyBinaries --> CopyScripts
    end
    Binaries -.->|Extract only binaries| CopyBinaries
    CopyScripts --> FinalImage["Final Image\n~300-400 MB"]
    Stage1 -.->|Discarded after build| Discard["Discard"]
    style RustBase fill:#f5f5f5
    style PyBase fill:#f5f5f5
    style FinalImage fill:#e8e8e8
    style Discard fill:#fff,stroke-dasharray: 5 5

Sources: Dockerfile:1-33

Stage 1: Rust Builder

Stage 1 uses rust:latest as the base image, providing the complete Rust toolchain including cargo, the Rust package manager and build tool.

Rust Builder Configuration

| Aspect | Details |
| --- | --- |
| Base Image | rust:latest |
| Size | ~1.5 GB (includes rustc, cargo, stdlib) |
| Build Commands | cargo install mdbook, cargo install mdbook-mermaid |
| Output Location | /usr/local/cargo/bin/ |
| Stage Identifier | builder |

The cargo install commands download the mdBook and mdbook-mermaid crates from crates.io, compile them from source, and install the resulting binaries to /usr/local/cargo/bin/.

flowchart LR
    subgraph BuilderStage["builder stage"]
        CratesIO["crates.io\n(package registry)"]
        CargoFetch["cargo fetch\n(download sources)"]
        CargoCompile["cargo build --release\n(compile to binary)"]
        CargoInstallBin["Install to\n/usr/local/cargo/bin/"]
        CratesIO --> CargoFetch
        CargoFetch --> CargoCompile
        CargoCompile --> CargoInstallBin
    end
    CargoInstallBin --> MdBookBin["mdbook binary"]
    CargoInstallBin --> MermaidBin["mdbook-mermaid binary"]
    MdBookBin -.->|Copied to Stage 2| NextStage["NextStage"]
    MermaidBin -.->|Copied to Stage 2| NextStage

Sources: Dockerfile:1-5

Stage 2: Python Runtime Assembly

Stage 2 builds the final runtime image starting from python:3.12-slim, a minimal Python base image that omits development headers and unnecessary packages.

Python Runtime Components

graph TB
    subgraph PythonBase["python:3.12-slim"]
        PyInterpreter["Python 3.12 interpreter"]
        PyStdlib["Python standard library"]
        BaseUtils["Essential utilities\n(bash, sh, coreutils)"]
    end
    subgraph InstalledTools["Installed via COPY"]
        UV["uv package manager\n/bin/uv, /bin/uvx"]
        PyDeps["Python packages\n(requests, beautifulsoup4, html2text)"]
        RustBins["Rust binaries\n(mdbook, mdbook-mermaid)"]
        Scripts["Application scripts\n(deepwiki-scraper.py, build-docs.sh)"]
    end
    PythonBase --> UV
    UV --> PyDeps
    PythonBase --> RustBins
    PythonBase --> Scripts
    PyDeps --> Runtime["Runtime Environment"]
    RustBins --> Runtime
    Scripts --> Runtime

The installation sequence follows a specific order:

  1. Copy uv (Dockerfile:13): multi-stage copy from ghcr.io/astral-sh/uv:latest
  2. Install Python dependencies (Dockerfile:16-17): uses uv pip install --system --no-cache
  3. Copy Rust binaries (Dockerfile:20-21): extracts them from the builder stage
  4. Copy application scripts (Dockerfile:24-29): adds the Python scraper and the orchestrator

Sources: Dockerfile:8-29

Binary Extraction and Integration

The critical optimization occurs at Dockerfile:20-21 where the COPY --from=builder directive extracts only the compiled binaries without any build dependencies.

Binary Extraction Pattern

| Source (Stage 1) | Destination (Stage 2) | Purpose |
| --- | --- | --- |
| /usr/local/cargo/bin/mdbook | /usr/local/bin/mdbook | Documentation builder executable |
| /usr/local/cargo/bin/mdbook-mermaid | /usr/local/bin/mdbook-mermaid | Mermaid preprocessor executable |

flowchart LR
    subgraph BuilderFS["Builder Filesystem"]
        CargoDir["/usr/local/cargo/bin/"]
        MdBookSrc["mdbook\n(compiled binary)"]
        MermaidSrc["mdbook-mermaid\n(compiled binary)"]
        CargoDir --> MdBookSrc
        CargoDir --> MermaidSrc
    end
    subgraph RuntimeFS["Runtime Filesystem"]
        BinDir["/usr/local/bin/"]
        MdBookDst["mdbook\n(extracted)"]
        MermaidDst["mdbook-mermaid\n(extracted)"]
        BinDir --> MdBookDst
        BinDir --> MermaidDst
    end
    MdBookSrc -.->|COPY --from=builder| MdBookDst
    MermaidSrc -.->|COPY --from=builder| MermaidDst
    subgraph Discarded["Discarded (not copied)"]
        RustToolchain["rustc compiler"]
        CargoTool["cargo build tool"]
        SourceFiles["mdBook source files"]
        BuildCache["cargo build cache"]
    end

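Rust links its standard library and crate dependencies into each executable statically by default, so both binaries run in the Python base image without the Rust toolchain. They still rely on system libraries such as the C library, which the runtime image must provide.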

Sources: Dockerfile:19-21
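
One way to confirm that the extraction worked is to check, from inside the running container, that both tools resolve on PATH. The following is a minimal sketch, not part of the project: the function name missing_tools and the injectable which parameter are hypothetical and exist only to keep the check easy to test.

```python
import shutil
from typing import Callable, Iterable, Optional

# Tool names taken from the extraction table above.
REQUIRED_TOOLS = ("mdbook", "mdbook-mermaid")


def missing_tools(
    names: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tool names that cannot be resolved on PATH (hypothetical helper)."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing from PATH:", ", ".join(missing))
    else:
        print("all documentation tools found on PATH")
```

Run inside the image, an empty result suggests that the COPY --from=builder step placed both binaries in a directory on PATH. It does not prove that the binaries start correctly, which also depends on the shared libraries available in the runtime image.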

Python Dependency Installation

Python dependencies are installed using uv, a fast Python package installer written in Rust. The dependencies are defined in tools/requirements.txt:1-4.

Python Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| requests | ≥2.31.0 | HTTP client for scraping DeepWiki |
| beautifulsoup4 | ≥4.12.0 | HTML parsing and navigation |
| html2text | ≥2020.1.16 | HTML to Markdown conversion |

The installation command at Dockerfile:17 uses these flags:

  • --system: Install to system Python (not virtualenv)
  • --no-cache: Don’t cache downloaded packages (reduces image size)
  • -r /tmp/requirements.txt: Read dependencies from file

Sources: Dockerfile:16-17 tools/requirements.txt:1-4
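
To make the minimum-version constraints concrete, the sketch below parses a requirements file of this shape and compares an installed version against a minimum. The file contents are an assumption reconstructed from the table above, and the helpers parse_minimums and satisfies are hypothetical. The comparison handles only plain dotted numeric versions, which is far simpler than the version rules uv itself applies.

```python
# Assumed contents of tools/requirements.txt, reconstructed from the table above.
ASSUMED_REQUIREMENTS = """\
requests>=2.31.0
beautifulsoup4>=4.12.0
html2text>=2020.1.16
"""


def parse_minimums(text: str) -> dict[str, tuple[int, ...]]:
    """Map each package name to its minimum version as a tuple of ints."""
    minimums = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition(">=")
        if not sep:
            raise ValueError(f"only '>=' constraints are handled in this sketch: {line!r}")
        minimums[name.strip()] = tuple(int(part) for part in version.strip().split("."))
    return minimums


def satisfies(installed: str, minimum: tuple[int, ...]) -> bool:
    """Compare a dotted numeric version string against a minimum tuple."""
    return tuple(int(part) for part in installed.split(".")) >= minimum


if __name__ == "__main__":
    for package, minimum in parse_minimums(ASSUMED_REQUIREMENTS).items():
        print(package, ">=", ".".join(str(part) for part in minimum))
```

Because the constraints are lower bounds rather than exact pins, the versions that actually end up in the image depend on what the package index serves when the image is built.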

Image Size Optimization

The multi-stage strategy keeps the roughly 1.5 GB Rust build environment out of the final image, which comes to about 300-400 MB instead of the 2+ GB that a single-stage image would need.

Size Comparison

graph LR
    subgraph Approach1["Single-Stage Approach (Hypothetical)"]
        Single["rust:latest + Python\n~2+ GB"]
    end
    subgraph Approach2["Multi-Stage Approach (Actual)"]
        Builder["Stage 1: rust:latest\n~1.5 GB\n(discarded)"]
        Runtime["Stage 2: python:3.12-slim\n+ binaries + dependencies\n~300-400 MB"]
        Builder -.->|Extract binaries only| Runtime
    end
    Single -->|Contains unnecessary build toolchain| Waste["Wasted Space"]
    Runtime -->|Contains only runtime essentials| Efficient["Efficient"]
    style Single fill:#f5f5f5
    style Builder fill:#f5f5f5
    style Runtime fill:#e8e8e8
    style Waste fill:#fff,stroke-dasharray: 5 5
    style Efficient fill:#fff,stroke-dasharray: 5 5

Size Breakdown of Final Image

| Component | Approximate Size |
| --- | --- |
| Python 3.12 slim base | ~150 MB |
| Python packages (requests, BeautifulSoup4, html2text) | ~20 MB |
| mdBook binary | ~8 MB |
| mdbook-mermaid binary | ~6 MB |
| uv package manager | ~10 MB |
| Application scripts | <1 MB |
| Total | ~300-400 MB |

Sources: Dockerfile:1-33 README.md:156
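
The figures above can be checked with simple arithmetic. The sketch below sums the itemized rows and compares the stated total with the hypothetical single-stage image. All numbers are the approximations quoted in this section, the 2000 MB figure stands in for "~2+ GB", and the helper names are hypothetical. The itemized rows add up to roughly 194 MB, so the stated ~300-400 MB total includes space that this breakdown does not itemize.

```python
# Approximate sizes in MB, copied from the breakdown table above.
COMPONENT_SIZES_MB = {
    "python:3.12-slim base": 150,
    "Python packages": 20,
    "mdbook binary": 8,
    "mdbook-mermaid binary": 6,
    "uv package manager": 10,
}
# Application scripts are listed as "<1 MB" and are left out of the sum.

STATED_TOTAL_MB = (300, 400)  # range given in the table
SINGLE_STAGE_MB = 2000        # stands in for the "~2+ GB" single-stage image


def itemized_total(sizes: dict[str, int]) -> int:
    """Sum the itemized component sizes."""
    return sum(sizes.values())


def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage by which after_mb is smaller than before_mb."""
    return 100.0 * (before_mb - after_mb) / before_mb


if __name__ == "__main__":
    total = itemized_total(COMPONENT_SIZES_MB)
    low, high = STATED_TOTAL_MB
    print(f"itemized components: ~{total} MB")
    print(f"not itemized: ~{low - total}-{high - total} MB")
    for after in STATED_TOTAL_MB:
        saved = reduction_percent(SINGLE_STAGE_MB, after)
        print(f"{after} MB vs {SINGLE_STAGE_MB} MB: about {saved:.0f}% smaller")
```

Under these assumptions the multi-stage image is roughly 80-85% smaller than the single-stage alternative.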

Runtime Environment Structure

The final image contains a hybrid Python-Rust runtime in which the build-docs.sh orchestrator runs the Python scraper and the Rust binaries as child processes.

Runtime Component Locations

graph TB
    subgraph Filesystem["/usr/local/bin/"]
        BuildScript["build-docs.sh\n(orchestrator)"]
        ScraperScript["deepwiki-scraper.py\n(Python scraper)"]
        MdBookBin["mdbook\n(Rust binary)"]
        MermaidBin["mdbook-mermaid\n(Rust binary)"]
        UVBin["uv\n(Python installer)"]
    end
    subgraph SystemPython["/usr/local/lib/python3.12/"]
        Requests["requests package"]
        BS4["beautifulsoup4 package"]
        Html2Text["html2text package"]
    end
    subgraph Execution["Execution Flow"]
        Docker["docker run\n(CMD)"]
        Docker --> BuildScript
        BuildScript -->|python| ScraperScript
        BuildScript -->|subprocess| MdBookBin
        MdBookBin -->|preprocessor| MermaidBin
        ScraperScript --> Requests
        ScraperScript --> BS4
        ScraperScript --> Html2Text
    end

The container's default command (the CMD at Dockerfile:32) runs /usr/local/bin/build-docs.sh, which orchestrates calls to both Python and Rust components. The script can execute:

  • python /usr/local/bin/deepwiki-scraper.py for web scraping
  • mdbook init for initialization
  • mdbook build for HTML generation
  • mdbook-mermaid install for asset installation

Sources: Dockerfile:28-32 build-docs.sh

Container Execution Model

When the container runs, Docker executes the CMD at Dockerfile:32, which invokes build-docs.sh. This shell script has access to all binaries in /usr/local/bin/, which is on $PATH by default.

Process Tree During Execution

graph TD
    Docker["docker run\n(container init)"]
    Docker --> CMD["CMD: build-docs.sh"]
    CMD --> Phase1["Phase 1:\npython deepwiki-scraper.py"]
    CMD --> Phase2["Phase 2: mdbook init"]
    CMD --> Phase3["Phase 3: mdbook-mermaid install"]
    CMD --> Phase4["Phase 4: mdbook build"]
    Phase1 --> PyProc["Python 3.12 process"]
    PyProc --> ReqLib["requests.get()"]
    PyProc --> BS4Lib["BeautifulSoup()"]
    PyProc --> H2TLib["html2text.HTML2Text()"]
    Phase2 --> MdBookProc1["mdbook binary process"]
    Phase3 --> MermaidProc["mdbook-mermaid binary process"]
    Phase4 --> MdBookProc2["mdbook binary process"]
    MdBookProc2 --> MermaidPreproc["mdbook-mermaid\n(as preprocessor)"]

Sources: Dockerfile:32 README.md:122-145
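
The four phases in the process tree can be mirrored in a few lines of Python. This is only an illustration of the ordering: the real orchestrator is the build-docs.sh shell script, and the book directory argument, the function names, and the absence of scraper arguments are all assumptions. The sketch defaults to a dry run that prints the commands without executing them.

```python
import subprocess
from pathlib import Path


def build_commands(book_dir: Path) -> list[list[str]]:
    """Return the four phase commands in process-tree order (hypothetical helper)."""
    book = str(book_dir)  # assumed working directory for the book
    return [
        # Phase 1: scraper arguments are omitted because they are not documented here.
        ["python", "/usr/local/bin/deepwiki-scraper.py"],
        ["mdbook", "init", book],             # Phase 2
        ["mdbook-mermaid", "install", book],  # Phase 3
        ["mdbook", "build", book],            # Phase 4
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for cmd in commands:
        line = " ".join(cmd)
        rendered.append(line)
        print("+", line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing phase
    return rendered


if __name__ == "__main__":
    run_all(build_commands(Path("/tmp/book")))
```

Passing check=True makes a failing phase raise an exception, so later phases do not run against incomplete output. Whether build-docs.sh behaves the same way is not stated in this section.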
