This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Project Structure

This document describes the repository’s file organization, detailing the purpose of each file and directory in the codebase. Understanding this structure is essential for developers who want to modify or extend the system.

For information about running tests, see page 13.2. For details about the Python dependencies, see page 13.3.

Repository Layout

The repository separates Python code, shell scripts, and HTML templates into dedicated directories. Generated artifacts go to a separate output/ directory.

graph TB
    Root["Repository Root"]
    Root --> GitIgnore[".gitignore"]
    Root --> Dockerfile["Dockerfile"]
    Root --> README["README.md"]
    Root --> PythonDir["python/"]
    Root --> ScriptsDir["scripts/"]
    Root --> TemplatesDir["templates/"]
    Root --> GithubDir[".github/"]
    Root --> OutputDir["output/"]
    PythonDir --> Scraper["deepwiki-scraper.py"]
    PythonDir --> ProcessTemplate["process-template.py"]
    PythonDir --> Requirements["requirements.txt"]
    PythonDir --> TestsDir["tests/"]
    ScriptsDir --> BuildScript["build-docs.sh"]
    ScriptsDir --> RunTests["run-tests.sh"]
    TemplatesDir --> Header["header.html"]
    TemplatesDir --> Footer["footer.html"]
    TemplatesDir --> TemplateREADME["README.md"]
    GithubDir --> Workflows["workflows/"]
    OutputDir --> MarkdownOut["markdown/"]
    OutputDir --> RawMarkdownOut["raw_markdown/"]
    OutputDir --> BookOut["book/"]
    OutputDir --> ConfigOut["book.toml"]
    style Root fill:#f9f9f9,stroke:#333
    style PythonDir fill:#e8f5e9,stroke:#388e3c
    style ScriptsDir fill:#fff4e1,stroke:#f57c00
    style TemplatesDir fill:#e1f5ff,stroke:#0288d1
    style OutputDir fill:#ffe0b2,stroke:#e64a19

Physical File Hierarchy

Sources: README.md:84-88 .gitignore:1-7

Root Directory Files

The repository root contains the primary configuration and documentation files that define the system’s build behavior.

| File | Type | Purpose |
| --- | --- | --- |
| .gitignore | Config | Excludes generated output and temporary files |
| Dockerfile | Build | Multi-stage Docker build specification |
| README.md | Docs | Quick start guide and configuration reference |

.gitignore

Excludes build artifacts and temporary files from version control:

  • output/ - Generated documentation artifacts
  • *.pyc and __pycache__/ - Python bytecode
  • .env - Local environment variables
  • .DS_Store - macOS metadata
  • tmp/ - Temporary working directory

Sources: .gitignore:1-7

Dockerfile

Implements a two-stage build pattern to optimize image size. The builder stage compiles Rust binaries (mdbook, mdbook-mermaid), and the final stage creates a Python runtime with only the necessary executables.

Sources: README.md:78

README.md

Primary documentation file containing quick start instructions, configuration reference, and high-level system overview. Serves as the entry point for new users.

Sources: README.md:1-95

Python Directory

The python/ directory contains all Python scripts, their dependencies, and test suites.

graph TB
    PythonDir["python/"]
    PythonDir --> Scraper["deepwiki-scraper.py"]
    PythonDir --> ProcessTemplate["process-template.py"]
    PythonDir --> Requirements["requirements.txt"]
    PythonDir --> TestsDir["tests/"]
    TestsDir --> TemplateTest["test_template_processing.py"]
    TestsDir --> MermaidTest["test_mermaid_normalization.py"]
    TestsDir --> NumberingTest["test_page_numbering.py"]
    Scraper --> ExtractWikiStructure["extract_wiki_structure()"]
    Scraper --> ExtractPageContent["extract_page_content()"]
    Scraper --> ExtractMermaid["extract_mermaid_from_nextjs_data()"]
    Scraper --> NormalizeDiagram["normalize_mermaid_diagram()"]
    Scraper --> ExtractAndEnhance["extract_and_enhance_diagrams()"]
    ProcessTemplate --> ProcessFile["process_template_file()"]
    ProcessTemplate --> SubstituteVars["substitute_variables()"]

Python Directory Structure

Sources: README.md:85

deepwiki-scraper.py

Core Python module for content extraction and diagram processing. Implements the Phase 1 (markdown extraction) and Phase 2 (diagram enhancement) logic of the pipeline.

Key Functions:

| Function | Purpose |
| --- | --- |
| sanitize_filename() | Convert page titles to filesystem-safe names |
| fetch_page() | HTTP client with retry logic and error handling |
| discover_subsections() | Recursively probe for nested wiki pages |
| extract_wiki_structure() | Build hierarchical page structure from DeepWiki |
| clean_deepwiki_footer() | Remove DeepWiki UI elements from markdown |
| convert_html_to_markdown() | HTML→Markdown conversion via html2text |
| extract_mermaid_from_nextjs_data() | Extract diagrams from Next.js JavaScript payload |
| normalize_mermaid_diagram() | Seven-step normalization for Mermaid 11 compatibility |
| extract_page_content() | Main content extraction and markdown generation |
| extract_and_enhance_diagrams() | Fuzzy matching and diagram injection |
| main() | Entry point with temporary directory management |

The scraper works in a temporary directory so that an interrupted run does not leave partially written files in the output location. Files are written to a tempfile.TemporaryDirectory(), enhanced in place, then moved to the final output location.
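The write, enhance, then move sequence can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scraper's actual code: the function name write_pages_via_tempdir, the shape of the pages mapping, and the enhance callback are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def write_pages_via_tempdir(pages, output_dir, enhance=lambda text: text):
    """Write pages to a temp dir, enhance them in place, then move them out.

    `pages` maps relative file names to markdown text (hypothetical shape).
    `enhance` stands in for diagram injection (hypothetical callback).
    """
    output_dir = Path(output_dir)
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)

        # Step 1: write the raw markdown into the temporary directory.
        for name, text in pages.items():
            target = tmp_path / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")

        # Step 2: enhance every file in place.
        for md_file in sorted(tmp_path.rglob("*.md")):
            raw = md_file.read_text(encoding="utf-8")
            md_file.write_text(enhance(raw), encoding="utf-8")

        # Step 3: only now move the finished files to the final location.
        output_dir.mkdir(parents=True, exist_ok=True)
        for md_file in sorted(tmp_path.rglob("*.md")):
            dest = output_dir / md_file.relative_to(tmp_path)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(md_file), str(dest))
    return output_dir
```

If any step raises before step 3, the temporary directory is discarded on exit from the with block and the output directory is left untouched.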

Sources: README.md:85

process-template.py

Template processing script that performs variable substitution in header and footer HTML files. Supports conditional rendering and automatic variable detection.

Key Functions:

| Function | Purpose |
| --- | --- |
| process_template_file() | Main template processing entry point |
| substitute_variables() | Replace {{VARIABLE}} placeholders with values |

Template variables include: {{REPO}}, {{BOOK_TITLE}}, {{BOOK_AUTHORS}}, {{GIT_REPO_URL}}, {{DEEPWIKI_URL}}, {{GENERATION_DATE}}.
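A placeholder substitution of this kind might look like the sketch below. It is an assumption about the general technique, not the code in process-template.py: the name substitute_placeholders, the regular expression, and the choice to leave unknown placeholders untouched are all hypothetical.

```python
import re

# Matches {{NAME}} where NAME is upper-case letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def substitute_placeholders(template, variables):
    """Replace each {{NAME}} with variables[NAME]; leave unknown names as-is."""
    def replace(match):
        name = match.group(1)
        # Fall back to the original placeholder text when no value is known.
        return str(variables.get(name, match.group(0)))

    return PLACEHOLDER.sub(replace, template)


header = '<a href="{{GIT_REPO_URL}}">{{REPO}}</a> {{UNKNOWN}}'
values = {"REPO": "owner/repo", "GIT_REPO_URL": "https://github.com/owner/repo"}
print(substitute_placeholders(header, values))
# prints: <a href="https://github.com/owner/repo">owner/repo</a> {{UNKNOWN}}
```

Leaving unknown placeholders visible makes a missing variable easy to spot in the rendered page. A real implementation could equally choose to fail or to substitute an empty string.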

Sources: README.md:51

requirements.txt

Python dependencies for the scraper and template processor:

  • requests>=2.31.0 - HTTP client for fetching wiki pages
  • beautifulsoup4>=4.12.0 - HTML parsing library
  • html2text>=2020.1.16 - HTML-to-Markdown converter

Installed via uv pip install during Docker build for faster, more reliable installation.

Sources: README.md:85

tests/

Test suite for Python components. Contains unit tests for template processing, Mermaid normalization, and page numbering logic. See page 13.2 for details on running tests.

Sources: README.md:82

Scripts Directory

The scripts/ directory contains shell scripts for orchestration and testing.

Scripts Directory Structure

Sources: README.md:82 README.md:86

build-docs.sh

Main orchestration script that coordinates the three-phase pipeline. Invoked as the Docker container’s entry point.

Execution Flow:

  1. Auto-detection - Detect REPO from git remote if not provided
  2. Configuration - Parse environment variables and set defaults
  3. Phase 1 - Execute deepwiki-scraper.py to extract markdown
  4. Phase 2 - Process templates and generate book.toml, SUMMARY.md
  5. Phase 3 - Run mdbook build to generate HTML (unless MARKDOWN_ONLY=true)
  6. Cleanup - Copy outputs to /output volume
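Step 1 of the flow above, deriving REPO from the git remote, can be illustrated in Python. build-docs.sh is a shell script, so this is only a sketch of the idea: the function names, the regular expression, and the restriction to github.com remotes are assumptions.

```python
import re
import subprocess

# Accepts https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git).
REMOTE_PATTERN = re.compile(
    r"github\.com[:/](?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def repo_from_remote_url(url):
    """Return 'owner/repo' for a GitHub remote URL, or None if it does not match."""
    match = REMOTE_PATTERN.search(url.strip())
    if match is None:
        return None
    return f"{match.group('owner')}/{match.group('repo')}"


def detect_repo(explicit=None, cwd="."):
    """Prefer an explicit REPO value; otherwise ask git for the origin URL."""
    if explicit:
        return explicit
    try:
        result = subprocess.run(
            ["git", "config", "--get", "remote.origin.url"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or there is no origin remote configured.
        return None
    return repo_from_remote_url(result.stdout)
```

Keeping the URL parsing separate from the git call makes the parsing testable without a repository on disk.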

Environment Variables:

  • REPO - GitHub repository (owner/repo format)
  • BOOK_TITLE - Documentation title
  • BOOK_AUTHORS - Author metadata
  • GIT_REPO_URL - Repository URL for edit links
  • DEEPWIKI_URL - DeepWiki page URL
  • MARKDOWN_ONLY - Skip HTML build for debugging
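For illustration, the variables above could be collected into a single configuration object as sketched below. The default values shown (title "Documentation", the repository owner as author, and URLs derived from REPO) are assumptions made for this example, not defaults confirmed by this page. BuildConfig and load_config are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildConfig:
    repo: str
    book_title: str
    book_authors: str
    git_repo_url: str
    deepwiki_url: str
    markdown_only: bool


def load_config(env=None):
    """Read build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    repo = env.get("REPO", "")
    if not repo:
        raise ValueError("REPO must be set (owner/repo format)")
    owner = repo.split("/", 1)[0]
    return BuildConfig(
        repo=repo,
        # Every fallback below is an assumed default, for illustration only.
        book_title=env.get("BOOK_TITLE", "Documentation"),
        book_authors=env.get("BOOK_AUTHORS", owner),
        git_repo_url=env.get("GIT_REPO_URL", f"https://github.com/{repo}"),
        deepwiki_url=env.get("DEEPWIKI_URL", f"https://deepwiki.com/{repo}"),
        markdown_only=env.get("MARKDOWN_ONLY", "false").lower() == "true",
    )
```

Passing a plain dictionary instead of os.environ, for example load_config({"REPO": "owner/repo", "MARKDOWN_ONLY": "true"}), keeps the function easy to exercise in tests.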

Critical Paths:

  • WORK_DIR=/workspace - Working directory
  • WIKI_DIR=/workspace/wiki - Temporary markdown location
  • OUTPUT_DIR=/output - Volume mount for outputs
  • BOOK_DIR=/workspace/book - mdBook source directory

Sources: README.md:34-37 README.md:86

run-tests.sh

Test execution script that runs pytest on the Python test suite. Provides colored output and detailed test results.

Sources: README.md:82

Templates Directory

The templates/ directory contains HTML template files for header and footer customization.

graph TB
    TemplatesDir["templates/"]
    TemplatesDir --> Header["header.html"]
    TemplatesDir --> Footer["footer.html"]
    TemplatesDir --> TemplateREADME["README.md"]
    Header --> Variables["Template variables:\n{{REPO}}\n{{BOOK_TITLE}}\n{{GIT_REPO_URL}}\n{{DEEPWIKI_URL}}\n{{GENERATION_DATE}}"]
    Footer --> Variables

Templates Directory Structure

Sources: README.md:87

header.html

HTML template injected at the beginning of each markdown file. Supports variable substitution for dynamic content like repository links and generation timestamps.

Sources: README.md:40-51

footer.html

HTML template injected at the end of each markdown file. Supports the same variable substitution as header.html.

Sources: README.md:40-51

README.md

Documentation for the template system, including variable reference and customization examples.

Sources: README.md:51

Output Directory (Generated)

The output/ directory is created at runtime and excluded from version control. It contains all generated artifacts produced by the build pipeline.

graph TB
    Output["output/"]
    Output --> Markdown["markdown/"]
    Output --> RawMarkdown["raw_markdown/"]
    Output --> Book["book/"]
    Output --> Config["book.toml"]
    Markdown --> MainPages["Main pages:\n1-overview.md\n2-quick-start.md"]
    Markdown --> Sections["Subsection dirs:\nsection-2/\nsection-3/"]
    Sections --> SubPages["Subsection pages:\n2-1-docker.md\n3-1-environment.md"]
    RawMarkdown --> RawPages["Pre-enhanced\nmarkdown files\n(for debugging)"]
    Book --> Index["index.html"]
    Book --> CSS["css/"]
    Book --> JS["mermaid.min.js"]
    Book --> Search["searchindex.js"]

Output Structure

Sources: README.md:54-59

markdown/

Contains enhanced markdown source files with injected diagrams and processed templates. Files are organized hierarchically with subsections in section-N/ subdirectories.

Main Pages:

  • Format: {number}-{slug}.md (e.g., 1-overview.md)
  • Location: output/markdown/

Subsection Pages:

  • Format: section-{main}/{number}-{slug}.md
  • Location: output/markdown/section-{N}/
  • Example: section-3/3-2-environment-variables.md
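The naming scheme above can be expressed as a small path-building function. This is a sketch under assumptions: slugify is a simplified, hypothetical stand-in for the scraper's sanitize_filename(), and page numbers are assumed to arrive as strings such as "1" or "3.2".

```python
import re
from pathlib import PurePosixPath


def slugify(title):
    """Lower-case the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def page_path(number, title):
    """Map a page number ('1' or '3.2') and a title to a path under markdown/."""
    parts = number.split(".")
    filename = f"{'-'.join(parts)}-{slugify(title)}.md"
    if len(parts) == 1:
        # Main pages live directly in output/markdown/.
        return PurePosixPath(filename)
    # Subsection pages live in section-{main}/.
    return PurePosixPath(f"section-{parts[0]}") / filename


print(page_path("1", "Overview"))
# prints: 1-overview.md
print(page_path("3.2", "Environment Variables"))
# prints: section-3/3-2-environment-variables.md
```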

Sources: README.md:56

raw_markdown/

Pre-enhancement markdown files for debugging purposes. Contains the output of Phase 1 before diagram injection and template processing. Useful for troubleshooting diagram matching issues.

Sources: README.md:57

book/

Complete HTML documentation site generated by mdBook. Self-contained static website with:

  • Navigation sidebar generated from SUMMARY.md
  • Full-text search via searchindex.js
  • Rendered Mermaid diagrams via mdbook-mermaid
  • Edit-on-GitHub links from GIT_REPO_URL
  • Responsive Rust theme

The entire directory can be served by any static file server or deployed to GitHub Pages.
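As one example of a static file server, the Python standard library can serve the directory for local preview. The helper name serve_book is hypothetical, and the path passed to it depends on where the output volume is mounted.

```python
import functools
import http.server
import threading


def serve_book(directory, port=0):
    """Serve `directory` over HTTP on localhost in a background thread.

    With port=0 the operating system picks a free port, which can be read
    from server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A call such as serve_book("output/book", port=8000) would make the site available at http://127.0.0.1:8000/ until server.shutdown() and server.server_close() are called. The same result is available from a shell with python3 -m http.server --directory output/book.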

Sources: README.md:55

book.toml

mdBook configuration file with repository-specific metadata. Dynamically generated during Phase 2 of the build pipeline. Contains book title, authors, theme settings, and preprocessor configuration.
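To give a sense of what such a generated file might contain, the sketch below renders a small book.toml from the metadata. The build script generates the real file in shell, so the function name, the selection of keys, and the "rust" default theme are assumptions for illustration. The keys themselves are standard mdBook and mdbook-mermaid settings.

```python
import json


def toml_string(value):
    """Quote text as a TOML basic string (JSON quoting suffices for typical text)."""
    return json.dumps(value, ensure_ascii=False)


def render_book_toml(title, authors, git_repo_url):
    """Return book.toml content with book metadata, theme, and a preprocessor."""
    author_list = ", ".join(toml_string(author) for author in authors)
    lines = [
        "[book]",
        f"title = {toml_string(title)}",
        f"authors = [{author_list}]",
        "",
        "[output.html]",
        'default-theme = "rust"',  # assumed theme choice
        f"git-repository-url = {toml_string(git_repo_url)}",
        "",
        "[preprocessor.mermaid]",
        'command = "mdbook-mermaid"',
        "",
    ]
    return "\n".join(lines)
```

Calling render_book_toml("Documentation", ["owner"], "https://github.com/owner/repo") yields a [book] table with the title and authors, followed by the HTML output and Mermaid preprocessor tables.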

Sources: README.md:58

Docker Build Context

The Docker build process includes only the files needed for container construction. Understanding this context is important for build optimization.

graph TB
    BuildContext["Docker Build Context"]
    BuildContext --> Included["Included in Image"]
    BuildContext --> Excluded["Excluded"]
    Included --> DockerfileBuild["Dockerfile\n(Build instructions)"]
    Included --> ToolsCopy["tools/\n(COPY instruction)"]
    Included --> ScriptCopy["build-docs.sh\n(COPY instruction)"]
    ToolsCopy --> ReqInstall["requirements.txt\n→ uv pip install"]
    ToolsCopy --> ScraperInstall["deepwiki-scraper.py\n→ /usr/local/bin/"]
    ScriptCopy --> BuildInstall["build-docs.sh\n→ /usr/local/bin/"]
    Excluded --> GitIgnored["output/\n(git-ignored)"]
    Excluded --> GitFiles[".git/\n(implicit)"]
    Excluded --> Readme["README.md\n(not referenced)"]
    style BuildContext fill:#f9f9f9,stroke:#333
    style Included fill:#e8f5e9,stroke:#388e3c
    style Excluded fill:#ffebee,stroke:#c62828

Build Context Inclusion

Copy Operations:

  1. Dockerfile:16 - COPY tools/requirements.txt /tmp/requirements.txt
  2. Dockerfile:24 - COPY tools/deepwiki-scraper.py /usr/local/bin/
  3. Dockerfile:28 - COPY build-docs.sh /usr/local/bin/

Not Copied:

  • .gitignore - only used by Git
  • output/ - generated at runtime
  • .git/ - version control metadata
  • Any documentation files (README, LICENSE)

Sources: Dockerfile:16-28 .gitignore:1-2

File Dependency Graph

This diagram maps the relationships between files and shows which files depend on or reference others.

graph TB
    subgraph BuildTime["Build-Time Dependencies"]
        DF["Dockerfile"]
        Req["tools/requirements.txt"]
        Scraper["tools/deepwiki-scraper.py"]
        BuildSh["build-docs.sh"]
        DF -->|COPY [Line 16]| Req
        DF -->|RUN install [Line 17]| Req
        DF -->|COPY [Line 24]| Scraper
        DF -->|COPY [Line 28]| BuildSh
        DF -->|CMD [Line 32]| BuildSh
    end

    subgraph Runtime["Run-Time Dependencies"]
        BuildShRun["build-docs.sh\n(Entry point)"]
        ScraperExec["deepwiki-scraper.py\n(Phase 1-2)"]
        MdBook["mdbook\n(Phase 3)"]
        MdBookMermaid["mdbook-mermaid\n(Phase 3)"]
        BuildShRun -->|python3 [Line 58]| ScraperExec
        BuildShRun -->|mdbook-mermaid install [Line 171]| MdBookMermaid
        BuildShRun -->|mdbook build [Line 176]| MdBook
        ScraperExec -->|import requests| Req
        ScraperExec -->|import bs4| Req
        ScraperExec -->|import html2text| Req
    end

    subgraph Generated["Generated Artifacts"]
        WikiDir["$WIKI_DIR/\n(Temp markdown)"]
        BookToml["book.toml\n(Config)"]
        Summary["SUMMARY.md\n(TOC)"]
        OutputDir["output/\n(Final artifacts)"]
        ScraperExec -->|sys.argv[2]| WikiDir
        BuildShRun -->|cat > [Line 85]| BookToml
        BuildShRun -->|Lines 113-159| Summary
        BuildShRun -->|cp [Lines 184-191]| OutputDir
    end

    BuildTime --> Runtime
    Runtime --> Generated

    style DF fill:#e1f5ff,stroke:#0288d1
    style BuildShRun fill:#fff4e1,stroke:#f57c00
    style ScraperExec fill:#e8f5e9,stroke:#388e3c
    style OutputDir fill:#ffe0b2,stroke:#e64a19

Sources: Dockerfile:1-33 build-docs.sh:1-206 tools/deepwiki-scraper.py:1-920 tools/requirements.txt:1-4

File Size and Complexity Metrics

Understanding the relative complexity of each component helps developers identify which files require the most attention during modifications.

| File | Lines | Purpose | Complexity |
| --- | --- | --- | --- |
| tools/deepwiki-scraper.py | 920 | Content extraction and diagram matching | High |
| build-docs.sh | 206 | Orchestration and configuration | Medium |
| Dockerfile | 33 | Multi-stage build specification | Low |
| tools/requirements.txt | 4 | Dependency list | Minimal |
| .gitignore | 2 | Git exclusion rule | Minimal |

Key Observations:

Sources: tools/deepwiki-scraper.py:1-920 build-docs.sh:1-206 Dockerfile:1-33 tools/requirements.txt:1-4 .gitignore:1-2
