This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Project Structure
This document describes the repository’s file organization, detailing the purpose of each file and directory in the codebase. Understanding this structure is essential for developers who want to modify or extend the system.
For information about running tests, see page 13.2. For details about the Python dependencies, see page 13.3.
Repository Layout
The repository separates Python code, shell scripts, and HTML templates into dedicated directories (python/, scripts/, and templates/).
graph TB
Root["Repository Root"]
Root --> GitIgnore[".gitignore"]
Root --> Dockerfile["Dockerfile"]
Root --> README["README.md"]
Root --> PythonDir["python/"]
Root --> ScriptsDir["scripts/"]
Root --> TemplatesDir["templates/"]
Root --> GithubDir[".github/"]
Root --> OutputDir["output/"]
PythonDir --> Scraper["deepwiki-scraper.py"]
PythonDir --> ProcessTemplate["process-template.py"]
PythonDir --> Requirements["requirements.txt"]
PythonDir --> TestsDir["tests/"]
ScriptsDir --> BuildScript["build-docs.sh"]
ScriptsDir --> RunTests["run-tests.sh"]
TemplatesDir --> Header["header.html"]
TemplatesDir --> Footer["footer.html"]
TemplatesDir --> TemplateREADME["README.md"]
GithubDir --> Workflows["workflows/"]
OutputDir --> MarkdownOut["markdown/"]
OutputDir --> RawMarkdownOut["raw_markdown/"]
OutputDir --> BookOut["book/"]
OutputDir --> ConfigOut["book.toml"]
style Root fill:#f9f9f9,stroke:#333
style PythonDir fill:#e8f5e9,stroke:#388e3c
style ScriptsDir fill:#fff4e1,stroke:#f57c00
style TemplatesDir fill:#e1f5ff,stroke:#0288d1
style OutputDir fill:#ffe0b2,stroke:#e64a19
Physical File Hierarchy
Sources: README.md:84-88 .gitignore:1-7
Root Directory Files
The repository root contains the primary configuration and documentation files that define the system’s build behavior.
| File | Type | Purpose |
|---|---|---|
| .gitignore | Config | Excludes generated output and temporary files |
| Dockerfile | Build | Multi-stage Docker build specification |
| README.md | Docs | Quick start guide and configuration reference |
.gitignore
Excludes build artifacts and temporary files from version control:
- output/ - Generated documentation artifacts
- *.pyc and __pycache__/ - Python bytecode
- .env - Local environment variables
- .DS_Store - macOS metadata
- tmp/ - Temporary working directory
Sources: .gitignore:1-7
Dockerfile
Implements a two-stage build pattern to optimize image size. The builder stage compiles Rust binaries (mdbook, mdbook-mermaid), and the final stage creates a Python runtime with only the necessary executables.
Sources: README.md:78
README.md
Primary documentation file containing quick start instructions, configuration reference, and high-level system overview. Serves as the entry point for new users.
Sources: README.md:1-95
graph TB
PythonDir["python/"]
PythonDir --> Scraper["deepwiki-scraper.py"]
PythonDir --> ProcessTemplate["process-template.py"]
PythonDir --> Requirements["requirements.txt"]
PythonDir --> TestsDir["tests/"]
TestsDir --> TemplateTest["test_template_processing.py"]
TestsDir --> MermaidTest["test_mermaid_normalization.py"]
TestsDir --> NumberingTest["test_page_numbering.py"]
Scraper --> ExtractWikiStructure["extract_wiki_structure()"]
Scraper --> ExtractPageContent["extract_page_content()"]
Scraper --> ExtractMermaid["extract_mermaid_from_nextjs_data()"]
Scraper --> NormalizeDiagram["normalize_mermaid_diagram()"]
Scraper --> ExtractAndEnhance["extract_and_enhance_diagrams()"]
ProcessTemplate --> ProcessFile["process_template_file()"]
ProcessTemplate --> SubstituteVars["substitute_variables()"]
Python Directory
The python/ directory contains all Python scripts, their dependencies, and test suites.
Python Directory Structure
Sources: README.md:85
deepwiki-scraper.py
Core Python module for content extraction and diagram processing. Implements the Phase 1 (markdown extraction) and Phase 2 (diagram enhancement) logic of the pipeline.
Key Functions:
| Function | Purpose |
|---|---|
| sanitize_filename() | Convert page titles to filesystem-safe names |
| fetch_page() | HTTP client with retry logic and error handling |
| discover_subsections() | Recursively probe for nested wiki pages |
| extract_wiki_structure() | Build hierarchical page structure from DeepWiki |
| clean_deepwiki_footer() | Remove DeepWiki UI elements from markdown |
| convert_html_to_markdown() | HTML→Markdown conversion via html2text |
| extract_mermaid_from_nextjs_data() | Extract diagrams from Next.js JavaScript payload |
| normalize_mermaid_diagram() | Seven-step normalization for Mermaid 11 compatibility |
| extract_page_content() | Main content extraction and markdown generation |
| extract_and_enhance_diagrams() | Fuzzy matching and diagram injection |
| main() | Entry point with temporary directory management |
The scraper uses a temporary directory pattern to ensure atomic operations. Files are written to tempfile.TemporaryDirectory(), enhanced in-place, then moved to the final output location.
Sources: README.md:85
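The temporary directory pattern described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: build_into(), its pages argument, and the trailing-whitespace "enhancement" are hypothetical stand-ins for the real extraction and diagram steps.

```python
import shutil
import tempfile
from pathlib import Path


def build_into(output_dir: Path, pages: dict[str, str]) -> None:
    """Write pages to a scratch directory, then publish them at the end."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "markdown"
        staging.mkdir()

        # Stand-in for extraction: write the raw markdown files.
        for name, text in pages.items():
            (staging / name).write_text(text, encoding="utf-8")

        # Stand-in for enhancement: rewrite each file in place.
        for path in sorted(staging.glob("*.md")):
            cleaned = path.read_text(encoding="utf-8").rstrip() + "\n"
            path.write_text(cleaned, encoding="utf-8")

        # Publish only after every earlier step succeeded, so a failure
        # above leaves the output directory untouched.
        output_dir.mkdir(parents=True, exist_ok=True)
        for path in list(staging.iterdir()):
            shutil.move(str(path), str(output_dir / path.name))
```

Moving files one by one is not strictly atomic; the point of the sketch is only that a failed run never leaves half-processed files in the output location.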
process-template.py
Template processing script that performs variable substitution in header and footer HTML files. Supports conditional rendering and automatic variable detection.
Key Functions:
| Function | Purpose |
|---|---|
| process_template_file() | Main template processing entry point |
| substitute_variables() | Replace {{VARIABLE}} placeholders with values |
Template variables include: {{REPO}}, {{BOOK_TITLE}}, {{BOOK_AUTHORS}}, {{GIT_REPO_URL}}, {{DEEPWIKI_URL}}, {{GENERATION_DATE}}.
Sources: README.md:51
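As an illustration of {{VARIABLE}} substitution, the sketch below uses a regular expression to replace placeholders. The function name substitute() and the choice to leave unknown placeholders untouched are assumptions made for this example; the behavior of the real substitute_variables(), including its conditional rendering, may differ.

```python
import re

# Matches {{NAME}} where NAME is an upper-case identifier.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z_][A-Z0-9_]*)\s*\}\}")


def substitute(template: str, values: dict[str, str]) -> str:
    """Replace {{NAME}} placeholders; unknown names are left as-is (assumption)."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, template)
```

For example, substitute("{{REPO}}", {"REPO": "owner/repo"}) returns "owner/repo".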
requirements.txt
Python dependencies for the scraper and template processor:
- requests>=2.31.0 - HTTP client for fetching wiki pages
- beautifulsoup4>=4.12.0 - HTML parsing library
- html2text>=2020.1.16 - HTML-to-Markdown converter
Installed via uv pip install during Docker build for faster, more reliable installation.
Sources: README.md:85
tests/
Test suite for Python components. Contains unit tests for template processing, Mermaid normalization, and page numbering logic. See page 13.2 for details on running tests.
Sources: README.md:82
Scripts Directory
The scripts/ directory contains shell scripts for orchestration and testing.
Scripts Directory Structure
Sources: README.md:82 README.md:86
build-docs.sh
Main orchestration script that coordinates the three-phase pipeline. Invoked as the Docker container’s entry point.
Execution Flow:
- Auto-detection - Detect REPO from git remote if not provided
- Configuration - Parse environment variables and set defaults
- Phase 1 - Execute deepwiki-scraper.py to extract markdown
- Phase 2 - Process templates and generate book.toml, SUMMARY.md
- Phase 3 - Run mdbook build to generate HTML (unless MARKDOWN_ONLY=true)
- Cleanup - Copy outputs to /output volume
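The auto-detection step can be pictured with the Python sketch below. The real logic lives in the shell script and is not shown here, so the function names, the regular expression, and the use of the origin remote are all assumptions for illustration.

```python
import re
import subprocess
from typing import Optional

# Accepts both SSH (git@github.com:owner/repo.git) and HTTPS remotes.
_GITHUB_REMOTE = re.compile(r"github\.com[:/]([^/\s]+)/([^/\s]+?)(?:\.git)?/?$")


def repo_from_remote(url: str) -> Optional[str]:
    """Turn a GitHub remote URL into owner/repo, or None if it does not match."""
    match = _GITHUB_REMOTE.search(url.strip())
    return f"{match.group(1)}/{match.group(2)}" if match else None


def detect_repo(explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit REPO value; otherwise ask git for the origin remote."""
    if explicit:
        return explicit
    try:
        result = subprocess.run(
            ["git", "config", "--get", "remote.origin.url"],
            capture_output=True, text=True, check=False,
        )
    except OSError:  # git is not installed or not executable
        return None
    return repo_from_remote(result.stdout)
```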
Environment Variables:
- REPO - GitHub repository (owner/repo format)
- BOOK_TITLE - Documentation title
- BOOK_AUTHORS - Author metadata
- GIT_REPO_URL - Repository URL for edit links
- DEEPWIKI_URL - DeepWiki page URL
- MARKDOWN_ONLY - Skip HTML build for debugging
Critical Paths:
- WORK_DIR=/workspace - Working directory
- WIKI_DIR=/workspace/wiki - Temporary markdown location
- OUTPUT_DIR=/output - Volume mount for outputs
- BOOK_DIR=/workspace/book - mdBook source directory
Sources: README.md:34-37 README.md:86
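To make the configuration step concrete, here is a sketch of how the variables and paths listed above could be gathered into one object. It is written in Python for illustration even though the real script is shell. The BuildConfig class, load_config(), and every default value marked "hypothetical" are assumptions; the source does not state what the script's actual defaults are.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BuildConfig:
    repo: str
    book_title: str
    book_authors: str
    git_repo_url: str
    deepwiki_url: str
    markdown_only: bool
    # Paths as listed under "Critical Paths".
    work_dir: str = "/workspace"
    wiki_dir: str = "/workspace/wiki"
    book_dir: str = "/workspace/book"
    output_dir: str = "/output"


def load_config(env: Mapping[str, str] = os.environ) -> BuildConfig:
    """Read the documented environment variables into a BuildConfig."""
    repo = env.get("REPO", "")
    if not repo:
        raise ValueError("REPO is required (owner/repo format)")
    owner = repo.split("/", 1)[0]
    return BuildConfig(
        repo=repo,
        book_title=env.get("BOOK_TITLE", "Documentation"),  # hypothetical default
        book_authors=env.get("BOOK_AUTHORS", owner),  # hypothetical default
        git_repo_url=env.get("GIT_REPO_URL", f"https://github.com/{repo}"),  # hypothetical
        deepwiki_url=env.get("DEEPWIKI_URL", f"https://deepwiki.com/{repo}"),  # hypothetical
        markdown_only=env.get("MARKDOWN_ONLY", "false").lower() == "true",
    )
```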
run-tests.sh
Test execution script that runs pytest on the Python test suite. Provides colored output and detailed test results.
Sources: README.md:82
graph TB
TemplatesDir["templates/"]
TemplatesDir --> Header["header.html"]
TemplatesDir --> Footer["footer.html"]
TemplatesDir --> TemplateREADME["README.md"]
Header --> Variables["Template variables:\n{{REPO}}\n{{BOOK_TITLE}}\n{{GIT_REPO_URL}}\n{{DEEPWIKI_URL}}\n{{GENERATION_DATE}}"]
Footer --> Variables
Templates Directory
The templates/ directory contains HTML template files for header and footer customization.
Templates Directory Structure
Sources: README.md:87
header.html
HTML template injected at the beginning of each markdown file. Supports variable substitution for dynamic content like repository links and generation timestamps.
Sources: README.md:40-51
footer.html
HTML template injected at the end of each markdown file. Supports the same variable substitution as header.html.
Sources: README.md:40-51
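Header and footer injection amounts to wrapping each page's markdown between the two processed templates. The sketch below shows one way to do that; wrap_page() is a hypothetical helper, and the blank-line separation is an assumption made so that the HTML blocks do not run into the markdown content.

```python
def wrap_page(markdown: str, header: str = "", footer: str = "") -> str:
    """Return the page with the header before it and the footer after it."""
    # Skip empty templates and separate the remaining parts with blank lines.
    parts = [part.strip("\n") for part in (header, markdown, footer) if part.strip()]
    return "\n\n".join(parts) + "\n"
```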
README.md
Documentation for the template system, including variable reference and customization examples.
Sources: README.md:51
graph TB
Output["output/"]
Output --> Markdown["markdown/"]
Output --> RawMarkdown["raw_markdown/"]
Output --> Book["book/"]
Output --> Config["book.toml"]
Markdown --> MainPages["Main pages:\n1-overview.md\n2-quick-start.md"]
Markdown --> Sections["Subsection dirs:\nsection-2/\nsection-3/"]
Sections --> SubPages["Subsection pages:\n2-1-docker.md\n3-1-environment.md"]
RawMarkdown --> RawPages["Pre-enhanced\nmarkdown files\n(for debugging)"]
Book --> Index["index.html"]
Book --> CSS["css/"]
Book --> JS["mermaid.min.js"]
Book --> Search["searchindex.js"]
Output Directory (Generated)
The output/ directory is created at runtime and excluded from version control. It contains all generated artifacts produced by the build pipeline.
Output Structure
Sources: README.md:54-59
markdown/
Contains enhanced markdown source files with injected diagrams and processed templates. Files are organized hierarchically with subsections in section-N/ subdirectories.
Main Pages:
- Format: {number}-{slug}.md (e.g., 1-overview.md)
- Location: output/markdown/
Subsection Pages:
- Format: section-{main}/{number}-{slug}.md
- Location: output/markdown/section-{N}/
- Example: section-3/3-2-environment-variables.md
Sources: README.md:56
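The naming scheme can be expressed as a small function that maps a page number and title to a relative path. This is a sketch only: slugify() is a hypothetical stand-in for sanitize_filename(), whose exact rules are not documented here, and the dotted page-number input ("3.2") is an assumption.

```python
import re
from pathlib import PurePosixPath


def slugify(title: str) -> str:
    """Lower-case the title and collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def page_path(number: str, title: str) -> PurePosixPath:
    """Map a page number such as "1" or "3.2" to its path under output/markdown/."""
    parts = number.split(".")
    filename = f"{'-'.join(parts)}-{slugify(title)}.md"
    if len(parts) == 1:
        return PurePosixPath(filename)  # main page, top level
    return PurePosixPath(f"section-{parts[0]}") / filename  # subsection page
```

With these assumptions, page_path("3.2", "Environment Variables") yields section-3/3-2-environment-variables.md, matching the example above.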
raw_markdown/
Pre-enhancement markdown files for debugging purposes. Contains the output of Phase 1 before diagram injection and template processing. Useful for troubleshooting diagram matching issues.
Sources: README.md:57
book/
Complete HTML documentation site generated by mdBook. Self-contained static website with:
- Navigation sidebar generated from SUMMARY.md
- Full-text search via searchindex.js
- Rendered Mermaid diagrams via mdbook-mermaid
- Edit-on-GitHub links from GIT_REPO_URL
- Responsive Rust theme
The entire directory can be served by any static file server or deployed to GitHub Pages.
Sources: README.md:55
book.toml
mdBook configuration file with repository-specific metadata. Dynamically generated during Phase 2 of the build pipeline. Contains book title, authors, theme settings, and preprocessor configuration.
Sources: README.md:58
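For orientation, the sketch below renders a small book.toml from the kind of metadata described above. The build script generates the real file in shell, and its exact contents are not reproduced here; render_book_toml() is a hypothetical helper, and the selection of keys is an assumption, although [book], [output.html] git-repository-url, and [preprocessor.mermaid] are standard mdBook and mdbook-mermaid settings.

```python
import json


def render_book_toml(title: str, authors: list[str], git_repo_url: str) -> str:
    """Render a minimal mdBook configuration as TOML text."""

    def quote(value: str) -> str:
        # JSON string escaping is compatible with TOML basic strings
        # for ordinary text such as titles, names, and URLs.
        return json.dumps(value, ensure_ascii=False)

    author_list = ", ".join(quote(author) for author in authors)
    return (
        "[book]\n"
        f"title = {quote(title)}\n"
        f"authors = [{author_list}]\n"
        "\n"
        "[output.html]\n"
        f"git-repository-url = {quote(git_repo_url)}\n"
        "\n"
        "[preprocessor.mermaid]\n"
        'command = "mdbook-mermaid"\n'
    )
```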
graph TB
BuildContext["Docker Build Context"]
BuildContext --> Included["Included in Image"]
BuildContext --> Excluded["Excluded"]
Included --> DockerfileBuild["Dockerfile\n(Build instructions)"]
Included --> ToolsCopy["tools/\n(COPY instruction)"]
Included --> ScriptCopy["build-docs.sh\n(COPY instruction)"]
ToolsCopy --> ReqInstall["requirements.txt\n→ uv pip install"]
ToolsCopy --> ScraperInstall["deepwiki-scraper.py\n→ /usr/local/bin/"]
ScriptCopy --> BuildInstall["build-docs.sh\n→ /usr/local/bin/"]
Excluded --> GitIgnored["output/\n(git-ignored)"]
Excluded --> GitFiles[".git/\n(implicit)"]
Excluded --> Readme["README.md\n(not referenced)"]
style BuildContext fill:#f9f9f9,stroke:#333
style Included fill:#e8f5e9,stroke:#388e3c
style Excluded fill:#ffebee,stroke:#c62828
Docker Build Context
The Docker build copies into the image only the files the container needs. Knowing which files are copied and which are left out helps when optimizing the build.
Build Context Inclusion
Copy Operations:
- Dockerfile:16 - COPY tools/requirements.txt /tmp/requirements.txt
- Dockerfile:24 - COPY tools/deepwiki-scraper.py /usr/local/bin/
- Dockerfile:28 - COPY build-docs.sh /usr/local/bin/
Not Copied:
- .gitignore - only used by Git
- output/ - generated at runtime
- .git/ - version control metadata
- Any documentation files (README, LICENSE)
Sources: Dockerfile:16-28 .gitignore:1-2
graph TB
subgraph BuildTime["Build-Time Dependencies"]
DF["Dockerfile"]
Req["tools/requirements.txt"]
Scraper["tools/deepwiki-scraper.py"]
BuildSh["build-docs.sh"]
DF -->|COPY [Line 16]| Req
DF -->|RUN install [Line 17]| Req
DF -->|COPY [Line 24]| Scraper
DF -->|COPY [Line 28]| BuildSh
DF -->|CMD [Line 32]| BuildSh
end
subgraph Runtime["Run-Time Dependencies"]
BuildShRun["build-docs.sh\n(Entry point)"]
ScraperExec["deepwiki-scraper.py\n(Phase 1-2)"]
MdBook["mdbook\n(Phase 3)"]
MdBookMermaid["mdbook-mermaid\n(Phase 3)"]
BuildShRun -->|python3 [Line 58]| ScraperExec
BuildShRun -->|mdbook-mermaid install [Line 171]| MdBookMermaid
BuildShRun -->|mdbook build [Line 176]| MdBook
ScraperExec -->|import requests| Req
ScraperExec -->|import bs4| Req
ScraperExec -->|import html2text| Req
end
subgraph Generated["Generated Artifacts"]
WikiDir["$WIKI_DIR/\n(Temp markdown)"]
BookToml["book.toml\n(Config)"]
Summary["SUMMARY.md\n(TOC)"]
OutputDir["output/\n(Final artifacts)"]
ScraperExec -->|sys.argv[2]| WikiDir
BuildShRun -->|cat > [Line 85]| BookToml
BuildShRun -->|Lines 113-159| Summary
BuildShRun -->|cp [Lines 184-191]| OutputDir
end
BuildTime --> Runtime
Runtime --> Generated
style DF fill:#e1f5ff,stroke:#0288d1
style BuildShRun fill:#fff4e1,stroke:#f57c00
style ScraperExec fill:#e8f5e9,stroke:#388e3c
style OutputDir fill:#ffe0b2,stroke:#e64a19
File Dependency Graph
This diagram maps the relationships between files and shows which files depend on or reference others.
Sources: Dockerfile:1-33 build-docs.sh:1-206 tools/deepwiki-scraper.py:1-920 tools/requirements.txt:1-4
File Size and Complexity Metrics
Understanding the relative complexity of each component helps developers identify which files require the most attention during modifications.
| File | Lines | Purpose | Complexity |
|---|---|---|---|
| tools/deepwiki-scraper.py | 920 | Content extraction and diagram matching | High |
| build-docs.sh | 206 | Orchestration and configuration | Medium |
| Dockerfile | 33 | Multi-stage build specification | Low |
| tools/requirements.txt | 4 | Dependency list | Minimal |
| .gitignore | 2 | Git exclusion rule | Minimal |
Key Observations:
- About 79% of the lines counted above (920 of 1,165) are in the Python scraper (tools/deepwiki-scraper.py:1-920)
- The shell script handles high-level orchestration (build-docs.sh:1-206)
- The Dockerfile is minimal, using a multi-stage build (Dockerfile:1-33)
- No mdBook configuration is committed to the repository; book.toml is generated at runtime
Sources: tools/deepwiki-scraper.py:1-920 build-docs.sh:1-206 Dockerfile:1-33 tools/requirements.txt:1-4 .gitignore:1-2