Auto-Detection Features
This document describes the automatic detection and configuration mechanisms in the DeepWiki-to-mdBook converter system. These features enable the system to operate with minimal user configuration by intelligently inferring repository metadata, generating sensible defaults, and dynamically discovering file structures.
For information about manually configuring these values, see Configuration Reference. For details on how SUMMARY.md generation works, see SUMMARY.md Generation.
Overview
The system implements three primary auto-detection capabilities:
- Git Repository Detection: Automatically identifies the GitHub repository from Git remote URLs
- Configuration Defaults: Generates book metadata from detected repository information
- File Structure Discovery: Dynamically builds the table of contents from the actual file hierarchy
These features allow the system to run with a single docker run command in many cases, with all necessary configuration inferred from context.
Git Repository Auto-Detection
Detection Mechanism
The system attempts to auto-detect the GitHub repository when the REPO environment variable is not provided. This detection occurs in the shell orchestrator and follows a specific fallback sequence.
Git Repository Auto-Detection Flow
flowchart TD
Start["build-docs.sh execution"]
CheckRepo{"REPO env\nvariable set?"}
UseRepo["Use $REPO value"]
CheckGit{"Git repository\ndetected?"}
GetRemote["Execute:\ngit config --get\nremote.origin.url"]
CheckRemote{"Remote URL\nfound?"}
ExtractOwnerRepo["Apply regex pattern:\ns#.*github\.com[:/]([^/]+/[^/\.]+)\n(\.git)?.*#\1#"]
SetRepo["Set REPO variable\nto owner/repo"]
ErrorExit["Exit with error:\nREPO must be set"]
Start --> CheckRepo
CheckRepo -->|Yes| UseRepo
CheckRepo -->|No| CheckGit
CheckGit -->|No| ErrorExit
CheckGit -->|Yes| GetRemote
GetRemote --> CheckRemote
CheckRemote -->|No| ErrorExit
CheckRemote -->|Yes| ExtractOwnerRepo
ExtractOwnerRepo --> SetRepo
UseRepo --> Continue["Continue with\nbuild process"]
SetRepo --> Continue
Sources: build-docs.sh:8-19
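The fallback sequence in the flowchart can be sketched as a single Bash function. This is a minimal sketch, not the code in build-docs.sh: the function name `detect_repo` is hypothetical, and it assumes Bash with `git` and `sed` available on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the REPO fallback sequence (sketch only).
detect_repo() {
    # 1. An explicit REPO environment variable always wins.
    if [ -n "${REPO:-}" ]; then
        echo "$REPO"
        return 0
    fi
    # 2. Otherwise the current directory must be inside a Git repository.
    if ! git rev-parse --git-dir > /dev/null 2>&1; then
        return 1
    fi
    # 3. Read the origin remote; git config exits non-zero if it is missing.
    local url
    url=$(git config --get remote.origin.url) || return 1
    [ -n "$url" ] || return 1
    # 4. Reduce the URL to owner/repo with the sed expression from the flowchart.
    echo "$url" | sed -E 's#.*github\.com[:/]([^/]+/[^/\.]+)(\.git)?.*#\1#'
}
```

A caller would then use `REPO=$(detect_repo) || { echo "REPO must be set" >&2; exit 1; }`. In this sketch, a remote hosted outside github.com does not match the sed expression, so the URL is echoed unchanged rather than rejected.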
Implementation Details
The auto-detection logic is implemented in the shell script's initialization section:
| Detection Step | Shell Command | Purpose |
|---|---|---|
| Check Git repository | `git rev-parse --git-dir > /dev/null 2>&1` | Verify that the current directory is a Git repository |
| Retrieve remote URL | `git config --get remote.origin.url` | Get the URL of the `origin` remote |
| Extract repository | `sed -E 's#.*github\.com[:/]([^/]+/[^/\.]+)(\.git)?.*#\1#'` | Parse `owner/repo` from the supported URL formats |
The regex pattern in the sed command handles multiple GitHub URL formats:
- HTTPS: `https://github.com/owner/repo.git`
- SSH: `git@github.com:owner/repo.git`
- HTTPS without `.git`: `https://github.com/owner/repo`
- SSH without `.git`: `git@github.com:owner/repo`
Sources: build-docs.sh:8-19
Supported URL Formats
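The detection regex supports the GitHub remote URL patterns listed above. It captures the `owner/repo` path that follows `github.com`, accepting either `:` (SSH) or `/` (HTTPS) as the separator, and discards an optional `.git` suffix. Because the repository component `[^/\.]+` excludes dots, a repository name that itself contains a dot is truncated at the first dot.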
Sources: build-docs.sh:14-16
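To see what the expression extracts, the sketch below runs it against each supported URL format plus one dotted repository name. `extract_repo` is a hypothetical wrapper; the sed expression is the one shown in the flowchart above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented sed expression.
extract_repo() {
    echo "$1" | sed -E 's#.*github\.com[:/]([^/]+/[^/\.]+)(\.git)?.*#\1#'
}

# The four supported formats, followed by a repository name containing a dot.
for url in \
    "https://github.com/owner/repo.git" \
    "git@github.com:owner/repo.git" \
    "https://github.com/owner/repo" \
    "git@github.com:owner/repo" \
    "https://github.com/owner/my.repo.git"; do
    printf '%-36s -> %s\n' "$url" "$(extract_repo "$url")"
done
```

Each of the four standard formats yields `owner/repo`. The last URL yields `owner/my`, because `[^/\.]+` stops at the first dot and the trailing `.*` swallows the rest.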
Configuration Defaults Generation
Metadata Derivation
Once the REPO variable is determined (either from the environment or by auto-detection), the system derives additional configuration values, each with a default that an environment variable can override:
Configuration Default Generation Flow
flowchart LR
REPO["$REPO\n(owner/repo)"]
Extract["Parse repository\ncomponents"]
REPO_OWNER["$REPO_OWNER\n(cut -d'/' -f1)"]
REPO_NAME["$REPO_NAME\n(cut -d'/' -f2)"]
DefaultAuthors["BOOK_AUTHORS\ndefault: $REPO_OWNER"]
DefaultURL["GIT_REPO_URL\ndefault: https://github.com/$REPO"]
DefaultTitle["BOOK_TITLE\ndefault: Documentation"]
FinalAuthors["Final BOOK_AUTHORS"]
FinalURL["Final GIT_REPO_URL"]
FinalTitle["Final BOOK_TITLE"]
REPO --> Extract
Extract --> REPO_OWNER
Extract --> REPO_NAME
REPO_OWNER --> DefaultAuthors
REPO --> DefaultURL
DefaultAuthors -->|Override if env var set| FinalAuthors
DefaultURL -->|Override if env var set| FinalURL
DefaultTitle -->|Override if env var set| FinalTitle
FinalAuthors --> BookToml["book.toml\ngeneration"]
FinalURL --> BookToml
FinalTitle --> BookToml
Sources: build-docs.sh:39-45
Default Value Table
| Configuration Variable | Default Value Expression | Example Result | Override Behavior |
|---|---|---|---|
| `REPO_OWNER` | `$(echo "$REPO" \| cut -d'/' -f1)` | `jzombie` | Derived from `REPO` |
| `REPO_NAME` | `$(echo "$REPO" \| cut -d'/' -f2)` | `deepwiki-to-mdbook` | Derived from `REPO` |
| `BOOK_AUTHORS` | `${BOOK_AUTHORS:=$REPO_OWNER}` | `jzombie` | Environment variable takes precedence |
| `GIT_REPO_URL` | `${GIT_REPO_URL:=https://github.com/$REPO}` | `https://github.com/jzombie/deepwiki-to-mdbook` | Environment variable takes precedence |
| `BOOK_TITLE` | `${BOOK_TITLE:-Documentation}` | `Documentation` | Environment variable takes precedence |
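The shell parameter expansion `${VAR:=default}` assigns the default only if `VAR` is unset or empty, so a value supplied through the environment always takes precedence. The `${VAR:-default}` form used for `BOOK_TITLE` substitutes the default under the same conditions but does not itself assign it to the variable.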
Sources: build-docs.sh:21-26 build-docs.sh:39-45
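The derivations in the table can be exercised directly in Bash. This sketch hard-codes an example `REPO` value and shows how `:=` and `:-` differ; storing the `:-` result back into `BOOK_TITLE` with an explicit assignment is an assumption about how the script keeps the title.

```shell
#!/usr/bin/env bash
# Example REPO value; in build-docs.sh it comes from the environment or Git detection.
REPO="jzombie/deepwiki-to-mdbook"

# Split owner/repo on the slash.
REPO_OWNER=$(echo "$REPO" | cut -d'/' -f1)   # jzombie
REPO_NAME=$(echo "$REPO" | cut -d'/' -f2)    # deepwiki-to-mdbook

# ':=' assigns only when the variable is unset or empty, so a value
# passed in from outside (for example via docker run -e) is kept.
: "${BOOK_AUTHORS:=$REPO_OWNER}"
: "${GIT_REPO_URL:=https://github.com/$REPO}"

# ':-' only substitutes and never assigns, so the result is stored explicitly.
BOOK_TITLE="${BOOK_TITLE:-Documentation}"

echo "owner=$REPO_OWNER name=$REPO_NAME"
echo "authors=$BOOK_AUTHORS url=$GIT_REPO_URL title=$BOOK_TITLE"
```

Exporting `BOOK_AUTHORS="Jane Doe"` before running it keeps that value, because `:=` leaves a non-empty variable untouched.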
book.toml Generation
The auto-detected and default values are written into the dynamically generated book.toml configuration file.
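A minimal sketch of this step writes book.toml with a heredoc. The keys follow mdBook's book.toml format, but which keys build-docs.sh actually emits, and the output directory (`BOOK_DIR` here, a hypothetical name), are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: render book.toml from the derived values (not build-docs.sh verbatim).
set -euo pipefail

REPO="${REPO:-jzombie/deepwiki-to-mdbook}"             # example repository
REPO_OWNER=$(echo "$REPO" | cut -d'/' -f1)
BOOK_TITLE="${BOOK_TITLE:-Documentation}"
BOOK_AUTHORS="${BOOK_AUTHORS:-$REPO_OWNER}"
GIT_REPO_URL="${GIT_REPO_URL:-https://github.com/$REPO}"
BOOK_DIR="${BOOK_DIR:-$(mktemp -d)}"                   # hypothetical output dir

# Unquoted EOF delimiter, so the shell expands the variables in the heredoc.
cat > "$BOOK_DIR/book.toml" <<EOF
[book]
title = "$BOOK_TITLE"
authors = ["$BOOK_AUTHORS"]
src = "src"

[output.html]
git-repository-url = "$GIT_REPO_URL"
EOF
```

The sketch does no TOML escaping, so a title or author containing a double quote would produce an invalid file.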
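In the generated book.toml, the `git-repository-url` field makes mdBook link the rendered book back to the GitHub repository (shown as a repository icon in the menu bar); mdBook's per-page edit links are configured separately through its `edit-url-template` setting.
Sources: build-docs.sh:85-103 README.md:99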
File Structure Discovery
Dynamic SUMMARY.md Generation
The system automatically discovers the file hierarchy and generates a table of contents without requiring manual configuration. This process analyzes the scraped markdown files to determine their structure.
File Structure Discovery Algorithm
flowchart TD
Start["Begin SUMMARY.md\ngeneration"]
FindFirst["Find first .md file:\nls $WIKI_DIR/*.md | head -1"]
ExtractTitle["Extract title:\nhead -1 file | sed 's/^# //'"]
WriteIntro["Write introduction entry\nto SUMMARY.md"]
IterateFiles["Iterate all .md files\nin $WIKI_DIR"]
SkipFirst{"Is this\nfirst page?"}
ExtractNum["Extract section number:\ngrep -oE '^[0-9]+'"]
CheckSubdir{"Does section-N\ndirectory exist?"}
WriteSection["Write section header:\n# Title"]
WriteMain["Write main entry:\n- [Title](filename.md)"]
IterateSubs["Iterate subsection files:\nsection-N/*.md"]
WriteSubentry["Write subsection:\n - [Subtitle](section-N/file.md)"]
WriteStandalone["Write standalone entry:\n- [Title](filename.md)"]
NextFile{"More files?"}
Done["SUMMARY.md complete"]
Start --> FindFirst
FindFirst --> ExtractTitle
ExtractTitle --> WriteIntro
WriteIntro --> IterateFiles
IterateFiles --> SkipFirst
SkipFirst -->|Yes| NextFile
SkipFirst -->|No| ExtractNum
ExtractNum --> CheckSubdir
CheckSubdir -->|Yes| WriteSection
WriteSection --> WriteMain
WriteMain --> IterateSubs
IterateSubs --> WriteSubentry
WriteSubentry --> NextFile
CheckSubdir -->|No| WriteStandalone
WriteStandalone --> NextFile
NextFile -->|Yes| SkipFirst
NextFile -->|No| Done
Sources: build-docs.sh:112-159
Directory Structure Detection
The file structure discovery algorithm recognizes two organizational patterns:
Recognized File Hierarchy Patterns
$WIKI_DIR/
├── 1-overview.md # Main page (becomes introduction)
├── 2-architecture.md # Main page with subsections
├── 3-components.md # Standalone page
├── section-2/ # Subsection directory
│ ├── 2-1-system-design.md
│ └── 2-2-data-flow.md
└── section-4/ # Another subsection directory
├── 4-1-phase-one.md
└── 4-2-phase-two.md
The algorithm uses the following detection logic:
| Pattern Element | Detection Method | Code Reference |
|---|---|---|
| Main pages | `for file in "$WIKI_DIR"/*.md` | build-docs.sh:126 |
| Section number | `echo "$filename" \| grep -oE '^[0-9]+'` | |
| Subsection directory | `[ -d "$WIKI_DIR/section-$section_num" ]` | build-docs.sh:138 |
| Subsection files | `for subfile in "$section_dir"/*.md` | build-docs.sh:147 |
Sources: build-docs.sh:126-157
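Putting these detection rules together, the discovery loop from the flowchart can be sketched in Bash. `generate_summary` is a hypothetical function; writing the introduction as an mdBook prefix chapter (`[Title](file.md)`) is an assumption about the exact format build-docs.sh uses, and the sketch takes the first page from glob order rather than from `ls | head -1`.

```shell
#!/usr/bin/env bash
# Sketch of the SUMMARY.md discovery loop.
# Usage: generate_summary WIKI_DIR OUTPUT_FILE
generate_summary() {
    local wiki_dir="$1" out="$2"
    local files=("$wiki_dir"/*.md)   # main pages, in glob order
    local first_page title file filename section_num section_dir subfile subtitle

    # The first page becomes the introduction (assumed prefix-chapter format).
    first_page=$(basename "${files[0]}")
    title=$(head -1 "${files[0]}" | sed 's/^# //')
    printf '[%s](%s)\n\n' "$title" "$first_page" > "$out"

    for file in "${files[@]}"; do
        filename=$(basename "$file")
        [ "$filename" = "$first_page" ] && continue      # already written

        title=$(head -1 "$file" | sed 's/^# //')
        section_num=$(echo "$filename" | grep -oE '^[0-9]+' || true)
        section_dir="$wiki_dir/section-$section_num"

        if [ -n "$section_num" ] && [ -d "$section_dir" ]; then
            # Page with subsections: part header, main entry, nested entries.
            printf '# %s\n\n- [%s](%s)\n' "$title" "$title" "$filename" >> "$out"
            for subfile in "$section_dir"/*.md; do
                [ -f "$subfile" ] || continue            # empty directory guard
                subtitle=$(head -1 "$subfile" | sed 's/^# //')
                printf '  - [%s](section-%s/%s)\n' \
                    "$subtitle" "$section_num" "$(basename "$subfile")" >> "$out"
            done
        else
            # Standalone page; '--' stops printf from reading '-' as an option.
            printf -- '- [%s](%s)\n' "$title" "$filename" >> "$out"
        fi
    done

    # Count lines containing a link, as the entry count.
    echo "Generated SUMMARY.md with $(grep -c '\[' "$out") entries"
}
```

Glob expansion sorts file names as strings, so a page such as `10-faq.md` would sort before `2-architecture.md`; the sketch inherits that ordering.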
Title Extraction
Page titles are automatically extracted from the first line of each markdown file. The extraction command:
- reads the first line of the file with `head -1`,
- removes the leading Markdown heading marker `# ` with `sed 's/^# //'`, and
- assigns the result to the `title` variable for use in SUMMARY.md.
Sources: build-docs.sh:134 build-docs.sh:150
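The pipeline only strips a level-1 heading marker, which matters when a page starts with something else. This sketch wraps the documented commands in a hypothetical `extract_title` function and runs it on three sample first lines.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented title-extraction commands.
extract_title() {
    head -1 "$1" | sed 's/^# //'
}

# Sample pages: an H1 heading, an H2 heading, and plain text.
tmp=$(mktemp -d)
printf '# Overview\nBody text\n' > "$tmp/a.md"
printf '## Deeper Heading\n' > "$tmp/b.md"
printf 'No heading here\n' > "$tmp/c.md"

for f in "$tmp/a.md" "$tmp/b.md" "$tmp/c.md"; do
    echo "$(basename "$f"): $(extract_title "$f")"
done
```

Only `a.md` loses its prefix (`Overview`); the `##` line and the plain-text line are passed through verbatim, because `^# ` requires a single `#` followed by a space at the start of the line.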
Generated SUMMARY.md Example
Given the file structure shown above, the system writes an introduction entry for the first page, a section header plus a main entry and nested subsection entries for each page that has a matching `section-N/` directory, and a single entry for each standalone page.
When generation finishes, the script reports `Generated SUMMARY.md with N entries`, where N is the number of lines containing a link, as counted by `grep -c '\[' src/SUMMARY.md`.
Auto-Detection in CI/CD Context
Docker Container Limitations
The Git repository auto-detection feature has limitations when running inside a Docker container. The detection logic executes within the container's filesystem, which typically does not include the host's Git repository unless explicitly mounted.
Auto-Detection Context Comparison
| Execution Context | Git Repository Available | Auto-Detection Works | Recommended Usage |
|---|---|---|---|
| Host machine with Git repository | ✓ Yes | ✓ Yes | Local development/testing |
| Docker container (default) | ✗ No | ✗ No | Must provide REPO env var |
| Docker with volume mount of Git repo | ✓ Yes | ⚠ Partial | Not recommended (complexity) |
| CI/CD pipeline (GitHub Actions, etc.) | ⚠ Varies | ⚠ Conditional | Use explicit REPO for reliability |
For production and CI/CD usage, set the REPO environment variable explicitly, for example by passing `-e REPO=owner/repo` to `docker run`.
Sources: build-docs.sh:8-36 README.md:47-53
Implementation Code References
Shell Variable Initialization
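The complete initialization sequence combines Git-based detection of REPO, the check that REPO is set, and the derivation of REPO_OWNER, REPO_NAME, and the book metadata defaults.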
Sources: build-docs.sh:8-45
Error Handling
The system validates that a repository is available (either from the environment or from auto-detection) before proceeding, and otherwise exits with an error stating that REPO must be set.
This validation ensures the system fails fast with a clear error message if configuration is insufficient.
Sources: build-docs.sh:33-37
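A minimal sketch of such a fail-fast check follows. `require_repo` is a hypothetical name, and the message wording and exit status are assumptions rather than the script's exact output.

```shell
#!/usr/bin/env bash
# Hypothetical fail-fast check: refuse to continue without a repository.
require_repo() {
    if [ -z "${REPO:-}" ]; then
        echo "Error: REPO must be set (owner/repo), or run inside a Git clone with a GitHub origin" >&2
        return 1
    fi
}

# In the main script this would run right after detection, e.g.:
# require_repo || exit 1
```

Bash's `${REPO:?REPO must be set}` expansion is a one-line alternative: it prints the message to standard error and exits a non-interactive shell when `REPO` is unset or empty.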