Auto-Detection Features

This document describes the automatic detection and configuration mechanisms in the DeepWiki-to-mdBook converter system. These features enable the system to operate with minimal user configuration by intelligently inferring repository metadata, generating sensible defaults, and dynamically discovering file structures.

For information about manually configuring these values, see Configuration Reference. For details on how SUMMARY.md generation works, see SUMMARY.md Generation.

Overview

The system implements three primary auto-detection capabilities:

  1. Git Repository Detection: Automatically identifies the GitHub repository from Git remote URLs
  2. Configuration Defaults: Generates book metadata from detected repository information
  3. File Structure Discovery: Dynamically builds the table of contents from the actual file hierarchy

These features allow the system to run with a single docker run command in many cases, with all necessary configuration inferred from context.

Git Repository Auto-Detection

Detection Mechanism

The system attempts to auto-detect the GitHub repository when the REPO environment variable is not provided. This detection occurs in the shell orchestrator and follows a specific fallback sequence.

Git Repository Auto-Detection Flow

flowchart TD
    Start["build-docs.sh execution"]
    CheckRepo{"REPO env\nvariable set?"}
    UseRepo["Use $REPO value"]
    CheckGit{"Git repository\ndetected?"}
    GetRemote["Execute:\ngit config --get\nremote.origin.url"]
    CheckRemote{"Remote URL\nfound?"}
    ExtractOwnerRepo["Apply regex pattern:\ns#.*github\.com[:/]([^/]+/[^/\.]+)\n(\.git)?.*#\1#"]
    SetRepo["Set REPO variable\nto owner/repo"]
    ErrorExit["Exit with error:\nREPO must be set"]
    Start --> CheckRepo
    CheckRepo -->|Yes| UseRepo
    CheckRepo -->|No| CheckGit
    CheckGit -->|No| ErrorExit
    CheckGit -->|Yes| GetRemote
    GetRemote --> CheckRemote
    CheckRemote -->|No| ErrorExit
    CheckRemote -->|Yes| ExtractOwnerRepo
    ExtractOwnerRepo --> SetRepo
    UseRepo --> Continue["Continue with\nbuild process"]
    SetRepo --> Continue

Sources: build-docs.sh:8-19

Implementation Details

The auto-detection logic is implemented in the shell script's initialization section:

| Detection Step | Shell Command | Purpose |
| --- | --- | --- |
| Check Git repository | `git rev-parse --git-dir > /dev/null 2>&1` | Verify current directory is a Git repository |
| Retrieve remote URL | `git config --get remote.origin.url` | Get the origin remote URL |
| Extract repository | `sed -E 's#.*github\.com[:/]([^/]+/[^/\.]+)(\.git)?.*#\1#'` | Parse owner/repo from various URL formats |

The regex pattern in the sed command handles multiple GitHub URL formats:

  • HTTPS: https://github.com/owner/repo.git
  • SSH: git@github.com:owner/repo.git
  • HTTPS without .git: https://github.com/owner/repo
  • SSH without .git: git@github.com:owner/repo

Sources: build-docs.sh:8-19
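
As a quick check of the URL handling described above, the following sketch runs the same sed expression over the four listed formats. The helper name extract_owner_repo and the sample URLs are illustrative assumptions, not names taken from build-docs.sh.

```shell
# Reduce a GitHub remote URL to "owner/repo".
# extract_owner_repo is a hypothetical helper name used only for this sketch.
extract_owner_repo() {
    echo "$1" | sed -E 's#.*github\.com[:/]([^/]+/[^/\.]+)(\.git)?.*#\1#'
}

# Run the expression over the four URL formats listed above.
for url in \
    "https://github.com/owner/repo.git" \
    "git@github.com:owner/repo.git" \
    "https://github.com/owner/repo" \
    "git@github.com:owner/repo"
do
    printf '%-36s -> %s\n' "$url" "$(extract_owner_repo "$url")"
done
```

Two properties of the pattern are worth keeping in mind. Because the character class [^/\.] stops at the first dot, a repository name that itself contains a dot would be cut short. A remote URL that does not contain github.com does not match at all, so sed passes it through unchanged.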

Supported URL Formats

The detection regex supports the four GitHub remote URL patterns listed under Implementation Details: HTTPS and SSH, each with or without the .git suffix. It captures the owner/repo path that follows github.com, accepts both : (SSH) and / (HTTPS) as the separator, and discards an optional .git suffix.

Sources: build-docs.sh:14-16

Configuration Defaults Generation

Metadata Derivation

Once the REPO variable is determined (either from environment or auto-detection), the system generates additional configuration values with intelligent defaults:

Configuration Default Generation Flow

flowchart LR
    REPO["$REPO\n(owner/repo)"]
    Extract["Parse repository\ncomponents"]
    REPO_OWNER["$REPO_OWNER\n(cut -d'/' -f1)"]
    REPO_NAME["$REPO_NAME\n(cut -d'/' -f2)"]
    DefaultAuthors["BOOK_AUTHORS\ndefault: $REPO_OWNER"]
    DefaultURL["GIT_REPO_URL\ndefault: https://github.com/$REPO"]
    DefaultTitle["BOOK_TITLE\ndefault: Documentation"]
    FinalAuthors["Final BOOK_AUTHORS"]
    FinalURL["Final GIT_REPO_URL"]
    FinalTitle["Final BOOK_TITLE"]
    REPO --> Extract
    Extract --> REPO_OWNER
    Extract --> REPO_NAME
    REPO_OWNER --> DefaultAuthors
    REPO --> DefaultURL
    DefaultAuthors -->|Override if env var set| FinalAuthors
    DefaultURL -->|Override if env var set| FinalURL
    DefaultTitle -->|Override if env var set| FinalTitle
    FinalAuthors --> BookToml["book.toml\ngeneration"]
    FinalURL --> BookToml
    FinalTitle --> BookToml

Sources: build-docs.sh:39-45

Default Value Table

| Configuration Variable | Default Value Expression | Example Result | Override Behavior |
| --- | --- | --- | --- |
| REPO_OWNER | `$(echo "$REPO" \| cut -d'/' -f1)` | jzombie | |
| REPO_NAME | `$(echo "$REPO" \| cut -d'/' -f2)` | deepwiki-to-mdbook | |
| BOOK_AUTHORS | `${BOOK_AUTHORS:=$REPO_OWNER}` | jzombie | Environment variable takes precedence |
| GIT_REPO_URL | `${GIT_REPO_URL:=https://github.com/$REPO}` | https://github.com/jzombie/deepwiki-to-mdbook | Environment variable takes precedence |
| BOOK_TITLE | `${BOOK_TITLE:-Documentation}` | Documentation | Environment variable takes precedence |

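The shell parameter expansion ${VAR:=default} assigns the default only when VAR is unset or empty, so any value supplied through the environment takes precedence. The ${VAR:-default} form used for BOOK_TITLE substitutes the default under the same condition but does not assign it to VAR.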

Sources: build-docs.sh:21-26 build-docs.sh:39-45
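
The derivation in the table can be tried in isolation. The sketch below hard-codes REPO and simulates one user override. The sample title and the unset call are assumptions made only to keep the demo deterministic; they are not taken from build-docs.sh.

```shell
# REPO is hard-coded here for illustration; in the real flow it comes from
# the environment or from auto-detection.
REPO="jzombie/deepwiki-to-mdbook"

unset BOOK_AUTHORS GIT_REPO_URL   # start clean so the defaults apply
BOOK_TITLE="My Custom Title"      # simulate a user-supplied override

# Split owner/repo into its two components.
REPO_OWNER=$(echo "$REPO" | cut -d'/' -f1)
REPO_NAME=$(echo "$REPO" | cut -d'/' -f2)

# ":" is the shell no-op; it exists here only so the := expansion runs
# and assigns the default when the variable is unset or empty.
: "${BOOK_AUTHORS:=$REPO_OWNER}"
: "${GIT_REPO_URL:=https://github.com/$REPO}"

# :- only substitutes, so the result is assigned explicitly.
BOOK_TITLE="${BOOK_TITLE:-Documentation}"

echo "REPO_OWNER=$REPO_OWNER"
echo "REPO_NAME=$REPO_NAME"
echo "BOOK_AUTHORS=$BOOK_AUTHORS"
echo "GIT_REPO_URL=$GIT_REPO_URL"
echo "BOOK_TITLE=$BOOK_TITLE"
```

Because BOOK_TITLE already holds a value when the expansion runs, the override survives, while BOOK_AUTHORS and GIT_REPO_URL fall back to values derived from REPO.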

book.toml Generation

The auto-detected and default values are written into a book.toml configuration file that is generated at build time.

The git-repository-url field tells mdBook where the source repository lives, so the rendered book can link readers back to it on GitHub. In mdBook, per-page edit links are configured separately through the edit-url-template option.

Sources: build-docs.sh:85-103 README.md:99
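
A minimal sketch of how such a file could be produced with a here-document follows. The exact set of fields written by build-docs.sh is not shown in this document, so the three fields below, the sample values, and the scratch directory are assumptions chosen to match the variables described above.

```shell
# Hypothetical values standing in for the detected and defaulted settings.
BOOK_TITLE="Documentation"
BOOK_AUTHORS="jzombie"
GIT_REPO_URL="https://github.com/jzombie/deepwiki-to-mdbook"

book_dir=$(mktemp -d)   # scratch directory used only for this demo

# An unquoted EOF delimiter lets the shell expand the variables
# while the file is being written.
cat > "$book_dir/book.toml" <<EOF
[book]
title = "$BOOK_TITLE"
authors = ["$BOOK_AUTHORS"]

[output.html]
git-repository-url = "$GIT_REPO_URL"
EOF

cat "$book_dir/book.toml"
```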

File Structure Discovery

Dynamic SUMMARY.md Generation

The system automatically discovers the file hierarchy and generates a table of contents without requiring manual configuration. This process analyzes the scraped markdown files to determine their structure.

File Structure Discovery Algorithm

flowchart TD
    Start["Begin SUMMARY.md\ngeneration"]
    FindFirst["Find first .md file:\nls $WIKI_DIR/*.md | head -1"]
    ExtractTitle["Extract title: head -1 file | sed 's/^# //'"]
    WriteIntro["Write introduction entry\nto SUMMARY.md"]
    IterateFiles["Iterate all .md files\nin $WIKI_DIR"]
    SkipFirst{"Is this\nfirst page?"}
    ExtractNum["Extract section number:\ngrep -oE '^[0-9]+'"]
    CheckSubdir{"Does section-N\ndirectory exist?"}
    WriteSection["Write section header:\n# Title"]
    WriteMain["Write main entry:\n- [Title](filename.md)"]
    IterateSubs["Iterate subsection files:\nsection-N/*.md"]
    WriteSubentry["Write subsection:\n - [Subtitle](section-N/file.md)"]
    WriteStandalone["Write standalone entry:\n- [Title](filename.md)"]
    NextFile{"More files?"}
    Done["SUMMARY.md complete"]
    Start --> FindFirst
    FindFirst --> ExtractTitle
    ExtractTitle --> WriteIntro
    WriteIntro --> IterateFiles
    IterateFiles --> SkipFirst
    SkipFirst -->|Yes| NextFile
    SkipFirst -->|No| ExtractNum
    ExtractNum --> CheckSubdir
    CheckSubdir -->|Yes| WriteSection
    WriteSection --> WriteMain
    WriteMain --> IterateSubs
    IterateSubs --> WriteSubentry
    WriteSubentry --> NextFile
    CheckSubdir -->|No| WriteStandalone
    WriteStandalone --> NextFile
    NextFile -->|Yes| SkipFirst
    NextFile -->|No| Done

Sources: build-docs.sh:112-159
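
The algorithm in the flowchart can be sketched as a small shell function. This is an approximation built from the diagram, not the code in build-docs.sh: the function names, the "# Summary" heading, the format of the introduction entry, and the sample pages are all assumptions.

```shell
# Title of a page: its first line with a leading "# " stripped.
page_title() {
    head -1 "$1" | sed 's/^# //'
}

# Write a table of contents for the pages in $1 to the file $2.
# $2 should live outside $1 so the *.md glob does not pick it up.
generate_summary() {
    wiki_dir=$1
    out=$2
    first=$(ls "$wiki_dir"/*.md | head -1)
    {
        echo "# Summary"
        echo
        echo "[$(page_title "$first")]($(basename "$first"))"
        echo
        for file in "$wiki_dir"/*.md; do
            # The first page already serves as the introduction.
            if [ "$file" = "$first" ]; then continue; fi
            filename=$(basename "$file")
            section_num=$(echo "$filename" | grep -oE '^[0-9]+')
            section_dir="$wiki_dir/section-$section_num"
            if [ -n "$section_num" ] && [ -d "$section_dir" ]; then
                # Main page with subsections: header, main entry, children.
                echo "# $(page_title "$file")"
                echo "- [$(page_title "$file")]($filename)"
                for subfile in "$section_dir"/*.md; do
                    [ -e "$subfile" ] || continue
                    echo "  - [$(page_title "$subfile")](section-$section_num/$(basename "$subfile"))"
                done
            else
                # Standalone page without a matching section-N directory.
                echo "- [$(page_title "$file")]($filename)"
            fi
        done
    } > "$out"
}

# Demo: build a tiny wiki in a scratch directory and summarize it.
wiki_dir=$(mktemp -d)
out_dir=$(mktemp -d)
mkdir "$wiki_dir/section-2"
printf '# Overview\n' > "$wiki_dir/1-overview.md"
printf '# Architecture\n' > "$wiki_dir/2-architecture.md"
printf '# Components\n' > "$wiki_dir/3-components.md"
printf '# System Design\n' > "$wiki_dir/section-2/2-1-system-design.md"
printf '# Data Flow\n' > "$wiki_dir/section-2/2-2-data-flow.md"

generate_summary "$wiki_dir" "$out_dir/SUMMARY.md"
cat "$out_dir/SUMMARY.md"
echo "Generated SUMMARY.md with $(grep -c '\[' "$out_dir/SUMMARY.md") entries"
```

The sketch relies on the shell's glob ordering, which is lexicographic, so numeric prefixes only sort as expected while they have the same number of digits.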

Directory Structure Detection

The file structure discovery algorithm recognizes two organizational patterns:

Recognized File Hierarchy Patterns

$WIKI_DIR/
├── 1-overview.md                    # Main page (becomes introduction)
├── 2-architecture.md                # Main page with subsections
├── 3-components.md                  # Standalone page
├── section-2/                       # Subsection directory
│   ├── 2-1-system-design.md
│   └── 2-2-data-flow.md
└── section-4/                       # Another subsection directory
    ├── 4-1-phase-one.md
    └── 4-2-phase-two.md

The algorithm uses the following detection logic:

| Pattern Element | Detection Method | Code Reference |
| --- | --- | --- |
| Main pages | `for file in "$WIKI_DIR"/*.md` | build-docs.sh:126 |
| Section number | `echo "$filename" \| grep -oE '^[0-9]+'` | |
| Subsection directory | `[ -d "$WIKI_DIR/section-$section_num" ]` | build-docs.sh:138 |
| Subsection files | `for subfile in "$section_dir"/*.md` | build-docs.sh:147 |

Sources: build-docs.sh:126-157

Title Extraction

Page titles are automatically extracted from the first line of each markdown file by piping head -1 into sed 's/^# //' and capturing the result in a variable.

This pipeline:

  1. Reads the first line of the file with head -1
  2. Removes the markdown heading syntax # with sed 's/^# //'
  3. Assigns the result to the title variable for use in SUMMARY.md

Sources: build-docs.sh:134 build-docs.sh:150
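
The three steps can be reproduced on a throwaway file. The sample page content and the temporary file are assumptions made for the demo; only the head and sed pipeline comes from the description above.

```shell
# Create a sample page whose first line is a level-1 markdown heading.
page=$(mktemp)
printf '# System Design\n\nBody text.\n' > "$page"

# First line only, then strip the leading "# " heading marker.
title=$(head -1 "$page" | sed 's/^# //')

echo "$title"
```

Note that sed only removes the marker when the line starts with "# ". A file whose first line is not a level-1 heading yields that line unchanged as its title.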

Generated SUMMARY.md Example

Given the file structure shown above, the system generates:

The generation process outputs a count of entries: "Generated SUMMARY.md with N entries", where N is determined by grep -c '\[' src/SUMMARY.md.

Auto-Detection in CI/CD Context

Docker Container Limitations

The Git repository auto-detection feature has limitations when running inside a Docker container. The detection logic executes within the container's filesystem, which typically does not include the host's Git repository unless explicitly mounted.

Auto-Detection Context Comparison

| Execution Context | Git Repository Available | Auto-Detection Works | Recommended Usage |
| --- | --- | --- | --- |
| Host machine with Git repository | ✓ Yes | ✓ Yes | Local development/testing |
| Docker container (default) | ✗ No | ✗ No | Must provide REPO env var |
| Docker with volume mount of Git repo | ✓ Yes | ⚠ Partial | Not recommended (complexity) |
| CI/CD pipeline (GitHub Actions, etc.) | ⚠ Varies | ⚠ Conditional | Use explicit REPO for reliability |

For production and CI/CD usage, explicitly setting the REPO environment variable is recommended:

Sources: build-docs.sh:8-36 README.md:47-53
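
As an illustration of the explicit form, the sketch below assembles a docker run command that passes REPO through the -e flag. The image name deepwiki-to-mdbook is a placeholder assumption, and any volume mounts the real image needs are omitted. The sketch prints the command instead of running it, so it works without Docker installed.

```shell
REPO="jzombie/deepwiki-to-mdbook"
IMAGE="deepwiki-to-mdbook"   # hypothetical image tag; substitute the real one

# -e passes REPO into the container so no Git detection is needed there.
docker_cmd="docker run --rm -e REPO=$REPO $IMAGE"

# Dry run: show the command that would be executed.
echo "$docker_cmd"
```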

Implementation Code References

Shell Variable Initialization

The complete auto-detection and default generation sequence:

Sources: build-docs.sh:8-45
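
Putting the pieces together, the sequence might look roughly like the sketch below. It is assembled from the commands and expansions described in this document rather than copied from build-docs.sh. The function names detect_repo and init_config and the demo value for REPO are assumptions.

```shell
# Print owner/repo for the current directory's origin remote, or nothing.
detect_repo() {
    # Not inside a Git repository (or git missing): nothing to detect.
    git rev-parse --git-dir > /dev/null 2>&1 || return 0
    remote_url=$(git config --get remote.origin.url 2> /dev/null) || return 0
    echo "$remote_url" | sed -E 's#.*github\.com[:/]([^/]+/[^/\.]+)(\.git)?.*#\1#'
}

# Fill in REPO and the derived defaults; values already set always win.
init_config() {
    : "${REPO:=$(detect_repo)}"
    # The real script validates REPO at this point before continuing.
    REPO_OWNER=$(echo "$REPO" | cut -d'/' -f1)
    REPO_NAME=$(echo "$REPO" | cut -d'/' -f2)
    : "${BOOK_AUTHORS:=$REPO_OWNER}"
    : "${GIT_REPO_URL:=https://github.com/$REPO}"
    BOOK_TITLE="${BOOK_TITLE:-Documentation}"
}

# Demo value so the sketch produces output even outside a Git checkout.
REPO="${REPO:-jzombie/deepwiki-to-mdbook}"
init_config
echo "REPO=$REPO"
echo "BOOK_AUTHORS=$BOOK_AUTHORS"
echo "GIT_REPO_URL=$GIT_REPO_URL"
echo "BOOK_TITLE=$BOOK_TITLE"
```

Keeping detection in a function that prints nothing on failure lets the caller treat a missing repository and a missing remote the same way: REPO simply stays empty and the validation step reports it.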

Error Handling

The system validates that a repository is available (either from environment or auto-detection) before proceeding:

This validation ensures the system fails fast with a clear error message if configuration is insufficient.

Sources: build-docs.sh:33-37
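
A fail-fast check of this kind can be sketched as follows. The function name require_repo and the exact wording of the message are assumptions; the document only states that the script exits with an error saying REPO must be set.

```shell
# Return non-zero with a message on stderr when REPO is unset or empty.
# A script would typically call this as: require_repo || exit 1
require_repo() {
    if [ -z "${REPO:-}" ]; then
        echo "ERROR: REPO must be set (for example REPO=owner/repo)" >&2
        return 1
    fi
}

# Demo: run the check in subshells so the caller's REPO is left untouched.
if ( unset REPO; require_repo ) 2> /dev/null; then
    echo "missing REPO accepted (unexpected)"
else
    echo "missing REPO rejected"
fi

if ( REPO="owner/repo"; require_repo ); then
    echo "explicit REPO accepted"
fi
```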