This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Mermaid Normalization
The Mermaid normalization pipeline transforms diagrams extracted from DeepWiki’s JavaScript payload into syntax that is compatible with Mermaid 11. DeepWiki’s diagrams often contain formatting issues, legacy syntax, and multiline constructs that newer Mermaid parsers reject. This seven-step normalization process ensures that all diagrams render correctly in mdBook’s Mermaid renderer.
For information about how diagrams are extracted from the JavaScript payload, see Phase 2: Diagram Enhancement. For information about the fuzzy matching algorithm that places diagrams in the correct markdown files, see Fuzzy Matching Algorithm.
Purpose and Scope
This page documents the seven normalization functions that transform raw Mermaid diagram code into Mermaid 11-compatible syntax. Each normalization step addresses a specific category of syntax errors or incompatibilities. The pipeline is applied to every diagram before it is injected into markdown files.
The normalization pipeline handles:
- Edge labels that span multiple lines
- State diagram description syntax variations
- Flowchart node labels containing reserved characters
- Missing statement separators between consecutive nodes
- Empty node labels that lack fallback text
- Gantt chart tasks missing required task identifiers
- Additional edge case transformations (quote stripping, label merging)
Normalization Pipeline Architecture
The normalization pipeline is orchestrated by the normalize_mermaid_diagram function, which applies seven normalization passes in sequence. Each pass is idempotent and focuses on a specific syntax issue.
Pipeline Flow Diagram
graph TD
Input["Raw Diagram Text\nfrom Next.js Payload"]
Step1["normalize_mermaid_edge_labels()\nFlatten multiline edge labels"]
Step2["normalize_mermaid_state_descriptions()\nFix state syntax"]
Step3["normalize_flowchart_nodes()\nClean node labels"]
Step4["normalize_statement_separators()\nInsert newlines"]
Step5["normalize_empty_node_labels()\nAdd fallback labels"]
Step6["normalize_gantt_diagram()\nAdd synthetic task IDs"]
Output["Normalized Diagram\nMermaid 11 Compatible"]
Input --> Step1
Step1 --> Step2
Step2 --> Step3
Step3 --> Step4
Step4 --> Step5
Step5 --> Step6
Step6 --> Output
Sources: python/deepwiki-scraper.py:385-393 python/deepwiki-scraper.py:230-383
Function Name to Normalization Step Mapping
Sources: python/deepwiki-scraper.py:385-393
Step 1: Edge Label Normalization
The normalize_mermaid_edge_labels function collapses multiline edge labels into single-line labels by replacing literal and escaped newlines with spaces. Mermaid 11 rejects edge labels that span multiple physical lines.
Function : normalize_mermaid_edge_labels(diagram_text: str) -> str
Pattern Matched : Edge labels enclosed in pipes: |....|
Transformations Applied :
- Replace literal newline characters with spaces
- Replace escaped `\n` sequences with spaces
- Remove parentheses from labels (invalid syntax)
- Collapse multiple spaces into single spaces
| Before | After |
|---|---|
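| `A -->\|"Label\nLine 2"\| B` | `A -->\|"Label Line 2"\| B` |
| `C -->\|Text (note)\| D` | `C -->\|Text note\| D` |
| `E -->\|First\nSecond\nThird\| F` | `E -->\|First Second Third\| F` |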
Implementation Details :
- Only processes diagrams starting with `graph` or `flowchart` keywords
- Uses regex pattern `\|([^|]*)\|` to match edge labels
- Checks for presence of `\n`, `(`, or `)` before applying cleanup
- Preserves labels that are already properly formatted
Sources: python/deepwiki-scraper.py:230-251
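The transformations listed for this step can be sketched roughly as follows. This is a minimal illustration built only from the rules and the regex quoted above; the function name `normalize_edge_labels_sketch` is hypothetical, and the real implementation in `deepwiki-scraper.py` may differ in detail.

```python
import re

# Pattern quoted in the implementation notes: anything between two pipes.
EDGE_LABEL_RE = re.compile(r"\|([^|]*)\|")


def normalize_edge_labels_sketch(diagram_text: str) -> str:
    """Flatten multiline edge labels in graph/flowchart diagrams (illustrative)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text  # other diagram types are returned unchanged

    def clean(match: "re.Match[str]") -> str:
        label = match.group(1)
        # Labels without newlines or parentheses are left exactly as they are.
        if not any(token in label for token in ("\n", "\\n", "(", ")")):
            return match.group(0)
        label = label.replace("\\n", " ").replace("\n", " ")
        label = label.replace("(", "").replace(")", "")
        label = re.sub(r" {2,}", " ", label).strip()
        return f"|{label}|"

    return EDGE_LABEL_RE.sub(clean, diagram_text)


raw = 'graph TD\n    A -->|"Label\\nLine 2"| B\n    C -->|Text (note)| D'
print(normalize_edge_labels_sketch(raw))
```

Running the pass a second time on its own output changes nothing, because a cleaned label no longer contains any of the trigger characters.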
Step 2: State Description Normalization
The normalize_mermaid_state_descriptions function ensures state diagram descriptions follow the strict State : Description syntax required by Mermaid 11.
Function : normalize_mermaid_state_descriptions(diagram_text: str) -> str
Pattern Matched : State declarations with colons in state diagrams
Transformations Applied :
- Ensure single space after state name before colon
- Replace newlines in descriptions with spaces
- Replace additional colons in description with ` - `
- Collapse multiple spaces to single space
| Before | After |
|---|---|
| `Idle:Waiting\nfor input` | `Idle : Waiting for input` |
| `Active:Processing:data` | `Active : Processing - data` |
| `Error   :   Multiple   spaces` | `Error : Multiple spaces` |
Implementation Details :
- Only processes diagrams starting with `statediagram` keyword
- Skips lines containing `::` (double colon, used for class names)
- Splits each line on first colon occurrence
- Requires both prefix and suffix to be non-empty after stripping
Sources: python/deepwiki-scraper.py:253-277
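A rough sketch of the state description rules follows. It assumes that newlines inside a description arrive as escaped `\n` sequences, since the text is processed one physical line at a time; the function name is hypothetical and the code is an illustration rather than the project's implementation.

```python
import re


def normalize_state_descriptions_sketch(diagram_text: str) -> str:
    """Rewrite `State:Description` lines as `State : Description` (illustrative)."""
    if not diagram_text.strip().lower().startswith("statediagram"):
        return diagram_text

    out = []
    for line in diagram_text.split("\n"):
        # Lines without a colon, or with a double colon, are passed through.
        if ":" not in line or "::" in line:
            out.append(line)
            continue
        prefix, suffix = line.split(":", 1)  # split on the first colon only
        if not prefix.strip() or not suffix.strip():
            out.append(line)
            continue
        indent = line[: len(line) - len(line.lstrip())]
        description = suffix.replace("\\n", " ").replace(":", " - ")
        description = re.sub(r"\s+", " ", description).strip()
        out.append(f"{indent}{prefix.strip()} : {description}")
    return "\n".join(out)


raw = (
    "stateDiagram-v2\n"
    "    Idle:Waiting\\nfor input\n"
    "    Active:Processing:data\n"
    "    Error  :  Multiple   spaces"
)
print(normalize_state_descriptions_sketch(raw))
```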
Step 3: Flowchart Node Normalization
The normalize_flowchart_nodes function removes reserved characters (especially pipe |) from flowchart node labels and adds statement separators.
Function : normalize_flowchart_nodes(diagram_text: str) -> str
Pattern Matched : Node labels in brackets: ["..."]
Transformations Applied :
- Replace pipe characters `|` with forward slash `/`
- Collapse multiple spaces to single space
- Insert newlines between consecutive statements on same line
| Before | After |
|---|---|
| `Node["Label \| With Pipes"]` | `Node["Label / With Pipes"]` |
| `A["Text"] B["More"]` | `A["Text"]`<br>`B["More"]` |
| `C["Many    Spaces"]` | `C["Many Spaces"]` |
Implementation Details :
- Only processes diagrams starting with `graph` or `flowchart` keywords
- Uses regex `\["([^"]*)"\]` to match quoted node labels
- Inserts newlines after closing brackets/braces/parens using regex: `(\"]|\}|\))\s+(?=[A-Za-z0-9_])`
- Preserves indentation when splitting statements
Sources: python/deepwiki-scraper.py:279-301
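The two regexes quoted above can be combined into a small sketch like the one below. It works line by line so that the indentation of the current line can be reused for any statement that is split off; that line-by-line structure and the function name are assumptions made for illustration.

```python
import re

NODE_LABEL_RE = re.compile(r'\["([^"]*)"\]')
# Whitespace after a closing `"]`, `}` or `)` that is followed by an identifier.
STATEMENT_GAP_RE = re.compile(r'("\]|\}|\))\s+(?=[A-Za-z0-9_])')


def normalize_flowchart_nodes_sketch(diagram_text: str) -> str:
    """Clean quoted node labels and split statements sharing a line (illustrative)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    def clean_label(match: "re.Match[str]") -> str:
        label = match.group(1).replace("|", "/")
        label = re.sub(r" {2,}", " ", label)
        return f'["{label}"]'

    out = []
    for line in diagram_text.split("\n"):
        line = NODE_LABEL_RE.sub(clean_label, line)
        indent = line[: len(line) - len(line.lstrip())]
        # Replace the gap with a newline plus the indentation of this line.
        line = STATEMENT_GAP_RE.sub(lambda m, i=indent: m.group(1) + "\n" + i, line)
        out.append(line)
    return "\n".join(out)


raw = (
    "graph TD\n"
    '    Node["Label | With Pipes"]\n'
    '    A["Text"] B["More"]\n'
    '    C["Many    Spaces"]'
)
print(normalize_flowchart_nodes_sketch(raw))
```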
Step 4: Statement Separator Normalization
The normalize_statement_separators function inserts newlines between consecutive Mermaid statements that have been flattened onto a single line.
Function : normalize_statement_separators(diagram_text: str) -> str
Connector Tokens Recognized :
--> ==> -.-> --x x-- o--> o-> x-> *--> <--> <-.-> <-- --o
Pattern Matched : Whitespace before a node identifier that precedes a connector
Regex Pattern : STATEMENT_BREAK_PATTERN
| Before | After |
|---|---|
| `A-->B B-->C C-->D` | `A-->B`<br>`B-->C`<br>`C-->D` |
| `Node1-->Node2 Node3-->Node4` | `Node1-->Node2`<br>`Node3-->Node4` |
Implementation Details :
- Only processes diagrams starting with `graph` or `flowchart` keywords
- Defines `FLOW_CONNECTORS` list of all Mermaid connector tokens
- Builds regex pattern by escaping and joining connector tokens
- Pattern: `(?<!\n)([ \t]+)(?=[A-Za-z0-9_][\w\-]*(?:\s*\[[^\]]*\])?\s*(?:CONNECTORS)(?:\|[^|]*\|)?\s*)`
- Preserves indentation length when inserting newlines
- Converts tabs to 4 spaces for consistent indentation
Sources: python/deepwiki-scraper.py:303-328 python/deepwiki-scraper.py:309-311
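One way to picture the separator logic is the sketch below. It builds the lookahead from the connector list shown above, but it is a simplified variant: it processes one line at a time (so the `(?<!\n)` lookbehind is not needed), and it adds a guard that leaves chained edges such as `A --> B --> C` alone. Both the guard and the function name are assumptions for illustration, not confirmed details of the real code.

```python
import re

FLOW_CONNECTORS = ["-->", "==>", "-.->", "--x", "x--", "o-->", "o->", "x->",
                   "*-->", "<-->", "<-.->", "<--", "--o"]

_CONNECTORS = "|".join(re.escape(token) for token in FLOW_CONNECTORS)

# Whitespace that is followed by `NodeId`, an optional `[label]`, then a connector.
STATEMENT_BREAK = re.compile(
    r"([ \t]+)(?=[A-Za-z0-9_][\w\-]*(?:\s*\[[^\]]*\])?\s*(?:"
    + _CONNECTORS
    + r")(?:\|[^|]*\|)?\s*)"
)


def normalize_statement_separators_sketch(diagram_text: str) -> str:
    """Put each flattened flowchart statement on its own line (illustrative)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    out = []
    for line in diagram_text.split("\n"):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)].replace("\t", "    ")

        def split(match: "re.Match[str]", indent: str = indent) -> str:
            before = match.string[: match.start()]
            # Assumed guard: whitespace right after a connector or an edge label
            # belongs to a chained edge and must not become a line break.
            if before.endswith(tuple(FLOW_CONNECTORS)) or before.endswith("|"):
                return match.group(0)
            return "\n" + indent

        out.append(indent + STATEMENT_BREAK.sub(split, body))
    return "\n".join(out)


print(normalize_statement_separators_sketch("graph TD\n    A-->B B-->C C-->D"))
```

Because each new line reuses the indentation of the line it was split from, the output stays aligned with the surrounding diagram.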
Step 5: Empty Node Label Normalization
The normalize_empty_node_labels function provides fallback text for nodes with empty labels, which Mermaid 11 rejects.
Function : normalize_empty_node_labels(diagram_text: str) -> str
Pattern Matched : Empty quoted labels: NodeId[""]
Transformation Applied :
- Use node ID as fallback label text
- Replace underscores and hyphens with spaces
- Preserve original node ID for connections
| Before | After |
|---|---|
| `Dead[""]` | `Dead["Dead"]` |
| `User_Profile[""]` | `User_Profile["User Profile"]` |
| `API-Gateway[""]` | `API-Gateway["API Gateway"]` |
Implementation Details :
- Regex pattern: `(\b[A-Za-z0-9_]+)\[""\]`
- Converts underscores/hyphens to spaces for readable label: `re.sub(r'[_\-]+', ' ', node_id)`
- Falls back to raw node_id if cleaned version is empty
- Applied to all diagram types (not limited to flowcharts)
Sources: python/deepwiki-scraper.py:330-341 python/tests/test_mermaid_normalization.py:19-23
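The fallback-label rule can be illustrated with a few lines of Python. The sketch uses the regex and the `re.sub` call quoted above; the function name is hypothetical.

```python
import re

EMPTY_LABEL_RE = re.compile(r'(\b[A-Za-z0-9_]+)\[""\]')


def normalize_empty_node_labels_sketch(diagram_text: str) -> str:
    """Give `NodeId[""]` a readable label derived from the node ID (illustrative)."""

    def fill(match: "re.Match[str]") -> str:
        node_id = match.group(1)
        # Underscores/hyphens become spaces; fall back to the raw ID if nothing is left.
        label = re.sub(r"[_\-]+", " ", node_id).strip() or node_id
        return f'{node_id}["{label}"]'

    return EMPTY_LABEL_RE.sub(fill, diagram_text)


print(normalize_empty_node_labels_sketch('graph TD\n    User_Profile[""] --> Dead[""]'))
```

With the pattern exactly as quoted, the captured ID group contains no hyphens, so this sketch does not claim to reproduce the behavior shown for hyphenated IDs.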
Step 6: Gantt Diagram Normalization
The normalize_gantt_diagram function assigns synthetic task identifiers to gantt chart tasks that are missing them, which is required by Mermaid 11.
Function : normalize_gantt_diagram(diagram_text: str) -> str
Pattern Matched : Task lines in format "Task Name" : start, end[, duration]
Transformation Applied :
- Insert synthetic task ID (`task1`, `task2`, etc.) after colon
- Only apply to tasks lacking valid identifiers
- Preserve tasks that already have IDs or use `after` dependencies
| Before | After |
|---|---|
| `"Design" : 2024-01-01, 2024-01-10` | `"Design" : task1, 2024-01-01, 2024-01-10` |
| `"Code" : myTask, 2024-01-11, 5d` | `"Code" : myTask, 2024-01-11, 5d` (unchanged) |
| `"Test" : after task1, 3d` | `"Test" : after task1, 3d` (unchanged) |
Implementation Details :
- Only processes diagrams starting with `gantt` keyword
- Task line regex: `^(\s*"[^"]+"\s*):\s*(.+)$`
- Splits remainder on commas (max 3 parts)
- Checks if first token matches `^[A-Za-z_][\w-]*$` or starts with `after`
- Maintains counter (`task_counter`) for generating unique IDs
- Reconstructs line: `"{task_name}" : {task_id}, {start}, {end}[, {duration}]`
Sources: python/deepwiki-scraper.py:343-383
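The task ID logic described above might look roughly like this. The two regexes and the `task_counter` name come from the notes above; the function name and the exact way the line is rebuilt are assumptions for illustration.

```python
import re

TASK_LINE_RE = re.compile(r'^(\s*"[^"]+"\s*):\s*(.+)$')
TASK_ID_RE = re.compile(r"^[A-Za-z_][\w-]*$")


def normalize_gantt_sketch(diagram_text: str) -> str:
    """Insert synthetic IDs into gantt tasks that lack one (illustrative)."""
    if not diagram_text.lstrip().lower().startswith("gantt"):
        return diagram_text

    task_counter = 0
    out = []
    for line in diagram_text.split("\n"):
        match = TASK_LINE_RE.match(line)
        if not match:
            out.append(line)
            continue
        name, remainder = match.group(1), match.group(2)
        parts = [part.strip() for part in remainder.split(",", 2)]  # at most 3 parts
        first = parts[0]
        # Tasks with an explicit ID or an `after` dependency are kept as they are.
        if TASK_ID_RE.match(first) or first.startswith("after"):
            out.append(line)
            continue
        task_counter += 1
        out.append(f"{name.rstrip()} : task{task_counter}, {', '.join(parts)}")
    return "\n".join(out)


raw = (
    "gantt\n"
    '    "Design" : 2024-01-01, 2024-01-10\n'
    '    "Code" : myTask, 2024-01-11, 5d\n'
    '    "Test" : after task1, 3d'
)
print(normalize_gantt_sketch(raw))
```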
Step 7: Additional Preprocessing
Two additional preprocessing helpers run in the extraction phase, before normalize_mermaid_diagram is called:
Quote Stripping : strip_wrapping_quotes(diagram_text: str) -> str
- Removes unnecessary quotation marks around edge labels: `|"text"|` → `|text|`
- Removes quotes in state transitions: `: "label"` → `: label`
Label Merging : merge_multiline_labels(diagram_text: str) -> str
- Collapses wrapped labels inside node shapes into `\n` sequences
- Handles multiple shape types: `()`, `[]`, `{}`, `(())`, `[[]]`, `{{}}`
- Skips lines containing structural tokens (arrows, keywords)
- Applied before unescaping, so works with both real and escaped newlines
Sources: python/deepwiki-scraper.py:907-1023
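As an illustration of the quote stripping rules only, the sketch below covers the two cases listed above. The regexes are assumptions written for this example (the source does not quote the actual patterns), and the function name is hypothetical.

```python
import re


def strip_wrapping_quotes_sketch(diagram_text: str) -> str:
    """Drop quotes that wrap edge labels and transition labels (illustrative)."""
    # |"text"| becomes |text|
    text = re.sub(r'\|[ \t]*"([^"|\n]*)"[ \t]*\|', r"|\1|", diagram_text)
    # `: "label"` at the end of a line becomes `: label`
    text = re.sub(r':[ \t]*"([^"\n]*)"[ \t]*$', r": \1", text, flags=re.MULTILINE)
    return text


print(strip_wrapping_quotes_sketch('graph TD\n    A -->|"yes"| B'))
print(strip_wrapping_quotes_sketch('stateDiagram-v2\n    Idle --> Active : "start"'))
```

Only spaces and tabs are allowed around the quotes in these patterns, so a match never swallows a line break.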
Main Orchestrator Function
The normalize_mermaid_diagram function orchestrates all normalization passes in the correct order.
Function Signature : normalize_mermaid_diagram(diagram_text: str) -> str
Implementation :
Key Characteristics :
- Each pass is idempotent and can be safely applied multiple times
- Each pass is self-contained, but the result depends on the order in which the passes run
- Edge label normalization must precede statement separator insertion
- Flowchart node normalization includes its own statement separator logic
- Empty label normalization should occur after other node transformations
Sources: python/deepwiki-scraper.py:385-393
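The "apply passes in sequence" idea, together with the idempotence property, can be shown with a small generic sketch. The helper `apply_passes` and the two toy passes are hypothetical stand-ins written for this example; they are not the project's normalizers.

```python
import re
from typing import Callable, Sequence

NormalizationPass = Callable[[str], str]


def apply_passes(diagram_text: str, passes: Sequence[NormalizationPass]) -> str:
    """Run each pass on the output of the previous one, in list order."""
    for normalize in passes:
        diagram_text = normalize(diagram_text)
    return diagram_text


# Toy passes standing in for the real normalizers (hypothetical).
def collapse_spaces(text: str) -> str:
    return re.sub(r"[ \t]{2,}", " ", text)


def fill_empty_labels(text: str) -> str:
    return re.sub(r'(\b\w+)\[""\]', r'\1["\1"]', text)


PIPELINE = [collapse_spaces, fill_empty_labels]

once = apply_passes('graph TD\nA[""]   -->  B["Done"]', PIPELINE)
twice = apply_passes(once, PIPELINE)
print(once)
print(once == twice)
```

Keeping the passes in an ordered list makes the ordering constraints explicit and lets a test assert that a second run over the output is a no-op.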
Normalization Invocation Points
The normalization pipeline is invoked at a single location during diagram processing:
Invocation Context Diagram
graph TD
Extract["extract_and_enhance_diagrams()"]
Loop["For each diagram match"]
Unescape["Unescape sequences\n(\\\n, \\ , \<, etc)"]
Preprocess["merge_multiline_labels()\nstrip_wrapping_quotes()"]
Normalize["normalize_mermaid_diagram()"]
Context["Extract context\n(heading, anchor text)"]
Pool["Add to diagram_contexts list"]
Extract --> Loop
Loop --> Unescape
Unescape --> Preprocess
Preprocess --> Normalize
Normalize --> Context
Context --> Pool
Sources: python/deepwiki-scraper.py:880-1089 python/deepwiki-scraper.py:1058-1060
Testing and Validation
The normalization pipeline has dedicated unit tests covering each normalization function:
Test Coverage :
| Function | Test File | Test Cases |
|---|---|---|
| `normalize_statement_separators` | `test_mermaid_normalization.py` | Newline insertion, indentation preservation |
| `normalize_empty_node_labels` | `test_mermaid_normalization.py` | Empty label replacement |
| `normalize_flowchart_nodes` | `test_mermaid_normalization.py` | Pipe character stripping |
| `normalize_mermaid_diagram` | `test_mermaid_normalization.py` | End-to-end pipeline test |
Example Test Case (Statement Separator Normalization):
End-to-End Test : The end-to-end test validates that multiple normalization steps work together correctly:
- Input: `graph TD\n Stage1[""] --> Stage2["Stage 2"]\n Stage2 --> Stage3 Stage3 --> Stage4`
- Validates empty label replacement: `Stage1["Stage1"]`
- Validates statement separation: `Stage2 --> Stage3` and `Stage3 --> Stage4` on separate lines
Sources: python/tests/test_mermaid_normalization.py:1-42
Common Edge Cases
The normalization pipeline handles several edge cases that commonly occur in DeepWiki diagrams:
Empty Diagram Handling :
- All normalizers check for empty/whitespace-only input
- Return original text unchanged if stripped content is empty
Diagram Type Detection :
- Each normalizer checks diagram type via first line keyword
- `normalize_mermaid_edge_labels`: only processes `graph` or `flowchart`
- `normalize_mermaid_state_descriptions`: only processes `statediagram`
- `normalize_gantt_diagram`: only processes `gantt`
- Other normalizers apply to all diagram types
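A first-keyword check of the kind described here could be sketched as follows. The helper names `first_keyword` and `applies_to` are hypothetical; the source only states that each normalizer inspects the first line and returns empty input unchanged.

```python
from typing import Iterable


def first_keyword(diagram_text: str) -> str:
    """Return the lowercased first token of the diagram, or "" for blank input."""
    stripped = diagram_text.strip()
    if not stripped:
        return ""
    return stripped.split(None, 1)[0].lower()


def applies_to(diagram_text: str, keywords: Iterable[str]) -> bool:
    """True when the diagram starts with one of the given type keywords."""
    return first_keyword(diagram_text).startswith(tuple(keywords))


print(applies_to("graph TD\nA-->B", ("graph", "flowchart")))
print(applies_to("stateDiagram-v2\nIdle : Waiting", ("statediagram",)))
print(applies_to("   \n", ("gantt",)))
```

Using a prefix match on the lowercased token lets `stateDiagram-v2` pass the `statediagram` check while blank input matches nothing.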
Indentation Preservation :
- Statement separator normalization preserves original indentation level
- Converts tabs to 4 spaces for consistent formatting
- Inserts newlines with matching indentation
Backtick Escaping in Fence Blocks : When injecting normalized diagrams into markdown, the injection logic dynamically calculates fence length to avoid conflicts with backticks inside diagram code:
- Scans diagram for longest backtick run
- Uses `max(3, max_backticks + 1)` as fence length
Sources: python/deepwiki-scraper.py:1249-1255
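The fence length rule can be expressed in a few lines. This is a minimal sketch of the `max(3, max_backticks + 1)` calculation; the function names `fence_for` and `wrap_mermaid` are hypothetical.

```python
import re


def fence_for(diagram_text: str) -> str:
    """Return a backtick fence longer than any backtick run in the diagram."""
    runs = re.findall(r"`+", diagram_text)
    max_backticks = max((len(run) for run in runs), default=0)
    return "`" * max(3, max_backticks + 1)


def wrap_mermaid(diagram_text: str) -> str:
    """Wrap the diagram in a mermaid code fence of a safe length."""
    fence = fence_for(diagram_text)
    return f"{fence}mermaid\n{diagram_text}\n{fence}"


print(len(fence_for("graph TD\nA-->B")))
print(len(fence_for("graph TD\nA[" + "`" * 4 + "code]")))
```

A diagram without backticks gets the usual three-backtick fence, and a diagram containing a run of four backticks gets a fence of five.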