This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Mermaid Normalization

The Mermaid normalization pipeline transforms diagrams extracted from DeepWiki’s JavaScript payload into syntax that is compatible with Mermaid 11. DeepWiki’s diagrams often contain formatting issues, legacy syntax, and multiline constructs that newer Mermaid parsers reject. This seven-step normalization process ensures that all diagrams render correctly in mdBook’s Mermaid renderer.

For information about how diagrams are extracted from the JavaScript payload, see Phase 2: Diagram Enhancement. For information about the fuzzy matching algorithm that places diagrams in the correct markdown files, see Fuzzy Matching Algorithm.

Purpose and Scope

This page documents the seven normalization functions that transform raw Mermaid diagram code into Mermaid 11-compatible syntax. Each normalization step addresses a specific category of syntax errors or incompatibilities. The pipeline is applied to every diagram before it is injected into markdown files.

The normalization pipeline handles:

Multiline edge labels that span multiple lines
State diagram description syntax variations
Flowchart node labels containing reserved characters
Missing statement separators between consecutive nodes
Empty node labels that lack fallback text
Gantt chart tasks missing required task identifiers
Additional edge case transformations (quote stripping, label merging)

Normalization Pipeline Architecture

The normalization pipeline is orchestrated by the normalize_mermaid_diagram function, which applies seven normalization passes in sequence. Each pass is idempotent and focuses on a specific syntax issue.

Pipeline Flow Diagram

graph TD
    Input["Raw Diagram Text\nfrom Next.js Payload"]
    Step1["normalize_mermaid_edge_labels()\nFlatten multiline edge labels"]
    Step2["normalize_mermaid_state_descriptions()\nFix state syntax"]
    Step3["normalize_flowchart_nodes()\nClean node labels"]
    Step4["normalize_statement_separators()\nInsert newlines"]
    Step5["normalize_empty_node_labels()\nAdd fallback labels"]
    Step6["normalize_gantt_diagram()\nAdd synthetic task IDs"]
    Output["Normalized Diagram\nMermaid 11 Compatible"]
    Input --> Step1
    Step1 --> Step2
    Step2 --> Step3
    Step3 --> Step4
    Step4 --> Step5
    Step5 --> Step6
    Step6 --> Output

Sources: python/deepwiki-scraper.py:385-393 python/deepwiki-scraper.py:230-383

Function Name to Normalization Step Mapping

Sources: python/deepwiki-scraper.py:385-393

Step 1: Edge Label Normalization

The normalize_mermaid_edge_labels function collapses multiline edge labels into single-line labels, replacing both real and escaped newlines with spaces. Mermaid 11 rejects edge labels that span multiple physical lines.

Function : normalize_mermaid_edge_labels(diagram_text: str) -> str

Pattern Matched : Edge labels enclosed in pipes: |....|

Transformations Applied :

Replace literal newline characters with spaces
Replace escaped \n sequences with spaces
Remove parentheses from labels (invalid syntax)
Collapse multiple spaces into single spaces

Before	After
`A -->|"Label\nLine 2"|`	`A -->|"Label Line 2"|`
`C -->|Text (note)|`	`C -->|Text note|`
`E -->|First\nSecond\nThird|`	`E -->|First Second Third|`

Implementation Details :

Only processes diagrams starting with graph or flowchart keywords
Uses regex pattern \|([^|]*)\| to match edge labels
Checks for presence of \n, (, or ) before applying cleanup
Preserves labels that are already properly formatted

Sources: python/deepwiki-scraper.py:230-251
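
The edge label rules above can be sketched in a few lines. This is a minimal illustration that assumes the documented regex and transformations; the function name normalize_edge_labels_sketch is hypothetical and the real function in deepwiki-scraper.py may differ in detail.

```python
import re


def normalize_edge_labels_sketch(diagram_text: str) -> str:
    """Flatten multiline edge labels in graph/flowchart diagrams (illustrative)."""
    # Only flowchart-style diagrams are processed.
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    def clean(match: re.Match) -> str:
        label = match.group(1)
        # Labels without newlines or parentheses are already fine: keep them.
        if not any(token in label for token in ("\n", "\\n", "(", ")")):
            return match.group(0)
        # Real and escaped newlines both become spaces.
        label = label.replace("\\n", " ").replace("\n", " ")
        # Parentheses are dropped, then runs of whitespace are collapsed.
        label = label.replace("(", "").replace(")", "")
        label = re.sub(r"\s+", " ", label).strip()
        return f"|{label}|"

    # [^|]* also matches newlines, so labels spanning physical lines are caught.
    return re.sub(r"\|([^|]*)\|", clean, diagram_text)
```

Because a cleaned label no longer contains newlines or parentheses, running the function a second time leaves the text unchanged, which matches the idempotence claim made for the pipeline.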

Step 2: State Description Normalization

The normalize_mermaid_state_descriptions function ensures state diagram descriptions follow the strict State : Description syntax required by Mermaid 11.

Function : normalize_mermaid_state_descriptions(diagram_text: str) -> str

Pattern Matched : State declarations with colons in state diagrams

Transformations Applied :

Ensure a single space between the state name and the colon
Replace newlines in descriptions with spaces
Replace additional colons in description with -
Collapse multiple spaces to single space

Before	After
`Idle:Waiting\nfor input`	`Idle : Waiting for input`
`Active:Processing:data`	`Active : Processing - data`
`Error  :  Multiple   spaces`	`Error : Multiple spaces`

Implementation Details :

Only processes diagrams starting with statediagram keyword
Skips lines containing :: (double colon, used for class names)
Splits each line on first colon occurrence
Requires both prefix and suffix to be non-empty after stripping

Sources: python/deepwiki-scraper.py:253-277
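
A line-oriented sketch of the state description rules follows. It assumes the behavior listed above (split on the first colon, skip lines containing ::, require non-empty parts). The name normalize_state_descriptions_sketch is hypothetical, and because the sketch works on physical lines it only handles escaped \n sequences inside a description.

```python
import re


def normalize_state_descriptions_sketch(diagram_text: str) -> str:
    """Rewrite `State:Description` lines as `State : Description` (illustrative)."""
    if not diagram_text.strip().lower().startswith("statediagram"):
        return diagram_text

    out = []
    for line in diagram_text.split("\n"):
        # Lines without a colon, or with `::` (class syntax), are left alone.
        if ":" not in line or "::" in line:
            out.append(line)
            continue
        prefix, suffix = line.split(":", 1)  # split on the first colon only
        name, desc = prefix.strip(), suffix.strip()
        if not name or not desc:
            out.append(line)
            continue
        # Escaped newlines become spaces; extra colons become " - ".
        desc = desc.replace("\\n", " ").replace(":", " - ")
        desc = re.sub(r"\s+", " ", desc).strip()
        indent = prefix[: len(prefix) - len(prefix.lstrip())]
        out.append(f"{indent}{name} : {desc}")
    return "\n".join(out)
```

Transitions such as `Idle --> Active : start` pass through the same path: the text before the first colon is kept as the prefix and only the description after it is cleaned.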

Step 3: Flowchart Node Normalization

The normalize_flowchart_nodes function removes reserved characters (especially pipe |) from flowchart node labels and adds statement separators.

Function : normalize_flowchart_nodes(diagram_text: str) -> str

Pattern Matched : Node labels in brackets: ["..."]

Transformations Applied :

Replace pipe characters | with forward slash /
Collapse multiple spaces to single space
Insert newlines between consecutive statements on same line

Before	After
`Node["Label|With Pipes"]`	`Node["Label/With Pipes"]`
`A["Text"] B["More"]`	`A["Text"]` and `B["More"]` on separate lines
`C["Many   Spaces"]`	`C["Many Spaces"]`

Implementation Details :

Only processes diagrams starting with graph or flowchart keywords
Uses regex \["([^"]*)"\] to match quoted node labels
Inserts newlines after closing brackets/braces/parens using regex: (\"]|\}|\))\s+(?=[A-Za-z0-9_])
Preserves indentation when splitting statements

Sources: python/deepwiki-scraper.py:279-301
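
The following sketch combines the two regular expressions quoted above: one to clean quoted labels and one to split statements that share a line. It is an approximation under those stated patterns; normalize_flowchart_nodes_sketch is a hypothetical name, not the project's function.

```python
import re

LABEL_RE = re.compile(r'\["([^"]*)"\]')
# A closing `"]`, `}` or `)` followed by whitespace and the start of an identifier.
BREAK_RE = re.compile(r'("\]|\}|\))\s+(?=[A-Za-z0-9_])')


def normalize_flowchart_nodes_sketch(diagram_text: str) -> str:
    """Clean quoted node labels and split same-line statements (illustrative)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    def clean_label(match: re.Match) -> str:
        # Pipes are reserved for edge labels, so swap them for slashes.
        label = match.group(1).replace("|", "/")
        label = re.sub(r"\s+", " ", label).strip()
        return f'["{label}"]'

    lines = []
    for line in diagram_text.split("\n"):
        line = LABEL_RE.sub(clean_label, line)
        indent = line[: len(line) - len(line.lstrip())]
        # Reuse the line's own indentation for every statement split off it.
        line = BREAK_RE.sub(lambda m, ind=indent: m.group(1) + "\n" + ind, line)
        lines.append(line)
    return "\n".join(lines)
```

One caveat of this sketch: the split pattern does not know whether it is inside a label, so a `)` followed by a word within quoted text would also trigger a line break. Whether the real implementation guards against that is not stated in this page.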

Step 4: Statement Separator Normalization

The normalize_statement_separators function inserts newlines between consecutive Mermaid statements that have been flattened onto a single line.

Function : normalize_statement_separators(diagram_text: str) -> str

Connector Tokens Recognized :

--> ==> -.-> --x x-- o--> o-> x-> *--> <--> <-.-> <-- --o

Pattern Matched : Whitespace before a node identifier that precedes a connector

Regex Pattern : STATEMENT_BREAK_PATTERN

Before	After
`A-->B B-->C C-->D`	`A-->B`, `B-->C`, and `C-->D` on separate lines
`Node1-->Node2 Node3-->Node4`	`Node1-->Node2` and `Node3-->Node4` on separate lines

Implementation Details :

Only processes diagrams starting with graph or flowchart keywords
Defines FLOW_CONNECTORS list of all Mermaid connector tokens
Builds regex pattern by escaping and joining connector tokens
Pattern: (?<!\n)([ \t]+)(?=[A-Za-z0-9_][\w\-]*(?:\s*\[[^\]]*\])?\s*(?:CONNECTORS)(?:\|[^|]*\|)?\s*)
Preserves indentation length when inserting newlines
Converts tabs to 4 spaces for consistent indentation

Sources: python/deepwiki-scraper.py:303-328 python/deepwiki-scraper.py:309-311
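
To make the pattern construction concrete, here is a sketch that builds a break pattern from the connector list and applies it. Two details are assumptions of this sketch rather than documented behavior: it works line by line so leading indentation is never mistaken for a separator, and it adds a guard so chained edges such as `A --> B --> C` stay on one line. The names STATEMENT_BREAK and split_statements_sketch are hypothetical.

```python
import re

# Connector tokens as listed above.
FLOW_CONNECTORS = ["-->", "==>", "-.->", "--x", "x--", "o-->", "o->", "x->",
                   "*-->", "<-->", "<-.->", "<--", "--o"]
# Escape each token and join into one alternation, longest tokens first.
_ALT = "|".join(re.escape(t) for t in sorted(FLOW_CONNECTORS, key=len, reverse=True))

# Whitespace followed by: identifier, optional [label], connector, optional |label|.
STATEMENT_BREAK = re.compile(
    r"[ \t]+(?=[A-Za-z0-9_][\w\-]*(?:\s*\[[^\]]*\])?\s*(?:" + _ALT + r")(?:\|[^|]*\|)?)"
)


def split_statements_sketch(diagram_text: str) -> str:
    """Put each flattened flowchart statement on its own line (illustrative)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    out = []
    for line in diagram_text.split("\n"):
        body = line.lstrip(" \t")
        # Tabs in the indentation become four spaces each.
        indent = line[: len(line) - len(body)].replace("\t", "    ")

        def breaker(match: re.Match, body: str = body, indent: str = indent) -> str:
            before = body[: match.start()]
            # Assumed guard: text right after a connector or edge label is the
            # edge target, not the start of a new statement.
            if before.endswith(tuple(FLOW_CONNECTORS)) or before.endswith("|"):
                return match.group(0)
            return "\n" + indent

        out.append(indent + STATEMENT_BREAK.sub(breaker, body))
    return "\n".join(out)
```

Each inserted newline is followed by the indentation of the line being split, so the resulting statements stay aligned with their original line.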

Step 5: Empty Node Label Normalization

The normalize_empty_node_labels function provides fallback text for nodes with empty labels, which Mermaid 11 rejects.

Function : normalize_empty_node_labels(diagram_text: str) -> str

Pattern Matched : Empty quoted labels: NodeId[""]

Transformation Applied :

Use node ID as fallback label text
Replace underscores and hyphens with spaces
Preserve original node ID for connections

Before	After
`Dead[""]`	`Dead["Dead"]`
`User_Profile[""]`	`User_Profile["User Profile"]`
`API-Gateway[""]`	`API-Gateway["API Gateway"]`

Implementation Details :

Regex pattern: (\b[A-Za-z0-9_]+)\[""\]
Converts underscores/hyphens to spaces for readable label: re.sub(r'[_\-]+', ' ', node_id)
Falls back to raw node_id if cleaned version is empty
Applied to all diagram types (not limited to flowcharts)

Sources: python/deepwiki-scraper.py:330-341 python/tests/test_mermaid_normalization.py:19-23
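
The fallback-label rule is small enough to sketch directly from the regex and substitution quoted above. The function name fill_empty_labels_sketch is hypothetical. Note that with the pattern exactly as quoted, the captured ID can only contain letters, digits, and underscores, so this sketch exercises only such IDs.

```python
import re

EMPTY_LABEL = re.compile(r'(\b[A-Za-z0-9_]+)\[""\]')


def fill_empty_labels_sketch(diagram_text: str) -> str:
    """Replace NodeId[""] with NodeId["Node Id"] (illustrative)."""

    def repl(match: re.Match) -> str:
        node_id = match.group(1)
        # Underscores and hyphens become spaces; fall back to the raw ID
        # if nothing readable is left after cleaning.
        label = re.sub(r"[_\-]+", " ", node_id).strip() or node_id
        # The node ID itself is untouched so existing edges still resolve.
        return f'{node_id}["{label}"]'

    return EMPTY_LABEL.sub(repl, diagram_text)
```

For example, `Dead[""] --> User_Profile[""]` becomes `Dead["Dead"] --> User_Profile["User Profile"]`, while the IDs used by edges stay the same.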

Step 6: Gantt Diagram Normalization

The normalize_gantt_diagram function assigns synthetic task identifiers to gantt chart tasks that are missing them, which is required by Mermaid 11.

Function : normalize_gantt_diagram(diagram_text: str) -> str

Pattern Matched : Task lines in format "Task Name" : start, end[, duration]

Transformation Applied :

Insert synthetic task ID (task1, task2, etc.) after colon
Only apply to tasks lacking valid identifiers
Preserve tasks that already have IDs or use after dependencies

Before	After
`"Design" : 2024-01-01, 2024-01-10`	`"Design" : task1, 2024-01-01, 2024-01-10`
`"Code" : myTask, 2024-01-11, 5d`	`"Code" : myTask, 2024-01-11, 5d` (unchanged)
`"Test" : after task1, 3d`	`"Test" : after task1, 3d` (unchanged)

Implementation Details :

Only processes diagrams starting with gantt keyword
Task line regex: ^(\s*"[^"]+"\s*):\s*(.+)$
Splits remainder on commas (max 3 parts)
Checks if first token matches ^[A-Za-z_][\w-]*$ or starts with after
Maintains counter (task_counter) for generating unique IDs
Reconstructs line: "{task_name}" : {task_id}, {start}, {end}[, {duration}]

Sources: python/deepwiki-scraper.py:343-383
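
A sketch of the synthetic ID logic, assuming the task-line regex and identifier check quoted above. The name add_gantt_task_ids_sketch is hypothetical, and this version simply prepends the new ID to the remainder of the line instead of splitting it into start, end, and duration.

```python
import re

TASK_LINE = re.compile(r'^(\s*"[^"]+"\s*):\s*(.+)$')
TASK_ID = re.compile(r"^[A-Za-z_][\w-]*$")


def add_gantt_task_ids_sketch(diagram_text: str) -> str:
    """Give ID-less gantt tasks a synthetic task1, task2, ... ID (illustrative)."""
    if not diagram_text.lstrip().lower().startswith("gantt"):
        return diagram_text

    counter = 0
    out = []
    for line in diagram_text.split("\n"):
        match = TASK_LINE.match(line)
        if not match:
            out.append(line)
            continue
        name = match.group(1).rstrip()
        rest = match.group(2).strip()
        first = rest.split(",", 1)[0].strip()
        # Tasks that already start with an ID or an `after` dependency are kept.
        if TASK_ID.match(first) or first.startswith("after"):
            out.append(line)
            continue
        counter += 1
        out.append(f"{name} : task{counter}, {rest}")
    return "\n".join(out)
```

Once a task carries an ID, its first token matches the identifier check, so a second run leaves the line as it is.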

Step 7: Additional Preprocessing

Before the six normalization passes described above run, diagrams undergo additional preprocessing in the extraction phase:

Quote Stripping : strip_wrapping_quotes(diagram_text: str) -> str

Removes unnecessary quotation marks around edge labels: |"text"| → |text|
Removes quotes in state transitions: : "label" → : label

Label Merging : merge_multiline_labels(diagram_text: str) -> str

Collapses wrapped labels inside node shapes into \n sequences
Handles multiple shape types: (), [], {}, (()), [[]], {{}}
Skips lines containing structural tokens (arrows, keywords)
Applied before unescaping, so it works with both real and escaped newlines

Sources: python/deepwiki-scraper.py:907-1023
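
The two quote-stripping rules can be approximated with a pair of substitutions. This is a sketch under the assumption that only the two shapes shown above need handling; strip_wrapping_quotes_sketch is a hypothetical stand-in for the real strip_wrapping_quotes, whose exact patterns are not given here.

```python
import re


def strip_wrapping_quotes_sketch(diagram_text: str) -> str:
    """Drop quotes that wrap edge labels and transition labels (illustrative)."""
    # |"text"| -> |text|
    text = re.sub(r'\|"([^"|]*)"\|', r"|\1|", diagram_text)
    # `: "label"` at the end of a line -> `: label`
    # [ \t]* is used instead of \s* so the match never swallows a newline.
    text = re.sub(r':[ \t]*"([^"\n]*)"[ \t]*$', r": \1", text, flags=re.MULTILINE)
    return text
```

Quoted node labels such as `A["Text"]` are not matched by either pattern, so they keep their quotes.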

Main Orchestrator Function

The normalize_mermaid_diagram function orchestrates all normalization passes in the correct order.

Function Signature : normalize_mermaid_diagram(diagram_text: str) -> str

Implementation :

Key Characteristics :

Each pass is idempotent and can be safely applied multiple times
Each pass operates only on the diagram text it receives, but the order in which passes run matters
Edge label normalization must precede statement separator insertion
Flowchart node normalization includes its own statement separator logic
Empty label normalization should occur after other node transformations

Sources: python/deepwiki-scraper.py:385-393
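
The orchestration pattern itself, feeding the output of each pass into the next, can be sketched independently of the real passes. Everything below is illustrative: apply_passes and the two stand-in passes are hypothetical and much simpler than the normalizers described on this page.

```python
from functools import reduce
from typing import Callable, Iterable

Pass = Callable[[str], str]


def apply_passes(diagram_text: str, passes: Iterable[Pass]) -> str:
    """Run each pass on the result of the previous one, in list order."""
    return reduce(lambda text, fn: fn(text), passes, diagram_text)


# Stand-in passes, used only to demonstrate ordering and idempotence.
def strip_trailing_whitespace(text: str) -> str:
    return "\n".join(line.rstrip() for line in text.split("\n"))


def drop_blank_lines(text: str) -> str:
    return "\n".join(line for line in text.split("\n") if line.strip())


PASSES = [strip_trailing_whitespace, drop_blank_lines]

once = apply_passes("graph TD  \n\n    A --> B ", PASSES)
twice = apply_passes(once, PASSES)  # idempotent: a second run changes nothing
```

Keeping the passes in an ordered list makes the ordering constraints above explicit: moving a pass means moving one list entry.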

Normalization Invocation Points

The normalization pipeline is invoked at a single location during diagram processing:

Invocation Context Diagram

graph TD
    Extract["extract_and_enhance_diagrams()"]
    Loop["For each diagram match"]
    Unescape["Unescape sequences\n(\\\n, \\ , \<, etc)"]
    Preprocess["merge_multiline_labels()\nstrip_wrapping_quotes()"]
    Normalize["normalize_mermaid_diagram()"]
    Context["Extract context\n(heading, anchor text)"]
    Pool["Add to diagram_contexts list"]
    Extract --> Loop
    Loop --> Unescape
    Unescape --> Preprocess
    Preprocess --> Normalize
    Normalize --> Context
    Context --> Pool

Sources: python/deepwiki-scraper.py:880-1089 python/deepwiki-scraper.py:1058-1060

Testing and Validation

The normalization pipeline has dedicated unit tests covering each normalization function:

Test Coverage :

Function	Test File	Test Cases
`normalize_statement_separators`	test_mermaid_normalization.py	Newline insertion, indentation preservation
`normalize_empty_node_labels`	test_mermaid_normalization.py	Empty label replacement
`normalize_flowchart_nodes`	test_mermaid_normalization.py	Pipe character stripping
`normalize_mermaid_diagram`	test_mermaid_normalization.py	End-to-end pipeline test

Example Test Case (Statement Separator Normalization):

End-to-End Test : The end-to-end test validates that multiple normalization steps work together correctly:

Input: graph TD\n Stage1[""] --> Stage2["Stage 2"]\n Stage2 --> Stage3 Stage3 --> Stage4
Validates empty label replacement: Stage1["Stage1"]
Validates statement separation: Stage2 --> Stage3 and Stage3 --> Stage4 on separate lines

Sources: python/tests/test_mermaid_normalization.py:1-42

Common Edge Cases

The normalization pipeline handles several edge cases that commonly occur in DeepWiki diagrams:

Empty Diagram Handling :

All normalizers check for empty/whitespace-only input
Return original text unchanged if stripped content is empty

Diagram Type Detection :

Type-specific normalizers check the diagram type via the first-line keyword
normalize_mermaid_edge_labels, normalize_flowchart_nodes, and normalize_statement_separators: only process graph or flowchart
normalize_mermaid_state_descriptions: only processes statediagram
normalize_gantt_diagram: only processes gantt
normalize_empty_node_labels: applies to all diagram types

Indentation Preservation :

Statement separator normalization preserves original indentation level
Converts tabs to 4 spaces for consistent formatting
Inserts newlines with matching indentation

Backtick Escaping in Fence Blocks : When injecting normalized diagrams into markdown, the injection logic dynamically calculates fence length to avoid conflicts with backticks inside diagram code:

Scans diagram for longest backtick run
Uses max(3, max_backticks + 1) as fence length

Sources: python/deepwiki-scraper.py:1249-1255
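
The fence-length rule above is straightforward to sketch. The helper names fence_for and wrap_mermaid are hypothetical; only the max(3, max_backticks + 1) formula comes from the description.

```python
import re


def fence_for(diagram_text: str) -> str:
    """Return a backtick fence longer than any backtick run in the diagram."""
    runs = re.findall(r"`+", diagram_text)
    longest = max((len(run) for run in runs), default=0)
    return "`" * max(3, longest + 1)


def wrap_mermaid(diagram_text: str) -> str:
    """Wrap diagram code in a mermaid fence that its content cannot close."""
    fence = fence_for(diagram_text)
    return f"{fence}mermaid\n{diagram_text}\n{fence}"
```

A diagram without backticks gets the usual three-backtick fence, while one that contains a run of three backticks is wrapped in a four-backtick fence.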
