This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Mermaid Normalization


The Mermaid normalization pipeline transforms diagrams extracted from DeepWiki’s JavaScript payload into syntax that is compatible with Mermaid 11. DeepWiki’s diagrams often contain formatting issues, legacy syntax, and multiline constructs that newer Mermaid parsers reject. This seven-step normalization process ensures that all diagrams render correctly in mdBook’s Mermaid renderer.

For information about how diagrams are extracted from the JavaScript payload, see Phase 2: Diagram Enhancement. For information about the fuzzy matching algorithm that places diagrams in the correct markdown files, see Fuzzy Matching Algorithm.

Purpose and Scope

This page documents the seven normalization functions that transform raw Mermaid diagram code into Mermaid 11-compatible syntax. Each normalization step addresses a specific category of syntax errors or incompatibilities. The pipeline is applied to every diagram before it is injected into markdown files.

The normalization pipeline handles:

  • Multiline edge labels that span multiple lines
  • State diagram description syntax variations
  • Flowchart node labels containing reserved characters
  • Missing statement separators between consecutive nodes
  • Empty node labels that lack fallback text
  • Gantt chart tasks missing required task identifiers
  • Additional edge case transformations (quote stripping, label merging)

Normalization Pipeline Architecture

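The normalization pipeline is orchestrated by the normalize_mermaid_diagram function, which applies the normalization passes in sequence. The flow diagram below shows six passes; the seventh step (quote stripping and label merging) runs earlier, during extraction. Each pass is idempotent and focuses on a specific syntax issue.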

Pipeline Flow Diagram

graph TD
    Input["Raw Diagram Text\nfrom Next.js Payload"]
    Step1["normalize_mermaid_edge_labels()\nFlatten multiline edge labels"]
    Step2["normalize_mermaid_state_descriptions()\nFix state syntax"]
    Step3["normalize_flowchart_nodes()\nClean node labels"]
    Step4["normalize_statement_separators()\nInsert newlines"]
    Step5["normalize_empty_node_labels()\nAdd fallback labels"]
    Step6["normalize_gantt_diagram()\nAdd synthetic task IDs"]
    Output["Normalized Diagram\nMermaid 11 Compatible"]
    Input --> Step1
    Step1 --> Step2
    Step2 --> Step3
    Step3 --> Step4
    Step4 --> Step5
    Step5 --> Step6
    Step6 --> Output

Sources: python/deepwiki-scraper.py:385-393 python/deepwiki-scraper.py:230-383

Function Name to Normalization Step Mapping

Sources: python/deepwiki-scraper.py:385-393

Step 1: Edge Label Normalization

The normalize_mermaid_edge_labels function collapses multiline edge labels into single-line labels by replacing real and escaped newlines with spaces. Mermaid 11 rejects edge labels that span multiple physical lines.

Function : normalize_mermaid_edge_labels(diagram_text: str) -> str

Pattern Matched : Edge labels enclosed in pipes: |...|

Transformations Applied :

  • Replace literal newline characters with spaces
  • Replace escaped \n sequences with spaces
  • Remove parentheses from labels (invalid syntax)
  • Collapse multiple spaces into single spaces
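
Before → After (edge label portion of each line):

  • A -->|"Label\nLine 2"| → A -->|"Label Line 2"|
  • C -->|Text (note)| → C -->|Text note|
  • E -->|First\nSecond\nThird| → E -->|First Second Third|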

Implementation Details :

  • Only processes diagrams starting with graph or flowchart keywords
  • Uses regex pattern \|([^|]*)\| to match edge labels
  • Checks for presence of \n, (, or ) before applying cleanup
  • Preserves labels that are already properly formatted

Sources: python/deepwiki-scraper.py:230-251
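
The transformations listed above can be sketched in a few lines of Python. This is a minimal illustration written from the description on this page, not the code in deepwiki-scraper.py; the function name normalize_edge_labels_sketch is hypothetical, and the real implementation may differ in details such as whitespace handling.

```python
import re


def normalize_edge_labels_sketch(diagram_text: str) -> str:
    """Flatten multiline edge labels in graph/flowchart diagrams (illustrative sketch)."""
    # Only graph/flowchart diagrams are processed, as described above.
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    def clean(match: re.Match) -> str:
        label = match.group(1)
        # Leave labels untouched unless they contain a newline, "\n", or parentheses.
        if not any(token in label for token in ("\n", "\\n", "(", ")")):
            return match.group(0)
        label = label.replace("\\n", " ").replace("\n", " ")  # escaped and real newlines
        label = label.replace("(", "").replace(")", "")       # drop parentheses
        label = re.sub(r"\s+", " ", label).strip()             # collapse whitespace
        return f"|{label}|"

    # [^|]* also matches newlines, so labels spanning several lines are caught.
    return re.sub(r"\|([^|]*)\|", clean, diagram_text)
```

Calling the sketch on a label that contains an escaped newline and a parenthesized note yields a single-line label, while labels that are already clean pass through unchanged.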

Step 2: State Description Normalization

The normalize_mermaid_state_descriptions function ensures state diagram descriptions follow the strict State : Description syntax required by Mermaid 11.

Function : normalize_mermaid_state_descriptions(diagram_text: str) -> str

Pattern Matched : State declarations with colons in state diagrams

Transformations Applied :

  • Ensure a single space on each side of the colon that separates the state name from its description
  • Replace newlines in descriptions with spaces
  • Replace additional colons in description with -
  • Collapse multiple spaces to single space

Before → After:

  • Idle:Waiting\nfor input → Idle : Waiting for input
  • Active:Processing:data → Active : Processing - data
  • Error  :  Multiple   spaces → Error : Multiple spaces

Implementation Details :

  • Only processes diagrams starting with statediagram keyword
  • Skips lines containing :: (double colon, used for class names)
  • Splits each line on first colon occurrence
  • Requires both prefix and suffix to be non-empty after stripping

Sources: python/deepwiki-scraper.py:253-277
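
A rough sketch of this step, assuming the first-line check is case-insensitive and that lines are processed one at a time (so only escaped \n sequences can appear inside a description). The name normalize_state_descriptions_sketch is hypothetical and the code is reconstructed from the bullets above rather than copied from the scraper.

```python
import re


def normalize_state_descriptions_sketch(diagram_text: str) -> str:
    """Rewrite state descriptions as 'State : Description' (illustrative sketch)."""
    if not diagram_text.lstrip().lower().startswith("statediagram"):
        return diagram_text

    out = []
    for line in diagram_text.split("\n"):
        # Skip lines without a colon and lines using '::' (class names).
        if ":" not in line or "::" in line:
            out.append(line)
            continue
        prefix, suffix = line.split(":", 1)  # split on the first colon only
        name, description = prefix.rstrip(), suffix.strip()
        if not name.strip() or not description:
            out.append(line)
            continue
        description = description.replace("\\n", " ")  # escaped newlines become spaces
        description = description.replace(":", " - ")  # extra colons become dashes
        description = re.sub(r"\s+", " ", description).strip()
        out.append(f"{name} : {description}")
    return "\n".join(out)
```

Leading indentation is kept because only the right-hand side of the prefix is stripped.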

Step 3: Flowchart Node Normalization

The normalize_flowchart_nodes function replaces reserved characters (especially the pipe |) in flowchart node labels and inserts statement separators.

Function : normalize_flowchart_nodes(diagram_text: str) -> str

Pattern Matched : Node labels in brackets: ["..."]

Transformations Applied :

  • Replace pipe characters | with forward slash /
  • Collapse multiple spaces to single space
  • Insert newlines between consecutive statements on same line

Before → After:

  • Node["Label|With Pipes"] → Node["Label/With Pipes"]
  • A["Text"] B["More"] → A["Text"] and B["More"] on separate lines
  • C["Many    Spaces"] → C["Many Spaces"]

Implementation Details :

  • Only processes diagrams starting with graph or flowchart keywords
  • Uses regex \["([^"]*)"\] to match quoted node labels
  • Inserts newlines after closing brackets/braces/parens using regex: (\"]|\}|\))\s+(?=[A-Za-z0-9_])
  • Preserves indentation when splitting statements

Sources: python/deepwiki-scraper.py:279-301
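
The two regexes quoted above can be combined into a short sketch. It is an approximation under the assumption that the function works line by line; normalize_flowchart_nodes_sketch is a hypothetical name. As with the documented pattern, a closing parenthesis followed by a word inside a label would also trigger a split, so this should not be read as a complete parser.

```python
import re

LABEL = re.compile(r'\["([^"]*)"\]')
# A closing '"]', '}' or ')' followed by whitespace and the start of an identifier.
BREAK_AFTER_CLOSE = re.compile(r'("\]|\}|\))\s+(?=[A-Za-z0-9_])')


def normalize_flowchart_nodes_sketch(diagram_text: str) -> str:
    """Clean quoted node labels and split statements that share a line (sketch)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    def clean_label(match: re.Match) -> str:
        label = match.group(1).replace("|", "/")     # pipes are reserved for edge labels
        label = re.sub(r"\s+", " ", label).strip()   # collapse runs of whitespace
        return f'["{label}"]'

    lines = []
    for line in diagram_text.split("\n"):
        line = LABEL.sub(clean_label, line)
        indent = line[: len(line) - len(line.lstrip())]
        # Start a new line, with the same indentation, after each closing token.
        line = BREAK_AFTER_CLOSE.sub(lambda m: m.group(1) + "\n" + indent, line)
        lines.append(line)
    return "\n".join(lines)
```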

Step 4: Statement Separator Normalization

The normalize_statement_separators function inserts newlines between consecutive Mermaid statements that have been flattened onto a single line.

Function : normalize_statement_separators(diagram_text: str) -> str

Connector Tokens Recognized :

--> ==> -.-> --x x-- o--> o-> x-> *--> <--> <-.-> <-- --o

Pattern Matched : Whitespace before a node identifier that precedes a connector

Regex Pattern : STATEMENT_BREAK_PATTERN

Before → After:

  • A-->B B-->C C-->D → A-->B, B-->C, and C-->D on separate lines
  • Node1-->Node2 Node3-->Node4 → Node1-->Node2 and Node3-->Node4 on separate lines

Implementation Details :

  • Only processes diagrams starting with graph or flowchart keywords
  • Defines FLOW_CONNECTORS list of all Mermaid connector tokens
  • Builds regex pattern by escaping and joining connector tokens
  • Pattern: (?<!\n)([ \t]+)(?=[A-Za-z0-9_][\w\-]*(?:\s*\[[^\]]*\])?\s*(?:CONNECTORS)(?:\|[^|]*\|)?\s*)
  • Preserves indentation length when inserting newlines
  • Converts tabs to 4 spaces for consistent indentation

Sources: python/deepwiki-scraper.py:303-328 python/deepwiki-scraper.py:309-311
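
The sketch below shows one way the connector list and lookahead pattern could fit together. It makes two assumptions that are not confirmed by this page: it processes the diagram line by line (so the (?<!\n) lookbehind is not needed), and it adds a guard that refuses to split immediately after a connector or edge label so that chained edges such as A --> B --> C stay intact. The names split_statements_sketch and STATEMENT_BREAK are hypothetical.

```python
import re

# Connector tokens as listed above.
FLOW_CONNECTORS = ["-->", "==>", "-.->", "--x", "x--", "o-->", "o->",
                   "x->", "*-->", "<-->", "<-.->", "<--", "--o"]
_ALTERNATION = "|".join(
    re.escape(token) for token in sorted(FLOW_CONNECTORS, key=len, reverse=True)
)

# Whitespace followed by "<node id> [optional label] <connector>".
STATEMENT_BREAK = re.compile(
    r"[ \t]+(?=[A-Za-z0-9_][\w\-]*(?:\s*\[[^\]]*\])?\s*(?:" + _ALTERNATION + r"))"
)


def split_statements_sketch(diagram_text: str) -> str:
    """Put each flowchart statement on its own line (illustrative sketch)."""
    if not diagram_text.lstrip().lower().startswith(("graph", "flowchart")):
        return diagram_text

    out = []
    for line in diagram_text.split("\n"):
        expanded = line.replace("\t", "    ")  # tabs become 4 spaces
        body = expanded.lstrip(" ")
        indent = expanded[: len(expanded) - len(body)]

        def break_here(match: re.Match) -> str:
            before = match.string[: match.start()]
            # Assumption: never split in the middle of an edge.
            if before.endswith(tuple(FLOW_CONNECTORS)) or before.endswith("|"):
                return match.group(0)
            return "\n" + indent

        out.append(indent + STATEMENT_BREAK.sub(break_here, body))
    return "\n".join(out)
```

Each inserted newline is followed by the indentation of the original line, which mirrors the indentation-preservation behavior described above.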

Step 5: Empty Node Label Normalization

The normalize_empty_node_labels function provides fallback text for nodes with empty labels, which Mermaid 11 rejects.

Function : normalize_empty_node_labels(diagram_text: str) -> str

Pattern Matched : Empty quoted labels: NodeId[""]

Transformation Applied :

  • Use node ID as fallback label text
  • Replace underscores and hyphens with spaces
  • Preserve original node ID for connections

Before → After:

  • Dead[""] → Dead["Dead"]
  • User_Profile[""] → User_Profile["User Profile"]
  • API-Gateway[""] → API-Gateway["API Gateway"]

Implementation Details :

  • Regex pattern: (\b[A-Za-z0-9_]+)\[""\]
  • Converts underscores/hyphens to spaces for readable label: re.sub(r'[_\-]+', ' ', node_id)
  • Falls back to raw node_id if cleaned version is empty
  • Applied to all diagram types (not limited to flowcharts)

Sources: python/deepwiki-scraper.py:330-341 python/tests/test_mermaid_normalization.py:19-23
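
A compact sketch of the fallback-label logic, built directly from the regex and re.sub call quoted above; the function name fill_empty_labels_sketch is hypothetical. Note that the documented character class excludes hyphens, so in this sketch a hyphenated ID is captured only from its last hyphen-free segment; the real implementation may handle that case differently.

```python
import re

EMPTY_LABEL = re.compile(r'(\b[A-Za-z0-9_]+)\[""\]')


def fill_empty_labels_sketch(diagram_text: str) -> str:
    """Give nodes with empty labels a readable fallback label (sketch)."""

    def fallback(match: re.Match) -> str:
        node_id = match.group(1)
        # Underscores/hyphens become spaces; fall back to the raw ID if nothing is left.
        label = re.sub(r"[_\-]+", " ", node_id).strip() or node_id
        return f'{node_id}["{label}"]'

    # No diagram-type check: this pass applies to every diagram type.
    return EMPTY_LABEL.sub(fallback, diagram_text)
```

The node ID itself is left untouched, so existing connections that reference it keep working.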

Step 6: Gantt Diagram Normalization

The normalize_gantt_diagram function assigns synthetic task identifiers to gantt chart tasks that are missing them, which is required by Mermaid 11.

Function : normalize_gantt_diagram(diagram_text: str) -> str

Pattern Matched : Task lines in format "Task Name" : start, end[, duration]

Transformation Applied :

  • Insert synthetic task ID (task1, task2, etc.) after colon
  • Only apply to tasks lacking valid identifiers
  • Preserve tasks that already have IDs or use after dependencies

Before → After:

  • "Design" : 2024-01-01, 2024-01-10 → "Design" : task1, 2024-01-01, 2024-01-10
  • "Code" : myTask, 2024-01-11, 5d → "Code" : myTask, 2024-01-11, 5d (unchanged)
  • "Test" : after task1, 3d → "Test" : after task1, 3d (unchanged)

Implementation Details :

  • Only processes diagrams starting with gantt keyword
  • Task line regex: ^(\s*"[^"]+"\s*):\s*(.+)$
  • Splits remainder on commas (max 3 parts)
  • Checks if first token matches ^[A-Za-z_][\w-]*$ or starts with after
  • Maintains counter (task_counter) for generating unique IDs
  • Reconstructs line: "{task_name}" : {task_id}, {start}, {end}[, {duration}]

Sources: python/deepwiki-scraper.py:343-383
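
The following sketch strings the documented regexes together to show how synthetic IDs might be assigned. It is an illustration only: add_gantt_task_ids_sketch is a hypothetical name, and details such as how the remaining fields are re-joined are assumptions.

```python
import re

TASK_LINE = re.compile(r'^(\s*"[^"]+"\s*):\s*(.+)$')
TASK_ID = re.compile(r"^[A-Za-z_][\w-]*$")


def add_gantt_task_ids_sketch(diagram_text: str) -> str:
    """Insert task1, task2, ... for gantt tasks that lack an identifier (sketch)."""
    if not diagram_text.lstrip().lower().startswith("gantt"):
        return diagram_text

    task_counter = 0
    out = []
    for line in diagram_text.split("\n"):
        match = TASK_LINE.match(line)
        if not match:
            out.append(line)  # title, section, dateFormat, etc.
            continue
        name = match.group(1).rstrip()
        parts = [part.strip() for part in match.group(2).split(",", 2)]  # at most 3 parts
        first = parts[0]
        # Keep tasks that already start with an ID or an "after ..." dependency.
        if TASK_ID.match(first) or first.startswith("after"):
            out.append(line)
            continue
        task_counter += 1
        out.append(f"{name} : task{task_counter}, {', '.join(parts)}")
    return "\n".join(out)
```

Because a line that already begins with task1 matches the identifier pattern, running the sketch a second time leaves its own output unchanged.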

Step 7: Additional Preprocessing

Before the six main normalization passes run, diagrams undergo additional preprocessing in the extraction phase:

Quote Stripping : strip_wrapping_quotes(diagram_text: str) -> str

  • Removes unnecessary quotation marks around edge labels: |"text"| → |text|
  • Removes quotes in state transitions: : "label" → : label

Label Merging : merge_multiline_labels(diagram_text: str) -> str

  • Collapses wrapped labels inside node shapes into \n sequences
  • Handles multiple shape types: (), [], {}, (()), [[]], {{}}
  • Skips lines containing structural tokens (arrows, keywords)
  • Applied before unescaping, so works with both real and escaped newlines

Sources: python/deepwiki-scraper.py:907-1023
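
The two quote-stripping rules can be approximated with a pair of substitutions. This is a guess at the behavior from the bullets above, not the actual strip_wrapping_quotes source; the name strip_wrapping_quotes_sketch and both patterns are assumptions.

```python
import re


def strip_wrapping_quotes_sketch(diagram_text: str) -> str:
    """Remove quotes that wrap edge labels and transition labels (sketch)."""
    # |"text"| becomes |text|
    text = re.sub(r'\|"([^"|]*)"\|', r"|\1|", diagram_text)
    # A quoted label at the end of a line after a colon: ': "label"' becomes ': label'
    text = re.sub(r':[ \t]*"([^"]*)"[ \t]*$', r": \1", text, flags=re.MULTILINE)
    return text
```

Quoted node labels such as A["Text"] are not affected, since neither pattern matches a quote that directly follows an opening bracket.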

Main Orchestrator Function

The normalize_mermaid_diagram function orchestrates all normalization passes in the correct order.

Function Signature : normalize_mermaid_diagram(diagram_text: str) -> str

Implementation :

Key Characteristics :

  • Each pass is idempotent and can be safely applied multiple times
  • Passes are implemented as independent functions, but the overall result depends on the order in which they run
  • Edge label normalization must precede statement separator insertion
  • Flowchart node normalization includes its own statement separator logic
  • Empty label normalization should occur after other node transformations

Sources: python/deepwiki-scraper.py:385-393
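
The orchestration itself amounts to function composition. The sketch below is generic rather than a copy of normalize_mermaid_diagram: it takes the list of passes as a parameter, and run_passes, is_idempotent, and the two toy passes are hypothetical names used only to show sequencing and the idempotence property.

```python
import re
from typing import Callable, Iterable

NormalizationPass = Callable[[str], str]


def run_passes(diagram_text: str, passes: Iterable[NormalizationPass]) -> str:
    """Feed the output of each pass into the next one, in order."""
    for normalization_pass in passes:
        diagram_text = normalization_pass(diagram_text)
    return diagram_text


def is_idempotent(normalization_pass: NormalizationPass, sample: str) -> bool:
    """Check that applying a pass twice gives the same result as applying it once."""
    once = normalization_pass(sample)
    return normalization_pass(once) == once


# Toy stand-ins for the real normalizers (hypothetical).
def collapse_spaces(text: str) -> str:
    return re.sub(r"[ \t]{2,}", " ", text)


def strip_trailing_whitespace(text: str) -> str:
    return "\n".join(line.rstrip() for line in text.split("\n"))
```

A helper like is_idempotent is one way a test suite could check the first characteristic listed above for each pass on representative samples.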

Normalization Invocation Points

The normalization pipeline is invoked at a single location during diagram processing:

Invocation Context Diagram

graph TD
    Extract["extract_and_enhance_diagrams()"]
    Loop["For each diagram match"]
    Unescape["Unescape sequences\n(\\\n, \\ , \<, etc)"]
    Preprocess["merge_multiline_labels()\nstrip_wrapping_quotes()"]
    Normalize["normalize_mermaid_diagram()"]
    Context["Extract context\n(heading, anchor text)"]
    Pool["Add to diagram_contexts list"]
    Extract --> Loop
    Loop --> Unescape
    Unescape --> Preprocess
    Preprocess --> Normalize
    Normalize --> Context
    Context --> Pool

Sources: python/deepwiki-scraper.py:880-1089 python/deepwiki-scraper.py:1058-1060

Testing and Validation

The normalization pipeline has dedicated unit tests covering each normalization function:

Test Coverage :

| Function | Test File | Test Cases |
|---|---|---|
| normalize_statement_separators | test_mermaid_normalization.py | Newline insertion, indentation preservation |
| normalize_empty_node_labels | test_mermaid_normalization.py | Empty label replacement |
| normalize_flowchart_nodes | test_mermaid_normalization.py | Pipe character stripping |
| normalize_mermaid_diagram | test_mermaid_normalization.py | End-to-end pipeline test |

Example Test Case (Statement Separator Normalization):

End-to-End Test : The end-to-end test validates that multiple normalization steps work together correctly:

  • Input: graph TD\n Stage1[""] --> Stage2["Stage 2"]\n Stage2 --> Stage3 Stage3 --> Stage4
  • Validates empty label replacement: Stage1["Stage1"]
  • Validates statement separation: Stage2 --> Stage3 and Stage3 --> Stage4 on separate lines

Sources: python/tests/test_mermaid_normalization.py:1-42

Common Edge Cases

The normalization pipeline handles several edge cases that commonly occur in DeepWiki diagrams:

Empty Diagram Handling :

  • All normalizers check for empty/whitespace-only input
  • Return original text unchanged if stripped content is empty

Diagram Type Detection :

  • Most normalizers check the diagram type via the first-line keyword
  • normalize_mermaid_edge_labels, normalize_flowchart_nodes, and normalize_statement_separators: only process graph or flowchart
  • normalize_mermaid_state_descriptions: only processes statediagram
  • normalize_gantt_diagram: only processes gantt
  • normalize_empty_node_labels: applies to all diagram types

Indentation Preservation :

  • Statement separator normalization preserves original indentation level
  • Converts tabs to 4 spaces for consistent formatting
  • Inserts newlines with matching indentation

Backtick Escaping in Fence Blocks : When injecting normalized diagrams into markdown, the injection logic dynamically calculates fence length to avoid conflicts with backticks inside diagram code:

  • Scans diagram for longest backtick run
  • Uses max(3, max_backticks + 1) as fence length

Sources: python/deepwiki-scraper.py:1249-1255
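
The fence-length rule above is small enough to show in full. The sketch assumes the diagram is wrapped in a backtick fence with a mermaid info string; fence_for and wrap_mermaid are hypothetical names, not functions from the scraper.

```python
import re


def fence_for(diagram_text: str) -> str:
    """Return a backtick fence longer than any backtick run inside the diagram."""
    runs = re.findall(r"`+", diagram_text)
    max_backticks = max((len(run) for run in runs), default=0)
    return "`" * max(3, max_backticks + 1)


def wrap_mermaid(diagram_text: str) -> str:
    """Wrap a diagram in a fence that its own content cannot close early."""
    fence = fence_for(diagram_text)
    return f"{fence}mermaid\n{diagram_text}\n{fence}"
```

A diagram without backticks gets the usual three-backtick fence, while a diagram containing a run of four backticks is wrapped in a five-backtick fence.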
