Conversion options
All options are set on Builder before calling .build().
Preset
Builder::with_preset(preset) configures a coherent set of defaults:
Individual options below override the preset.
Segmentation strategy
Lattice finds the globally optimal segmentation using dynamic programming.
Eager is a greedy left-to-right longest-match strategy; it is faster but less
accurate for compound words.
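The difference between the two strategies can be illustrated with a toy dictionary. The following is a minimal sketch, not Gukhanmun's implementation: the function names `eager` and `lattice`, the `max_len` parameter (assumed to be at least 1), and the cost model (1 per dictionary word, 10 per unknown character) are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Greedy left-to-right longest match (hypothetical helper).
/// A character not covered by any dictionary entry becomes its own segment.
fn eager(text: &[char], dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < text.len() {
        let mut take = 1;
        // Try the longest candidate first and stop at the first hit.
        for len in (1..=max_len.min(text.len() - i)).rev() {
            let cand: String = text[i..i + len].iter().collect();
            if dict.contains(cand.as_str()) {
                take = len;
                break;
            }
        }
        out.push(text[i..i + take].iter().collect());
        i += take;
    }
    out
}

/// Dynamic programming over all segmentations (hypothetical helper).
/// The costs are illustrative assumptions, not the library's scoring.
fn lattice(text: &[char], dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    const WORD_COST: u32 = 1;
    const UNKNOWN_COST: u32 = 10;
    let n = text.len();
    // best[i] = (cheapest cost of segmenting text[..i], start of its last segment)
    let mut best: Vec<(u32, usize)> = vec![(u32::MAX, 0); n + 1];
    best[0] = (0, 0);
    for end in 1..=n {
        for len in 1..=max_len.min(end) {
            let start = end - len;
            let cand: String = text[start..end].iter().collect();
            let cost = if dict.contains(cand.as_str()) {
                WORD_COST
            } else if len == 1 {
                UNKNOWN_COST
            } else {
                continue; // multi-character spans must be dictionary words
            };
            let total = best[start].0 + cost;
            if total < best[end].0 {
                best[end] = (total, start);
            }
        }
    }
    // Walk the back-pointers from the end of the text to recover the segments.
    let mut segments = Vec::new();
    let mut end = n;
    while end > 0 {
        let start = best[end].1;
        segments.push(text[start..end].iter().collect::<String>());
        end = start;
    }
    segments.reverse();
    segments
}
```

With a dictionary containing 大學生, 大學, and 生活, the input 大學生活 is split by `eager` into 大學生 and a stranded 活, while `lattice` recovers 大學 and 生活 because that path has the lower total cost.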
Numeral handling
NumeralStrategy controls how hanja numeral characters such as 二〇一六 are
rendered. Chinese-style numerals can represent numbers in positional or
additive notation depending on context:
Smart chooses positional notation for year-like four-digit sequences and
additive notation for quantities; use it for general-purpose documents.
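To make the two notations concrete, here is a small sketch of how the same numeral characters can be read positionally or additively, plus a guess at a Smart-like heuristic. Everything in it is an assumption for illustration: the function names (`positional`, `additive`, `smart`), the rule "exactly four digit characters means positional", and the choice to emit Arabic digits, which may differ from how the library actually renders numerals. The additive parser handles only 十, 百, and 千 and does not validate unit order.

```rust
/// Value of a hanja digit character, if it is one.
fn digit(c: char) -> Option<u64> {
    match c {
        '〇' | '零' => Some(0),
        '一' => Some(1),
        '二' => Some(2),
        '三' => Some(3),
        '四' => Some(4),
        '五' => Some(5),
        '六' => Some(6),
        '七' => Some(7),
        '八' => Some(8),
        '九' => Some(9),
        _ => None,
    }
}

/// Value of a multiplier character (only 十, 百, 千 in this sketch).
fn unit(c: char) -> Option<u64> {
    match c {
        '十' => Some(10),
        '百' => Some(100),
        '千' => Some(1000),
        _ => None,
    }
}

/// Positional reading: every character is one decimal digit.
fn positional(s: &str) -> Option<String> {
    s.chars()
        .map(|c| digit(c).and_then(|d| char::from_digit(d as u32, 10)))
        .collect()
}

/// Additive reading: digits are multiplied by the unit that follows them.
fn additive(s: &str) -> Option<u64> {
    let mut total: u64 = 0;
    let mut pending: Option<u64> = None; // digit waiting for its unit
    for c in s.chars() {
        if let Some(d) = digit(c) {
            if pending.is_some() {
                return None; // two digits in a row is not additive notation
            }
            pending = Some(d);
        } else if let Some(u) = unit(c) {
            // A bare unit such as 十 counts as 1 × 10.
            total += pending.take().unwrap_or(1) * u;
        } else {
            return None;
        }
    }
    Some(total + pending.unwrap_or(0))
}

/// Assumed Smart-like heuristic: four plain digits look like a year.
fn smart(s: &str) -> Option<String> {
    let all_digits = s.chars().all(|c| digit(c).is_some());
    if all_digits && s.chars().count() == 4 {
        positional(s)
    } else {
        additive(s).map(|n| n.to_string())
    }
}
```

Under these assumptions, 二〇一六 is read positionally and 二千十六 additively, and both come out as 2016.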
Initial sound law
Applies the South Korean phonetic rule (頭音法則) to fallback readings for characters not found in any dictionary:
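As an illustration of the rule itself, the sketch below applies the word-initial substitutions (ㄴ or ㄹ becomes ㅇ before ㅣ and y-vowels; ㄹ becomes ㄴ elsewhere) to a hangul reading by decomposing the precomposed syllable arithmetically. The function names `initial_sound_law` and `apply_to_word` are hypothetical, and the sketch ignores the rule's exceptions; it is not the library's code.

```rust
// Unicode precomposed hangul syllables: S_BASE + (initial * 21 + medial) * 28 + final.
const S_BASE: u32 = 0xAC00;
const S_LAST: u32 = 0xD7A3;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;

// Initial consonant indices.
const NIEUN: u32 = 2; // ㄴ
const RIEUL: u32 = 5; // ㄹ
const IEUNG: u32 = 11; // ㅇ

// Medial vowel indices of ㅑ ㅒ ㅕ ㅖ ㅛ ㅠ ㅣ.
const Y_OR_I: [u32; 7] = [2, 3, 6, 7, 12, 17, 20];

/// Applies the initial sound law to one syllable (hypothetical helper).
/// Non-hangul characters are returned unchanged.
fn initial_sound_law(syllable: char) -> char {
    let cp = syllable as u32;
    if !(S_BASE..=S_LAST).contains(&cp) {
        return syllable;
    }
    let idx = cp - S_BASE;
    let initial = idx / (V_COUNT * T_COUNT);
    let medial = (idx / T_COUNT) % V_COUNT;
    let fin = idx % T_COUNT;
    let new_initial = match initial {
        NIEUN if Y_OR_I.contains(&medial) => IEUNG, // 녀 -> 여
        RIEUL if Y_OR_I.contains(&medial) => IEUNG, // 력 -> 역
        RIEUL => NIEUN,                             // 로 -> 노
        other => other,
    };
    char::from_u32(S_BASE + (new_initial * V_COUNT + medial) * T_COUNT + fin)
        .unwrap_or(syllable)
}

/// Applies the law to the first syllable of a word reading only,
/// since the rule is word-initial.
fn apply_to_word(reading: &str) -> String {
    let mut chars = reading.chars();
    match chars.next() {
        Some(first) => std::iter::once(initial_sound_law(first)).chain(chars).collect(),
        None => String::new(),
    }
}
```

For example, the character-by-character reading 로동 (勞動) becomes 노동, and 력사 (歷史) becomes 역사, while syllables after the first are left alone.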
Homophone disambiguation window
When the same hanja character appears multiple times, Gukhanmun can mark
repeated occurrences to help readers distinguish homophones.
homophone_window sets the scope across which repetitions are tracked:
Wider windows are appropriate for dense hanja texts where the same character recurs across many sections.
First-occurrence clearing window
When enabled, first-occurrence clearing stops annotating a hanja after its first occurrence within the window. This is useful for documents that introduce each character once and then use it freely; subsequent occurrences are left as plain hangul without parenthetical hanja.
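The behaviour described above can be sketched as a per-window "seen" set. This is only a model under stated assumptions: words are represented as hypothetical (hangul, hanja) pairs, a window is simply a slice of such pairs, annotations are rendered as `hangul(hanja)`, and the names `render_window` and `render_document` are invented for the example.

```rust
use std::collections::HashSet;

/// Renders one window: the first occurrence of each hanja word keeps its
/// parenthetical annotation, later occurrences are plain hangul.
fn render_window(words: &[(&str, &str)]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    words
        .iter()
        .map(|&(hangul, hanja)| {
            // `insert` returns true only the first time a value is added.
            if seen.insert(hanja) {
                format!("{}({})", hangul, hanja)
            } else {
                hangul.to_string()
            }
        })
        .collect()
}

/// Renders several windows; the seen set starts empty in each one,
/// so a word is annotated again when a new window begins.
fn render_document(windows: &[Vec<(&str, &str)>]) -> Vec<Vec<String>> {
    windows.iter().map(|w| render_window(w)).collect()
}
```

Keying the set on the hanja form rather than the hangul reading means two different hanja words that share a reading are each annotated once.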
Error recovery
This setting is relevant for HTML conversion; plain text and Markdown conversion do not produce recoverable errors.