Conversion options

All options are set on Builder before calling .build().

Preset

Builder::with_preset(preset) configures a coherent set of defaults:

| Preset | Dictionary | Initial sound law | Homophone window |
|---|---|---|---|
| Preset::KoKr | Bundled stdict | true | ContextWindow::PerBlock |
| Preset::KoKp | None | false | ContextWindow::Off |

Individual options below override the preset.

Segmentation strategy

use gukhanmun::SegmentationStrategy;

builder.segmentation(SegmentationStrategy::Lattice);  // default
builder.segmentation(SegmentationStrategy::Eager);

Lattice finds the globally optimal segmentation using dynamic programming. Eager is a greedy, left-to-right longest-match strategy; it is faster but less accurate for compound words, because an early long match can leave a remainder that no dictionary entry covers.
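The difference between the two strategies can be illustrated with a toy example. The sketch below is not Gukhanmun's implementation: the dictionary, the cost values, and the function names `eager` and `lattice` are all assumptions made for illustration. It only shows how greedy longest-match can lock in a poor split that a dynamic-programming search avoids.

```rust
use std::collections::HashSet;

// Hypothetical costs: a dictionary word is cheap, an unknown single character is expensive.
const WORD_COST: u32 = 1;
const UNKNOWN_COST: u32 = 10;

/// Greedy left-to-right longest match; falls back to a single character.
fn eager(chars: &[char], dict: &HashSet<&str>) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let mut end = i + 1; // fallback: one unknown character
        for j in (i + 1..=chars.len()).rev() {
            let cand: String = chars[i..j].iter().collect();
            if dict.contains(cand.as_str()) {
                end = j;
                break;
            }
        }
        out.push(chars[i..end].iter().collect());
        i = end;
    }
    out
}

/// Dynamic programming over all split points; minimises total cost.
fn lattice(chars: &[char], dict: &HashSet<&str>) -> Vec<String> {
    let n = chars.len();
    // best[j] = (minimum cost of segmenting chars[..j], start index of the last token)
    let mut best = vec![(u32::MAX, 0usize); n + 1];
    best[0] = (0, 0);
    for j in 1..=n {
        for i in 0..j {
            let cand: String = chars[i..j].iter().collect();
            let cost = if dict.contains(cand.as_str()) {
                WORD_COST
            } else if j - i == 1 {
                UNKNOWN_COST
            } else {
                continue; // multi-character spans must be dictionary words
            };
            let total = best[i].0 + cost;
            if total < best[j].0 {
                best[j] = (total, i);
            }
        }
    }
    // Walk the back-pointers from the end to recover the tokens.
    let mut tokens = Vec::new();
    let mut j = n;
    while j > 0 {
        let i = best[j].1;
        tokens.push(chars[i..j].iter().collect::<String>());
        j = i;
    }
    tokens.reverse();
    tokens
}

fn main() {
    let dict = HashSet::from(["abc", "ab", "cde"]);
    let chars: Vec<char> = "abcde".chars().collect();
    println!("eager:   {:?}", eager(&chars, &dict));   // ["abc", "d", "e"]
    println!("lattice: {:?}", lattice(&chars, &dict)); // ["ab", "cde"]
}
```

In this toy input the greedy pass commits to "abc" and is left with two unknown characters, while the lattice search pays for a shorter first word to cover the whole input with dictionary entries.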

Numeral handling

NumeralStrategy controls how hanja numerals are rendered. Such numerals are written either in positional notation, one digit per character (二〇一六 is 2016), or in additive notation, with explicit multipliers such as 十, 百, and 千 (十一 is 11; 一千二百三十四 is 1234):

| Variant | 二〇一六年 | 十一月 | 一千二百三十四 |
|---|---|---|---|
| HangulPhonetic | 이공일륙년 | 십일월 | 일천이백삼십사 |
| PositionalArabic | 2016년 | | |
| AdditiveArabic | | 11월 | 1234 |
| Smart | 2016년 | 11월 | 1234 |

use gukhanmun::NumeralStrategy;

builder.numerals(NumeralStrategy::HangulPhonetic);   // default: 이공일륙
builder.numerals(NumeralStrategy::PositionalArabic); // 2016 (year-like)
builder.numerals(NumeralStrategy::AdditiveArabic);   // 11 (additive)
builder.numerals(NumeralStrategy::Smart);            // picks best per context

Smart chooses positional notation for year-like four-digit sequences and additive notation for quantities; use it for general-purpose documents.
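To make the two notations concrete, here is a minimal, self-contained sketch of how each could be parsed. It does not use or reproduce Gukhanmun's code: the function names `positional`, `additive`, and `smart` are hypothetical, only values below 10,000 are handled, and the `smart` heuristic (digit-only runs are positional, anything with a multiplier is additive) is a simplifying assumption rather than the library's actual rule.

```rust
/// Maps a hanja digit (〇 to 九) to its value.
fn digit(c: char) -> Option<u64> {
    "〇一二三四五六七八九".chars().position(|d| d == c).map(|i| i as u64)
}

/// Maps a multiplier character to its value (this sketch stops at 千).
fn unit(c: char) -> Option<u64> {
    match c {
        '十' => Some(10),
        '百' => Some(100),
        '千' => Some(1000),
        _ => None,
    }
}

/// Positional: each character is one decimal digit, e.g. 二〇一六 -> 2016.
fn positional(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    s.chars()
        .try_fold(0u64, |acc, c| acc.checked_mul(10)?.checked_add(digit(c)?))
}

/// Additive: digits scale the multiplier that follows, e.g. 一千二百三十四 -> 1234.
fn additive(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    let mut total = 0u64;
    let mut pending: Option<u64> = None;
    for c in s.chars() {
        if let Some(d) = digit(c) {
            if pending.is_some() {
                return None; // two digits in a row is not additive notation
            }
            pending = Some(d);
        } else if let Some(u) = unit(c) {
            // A bare multiplier such as 十 counts as 1 x 10.
            total += pending.take().unwrap_or(1) * u;
        } else {
            return None;
        }
    }
    Some(total + pending.unwrap_or(0))
}

/// Assumed heuristic: digit-only runs are positional, everything else additive.
fn smart(s: &str) -> Option<u64> {
    if s.chars().all(|c| digit(c).is_some()) {
        positional(s)
    } else {
        additive(s)
    }
}

fn main() {
    println!("{:?}", smart("二〇一六"));       // Some(2016)
    println!("{:?}", smart("十一"));           // Some(11)
    println!("{:?}", smart("一千二百三十四")); // Some(1234)
}
```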

Initial sound law

builder.initial_sound_law(true);   // enabled (Preset::KoKr default)
builder.initial_sound_law(false);  // disabled (Preset::KoKp default)

When enabled, applies the South Korean initial sound law (頭音法則) to fallback readings for characters not found in any dictionary:

| Input | Law enabled (KoKr) | Law disabled (KoKp) |
|---|---|---|
| 來日 | 내일 | 래일 |
| 理由 | 이유 | 리유 |
| 女子 | 여자 | 녀자 |
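The readings in the table can be reproduced with a short sketch of the word-initial rule. This is an illustration under stated assumptions, not the library's code: the function name `apply_initial_sound_law` is hypothetical, only the first syllable of a word is rewritten, and known exceptions to the rule are ignored. It relies on the standard Unicode arithmetic for decomposing precomposed Hangul syllables.

```rust
// Indices of initial consonants in Unicode's Hangul syllable layout.
const NIEUN: u32 = 2; // ㄴ
const RIEUL: u32 = 5; // ㄹ
const IEUNG: u32 = 11; // ㅇ

// Medial vowel indices before which the initial drops to ㅇ.
const RIEUL_TO_IEUNG: [u32; 6] = [2, 6, 7, 12, 17, 20]; // ㅑ ㅕ ㅖ ㅛ ㅠ ㅣ
const NIEUN_TO_IEUNG: [u32; 4] = [6, 12, 17, 20]; // ㅕ ㅛ ㅠ ㅣ

/// Rewrites the first syllable of `word` according to the initial sound law.
fn apply_initial_sound_law(word: &str) -> String {
    let mut chars = word.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return String::new(),
    };
    let code = first as u32;
    // Leave anything that is not a precomposed Hangul syllable untouched.
    if !(0xAC00..=0xD7A3).contains(&code) {
        return word.to_string();
    }
    // syllable = 0xAC00 + (initial * 21 + medial) * 28 + final
    let idx = code - 0xAC00;
    let (initial, medial, fin) = (idx / 588, (idx % 588) / 28, idx % 28);
    let new_initial = match initial {
        RIEUL if RIEUL_TO_IEUNG.contains(&medial) => IEUNG, // 리 -> 이
        RIEUL => NIEUN,                                     // 래 -> 내
        NIEUN if NIEUN_TO_IEUNG.contains(&medial) => IEUNG, // 녀 -> 여
        other => other,
    };
    let new_code = 0xAC00 + (new_initial * 21 + medial) * 28 + fin;
    let mut out = String::new();
    out.push(char::from_u32(new_code).unwrap_or(first));
    out.push_str(chars.as_str()); // the rest of the word is unchanged
    out
}

fn main() {
    for word in ["래일", "리유", "녀자"] {
        println!("{} -> {}", word, apply_initial_sound_law(word));
    }
}
```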

Homophone disambiguation window

When the same hanja character appears multiple times, Gukhanmun can mark repeated occurrences to help readers distinguish homophones. homophone_window sets the scope across which repetitions are tracked:

| Value | Behaviour |
|---|---|
| ContextWindow::Off | No disambiguation tracking |
| ContextWindow::PerBlock (KoKr default) | Reset at paragraph, list, and heading boundaries |
| ContextWindow::PerSection | Reset at heading boundaries only |
| ContextWindow::PerDocument | Track across the entire input |

use gukhanmun::ContextWindow;

builder.homophone_window(ContextWindow::Off);
builder.homophone_window(ContextWindow::PerBlock);    // default for KoKr
builder.homophone_window(ContextWindow::PerSection);
builder.homophone_window(ContextWindow::PerDocument);

Wider windows are appropriate for dense hanja texts where the same character recurs across many sections.
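The reset behaviour of each window can be modelled with a small tracker. The sketch below is a rough illustration only: the `Window` and `Event` enums and the `repeats` function are hypothetical stand-ins, not Gukhanmun types, and the input is reduced to a flat stream of boundary and character events.

```rust
use std::collections::HashSet;

// Hypothetical mirror of the four window scopes described above.
#[derive(Clone, Copy, PartialEq)]
enum Window {
    Off,
    PerBlock,
    PerSection,
    PerDocument,
}

// Simplified input: structural boundaries and hanja characters.
enum Event {
    Heading,
    BlockBreak, // paragraph or list boundary
    Hanja(char),
}

/// For each hanja event, reports whether the character was already seen
/// inside the current window.
fn repeats(events: &[Event], window: Window) -> Vec<bool> {
    let mut seen: HashSet<char> = HashSet::new();
    let mut out = Vec::new();
    for ev in events {
        match ev {
            // Headings reset both the per-block and the per-section scope.
            Event::Heading => {
                if matches!(window, Window::PerBlock | Window::PerSection) {
                    seen.clear();
                }
            }
            // Paragraph and list boundaries reset only the per-block scope.
            Event::BlockBreak => {
                if window == Window::PerBlock {
                    seen.clear();
                }
            }
            Event::Hanja(c) => {
                if window == Window::Off {
                    out.push(false); // tracking disabled
                } else {
                    out.push(!seen.insert(*c)); // insert returns false on a repeat
                }
            }
        }
    }
    out
}

fn main() {
    let events = [
        Event::Hanja('理'),
        Event::BlockBreak,
        Event::Hanja('理'),
        Event::Heading,
        Event::Hanja('理'),
    ];
    println!("{:?}", repeats(&events, Window::PerBlock));    // [false, false, false]
    println!("{:?}", repeats(&events, Window::PerSection));  // [false, true, false]
    println!("{:?}", repeats(&events, Window::PerDocument)); // [false, true, true]
}
```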

First-occurrence clearing window

When enabled, first-occurrence clearing stops annotating a hanja after its first occurrence within the window. This is useful for documents that introduce each character once and then use it freely; subsequent occurrences are left as plain hangul without parenthetical hanja.

| Value | Behaviour |
|---|---|
| ContextWindow::Off (default) | Never clear; annotate every occurrence |
| ContextWindow::PerBlock | Clear within the same paragraph/block |
| ContextWindow::PerSection | Clear within the same section |
| ContextWindow::PerDocument | Clear across the entire document |

use gukhanmun::ContextWindow;

builder.first_occurrence_window(ContextWindow::Off);        // default
builder.first_occurrence_window(ContextWindow::PerBlock);
builder.first_occurrence_window(ContextWindow::PerSection);
builder.first_occurrence_window(ContextWindow::PerDocument);
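
The effect on output can be sketched as follows. This is an illustrative model rather than the library's behaviour: the `Word` struct and `render` function are hypothetical, the `hangul(hanja)` annotation format is assumed from the description above, tracking is done per word rather than per character, and a single document-wide window stands in for the configurable scope.

```rust
use std::collections::HashSet;

/// One converted word: the original hanja and its hangul reading.
struct Word {
    hanja: &'static str,
    hangul: &'static str,
}

/// Renders words as `hangul(hanja)`. When `clear_after_first` is set, only the
/// first occurrence of each hanja word keeps its parenthetical annotation.
fn render(words: &[Word], clear_after_first: bool) -> String {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut parts = Vec::new();
    for w in words {
        let first = seen.insert(w.hanja); // true the first time a word is seen
        if first || !clear_after_first {
            parts.push(format!("{}({})", w.hangul, w.hanja));
        } else {
            parts.push(w.hangul.to_string());
        }
    }
    parts.join(" ")
}

fn main() {
    let words = [
        Word { hanja: "理由", hangul: "이유" },
        Word { hanja: "來日", hangul: "내일" },
        Word { hanja: "理由", hangul: "이유" },
    ];
    println!("{}", render(&words, false)); // 이유(理由) 내일(來日) 이유(理由)
    println!("{}", render(&words, true));  // 이유(理由) 내일(來日) 이유
}
```

A narrower window would amount to emptying the `seen` set at each block or section boundary.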

Error recovery

use gukhanmun::Recovery;

builder.recovery(Recovery::Strict);   // default: abort on error
builder.recovery(Recovery::Lenient);  // skip problematic fragments

This setting is relevant for HTML conversion; plain text and Markdown do not produce recoverable errors.