odf-kit ships sensible defaults for every internal stage that consumes external input. Users who need different behavior — a specific parser for compliance, a custom normalization scheme — substitute their own implementations through documented hooks. This document explains the architecture and the conventions that make substitution and adapter authoring predictable.
odf-kit core declares zero runtime dependencies and ships defaults that work correctly for input from modern toolchains. The substitution architecture is an opt-in escape hatch, not the recommended path. Most users never touch it. Users who do — typically because of compliance requirements or unusual input shapes — have a stable, documented contract to substitute against.
The architecture is also forward-compatible with sibling packages
(odf-kit-parse5, odf-kit-classic, etc.) that bundle a specific adapter
configuration as a one-install replacement for odf-kit.
| Stage | Option name | Contract type | Default implementation |
|---|---|---|---|
| HTML normalization | `normalizer` | `Normalizer` (string → string) | `odfKitNormalizer` |
| HTML/XML parsing | `parser` | `Parser` (string → `ParsedHtmlTree`) | `odfKitParser` |
Future stages will be added to this table as substitution hooks are introduced. The naming and structural conventions below apply uniformly.
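As a minimal sketch of what substituting a stage might look like, the block below restates the `Normalizer` contract from the table and defines a custom normalizer against it. The `selfCloseVoidElements` function and its tag list are illustrative assumptions, not part of odf-kit; only the string → string shape comes from the table.

```typescript
// Contract shape taken from the stage table: a normalizer maps a string to a string.
type Normalizer = (html: string) => string;

// Hypothetical tag list for illustration; a real normalizer would cover more cases.
const VOID_TAGS = ["br", "hr", "img"];

// Hypothetical custom normalizer (not an odf-kit export): rewrites a few HTML
// void elements such as <br> into XML-style self-closing form (<br/>).
const selfCloseVoidElements: Normalizer = (html) => {
  const pattern = new RegExp(`<(${VOID_TAGS.join("|")})\\b([^>]*?)\\s*/?>`, "gi");
  return html.replace(
    pattern,
    (_match, tag: string, attrs: string) => `<${tag}${attrs}/>`,
  );
};
```

Following the option names in the table, such a function would presumably be passed as `htmlToOdt(html, { normalizer: selfCloseVoidElements })`. The regular expression is deliberately simple and does not handle a `>` character inside attribute values.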
Some stages can be skipped entirely by passing `false` as the option value.
Others can be substituted but not skipped. The rule is straightforward: a
stage can be skipped if and only if the input it would have received already
has the shape the next stage expects. In practice this means the stage's
input and output shapes are the same.
| Stage | Input | Output | Skippable? |
|---|---|---|---|
| Normalizer | string | string | ✅ Yes — pass normalizer: false |
| Parser | string | tree | ❌ No — the walker needs a tree, not a string |
Skipping the normalizer is meaningful when the user knows their input is already polyglot/XHTML and Tier 1 normalization would be a no-op. The next stage (the parser) still gets a string, so the chain proceeds.
Skipping the parser would leave the next stage (the walker) with a string
instead of a tree, and there is no coherent way to proceed. The parser can
therefore be substituted or left at its default, but never disabled; the type
system enforces this by declaring `parser?: Parser` (without `| false`) on
`HtmlToOdtOptions`.
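The option typing described above can be sketched as follows. This is an illustration under stated assumptions: the node layout of `ParsedHtmlTree` is not specified in this document, so a placeholder shape is used, and `runInputStages`, `StageDefaults`, and `toyDefaults` are hypothetical names rather than odf-kit APIs.

```typescript
type Normalizer = (html: string) => string;
// Placeholder shape (assumption); the real ParsedHtmlTree is not defined here.
interface ParsedHtmlTree { tag: string; text: string }
type Parser = (html: string) => ParsedHtmlTree;

// Mirrors the typing described above: the normalizer may be false, the parser may not.
interface HtmlToOdtOptions {
  normalizer?: Normalizer | false;
  parser?: Parser;
}

interface StageDefaults { normalizer: Normalizer; parser: Parser }

// Hypothetical helper: resolves options against defaults and runs both input stages.
function runInputStages(
  html: string,
  options: HtmlToOdtOptions,
  defaults: StageDefaults,
): ParsedHtmlTree {
  // `??` only falls back on undefined/null, so an explicit false survives.
  const normalizer = options.normalizer ?? defaults.normalizer;
  // false means "skip": the raw string goes straight to the parser.
  const normalized = normalizer === false ? html : normalizer(html);
  const parser = options.parser ?? defaults.parser;
  return parser(normalized);
}

// Toy stand-ins for the defaults, present only so the sketch can run.
const toyDefaults: StageDefaults = {
  normalizer: (html) => html.trim(),
  parser: (html) => ({ tag: "root", text: html }),
};
```

With these toy defaults, passing `{ normalizer: false }` hands the untrimmed string to the parser, while `{}` runs the default normalizer first. Writing `{ parser: false }` is rejected at compile time because `false` is not part of the `parser` option type.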
General rule for future stages: when adding a substitutable stage,
decide whether `false` is a valid option by asking: if this stage is
skipped, does its input still match the shape the next stage expects?
If yes, allow `false`. If no, the option type omits `false` and the type
system enforces the requirement.
Six categories of names need consistent rules. Future substitutable stages reuse these conventions verbatim.
Rule (contract types): Named after the output of the stage, with a prefix
such as `Parsed` or `Normalized` indicating what the function produces.
Examples (current and future):
- `ParsedHtmlTree`: output of any HTML parser
- `NormalizedHtml`: output of any normalizer (string alias)
- `ParsedDocxResult`: output of any DOCX reader (future)
- `ParsedXlsxResult`: output of any XLSX reader (future)
- `ExtractedZipEntries`: output of any ZIP unpacker (future)
The name describes what comes out, which is what implementations must produce.
Rule (option names): Lowercase camelCase, named after the role the substituted function plays. Use the shortest unambiguous name within the option object's context.
Examples:
- `parser`: on `HtmlToOdtOptions`
- `normalizer`: on `HtmlToOdtOptions`
- `docxReader`: on a future `DocxToOdtOptions`
The user thinks "I want to plug in my own parser," not "I want to substitute
a ParsedHtmlTree-producing function."
Rule (default implementations): `odfKit<Role>` for functions, `OdfKit<Role>` for classes.
Examples:
- `odfKitNormalizer`: built-in default normalizer
- `odfKitParser`: built-in default parser
- `odfKitDocxReader`: future built-in default DOCX reader
The explicit prefix marks the function as the package's default and lets users opt back in:
```typescript
htmlToOdt(html, { parser: odfKitParser });    // explicit default
htmlToOdt(html, { parser: someOtherParser }); // substituted
```

Rule (adapter functions): `from<Library>` for output adapters (library output →
odf-kit contract type), `to<Library>` for input adapters (odf-kit input → the
library's expected shape). Both forms use `<Library>` as the source library
name in PascalCase.
Examples:
- `fromParse5`: converts parse5's tree to `ParsedHtmlTree`
- `fromHtmlparser2`: converts htmlparser2's tree to `ParsedHtmlTree`
- `fromDom`: converts a W3C DOM to `ParsedHtmlTree` (covers browser `DOMParser`, `linkedom`, `jsdom`)
- `fromMammoth`: converts mammoth's output to `ParsedDocxResult` (future)
- `fromSheetJS`: converts SheetJS's workbook to `ParsedXlsxResult` (future)
If a single library covers multiple stages, role disambiguation is appended:
fromParse5Html, fromParse5Xml.
The from/to prefixes are mnemonic: "from parse5 to the contract" or "to
parse5 from the contract." The result reads naturally in code:

```typescript
htmlToOdt(html, { parser: fromParse5(parse5.parse) });
```

For odf-kit's currently substitutable stages, only `from<Library>` adapters
apply: both the normalizer and parser stages have universal input shapes
(string in), so input adapters aren't needed. The `to<Library>` form is
reserved for future stages with structured input. See "The Two-Direction
Adapter Principle" below.
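To make the output-adapter idea concrete, here is a minimal sketch of a `from<Library>`-style function. It is not odf-kit's `fromParse5`: the `SketchTree` contract shape is a placeholder assumption (the real `ParsedHtmlTree` layout is not given here), and `Parse5LikeNode` is a structural subset of the node fields that parse5's default tree adapter produces (`nodeName`, `tagName`, `attrs`, `childNodes`, `value`).

```typescript
// Hypothetical contract shape standing in for ParsedHtmlTree.
type SketchNode =
  | { type: "element"; tag: string; attrs: Record<string, string>; children: SketchNode[] }
  | { type: "text"; value: string };
interface SketchTree { children: SketchNode[] }
type Parser = (html: string) => SketchTree;

// Structural subset of a parse5-style node; only the fields the adapter reads.
interface Parse5LikeNode {
  nodeName: string;
  tagName?: string;
  attrs?: { name: string; value: string }[];
  childNodes?: Parse5LikeNode[];
  value?: string;
}

// Converts one library node; returns null for nodes the sketch ignores
// (comments, doctype, and anything else without a tagName).
function convert(node: Parse5LikeNode): SketchNode | null {
  if (node.nodeName === "#text") return { type: "text", value: node.value ?? "" };
  if (node.tagName === undefined) return null;
  const attrs: Record<string, string> = {};
  for (const { name, value } of node.attrs ?? []) attrs[name] = value;
  return {
    type: "element",
    tag: node.tagName,
    attrs,
    children: convertAll(node.childNodes ?? []),
  };
}

function convertAll(nodes: Parse5LikeNode[]): SketchNode[] {
  return nodes.map(convert).filter((n): n is SketchNode => n !== null);
}

// Output adapter: wraps a library parse function and returns a contract-shaped parser.
function fromParse5Sketch(parse: (html: string) => Parse5LikeNode): Parser {
  return (html) => ({ children: convertAll(parse(html).childNodes ?? []) });
}
```

The adapter takes the library's parse function as an argument rather than importing the library itself, which matches the `fromParse5(parse5.parse)` call shape shown above and keeps the adapter free of a hard dependency.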
Rule (adapter files): `src/adapters/<role>/from-<library>.ts` (or `to-<library>.ts`).
Examples:
- `src/adapters/parser/from-parse5.ts`
- `src/adapters/parser/from-htmlparser2.ts`
- `src/adapters/parser/from-dom.ts`
- `src/adapters/normalizer/from-<library>.ts` (future)
- `src/adapters/docx-reader/from-mammoth.ts` (future)
Rule (conformance tests): `tests/conformance/<role>.test.ts`.
Each conformance file exports a runner function that takes an implementation and runs the full battery against it:
```typescript
// tests/conformance/parser.test.ts
export function runParserConformance(parser: Parser, suiteName: string) {
  describe(`${suiteName} — parser conformance`, () => {
    test("parses a single element", () => { /* ... */ });
    test("rejects unclosed tags", () => { /* ... */ });
    // ~30 cases
  });
}

// Run against odf-kit's default
runParserConformance(odfKitParser, "odf-kit default parser");
```

When future adapters are written, they run the same suite:
```typescript
import * as parse5 from "parse5";
import { runParserConformance } from "odf-kit/tests/conformance/parser";
import { fromParse5 } from "./from-parse5.js";

runParserConformance(fromParse5(parse5.parse), "parse5 adapter");
```

Same suite, different implementation. Conformance is mechanical and verifiable.
| Category | Convention | Example |
|---|---|---|
| Contract type | `Parsed<Subject>` or `Normalized<Subject>` | `ParsedHtmlTree`, `NormalizedHtml` |
| Option name | role, lowercase camelCase, shortest unambiguous | `parser`, `normalizer` |
| Default implementation | `odfKit<Role>` | `odfKitParser`, `odfKitNormalizer` |
| Adapter function (output direction) | `from<Library>` | `fromParse5`, `fromHtmlparser2` |
| Adapter function (input direction) | `to<Library>` | (future, when needed) |
| Adapter file | `src/adapters/<role>/from-<library>.ts` or `to-<library>.ts` | `src/adapters/parser/from-parse5.ts` |
| Conformance test | `tests/conformance/<role>.test.ts` | `tests/conformance/parser.test.ts` |
Substitution boundaries need adapters wherever the shapes differ. For each substitutable stage, there are potentially two conversion points: