[codex] Implement resolver service and /v1/subjects/resolve#4
[codex] Implement resolver service and /v1/subjects/resolve#4
Conversation
…3.1) Introduces services/resolver with the typed response envelope every resolver call must return, ahead of the actual resolution logic (fra-6al.3.2/.3/.4). ResolverEnvelope is a discriminated union keyed on outcome: - resolved: canonical SubjectRef + confidence + canonical_kind (the promotion receipt spec §6.1 requires so downstream knows which identity level was chosen) + optional lower-confidence alternatives. - ambiguous: >=2 candidates ranked by confidence desc, with an optional ambiguity_axis (issuer_vs_listing, multiple_listings, etc.) so P0.3b can key disambiguation prompts on the mode. - not_found: normalized_input preserved for debug/logging. Factory functions (resolved/ambiguous/notFound) enforce invariants TS can't: confidence in [0,1], ambiguous requires >=2 candidates, candidates ranked desc, alternatives on a resolved envelope cannot out-rank the chosen target (which would be silent winner-picking — the exact failure mode the spec calls out). 12 Node-native tests (no DB dependency) cover each outcome shape, each invariant's rejection path, type-guard exclusivity, and the bead verification examples (GOOG ambiguous, AAPL resolved with canonical listing, NOTREAL not_found).
…e of truth Review pass on d981184: - Make SUBJECT_KINDS the authoritative const tuple and derive SubjectKind as `typeof SUBJECT_KINDS[number]`. Adding or renaming a kind is now a single edit; the previous split array + literal union could drift silently because TS doesn't compare them. - Add a drift test that reads spec/finance_research_block_schema.json's $defs.SubjectKind.enum and asserts it matches SUBJECT_KINDS exactly. The TS enum is the fourth restatement of this list (alongside the Postgres enum, OpenAPI schema, and JSON block schema); this test catches the case where one of those moves and the others don't. - Rename the bead-verification test to "spec §6.1 examples: ..." and drop the redundant shape assertions — the earlier tests already cover those. Its remaining value is executable documentation of the spec's named examples, which the new title makes explicit.
Second simplify pass on c64722f: - Collapse the subject-ref.ts SUBJECT_KINDS comment to the one load-bearing line: a cross-reference to the drift test that keeps this list in sync with the spec JSON schema. The previous two sentences restated what `as const` + `typeof[number]` already visibly do. - Drop the 3-line narration above the "spec §6.1 examples" test; the renamed title already says what the comment said. - Replace the defensive optional-chain + Array.isArray guard in the drift test with a typed JSON.parse cast. assert.deepEqual fails just as loudly with a clearer diff if the schema ever drifts.
…e.test Fresh-eyes cleanup: `googlClistingClass` contained a typo (accidental "Cl" before "isting") and the paired `googListingClass` didn't indicate the Class C vs Class A distinction. The Class info already lives in each display_name string, so shorten to `googlListing` (Class A) and `googListing` (Class C) — both clearer and typo-free.
…(fra-6al.3.2)
Adds src/normalize.ts — a pure function converting raw lookup input
into a NormalizedQuery with three axis-specific forms:
- ticker_candidate: trimmed input, uppercased, only set when there's
no internal whitespace. Preserves every distinguishing character
(letters, digits, dots, hyphens) so GOOG/GOOGL and BRK.A/BRK.B
never collapse. Feeds listings.ticker lookups.
- name_candidate: lowercase, punctuation stripped, whitespace
collapsed. Still preserves letters and digits, so "goog" and
"googl" stay distinct at this axis too. Unicode letter class via
\p{L}\p{N} so accented characters survive. Feeds issuer
name/alias lookups.
- identifier_hint: pattern-level classification for CIK (1-10
digits, leading zeros stripped), ISIN (2 letters + 9 alphanumeric
+ 1 digit), LEI (20 alphanumeric). No database lookup — that
belongs to 3.3, which routes the hint to issuers.cik /
instruments.isin / issuers.lei.
15 tests cover the bead's central no-collapse invariant (GOOG vs
GOOGL, BRK.A vs BRK.B), case and whitespace insensitivity for the
same logical input, Unicode letter survival, CIK padding
normalization so "320193" and "0000320193" produce the same hint,
mutually exclusive identifier pattern matching, and the "google"
premise — the input normalizes to a form that can feed issuer,
GOOG, and GOOGL matching paths separately once 3.3 is wired.
Simplify pass on 31ecf58: - Document the disjointness invariant in detectIdentifier: the three patterns don't overlap by charset/length, so check order isn't semantic. Prevents a future edit from accidentally introducing an order-dependent classification bug. - Replace the `replace(/^0+/, "") || "0"` one-liner with an explicit length check. The trick worked (strip → falsy-or recovers the "0" edge case) but required two hops of reasoning; `stripped.length === 0` is self-evident. - Rename the "mutually exclusive by length and charset" test to "each identifier pattern classifies to its own kind" — the body asserts classification, not exclusion, so the old title oversold. - Add a pinning test that "AAPL," → ticker_candidate "AAPL,". This trailing-punctuation case is intentional graceful failure (it won't match listings.ticker, falls through to name matching), but the old code didn't document or test it. Future edits stripping punctuation from ticker_candidate would silently start collapsing share-class suffixes like "BRK.B" — this test catches that.
Adds src/lookup.ts with four entry-point resolvers plus a resolveByInput dispatcher over a discriminated-union input. Each hits the canonical DB column (listings.ticker, issuers.cik, instruments.isin, issuers.lei) and returns a ResolverEnvelope. Key design choices: - "Same canonical Apple identity" ≠ same identity level: ticker promotes to listing, CIK/LEI to issuer, ISIN to instrument. Cross-family equivalence means the three traces converge on one issuer via FKs (listings.instrument_id → instruments.issuer_id). - Unique-indexed identifiers (CIK/LEI/ISIN) carry confidence 0.99 — DB uniqueness is the guarantee. Ticker single-match caps at 0.95 because spec §4.1 says ticker is a locator, not identity. Ticker ambiguity returns 0.5 per candidate. - Ticker ambiguity axis: multiple_issuers when distinct issuer_ids appear in the result set; multiple_listings otherwise. - CIK padding normalization applies on read: "320193" and "0000320193" both strip to "320193" so the indexed lookup sees one key shape. Seeds store the stripped form. - Minimal QueryExecutor type (pg.Client / pg.Pool both satisfy it) so tests can later stub without dragging pg's full type surface. 6 Docker-backed integration tests cover the bead's verification clause: cross-family equivalence (ticker/CIK/ISIN/LEI all trace to one Apple issuer), CIK leading-zero normalization, case- insensitive ISIN/LEI lookup, unknown identifiers → not_found with structured reason codes, MIC filtering, and multiple_issuers axis when historical ticker reuse is present. Reuses the shared db/test/docker-pg.ts harness via relative import.
…n types Simplify pass on 9c6d4a4: - Extract normalizeCik() from normalize.ts; lookup.ts now imports it instead of carrying a byte-for-byte duplicate. The "320193" vs "0000320193" rule has one source of truth and the rationale comment lives with the canonical function. - Promote bootstrapDatabase() to db/test/docker-pg.ts. seed.test.ts and lookup.test.ts now share the same container+schema setup. lookup.test.ts composes it with a new local connectedClient() helper for pg.Client management. - Extract resolveByIssuerIdentifier() in lookup.ts. resolveByCik and resolveByLei collapse from ~23 lines each to 7 — they now differ only in the normalized value, column name, and reason code. ISIN stays bespoke because of the instruments→issuers join and share_class handling. - Move CONFIDENCE_UNIQUE_IDENTIFIER / CONFIDENCE_TICKER_SINGLE / CONFIDENCE_TICKER_AMBIGUOUS from lookup.ts to envelope.ts. The confidence invariant (assertConfidence) lives in envelope.ts, so pairing the values with the contract lets future resolvers (name matcher, fuzzy) import the same ceiling. - Enumerate NotFoundReason as a union ("unknown_ticker" | "unknown_cik" | "unknown_lei" | "unknown_isin" | "no_candidates" | "other"). Replaces the string-typed reason and catches typos at every callsite. - Flatten the ternary-with-await in resolveByTicker to an if/else. Easier to read than async-ternary; same SQL shape. - Drop individual exports of TickerInput/CikInput/IsinInput/LeiInput. They're implementation-internal variants of the ResolverInput union and nothing external imports them. - Fix mic_filtered → micFiltered (snake_case stray in an otherwise camelCase file). 57 tests green: 29 unit + 6 lookup integration in services/resolver; 22 in db (seed.test.ts still passes through the shared harness).
- normalizeCik("") returned "0". Empty input now stays empty. The
'0' fallback was only meant for non-empty all-zeros ("0000" → "0"),
not for missing input. Without this, resolveByCik("") returned a
not_found envelope with normalized_input: "0" — misrepresenting
what the caller actually passed. Pinned with a direct test covering
empty, "0", "0000", "0000320193", "320193".
- resolveByInput's switch had no default, so under --experimental-
strip-types a future ResolverInput variant would silently fall
through and resolve to Promise<undefined>. Added a never-typed
default that catches both at compile time (when tsc runs) and at
runtime with a clear error.
…l.3.4)
Adds src/http.ts wiring the three prior beads (envelope + lookup +
normalize) to an HTTP endpoint matching the contract in
spec/finance_research_openapi.yaml.
Four layers:
- validateResolveRequest: shape-checks `text: string` (required) and
`allow_kinds: SubjectKind[]` (optional), returning a tagged-union
{valid, request|error} so the server layer can map failures to 400
without throwing.
- dispatchFreeText: runs normalize() then routes by axis —
identifier_hint (CIK/ISIN/LEI) takes priority, ticker_candidate
falls through if no identifier match. Name/alias lookup is
deliberately deferred to fra-6al.4 (search-to-subject flow) where
the alias plane lives.
- envelopeToSubjects: maps the internal 3-way ResolverEnvelope to the
flat {subjects[], unresolved[]} shape the OpenAPI specifies.
Ambiguous outcomes spread every candidate into subjects[] at equal
confidence (0.5) instead of packing them into one primary's
alternatives — preserves spec §6.1's no-silent-winner rule by
making the client apply its own threshold.
- createResolverServer: thin node:http wrapper. 404 for anything but
POST /v1/subjects/resolve; 400 for malformed JSON or invalid
request shape; 500 with message for unexpected errors.
14 tests: 4 validation unit tests (no DB), 5 handler-level tests
with a live db client (ticker→listing, CIK→issuer, allow_kinds
filter, unknown→unresolved, ambiguous ticker), 5 end-to-end server
tests that fetch the endpoint and assert the OpenAPI-shaped response
body via a structural assertion helper. Shape helper walks each
subject, checks subject_ref.{kind,id}, display_name, confidence, and
alternatives (SubjectRef[] projection, no display/confidence).
- dispatchFreeText's inner switch on identifier_hint.kind had no default. Under --experimental-strip-types, a future IdentifierHint variant (figi/cusip/etc.) would silently fall through to the ticker_candidate branch — semantically wrong for an identifier that deserves its own lookup axis. Added a never-typed default matching the fix applied earlier to resolveByInput. - createResolverServer's per-request body read was not inside any try/catch. If a client disconnects mid-POST, readBody throws and the rejection escapes the async handler. Node 15+ crashes the process on unhandled rejections by default — unacceptable for a long-lived server. Wrapped the entire handler body in a single try/catch that maps unexpected errors to 500 (or skips responding if headers already sent). Removed the redundant inner try/catch around handleResolveSubjects.
…se notFound factory Simplify pass on 363afb7 + bf9e0f3: - Promote connectedClient(t, databaseUrl) to db/test/docker-pg.ts. Third identical copy (lookup.test.ts + http.test.ts) folds into one shared helper next to bootstrapDatabase, which already owns the test lifecycle. - Replace the hand-rolled WritableResponse structural type in services/resolver/src/http.ts with ServerResponse from node:http. The subset bought nothing — the callsite always passes a real ServerResponse — and added a concept to track. - Use the notFound() envelope factory in dispatchFreeText instead of returning a raw literal. Keeps the NotFoundReason discipline centralized; if the factory ever gains validation or a new default field, this callsite stays in sync. - Add a postResolve(base, body) helper in http.test.ts and collapse four near-identical fetch blocks through it. The wrong-method and wrong-path edge cases keep raw fetch since they target different endpoints. - Harmonize the http.test.ts seedAppleChain fixture to match lookup.test.ts's parameterized form. The drift flagged by the reuse review is resolved without extracting to a shared fixture module (agent recommended deferring to a third caller).
📝 WalkthroughWalkthroughAdds a new resolver service under Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server as HTTP Server
participant Normalize
participant Lookup
participant DB as Database
Client->>Server: POST /v1/subjects/resolve { text, allow_kinds? }
Server->>Server: validateResolveRequest()
Server->>Normalize: normalize(text)
Normalize-->>Server: NormalizedQuery{ticker_candidate?, identifier_hint?, name_candidate}
alt identifier_hint present
Server->>Lookup: resolveByInput(identifier_hint)
else ticker candidate available
Server->>Lookup: resolveByTicker(ticker_candidate)
else
Server->>Lookup: resolveByTicker(name_candidate)
end
Lookup->>DB: query(listings/instruments/issuers)
DB-->>Lookup: rows[]
alt rows.length == 0
Lookup->>Lookup: build not_found envelope
else rows.length == 1
Lookup->>Lookup: build resolved envelope
else
Lookup->>Lookup: build ambiguous envelope
end
Lookup-->>Server: ResolverEnvelope
Server->>Server: project to ResolvedSubject[], apply allow_kinds filter
Server-->>Client: 200 { subjects[], unresolved? }
sequenceDiagram
participant Input
participant Normalize
participant Classifier
participant Lookup
participant Envelope
Input->>Normalize: raw string
Normalize->>Normalize: trim, uppercase/no-whitespace ticker candidate
Normalize->>Normalize: name_candidate (lowercase, strip punctuation)
Normalize->>Classifier: detect CIK/ISIN/LEI patterns
Classifier-->>Normalize: identifier_hint?
Normalize-->>Lookup: NormalizedQuery
Lookup->>Lookup: dispatch by identifier_hint or ticker/name
Lookup->>Envelope: construct resolved/ambiguous/not_found with invariants
Envelope-->>Lookup: ResolverEnvelope
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
services/resolver/test/http.test.ts (1)
307-331: Monkey-patchingconsole.errorglobally can leak across concurrent tests.Node's test runner may run files concurrently, and
console.erroris shared process-wide. If another test file logs during this test's window, those logs will be captured intologged(and silenced from the real console). Consider usingt.mock.method(console, "error", ...)(built into node:test) so the mock is scoped to this test and automatically restored, or spying via a stream wrapper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/resolver/test/http.test.ts` around lines 307 - 331, The test currently monkey-patches console.error by assigning to console.error and restoring it in t.after which can leak across concurrent tests; replace this with node:test's scoped mocking using t.mock.method(console, "error", mockFn) (or t.mock.method(console, "error", (...args)=>logged.push(args)) ) and remove originalConsoleError and t.after restore logic; keep the existing assertions but read call args from the mock (or logged array populated by the scoped mock) so only this test captures console.error calls and they are automatically restored when the test ends—update references to console.error, logged, originalConsoleError, and t.after accordingly.services/resolver/src/http.ts (2)
125-135: Ticker-only not-found path loses the more specificunknown_tickerreason.When
n.ticker_candidateis set andresolveByTickerreturns anot_foundenvelope (withreason: "unknown_ticker"and a ticker-normalizednormalized_input), the code falls through and returns a freshnotFound({ reason: "no_candidates" })built fromn.trimmed. Consumers lose the axis-specific reason. Consider returning the ticker's not-found envelope when it's the only axis that ran.♻️ Proposed refinement
if (n.ticker_candidate) { const envelope = await resolveByTicker(db, n.ticker_candidate); if (!isNotFound(envelope)) return envelope; if (identifierEnvelope) return identifierEnvelope; + return envelope; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/resolver/src/http.ts` around lines 125 - 135, When a ticker lookup runs (n.ticker_candidate) and resolveByTicker returns a not-found envelope (e.g., reason "unknown_ticker"), the code currently discards it and returns a generic notFound({normalized_input: n.trimmed, reason: "no_candidates"}); update the branch so that after calling resolveByTicker you return the ticker envelope when it is not found and no more-specific identifierEnvelope exists: keep the early return for successful envelopes (if (!isNotFound(envelope)) return envelope), but change the subsequent check to if (!identifierEnvelope) return envelope (instead of falling through to notFound), thereby preserving the ticker-specific not_found reason and normalized_input from resolveByTicker.
83-90: Consider semantics of emptyallow_kindsarray andunresolvedmessaging when filtered.Two small concerns in
handleResolveSubjects:
- An empty
allow_kinds: []is accepted byvalidateResolveRequestbut then treated as "no restriction" here (length > 0is false). If callers pass[]meaning "allow nothing", results will silently leak. Consider rejecting empty arrays in validation or treating empty as no-match.- When subjects are filtered out by
allow_kinds(envelope wasresolved/ambiguous, notnot_found),unresolvedgetsrequest.textwith no signal that the match existed but was kind-filtered. For this MVP scope that may be acceptable, but worth a comment noting the asymmetry.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/resolver/src/http.ts` around lines 83 - 90, The code in handleResolveSubjects treats an empty request.allow_kinds array as "no restriction" which can leak matches; update validation or handling: either modify validateResolveRequest to reject an empty allow_kinds (require undefined or non-empty array) or change the subjects filtering logic in handleResolveSubjects to treat an explicit empty array as "allow nothing" (i.e., produce no subjects). Also address the unresolved messaging asymmetry by clarifying intent: add a brief comment in handleResolveSubjects near the subjects/unresolved logic (referencing variables subjects, unresolved, envelope, and request.text) stating that when subjects are filtered out by allow_kinds we intentionally return request.text (not a not-found message) or change unresolved to include a signal that the match was kind-filtered if you want explicit feedback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@services/resolver/src/http.ts`:
- Around line 125-135: When a ticker lookup runs (n.ticker_candidate) and
resolveByTicker returns a not-found envelope (e.g., reason "unknown_ticker"),
the code currently discards it and returns a generic notFound({normalized_input:
n.trimmed, reason: "no_candidates"}); update the branch so that after calling
resolveByTicker you return the ticker envelope when it is not found and no
more-specific identifierEnvelope exists: keep the early return for successful
envelopes (if (!isNotFound(envelope)) return envelope), but change the
subsequent check to if (!identifierEnvelope) return envelope (instead of falling
through to notFound), thereby preserving the ticker-specific not_found reason
and normalized_input from resolveByTicker.
- Around line 83-90: The code in handleResolveSubjects treats an empty
request.allow_kinds array as "no restriction" which can leak matches; update
validation or handling: either modify validateResolveRequest to reject an empty
allow_kinds (require undefined or non-empty array) or change the subjects
filtering logic in handleResolveSubjects to treat an explicit empty array as
"allow nothing" (i.e., produce no subjects). Also address the unresolved
messaging asymmetry by clarifying intent: add a brief comment in
handleResolveSubjects near the subjects/unresolved logic (referencing variables
subjects, unresolved, envelope, and request.text) stating that when subjects are
filtered out by allow_kinds we intentionally return request.text (not a
not-found message) or change unresolved to include a signal that the match was
kind-filtered if you want explicit feedback.
In `@services/resolver/test/http.test.ts`:
- Around line 307-331: The test currently monkey-patches console.error by
assigning to console.error and restoring it in t.after which can leak across
concurrent tests; replace this with node:test's scoped mocking using
t.mock.method(console, "error", mockFn) (or t.mock.method(console, "error",
(...args)=>logged.push(args)) ) and remove originalConsoleError and t.after
restore logic; keep the existing assertions but read call args from the mock (or
logged array populated by the scoped mock) so only this test captures
console.error calls and they are automatically restored when the test
ends—update references to console.error, logged, originalConsoleError, and
t.after accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eb2f247d-5034-4bb1-ac2d-abcb4d95bebf
📒 Files selected for processing (10)
.beads/issues.jsonldb/test/docker-pg.tsservices/resolver/README.mdservices/resolver/package.jsonservices/resolver/src/envelope.tsservices/resolver/src/http.tsservices/resolver/src/lookup.tsservices/resolver/test/envelope.test.tsservices/resolver/test/http.test.tsservices/resolver/test/lookup.test.ts
✅ Files skipped from review due to trivial changes (3)
- services/resolver/package.json
- services/resolver/README.md
- .beads/issues.jsonl
🚧 Files skipped from review as they are similar to previous changes (2)
- services/resolver/test/lookup.test.ts
- services/resolver/src/lookup.ts
What changed
services/resolverpackage for the P0.3 identity resolverresolved | ambiguous | not_found)SubjectRefoutputsPOST /v1/subjects/resolveHTTP handler matching the OpenAPI contractWhy
The app needs a deterministic identity boundary that converts user-entered lookup text and identifier-bearing records into canonical issuer / instrument / listing references. This PR lands that resolver surface and fixes a stale-listing bug where historical ticker rows could create false ambiguity or return outdated symbols.
Impact
/v1/subjects/resolvenow has working normalization, lookup, and HTTP coverageValidation
node --experimental-strip-types --test services/resolver/test/lookup.test.tsLocal note:
cd services/resolver && npm testis currently flaky in this environment because several Docker-backed tests terminate Postgres with57P01, so I am not claiming a clean full-suite pass from this machine.Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores