Skip to content

Commit 6d496fd

Browse files
authored
Merge pull request #1462 from Open-Source-Legal/claude/cross-content-search-gi0zc
Add cross-content Discover search with notes integration
2 parents 37a40ae + 6acd6b7 commit 6d496fd

22 files changed

Lines changed: 2873 additions & 14 deletions

CHANGELOG.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,49 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
99

1010
### Added
1111

12+
- **Cross-content Discover search** at `/discover/search`. The Discover hero
13+
search box now lands on a unified results page that fans out across
14+
discussions, annotations, collections (corpuses), and notes in parallel,
15+
rather than redirecting to discussion-only search.
16+
- Backend: new `searchNotesForMention` resolver in
17+
`config/graphql/search_queries.py` (visibility via
18+
`Note.objects.visible_to_user`, optional `corpusId` / `documentId`
19+
scoping, eager-loads `document`, `corpus`, and creators).
20+
`NoteType` is exported through the same `DjangoConnectionField` shape
21+
used by the other mention searches and goes through `NoteType.get_queryset`
22+
for a second visibility pass.
23+
- Frontend view: `frontend/src/views/DiscoverSearchResults.tsx` reuses
24+
`ThreadListItem` for discussions and a single shared `ResultRow` for
25+
annotations / collections / notes. Tabbed UI: All (preview of 5 each),
26+
Discussions, Annotations, Collections, Notes.
27+
- Routing: `?q=` and `?type=` are URL-driven (replace, no history spam).
28+
Result rows deep-link via `getCorpusUrl` / `getDocumentUrl`; annotation
29+
rows preserve `?ann=`, note rows pass `?note=` to the document URL.
30+
- New `?note=<id>` deep-link param wired through `selectedNoteId`
31+
(`frontend/src/graphql/cache.ts`), `QueryParams.noteId`
32+
(`frontend/src/utils/navigationUtils.ts`), and the central manager
33+
Phase 2/4 sync (`frontend/src/routing/CentralRouteManager.tsx`).
34+
`DocumentKnowledgeBase` now consumes the var: when the loaded
35+
document contains a note matching the URL id, the existing note
36+
detail modal opens once and the param is consumed (one-shot
37+
deep-link, mirrored back to the URL via Phase 4).
38+
- Hero wiring: `NewHeroSection.handleSearchSubmit` now navigates to
39+
`/discover/search?q=...`. The legacy `/discussions?search=...` route is
40+
preserved for callers that want a discussion-only listing.
41+
- Tests: `MentionSearchTestCase` in
42+
`opencontractserver/tests/test_mentions.py` covers visibility
43+
filtering, content-substring matching, corpus / document scoping,
44+
anonymous-user filtering, wrong-type global-id rejection, the
45+
`-modified` ordering contract, and both
46+
`content_preview` paths (DB-annotated `Left('content', 400)` for
47+
search results, Python fallback for per-id note fetch).
48+
- Component test: `frontend/tests/DiscoverSearchResults.ct.tsx`
49+
covers the empty-prompt state, the four-section header render
50+
after typing, and a populated-results render exercising every
51+
section's row primitives. Captures both
52+
`discover--search-results--empty-prompt.png` and
53+
`discover--search-results--with-results.png` for docs.
54+
1255
- **Loud guardrail against the `system_prompt=` foot-gun in pydantic-ai** (Issue #1451): `pydantic_ai.Agent` accepts both `system_prompt=` and `instructions=`, but the `system_prompt` value is *only* materialised into the model request when `message_history` is `None`. OpenContracts' `chat()` flow always persists the user's HUMAN message before calling `Agent.run()`, so `message_history` is never empty in practice and any `system_prompt=` argument is silently dropped — the LLM runs without any system instruction. CLAUDE.md pitfall #14 documented the workaround (use `instructions=`), but a future pydantic-ai bump that renames or re-precedences these parameters could re-introduce the regression silently.
1356
- **Single construction path** (`opencontractserver/llms/agents/pydantic_ai_factory.py`): new `make_pydantic_ai_agent(...)` factory is now the only place in the codebase that instantiates `pydantic_ai.Agent`. The factory uses a sentinel-based check (not `is not None`) to refuse `system_prompt=` outright — even `system_prompt=None` raises `TypeError` so the lesson cannot be re-learned by accident. The error message references issue #1451 and CLAUDE.md pitfall #14.
1457
- **All call sites refactored** (in `opencontractserver/llms/agents/pydantic_ai_agents.py`: `_run_structured_extraction`, the document-agent factory, and the corpus-agent factory; in `opencontractserver/tasks/memory_tasks.py`: `summarise_agent` and `curation_agent`). Five direct `PydanticAIAgent(...)` constructions in production code now route through the factory.
@@ -31,6 +74,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
3174

3275
### Fixed
3376

77+
- **Discover search section header miscounted threads when soft-deleted ones were present** (`frontend/src/views/DiscoverSearchResults.tsx`): the discussions section displayed `data.conversations.totalCount` as the count, but rendered rows are filtered client-side via `!n?.deletedAt`. With even one tombstoned thread in the results, the header read e.g. "5 threads" while only 4 rows appeared. Switched the count to `threads.length` (post-filter) so the badge always matches what the user sees.
78+
79+
- **`?note=<id>` deep-link could pin in the URL forever for documents with no notes** (`frontend/src/components/knowledge_base/document/DocumentKnowledgeBase.tsx`): the auto-open effect early-returned on `notes.length === 0`, which is *also* true while the document query is loading. The intent comment said "Clear regardless of match" but the early return skipped the clear call entirely. If the underlying document genuinely had no notes (or the loaded notes simply did not contain the deep-linked id) the reactive var stayed set, so `CentralRouteManager` Phase 4 kept writing `?note=<id>` back into the URL on every render — the deep-link became sticky with no user-visible escape hatch. Effect now gates only on `combinedData?.document` being loaded; once the query has resolved, the effect runs `find` (possibly empty) and always clears the var.
80+
81+
- **`NewHeroSection` had a dead `isAuthenticated` prop** (`frontend/src/components/landing/NewHeroSection.tsx`, `frontend/src/views/DiscoveryLanding.tsx`): the prop was declared on the interface and threaded through `DiscoveryLanding` but never read inside the component. Removed from the interface and the call site.
82+
83+
- **Discover search rows silently no-op'd on `"#"` URLs from missing slugs** (`frontend/src/views/DiscoverSearchResults.tsx`): `getDocumentUrl` / `getCorpusUrl` return `"#"` when slugs are missing on the underlying entities. Each section's `onClick` did `if (url !== "#") navigate(url)`, so the row remained visually clickable, hover styles fired, and the click was swallowed with no feedback. `ResultRow` now accepts `disabled` / `disabledReason`; sections compute `unrouteable = url === "#"` once and forward both. Disabled rows render with `opacity 0.55`, `cursor: not-allowed`, native `disabled` (and `aria-disabled`), and a tooltip explaining the missing data — replacing the silent failure with a discoverable one. `disabled:not(:disabled)` scoping on the hover style stops the disabled rows from animating on hover.
84+
3485
- **Annotation deep-links from the corpus-home Table of Contents silently no-op'd** (`frontend/src/components/corpuses/DocumentAnnotationIndex.tsx`, `frontend/src/components/knowledge_base/document/document_kb/RightPanelContent.tsx`): clicking a structural section in the corpus-home document index (e.g. "Subchapter I. Formation, p. 2") was supposed to open the document with the annotation pre-selected and scrolled into view. Instead it appeared to do nothing. Root cause: `DocumentAnnotationIndex` overloaded a single `embedded` prop with two semantics — visual layout ("render without an outer container") *and* click routing ("we are already on the document page, just rewrite `?ann=`"). The corpus-home call site (`DocumentTableOfContents.tsx:919`) needed the visual flavor but absolutely was *not* on a document page, so `handleSectionClick` took the wrong branch and wrote `?ann=<id>` onto the corpus URL — no navigation, no scroll. Fix splits the prop: `embedded` is now purely visual, and a new explicit `onDocumentPage` prop controls click routing. The single call site that's actually on a document page (`RightPanelContent.tsx`) opts in. Regression test in `frontend/src/components/corpuses/__tests__/DocumentAnnotationIndex.test.tsx` pins the new contract: a click from a corpus URL must produce a string-form `navigate("/d/.../doc?ann=<id>")` (full deep link), while a click from a document URL with `onDocumentPage` produces `navigate({ search: "...ann=<id>..." }, { replace: true })`.
3586

3687
- **Zip importer reported `success: True` even when sidecars failed (silent annotation loss)** (`opencontractserver/tasks/import_tasks.py:421`, `:1411`): `_read_sidecar` raises `ValueError` when a sidecar exceeds `ZIP_MAX_SIDECAR_SIZE_BYTES`; malformed JSON, schema failures, and missing labels for sidecar-declared annotations all bump `annotation_sidecars_errored` and append to `errors`. The success determinations only checked `files_errored` (`import_zip_with_folder_structure`) or the user-cap message (`process_documents_zip`), so callers observed `success: True, completed: True` while annotations were silently dropped — exactly the silent data-loss path called out in PR #1489 review feedback. `import_zip_with_folder_structure` now requires `annotation_sidecars_errored == 0` in addition to the existing `files_errored == 0` and user-cap check; `process_documents_zip` now requires `error_files == 0` in addition to the user-cap check. `relationship_errors` is intentionally not folded in — those are surfaced separately via `relationships_skipped` + `relationship_errors` and the documents themselves are imported correctly. Two tests in `test_sidecar_import.py` (`test_skip_pipeline_without_labels_json`) and a new regression test (`test_sidecar_error_drops_overall_success_flag`) lock down the new contract.

config/graphql/annotation_types.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -340,6 +340,20 @@ class NoteType(AnnotatePermissionsForReadMixin, DjangoObjectType):
340340
)
341341
current_version = graphene.Int(description="Current version number of the note")
342342

343+
content_preview = graphene.String(
344+
description=(
345+
"First 400 characters of the note body for list/search previews. "
346+
"Resolvers may annotate the queryset with `content_preview` to "
347+
"avoid shipping the full body over the wire."
348+
)
349+
)
350+
351+
def resolve_content_preview(self, info) -> str:
352+
annotated = getattr(self, "content_preview", None)
353+
if annotated is not None:
354+
return annotated
355+
return (self.content or "")[:400]
356+
343357
def resolve_revisions(self, info) -> Any:
344358
"""Returns all revisions for this note, ordered by version."""
345359
return self.revisions.all()

config/graphql/search_queries.py

Lines changed: 78 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
import graphene
99
from django.contrib.postgres.search import SearchQuery
1010
from django.db.models import Q
11+
from django.db.models.functions import Left
1112
from graphene_django.fields import DjangoConnectionField
1213
from graphql_jwt.decorators import login_required
1314
from graphql_relay import from_global_id
@@ -17,11 +18,12 @@
1718
AnnotationType,
1819
CorpusType,
1920
DocumentType,
21+
NoteType,
2022
SemanticSearchResultType,
2123
UserType,
2224
)
2325
from config.graphql.ratelimits import get_user_tier_rate, graphql_ratelimit_dynamic
24-
from opencontractserver.annotations.models import Annotation
26+
from opencontractserver.annotations.models import Annotation, Note
2527
from opencontractserver.constants.annotations import SEMANTIC_SEARCH_MAX_RESULTS
2628
from opencontractserver.constants.search import FTS_CONFIG
2729
from opencontractserver.corpuses.models import Corpus
@@ -75,6 +77,19 @@ class SearchQueryMixin:
7577
),
7678
)
7779

80+
search_notes_for_mention = DjangoConnectionField(
81+
NoteType,
82+
text_search=graphene.String(
83+
description="Search query to find notes by title or content"
84+
),
85+
corpus_id=graphene.ID(
86+
description="Optional corpus ID to scope search to notes in specific corpus"
87+
),
88+
document_id=graphene.ID(
89+
description="Optional document ID to scope search to notes on a specific document"
90+
),
91+
)
92+
7893
@graphql_ratelimit_dynamic(get_rate=get_user_tier_rate("READ_LIGHT"))
7994
def resolve_search_corpuses_for_mention(
8095
self, info, text_search=None, **kwargs
@@ -418,6 +433,68 @@ def resolve_search_agents_for_mention(
418433
# Order: Global first, then corpus-specific, then alphabetically by name
419434
return qs.select_related("creator", "corpus").order_by("scope", "name")
420435

436+
@graphql_ratelimit_dynamic(get_rate=get_user_tier_rate("READ_LIGHT"))
437+
def resolve_search_notes_for_mention(
438+
self, info, text_search=None, corpus_id=None, document_id=None, **kwargs
439+
) -> Any:
440+
"""
441+
Search notes by title or content.
442+
443+
SECURITY: Notes inherit visibility from document + corpus via
444+
`Note.objects.visible_to_user()`. Anonymous users only see notes whose
445+
document, corpus (if any), and the note itself are public.
446+
"""
447+
user = info.context.user
448+
449+
qs = Note.objects.visible_to_user(user)
450+
451+
# Reject malformed or wrong-type global IDs by returning an empty
452+
# queryset rather than silently filtering on a non-existent FK.
453+
if corpus_id:
454+
try:
455+
type_name, corpus_pk = from_global_id(corpus_id)
456+
except (ValueError, UnicodeDecodeError):
457+
return Note.objects.none()
458+
if type_name != "CorpusType":
459+
return Note.objects.none()
460+
qs = qs.filter(corpus_id=int(corpus_pk))
461+
462+
if document_id:
463+
try:
464+
type_name, document_pk = from_global_id(document_id)
465+
except (ValueError, UnicodeDecodeError):
466+
return Note.objects.none()
467+
if type_name != "DocumentType":
468+
return Note.objects.none()
469+
qs = qs.filter(document_id=int(document_pk))
470+
471+
if text_search:
472+
# TODO(perf): Note has no `search_vector` column today (unlike
473+
# Annotation), so `icontains` is the only available substring
474+
# matcher. This is `LIKE '%…%'` and cannot use a B-tree or GIN
475+
# index — it degrades to a sequential scan as note volume grows
476+
# and returns lower-quality matches than FTS (no stemming/rank).
477+
# The fix is to add a `SearchVectorField` + GIN index to `Note`,
478+
# backfill it, and switch this filter to `SearchQuery` /
479+
# `SearchVector` with `FTS_CONFIG` (mirroring
480+
# `resolve_search_annotations_for_mention`). Acceptable for the
481+
# small note corpora this was tested against.
482+
qs = qs.filter(
483+
Q(title__icontains=text_search) | Q(content__icontains=text_search)
484+
)
485+
486+
# Eager-load the relations the result row needs for deep-linking, and
487+
# annotate a DB-truncated preview so the wire payload doesn't ship the
488+
# full markdown body for every result.
489+
qs = qs.select_related(
490+
"document", "document__creator", "corpus", "creator"
491+
).annotate(content_preview=Left("content", 400))
492+
493+
# NoteType.get_queryset re-applies `visible_to_user` as a defensive
494+
# second pass, so callers cannot widen visibility by bypassing this
495+
# resolver.
496+
return qs.order_by("-modified")
497+
421498
# SEMANTIC SEARCH QUERIES #############################################
422499
semantic_search = graphene.List(
423500
SemanticSearchResultType,
29.6 KB
Loading
67.1 KB
Loading

frontend/src/App.tsx

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -89,6 +89,7 @@ import { LeaderboardRoute } from "./components/routes/LeaderboardRoute";
8989
import { GlobalDiscussionsRoute } from "./components/routes/GlobalDiscussionsRoute";
9090
import { ThreadSearchRoute } from "./views/ThreadSearchRoute";
9191
import { DiscoveryLanding } from "./views/DiscoveryLanding";
92+
import { DiscoverSearchResults } from "./views/DiscoverSearchResults";
9293
import { CentralRouteManager } from "./routing/CentralRouteManager";
9394
import { CRUDModal } from "./components/widgets/CRUD/CRUDModal";
9495
import { updateAnnotationDisplayParams } from "./utils/navigationUtils";
@@ -481,6 +482,12 @@ export const App = () => {
481482
<Route path="/corpuses" element={<Corpuses />} />
482483
<Route path="/documents" element={<Documents />} />
483484

485+
{/* Cross-content Discover search */}
486+
<Route
487+
path="/discover/search"
488+
element={<DiscoverSearchResults />}
489+
/>
490+
484491
{/* Global Discussions Route (Issue #623) */}
485492
<Route
486493
path="/discussions"

frontend/src/assets/configurations/constants.ts

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,14 @@ export const OC_SECTION_LABEL = "OC_SECTION";
209209
// Document search/picker limits
210210
export const DOCUMENT_PICKER_SEARCH_LIMIT = 20;
211211

212+
// Discover cross-content search (DiscoverSearchResults view)
213+
/** Number of results shown per section on the "All" tab (preview mode). */
214+
export const DISCOVER_SEARCH_ALL_TAB_PREVIEW = 5;
215+
/** Number of results shown when an entity tab is selected. */
216+
export const DISCOVER_SEARCH_ENTITY_TAB_LIMIT = 25;
217+
/** Debounce (ms) before firing cross-content search queries. */
218+
export const DISCOVER_SEARCH_DEBOUNCE_MS = 250;
219+
212220
// Mutation batching
213221
export const MUTATION_BATCH_SIZE = 10;
214222

frontend/src/assets/configurations/osLegalStyles.ts

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,8 @@ export const OS_LEGAL_COLORS = {
102102
// Folder-specific colors - amber/golden theme
103103
/** Folder icon color - amber/golden for visual distinction from documents. */
104104
folderIcon: "#D97706",
105+
/** Darker amber companion to folderIcon (used as gradient end-stop). */
106+
folderIconDark: "#b45309",
105107
/** Folder background gradient - warm amber tones. */
106108
folderIconBg: "linear-gradient(135deg, #FEF3C7 0%, #FDE68A 100%)",
107109

frontend/src/components/knowledge_base/document/DocumentKnowledgeBase.tsx

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -148,6 +148,7 @@ import { getDocument, GlobalWorkerOptions } from "pdfjs-dist";
148148
import workerSrc from "pdfjs-dist/build/pdf.worker.mjs?url";
149149
import {
150150
selectedAnnotationIds,
151+
selectedNoteId,
151152
selectedThreadId,
152153
showStructuralAnnotations,
153154
} from "../../../graphql/cache";
@@ -1622,6 +1623,22 @@ const DocumentKnowledgeBase: React.FC<DocumentKnowledgeBaseProps> = ({
16221623
}
16231624
}, [threadId, combinedData?.document]);
16241625

1626+
// Auto-open the note detail modal when ?note= deep-link is present.
1627+
const deepLinkedNoteId = useReactiveVar(selectedNoteId);
1628+
useEffect(() => {
1629+
// Wait until the document query has resolved before deciding the
1630+
// ?note=<id> param is unresolvable — `notes` is empty during the
1631+
// loading window too, and clearing then would race the load.
1632+
if (!deepLinkedNoteId || !combinedData?.document) return;
1633+
const target = notes.find((n) => n.id === deepLinkedNoteId);
1634+
if (target) setSelectedNote(target);
1635+
// Clear regardless of match: once the document is loaded, a missing
1636+
// target means the note is inaccessible, deleted, or the ID is stale —
1637+
// leaving the var set would pin ?note=<id> in the URL forever via
1638+
// CentralRouteManager.
1639+
selectedNoteId(null);
1640+
}, [deepLinkedNoteId, combinedData?.document, notes]);
1641+
16251642
// The main viewer content:
16261643
let viewerContent: JSX.Element = <></>;
16271644
if (isPdfFileType(metadata.fileType)) {

frontend/src/components/landing/NewHeroSection.tsx

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,6 @@ import {
1111
} from "../../graphql/landing-queries";
1212

1313
interface NewHeroSectionProps {
14-
isAuthenticated?: boolean;
1514
selectedCategory: string | null;
1615
onCategoryChange: (categoryId: string | null) => void;
1716
}
@@ -65,7 +64,6 @@ const FilterContainer = styled.div`
6564
`;
6665

6766
export const NewHeroSection: React.FC<NewHeroSectionProps> = ({
68-
isAuthenticated,
6967
selectedCategory,
7068
onCategoryChange,
7169
}) => {
@@ -86,7 +84,7 @@ export const NewHeroSection: React.FC<NewHeroSectionProps> = ({
8684
const handleSearchSubmit = useCallback(
8785
(value: string) => {
8886
if (value.trim()) {
89-
navigate(`/discussions?search=${encodeURIComponent(value.trim())}`);
87+
navigate(`/discover/search?q=${encodeURIComponent(value.trim())}`);
9088
}
9189
},
9290
[navigate]

0 commit comments

Comments
 (0)