Meeting notes capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents, searchable only through basic text search.
With a knowledge graph, you can run queries like: "Who attended meetings where the topic was 'budget planning'?" or "What tasks did Sarah get assigned across all meetings?"
This example shows how to build a meeting knowledge graph from Google Drive Markdown notes using LLM extraction, embedding-based entity resolution, and Neo4j, with automatic continuous updates.
Please give CocoIndex a star on GitHub to support us, and stay tuned for more updates. Thank you so much 🥥🤗.
The resulting knowledge graph contains:

- `Meeting` nodes: one per meeting section, keyed by a stable integer id derived from `(note_file, date)`
- `Person` nodes: canonical organizers, participants, and task assignees, deduplicated by an embedding + LLM entity-resolution pass (so "Alice", "Alice Chen", and "alice c." collapse to a single node)
- `Task` nodes: tasks decided in meetings (keyed by description)
- Relationships:
  - `ATTENDED`: `Person` → `Meeting` (with an `is_organizer` flag)
  - `DECIDED`: `Meeting` → `Task`
  - `ASSIGNED_TO`: `Person` → `Task`
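The schema doesn't spell out how the integer `Meeting` id is derived from `(note_file, date)`. One way to get an id that stays the same across runs and machines is to hash the pair instead of relying on Python's built-in `hash()`, which is salted per process for strings. This is a minimal sketch under assumptions: the helper name `meeting_id`, the use of SHA-256, and truncation to 63 bits are illustrative choices, not the example's code.

```python
import hashlib


def meeting_id(note_file: str, date: str) -> int:
    """Stable integer id for one meeting section (hypothetical helper).

    Assumption: SHA-256 of (note_file, date) truncated to 63 bits; the
    example only says the id is derived from these two fields.
    """
    # A NUL separator keeps ("a", "bc") and ("ab", "c") apart.
    key = f"{note_file}\x00{date}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # First 8 bytes with the top bit cleared: fits Neo4j's signed 64-bit integers.
    return int.from_bytes(digest[:8], "big") & ((1 << 63) - 1)
```

Calling `meeting_id("notes/q3.md", "2024-07-15")` twice yields the same integer, and changing either argument yields a different one (barring a hash collision). With a deterministic key like this, re-extracting an edited section maps back to the same node id instead of minting a new one.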
The source is one or more Google Drive folders shared with a service account. The flow watches for recent changes and keeps the graph up to date incrementally.
The pipeline runs in three phases:

- Per-file extraction. Read each file from Google Drive, split it by Markdown headings (`#`/`##`) into meeting sections, and for each section extract a structured `Meeting` via LiteLLM + instructor (date, note, organizer, participants, tasks with assignees). `Meeting` and `Task` nodes plus `DECIDED` edges are declared in this phase. Raw person names are carried forward.
- Person entity resolution. All raw person names from all files are deduplicated using sentence-transformer embeddings and an LLM pair resolver to produce a canonical-name mapping.
- Person-touching relations. Canonical `Person` nodes are declared, then `ATTENDED` and `ASSIGNED_TO` edges are wired up using the resolved names.
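The resolution phase can be sketched in plain Python to show its general shape. A cheap embedding-similarity filter proposes candidate pairs, a more expensive pair check confirms them, and union-find merges confirmed pairs into clusters that each get one canonical name. This is an illustrative sketch, not the example's implementation. Assumptions: `toy_embed` (character-trigram counts) stands in for sentence-transformer embeddings so the block runs without dependencies; `same_person` is a hypothetical callback standing in for the LLM pair resolver; the 0.5 threshold and the longest-spelling-wins rule for the canonical name are arbitrary choices.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Iterable, List


def toy_embed(name: str) -> Counter:
    """Stand-in for a sentence-transformer embedding: character-trigram counts."""
    text = f" {' '.join(name.lower().split())} "
    return Counter(text[i : i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve_names(
    raw_names: Iterable[str],
    same_person: Callable[[str, str], bool],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Map every raw name to a canonical name.

    same_person is a hypothetical stand-in for the LLM pair resolver; it is
    only consulted for pairs whose embeddings are similar enough.
    """
    names = sorted(set(raw_names))
    vectors = {n: toy_embed(n) for n in names}
    parent = {n: n for n in names}  # union-find forest over names

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        # Embedding filter first; the expensive pair check runs only on candidates.
        if cosine(vectors[a], vectors[b]) >= threshold and same_person(a, b):
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    # Assumption: the longest spelling in a cluster becomes the canonical name.
    mapping: Dict[str, str] = {}
    for members in clusters.values():
        canonical = max(members, key=lambda m: (len(m), m))
        for m in members:
            mapping[m] = canonical
    return mapping
```

For example, `resolve_names(["Alice", "Alice Chen", "alice c.", "Bob"], same_person=lambda a, b: a.lower().split()[0] == b.lower().split()[0])` maps all three Alice spellings to `"Alice Chen"` and leaves `"Bob"` unchanged. The third phase then looks up each raw organizer, participant, and assignee name in such a mapping before creating `ATTENDED` and `ASSIGNED_TO` edges. The all-pairs loop is quadratic, which is acceptable for small name sets; a larger corpus would likely replace it with a nearest-neighbour search over the embeddings.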
CocoIndex reconciles changes incrementally: re-running after editing one note only re-processes the affected sections, and the resolution phase only re-runs when the set of raw names changes.
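CocoIndex does this bookkeeping itself. The plain-Python sketch below only illustrates the idea and does not use CocoIndex's API. It splits each note at `#`/`##` headings, caches extraction results under a SHA-256 hash of each section's text so that only new or edited sections are re-extracted, and calls the resolver only when the set of raw names differs from the previous run. `split_sections`, `IncrementalRunner`, and the `extract`/`resolve` callbacks are hypothetical names. The sketch also ignores deleted sections and never evicts stale cache entries.

```python
import hashlib
import re
from typing import Callable, Dict, FrozenSet, List, Optional, Set

# A section starts at a line beginning with "# " or "## " ("###" does not split).
HEADING = re.compile(r"#{1,2}\s")


def split_sections(markdown: str) -> List[str]:
    """Split a note at # / ## headings; text before the first heading is kept."""
    sections: List[List[str]] = []
    for line in markdown.splitlines():
        if HEADING.match(line) or not sections:
            sections.append([])
        sections[-1].append(line)
    joined = ("\n".join(lines).strip() for lines in sections)
    return [s for s in joined if s]


class IncrementalRunner:
    """Illustration of per-section memoization; NOT CocoIndex's API.

    extract and resolve are hypothetical callbacks standing in for the LLM
    extraction and the entity-resolution phase.
    """

    def __init__(
        self,
        extract: Callable[[str], List[str]],
        resolve: Callable[[FrozenSet[str]], Dict[str, str]],
    ) -> None:
        self.extract = extract
        self.resolve = resolve
        self.cache: Dict[str, List[str]] = {}  # section hash -> raw names
        self.last_names: Optional[FrozenSet[str]] = None
        self.mapping: Dict[str, str] = {}
        self.extract_calls = 0
        self.resolve_calls = 0

    def run(self, notes: Dict[str, str]) -> Dict[str, str]:
        names: Set[str] = set()
        for text in notes.values():
            for section in split_sections(text):
                key = hashlib.sha256(section.encode("utf-8")).hexdigest()
                if key not in self.cache:  # only new or edited sections are extracted
                    self.cache[key] = self.extract(section)
                    self.extract_calls += 1
                names.update(self.cache[key])
        current = frozenset(names)
        if current != self.last_names:  # resolution re-runs only on a changed name set
            self.mapping = self.resolve(current)
            self.resolve_calls += 1
            self.last_names = current
        return self.mapping
```

Editing one section's wording re-extracts just that section; the resolver runs again only if the edit introduces or removes a name.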
Prerequisites:

- A running Neo4j 5.18+ instance:

  ```sh
  docker run -d \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/cocoindex \
    --name cocoindex-neo4j \
    neo4j:5.26-community
  ```

  The browser UI is at http://localhost:7474; log in with `neo4j` / `cocoindex`.

  Why 5.18+? Vector index DDL (`CREATE VECTOR INDEX … OPTIONS { indexConfig: {...} }`) shipped in 5.18. Older Neo4j 5 servers need the `db.index.vector.createNodeIndex` procedure, which this connector doesn't emit. The flow itself doesn't use vector indexes, but the connector requires 5.18+ for parity.

- An LLM API key (OpenAI by default; set `LLM_MODEL` to use another provider, see LiteLLM providers).
- A Google Cloud service account with read access to the source folders, and the IDs of the folders you want to ingest. See Setup for Google Drive.
Copy `.env.example` to `.env` and fill in your values:

```sh
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=cocoindex
export NEO4J_DATABASE=neo4j
export LLM_MODEL=openai/gpt-5.4
export RESOLUTION_LLM_MODEL=openai/gpt-5-mini  # used for entity resolution
```

Then load it into your shell:

```sh
set -a && source .env && set +a
```

Notes:

- `GOOGLE_DRIVE_ROOT_FOLDER_IDS` accepts a comma-separated list of folder IDs.
- `LLM_MODEL` / `RESOLUTION_LLM_MODEL` are LiteLLM-prefixed model names (e.g. `openai/gpt-5.4`, `anthropic/claude-...`).
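If you want to sanity-check these values before running the flow, the two notes above translate into a small amount of parsing. The helpers below (`Settings`, `parse_folder_ids`, `split_litellm_model`, `load_settings`) are hypothetical and not part of the example. They assume every listed variable is required and that model names always carry a `provider/` prefix.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Tuple


@dataclass
class Settings:
    """Hypothetical container for a subset of the variables in .env."""
    credential_path: str
    folder_ids: List[str]
    llm_model: str
    resolution_llm_model: str


def parse_folder_ids(raw: str) -> List[str]:
    """Split GOOGLE_DRIVE_ROOT_FOLDER_IDS on commas, trimming blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def split_litellm_model(name: str) -> Tuple[str, str]:
    """Split 'openai/gpt-5-mini' into ('openai', 'gpt-5-mini') at the first '/'."""
    provider, sep, model = name.partition("/")
    if not (sep and provider and model):
        raise ValueError(f"expected '<provider>/<model>', got {name!r}")
    return provider, model


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and validate settings; assumes every variable is required."""
    folder_ids = parse_folder_ids(env["GOOGLE_DRIVE_ROOT_FOLDER_IDS"])
    if not folder_ids:
        raise ValueError("GOOGLE_DRIVE_ROOT_FOLDER_IDS is empty")
    llm_model = env["LLM_MODEL"]
    resolution_llm_model = env["RESOLUTION_LLM_MODEL"]
    # Validate the LiteLLM prefix up front instead of failing mid-run.
    split_litellm_model(llm_model)
    split_litellm_model(resolution_llm_model)
    return Settings(
        credential_path=env["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"],
        folder_ids=folder_ids,
        llm_model=llm_model,
        resolution_llm_model=resolution_llm_model,
    )
```

Splitting at the first `/` keeps nested routes intact, so a name such as `openrouter/anthropic/claude-...` yields the provider `openrouter` and leaves the rest as the model string.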
Install dependencies:

```sh
pip install -e .
```

Update the index (run the flow once to build or update the graph):

```sh
cocoindex update main
```

Open Neo4j Browser at http://localhost:7474, log in, and run Cypher queries:
```cypher
// All relationships
MATCH p=()-->() RETURN p LIMIT 100

// Who attended which meetings (including organizer; one edge per attendee)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p.name, m.note_file, m.time, m.id

// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m.note_file, m.time, t.description

// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p.name, t.description

// Meetings someone organized
MATCH (p:Person)-[r:ATTENDED {is_organizer: true}]->(m:Meeting)
RETURN p.name, m.note_file, m.time
```

To wipe the graph between runs:
```cypher
MATCH (n) DETACH DELETE n
```

Re-running `cocoindex update main` after editing a note only reprocesses the affected meeting sections: extraction is memoized per section, and the entity-resolution phase only re-runs when the set of raw person names changes. To keep the graph live as Drive files change, run in live mode:

```sh
cocoindex update -L main
```
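The Browser is convenient for exploring, but the same graph can be queried from code. The sketch below answers one of the opening questions ("What tasks did Sarah get assigned across all meetings?"), assuming the official `neo4j` Python driver 5.x, whose `Driver.execute_query` returns a `(records, summary, keys)` result. `tasks_for` and `TASKS_FOR_PERSON` are hypothetical names; the function accepts any driver-like object, which also makes it easy to test with a stub.

```python
from typing import Any, Dict, List

# Hypothetical query: a person's tasks plus the meeting(s) that decided each one.
TASKS_FOR_PERSON = """
MATCH (p:Person {name: $name})-[:ASSIGNED_TO]->(t:Task)
OPTIONAL MATCH (m:Meeting)-[:DECIDED]->(t)
RETURN t.description AS task, m.note_file AS note_file, m.time AS time
ORDER BY time
"""


def tasks_for(driver: Any, name: str, database: str = "neo4j") -> List[Dict[str, Any]]:
    """Return tasks assigned to `name` (a canonical Person.name), one row per deciding meeting."""
    # Assumes the neo4j 5.x Driver.execute_query signature (parameters_, database_).
    records, _summary, _keys = driver.execute_query(
        TASKS_FOR_PERSON, parameters_={"name": name}, database_=database
    )
    return [
        {"task": r["task"], "note_file": r["note_file"], "time": r["time"]}
        for r in records
    ]
```

With a real driver you would call it as `tasks_for(GraphDatabase.driver(uri, auth=(user, password)), "Sarah")`, using the `NEO4J_*` values from `.env`. Because `Person` nodes hold canonical names after entity resolution, the stored name may be a fuller spelling than the one you type, and the exact-equality pattern above will only match the stored spelling.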