cocoindex-io/meeting-notes-knowledge-graph

Meeting Notes Knowledge Graph that Auto Updates

Step By Step Tutorial

Meeting notes capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents—searchable only through basic text search.

With a knowledge graph, you can run queries like: "Who attended meetings where the topic was 'budget planning'?" or "What tasks did Sarah get assigned across all meetings?"

This example shows how to build a meeting knowledge graph from Google Drive Markdown notes using LLM extraction, embedding-based entity resolution, and Neo4j, with automatic continuous updates.

If you find this useful, please give CocoIndex a star on GitHub to support us, and stay tuned for more updates. Thank you so much 🥥🤗.

What this builds

  • Meeting nodes — one per meeting section, keyed by a stable integer id derived from (note_file, date)
  • Person nodes — canonical organizers, participants, and task assignees, deduplicated by an embedding + LLM entity-resolution pass (so "Alice", "Alice Chen", and "alice c." collapse to a single node)
  • Task nodes — tasks decided in meetings (keyed by description)
  • Relationships:
    • ATTENDED: Person → Meeting (with is_organizer flag)
    • DECIDED: Meeting → Task
    • ASSIGNED_TO: Person → Task
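
One way to get a stable integer key from (note_file, date) is to hash the pair. The sketch below is an illustration only: the function name `stable_meeting_id`, the use of SHA-256, and the 63-bit truncation are assumptions, not necessarily how this example derives its ids.

```python
import hashlib


def stable_meeting_id(note_file: str, date: str) -> int:
    """Derive a deterministic integer id from (note_file, date). Hypothetical helper."""
    # A NUL separator keeps ("a", "bc") distinct from ("ab", "c").
    key = f"{note_file}\x00{date}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep 63 bits so the value fits in a signed 64-bit integer,
    # which is the integer range Neo4j stores.
    return int.from_bytes(digest[:8], "big") >> 1
```

Python's built-in `hash()` would not work here, because string hashes are randomized per process and the id must be identical across runs for the graph to update in place.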

The source is one or more Google Drive folders shared with a service account. The flow watches for recent changes and keeps the graph up to date incrementally.

How it works

The pipeline runs in three phases:

  1. Per-file extraction. Read each file from Google Drive, split it by Markdown headings (# / ##) into meeting sections, and for each section extract a structured Meeting via LiteLLM + instructor (date, note, organizer, participants, tasks with assignees). Meeting and Task nodes plus DECIDED edges are declared in this phase. Raw person names are carried forward.
  2. Person entity resolution. All raw person names from all files are deduplicated using sentence-transformer embeddings and an LLM pair resolver to produce a canonical-name mapping.
  3. Person-touching relations. Canonical Person nodes are declared, then ATTENDED and ASSIGNED_TO edges are wired up using resolved names.
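
Phases 1 and 2 can be sketched with the standard library alone. This is a simplified illustration, not the code in this repository: `split_sections`, `embed`, `cosine`, and `resolve_names` are hypothetical names, character-trigram counts stand in for sentence-transformer embeddings, and the `same_person` callback stands in for the LLM pair resolver.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Matches "# Title" and "## Title" lines only; "###" and deeper are left in the body.
HEADING = re.compile(r"^#{1,2} +(.*)$", re.MULTILINE)


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split a note into (heading, body) pairs. Text before the first heading is dropped."""
    matches = list(HEADING.finditer(markdown))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections.append((m.group(1).strip(), markdown[m.end():end].strip()))
    return sections


def embed(name: str) -> Counter:
    """Toy stand-in for an embedding model: counts of character trigrams."""
    s = f"  {name.lower().strip()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve_names(raw_names, same_person, threshold=0.3) -> dict[str, str]:
    """Map every raw name to a canonical name.

    Similar pairs (cosine >= threshold) are candidates; `same_person(a, b)`
    makes the final call. Confirmed pairs are merged with union-find.
    """
    names = sorted(set(raw_names))
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    vectors = {n: embed(n) for n in names}
    for a, b in combinations(names, 2):
        if cosine(vectors[a], vectors[b]) >= threshold and same_person(a, b):
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    mapping = {}
    for members in clusters.values():
        # Assumption: the longest spelling is the most informative canonical form.
        canonical = max(members, key=lambda m: (len(m), m))
        for m in members:
            mapping[m] = canonical
    return mapping
```

Filtering candidate pairs by similarity before asking the resolver keeps the number of expensive pairwise checks small, which is the point of combining embeddings with an LLM in phase 2.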

CocoIndex reconciles changes incrementally — re-running after editing one note only re-processes the affected sections, and the resolution phase only re-runs when the set of raw names changes.
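
The idea behind this kind of incremental behavior can be illustrated with content-hash memoization. The sketch below is not CocoIndex's actual mechanism; `SectionCache` and `names_fingerprint` are hypothetical names, and the in-memory dictionary stands in for whatever persistent state the framework keeps.

```python
import hashlib
from typing import Callable


class SectionCache:
    """Memoize an expensive per-section function by the hash of the section text."""

    def __init__(self, extract: Callable[[str], dict]):
        self._extract = extract
        self._results: dict[str, dict] = {}
        self.calls = 0  # how many times the expensive function actually ran

    def get(self, section_text: str) -> dict:
        key = hashlib.sha256(section_text.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.calls += 1
            self._results[key] = self._extract(section_text)
        return self._results[key]


def names_fingerprint(raw_names) -> str:
    """Fingerprint of the *set* of raw names; order and duplicates do not matter."""
    joined = "\x00".join(sorted(set(raw_names)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Under this scheme, an unchanged section hits the cache and skips the LLM call, and the resolution phase needs to re-run only when `names_fingerprint` differs from the previous run's value.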

Prerequisites

  • A running Neo4j 5.18+ instance:

    docker run -d \
      -p 7474:7474 -p 7687:7687 \
      -e NEO4J_AUTH=neo4j/cocoindex \
      --name cocoindex-neo4j \
      neo4j:5.26-community

    The browser UI is at http://localhost:7474; log in with neo4j / cocoindex.

    Why 5.18+? Vector index DDL (CREATE VECTOR INDEX … OPTIONS { indexConfig: {...} }) shipped in 5.18. Older Neo4j 5 servers need the db.index.vector.createNodeIndex procedure, which this connector doesn't emit. The flow itself doesn't use vector indexes, but the connector requires 5.18+ for parity.

  • An LLM key (defaults to OpenAI; configure via LLM_MODEL for other providers — see LiteLLM providers).

  • A Google Cloud service account with read access to the source folders, and the folder IDs you want to ingest. See Setup for Google Drive.

Environment

Copy .env.example to .env and fill in your values:

export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=cocoindex
export NEO4J_DATABASE=neo4j
export LLM_MODEL=openai/gpt-5.4
export RESOLUTION_LLM_MODEL=openai/gpt-5-mini   # used for entity resolution

Then load it into your shell:

set -a && source .env && set +a

Notes:

  • GOOGLE_DRIVE_ROOT_FOLDER_IDS accepts a comma-separated list of folder IDs.
  • LLM_MODEL / RESOLUTION_LLM_MODEL are LiteLLM-prefixed model names (e.g. openai/gpt-5.4, anthropic/claude-...).
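
For reference, here is a minimal sketch of how these variables could be read and validated in Python. The `Settings` class and `load_settings` function are hypothetical, and the fallback values are simply the example values shown above, not necessarily the defaults this project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    folder_ids: list[str]
    neo4j_uri: str
    neo4j_user: str
    neo4j_database: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables. Hypothetical helper."""
    raw = env.get("GOOGLE_DRIVE_ROOT_FOLDER_IDS", "")
    # Comma-separated list; tolerate stray whitespace and empty entries.
    folder_ids = [part.strip() for part in raw.split(",") if part.strip()]
    if not folder_ids:
        raise ValueError("GOOGLE_DRIVE_ROOT_FOLDER_IDS must list at least one folder ID")
    return Settings(
        folder_ids=folder_ids,
        # Fallbacks below mirror the example .env values (an assumption).
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
    )
```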

Run

Build/update the graph

Install dependencies:

pip install -e .

Update the index (run the flow once to build/update the graph):

cocoindex update main

Browse the knowledge graph

Open Neo4j Browser at http://localhost:7474, log in, and run Cypher queries:

// All relationships
MATCH p=()-->() RETURN p LIMIT 100

// Who attended which meetings (including organizer; one edge per attendee)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p.name, m.note_file, m.time, m.id

// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m.note_file, m.time, t.description

// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p.name, t.description

// Meetings someone organized
MATCH (p:Person)-[r:ATTENDED {is_organizer: true}]->(m:Meeting)
RETURN p.name, m.note_file, m.time
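
The same graph can be queried from code. The sketch below answers the "What tasks did Sarah get assigned across all meetings?" question from the introduction using the official `neo4j` Python driver (`pip install neo4j`, version 5.8 or later for `execute_query`). The `tasks_for_person` helper is hypothetical, and the query assumes every assigned task is also linked to a meeting by a DECIDED edge. It matches the canonical name exactly, so pass the resolved form of the name.

```python
import os

# Property names (name, note_file, time, description) follow the queries above.
TASKS_FOR_PERSON = """
MATCH (p:Person {name: $name})-[:ASSIGNED_TO]->(t:Task)<-[:DECIDED]-(m:Meeting)
RETURN m.note_file AS note_file, m.time AS time, t.description AS task
ORDER BY m.time
"""


def tasks_for_person(name: str) -> list[dict]:
    """Return one dict per (meeting, task) pair assigned to `name`. Hypothetical helper."""
    # Imported lazily so this module loads even when the driver is not installed.
    from neo4j import GraphDatabase

    auth = (os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"])
    with GraphDatabase.driver(os.environ["NEO4J_URI"], auth=auth) as driver:
        records, _summary, _keys = driver.execute_query(
            TASKS_FOR_PERSON,
            name=name,  # bound to $name; never interpolate user input into Cypher
            database_=os.environ.get("NEO4J_DATABASE", "neo4j"),
        )
        return [record.data() for record in records]
```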

To wipe the graph between runs:

MATCH (n) DETACH DELETE n

Incremental & continuous updates

Re-running cocoindex update main after editing a note only reprocesses the affected meeting sections — extraction is memoized per section, and the entity-resolution phase only re-runs when the set of raw person names changes. To keep the graph live as Drive files change, run in live mode:

cocoindex update -L main
