victoriacheng15/personal-reading-analytics

📚 Personal Reading Analytics

A self-built, fully automated data pipeline that transforms raw reading data into actionable insights and interactive visualizations with zero infrastructure. It runs entirely on GitHub Actions under CI/CD governance and uses MongoDB event sourcing for auditability.

Beyond standard charts, it performs an AI Delta Analysis to generate a qualitative weekly narrative, answering three specific questions:

  • Velocity: Are you reading faster or slower than usual?
  • Backlog Health: Are you clearing old debt (>1 year) or just adding new noise?
  • Chronology: Which specific years of content are you focusing on right now?

🔗 Live Analytics

👉 See Live Analytics


🌿 Engineering Principles

This project is built to reflect how I believe small, personal tools should work:

  • Zero infrastructure → No servers or hosting costs. Runs entirely on GitHub (Actions + Pages).
  • Fully automated → Scheduled GitHub Actions keep data fresh, with CI/CD governance for human-in-the-loop review before merging.
  • Observability first → Uses an Event Sourcing pattern (MongoDB) to decouple extraction from analytics, ensuring full auditability and health monitoring.
  • Cost-effective → Uses only free tiers (GitHub, Google Sheets API, MongoDB Atlas), proving powerful automation doesn’t require budget.
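The event-sourcing idea can be illustrated with a minimal sketch: the pipeline appends immutable events rather than mutating a single "current state" record, and current state is recovered by replaying the log. The event fields and types below are illustrative assumptions, not the project's actual MongoDB schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ArticleEvent:
    article_id: str
    event_type: str      # illustrative types: "discovered" or "read"
    occurred_at: datetime

def replay(events):
    """Fold the event log into the current read/unread state per article."""
    state = {}
    for e in sorted(events, key=lambda e: e.occurred_at):
        if e.event_type == "discovered":
            state.setdefault(e.article_id, "unread")
        elif e.event_type == "read":
            state[e.article_id] = "read"
    return state

log = [
    ArticleEvent("a1", "discovered", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ArticleEvent("a1", "read", datetime(2024, 2, 1, tzinfo=timezone.utc)),
    ArticleEvent("a2", "discovered", datetime(2024, 3, 1, tzinfo=timezone.utc)),
]
print(replay(log))  # {'a1': 'read', 'a2': 'unread'}
```

Because events are never overwritten, the extraction step and the analytics step can evolve independently, and any past state can be reconstructed for auditing.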

📚 Documentation

For all project documentation, including architectural diagrams, operational guides, and detailed schema specifications, please visit the Project Documentation. This central hub includes details on the external Observability Hub, which processes events from this pipeline (MongoDB to PostgreSQL) for Grafana visualization. Note that the Grafana instance itself is not publicly exposed.


🛠 Tech Stack

Go · Python · Google Sheets API · MongoDB · Google Gemini · Docker · GitHub Actions


📊 What It Shows

Key Metrics Section:

  • Total articles: Count of articles tracked across all currently supported sources
  • Read rate: Percentage of articles completed with visual highlighting
  • AI Delta Analysis: Multi-dimensional analysis of reading Velocity (pace), Backlog Health (clearing old debt vs. new noise), and Chronology (era of content focus) to provide narrative context beyond raw numbers.
  • Historical Archive: A permanent record of past weekly snapshots, accessible via a context-aware selector to track progress over time.
  • Reading statistics: Read count, unread count, and average articles per month
  • Highlight badges: Top read rate source, most unread source, current month's read articles
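The three dimensions of the AI Delta Analysis can be sketched as a comparison between two weekly snapshots. The snapshot keys here are illustrative assumptions; in the real pipeline the narrative itself is generated by Gemini, with logic along these lines providing the inputs.

```python
def delta_analysis(prev, curr):
    """Compare two weekly snapshots (dicts with illustrative keys)."""
    velocity = curr["read_this_week"] - prev["read_this_week"]
    backlog = curr["unread_over_1y"] - prev["unread_over_1y"]
    # Chronology: which publication year received the most reads this week.
    focus_year = max(curr["reads_by_year"], key=curr["reads_by_year"].get)
    return {
        "velocity": "faster" if velocity > 0 else "slower" if velocity < 0 else "steady",
        "backlog_health": "clearing" if backlog < 0 else "growing",
        "focus_year": focus_year,
    }

prev = {"read_this_week": 5, "unread_over_1y": 40, "reads_by_year": {2023: 3, 2024: 2}}
curr = {"read_this_week": 8, "unread_over_1y": 37, "reads_by_year": {2021: 5, 2024: 3}}
print(delta_analysis(prev, curr))
# {'velocity': 'faster', 'backlog_health': 'clearing', 'focus_year': 2021}
```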

7 Interactive Visualizations (Chart.js):

  1. Year Breakdown: Bar chart showing article distribution by publication year
  2. Read/Unread by Year: Stacked bar chart with reading progress across years
  3. Monthly Breakdown: Toggle between total articles (line chart) and by-source distribution (stacked bar)
  4. Read/Unread by Month: Seasonal reading patterns across all months
  5. Read/Unread by Source: Horizontal stacked bars comparing progress per provider
  6. Unread Age Distribution: Age buckets (<1 month, 1-3 months, 3-6 months, 6-12 months, >1 year)
  7. Unread by Year: Identifies which years have the most unread backlog
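The age buckets in chart 6 amount to a simple threshold function. This is a sketch under the assumption that bucket boundaries fall at 30/90/180/365 days; the actual cutoffs may differ.

```python
from datetime import date

def age_bucket(published: date, today: date) -> str:
    """Assign an unread article to one of the chart's age buckets."""
    days = (today - published).days
    if days < 30:
        return "<1 month"
    if days < 90:
        return "1-3 months"
    if days < 180:
        return "3-6 months"
    if days < 365:
        return "6-12 months"
    return ">1 year"

print(age_bucket(date(2024, 1, 1), date(2025, 6, 1)))  # >1 year
```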

Source Analytics:

  • Per-source statistics with read/unread split and read percentages
  • Substack per-author average calculation (total articles ÷ author count)
  • Top 3 oldest unread articles with clickable links, dates, and age calculations
  • Source metadata showing when each provider was added to tracking
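The per-source statistics above reduce to a small aggregation: a read/unread split, a read rate, and (for Substack) total articles ÷ author count. The tuple shape below is an illustrative assumption, not the project's real schema.

```python
from datetime import date

def source_stats(articles, author_count=None):
    """Per-source read/unread split, read rate, and optional per-author average.

    `articles` is a list of (read: bool, published: date) tuples.
    """
    read = sum(1 for is_read, _ in articles if is_read)
    stats = {
        "read": read,
        "unread": len(articles) - read,
        "read_rate": round(100 * read / len(articles), 1) if articles else 0.0,
    }
    if author_count:
        # Substack-style average: total articles divided by author count.
        stats["avg_per_author"] = round(len(articles) / author_count, 1)
    return stats

articles = [(True, date(2023, 5, 1)), (False, date(2022, 1, 1)), (True, date(2024, 3, 1))]
print(source_stats(articles, author_count=2))
# {'read': 2, 'unread': 1, 'read_rate': 66.7, 'avg_per_author': 1.5}
```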

📖 How This Project Evolved

Learn about the journey of this project: from local-only execution, to Docker containerization, to automated GitHub Actions workflows.


🚀 Ready to Explore?

Don't just take my word for it: explore the real data.

👉 Launch Personal Reading Analytics
