Git + Data Independence - Quick Answer

Your Question: Can I add this to git and keep data private?

✅ YES! Absolutely.


The Answer in One Image

YOUR PROJECT
│
├── CODE (goes in git - shareable) ✅
│   ├── akbs_ingest_markdown.py
│   ├── akbs_query.py
│   ├── test_akbs.py
│   ├── requirements.txt
│   ├── .gitignore ← This protects your data
│   └── *.md (documentation)
│
└── DATA (stays local - private) ❌
    ├── data/knowledge_db/  ← Never committed
    ├── processed-chapters/ ← Never committed
    └── raw-pdfs/           ← Never committed

How It Works

The .gitignore File Does All The Work

# When you run:
git add .
git commit -m "Initial commit"

# Git sees:
✅ Add: akbs_ingest_markdown.py
✅ Add: akbs_query.py  
✅ Add: requirements.txt
✅ Add: *.md files
❌ Skip: data/ (protected by .gitignore)
❌ Skip: processed-chapters/ (protected by .gitignore)
❌ Skip: *.pdf (protected by .gitignore)

Your textbook content never touches git.
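The add/skip decision above can be approximated in a few lines of Python. This is only a sketch: the pattern list is inferred from the skip list, not copied from the project's real .gitignore, and the matcher handles just two pattern shapes rather than Git's full ignore rules. The names `IGNORE_PATTERNS`, `is_ignored`, and `split_files` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns inferred from the skip list above; the project's
# real .gitignore may differ.
IGNORE_PATTERNS = ["data/", "processed-chapters/", "raw-pdfs/", "*.pdf"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if a repo-relative file path matches a simplified pattern.

    Only two shapes are handled: "name/" (a directory with that name anywhere
    above the file) and globs matched against the file name. Git's negation,
    anchoring, and "**" rules are not modelled.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignored if any parent directory has this name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            return True
    return False


def split_files(paths):
    """Partition paths into (added, skipped), mirroring the listing above."""
    added = [p for p in paths if not is_ignored(p)]
    skipped = [p for p in paths if is_ignored(p)]
    return added, skipped
```

For an authoritative answer about a specific path, ask Git itself with `git check-ignore`.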


Quick Start with Git

# 1. Initialize git
git init

# 2. Add code files (data is automatically ignored)
git add .gitignore
git add *.py *.md requirements.txt

# 3. Commit
git commit -m "Initial AKBS setup - code only"

# 4. Verify data is protected
git status
# Should NOT show: data/, processed-chapters/, *.pdf

# 5. (Optional) Push to GitHub
git branch -M main  # make sure the local branch is named main
git remote add origin https://github.com/yourusername/akbs.git
git push -u origin main
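Step 4 can also be automated before a push. The sketch below lists the files Git tracks and flags any that sit under a private directory. The directory names come from the project layout shown earlier. The function names are hypothetical, and nothing here is part of setup.sh.

```python
import subprocess

# Assumed private directories, taken from the project layout shown earlier.
PRIVATE_DIRS = ("data/", "processed-chapters/", "raw-pdfs/")


def find_leaks(tracked_paths, private_dirs=PRIVATE_DIRS):
    """Return tracked paths that sit under a private directory or are PDFs."""
    return [
        p for p in tracked_paths
        if p.startswith(private_dirs) or p.lower().endswith(".pdf")
    ]


def tracked_files(repo_dir="."):
    """List files tracked by git in repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],  # -z: NUL-separated, no path quoting
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [p for p in result.stdout.split("\0") if p]


def check_repo(repo_dir="."):
    """Print any tracked private files and return True when the repo is clean."""
    leaks = find_leaks(tracked_files(repo_dir))
    for path in leaks:
        print("Tracked private file: " + path)
    return not leaks
```

Calling `check_repo()` from the project root returns `False` if any private file has slipped into the index.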

Data Independence: Each User, Their Own Data

You Push to GitHub:

github.com/you/akbs/
├── akbs_ingest_markdown.py  ← Everyone gets this
├── akbs_query.py            ← Everyone gets this
├── requirements.txt         ← Everyone gets this
└── .gitignore               ← Everyone gets this

Someone Clones Your Repo:

them/akbs/
├── akbs_ingest_markdown.py  ← From your repo
├── akbs_query.py            ← From your repo
├── requirements.txt         ← From your repo
├── .gitignore               ← From your repo
└── THEY ADD THEIR OWN DATA:
    ├── data/                ← Their knowledge base
    └── processed-chapters/  ← Their textbooks

You Keep Using Locally:

you/akbs/
├── akbs_ingest_markdown.py  ← In git
├── akbs_query.py            ← In git
├── requirements.txt         ← In git
├── .gitignore               ← In git
└── YOUR PRIVATE DATA:
    ├── data/                ← NOT in git
    └── processed-chapters/  ← NOT in git
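Keeping each user's data independent also means the code should not hard-code one person's paths. The source does not show how the AKBS scripts locate their data, so the following is only an illustration. The environment variable `AKBS_DATA_DIR` and the function `resolve_data_dir` are hypothetical, and the default path mirrors the data/knowledge_db layout above.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None, repo_root="."):
    """Pick the local data directory without hard-coding a user-specific path.

    AKBS_DATA_DIR is a hypothetical variable name; the fallback mirrors the
    data/knowledge_db layout shown in the project tree.
    """
    env = os.environ if env is None else env
    override = env.get("AKBS_DATA_DIR")
    if override:
        # A user-supplied location wins, e.g. an external drive.
        return Path(override).expanduser()
    return Path(repo_root) / "data" / "knowledge_db"
```

With a pattern like this, every clone runs the same code against whatever data directory exists on that machine.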

Benefits

✅ Share code openly - Help others build knowledge bases
✅ Keep data private - Your textbooks stay on your machine
✅ Collaborate on code - Others can improve the ingestion pipeline
✅ No copyright issues - Code is yours, data never uploaded
✅ Small repo size - Code is ~50KB, data could be GBs
✅ Portable - Works with anyone's data


Test It Works

# Run the setup script
./setup.sh

# Check git status
git status

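# Try to add data (Git should refuse)
git add data/
# Git warns that the path is ignored by one of your .gitignore files
# and adds nothing unless you force it with -f

# Verify protection
git check-ignore data/knowledge_db
# Output: data/knowledge_db (the path is printed only if it is ignored)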

Moving Data Between Machines

Your data is independent, so move it separately:

# DON'T use git for data
# DO use direct transfer:

# Option 1: rsync
rsync -av data/ pi@raspberrypi:/home/pi/akbs/data/

# Option 2: scp
scp -r data/ pi@raspberrypi:/home/pi/akbs/

# Option 3: USB drive
cp -r data/ /media/usb/akbs-backup/

# Option 4: Zip and transfer
tar -czf my-knowledge-base.tar.gz data/
# Send via email/Dropbox/etc
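For Option 4, recording a checksum with the archive lets the receiving machine confirm the knowledge base arrived intact. This is a minimal sketch using only the Python standard library. The function names are hypothetical and not part of the AKBS scripts.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_data(data_dir, archive_path):
    """Pack data_dir into a gzip-compressed tarball and return its SHA-256."""
    data_dir = Path(data_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, e.g. "data/...".
        tar.add(data_dir, arcname=data_dir.name)
    return sha256_of(archive_path)


def verify_archive(archive_path, expected_digest):
    """Return True if the archive on this machine matches the sender's digest."""
    return sha256_of(archive_path) == expected_digest
```

Send the digest returned by `archive_data` together with the archive, then call `verify_archive` on the destination before unpacking.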

Why This Architecture Matters

For Your Multi-Project Vision:

├── Sensor System (Raspberry Pi)
│   ├── Code (from git)
│   └── Data (copied separately)
│
├── Teaching Interface  
│   ├── Code (from git)
│   └── Data (same data, copied)
│
├── Simulation Platform
│   ├── Code (from git)
│   └── Data (same data, copied)
│
└── Development Machine
    ├── Code (from git - can push improvements)
    └── Data (master copy)

Code flows through git. Data flows through direct transfer.


Common Questions

Q: Can someone who clones my repo see my textbooks? A: No, provided .gitignore was in place before your first commit, so the textbooks were never added to git.

Q: Can I commit some example data? A: Yes! Create example-data/ and DON'T add it to .gitignore.

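Q: What if I accidentally committed data? A: Stop tracking it and commit the removal:

git rm -r --cached data/
git commit -m "Remove data"

This leaves the files on disk and keeps them out of future commits, but earlier commits still contain them. If the data was already pushed, rewrite the history (for example with git filter-repo) before treating it as private.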

Q: Can I version control my data separately? A: Yes! Use a separate git repo or backup system for data.

Q: How do I share my knowledge base with a colleague? A: Send the data/ directory directly (zip, rsync, etc), not through git.


Files You Got

Already included:

  • .gitignore - Protects your data automatically
  • GIT_STRATEGY.md - Detailed git workflow
  • setup.sh - Script to verify separation

Summary

What                 Git?    Why
Code                 ✅ Yes  Share and collaborate
Docs                 ✅ Yes  Help others understand
.gitignore           ✅ Yes  Protect everyone's data
Your data            ❌ No   Private, copyrighted, large
Your PDFs            ❌ No   Private, copyrighted
Your knowledge base  ❌ No   Regenerate from source

Start Now

# You're ready to go:
git init
git add .gitignore *.py *.md requirements.txt
git commit -m "Initial AKBS - code only, data protected"

# Your data is safe. Your code is shareable. ✨

The architecture is designed for exactly this: shareable code, private data. 🎯