Upload any CCTV footage: detect violence, identify weapons, and get a threat assessment
Powered by spatio-temporal deep learning, real-time object detection, and annotated video output, all in one platform.
VIGIL.AI is a production-ready AI surveillance platform that combines spatio-temporal deep learning with real-time object detection to detect violence and weapons in CCTV footage.
A user uploads a video clip (a security camera feed or recorded footage) and the system:
- Preprocesses the video using FFmpeg for format normalization
- Detects weapons using a fine-tuned YOLOv8 model on every other frame
- Classifies violence using R3D-18 (3D ResNet-18) with a 16-frame sliding window
- Returns annotated output with bounding boxes, labels, and a structured threat assessment
Think of it as a real-time AI security analyst that watches footage so humans don't have to.
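The preprocessing step can be sketched as a thin wrapper around the FFmpeg CLI. This is only an illustration, not the code in backend/model.py: the function names `build_ffmpeg_cmd` and `normalize_video` are hypothetical, and only the libx264 / yuv420p targets come from this README.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that re-encodes src to H.264 with yuv420p pixels."""
    return [
        "ffmpeg",
        "-y",                   # overwrite dst if it already exists
        "-i", src,              # input file
        "-c:v", "libx264",      # H.264 video encoder
        "-pix_fmt", "yuv420p",  # pixel format most players and browsers accept
        dst,
    ]


def normalize_video(src: str, dst: str) -> Path:
    """Run FFmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Building the command as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.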
CCTV Video Input (.mp4)
           │
           ▼
┌─────────────────────┐
│  FFmpeg Preprocess  │  Re-encode → yuv420p / libx264
│  Format Normalize   │  Ensures compatibility across all inputs
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  YOLOv8 Detector    │  Runs every 2nd frame (YOLO_STRIDE=2)
│  Weapon Detection   │  Knife · Handgun · Rifle · Launcher
│                     │  Permanent activation on first detection
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  R3D-18 Classifier  │  16-frame sliding clip window
│  Violence Detection │  4-class softmax output
│                     │  5-frame majority-vote smoothing
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  app.py             │  FastAPI backend + Streamlit dark UI
│  Web Interface      │  Annotated video + threat assessment
└─────────────────────┘
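How the YOLO stride and the 16-frame window interleave can be sketched with plain frame indices. The sketch assumes YOLO runs on even-numbered frames and that the clip window advances one frame at a time once it is full; the real pipeline may step differently. `schedule` is a hypothetical helper, and frame indices stand in for real frames.

```python
from collections import deque

CLIP_LEN = 16     # frames per R3D-18 clip (from this README)
YOLO_STRIDE = 2   # run YOLO on every 2nd frame (from this README)


def schedule(num_frames: int):
    """Yield (frame_idx, run_yolo, clip) for each frame.

    clip is a tuple of the last CLIP_LEN frame indices once the buffer
    is full, otherwise None.
    """
    window = deque(maxlen=CLIP_LEN)
    for idx in range(num_frames):
        window.append(idx)
        run_yolo = idx % YOLO_STRIDE == 0
        clip = tuple(window) if len(window) == CLIP_LEN else None
        yield idx, run_yolo, clip


steps = list(schedule(20))
yolo_frames = [i for i, run, _ in steps if run]
first_clip_at = next(i for i, _, clip in steps if clip is not None)
```

With 20 frames, YOLO runs on frames 0, 2, ..., 18, and the first full clip becomes available at frame index 15, once 16 frames have been buffered.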
| Component | Technology | Purpose |
|---|---|---|
| Violence Classifier | R3D-18 / 3D ResNet-18 (PyTorch) | Spatio-temporal violence classification |
| Weapon Detector | YOLOv8 (Ultralytics) | Fine-tuned knife, gun, rifle, launcher detection |
| Video Processing | OpenCV + FFmpeg | Frame extraction, annotation, and encoding |
| Backend | FastAPI + Uvicorn | REST API for model inference |
| Smoothing | Majority-vote (5-frame window) | Prevent flickering predictions |
| Framework | PyTorch + torchvision | Model training and inference |
| UI | Streamlit | Web interface |
Violence-Detection-in-CCTV/
│
├── violence-app/
│   ├── backend/
│   │   ├── app.py               ← FastAPI server, /predict/ endpoint
│   │   ├── model.py             ← Full inference pipeline (R3D-18 + YOLOv8)
│   │   ├── processed_videos/    ← Annotated output videos
│   │   └── temp_videos/         ← Uploaded input videos (temp)
│   │
│   ├── frontend/
│   │   └── ui.py                ← VIGIL.AI Streamlit interface
│   │
│   ├── live_model.py            ← Live webcam inference (optional)
│   └── requirements.txt
│
├── README.md
└── LICENSE
git clone https://github.com/ash-iiiiish/Violence-Detetion-in-CCTV
cd Violence-Detetion-in-CCTV/violence-app
python -m venv venv
# Windows
venv\Scripts\activate
# Mac / Linux
source venv/bin/activate
pip install -r requirements.txt

FFmpeg must be installed and available in your system PATH:
# Windows (via Chocolatey)
choco install ffmpeg
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg

In backend/model.py, update the paths to your local model weights:
MODEL_PATH = "path/to/best-violence.pth" # R3D-18 checkpoint
YOLO_PATH = "path/to/best-yolo.pt"  # YOLOv8 weights

Start the backend:
cd backend
uvicorn app:app --reload

Start the frontend (in a new terminal):
cd frontend
streamlit run ui.py

Open http://localhost:8501 in your browser.
⚠️ Both servers must be running simultaneously.
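A quick way to confirm that both servers are up is to probe their TCP ports. The sketch below is an optional convenience, not part of the project: `port_open` and `servers_status` are hypothetical helpers, and the ports are the ones this README uses (8000 for the uvicorn backend, 8501 for Streamlit).

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def servers_status() -> dict:
    """Report whether the backend and frontend ports accept connections."""
    return {
        "backend": port_open("127.0.0.1", 8000),
        "frontend": port_open("127.0.0.1", 8501),
    }
```

If `servers_status()` reports `False` for the backend, start uvicorn before opening the Streamlit page.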
- Multimodal input: Upload any .mp4 CCTV video clip
- Violence classification: R3D-18 classifies footage across 4 distinct categories
- Weapon detection: YOLOv8 identifies knives, handguns, rifles, and launchers
- Annotated output: Bounding boxes, labels, and confidence overlays on every frame
- Threat assessment: Structured scoring with SAFE / HIGH / CRITICAL levels
- Spatio-temporal understanding: 3D-CNN processes 16-frame clips to capture motion context
- Majority-vote smoothing: 5-frame voting window prevents flickering predictions
- Permanent weapon mode: Once a weapon is detected, the label stays active for continuity
- YOLO stride optimization: Weapon detection runs every 2nd frame for performance
- FFmpeg preprocessing: Auto-converts any input to a compatible format before inference
- Dark premium theme: Professional surveillance-grade interface
- Globe inference loader: Animated loader while pipeline runs
- Confidence bar: Visual softmax confidence score display
- Threat level badge: NONE / HIGH / CRITICAL color-coded assessment
- Annotated video playback: Watch processed output with bounding boxes directly in browser
| Class | Description | Threat Level |
|---|---|---|
| NonFight | No violent activity detected | SAFE |
| Fight | Physical altercation between subjects | HIGH |
| HockeyFight | Sport-context violent confrontation | HIGH |
| MovieFight | Scripted / cinematic fight sequence | MED |
| Weaponized | Knife · Handgun · Rifle · Launcher detected | CRITICAL |
| Model | Strength | Role |
|---|---|---|
| YOLOv8 (2D) | Fast, frame-level spatial detection | Weapon localization |
| R3D-18 (3D) | Understands motion across time | Violence classification |
| Combined | Spatial + temporal coverage | Weapon localization + violence classification |
Weapon detected → "Weaponized - <ViolenceClass>" [RED / CRITICAL]
Fight class     → "<FightClass>"                 [ORANGE / HIGH]
NonFight        → "NonFight"                     [GREEN / SAFE]
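The three rules above can be written as a small function. This is a sketch, not the project's implementation: `final_label` and `THREAT_BY_CLASS` are hypothetical names, and the per-class threat levels (including MED for MovieFight) are taken from the class table in this README rather than from the code.

```python
THREAT_BY_CLASS = {
    "NonFight": "SAFE",
    "Fight": "HIGH",
    "HockeyFight": "HIGH",
    "MovieFight": "MED",
}


def final_label(weapon_detected: bool, violence_class: str) -> dict:
    """Combine the weapon flag and the violence class into one display label."""
    if violence_class not in THREAT_BY_CLASS:
        raise ValueError(f"unknown violence class: {violence_class}")
    if weapon_detected:
        # A weapon overrides everything else and is always CRITICAL.
        return {
            "label": f"Weaponized - {violence_class}",
            "color": "RED",
            "threat": "CRITICAL",
        }
    if violence_class == "NonFight":
        return {"label": "NonFight", "color": "GREEN", "threat": "SAFE"}
    # Any fight class without a weapon.
    return {
        "label": violence_class,
        "color": "ORANGE",
        "threat": THREAT_BY_CLASS[violence_class],
    }
```

For example, `final_label(True, "Fight")` yields the label "Weaponized - Fight", which matches the sample JSON response shown later in this README.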
Input Video
    ↓
FFmpeg normalization
    ↓
Per-frame: YOLOv8 → weapon boxes + confidence
    ↓
Per-clip: R3D-18 (16 frames) → violence class + softmax score
    ↓
Majority vote (5-frame window) → smoothed label
    ↓
Final label logic → annotated video + JSON response
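The "violence class + softmax score" step turns the classifier's four raw outputs (logits) into a class name and a confidence. The sketch below uses plain Python instead of PyTorch to keep it self-contained; the class order in `CLASSES` and the percentage scaling are assumptions, and `classify` is a hypothetical helper.

```python
import math

# Assumed class order; the real index-to-name mapping lives in the training code.
CLASSES = ["NonFight", "Fight", "HockeyFight", "MovieFight"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (class_name, confidence_percent) for one clip's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], round(probs[best] * 100, 2)
```

Four equal logits give every class a probability of 0.25, so `classify([0.0, 0.0, 0.0, 0.0])` returns the first class with a confidence of 25.0.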
All key parameters are in backend/model.py:
| Variable | Default | Description |
|---|---|---|
| IMG_SIZE | 112 | Frame resize resolution for R3D-18 |
| CLIP_LEN | 16 | Frames per 3D-CNN inference window |
| YOLO_STRIDE | 2 | Run YOLO every N frames (performance) |
| WEAPON_CONF_THRESHOLD | 0.5 | Minimum YOLO confidence to flag a weapon |
| WEAPON_RELAX_FRAMES | 30 | Frames before weapon mode can deactivate |
| VIOLENCE_SMOOTH_COUNT | 5 | Majority-vote window size |
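One possible reading of how WEAPON_CONF_THRESHOLD and WEAPON_RELAX_FRAMES interact is sketched below: a detection at or above the threshold switches weapon mode on, and the mode can switch off only after the configured number of consecutive frames without a detection. This is an assumption based on the parameter descriptions, not a copy of model.py; the `WeaponMode` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeaponMode:
    conf_threshold: float = 0.5   # WEAPON_CONF_THRESHOLD default
    relax_frames: int = 30        # WEAPON_RELAX_FRAMES default
    active: bool = False
    frames_since_hit: int = 0

    def update(self, best_conf: float) -> bool:
        """Feed the highest weapon confidence for one frame (0.0 if none)."""
        if best_conf >= self.conf_threshold:
            # A confident detection (re)activates the mode and resets the timer.
            self.active = True
            self.frames_since_hit = 0
        elif self.active:
            self.frames_since_hit += 1
            if self.frames_since_hit >= self.relax_frames:
                self.active = False
        return self.active
```

Under this reading, a very large `relax_frames` value would make the mode effectively permanent for the length of a clip.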
Add new weapon classes by fine-tuning YOLOv8 on a custom dataset:
yolo train model=yolov8n.pt data=custom_weapons.yaml epochs=50 imgsz=640

Upgrade the violence classifier for more categories:
# In model.py: update NUM_CLASSES and retrain R3D-18
NUM_CLASSES = 6  # e.g. add "Robbery", "Vandalism"

Enable live webcam inference:
python live_model.py

Hit the REST API directly:
curl -X POST "http://127.0.0.1:8000/predict/" \
  -F "file=@your_video.mp4"

Expected JSON response:
{
"prediction": "Weaponized - Fight",
"confidence": 97.43,
"video_url": "http://127.0.0.1:8000/videos/processed_1234567890.mp4"
}

| Error | Fix |
|---|---|
| Model not loading | Update MODEL_PATH and YOLO_PATH in model.py |
| CUDA not available | Use CPU mode or reinstall PyTorch with CUDA support |
| Video not opening | Ensure FFmpeg is installed and available in system PATH |
| Backend 500 error | Check terminal logs from uvicorn for traceback |
| Frontend can't connect | Start uvicorn app:app --reload before launching Streamlit |
| Output video won't play | FFmpeg will re-encode to libx264 / yuv420p automatically |
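Two of the fixes above (FFmpeg on PATH, correct weight paths) can be checked before starting the backend. The following is a small optional sketch; `preflight` is a hypothetical helper and is not part of the repository.

```python
import shutil
from pathlib import Path


def preflight(model_path: str, yolo_path: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    for name, path in (("MODEL_PATH", model_path), ("YOLO_PATH", yolo_path)):
        if not Path(path).is_file():
            problems.append(f"{name} does not point to a file: {path}")
    return problems
```

Calling `preflight(MODEL_PATH, YOLO_PATH)` with the values from model.py and printing the result gives a quick summary of what still needs fixing.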
Contributions are welcome! Fork this repository and submit a pull request.
- Fork the repository
- Create your feature branch: git checkout -b feature/your-feature
- Commit your changes: git commit -m "Add your feature"
- Push to the branch: git push origin feature/your-feature
- Submit a pull request
This project is licensed under the MIT License; see the LICENSE file for details.
⭐ If you found VIGIL.AI useful, please consider giving the repo a star
Built with PyTorch · R3D-18 · YOLOv8 · FastAPI · Streamlit



