LoRA fine-tuned Qwen3-8B for security log analysis, threat detection, anomaly classification, and incident triage.
SecLM transforms a general-purpose LLM into a cybersecurity analyst that understands syslog, firewall logs, cloud audit trails, WAF events, EDR telemetry, and SIEM alerts, and can classify threats, explain attack patterns, and recommend response actions.
| Capability | Description |
|---|---|
| Log Classification | Classifies raw log entries as benign, suspicious, or malicious |
| Threat Detection | Identifies attack patterns: brute force, lateral movement, C2 callbacks, data exfiltration, privilege escalation |
| MITRE ATT&CK Mapping | Maps detected threats to MITRE ATT&CK tactics, techniques, and sub-techniques |
| Anomaly Explanation | Explains why a log entry or sequence is anomalous in plain language |
| Incident Triage | Assigns severity (Critical / High / Medium / Low / Info) with reasoning |
| IOC Extraction | Extracts indicators of compromise: IPs, domains, hashes, file paths, user agents |
| Response Recommendations | Suggests containment and remediation actions based on the detected threat |
| Multi-Source Analysis | Understands logs from Linux syslog, Windows Event Logs, AWS CloudTrail, firewall/IDS, WAF, Kubernetes audit, and more |
Input:
Analyze this log entry:
Mar 15 03:42:18 web-prod-03 sshd[28451]: Failed password for root from 185.220.101.42 port 44831 ssh2
Mar 15 03:42:19 web-prod-03 sshd[28453]: Failed password for root from 185.220.101.42 port 44832 ssh2
Mar 15 03:42:19 web-prod-03 sshd[28455]: Failed password for admin from 185.220.101.42 port 44833 ssh2
Mar 15 03:42:20 web-prod-03 sshd[28457]: Accepted password for admin from 185.220.101.42 port 44834 ssh2
SecLM Output:
Classification: MALICIOUS
Severity: CRITICAL
Attack Type: SSH Brute Force - Successful Compromise
MITRE ATT&CK:
- Tactic: Credential Access (TA0006)
- Technique: Brute Force (T1110)
- Sub-technique: Password Guessing (T1110.001)
Analysis:
Rapid sequential failed SSH login attempts targeting multiple accounts (root, admin)
from a single source IP 185.220.101.42, followed by a successful authentication
for the 'admin' account. The 1-second interval between attempts indicates automated
tooling. The source IP is associated with known Tor exit nodes.
IOCs:
- Source IP: 185.220.101.42 (Tor exit node)
- Targeted accounts: root, admin
- Target host: web-prod-03
Recommended Actions:
1. IMMEDIATE: Disable the 'admin' account on web-prod-03
2. IMMEDIATE: Block 185.220.101.42 at firewall level
3. INVESTIGATE: Check for post-compromise activity: new processes, cron jobs,
   SSH keys added, lateral movement attempts
4. HARDEN: Disable password authentication, enforce key-based SSH only
5. MONITOR: Enable enhanced logging on web-prod-03 for the next 72 hours
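
Downstream tooling usually needs this report as structured data rather than free text. The following is a minimal sketch of one way to parse it, assuming the model keeps the field names shown in the sample above. `parse_seclm_output` and the two field lists are hypothetical helpers, not part of the repository.

```python
import re

# Field names copied from the sample report; adjust if your prompts differ.
SCALAR_FIELDS = ("Classification", "Severity", "Attack Type")
SECTION_FIELDS = ("MITRE ATT&CK", "Analysis", "IOCs", "Recommended Actions")


def parse_seclm_output(text: str) -> dict:
    """Split a SecLM-style report into scalar fields and multi-line sections."""
    result = {}
    current = None  # name of the section currently being collected
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in SCALAR_FIELDS:
            result[key] = value.strip()
            current = None
        elif sep and key in SECTION_FIELDS:
            # A section header may carry inline text ("Analysis: ...") or not.
            result[key] = [value.strip()] if value.strip() else []
            current = key
        elif current is not None:
            # Drop list markers such as "- " or "1. " from section lines.
            result[current].append(re.sub(r"^(?:-|\d+\.)\s*", "", line))
    return result
```

Scalar fields come back as strings and sections as lists of lines, so a wrapped line (such as the continuation of action 3) becomes its own list entry. Treat missing keys as a signal that the model drifted from the expected format.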
┌─────────────────────────────────────────────────────┐
│                   SecLM Pipeline                    │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Raw Logs ──▶ Preprocessing ──▶ SecLM (Qwen3-8B     │
│  (syslog,     (normalize,       + LoRA adapter)     │
│  CloudTrail,   chunk,                               │
│  WAF, EDR,     enrich)      ──▶ Structured Output   │
│  k8s audit)                     (classification,    │
│                                  MITRE mapping,     │
│                                  IOCs, actions)     │
│                                                     │
└─────────────────────────────────────────────────────┘
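
The preprocessing stage in the diagram can be sketched roughly as follows. This is an illustration only: `normalize`, `chunk`, `parse_syslog`, and the regular expression are hypothetical and cover classic BSD-style syslog lines like the sshd example above. The enrichment step is not shown.

```python
import re
from typing import Iterable, Iterator, Optional

# Classic syslog layout: "Mar 15 03:42:18 host process[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s(?P<process>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s(?P<message>.*)$"
)


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Trim whitespace, collapse internal runs of spaces, and drop empty lines."""
    for line in lines:
        cleaned = " ".join(line.split())
        if cleaned:
            yield cleaned


def chunk(lines: Iterable[str], max_lines: int = 20) -> Iterator[list]:
    """Group consecutive lines into chunks of at most max_lines entries."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == max_lines:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, chunk
        yield batch


def parse_syslog(line: str) -> Optional[dict]:
    """Return timestamp, host, process, pid, and message, or None if no match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None
```

Each chunk can then be joined with newlines and placed in the `input` field of a prompt. The chunk size of 20 is an arbitrary default; choose it so the prompt stays within the sequence length used in training.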
- Base Model: Qwen3-8B (Apache 2.0)
- Fine-Tuning Method: LoRA (rank 16-64) / QLoRA for budget GPUs
- Training Data: Security log datasets with expert-labeled threat classifications
| Source | Log Types |
|---|---|
| Linux | syslog, auth.log, kern.log, journald |
| Windows | Security Event Log (4624, 4625, 4688, 4720, etc.), PowerShell logs |
| Cloud (AWS) | CloudTrail, VPC Flow Logs, GuardDuty findings, WAF logs |
| Cloud (Azure) | Activity Log, NSG Flow Logs, Entra ID sign-in logs |
| Cloud (GCP) | Cloud Audit Logs, VPC Flow Logs |
| Network | Firewall logs (iptables, pf, Palo Alto, Fortinet), IDS/IPS (Snort, Suricata) |
| Web | Apache/Nginx access logs, WAF logs (ModSecurity, AWS WAF, Cloudflare) |
| Kubernetes | Audit logs, Falco alerts, OPA/Gatekeeper denials |
| Endpoint | EDR telemetry (CrowdStrike, SentinelOne, Wazuh) |
| SIEM | Pre-parsed alerts from Splunk, Elastic SIEM, QRadar |
| Category | Examples |
|---|---|
| Brute Force & Credential Attacks | SSH brute force, password spraying, credential stuffing |
| Lateral Movement | Pass-the-hash, RDP pivoting, SMB relay |
| Privilege Escalation | sudo abuse, token manipulation, kernel exploits |
| Command & Control | Beacon callbacks, DNS tunneling, encoded PowerShell |
| Data Exfiltration | Unusual outbound transfers, DNS exfil, cloud storage abuse |
| Web Attacks | SQLi, XSS, RFI/LFI, path traversal, SSRF |
| Persistence | Cron jobs, systemd services, registry run keys, SSH authorized_keys |
| Reconnaissance | Port scanning, service enumeration, directory brute forcing |
| Cloud-Specific | IAM key abuse, S3 bucket exposure, security group modifications |
| Container/K8s | Privileged container escape, RBAC bypass, crypto mining pods |
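
Several of these categories correspond to MITRE ATT&CK tactics, and a model can pair a tactic name with the wrong identifier. A lookup table is a cheap sanity check on generated reports. The sketch below lists the Enterprise tactic IDs as I understand them; verify them against the current ATT&CK matrix before relying on it. `check_tactic` is a hypothetical helper that assumes the `Tactic: Name (TAxxxx)` layout used in this README.

```python
import re

# Enterprise ATT&CK tactics (verify against the current matrix before use).
ATTACK_TACTICS = {
    "TA0043": "Reconnaissance",
    "TA0042": "Resource Development",
    "TA0001": "Initial Access",
    "TA0002": "Execution",
    "TA0003": "Persistence",
    "TA0004": "Privilege Escalation",
    "TA0005": "Defense Evasion",
    "TA0006": "Credential Access",
    "TA0007": "Discovery",
    "TA0008": "Lateral Movement",
    "TA0009": "Collection",
    "TA0011": "Command and Control",
    "TA0010": "Exfiltration",
    "TA0040": "Impact",
}

# Matches e.g. "- Tactic: Credential Access (TA0006)"
TACTIC_RE = re.compile(r"Tactic:\s*(?P<name>[^()]+?)\s*\((?P<id>TA\d{4})\)")


def check_tactic(line: str) -> bool:
    """Return True if the tactic name on the line matches its TA identifier."""
    match = TACTIC_RE.search(line)
    if match is None:
        return False
    expected = ATTACK_TACTICS.get(match["id"], "")
    return expected.lower() == match["name"].lower()
```

A failed check does not mean the detection is wrong, only that the label and ID disagree and the report deserves a second look.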
Training data follows a structured instruction format with security-specific fields:
[
{
  "instruction": "You are SecLM, a cybersecurity log analysis expert. Analyze the following log entry and provide: classification (BENIGN/SUSPICIOUS/MALICIOUS), severity, attack type, MITRE ATT&CK mapping, analysis, IOCs, and recommended actions.",
  "input": "Mar 15 03:42:18 web-prod-03 sshd[28451]: Failed password for root from 185.220.101.42 port 44831 ssh2\nMar 15 03:42:19 web-prod-03 sshd[28453]: Failed password for root from 185.220.101.42 port 44832 ssh2",
  "output": "Classification: MALICIOUS\nSeverity: HIGH\nAttack Type: SSH Brute Force\n\nMITRE ATT&CK:\n- Tactic: Credential Access (TA0006)\n- Technique: Brute Force (T1110)\n- Sub-technique: Password Guessing (T1110.001)\n\nAnalysis: Rapid repeated SSH authentication failures targeting the root account from a single source IP with sequential ports indicate an automated brute force attack.\n\nIOCs:\n- Source IP: 185.220.101.42\n- Target: root@web-prod-03\n\nRecommended Actions:\n1. Block source IP at firewall\n2. Verify root login is disabled in sshd_config\n3. Check if any attempt succeeded"
}
]

| Source | Type | Description |
|---|---|---|
| CICIDS2017 | Network IDS | Labeled network traffic with multiple attack types |
| CSE-CIC-IDS2018 | Network IDS | Updated version with more attack scenarios |
| LANL Unified Host and Network | Auth + Network | Approximately 90 days of host and network events from Los Alamos National Laboratory |
| Splunk Boss of the SOC (BOTS) | SIEM | Realistic SOC scenario datasets |
| Custom labeled logs | Various | Your own labeled production logs (recommended) |
| Synthetic generation | Various | LLM-generated log scenarios for data augmentation |
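
Whichever sources you combine, it helps to check that every record follows the instruction format shown earlier before starting a training run. The following is a minimal sketch. `validate_records` and `validate_file` are hypothetical helpers, and the rule that `output` must begin with a `Classification:` line is an assumption drawn from the example record, not a documented requirement of `train.py`.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")
ALLOWED_CLASSES = {"BENIGN", "SUSPICIOUS", "MALICIOUS"}


def validate_records(records: list) -> list:
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(rec.get(key), str) or not rec[key].strip():
                problems.append(f"record {i}: missing or empty '{key}'")
        output = rec.get("output")
        if isinstance(output, str) and output.strip():
            first_line = output.strip().splitlines()[0]
            label = first_line.partition(":")[2].strip()
            if not first_line.startswith("Classification:") or label not in ALLOWED_CLASSES:
                problems.append(
                    f"record {i}: output should start with a valid 'Classification:' line"
                )
    return problems


def validate_file(path: str) -> list:
    """Load a JSON dataset file and validate every record in it."""
    with open(path, encoding="utf-8") as fh:
        return validate_records(json.load(fh))
```

For example, `validate_file("data/example_security_logs.json")` would return an empty list if every record in that file is well-formed under these assumptions.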
| Component | Minimum (QLoRA) | Recommended (LoRA) |
|---|---|---|
| GPU VRAM | 10 GB | 24 GB |
| System RAM | 16 GB | 32 GB |
| Disk | 30 GB | 50 GB |
| Python | 3.10 | 3.11 |
| CUDA | 12.1 | 12.4 |
Tested GPUs: NVIDIA A10 (24GB), RTX 4090, RTX 3090, A100
# Clone
git clone https://github.com/<your-username>/seclm-log-threat-detection.git
cd seclm-log-threat-detection
# Install
pip install -r requirements.txt
pip install flash-attn --no-build-isolation # optional, recommended
# Test with example security logs
python train.py
# Train with your labeled log dataset
python train.py --dataset data/your_labeled_logs.json --epochs 3
# QLoRA mode (budget GPUs: RTX 3090, T4)
python train.py --dataset data/your_labeled_logs.json --use_4bit
# Higher capacity for complex threat detection
python train.py --dataset data/your_labeled_logs.json --lora_rank 64 --max_seq_length 2048

| Use Case | GPU | Command |
|---|---|---|
| Quick test | Any 24GB | python train.py |
| Production (A10) | A10 24GB | python train.py --dataset data/logs.json --lora_rank 32 --epochs 3 |
| Budget training | T4 16GB | python train.py --dataset data/logs.json --use_4bit --batch_size 1 --max_seq_length 512 |
| Max quality | A100 80GB | python train.py --dataset data/logs.json --lora_rank 64 --batch_size 4 --max_seq_length 2048 |
| Config | VRAM | Notes |
|---|---|---|
| LoRA fp16, rank 16 | ~18-20 GB | Default, good for most log analysis tasks |
| LoRA fp16, rank 64 | ~20-22 GB | Better for complex multi-source correlation |
| QLoRA 4-bit, rank 16 | ~10-12 GB | Budget option, slight quality tradeoff |
| QLoRA 4-bit, rank 64 | ~12-14 GB | Best quality on budget hardware |
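
A back-of-the-envelope calculation suggests why quantization moves these numbers far more than rank does. The sketch below is arithmetic only, not a measurement. The parameter count and layer dimensions are my assumptions about Qwen3-8B, and it assumes adapters on the q/k/v/o attention projections only, which may not match what `train.py` targets.

```python
# All constants are illustrative assumptions; check the model config before use.
PARAMS = 8.2e9   # approximate total parameter count
HIDDEN = 4096    # hidden size
Q_OUT = 4096     # query/output projection width (32 heads x 128)
KV_OUT = 1024    # key/value projection width (8 KV heads x 128)
LAYERS = 36      # number of transformer layers


def weight_gib(bits_per_param: float, params: float = PARAMS) -> float:
    """Memory for the frozen base weights alone, in GiB."""
    return params * bits_per_param / 8 / 2**30


def lora_params(rank: int) -> int:
    """Trainable parameters when adapting only the q, k, v, and o projections.

    A rank-r adapter on a (d_in x d_out) matrix adds r * (d_in + d_out) weights.
    """
    shapes = [(HIDDEN, Q_OUT), (HIDDEN, KV_OUT), (HIDDEN, KV_OUT), (Q_OUT, HIDDEN)]
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * LAYERS


if __name__ == "__main__":
    print(f"16-bit base weights: {weight_gib(16):.1f} GiB")
    print(f"4-bit base weights:  {weight_gib(4):.1f} GiB")
    print(f"LoRA rank 16: {lora_params(16):,} trainable parameters")
    print(f"LoRA rank 64: {lora_params(64):,} trainable parameters")
```

Under these assumptions the adapters are a small fraction of the base weights at either rank. The remainder of the VRAM in the table goes to activations, optimizer state, and framework overhead, which this sketch does not model.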
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-8B", device_map="auto", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base_model, "./output/<run_name>/final")
tokenizer = AutoTokenizer.from_pretrained("./output/<run_name>/final")

merged = model.merge_and_unload()
merged.save_pretrained("./seclm-production")
tokenizer.save_pretrained("./seclm-production")
# Deploy with vLLM, TGI, or Ollama; no PEFT dependency needed

┌──────────┐     ┌──────────────┐     ┌───────────┐     ┌──────────────┐
│   Log    │────▶│ Preprocessing│────▶│   SecLM   │────▶│   Alert /    │
│ Sources  │     │  & Batching  │     │ Inference │     │ SOAR Action  │
│          │     │              │     │  (vLLM)   │     │              │
│ • Syslog │     │ • Normalize  │     │           │     │ • PagerDuty  │
│ • Cloud  │     │ • Chunk      │     │ Returns:  │     │ • Slack      │
│ • WAF    │     │ • Filter     │     │ • Class   │     │ • JIRA       │
│ • EDR    │     │   noise      │     │ • MITRE   │     │ • Block IP   │
│ • K8s    │     │              │     │ • IOCs    │     │ • Isolate    │
└──────────┘     └──────────────┘     └───────────┘     └──────────────┘
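
The last stage of this pipeline, turning a classification and severity into an alert or SOAR action, can be sketched as a small routing table. Everything here is hypothetical: the `Finding` type, the action names, and the policy are placeholders, and real integrations with PagerDuty, Slack, or JIRA would sit behind these names.

```python
from dataclasses import dataclass

# Hypothetical policy: which notification steps each severity triggers.
ROUTES = {
    "CRITICAL": ["page_on_call", "open_ticket", "notify_chat"],
    "HIGH": ["open_ticket", "notify_chat"],
    "MEDIUM": ["notify_chat"],
    "LOW": [],
    "INFO": [],
}


@dataclass
class Finding:
    classification: str  # BENIGN / SUSPICIOUS / MALICIOUS
    severity: str        # Critical / High / Medium / Low / Info
    host: str


def route(finding: Finding) -> list:
    """Return the ordered list of actions to take for one finding."""
    if finding.classification.upper() == "BENIGN":
        return []
    actions = ROUTES.get(finding.severity.upper())
    if actions is None:
        # Unknown severity label: fail safe by asking for human review.
        return ["manual_review"]
    return list(actions)
```

The sketch deliberately routes only to notification steps. Disruptive actions such as blocking an IP or isolating a host are better gated behind analyst approval, since the model can produce false positives.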
seclm-log-threat-detection/
├── train.py                          # Main fine-tuning script
├── requirements.txt                  # Python dependencies
├── README.md                         # This file
├── LICENSE                           # Apache 2.0
├── .gitignore                        # Git ignore rules
├── data/
│   ├── example_security_logs.json    # Example labeled security logs
│   └── README.md                     # Dataset preparation guide
├── configs/
│   ├── lora_a10.yaml                 # Config for A10 GPU
│   ├── qlora_t4.yaml                 # Config for T4 GPU (budget)
│   └── lora_a100.yaml                # Config for A100 GPU
└── output/                           # Training outputs (git-ignored)
| Platform | GPU | $/hr | 5K log samples | 50K log samples |
|---|---|---|---|---|
| Vast.ai | RTX 4090 | $0.25-0.40 | ~$1-2 | ~$5-10 |
| Vast.ai | A100 80GB | $0.75-1.35 | ~$1-3 | ~$8-15 |
| AWS | g5.xlarge (A10G) | $1.00-1.20 | ~$3-5 | ~$15-25 |
| RunPod | RTX 4090 | $0.35-0.44 | ~$1-3 | ~$6-12 |
- LoRA / QLoRA fine-tuning on Qwen3-8B
- Multi-source log format support
- MITRE ATT&CK mapping
- Evaluation benchmarks (detection accuracy, false positive rate)
- DPO alignment for reducing false positives
- Multi-GPU training support (FSDP / DeepSpeed)
- Streaming inference pipeline for real-time log analysis
- Pre-built dataset generation scripts (synthetic + public datasets)
- GGUF export for on-prem deployment via llama.cpp
- vLLM inference server with REST API
- Splunk / Elastic SIEM plugin
- Grafana dashboard for detection metrics
Contributions welcome! High-impact areas:
- Labeled datasets: share anonymized, labeled log samples
- Log parsers: add parsers for new log sources
- Evaluation: build benchmarks for detection accuracy
- Integration: connectors for SIEM platforms
- Documentation: dataset preparation guides, deployment tutorials
SecLM is a research and analysis tool. It is not a replacement for production security monitoring systems, SIEM platforms, or professional SOC analysts. Always validate findings with your security team before taking action. The model may produce false positives or miss threats. Use as a supplementary analysis layer, not as a sole detection mechanism.
Apache 2.0: free for commercial and non-commercial use.
- Qwen Team (Alibaba): Qwen3-8B base model
- Hugging Face: Transformers, PEFT, TRL
- MITRE ATT&CK: Threat classification framework
- Canadian Institute for Cybersecurity: Public IDS datasets
If this helps your security operations, give it a ⭐!