This guide helps diagnose and fix common live reduction problems.
Check these locations in order when things go wrong:
flowchart TD
Start{"What's the problem?"}
Start -->|"Service won't start"| CheckStatus["systemctl status livereduce<br/>Shows immediate failure cause"]
CheckStatus --> CheckJournal["sudo journalctl -u livereduce -n 100<br/>Shows detailed startup errors"]
CheckJournal --> Causes1["Common causes:<br/>• Bad JSON in /etc/livereduce.conf<br/>• Missing snsdata user<br/>• No processing scripts found<br/>• Permission issues"]
Start -->|"Service running but not processing"| CheckLog["tail -f /var/log/SNS_applications/livereduce.log<br/>Look for connection/script errors"]
CheckLog --> CheckInstrument["Is instrument actually running?<br/>Verify with instrument staff"]
CheckInstrument --> CheckNetwork["Network connectivity to DAS<br/>telnet/ping DAS host"]
Start -->|"Memory errors or frequent restarts"| CheckMemory["grep 'Memory' /var/log/.../livereduce.log<br/>Shows memory threshold violations"]
CheckMemory --> Solutions["Solutions:<br/>• Increase system_mem_limit_perc<br/>• Set PreserveEvents=False in scripts<br/>• Use accum_method: Replace"]
systemctl status livereduce
Look for:
- active (running) = service is up
- failed or inactive = service crashed or stopped
- Recent log lines at bottom

tail -f /var/log/SNS_applications/livereduce.log
Or systemd journal:
sudo journalctl -u livereduce -f
Shows:
- Configuration loading
- Script detection
- Connection to DAS
- Processing errors
- Memory warnings
- Restart events
tail -f /var/log/SNS_applications/livereduce_watchdog.log
Shows:
- When watchdog detected inactivity
- Last 20 lines of main log before restart
- Service restart actions
Some post-processing scripts create their own logs:
ls /SNS/INSTR/shared/livereduce/*.log
tail -f /SNS/INSTR/shared/livereduce/INSTR_live_reduction.log

# Check for systemd issues
sudo journalctl -xe
# Check for memory/disk issues
dmesg | tail
df -h
free -h

Symptom: systemctl start livereduce fails immediately
Check 1: Configuration file syntax
# Validate JSON
cat /etc/livereduce.conf | jq .
If jq reports an error, fix the JSON syntax.
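Where jq is not installed, the same syntax check can be done with Python's standard library. The following is a minimal sketch, not part of livereduce; the function names `check_json_text` and `check_json_file` are hypothetical, and the default path is the one used in this guide.

```python
import json


def check_json_text(text):
    """Return None if text parses as JSON, else a 'line N, column M: reason' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def check_json_file(path="/etc/livereduce.conf"):
    """Read a config file and report its first JSON syntax error, if any."""
    with open(path, encoding="utf-8") as handle:
        return check_json_text(handle.read())
```

Calling `check_json_file()` on the host returns `None` for a syntactically valid file, or the position of the first syntax error. It only checks syntax; it does not know which keys livereduce expects.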
Check 2: Missing snsdata user
# Check user exists
id snsdata
If not, create it:
sudo useradd -r -g users -G hfiradmin snsdata
Check 3: Missing conda environment
# List environments
conda env list
Ensure the specified environment exists with Mantid installed.
Check 4: Processing scripts not found
# Check default location
ls -la /SNS/INSTR/shared/livereduce/reduce_*
At least one script must exist.
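The same check can be scripted, which also catches scripts that exist but are empty. This is an illustrative sketch only; `find_reduce_scripts` is a hypothetical helper, and the `reduce_*` pattern is taken from the `ls` command above rather than from the daemon's own lookup logic.

```python
from pathlib import Path


def find_reduce_scripts(script_dir):
    """Return (usable, empty) lists of reduce_* file names found in script_dir."""
    usable, empty = [], []
    for path in sorted(Path(script_dir).glob("reduce_*")):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        if path.stat().st_size > 0:
            usable.append(path.name)
        else:
            empty.append(path.name)  # present but 0 bytes
    return usable, empty
```

For example, `find_reduce_scripts("/SNS/INSTR/shared/livereduce")` returns two lists; an empty first list means no usable script was found in that directory.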
Check 5: Permissions
# Service runs as snsdata
sudo -u snsdata ls -la /SNS/INSTR/shared/livereduce/
Ensure snsdata can read the scripts.
Get specific error:
sudo journalctl -u livereduce -n 100

Symptom: Status is "active" but no new log entries or output files
Check 1: DAS connection
# Look for connection errors
grep -i "connection\|listener\|timeout" /var/log/SNS_applications/livereduce.log
Check 2: Active run
- Verify instrument is running
- Check with instrument scientists
Check 3: Processing script errors
# Look for Python tracebacks
grep -i "error\|exception\|traceback" /var/log/SNS_applications/livereduce.log
Check 4: Watchdog restart loop
# Check for repeated restarts
grep "restarting" /var/log/SNS_applications/livereduce_watchdog.log
Solutions:
- Test scripts locally with fake server
- Check network connectivity to DAS
- Verify instrument is collecting data
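The grep checks above can be combined into one pass that also reports line numbers. The sketch below is an assumption-laden convenience, not part of livereduce: `find_problem_lines` is a hypothetical name, and the pattern simply mirrors the grep expression used in Check 3.

```python
import re

# Same keywords as the grep in Check 3, matched case-insensitively.
PROBLEM_PATTERN = re.compile(r"error|exception|traceback", re.IGNORECASE)


def find_problem_lines(lines, pattern=PROBLEM_PATTERN):
    """Return (line_number, text) pairs for every line that matches pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

Pass it an open file, for example `find_problem_lines(open("/var/log/SNS_applications/livereduce.log"))`, to get the matching lines with their positions in the log.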
Symptom: Logs show "Memory usage exceeds limit", frequent restarts
Cause: Scripts preserve events or accumulate too much data
Solution 1: Adjust threshold (temporary)
{
  "system_mem_limit_perc": 80
}
Solution 2: Disable event preservation
In processing scripts, change:
# From:
Rebin(..., PreserveEvents=True)
# To:
Rebin(..., PreserveEvents=False)
Or in config:
{
  "preserve_events": false
}
Solution 3: Use Replace accumulation
{
  "accum_method": "Replace"
}
Solution 4: Load fewer spectra
{
  "spectra": [0, 100, 200]
}

Symptom: Python tracebacks in logs, processing fails
Common Error 1: Undefined variable
# Wrong - 'input' and 'output' are pre-defined; "input_ws" is not
InputWorkspace="input_ws"  # This name doesn't exist!
# Correct
InputWorkspace=input
OutputWorkspace=output
Common Error 2: Workspace doesn't exist
# Add check
from mantid.simpleapi import mtd
if mtd.doesExist("my_workspace"):
    ws = mtd["my_workspace"]  # Use it
else:
    print("Workspace not found")
Common Error 3: Invalid algorithm parameters
# Check algorithm docs
python3 -c "from mantid.simpleapi import Rebin; help(Rebin)"
Common Error 4: File permissions
# Ensure directory exists and is writable
import os
output_dir = "/SNS/INSTR/shared/livereduce"
os.makedirs(output_dir, exist_ok=True)
if not os.access(output_dir, os.W_OK):
    print(f"Cannot write to {output_dir}")
Solution: Test scripts interactively in Workbench before deploying

Symptom: Service starts, runs briefly, restarts continuously
Normal Case: "Run paused/resumed" spam
These are NOT problems:
2026-01-21 09:40:13 - Mantid - INFO - Run paused
2026-01-21 09:40:13 - Mantid - INFO - Run resumed
Indicates:
- Alignment scans
- Polarized beam spin state changes
- Rocking curve measurements
- Normal operations
Problem Case 1: Scripts too slow
- Watchdog sees no log entries for threshold seconds
- Fix: Optimize scripts or increase threshold:
{"watchdog": {"threshold": 600}}
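To judge whether a given threshold is too tight, it can help to measure how long the main log has been quiet. The sketch below approximates "no log entries" by the log file's modification time; that is an assumption for illustration, and the real watchdog may detect inactivity differently. The function names are hypothetical.

```python
import os
import time


def seconds_since_last_write(log_path, now=None):
    """Seconds elapsed since log_path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path)


def is_stalled(log_path, threshold_sec, now=None):
    """True if the log has been quiet for longer than threshold_sec."""
    return seconds_since_last_write(log_path, now) > threshold_sec
```

Running `seconds_since_last_write("/var/log/SNS_applications/livereduce.log")` a few times during normal processing gives a feel for typical gaps between log entries, which is a reasonable starting point for choosing a threshold.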
Problem Case 2: Memory crashes
- Service crashes from memory issues
- Fix: Address memory configuration (see above)
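To see how close a host is to a percentage limit such as `system_mem_limit_perc`, the figures in `/proc/meminfo` can be turned into a percentage. This is a rough sketch under stated assumptions: it treats "used" as `MemTotal - MemAvailable`, which may not match how livereduce itself measures memory, and both function names are hypothetical.

```python
def used_memory_percent(meminfo_text):
    """Compute used memory as a percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def exceeds_limit(meminfo_text, limit_perc):
    """True if used memory is above limit_perc (for example 70 or 80)."""
    return used_memory_percent(meminfo_text) > limit_perc
```

On a Linux host, `used_memory_percent(open("/proc/meminfo").read())` gives the current figure to compare against the configured limit.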
Problem Case 3: DAS connection drops
- Network issues causing disconnects
- Fix: Check network, contact facility IT
Symptom: Updated script but behavior unchanged
Check 1: File actually changed
# Check modification time
ls -l /SNS/INSTR/shared/livereduce/reduce_*
# Compare MD5
md5sum /SNS/INSTR/shared/livereduce/reduce_*
Check 2: Daemon detected change
# Look for restart message
grep "changed - restarting" /var/log/SNS_applications/livereduce.log
Check 3: Permissions
# Ensure snsdata can read
sudo -u snsdata cat /SNS/INSTR/shared/livereduce/reduce_INSTR_live_proc.py
Solution: Force restart
sudo systemctl restart livereduce

Stop service and run manually for full output:
# Stop service
sudo systemctl stop livereduce
# Run as snsdata user
sudo -u snsdata bash
cd /tmp
python3 /usr/bin/livereduce.py /etc/livereduce.conf
Benefits:
- See all output in console
- Python tracebacks appear immediately
- Easy to interrupt (Ctrl+C)
- Better for debugging script errors
Use test infrastructure to simulate live data:
# Terminal 1: Start fake server
cd /path/to/livereduce/repo
pixi shell
python test/fake_server.py
# Terminal 2: Run livereduce
pixi shell
python scripts/livereduce.py test/fake.conf
# Terminal 3: Watch logs
tail -f livereduce.log
Benefits:
- Test without real instrument
- Rapid iteration on scripts
- Verify configuration changes
- Test memory monitoring
Daemon detects script changes via MD5 checksums:
# See what daemon calculated
grep "md5" /var/log/SNS_applications/livereduce.log
# Calculate manually
md5sum /SNS/INSTR/shared/livereduce/reduce_*
If scripts are not updating:
- Verify the file changed
- Check permissions
- Ensure inotify is working
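The manual `md5sum` comparison can also be done from Python, for instance to record a checksum before and after an edit. This is a minimal sketch using the standard `hashlib` module; `md5_of_file` is a hypothetical helper, and the daemon's own checksum code may differ in detail.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result has the same form as the first column of `md5sum` output, so the two can be compared directly. If the digest is identical before and after an edit, the file content did not change and no restart should be expected.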
Watch memory in real-time:
# Overall system
watch -n 1 free -h
# Specific process
watch -n 1 "ps aux | grep livereduce | grep -v grep"
# Interactive monitoring
htop -p $(pgrep -f livereduce.py)
Configure appropriately:
{
  "system_mem_limit_perc": 70,
  "mem_check_interval_sec": 2
}

Test network connectivity:
# For TCP listeners
telnet bl-dassrv1.facility.gov 31415
# Check firewall
sudo iptables -L -n | grep <port>
# DNS resolution
nslookup bl-dassrv1.facility.gov
# Ping test
ping -c 3 bl-dassrv1.facility.gov

# Start
sudo systemctl start livereduce
# Stop
sudo systemctl stop livereduce
# Restart
sudo systemctl restart livereduce
# Status
systemctl status livereduce
sudo systemctl status livereduce # More details
# Enable at boot
sudo systemctl enable livereduce
# Disable at boot
sudo systemctl disable livereduce
Restart when:
- Configuration file changed (required)
- Service shows "failed"
- Making routine updates
- Testing new scripts
Investigate before restarting when:
- Service "active" but not working
- Repeated auto-restarts
- Memory/disk issues suspected
- New scripts just deployed (auto-restarts)
# All snsdata processes
ps -u snsdata -o pid,etime,stat,command
# Process tree
pstree -p $(pgrep -f livereduce.py)
# Open files
sudo lsof -p $(pgrep -f livereduce.py)
# Network connections
sudo netstat -tnp | grep livereduce

Watchdog is independent:
# Watchdog operations
sudo systemctl start livereduce_watchdog
sudo systemctl stop livereduce_watchdog
systemctl status livereduce_watchdog
# Stopping watchdog doesn't affect main service
Disable watchdog when:
- Doing maintenance
- Testing interactively
- Investigating restart issues
- Too aggressive for workload
Enable watchdog when:
- Production operation
- Unattended running
- Service has stalling issues
- Want automatic recovery
# Check size
ls -lh /var/log/SNS_applications/livereduce.log
# Rotate manually
sudo logrotate -f /etc/logrotate.d/livereduce
# Truncate
sudo truncate -s 0 /var/log/SNS_applications/livereduce.log
Set up rotation (/etc/logrotate.d/livereduce):
/var/log/SNS_applications/livereduce.log {
daily
rotate 7
compress
missingok
notifempty
}
Cause: instrument not in config and not in /etc/mantid.local.properties
Fix: Add to config:
{"instrument": "POWGEN"}

Cause: Script file exists but is 0 bytes
Fix: Check file has content:
ls -l /SNS/INSTR/shared/livereduce/reduce_*

Cause: Neither proc nor post_proc script exists
Fix: Create at least one:
touch /SNS/INSTR/shared/livereduce/reduce_INSTR_live_proc.py
# Add script content

Cause: Cannot connect to DAS
Fix:
- Check network connectivity
- Verify DAS is running
- Check firewall rules
- Contact facility IT
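Where telnet is not available, a TCP connection attempt from Python gives the same yes/no answer about reachability. The sketch below uses only the standard `socket` module; `can_connect` is a hypothetical helper, and the host and port must be replaced with the values for your DAS.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `False` result does not say which layer failed, so it is still worth following up with the DNS and firewall checks when the connection cannot be made.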
Cause: snsdata user can't access files
Fix: Check permissions:
sudo chown -R snsdata:users /SNS/INSTR/shared/livereduce/
sudo chmod -R 755 /SNS/INSTR/shared/livereduce/

When asking for help, provide:
- Service status:
  systemctl status livereduce
- Recent logs:
  sudo journalctl -u livereduce -n 200 > livereduce_logs.txt
- Configuration:
  cat /etc/livereduce.conf
- Script locations:
  ls -la /SNS/INSTR/shared/livereduce/
- System info:
  free -h
  df -h
  uname -a

- GitHub Issues: https://github.com/mantidproject/livereduce/issues
- Mantid Help: https://mantidproject.org/help
- Facility IT: Contact your facility's support team
- Architecture - System design
- Developer Guide - Setup procedures
- Processing Scripts - Script troubleshooting
- Configuration Reference - Configuration options