Critical Safety Fixes for Plex Backup Script

Overview

Analysis of the backup script identified several critical safety issues that require immediate attention. Although the script is functional (contrary to what initial static analysis suggested), it contains dangerous operations that can cause data corruption and service instability.

Critical Issues Identified

1. Dangerous Force-Kill Operations (Lines 1276-1297)

Issue: Script uses pkill -KILL (SIGKILL) to force-terminate Plex processes

# DANGEROUS CODE:
sudo pkill -KILL -f "Plex Media Server" 2>/dev/null || true

Risk:

  • Can cause database corruption if Plex is writing to the database when it is killed
  • May leave incomplete transactions and WAL files in an inconsistent state
  • No opportunity for graceful cleanup of resources
  • Can corrupt metadata and configuration files

Impact: Database corruption requiring complex recovery procedures

2. Insufficient Synchronization in Service Operations

Issue: Race conditions between service start/stop operations

# PROBLEMATIC: Inadequate wait times
sleep 2  # Too short for reliable synchronization

Risk:

  • Service restart operations may overlap
  • Database operations may conflict with service startup
  • Backup operations may begin before service fully stops

3. Database Repair Safety Issues

Issue: Auto-repair operations without proper safeguards

  • Repair operations run automatically without sufficient validation
  • Inadequate backup of corrupted data before repair attempts
  • Force-stop operations during database repairs increase corruption risk

Real-World Impact Observed

During testing, these issues caused:

  1. Actual database corruption requiring manual intervention
  2. Service startup failures after database repair attempts
  3. Loss of schema integrity when using aggressive repair methods

Safety Improvements Required

1. Remove Force-Kill Operations

Replace dangerous pkill -KILL with:

  • Extended graceful shutdown timeouts
  • Proper service dependency management
  • Safe fallback procedures without force termination
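
The replacement described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function names, the `plexmediaserver` unit name, and the 60-second default timeout are all assumptions chosen for illustration. The fallback sends SIGTERM, which Plex can still catch, and the function aborts rather than escalating to SIGKILL.

```shell
#!/usr/bin/env bash
# Sketch only: names, unit name, and timeouts are assumptions, not the script's real values.

PLEX_SERVICE="${PLEX_SERVICE:-plexmediaserver}"   # assumed systemd unit name

# Succeeds while a Plex process is still alive.
plex_is_running() {
    pgrep -f "Plex Media Server" >/dev/null 2>&1
}

# Poll "$@" (a command that succeeds while the process is alive) once per second.
# Returns 0 as soon as the command fails (process gone), 1 if the timeout expires.
wait_until_stopped() {
    local timeout="$1"; shift
    local elapsed=0
    while "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Stop Plex without ever sending SIGKILL.
stop_plex_gracefully() {
    local timeout="${1:-60}"

    sudo systemctl stop "$PLEX_SERVICE" || return 1
    if wait_until_stopped "$timeout" plex_is_running; then
        return 0
    fi

    # Fallback: SIGTERM is still catchable, so Plex can flush and close its database.
    sudo pkill -TERM -f "Plex Media Server" 2>/dev/null || true
    if wait_until_stopped "$timeout" plex_is_running; then
        return 0
    fi

    echo "ERROR: Plex did not stop within the timeout; aborting instead of force-killing" >&2
    return 1
}
```

A caller would treat a non-zero return from `stop_plex_gracefully` as a reason to skip the backup or repair entirely, leaving the running service untouched.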

2. Implement Proper Synchronization

  • Increase wait timeouts for critical operations
  • Add service readiness checks before proceeding
  • Implement proper error recovery without dangerous shortcuts
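
One way to replace fixed `sleep` calls with a readiness check is to poll until the service actually responds. The sketch below is illustrative only: the function names are hypothetical, and it assumes the `plexmediaserver` systemd unit, the default Plex port 32400, and the `/identity` endpoint as the readiness probe. Adjust these to match the real installation.

```shell
#!/usr/bin/env bash
# Sketch only: names, port, endpoint, and timeouts are assumptions.

PLEX_SERVICE="${PLEX_SERVICE:-plexmediaserver}"               # assumed systemd unit name
PLEX_URL="${PLEX_URL:-http://localhost:32400/identity}"       # assumed readiness endpoint

# Poll "$@" once per second until it succeeds.
# Returns 0 on success, 1 if the timeout (in seconds) expires first.
wait_for_ready() {
    local timeout="$1"; shift
    local elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Ready means: systemd reports the unit active AND the HTTP endpoint answers.
plex_is_ready() {
    systemctl is-active --quiet "$PLEX_SERVICE" &&
        curl -fsS --max-time 5 "$PLEX_URL" >/dev/null 2>&1
}

# Start Plex and block until it is ready, or fail after the timeout.
start_plex_and_wait() {
    local timeout="${1:-120}"
    sudo systemctl start "$PLEX_SERVICE" || return 1
    if ! wait_for_ready "$timeout" plex_is_ready; then
        echo "ERROR: Plex not ready after ${timeout}s" >&2
        return 1
    fi
}
```

Polling on an observable condition means the wait is as short as the service allows and as long as it needs, up to the timeout, which is the property a fixed `sleep 2` cannot provide.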

3. Enhanced Database Safety

  • Mandatory corruption backups before ANY repair attempt
  • Read-only integrity checks before deciding on repair strategy
  • Never attempt repairs while the service might be running
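
The three safeguards above can be combined into a single pre-repair gate. The following is a hedged sketch under stated assumptions: all function names are hypothetical, the stock `sqlite3` CLI stands in for whatever SQLite binary the script really uses, and the backup directory layout is invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names, paths, and the use of the stock sqlite3 CLI are assumptions.

# Copy the database and any WAL/SHM sidecar files into a timestamped directory.
# Prints the destination directory on success.
backup_before_repair() {
    local db="$1" backup_root="$2"
    local stamp dest f

    [ -f "$db" ] || { echo "ERROR: database not found: $db" >&2; return 1; }
    stamp="$(date +%Y%m%d_%H%M%S)"
    dest="$backup_root/pre-repair_$stamp"
    mkdir -p "$dest" || return 1

    for f in "$db" "$db-wal" "$db-shm"; do
        [ -f "$f" ] || continue
        cp -p "$f" "$dest/" || return 1
        # Verify the copy byte-for-byte before trusting it.
        cmp -s "$f" "$dest/$(basename "$f")" || return 1
    done
    echo "$dest"
}

# Read-only integrity check: succeeds only if SQLite reports exactly "ok".
db_is_healthy() {
    local db="$1" result
    result="$(sqlite3 -readonly "$db" "PRAGMA integrity_check;" 2>&1)" || return 1
    [ "$result" = "ok" ]
}

# Gate for any repair: returns 0 only when a repair is both needed and safe to start.
prepare_for_repair() {
    local db="$1" backup_root="$2"

    if pgrep -f "Plex Media Server" >/dev/null 2>&1; then
        echo "ERROR: Plex is still running; refusing to touch the database" >&2
        return 1
    fi
    if db_is_healthy "$db"; then
        echo "Database passed the integrity check; no repair needed"
        return 2
    fi
    backup_before_repair "$db" "$backup_root" >/dev/null || {
        echo "ERROR: could not back up the corrupted database; aborting repair" >&2
        return 1
    }
}
```

With this shape, a repair routine would run only when `prepare_for_repair` returns 0, so a failed backup or a still-running service stops the repair before any write happens.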

Implementation Priority

  1. URGENT: Remove all pkill -KILL operations
  2. HIGH: Increase service operation timeouts
  3. HIGH: Add comprehensive pre-repair validation
  4. MEDIUM: Implement safer fallback procedures

Long-term Recommendations

  1. Separate backup operations from repair operations
  2. Implement a more conservative repair strategy
  3. Add comprehensive testing of all service management operations
  4. Implement proper error recovery procedures

File Status

  • Current script: /home/acedanger/shell/plex/backup-plex.sh (NEEDS SAFETY FIXES)
  • Service status: Plex is running with a corrupted database (functional but risky)
  • Backup system: Functional but contains dangerous operations

Next Steps

  1. Implement safer service management functions
  2. Test service operations thoroughly
  3. Validate database repair procedures
  4. Update all related scripts to use safe service management