Comprehensive guide to monitoring system health, maintaining optimal performance, and proactively troubleshooting your Koha deployment.
System Health Monitoring
Quick Health Check
Connect to your instance via EC2 Instance Connect and run:
# System services status
sudo systemctl status koha-plack
sudo systemctl status koha-worker
sudo systemctl status koha-zebra-daemon
sudo systemctl status apache2
sudo systemctl status mysql # Basic/Standard only
# Disk space
df -h /
df -h /var/lib/mysql # Basic/Standard only
# Memory usage
free -h
# CPU load
uptime
Expected output:
- All services: active (running)
- Disk usage: < 80%
- Memory: At least 20% free
- Load average: < number of CPU cores
AWS CloudWatch Monitoring
Enable Detailed Monitoring
All tiers include basic CloudWatch metrics. For enhanced monitoring:
- Go to EC2 Console
- Select your instance
- Actions → Monitor and troubleshoot → Manage detailed monitoring
- Enable (additional charges apply)
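Detailed monitoring can also be toggled from the command line; a minimal sketch, assuming the AWS CLI is configured and i-xxxxx is a placeholder for your instance ID:
# Enable 1-minute detailed monitoring (additional charges apply)
aws ec2 monitor-instances --instance-ids i-xxxxx
# Revert to basic 5-minute monitoring
aws ec2 unmonitor-instances --instance-ids i-xxxxx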
Key Metrics to Watch
EC2 Instance Metrics
CPU Utilization:
- Normal: 10-40% average
- High: 60-80% (consider scaling up)
- Critical: > 90% sustained
Network I/O:
- Monitor for unusual spikes
- Basic tier: Typically 1-10 MB/min
- Enterprise tier: Can be higher with multiple instances
Disk I/O:
- Read/Write Operations
- High sustained I/O may indicate:
- Database queries need optimization
- Insufficient memory (swapping)
- Need for SSD volumes
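To spot-check these metrics without opening the console, CloudWatch can be queried directly; a sketch assuming the AWS CLI is configured and i-xxxxx is a placeholder for your instance ID:
# Average CPU utilization over the last 24 hours, in 1-hour buckets
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-xxxxx \
--start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--period 3600 \
--statistics Average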
Database Metrics (Enterprise Aurora Only)
CPU Utilization:
- Normal: < 50%
- Review: 50-80%
- Scale: > 80% sustained
Connections:
- Monitor connection count
- Default max: 100 (adjustable)
- High connections may indicate connection pooling issues
Aurora Capacity Units (ACU):
- Monitor scaling events
- Adjust min/max ACU if frequent scaling
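These Aurora metrics are published to CloudWatch under the AWS/RDS namespace and can be pulled from the CLI; a sketch where your-cluster is a placeholder, and where ServerlessDatabaseCapacity is assumed to be the Serverless v2 capacity metric name:
# Database connections over the last 6 hours
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBClusterIdentifier,Value=your-cluster \
--start-time "$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--period 300 \
--statistics Average
# Swap --metric-name for ServerlessDatabaseCapacity to review ACU usage over time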
CloudWatch Alarms Setup
Create CPU alarm:
aws cloudwatch put-metric-alarm \
--alarm-name koha-high-cpu \
--alarm-description "Alert when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--dimensions Name=InstanceId,Value=i-xxxxx
# Replace i-xxxxx with your instance ID
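As written, the alarm records state changes but does not notify anyone. To receive email alerts, attach it to an SNS topic; a sketch in which the topic name koha-alerts, REGION, and ACCOUNT_ID are placeholders:
# Create a topic and subscribe an email address (confirm the subscription email AWS sends)
aws sns create-topic --name koha-alerts
aws sns subscribe \
--topic-arn arn:aws:sns:REGION:ACCOUNT_ID:koha-alerts \
--protocol email \
--notification-endpoint admin@yourlibrary.org
# Then re-run put-metric-alarm with the extra option:
# --alarm-actions arn:aws:sns:REGION:ACCOUNT_ID:koha-alerts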
Create disk space alarm:
# First, install the unified CloudWatch agent (if the package is not available in your
# distribution's repositories, download the .deb package per the AWS CloudWatch agent docs)
sudo apt-get install -y amazon-cloudwatch-agent
# Configure agent to monitor disk
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/config.json > /dev/null << EOF
{
"metrics": {
"namespace": "Koha/DiskSpace",
"metrics_collected": {
"disk": {
"measurement": [
{"name": "used_percent", "unit": "Percent"}
],
"metrics_collection_interval": 60,
"resources": {
"*": "*"
}
}
}
}
}
EOF
# Start agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
-a fetch-config \
-m ec2 \
-s \
-c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json
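After starting the agent, confirm that it is running and that metrics are arriving; a quick check (the custom namespace may take a few minutes to appear):
# Agent service status
sudo systemctl status amazon-cloudwatch-agent
# Confirm the custom namespace is publishing metrics
aws cloudwatch list-metrics --namespace Koha/DiskSpace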
Log Monitoring
Log Locations
Koha application logs:
/var/log/koha/library/intranet-error.log # Staff interface errors
/var/log/koha/library/opac-error.log # OPAC errors
/var/log/koha/library/plack.log # Plack application server
/var/log/koha/library/plack-error.log # Plack errors
System logs:
/var/log/apache2/error.log # Apache errors
/var/log/apache2/access.log # Access logs
/var/log/mysql/error.log # MySQL errors (Basic/Standard)
/var/log/syslog # System messages
Real-Time Log Monitoring
# Watch all Koha errors
sudo tail -f /var/log/koha/library/*error*.log
# Watch Apache errors
sudo tail -f /var/log/apache2/error.log
# Watch database errors (Basic/Standard)
sudo tail -f /var/log/mysql/error.log
# Search for specific errors
sudo grep -i "error" /var/log/koha/library/*.log | tail -20
sudo grep -i "fatal" /var/log/koha/library/*.log | tail -20
Log Analysis
Check for common issues:
# Database connection errors
sudo grep -c "DBI connect" /var/log/koha/library/*error*.log
# Memory exhaustion
sudo grep -c "Out of memory" /var/log/syslog
# Permission errors
sudo grep -c "Permission denied" /var/log/koha/library/*.log
# HTTP 500 errors (status codes are recorded in the access log)
sudo grep -c " 500 " /var/log/apache2/access.log
Log Rotation
Logs are automatically rotated. Check configuration:
# Koha log rotation
cat /etc/logrotate.d/koha-common
# Apache log rotation
cat /etc/logrotate.d/apache2
Typical configuration:
- Rotate: Daily
- Compress: Yes
- Retention: 14 days
- Size limit: 100M per file
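If you need to adjust these defaults, they correspond to a logrotate stanza along the following lines; this is an illustrative sketch, not the configuration shipped with the packages:
/var/log/koha/library/*.log {
daily
rotate 14
maxsize 100M
compress
delaycompress
missingok
notifempty
copytruncate
}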
CloudWatch Logs Integration (Optional)
Install CloudWatch Logs agent:
# Install the legacy CloudWatch Logs agent (packaged as awslogs on Amazon Linux;
# on Debian/Ubuntu the unified CloudWatch agent's logs section is an alternative)
sudo apt-get install -y awslogs
# Configure
sudo tee /etc/awslogs/config/koha.conf > /dev/null << EOF
[/var/log/koha/library/intranet-error.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /var/log/koha/library/intranet-error.log
buffer_duration = 5000
log_stream_name = {instance_id}/koha-intranet-error
initial_position = start_of_file
log_group_name = /koha/application
[/var/log/apache2/error.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /var/log/apache2/error.log
buffer_duration = 5000
log_stream_name = {instance_id}/apache-error
initial_position = start_of_file
log_group_name = /koha/apache
EOF
# Start service
sudo systemctl start awslogsd
sudo systemctl enable awslogsd
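Once the service is running, verify that events are reaching CloudWatch Logs; a quick check assuming the log group names configured above (aws logs tail requires AWS CLI v2):
# List the Koha log groups
aws logs describe-log-groups --log-group-name-prefix /koha
# Tail recent events from the application log group
aws logs tail /koha/application --since 1h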
Performance Monitoring
Database Performance (Basic/Standard)
Check slow queries:
# Enable slow query log (GLOBAL settings apply to new connections and are lost on restart;
# add them to the MySQL configuration to make them permanent)
sudo mysql -e "SET GLOBAL slow_query_log = 'ON';"
sudo mysql -e "SET GLOBAL long_query_time = 2;" # Log queries > 2 seconds
# Confirm the slow log location, then summarize it
sudo mysql -e "SHOW VARIABLES LIKE 'slow_query_log_file';"
sudo mysqldumpslow /var/log/mysql/mysql-slow.log | head -20
Monitor database size:
# Database size
sudo mysql -e "
SELECT
table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
WHERE table_schema = 'koha_library'
GROUP BY table_schema;
"
# Largest tables
sudo mysql -e "
SELECT
table_name AS 'Table',
ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
WHERE table_schema = 'koha_library'
ORDER BY (data_length + index_length) DESC
LIMIT 10;
"
Database Performance (Enterprise Aurora)
Monitor from RDS Console:
- Go to RDS Console
- Select your Aurora cluster
- Click “Monitoring” tab
- Review:
- CPU utilization
- Database connections
- Read/Write IOPS
- Network throughput
Performance Insights:
- Go to RDS Console → Performance Insights
- Analyze slow queries
- Identify bottlenecks
- Review wait events
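Performance Insights can also be confirmed from the CLI; a small sketch that lists whether it is enabled on each database instance in the account:
# Check that Performance Insights is enabled on the Aurora instances
aws rds describe-db-instances \
--query 'DBInstances[].[DBInstanceIdentifier,PerformanceInsightsEnabled]' \
--output table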
Apache Performance
Check Apache status:
# Enable Apache status module (if not already enabled), then reload Apache
sudo a2enmod status
sudo systemctl reload apache2
# View status (the default configuration only allows requests from localhost)
curl http://localhost/server-status
# Monitor active connections
watch -n 1 'curl -s http://localhost/server-status?auto | grep -E "Total Accesses|BusyWorkers|IdleWorkers"'
Apache worker processes:
# Check current configuration
apache2ctl -M | grep -E "mpm_|worker"
# Count Apache processes
pgrep -c apache2
Search Index Performance
Monitor Zebra:
# Check Zebra process
ps aux | grep zebra
# Test search performance (runs a simple query through the instance's Perl environment)
time sudo koha-shell library << 'EOF'
perl -MC4::Search -e '
my ($error, $results) = C4::Search::SimpleSearch("ti:test");
print "Found: ", scalar(@$results), " results\n";
'
EOF
Rebuild if slow:
# Incremental rebuild
sudo koha-rebuild-zebra -v library
# Full rebuild (if searches are very slow)
sudo koha-rebuild-zebra -f -v library
Automated Health Checks
Create Health Check Script
# Create script
sudo tee /usr/local/bin/koha-health-check.sh > /dev/null << 'EOF'
#!/bin/bash
# Koha Health Check Script
EMAIL="admin@yourlibrary.org"
LOG="/var/log/koha-health-check.log"
ERRORS=0
echo "=== Koha Health Check ===" >> $LOG
date >> $LOG
# Check Koha services
for service in koha-plack koha-worker koha-zebra-daemon apache2; do
if ! systemctl is-active --quiet $service; then
echo "ERROR: $service is not running" >> $LOG
ERRORS=$((ERRORS + 1))
fi
done
# Check MySQL (Basic/Standard only)
if systemctl list-units --type=service --all | grep -q mysql; then
if ! systemctl is-active --quiet mysql; then
echo "ERROR: MySQL is not running" >> $LOG
ERRORS=$((ERRORS + 1))
fi
fi
# Check disk space
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ $DISK_USAGE -gt 80 ]; then
echo "WARNING: Disk usage is ${DISK_USAGE}%" >> $LOG
ERRORS=$((ERRORS + 1))
fi
# Check memory
MEM_USAGE=$(free | awk 'NR==2 {printf "%.0f", $3/$2*100}')
if [ $MEM_USAGE -gt 90 ]; then
echo "WARNING: Memory usage is ${MEM_USAGE}%" >> $LOG
ERRORS=$((ERRORS + 1))
fi
# Send email if errors found (the mail command requires an MTA, e.g. the mailutils package)
if [ $ERRORS -gt 0 ]; then
cat $LOG | mail -s "Koha Health Check: $ERRORS issues found" $EMAIL
fi
echo "Health check completed with $ERRORS errors" >> $LOG
echo "---" >> $LOG
EOF
# Make executable
sudo chmod +x /usr/local/bin/koha-health-check.sh
Schedule Health Checks
# Add to crontab (runs every hour)
sudo crontab -e
# Add this line:
0 * * * * /usr/local/bin/koha-health-check.sh
Regular Maintenance Tasks
Daily Tasks
1. Check backups:
# For Standard tier with S3
aws s3 ls s3://your-backup-bucket/ --recursive | tail -5
# For Basic/Free tier
ls -lh /var/lib/koha/backups/ | tail -5
# For Enterprise tier
aws rds describe-db-cluster-snapshots \
--db-cluster-identifier your-cluster \
--query 'DBClusterSnapshots[0:3].[SnapshotCreateTime,DBClusterSnapshotIdentifier,Status]'
2. Review error logs:
# Check for new errors since yesterday
sudo find /var/log/koha/library/ -name "*error*.log" -mtime -1 -exec tail -20 {} \;
3. Monitor disk space:
df -h / | awk 'NR==2 {print "Root: " $5}'
df -h /var/lib/mysql | awk 'NR==2 {print "Database: " $5}' # Basic/Standard
Weekly Tasks
1. Database optimization:
# For Basic/Standard
sudo koha-mysql library << 'EOF'
-- Optimize tables
OPTIMIZE TABLE biblio;
OPTIMIZE TABLE items;
OPTIMIZE TABLE borrowers;
OPTIMIZE TABLE issues;
OPTIMIZE TABLE old_issues;
-- Check fragmentation
SELECT
table_name,
ROUND(data_length / 1024 / 1024, 2) AS data_mb,
ROUND(data_free / 1024 / 1024, 2) AS free_mb,
ROUND((data_free / data_length) * 100, 2) AS fragmentation
FROM information_schema.tables
WHERE table_schema = 'koha_library'
AND data_free > 0
ORDER BY fragmentation DESC;
EOF
2. Clear temporary files:
# Clear old sessions
sudo find /var/lib/koha/library/sessions/ -type f -mtime +7 -delete
# Clear temp files
sudo find /tmp -name "Koha*" -mtime +7 -delete
3. Review system updates:
# Check for security updates
sudo apt-get update
sudo apt list --upgradable
# Apply security updates (during maintenance window)
sudo apt-get upgrade -y
Monthly Tasks
1. Review CloudWatch metrics:
- Check average CPU usage trends
- Review disk I/O patterns
- Analyze network traffic
- Identify performance degradation
2. Database backup test:
# Test backup restoration on test instance
# See Migration Guide for detailed procedures
3. Update documentation:
- Document any configuration changes
- Update runbook procedures
- Review and update alarm thresholds
4. Capacity planning:
- Review growth trends
- Forecast future capacity needs
- Plan for scaling if needed
Quarterly Tasks
1. Security audit:
- Review IAM permissions
- Audit security group rules
- Check for unused resources
- Review access logs
2. Performance review:
- Analyze slow query logs
- Optimize database indices
- Review Apache configuration
- Consider tier upgrade if needed
3. Disaster recovery test:
- Test backup restoration
- Verify recovery procedures
- Update DR documentation
- Train staff on procedures
Maintenance Windows
Scheduling Maintenance
Best practices:
- Schedule during lowest usage (typically weekend evenings)
- Notify users 48-72 hours in advance
- Communicate expected downtime
- Have rollback plan ready
Communication template:
Subject: Scheduled Maintenance - [Date/Time]
Dear Library Users,
We will be performing scheduled maintenance on our library system:
Date: [Day, Month Date, Year]
Time: [Start Time] - [End Time] [Timezone]
Expected Duration: [X hours]
During this time:
- The catalog will be unavailable
- You will not be able to place holds or renew items
- All current loans will remain active
We apologize for any inconvenience.
[Your Library] IT Team
Maintenance Checklist
Before maintenance:
- Announce maintenance window
- Create full backup (see the example after this checklist)
- Document current system state
- Prepare rollback procedures
- Test changes in staging (if available)
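For the full backup step, the Koha Debian packages provide a one-command dump of the instance database and configuration; a minimal sketch for the library instance on Basic/Standard tiers (Enterprise databases are covered by RDS cluster snapshots):
# Dump the Koha database and configuration for the "library" instance
sudo koha-dump library
# Dumps are written under /var/spool/koha/library/ by default
ls -lh /var/spool/koha/library/ | tail -3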
During maintenance:
- Stop user-facing services (if needed)
- Perform updates/changes
- Test functionality
- Review logs for errors
- Verify services are running
After maintenance:
- Monitor system for 1 hour
- Check error logs
- Verify all services functional
- Announce completion
- Document changes made
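The service and log checks above can be run as a single pass; a minimal sketch (the curl check assumes the OPAC answers on the default vhost):
# Quick post-maintenance verification
for s in koha-plack koha-worker koha-zebra-daemon apache2; do
systemctl is-active --quiet "$s" && echo "$s: OK" || echo "$s: NOT RUNNING"
done
# Local response check (assumes the OPAC is served by the default vhost)
curl -s -o /dev/null -w "OPAC HTTP status: %{http_code}\n" http://localhost/
# Recent errors
sudo tail -20 /var/log/koha/library/*error*.log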
Scaling and Optimization
When to Scale Up (Basic/Standard)
Indicators:
- CPU consistently > 70%
- Memory usage > 85%
- Disk I/O wait times increasing
- Response times degrading
- Database slow query log growing
Scaling procedure:
# 1. Create snapshot/backup
# 2. Stop instance
# 3. Change instance type
# 4. Start instance
# 5. Verify functionality
# Via AWS CLI
aws ec2 stop-instances --instance-ids i-xxxxx
aws ec2 wait instance-stopped --instance-ids i-xxxxx
aws ec2 modify-instance-attribute \
--instance-id i-xxxxx \
--instance-type "{\"Value\": \"m8g.large\"}"
aws ec2 start-instances --instance-ids i-xxxxx
When to Scale Up (Enterprise)
Auto Scaling handles instance count automatically.
Adjust Aurora capacity:
# Modify Aurora cluster
aws rds modify-db-cluster \
--db-cluster-identifier your-cluster \
--serverless-v2-scaling-configuration MinCapacity=1.0,MaxCapacity=8.0
Add more instances to ASG:
- Go to EC2 Console → Auto Scaling Groups
- Select your ASG
- Edit → Desired capacity: Increase
- Wait for new instances to launch
- Verify in Target Group health checks
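The same change can be made from the CLI; a sketch where your-asg is a placeholder for the Auto Scaling group name:
# Increase desired capacity to 3 instances
aws autoscaling set-desired-capacity \
--auto-scaling-group-name your-asg \
--desired-capacity 3
# Watch the new instances register and become healthy
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names your-asg \
--query 'AutoScalingGroups[0].Instances[].[InstanceId,HealthStatus,LifecycleState]'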
Monitoring Tools Summary
| Tool | Free/Basic | Standard | Enterprise | Purpose |
|---|---|---|---|---|
| CloudWatch Metrics | ✓ | ✓ | ✓ | Basic monitoring |
| CloudWatch Logs | Optional | Optional | Optional | Centralized logging |
| Performance Insights | ✗ | ✗ | ✓ | Aurora query analysis |
| RDS Enhanced Monitoring | ✗ | ✗ | ✓ | Detailed DB metrics |
| Application Load Balancer | ✗ | ✗ | ✓ | Request metrics |
| Health Check Script | ✓ | ✓ | ✓ | Automated checks |
Getting Help
For monitoring assistance:
- Email: support@kohasupport.com
- Subject: “Monitoring Help - [Your Library]”
- Include: Tier, metrics, timeframe
Related Documentation
Last Updated: December 2025