# Data Persistence Configuration
## Overview
All data services are configured to use bind mounts to local directories for data persistence. This ensures data survives container restarts and rebuilds.
## Directory Structure
```
data/
├── mongodb/        # MongoDB database files
├── redis/          # Redis persistence files
├── kafka/          # Kafka log data
├── zookeeper/      # Zookeeper data and logs
│   ├── data/
│   └── logs/
├── minio/          # MinIO object storage
├── solr/           # Solr search index
├── files-temp/     # Temporary file storage
└── images-cache/   # Image processing cache
```
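The whole layout can be created up front so every bind-mount path exists before the first `docker-compose up` (MongoDB's `configdb` subdirectory is taken from the volume mappings below):

```shell
# Create every directory from the tree above before first startup
mkdir -p data/mongodb/configdb data/redis data/kafka \
         data/zookeeper/data data/zookeeper/logs \
         data/minio data/solr data/files-temp data/images-cache
```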
## Volume Mappings
### MongoDB
- `./data/mongodb:/data/db` - Database files
- `./data/mongodb/configdb:/data/configdb` - Configuration database
### Redis
- `./data/redis:/data` - RDB snapshots and AOF logs
### Kafka
- `./data/kafka:/var/lib/kafka/data` - Message logs
### Zookeeper
- `./data/zookeeper/data:/var/lib/zookeeper/data` - Coordination data
- `./data/zookeeper/logs:/var/lib/zookeeper/log` - Transaction logs
### MinIO
- `./data/minio:/data` - Object storage buckets
### Solr
- `./data/solr:/var/solr` - Search index and configuration
### Application Caches
- `./data/files-temp:/tmp` - Temporary file processing
- `./data/images-cache:/app/cache` - Processed image cache
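In `docker-compose.yml`, each mapping above is a bind-mount entry under the service's `volumes:` key. A sketch for the MongoDB service (the image tag is an assumption; the container name matches the `docker exec` commands used elsewhere in this document):

```yaml
services:
  mongodb:
    image: mongo:7            # example tag; use the project's actual image
    container_name: site11_mongodb
    volumes:
      - ./data/mongodb:/data/db
      - ./data/mongodb/configdb:/data/configdb
```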
## Backup and Restore
### Backup All Data
```bash
# Stop services
docker-compose down
# Create backup
tar -czf backup-$(date +%Y%m%d).tar.gz data/
# Restart services
docker-compose up -d
```
### Restore Data
```bash
# Stop services
docker-compose down
# Extract backup
tar -xzf backup-YYYYMMDD.tar.gz
# Restart services
docker-compose up -d
```
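Before extracting an archive over a live `data/` tree, it is cheap to confirm the archive is readable first. A minimal sketch (the archive name and sample file are examples):

```shell
# Build a throwaway archive, then verify it lists cleanly before any restore
mkdir -p data
touch data/sample.txt
tar -czf backup-test.tar.gz data/
tar -tzf backup-test.tar.gz > /dev/null && echo "archive readable"
```

If `tar -tzf` fails, the archive is corrupt and should not be extracted over existing data.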
### Individual Service Backups
#### MongoDB Backup
```bash
docker exec site11_mongodb mongodump --out /data/db/backup
tar -czf mongodb-backup.tar.gz data/mongodb/backup/
```
#### Redis Backup
```bash
# Record the time of the last completed save, then trigger a new one
LAST=$(docker exec site11_redis redis-cli LASTSAVE)
docker exec site11_redis redis-cli BGSAVE
# Poll until LASTSAVE advances, i.e. the background save has finished
while [ "$(docker exec site11_redis redis-cli LASTSAVE)" = "$LAST" ]; do sleep 1; done
cp data/redis/dump.rdb redis-backup-$(date +%Y%m%d).rdb
```
## Permissions
Ensure the data directories are readable and writable by the users the containers run as. Several official images run as dedicated non-root users, so a blanket `chmod` may not be sufficient:
```bash
# Make directories traversable and readable
chmod -R 755 data/
# Some containers run as fixed non-root users; e.g. the official Solr image
# uses uid/gid 8983 (verify against your image version)
sudo chown -R 8983:8983 data/solr
```
## Disk Space Monitoring
Monitor disk usage regularly:
```bash
# Check data directory size
du -sh data/*
# Check individual services
du -sh data/mongodb
du -sh data/minio
du -sh data/kafka
```
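The checks above can be turned into a simple threshold alarm suitable for a cron job. A sketch (the 10 GiB limit is an example value):

```shell
# Warn when the data/ tree grows past a size threshold
mkdir -p data
LIMIT_KB=$((10 * 1024 * 1024))        # 10 GiB expressed in KiB
USED_KB=$(du -sk data/ | cut -f1)     # total size of data/ in KiB
if [ "$USED_KB" -gt "$LIMIT_KB" ]; then
  echo "WARNING: data/ at ${USED_KB} KiB exceeds ${LIMIT_KB} KiB"
else
  echo "OK: data/ at ${USED_KB} KiB"
fi
```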
## Clean Up Old Data
### Clear Kafka Logs (older than 7 days)
Kafka removes old log segments through retention settings rather than manual file deletion. Inspect usage first, then tighten retention where needed (the topic name below is an example):
```bash
# Inspect per-broker log directory usage
docker exec site11_kafka kafka-log-dirs.sh --describe --bootstrap-server localhost:9092
# Set a 7-day retention (604800000 ms) on a topic; Kafka then prunes older segments
docker exec site11_kafka kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name example-topic --add-config retention.ms=604800000
```
### Clear Image Cache
```bash
rm -rf data/images-cache/*
```
### Clear Temporary Files
```bash
rm -rf data/files-temp/*
```
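A blanket `rm -rf` also removes files a service may still be writing. An age-based sweep is gentler; a sketch using the cache paths above (the 7-day cutoff is an example, and `touch -d` assumes GNU coreutils):

```shell
# Remove only cache/temp files older than 7 days
mkdir -p data/files-temp data/images-cache
touch -d "10 days ago" data/files-temp/stale.tmp   # simulate an old file
touch data/files-temp/fresh.tmp                    # and a fresh one
find data/files-temp data/images-cache -type f -mtime +7 -delete
```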
## Migration from Docker Volumes
If migrating from named Docker volumes to bind mounts:
1. Export data from Docker volumes:
```bash
docker run --rm -v site11_mongodb_data:/source -v $(pwd)/data/mongodb:/dest alpine cp -av /source/. /dest/
```
2. Update `docker-compose.yml` to point at the bind-mount paths (already done)
3. Restart services with the new configuration
4. Once the migrated data is verified, remove the old volume: `docker volume rm site11_mongodb_data`
## Notes
- The `data/` directory is excluded from git via .gitignore
- Ensure sufficient disk space for data growth
- Consider setting up automated backups for production
- Monitor disk I/O performance for database services
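For the automated-backup note above, a cron entry is the simplest starting point. It mirrors the stop-archive-restart procedure from the Backup section (paths and schedule are examples; note that `%` must be escaped in crontab):

```
# m h dom mon dow  command
0 3 * * * cd /path/to/site11 && docker-compose down && tar -czf /backups/data-$(date +\%Y\%m\%d).tar.gz data/ && docker-compose up -d
```

This implies a short nightly downtime; for zero-downtime backups, use the per-service tools (`mongodump`, `BGSAVE`) instead of archiving the whole tree.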