# Site11 Platform Architecture

## Executive Summary

Site11 is a **large-scale, AI-powered content generation and aggregation platform** built on a microservices architecture. The platform automatically collects, processes, generates, and distributes multi-language content across various domains including news, entertainment, technology, and regional content for multiple countries.

### Key Capabilities
- **Automated Content Pipeline**: 24/7 content generation without human intervention
- **Multi-language Support**: Content in 8+ languages (Korean, English, Chinese, Japanese, French, German, Spanish, Italian)
- **Domain-Specific Services**: 30+ specialized microservices for different content domains
- **Real-time Processing**: Event-driven architecture with Kafka for real-time data flow
- **Scalable Infrastructure**: Containerized services with Kubernetes deployment support

## System Overview

### Architecture Pattern

**Hybrid Microservices Architecture** combining:

- **API Gateway Pattern**: Console service acts as the central orchestrator
- **Event-Driven Architecture**: Asynchronous communication via Kafka
- **Pipeline Architecture**: Multi-stage content processing workflow
- **Service Mesh Ready**: Prepared for Istio/Linkerd integration

### Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Backend** | FastAPI (Python 3.11) | High-performance async API services |
| **Frontend** | React 18 + TypeScript + Vite | Modern responsive web interfaces |
| **Primary Database** | MongoDB 7.0 | Document storage for flexible content |
| **Cache Layer** | Redis 7 | High-speed caching and queue management |
| **Message Broker** | Apache Kafka | Event streaming and service communication |
| **Search Engine** | Apache Solr 9.4 | Full-text search capabilities |
| **Object Storage** | MinIO | Media and file storage |
| **Containerization** | Docker & Docker Compose | Service isolation and deployment |
| **Orchestration** | Kubernetes (Kind/Docker Desktop) | Production deployment and scaling |

## Core Services Architecture

### 1. Infrastructure Services

```
┌───────────────────────────────────────────────────────────┐
│                   Infrastructure Layer                    │
├──────────────┬──────────────┬──────────────┬──────────────┤
│   MongoDB    │    Redis     │    Kafka     │    MinIO     │
│ (Primary DB) │   (Cache)    │   (Events)   │  (Storage)   │
├──────────────┼──────────────┼──────────────┼──────────────┤
│ Port: 27017  │  Port: 6379  │  Port: 9092  │  Port: 9000  │
└──────────────┴──────────────┴──────────────┴──────────────┘
```

### 2. Core Application Services

#### Console Service (API Gateway)
- **Port**: 8000 (Backend), 3000 (Frontend via Envoy)
- **Role**: Central orchestrator and monitoring dashboard
- **Responsibilities**:
  - Service discovery and health monitoring
  - Unified authentication portal
  - Request routing to microservices
  - Real-time metrics aggregation

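To illustrate the request-routing responsibility, the sketch below maps URL path prefixes to backend services using the backend ports listed in this document. The `/api/<service>` prefix scheme, the Docker Compose-style hostnames, and the `resolve_upstream` helper are all hypothetical. The Console's actual routing rules are not described here.

```python
"""Hypothetical prefix-based routing table for the Console gateway."""
from typing import Optional

# Backend ports come from the port mapping in this document; the "/api/<name>"
# prefixes and the Docker Compose-style hostnames are illustrative assumptions.
ROUTES = {
    "/api/users": "http://users:8007",
    "/api/oauth": "http://oauth:8003",
    "/api/images": "http://images:8001",
    "/api/statistics": "http://statistics:8012",
    "/api/notifications": "http://notifications:8013",
    "/api/files": "http://files:8014",
    "/api/search": "http://search:8015",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream URL for `path`, or None if no route matches.

    Prefixes are tried longest-first so a more specific route, if one were
    added, would win over a shorter one.
    """
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match the prefix itself or a full path segment below it ("/api/users/42"),
        # but not a lookalike such as "/api/usersX".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix] + path  # forward the original path unchanged
    return None


if __name__ == "__main__":
    print(resolve_upstream("/api/users/42"))  # http://users:8007/api/users/42
    print(resolve_upstream("/api/unknown"))   # None
```

Because matching is done on whole path segments, a request to `/api/usersX` is not routed to the Users service by accident.
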
#### Content Services
- **AI Writer** (8019): AI-powered article generation using Claude API
- **News Aggregator** (8018): Aggregates content from multiple sources
- **RSS Feed** (8017): RSS feed collection and management
- **Google Search** (8016): Search integration for content discovery
- **Search Service** (8015): Full-text search via Solr

#### Support Services
- **Users** (8007-8008): User management and authentication
- **OAuth** (8003-8004): OAuth2 authentication provider
- **Images** (8001-8002): Image processing and caching
- **Files** (8014): File management with MinIO integration
- **Notifications** (8013): Email, SMS, and push notifications
- **Statistics** (8012): Analytics and metrics collection

### 3. Pipeline Architecture

The pipeline represents the **heart of the content generation system**, processing content through multiple stages:

```
┌────────────────────────────────────────────────────────────┐
│                   Content Pipeline Flow                    │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  [Scheduler] ───> [RSS Collector] ───> [Google Search]     │
│       │                                       │            │
│       │                                       ▼            │
│       │                                [AI Generator]      │
│       │                                       │            │
│       ▼                                       ▼            │
│  [Keyword Manager]                      [Translator]       │
│                                               │            │
│                                               ▼            │
│                                       [Image Generator]    │
│                                               │            │
│                                               ▼            │
│                                        [Language Sync]     │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

#### Pipeline Components

1. **Multi-threaded Scheduler**: Orchestrates the entire pipeline workflow
2. **Keyword Manager** (API Port 8100): Manages search keywords and topics
3. **RSS Collector**: Collects content from RSS feeds
4. **Google Search Worker**: Searches for trending content
5. **AI Article Generator**: Generates articles using Claude AI
6. **Translator**: Translates content using DeepL API
7. **Image Generator**: Creates images for articles
8. **Language Sync**: Ensures content consistency across languages
9. **Pipeline Monitor** (Port 8100): Real-time pipeline monitoring dashboard

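The hand-off between stages can be sketched in-process with one thread per stage and a queue between each pair of stages. This is a minimal sketch that assumes each stage is a plain function of a job dictionary. Here `queue.Queue` stands in for the Redis-backed queues. The three stage functions are placeholders, not the real AI, translation, and image workers.

```python
"""In-process sketch of a multi-stage pipeline: one worker thread per stage."""
import queue
import threading
from typing import Callable

Stage = Callable[[dict], dict]
_DONE = object()  # sentinel meaning "no more jobs" for the next stage


def _worker(stage: Stage, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply `stage` to each job from `inbox` and pass the result downstream."""
    while True:
        job = inbox.get()
        if job is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(stage(job))


def run_pipeline(jobs: list[dict], stages: list[Stage]) -> list[dict]:
    """Push `jobs` through `stages` in order and collect the finished jobs."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        queues[0].put(job)
    queues[0].put(_DONE)

    finished = []
    while (job := queues[-1].get()) is not _DONE:
        finished.append(job)
    for t in threads:
        t.join()
    return finished


# Placeholder stages (hypothetical logic standing in for the real workers).
def generate_article(job: dict) -> dict:
    return {**job, "lang": "ko", "body": f"Article about {job['keyword']}"}


def translate(job: dict) -> dict:
    return {**job, "translations": ["en", "ja", "zh"]}


def generate_image(job: dict) -> dict:
    return {**job, "image": f"{job['keyword']}.png"}


if __name__ == "__main__":
    done = run_pipeline([{"keyword": "AI"}, {"keyword": "K-pop"}],
                        [generate_article, translate, generate_image])
    for job in done:
        print(job["keyword"], job["image"], job["translations"])
```

Each stage is a single thread reading a FIFO queue, so jobs leave the pipeline in the order they entered. Error handling (retries, dead-letter queues) is omitted. A stage that raises would stop its thread and stall everything downstream.
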
### 4. Domain-Specific Services

The platform includes **30+ specialized services** for different content domains:

#### Entertainment Services
- **Artist Services**: blackpink, enhypen, ive, nct, straykids, twice
- **K-Culture**: Korean cultural content
- **Media Empire**: Entertainment industry coverage

#### Regional Services
- **Korea** (8020-8021): Korean market content
- **Japan** (8022-8023): Japanese market content
- **China** (8024-8025): Chinese market content
- **USA** (8026-8027): US market content

#### Technology Services
- **AI Service** (8028-8029): AI technology news
- **Crypto** (8030-8031): Cryptocurrency coverage
- **Apple** (8032-8033): Apple ecosystem news
- **Google** (8034-8035): Google technology updates
- **Samsung** (8036-8037): Samsung product news
- **LG** (8038-8039): LG technology coverage

#### Business Services
- **WSJ** (8040-8041): Wall Street Journal integration
- **Musk** (8042-8043): Elon Musk related content

## Data Flow Architecture

### 1. Content Generation Flow

```
User Request / Scheduled Task
        │
        ▼
[Console API Gateway]
        │
        ├──> [Keyword Manager] ──> Topics/Keywords
        │
        ▼
[Pipeline Scheduler]
        │
        ├──> [RSS Collector] ──> Feed Content
        ├──> [Google Search] ──> Search Results
        │
        ▼
[AI Article Generator]
        │
        ├──> [MongoDB] (Store Korean Original)
        │
        ▼
[Translator Service]
        │
        ├──> [MongoDB] (Store Translations)
        │
        ▼
[Image Generator]
        │
        ├──> [MinIO] (Store Images)
        │
        ▼
[Language Sync]
        │
        └──> [Content Ready for Distribution]
```

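The flow stores Korean originals first and translations afterwards. The document describes Language Sync only as ensuring consistency across languages. One plausible consistency check, sketched below, compares the article IDs in each per-language collection (the `articles_[lang]` pattern from the database schema appendix) against the Korean originals. The following are all hypothetical: the ISO 639-1 codes, the assumption that translations reuse the original's ID, and the function names.

```python
"""Sketch: find source articles that are missing from per-language collections."""

SOURCE_LANG = "ko"  # Korean originals are stored first
# Assumed ISO 639-1 codes for the other seven supported languages.
TARGET_LANGS = ["en", "zh", "ja", "fr", "de", "es", "it"]


def collection_name(lang: str) -> str:
    """Collection name following the `articles_[lang]` pattern."""
    return f"articles_{lang}"


def missing_translations(ids_by_collection: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each target collection to the source article IDs it does not yet contain.

    `ids_by_collection` maps collection names to the article IDs they hold, e.g.
    gathered from MongoDB with a projection on the ID field. Translations are
    assumed to reuse the original article's ID.
    """
    originals = ids_by_collection.get(collection_name(SOURCE_LANG), set())
    report = {}
    for lang in TARGET_LANGS:
        name = collection_name(lang)
        missing = originals - ids_by_collection.get(name, set())
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    snapshot = {
        "articles_ko": {"a1", "a2", "a3"},
        "articles_en": {"a1", "a2", "a3"},
        "articles_ja": {"a1"},
    }
    for name, ids in sorted(missing_translations(snapshot).items()):
        print(name, sorted(ids))
```
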
### 2. Event-Driven Communication

```
Service A ──[Publish]──> Kafka Topic ──[Subscribe]──> Service B
                              │
                              ├──> Service C
                              └──> Service D

Topics:
- content.created
- content.updated
- translation.completed
- image.generated
- user.activity
```

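In the fan-out above, each subscribing service receives its own copy of every event on a topic. The in-memory sketch below imitates that delivery model so it can be run without a Kafka cluster. It is a stand-in, not a Kafka client, and the consumer names are hypothetical. With Kafka itself, this behavior comes from giving each subscribing service its own consumer group. Consumers that share a group instead split a topic's partitions between them.

```python
"""In-memory stand-in for topic-based publish/subscribe (not a Kafka client)."""
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]
Handler = Callable[[Event], None]


class InMemoryBroker:
    """Delivers every published event to all handlers subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        """Deliver `event` synchronously and return how many handlers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    broker = InMemoryBroker()
    received: list[str] = []
    # Two hypothetical consumers of the same topic each get their own copy.
    broker.subscribe("content.created", lambda e: received.append(f"translator:{e['id']}"))
    broker.subscribe("content.created", lambda e: received.append(f"search-indexer:{e['id']}"))
    broker.publish("content.created", {"id": "a1", "lang": "ko"})
    print(received)  # ['translator:a1', 'search-indexer:a1']
```

Unlike Kafka, this broker delivers synchronously and keeps no log, so it has no replay and no offset tracking.
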
### 3. Caching Strategy

```
Client Request ──> [Console] ──> [Redis Cache]
                                       │
                                       ├─ HIT ──> Return Cached
                                       │
                                       └─ MISS ──> [Service] ──> [MongoDB]
                                                                     │
                                                                     └──> Update Cache
```

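The flow above is the cache-aside pattern. A read checks the cache first and falls back to the service and MongoDB on a miss. The result is then written back with an expiry. In the sketch below, a dictionary with expiry timestamps stands in for Redis, where the write-back would typically be a `SET` with an `EX` expiry. The 300-second default TTL and the `load_from_db` callable are assumptions, since the document only says cache TTLs are configured in `.env`.

```python
"""Cache-aside read path with a TTL; a dict stands in for Redis."""
import time
from typing import Any, Callable


class CacheAside:
    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1            # HIT: serve from cache
            return entry[1]
        self.misses += 1              # MISS (absent or expired): read through to the database
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)  # update cache with a fresh TTL
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. when the underlying content changes."""
        self._store.pop(key, None)


if __name__ == "__main__":
    db = {"article:a1": {"title": "Hello"}}
    cache = CacheAside(db.__getitem__, ttl_seconds=300)
    cache.get("article:a1")   # miss: loaded from the "database"
    cache.get("article:a1")   # hit: served from the cache
    print(cache.hits, cache.misses)  # 1 1
```

The `invalidate` method is one way to keep the cache consistent when content changes. An event such as `content.updated` would be a natural trigger, although the document does not say how invalidation is done.
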
## Deployment Architecture

### 1. Development Environment (Docker Compose)

All services run in Docker containers with:
- **Single docker-compose.yml**: Defines all services
- **Shared network**: `site11_network` for inter-service communication
- **Persistent volumes**: Data stored in the `./data/` directory
- **Hot reload**: Source code is mounted into the containers during development

### 2. Production Environment (Kubernetes)

```
┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    Ingress (Nginx)                    │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                Service Mesh (Optional)                │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                Namespace: site11-core                 │  │
│  ├─────────────────┬─────────────────┬───────────────────┤  │
│  │     Console     │     MongoDB     │       Redis       │  │
│  │   Deployment    │   StatefulSet   │    StatefulSet    │  │
│  └─────────────────┴─────────────────┴───────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              Namespace: site11-pipeline               │  │
│  ├─────────────────┬─────────────────┬───────────────────┤  │
│  │    Scheduler    │  RSS Collector  │   AI Generator    │  │
│  │   Deployment    │   Deployment    │    Deployment     │  │
│  └─────────────────┴─────────────────┴───────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              Namespace: site11-services               │  │
│  ├─────────────────┬─────────────────┬───────────────────┤  │
│  │   Artist Svcs   │  Regional Svcs  │     Tech Svcs     │  │
│  │   Deployments   │   Deployments   │    Deployments    │  │
│  └─────────────────┴─────────────────┴───────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### 3. Hybrid Deployment

The platform supports **hybrid deployment** combining:
- **Docker Compose**: For development and small deployments
- **Kubernetes**: For production scaling
- **Docker Desktop Kubernetes**: For local K8s testing
- **Kind**: For lightweight K8s development

## Security Architecture

### Authentication & Authorization
```
┌────────────────────────────────────────────────────────────┐
│                       Security Flow                        │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Client ──> [Console Gateway] ──> [OAuth Service]          │
│                     │                    │                 │
│                     │                    ▼                 │
│                     │            [JWT Generation]          │
│                     │                    │                 │
│                     ▼                    ▼                 │
│            [Token Validation] <────── [Token]              │
│                     │                                      │
│                     ▼                                      │
│             [Service Access]                               │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

### Security Measures
- **JWT-based authentication**: Stateless token authentication
- **Service-to-service auth**: Internal service tokens
- **Rate limiting**: API Gateway level throttling
- **CORS configuration**: Controlled cross-origin access
- **Environment variables**: Sensitive data in `.env` files
- **Network isolation**: Services communicate within Docker/K8s network

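Rate limiting is listed at the API Gateway level, but no algorithm is named. A token bucket is one common choice. The sketch below assumes one bucket per client key and uses illustrative limits. The `RateLimiter` and `TokenBucket` names are hypothetical. State lives in process memory, so multiple Console replicas would each enforce their own limit unless the counters were moved to a shared store such as Redis.

```python
"""Token-bucket rate limiter sketch: one in-memory bucket per client key."""
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    rate: float                      # tokens added per second
    capacity: float                  # maximum burst size
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket
        self.updated = self.clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one TokenBucket per client key (e.g. an API key or client IP)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self._buckets:
            self._buckets[client] = TokenBucket(self._rate, self._capacity, self._clock)
        return self._buckets[client].allow()


if __name__ == "__main__":
    limiter = RateLimiter(rate=5, capacity=10)  # illustrative limits
    results = [limiter.allow("client-a") for _ in range(12)]
    print(results.count(True), "of 12 requests allowed")
```
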
## Monitoring & Observability

### 1. Health Checks
Every service exposes a health endpoint:
```text
GET /health
Response: {"status": "healthy", "service": "service-name"}
```

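The Console's service discovery and health monitoring can be built on these endpoints. The sketch below polls `/health` on each service with a short timeout and classifies the result. The service list, the hostnames, and the 2-second timeout are assumptions. The fetch function is injectable, so the logic can be exercised without a network.

```python
"""Sketch: poll each service's /health endpoint and summarize the results."""
import json
import urllib.request
from typing import Callable

# Hypothetical in-cluster addresses; ports come from the port mapping table.
SERVICES = {
    "users": "http://users:8007",
    "oauth": "http://oauth:8003",
    "files": "http://files:8014",
}


def fetch_health(base_url: str, timeout: float = 2.0) -> dict:
    """GET {base_url}/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def check_all(services: dict[str, str],
              fetch: Callable[[str], dict] = fetch_health) -> dict[str, str]:
    """Map each service name to "healthy", "unhealthy", or "unreachable"."""
    status = {}
    for name, base_url in services.items():
        try:
            body = fetch(base_url)
        except Exception:  # timeouts, connection errors, HTTP errors, invalid JSON
            status[name] = "unreachable"
            continue
        healthy = isinstance(body, dict) and body.get("status") == "healthy"
        status[name] = "healthy" if healthy else "unhealthy"
    return status


if __name__ == "__main__":
    # Offline demo: a fake fetcher replaces real HTTP calls.
    fake = {"http://users:8007": {"status": "healthy", "service": "users"}}
    print(check_all(SERVICES, fetch=lambda url: fake[url]))
    # {'users': 'healthy', 'oauth': 'unreachable', 'files': 'unreachable'}
```
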
### 2. Monitoring Stack
- **Pipeline Monitor**: Real-time pipeline status (Port 8100)
- **Console Dashboard**: Service health overview
- **Redis Queue Monitoring**: Queue depth and processing rates
- **MongoDB Metrics**: Database performance metrics

### 3. Logging Strategy
- Centralized logging with structured JSON format
- Log levels: DEBUG, INFO, WARNING, ERROR
- Correlation IDs for distributed tracing

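Below is a minimal sketch of structured JSON logs that carry a correlation ID. It uses only the standard `logging`, `json`, and `contextvars` modules. The field names and the `start_request` helper are hypothetical. The document does not specify how the ID travels between services (for example, in a request header).

```python
"""Structured JSON logging with a per-request correlation ID (stdlib only)."""
import contextvars
import json
import logging
import uuid
from typing import Optional, TextIO

# Correlation ID of the request currently being handled ("-" outside a request).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)


def get_logger(service: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Logger for `service` that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's correlation ID if one was passed along, else mint one."""
    cid = incoming_id or uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


if __name__ == "__main__":
    log = get_logger("ai-writer")
    start_request("req-42")
    log.info("article generated")
    # stderr: {"level": "INFO", "service": "ai-writer", "logger": "ai-writer", "message": "article generated", "correlation_id": "req-42"}
```

`contextvars` values are scoped to the current thread or asyncio task. As a result, concurrent requests in an async FastAPI service keep separate IDs, provided each request runs in its own task.
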
## Scalability & Performance

### Horizontal Scaling
- **Stateless services**: Easy horizontal scaling
- **Load balancing**: Kubernetes Services, optionally through the service mesh
- **Auto-scaling**: Based on CPU/memory metrics

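For CPU/memory-based auto-scaling, the Kubernetes Horizontal Pod Autoscaler computes `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. It leaves the replica count unchanged when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that arithmetic so target utilizations can be sanity-checked. The min/max bounds are illustrative. The real controller also applies rules that are not modeled here, such as pod readiness and stabilization windows.

```python
"""Reproduce the Kubernetes HPA replica calculation for a utilization metric."""
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """ceil(current * currentMetric / targetMetric), clamped to [min, max].

    If the usage ratio is within `tolerance` of 1.0, the current replica
    count is kept to avoid flapping.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU against a 60% target -> ceil(3 * 1.5) = 5 pods.
    print(desired_replicas(3, 90, 60))  # 5
```
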
### Performance Optimizations
- **Redis caching**: Reduces database load
- **Async processing**: FastAPI async endpoints
- **Batch processing**: Pipeline processes in batches
- **Connection pooling**: Database connection reuse
- **CDN ready**: Static content delivery

### Resource Management
Resources per service:
```yaml
resources:
  requests:
    cpu: 100m        # 100m-500m depending on the service
    memory: 128Mi    # 128Mi-512Mi depending on the service
  limits:
    cpu: 1000m
    memory: 1Gi
```
Data services additionally use 1Gi-10Gi PersistentVolumeClaims for storage.

## Development Workflow

### 1. Local Development
```bash
# Start all services
docker-compose up -d

# Start specific services
docker-compose up -d console mongodb redis

# View logs
docker-compose logs -f [service-name]

# Rebuild after changes
docker-compose build [service-name]
docker-compose up -d [service-name]
```

### 2. Testing
```bash
# Run unit tests
docker-compose exec [service-name] pytest

# Integration tests
docker-compose exec [service-name] pytest tests/integration

# Load testing
docker-compose exec [service-name] locust
```

### 3. Deployment
```bash
# Development
./deploy-local.sh

# Staging (Kind)
./deploy-kind.sh

# Production (Kubernetes)
./deploy-k8s.sh

# Docker Hub
./deploy-dockerhub.sh
```

## Key Design Decisions

### 1. Microservices over Monolith
- **Reasoning**: Independent scaling, technology diversity, fault isolation
- **Trade-off**: Increased complexity, network overhead

### 2. MongoDB as Primary Database
- **Reasoning**: Flexible schema for diverse content types
- **Trade-off**: Eventual consistency when reading from replicas, harder complex (join-style) queries

### 3. Event-Driven with Kafka
- **Reasoning**: Decoupling, scalability, real-time processing
- **Trade-off**: Operational complexity, debugging challenges

### 4. Python/FastAPI for Backend
- **Reasoning**: Async support, fast development, AI library ecosystem
- **Trade-off**: The GIL limits CPU-bound parallelism, and raw performance trails compiled languages

### 5. Container-First Approach
- **Reasoning**: Consistent environments, easy deployment, cloud-native
- **Trade-off**: Resource overhead, container management

## Performance Metrics

### Current Capacity (Single Instance)
- **Content Generation**: 1000+ articles/day
- **Translation Throughput**: 8 languages simultaneously
- **API Response Time**: <100ms p50, <500ms p99
- **Queue Processing**: 100+ jobs/minute
- **Storage**: Scalable to TBs with MinIO

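As a rough consistency check on these figures, assume each Korean original is translated into the seven other listed languages. Also assume each translation is one queue job. Both are assumptions, since the document does not define a job. Under them, the daily translation load sits far below the stated queue rate:

```python
"""Back-of-the-envelope check: does queue throughput cover translation load?"""

ARTICLES_PER_DAY = 1000        # "1000+ articles/day"
TARGET_LANGUAGES = 7           # 8 languages minus the Korean original
QUEUE_JOBS_PER_MINUTE = 100    # "100+ jobs/minute"

translation_jobs_per_day = ARTICLES_PER_DAY * TARGET_LANGUAGES          # 7000
translation_jobs_per_minute = translation_jobs_per_day / (24 * 60)      # ~4.86
headroom = QUEUE_JOBS_PER_MINUTE / translation_jobs_per_minute          # ~20.6x

if __name__ == "__main__":
    print(f"{translation_jobs_per_minute:.2f} jobs/min needed, {headroom:.1f}x headroom")
```

The margin shrinks if each article also generates image, sync, or retry jobs.
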
### Scaling Potential
- **Horizontal**: Each service can scale to 10+ replicas
- **Vertical**: Services can use up to 4GB RAM, 4 CPUs
- **Geographic**: Multi-region deployment ready

## Future Roadmap

### Phase 1: Current State ✅
- Core microservices architecture
- Automated content pipeline
- Multi-language support
- Basic monitoring

### Phase 2: Enhanced Observability (Q1 2025)
- Prometheus + Grafana integration
- Distributed tracing with Jaeger
- ELK stack for logging
- Advanced alerting

### Phase 3: Advanced Features (Q2 2025)
- Machine Learning pipeline
- Real-time analytics
- GraphQL API layer
- WebSocket support

### Phase 4: Enterprise Features (Q3 2025)
- Multi-tenancy support
- Advanced RBAC
- Audit logging
- Compliance features

## Conclusion

Site11 represents a **modern, scalable, AI-driven content platform** that leverages:
- **Microservices architecture** for modularity and scalability
- **Event-driven design** for real-time processing
- **Container orchestration** for deployment flexibility
- **AI integration** for automated content generation
- **Multi-language support** for global reach

The architecture is designed to handle **massive scale**, support **rapid development**, and provide **high availability** while maintaining **operational simplicity** through automation and monitoring.

## Appendix

### A. Service Port Mapping

| Service | Backend Port | Frontend Port | Description |
|---------|-------------|---------------|-------------|
| Console | 8000 | 3000 | API Gateway & Dashboard |
| Users | 8007 | 8008 | User Management |
| OAuth | 8003 | 8004 | Authentication |
| Images | 8001 | 8002 | Image Processing |
| Statistics | 8012 | - | Analytics |
| Notifications | 8013 | - | Alerts & Messages |
| Files | 8014 | - | File Storage |
| Search | 8015 | - | Full-text Search |
| Google Search | 8016 | - | Search Integration |
| RSS Feed | 8017 | - | RSS Management |
| News Aggregator | 8018 | - | Content Aggregation |
| AI Writer | 8019 | - | AI Content Generation |
| Pipeline Monitor | 8100 | - | Pipeline Dashboard |
| Keyword Manager | 8100 | - | Keyword API |

### B. Environment Variables

Key configuration managed through `.env`:
- Database connections (MongoDB, Redis)
- API keys (Claude, DeepL, Google)
- Service URLs and ports
- JWT secrets
- Cache TTLs

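Below is a minimal sketch of loading these settings once at startup with only the standard library. Docker Compose can pass `.env` values into containers, for example via `env_file`. Every variable name below (`MONGODB_URL`, `REDIS_URL`, `JWT_SECRET`, `CACHE_TTL_SECONDS`, and so on) is hypothetical, and so is every default. The document lists categories of settings, not their names. Failing fast on a missing secret avoids starting a service that cannot authenticate requests.

```python
"""Sketch: typed settings loaded from environment variables (names are hypothetical)."""
import os
from dataclasses import dataclass
from typing import Mapping

# A real service would require only the keys it actually uses.
REQUIRED = ("JWT_SECRET", "CLAUDE_API_KEY", "DEEPL_API_KEY", "GOOGLE_API_KEY")


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    redis_url: str
    jwt_secret: str
    claude_api_key: str
    deepl_api_key: str
    google_api_key: str
    cache_ttl_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from `env`, failing fast if a required secret is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return Settings(
        # Defaults assume the Docker Compose service names and ports used in this document.
        mongodb_url=env.get("MONGODB_URL", "mongodb://mongodb:27017"),
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        jwt_secret=env["JWT_SECRET"],
        claude_api_key=env["CLAUDE_API_KEY"],
        deepl_api_key=env["DEEPL_API_KEY"],
        google_api_key=env["GOOGLE_API_KEY"],
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "300")),
    )
```

Taking `env` as a parameter lets tests pass a plain dictionary instead of mutating `os.environ`.
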
### C. Database Schema

MongoDB Collections:
- `users`: User profiles and authentication
- `articles_[lang]`: Articles by language
- `keywords`: Search keywords and topics
- `rss_feeds`: RSS feed configurations
- `statistics`: Analytics data
- `files`: File metadata

### D. API Documentation

All services provide OpenAPI/Swagger documentation at:
```
http://[service-url]/docs
```

### E. Deployment Scripts

| Script | Purpose |
|--------|---------|
| `deploy-local.sh` | Local Docker Compose deployment |
| `deploy-kind.sh` | Kind Kubernetes deployment |
| `deploy-docker-desktop.sh` | Docker Desktop K8s deployment |
| `deploy-dockerhub.sh` | Push images to Docker Hub |
| `backup-mongodb.sh` | MongoDB backup utility |

---

**Document Version**: 1.0.0

**Last Updated**: September 2025

**Platform Version**: Site11 v1.0

**Architecture Review**: Approved for Production