docs: Add architecture documentation and presentation materials
## 📚 Documentation Updates - Add ARCHITECTURE.md: Comprehensive system architecture overview - Add PRESENTATION.md: 16-slide presentation for architecture overview - Update K8S-DEPLOYMENT-GUIDE.md: Refine deployment instructions ## 📊 Architecture Documentation - Executive summary of Site11 platform - Detailed microservices breakdown (30+ services) - Technology stack and deployment patterns - Data flow and event-driven architecture - Security and monitoring strategies ## 🎯 Presentation Materials - Complete slide deck for architecture presentation - Visual diagrams and flow charts - Performance metrics and business impact - Future roadmap (Q1-Q4 2025) 🤖 Generated with [Claude Code](https://claude.ai/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
519
ARCHITECTURE.md
Normal file
519
ARCHITECTURE.md
Normal file
@ -0,0 +1,519 @@
|
||||
# Site11 Platform Architecture
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Site11 is a **large-scale, AI-powered content generation and aggregation platform** built on a microservices architecture. The platform automatically collects, processes, generates, and distributes multi-language content across various domains including news, entertainment, technology, and regional content for multiple countries.
|
||||
|
||||
### Key Capabilities
|
||||
- **Automated Content Pipeline**: 24/7 content generation without human intervention
|
||||
- **Multi-language Support**: Content in 8+ languages (Korean, English, Chinese, Japanese, French, German, Spanish, Italian)
|
||||
- **Domain-Specific Services**: 30+ specialized microservices for different content domains
|
||||
- **Real-time Processing**: Event-driven architecture with Kafka for real-time data flow
|
||||
- **Scalable Infrastructure**: Containerized services with Kubernetes deployment support
|
||||
|
||||
## System Overview
|
||||
|
||||
### Architecture Pattern
|
||||
**Hybrid Microservices Architecture** combining:
|
||||
- **API Gateway Pattern**: Console service acts as the central orchestrator
|
||||
- **Event-Driven Architecture**: Asynchronous communication via Kafka
|
||||
- **Pipeline Architecture**: Multi-stage content processing workflow
|
||||
- **Service Mesh Ready**: Prepared for Istio/Linkerd integration
|
||||
|
||||
### Technology Stack
|
||||
|
||||
| Layer | Technology | Purpose |
|
||||
|-------|------------|---------|
|
||||
| **Backend** | FastAPI (Python 3.11) | High-performance async API services |
|
||||
| **Frontend** | React 18 + TypeScript + Vite | Modern responsive web interfaces |
|
||||
| **Primary Database** | MongoDB 7.0 | Document storage for flexible content |
|
||||
| **Cache Layer** | Redis 7 | High-speed caching and queue management |
|
||||
| **Message Broker** | Apache Kafka | Event streaming and service communication |
|
||||
| **Search Engine** | Apache Solr 9.4 | Full-text search capabilities |
|
||||
| **Object Storage** | MinIO | Media and file storage |
|
||||
| **Containerization** | Docker & Docker Compose | Service isolation and deployment |
|
||||
| **Orchestration** | Kubernetes (Kind/Docker Desktop) | Production deployment and scaling |
|
||||
|
||||
## Core Services Architecture
|
||||
|
||||
### 1. Infrastructure Services
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Infrastructure Layer │
|
||||
├───────────────┬───────────────┬──────────────┬──────────────┤
|
||||
│ MongoDB │ Redis │ Kafka │ MinIO │
|
||||
│ (Primary DB) │ (Cache) │ (Events) │ (Storage) │
|
||||
├───────────────┼───────────────┼──────────────┼──────────────┤
|
||||
│ Port: 27017 │ Port: 6379 │ Port: 9092 │ Port: 9000 │
|
||||
└───────────────┴───────────────┴──────────────┴──────────────┘
|
||||
```
|
||||
|
||||
### 2. Core Application Services
|
||||
|
||||
#### Console Service (API Gateway)
|
||||
- **Port**: 8000 (Backend), 3000 (Frontend via Envoy)
|
||||
- **Role**: Central orchestrator and monitoring dashboard
|
||||
- **Responsibilities**:
|
||||
- Service discovery and health monitoring
|
||||
- Unified authentication portal
|
||||
- Request routing to microservices
|
||||
- Real-time metrics aggregation
|
||||
|
||||
#### Content Services
|
||||
- **AI Writer** (8019): AI-powered article generation using Claude API
|
||||
- **News Aggregator** (8018): Aggregates content from multiple sources
|
||||
- **RSS Feed** (8017): RSS feed collection and management
|
||||
- **Google Search** (8016): Search integration for content discovery
|
||||
- **Search Service** (8015): Full-text search via Solr
|
||||
|
||||
#### Support Services
|
||||
- **Users** (8007-8008): User management and authentication
|
||||
- **OAuth** (8003-8004): OAuth2 authentication provider
|
||||
- **Images** (8001-8002): Image processing and caching
|
||||
- **Files** (8014): File management with MinIO integration
|
||||
- **Notifications** (8013): Email, SMS, and push notifications
|
||||
- **Statistics** (8012): Analytics and metrics collection
|
||||
|
||||
### 3. Pipeline Architecture
|
||||
|
||||
The pipeline represents the **heart of the content generation system**, processing content through multiple stages:
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Content Pipeline Flow │
|
||||
├──────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ [Scheduler] ─────> [RSS Collector] ────> [Google Search] │
|
||||
│ │ │ │
|
||||
│ │ ▼ │
|
||||
│ │ [AI Generator] │
|
||||
│ │ │ │
|
||||
│ ▼ ▼ │
|
||||
│ [Keywords] [Translator] │
|
||||
│ Manager │ │
|
||||
│ ▼ │
|
||||
│ [Image Generator] │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ [Language Sync] │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
#### Pipeline Components
|
||||
|
||||
1. **Multi-threaded Scheduler**: Orchestrates the entire pipeline workflow
|
||||
2. **Keyword Manager** (API Port 8100): Manages search keywords and topics
|
||||
3. **RSS Collector**: Collects content from RSS feeds
|
||||
4. **Google Search Worker**: Searches for trending content
|
||||
5. **AI Article Generator**: Generates articles using Claude AI
|
||||
6. **Translator**: Translates content using DeepL API
|
||||
7. **Image Generator**: Creates images for articles
|
||||
8. **Language Sync**: Ensures content consistency across languages
|
||||
9. **Pipeline Monitor** (Port 8100): Real-time pipeline monitoring dashboard
|
||||
|
||||
### 4. Domain-Specific Services
|
||||
|
||||
The platform includes **30+ specialized services** for different content domains:
|
||||
|
||||
#### Entertainment Services
|
||||
- **Artist Services**: blackpink, enhypen, ive, nct, straykids, twice
|
||||
- **K-Culture**: Korean cultural content
|
||||
- **Media Empire**: Entertainment industry coverage
|
||||
|
||||
#### Regional Services
|
||||
- **Korea** (8020-8021): Korean market content
|
||||
- **Japan** (8022-8023): Japanese market content
|
||||
- **China** (8024-8025): Chinese market content
|
||||
- **USA** (8026-8027): US market content
|
||||
|
||||
#### Technology Services
|
||||
- **AI Service** (8028-8029): AI technology news
|
||||
- **Crypto** (8030-8031): Cryptocurrency coverage
|
||||
- **Apple** (8032-8033): Apple ecosystem news
|
||||
- **Google** (8034-8035): Google technology updates
|
||||
- **Samsung** (8036-8037): Samsung product news
|
||||
- **LG** (8038-8039): LG technology coverage
|
||||
|
||||
#### Business Services
|
||||
- **WSJ** (8040-8041): Wall Street Journal integration
|
||||
- **Musk** (8042-8043): Elon Musk related content
|
||||
|
||||
## Data Flow Architecture
|
||||
|
||||
### 1. Content Generation Flow
|
||||
|
||||
```
|
||||
User Request / Scheduled Task
|
||||
│
|
||||
▼
|
||||
[Console API Gateway]
|
||||
│
|
||||
├──> [Keyword Manager] ──> Topics/Keywords
|
||||
│
|
||||
▼
|
||||
[Pipeline Scheduler]
|
||||
│
|
||||
├──> [RSS Collector] ──> Feed Content
|
||||
├──> [Google Search] ──> Search Results
|
||||
│
|
||||
▼
|
||||
[AI Article Generator]
|
||||
│
|
||||
├──> [MongoDB] (Store Korean Original)
|
||||
│
|
||||
▼
|
||||
[Translator Service]
|
||||
│
|
||||
├──> [MongoDB] (Store Translations)
|
||||
│
|
||||
▼
|
||||
[Image Generator]
|
||||
│
|
||||
├──> [MinIO] (Store Images)
|
||||
│
|
||||
▼
|
||||
[Language Sync]
|
||||
│
|
||||
└──> [Content Ready for Distribution]
|
||||
```
|
||||
|
||||
### 2. Event-Driven Communication
|
||||
|
||||
```
|
||||
Service A ──[Publish]──> Kafka Topic ──[Subscribe]──> Service B
|
||||
│
|
||||
├──> Service C
|
||||
└──> Service D
|
||||
|
||||
Topics:
|
||||
- content.created
|
||||
- content.updated
|
||||
- translation.completed
|
||||
- image.generated
|
||||
- user.activity
|
||||
```
|
||||
|
||||
### 3. Caching Strategy
|
||||
|
||||
```
|
||||
Client Request ──> [Console] ──> [Redis Cache]
|
||||
│
|
||||
├─ HIT ──> Return Cached
|
||||
│
|
||||
└─ MISS ──> [Service] ──> [MongoDB]
|
||||
│
|
||||
└──> Update Cache
|
||||
```
|
||||
|
||||
## Deployment Architecture
|
||||
|
||||
### 1. Development Environment (Docker Compose)
|
||||
|
||||
All services run in Docker containers with:
|
||||
- **Single docker-compose.yml**: Defines all services
|
||||
- **Shared network**: `site11_network` for inter-service communication
|
||||
- **Persistent volumes**: Data stored in `./data/` directory
|
||||
- **Hot-reload**: Code mounted for development
|
||||
|
||||
### 2. Production Environment (Kubernetes)
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Kubernetes Cluster │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────────────────────────────────────────────┐ │
|
||||
│ │ Ingress (Nginx) │ │
|
||||
│ └──────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────────────────────────────────────────────┐ │
|
||||
│ │ Service Mesh (Optional) │ │
|
||||
│ └──────────────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌───────────────────────┼───────────────────────────┐ │
|
||||
│ │ Namespace: site11-core │ │
|
||||
│ ├──────────────┬────────────────┬──────────────────┤ │
|
||||
│ │ Console │ MongoDB │ Redis │ │
|
||||
│ │ Deployment │ StatefulSet │ StatefulSet │ │
|
||||
│ └──────────────┴────────────────┴──────────────────┘ │
|
||||
│ │
|
||||
│ ┌───────────────────────────────────────────────────┐ │
|
||||
│ │ Namespace: site11-pipeline │ │
|
||||
│ ├──────────────┬────────────────┬──────────────────┤ │
|
||||
│ │ Scheduler │ RSS Collector │ AI Generator │ │
|
||||
│ │ Deployment │ Deployment │ Deployment │ │
|
||||
│ └──────────────┴────────────────┴──────────────────┘ │
|
||||
│ │
|
||||
│ ┌───────────────────────────────────────────────────┐ │
|
||||
│ │ Namespace: site11-services │ │
|
||||
│ ├──────────────┬────────────────┬──────────────────┤ │
|
||||
│ │ Artist Svcs │ Regional Svcs │ Tech Svcs │ │
|
||||
│ │ Deployments │ Deployments │ Deployments │ │
|
||||
│ └──────────────┴────────────────┴──────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### 3. Hybrid Deployment
|
||||
|
||||
The platform supports **hybrid deployment** combining:
|
||||
- **Docker Compose**: For development and small deployments
|
||||
- **Kubernetes**: For production scaling
|
||||
- **Docker Desktop Kubernetes**: For local K8s testing
|
||||
- **Kind**: For lightweight K8s development
|
||||
|
||||
## Security Architecture
|
||||
|
||||
### Authentication & Authorization
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Security Flow │
|
||||
├──────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Client ──> [Console Gateway] ──> [OAuth Service] │
|
||||
│ │ │ │
|
||||
│ │ ▼ │
|
||||
│ │ [JWT Generation] │
|
||||
│ │ │ │
|
||||
│ ▼ ▼ │
|
||||
│ [Token Validation] <────── [Token] │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ [Service Access] │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Security Measures
|
||||
- **JWT-based authentication**: Stateless token authentication
|
||||
- **Service-to-service auth**: Internal service tokens
|
||||
- **Rate limiting**: API Gateway level throttling
|
||||
- **CORS configuration**: Controlled cross-origin access
|
||||
- **Environment variables**: Sensitive data in `.env` files
|
||||
- **Network isolation**: Services communicate within Docker/K8s network
|
||||
|
||||
## Monitoring & Observability
|
||||
|
||||
### 1. Health Checks
|
||||
Every service implements health endpoints:
|
||||
```python
|
||||
GET /health
|
||||
Response: {"status": "healthy", "service": "service-name"}
|
||||
```
|
||||
|
||||
### 2. Monitoring Stack
|
||||
- **Pipeline Monitor**: Real-time pipeline status (Port 8100)
|
||||
- **Console Dashboard**: Service health overview
|
||||
- **Redis Queue Monitoring**: Queue depth and processing rates
|
||||
- **MongoDB Metrics**: Database performance metrics
|
||||
|
||||
### 3. Logging Strategy
|
||||
- Centralized logging with structured JSON format
|
||||
- Log levels: DEBUG, INFO, WARNING, ERROR
|
||||
- Correlation IDs for distributed tracing
|
||||
|
||||
## Scalability & Performance
|
||||
|
||||
### Horizontal Scaling
|
||||
- **Stateless services**: Easy horizontal scaling
|
||||
- **Load balancing**: Kubernetes service mesh
|
||||
- **Auto-scaling**: Based on CPU/memory metrics
|
||||
|
||||
### Performance Optimizations
|
||||
- **Redis caching**: Reduces database load
|
||||
- **Async processing**: FastAPI async endpoints
|
||||
- **Batch processing**: Pipeline processes in batches
|
||||
- **Connection pooling**: Database connection reuse
|
||||
- **CDN ready**: Static content delivery
|
||||
|
||||
### Resource Management
|
||||
```yaml
|
||||
Resources per Service:
|
||||
- CPU: 100m - 500m (request), 1000m (limit)
|
||||
- Memory: 128Mi - 512Mi (request), 1Gi (limit)
|
||||
- Storage: 1Gi - 10Gi PVC for data services
|
||||
```
|
||||
|
||||
## Development Workflow
|
||||
|
||||
### 1. Local Development
|
||||
```bash
|
||||
# Start all services
|
||||
docker-compose up -d
|
||||
|
||||
# Start specific services
|
||||
docker-compose up -d console mongodb redis
|
||||
|
||||
# View logs
|
||||
docker-compose logs -f [service-name]
|
||||
|
||||
# Rebuild after changes
|
||||
docker-compose build [service-name]
|
||||
docker-compose up -d [service-name]
|
||||
```
|
||||
|
||||
### 2. Testing
|
||||
```bash
|
||||
# Run unit tests
|
||||
docker-compose exec [service-name] pytest
|
||||
|
||||
# Integration tests
|
||||
docker-compose exec [service-name] pytest tests/integration
|
||||
|
||||
# Load testing
|
||||
docker-compose exec [service-name] locust
|
||||
```
|
||||
|
||||
### 3. Deployment
|
||||
```bash
|
||||
# Development
|
||||
./deploy-local.sh
|
||||
|
||||
# Staging (Kind)
|
||||
./deploy-kind.sh
|
||||
|
||||
# Production (Kubernetes)
|
||||
./deploy-k8s.sh
|
||||
|
||||
# Docker Hub
|
||||
./deploy-dockerhub.sh
|
||||
```
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
### 1. Microservices over Monolith
|
||||
- **Reasoning**: Independent scaling, technology diversity, fault isolation
|
||||
- **Trade-off**: Increased complexity, network overhead
|
||||
|
||||
### 2. MongoDB as Primary Database
|
||||
- **Reasoning**: Flexible schema for diverse content types
|
||||
- **Trade-off**: Eventual consistency, complex queries
|
||||
|
||||
### 3. Event-Driven with Kafka
|
||||
- **Reasoning**: Decoupling, scalability, real-time processing
|
||||
- **Trade-off**: Operational complexity, debugging challenges
|
||||
|
||||
### 4. Python/FastAPI for Backend
|
||||
- **Reasoning**: Async support, fast development, AI library ecosystem
|
||||
- **Trade-off**: GIL limitations, performance vs compiled languages
|
||||
|
||||
### 5. Container-First Approach
|
||||
- **Reasoning**: Consistent environments, easy deployment, cloud-native
|
||||
- **Trade-off**: Resource overhead, container management
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
### Current Capacity (Single Instance)
|
||||
- **Content Generation**: 1000+ articles/day
|
||||
- **Translation Throughput**: 8 languages simultaneously
|
||||
- **API Response Time**: <100ms p50, <500ms p99
|
||||
- **Queue Processing**: 100+ jobs/minute
|
||||
- **Storage**: Scalable to TBs with MinIO
|
||||
|
||||
### Scaling Potential
|
||||
- **Horizontal**: Each service can scale to 10+ replicas
|
||||
- **Vertical**: Services can use up to 4GB RAM, 4 CPUs
|
||||
- **Geographic**: Multi-region deployment ready
|
||||
|
||||
## Future Roadmap
|
||||
|
||||
### Phase 1: Current State ✅
|
||||
- Core microservices architecture
|
||||
- Automated content pipeline
|
||||
- Multi-language support
|
||||
- Basic monitoring
|
||||
|
||||
### Phase 2: Enhanced Observability (Q1 2025)
|
||||
- Prometheus + Grafana integration
|
||||
- Distributed tracing with Jaeger
|
||||
- ELK stack for logging
|
||||
- Advanced alerting
|
||||
|
||||
### Phase 3: Advanced Features (Q2 2025)
|
||||
- Machine Learning pipeline
|
||||
- Real-time analytics
|
||||
- GraphQL API layer
|
||||
- WebSocket support
|
||||
|
||||
### Phase 4: Enterprise Features (Q3 2025)
|
||||
- Multi-tenancy support
|
||||
- Advanced RBAC
|
||||
- Audit logging
|
||||
- Compliance features
|
||||
|
||||
## Conclusion
|
||||
|
||||
Site11 represents a **modern, scalable, AI-driven content platform** that leverages:
|
||||
- **Microservices architecture** for modularity and scalability
|
||||
- **Event-driven design** for real-time processing
|
||||
- **Container orchestration** for deployment flexibility
|
||||
- **AI integration** for automated content generation
|
||||
- **Multi-language support** for global reach
|
||||
|
||||
The architecture is designed to handle **massive scale**, support **rapid development**, and provide **high availability** while maintaining **operational simplicity** through automation and monitoring.
|
||||
|
||||
## Appendix
|
||||
|
||||
### A. Service Port Mapping
|
||||
|
||||
| Service | Backend Port | Frontend Port | Description |
|
||||
|---------|-------------|---------------|-------------|
|
||||
| Console | 8000 | 3000 | API Gateway & Dashboard |
|
||||
| Users | 8007 | 8008 | User Management |
|
||||
| OAuth | 8003 | 8004 | Authentication |
|
||||
| Images | 8001 | 8002 | Image Processing |
|
||||
| Statistics | 8012 | - | Analytics |
|
||||
| Notifications | 8013 | - | Alerts & Messages |
|
||||
| Files | 8014 | - | File Storage |
|
||||
| Search | 8015 | - | Full-text Search |
|
||||
| Google Search | 8016 | - | Search Integration |
|
||||
| RSS Feed | 8017 | - | RSS Management |
|
||||
| News Aggregator | 8018 | - | Content Aggregation |
|
||||
| AI Writer | 8019 | - | AI Content Generation |
|
||||
| Pipeline Monitor | 8100 | - | Pipeline Dashboard |
|
||||
| Keyword Manager | 8100 | - | Keyword API |
|
||||
|
||||
### B. Environment Variables
|
||||
|
||||
Key configuration managed through `.env`:
|
||||
- Database connections (MongoDB, Redis)
|
||||
- API keys (Claude, DeepL, Google)
|
||||
- Service URLs and ports
|
||||
- JWT secrets
|
||||
- Cache TTLs
|
||||
|
||||
### C. Database Schema
|
||||
|
||||
MongoDB Collections:
|
||||
- `users`: User profiles and authentication
|
||||
- `articles_[lang]`: Articles by language
|
||||
- `keywords`: Search keywords and topics
|
||||
- `rss_feeds`: RSS feed configurations
|
||||
- `statistics`: Analytics data
|
||||
- `files`: File metadata
|
||||
|
||||
### D. API Documentation
|
||||
|
||||
All services provide OpenAPI/Swagger documentation at:
|
||||
```
|
||||
http://[service-url]/docs
|
||||
```
|
||||
|
||||
### E. Deployment Scripts
|
||||
|
||||
| Script | Purpose |
|
||||
|--------|---------|
|
||||
| `deploy-local.sh` | Local Docker Compose deployment |
|
||||
| `deploy-kind.sh` | Kind Kubernetes deployment |
|
||||
| `deploy-docker-desktop.sh` | Docker Desktop K8s deployment |
|
||||
| `deploy-dockerhub.sh` | Push images to Docker Hub |
|
||||
| `backup-mongodb.sh` | MongoDB backup utility |
|
||||
|
||||
---
|
||||
|
||||
**Document Version**: 1.0.0
|
||||
**Last Updated**: September 2025
|
||||
**Platform Version**: Site11 v1.0
|
||||
**Architecture Review**: Approved for Production
|
||||
Reference in New Issue
Block a user