# Site11 Platform Architecture

## Executive Summary

Site11 is a **large-scale, AI-powered content generation and aggregation platform** built on a microservices architecture. The platform automatically collects, processes, generates, and distributes multi-language content across various domains including news, entertainment, technology, and regional content for multiple countries.

### Key Capabilities

- **Automated Content Pipeline**: 24/7 content generation without human intervention
- **Multi-language Support**: Content in 8+ languages (Korean, English, Chinese, Japanese, French, German, Spanish, Italian)
- **Domain-Specific Services**: 30+ specialized microservices for different content domains
- **Real-time Processing**: Event-driven architecture with Kafka for real-time data flow
- **Scalable Infrastructure**: Containerized services with Kubernetes deployment support

## System Overview

### Architecture Pattern

**Hybrid Microservices Architecture** combining:

- **API Gateway Pattern**: Console service acts as the central orchestrator
- **Event-Driven Architecture**: Asynchronous communication via Kafka
- **Pipeline Architecture**: Multi-stage content processing workflow
- **Service Mesh Ready**: Prepared for Istio/Linkerd integration

### Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Backend** | FastAPI (Python 3.11) | High-performance async API services |
| **Frontend** | React 18 + TypeScript + Vite | Modern responsive web interfaces |
| **Primary Database** | MongoDB 7.0 | Document storage for flexible content |
| **Cache Layer** | Redis 7 | High-speed caching and queue management |
| **Message Broker** | Apache Kafka | Event streaming and service communication |
| **Search Engine** | Apache Solr 9.4 | Full-text search capabilities |
| **Object Storage** | MinIO | Media and file storage |
| **Containerization** | Docker & Docker Compose | Service isolation and deployment |
| **Orchestration** | Kubernetes (Kind/Docker Desktop) | Production deployment and scaling |

## Core Services Architecture

### 1. Infrastructure Services

```
┌─────────────────────────────────────────────────────────────┐
│                    Infrastructure Layer                     │
├───────────────┬───────────────┬──────────────┬──────────────┤
│   MongoDB     │    Redis      │    Kafka     │    MinIO     │
│ (Primary DB)  │   (Cache)     │  (Events)    │  (Storage)   │
├───────────────┼───────────────┼──────────────┼──────────────┤
│  Port: 27017  │  Port: 6379   │  Port: 9092  │  Port: 9000  │
└───────────────┴───────────────┴──────────────┴──────────────┘
```

### 2. Core Application Services

#### Console Service (API Gateway)

- **Port**: 8000 (Backend), 3000 (Frontend via Envoy)
- **Role**: Central orchestrator and monitoring dashboard
- **Responsibilities**:
  - Service discovery and health monitoring
  - Unified authentication portal
  - Request routing to microservices
  - Real-time metrics aggregation

#### Content Services

- **AI Writer** (8019): AI-powered article generation using Claude API
- **News Aggregator** (8018): Aggregates content from multiple sources
- **RSS Feed** (8017): RSS feed collection and management
- **Google Search** (8016): Search integration for content discovery
- **Search Service** (8015): Full-text search via Solr

#### Support Services

- **Users** (8007-8008): User management and authentication
- **OAuth** (8003-8004): OAuth2 authentication provider
- **Images** (8001-8002): Image processing and caching
- **Files** (8014): File management with MinIO integration
- **Notifications** (8013): Email, SMS, and push notifications
- **Statistics** (8012): Analytics and metrics collection

### 3. Pipeline Architecture

The pipeline represents the **heart of the content generation system**, processing content through multiple stages:

```
┌──────────────────────────────────────────────────────────────┐
│                    Content Pipeline Flow                     │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  [Scheduler] ─────> [RSS Collector] ────> [Google Search]    │
│       │                                          │           │
│       │                                          ▼           │
│       │                                   [AI Generator]     │
│       │                                          │           │
│       ▼                                          ▼           │
│  [Keywords]                                [Translator]      │
│   Manager                                        │           │
│                                                  ▼           │
│                                          [Image Generator]   │
│                                                  │           │
│                                                  ▼           │
│                                           [Language Sync]    │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

#### Pipeline Components

1. **Multi-threaded Scheduler**: Orchestrates the entire pipeline workflow
2. **Keyword Manager** (API Port 8100): Manages search keywords and topics
3. **RSS Collector**: Collects content from RSS feeds
4. **Google Search Worker**: Searches for trending content
5. **AI Article Generator**: Generates articles using Claude AI
6. **Translator**: Translates content using DeepL API
7. **Image Generator**: Creates images for articles
8. **Language Sync**: Ensures content consistency across languages
9. **Pipeline Monitor** (Port 8100): Real-time pipeline monitoring dashboard

### 4. Domain-Specific Services

The platform includes **30+ specialized services** for different content domains:

#### Entertainment Services

- **Artist Services**: blackpink, enhypen, ive, nct, straykids, twice
- **K-Culture**: Korean cultural content
- **Media Empire**: Entertainment industry coverage

#### Regional Services

- **Korea** (8020-8021): Korean market content
- **Japan** (8022-8023): Japanese market content
- **China** (8024-8025): Chinese market content
- **USA** (8026-8027): US market content

#### Technology Services

- **AI Service** (8028-8029): AI technology news
- **Crypto** (8030-8031): Cryptocurrency coverage
- **Apple** (8032-8033): Apple ecosystem news
- **Google** (8034-8035): Google technology updates
- **Samsung** (8036-8037): Samsung product news
- **LG** (8038-8039): LG technology coverage

#### Business Services

- **WSJ** (8040-8041): Wall Street Journal integration
- **Musk** (8042-8043): Elon Musk related content

## Data Flow Architecture

### 1. Content Generation Flow

```
User Request / Scheduled Task
         │
         ▼
    [Console API Gateway]
         │
         ├──> [Keyword Manager] ──> Topics/Keywords
         │
         ▼
    [Pipeline Scheduler]
         │
         ├──> [RSS Collector] ──> Feed Content
         ├──> [Google Search] ──> Search Results
         │
         ▼
    [AI Article Generator]
         │
         ├──> [MongoDB] (Store Korean Original)
         │
         ▼
    [Translator Service]
         │
         ├──> [MongoDB] (Store Translations)
         │
         ▼
    [Image Generator]
         │
         ├──> [MinIO] (Store Images)
         │
         ▼
    [Language Sync]
         │
         └──> [Content Ready for Distribution]
```

### 2. Event-Driven Communication

```
Service A ──[Publish]──> Kafka Topic ──[Subscribe]──> Service B
                              │
                              ├──> Service C
                              └──> Service D

Topics:
- content.created
- content.updated
- translation.completed
- image.generated
- user.activity
```

### 3. Caching Strategy

```
Client Request ──> [Console] ──> [Redis Cache]
                                      │
                                      ├─ HIT ──> Return Cached
                                      │
                                      └─ MISS ──> [Service] ──> [MongoDB]
                                                                    │
                                                                    └──> Update Cache
```

## Deployment Architecture

### 1. Development Environment (Docker Compose)

All services run in Docker containers with:

- **Single docker-compose.yml**: Defines all services
- **Shared network**: `site11_network` for inter-service communication
- **Persistent volumes**: Data stored in `./data/` directory
- **Hot-reload**: Code mounted for development

### 2. Production Environment (Kubernetes)

```
┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│    ┌──────────────────────────────────────────────────┐     │
│    │                 Ingress (Nginx)                  │     │
│    └──────────────────────────────────────────────────┘     │
│                             │                               │
│    ┌──────────────────────────────────────────────────┐     │
│    │             Service Mesh (Optional)              │     │
│    └──────────────────────────────────────────────────┘     │
│                             │                               │
│    ┌────────────────────────┼─────────────────────────┐     │
│    │              Namespace: site11-core              │     │
│    ├──────────────┬────────────────┬──────────────────┤     │
│    │   Console    │    MongoDB     │      Redis       │     │
│    │  Deployment  │  StatefulSet   │   StatefulSet    │     │
│    └──────────────┴────────────────┴──────────────────┘     │
│                                                             │
│    ┌──────────────────────────────────────────────────┐     │
│    │            Namespace: site11-pipeline            │     │
│    ├──────────────┬────────────────┬──────────────────┤     │
│    │  Scheduler   │ RSS Collector  │   AI Generator   │     │
│    │  Deployment  │   Deployment   │    Deployment    │     │
│    └──────────────┴────────────────┴──────────────────┘     │
│                                                             │
│    ┌──────────────────────────────────────────────────┐     │
│    │            Namespace: site11-services            │     │
│    ├──────────────┬────────────────┬──────────────────┤     │
│    │ Artist Svcs  │ Regional Svcs  │    Tech Svcs     │     │
│    │ Deployments  │  Deployments   │   Deployments    │     │
│    └──────────────┴────────────────┴──────────────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### 3. Hybrid Deployment

The platform supports **hybrid deployment** combining:

- **Docker Compose**: For development and small deployments
- **Kubernetes**: For production scaling
- **Docker Desktop Kubernetes**: For local K8s testing
- **Kind**: For lightweight K8s development

## Security Architecture

### Authentication & Authorization

```
┌──────────────────────────────────────────────────────────────┐
│                        Security Flow                         │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Client ──> [Console Gateway] ──> [OAuth Service]            │
│                     │                    │                   │
│                     │                    ▼                   │
│                     │            [JWT Generation]            │
│                     │                    │                   │
│                     ▼                    ▼                   │
│            [Token Validation] <────── [Token]                │
│                     │                                        │
│                     ▼                                        │
│             [Service Access]                                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

### Security Measures

- **JWT-based authentication**: Stateless token authentication
- **Service-to-service auth**: Internal service tokens
- **Rate limiting**: API Gateway level throttling
- **CORS configuration**: Controlled cross-origin access
- **Environment variables**: Sensitive data in `.env` files
- **Network isolation**: Services communicate within Docker/K8s network

## Monitoring & Observability

### 1. Health Checks

Every service implements health endpoints:

```
GET /health

Response:
{"status": "healthy", "service": "service-name"}
```

### 2. Monitoring Stack

- **Pipeline Monitor**: Real-time pipeline status (Port 8100)
- **Console Dashboard**: Service health overview
- **Redis Queue Monitoring**: Queue depth and processing rates
- **MongoDB Metrics**: Database performance metrics

### 3. Logging Strategy

- Centralized logging with structured JSON format
- Log levels: DEBUG, INFO, WARNING, ERROR
- Correlation IDs for distributed tracing

## Scalability & Performance

### Horizontal Scaling

- **Stateless services**: Easy horizontal scaling
- **Load balancing**: Kubernetes service mesh
- **Auto-scaling**: Based on CPU/memory metrics

### Performance Optimizations

- **Redis caching**: Reduces database load
- **Async processing**: FastAPI async endpoints
- **Batch processing**: Pipeline processes in batches
- **Connection pooling**: Database connection reuse
- **CDN ready**: Static content delivery

### Resource Management

```yaml
Resources per Service:
  - CPU: 100m - 500m (request), 1000m (limit)
  - Memory: 128Mi - 512Mi (request), 1Gi (limit)
  - Storage: 1Gi - 10Gi PVC for data services
```

## Development Workflow

### 1. Local Development

```bash
# Start all services
docker-compose up -d

# Start specific services
docker-compose up -d console mongodb redis

# View logs
docker-compose logs -f [service-name]

# Rebuild after changes
docker-compose build [service-name]
docker-compose up -d [service-name]
```

### 2. Testing

```bash
# Run unit tests
docker-compose exec [service-name] pytest

# Integration tests
docker-compose exec [service-name] pytest tests/integration

# Load testing
docker-compose exec [service-name] locust
```

### 3. Deployment

```bash
# Development
./deploy-local.sh

# Staging (Kind)
./deploy-kind.sh

# Production (Kubernetes)
./deploy-k8s.sh

# Docker Hub
./deploy-dockerhub.sh
```

## Key Design Decisions

### 1. Microservices over Monolith

- **Reasoning**: Independent scaling, technology diversity, fault isolation
- **Trade-off**: Increased complexity, network overhead

### 2. MongoDB as Primary Database

- **Reasoning**: Flexible schema for diverse content types
- **Trade-off**: Eventual consistency, complex queries

### 3. Event-Driven with Kafka

- **Reasoning**: Decoupling, scalability, real-time processing
- **Trade-off**: Operational complexity, debugging challenges

### 4. Python/FastAPI for Backend

- **Reasoning**: Async support, fast development, AI library ecosystem
- **Trade-off**: GIL limitations, performance vs compiled languages

### 5. Container-First Approach

- **Reasoning**: Consistent environments, easy deployment, cloud-native
- **Trade-off**: Resource overhead, container management

## Performance Metrics

### Current Capacity (Single Instance)

- **Content Generation**: 1000+ articles/day
- **Translation Throughput**: 8 languages simultaneously
- **API Response Time**: <100ms p50, <500ms p99
- **Queue Processing**: 100+ jobs/minute
- **Storage**: Scalable to TBs with MinIO

### Scaling Potential

- **Horizontal**: Each service can scale to 10+ replicas
- **Vertical**: Services can use up to 4GB RAM, 4 CPUs
- **Geographic**: Multi-region deployment ready

## Future Roadmap

### Phase 1: Current State ✅

- Core microservices architecture
- Automated content pipeline
- Multi-language support
- Basic monitoring

### Phase 2: Enhanced Observability (Q1 2025)

- Prometheus + Grafana integration
- Distributed tracing with Jaeger
- ELK stack for logging
- Advanced alerting

### Phase 3: Advanced Features (Q2 2025)

- Machine Learning pipeline
- Real-time analytics
- GraphQL API layer
- WebSocket support

### Phase 4: Enterprise Features (Q3 2025)

- Multi-tenancy support
- Advanced RBAC
- Audit logging
- Compliance features

## Conclusion

Site11 represents a **modern, scalable, AI-driven content platform** that leverages:

- **Microservices architecture** for modularity and scalability
- **Event-driven design** for real-time processing
- **Container orchestration** for deployment flexibility
- **AI integration** for automated content generation
- **Multi-language support** for global reach

The architecture is designed to handle **massive scale**, support **rapid development**, and provide **high availability** while maintaining **operational simplicity** through automation and monitoring.

## Appendix

### A. Service Port Mapping

| Service | Backend Port | Frontend Port | Description |
|---------|-------------|---------------|-------------|
| Console | 8000 | 3000 | API Gateway & Dashboard |
| Users | 8007 | 8008 | User Management |
| OAuth | 8003 | 8004 | Authentication |
| Images | 8001 | 8002 | Image Processing |
| Statistics | 8012 | - | Analytics |
| Notifications | 8013 | - | Alerts & Messages |
| Files | 8014 | - | File Storage |
| Search | 8015 | - | Full-text Search |
| Google Search | 8016 | - | Search Integration |
| RSS Feed | 8017 | - | RSS Management |
| News Aggregator | 8018 | - | Content Aggregation |
| AI Writer | 8019 | - | AI Content Generation |
| Pipeline Monitor | 8100 | - | Pipeline Dashboard |
| Keyword Manager | 8100 | - | Keyword API |

### B. Environment Variables

Key configuration managed through `.env`:

- Database connections (MongoDB, Redis)
- API keys (Claude, DeepL, Google)
- Service URLs and ports
- JWT secrets
- Cache TTLs

### C. Database Schema

MongoDB Collections:

- `users`: User profiles and authentication
- `articles_[lang]`: Articles by language
- `keywords`: Search keywords and topics
- `rss_feeds`: RSS feed configurations
- `statistics`: Analytics data
- `files`: File metadata

### D. API Documentation

All services provide OpenAPI/Swagger documentation at:

```
http://[service-url]/docs
```

### E. Deployment Scripts

| Script | Purpose |
|--------|---------|
| `deploy-local.sh` | Local Docker Compose deployment |
| `deploy-kind.sh` | Kind Kubernetes deployment |
| `deploy-docker-desktop.sh` | Docker Desktop K8s deployment |
| `deploy-dockerhub.sh` | Push images to Docker Hub |
| `backup-mongodb.sh` | MongoDB backup utility |

---

**Document Version**: 1.0.0
**Last Updated**: September 2025
**Platform Version**: Site11 v1.0
**Architecture Review**: Approved for Production
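As a small consistency check on the Appendix A port mapping, the sketch below groups the listed backend ports by number and reports any port that more than one service claims. The dictionary is transcribed from the table; the helper name `find_port_collisions` is hypothetical and not part of Site11. Run against the table as written, it reports port 8100 for both Pipeline Monitor and Keyword Manager; whether those two share one process is not stated in this document.

```python
from collections import defaultdict

# Backend ports transcribed from the Appendix A table (frontend ports omitted).
BACKEND_PORTS = {
    "Console": 8000,
    "Users": 8007,
    "OAuth": 8003,
    "Images": 8001,
    "Statistics": 8012,
    "Notifications": 8013,
    "Files": 8014,
    "Search": 8015,
    "Google Search": 8016,
    "RSS Feed": 8017,
    "News Aggregator": 8018,
    "AI Writer": 8019,
    "Pipeline Monitor": 8100,
    "Keyword Manager": 8100,
}


def find_port_collisions(ports: dict[str, int]) -> dict[int, list[str]]:
    """Return every port listed for more than one service (hypothetical helper)."""
    by_port: dict[int, list[str]] = defaultdict(list)
    for service, port in ports.items():
        by_port[port].append(service)
    # Keep only ports that appear under two or more service names.
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    for port, names in find_port_collisions(BACKEND_PORTS).items():
        print(f"port {port} is listed for: {', '.join(names)}")
```

A check of this kind could run in CI against the real `docker-compose.yml` port bindings, which would be a more reliable source than a hand-maintained table.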