# Site11 Platform Architecture

## Executive Summary

Site11 is a **large-scale, AI-powered content generation and aggregation platform** built on a microservices architecture. It automatically collects, processes, generates, and distributes multi-language content across domains such as news, entertainment, and technology, along with regional content for multiple countries.

### Key Capabilities
- **Automated Content Pipeline**: 24/7 content generation without human intervention
- **Multi-language Support**: Content in 8+ languages (Korean, English, Chinese, Japanese, French, German, Spanish, Italian)
- **Domain-Specific Services**: 30+ specialized microservices for different content domains
- **Real-time Processing**: Event-driven architecture with Kafka for real-time data flow
- **Scalable Infrastructure**: Containerized services with Kubernetes deployment support

## System Overview

### Architecture Pattern
**Hybrid Microservices Architecture** combining:
- **API Gateway Pattern**: Console service acts as the central orchestrator
- **Event-Driven Architecture**: Asynchronous communication via Kafka
- **Pipeline Architecture**: Multi-stage content processing workflow
- **Service Mesh Ready**: Prepared for Istio/Linkerd integration

### Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Backend** | FastAPI (Python 3.11) | High-performance async API services |
| **Frontend** | React 18 + TypeScript + Vite | Modern responsive web interfaces |
| **Primary Database** | MongoDB 7.0 | Document storage for flexible content |
| **Cache Layer** | Redis 7 | High-speed caching and queue management |
| **Message Broker** | Apache Kafka | Event streaming and service communication |
| **Search Engine** | Apache Solr 9.4 | Full-text search capabilities |
| **Object Storage** | MinIO | Media and file storage |
| **Containerization** | Docker & Docker Compose | Service isolation and deployment |
| **Orchestration** | Kubernetes (Kind/Docker Desktop for local clusters) | Production deployment and scaling |

## Core Services Architecture

### 1. Infrastructure Services

```
┌─────────────────────────────────────────────────────────────┐
│                     Infrastructure Layer                      │
├───────────────┬───────────────┬──────────────┬──────────────┤
│   MongoDB     │     Redis     │    Kafka     │   MinIO      │
│  (Primary DB) │   (Cache)     │  (Events)    │  (Storage)   │
├───────────────┼───────────────┼──────────────┼──────────────┤
│  Port: 27017  │  Port: 6379   │  Port: 9092  │  Port: 9000  │
└───────────────┴───────────────┴──────────────┴──────────────┘
```

### 2. Core Application Services

#### Console Service (API Gateway)
- **Port**: 8000 (Backend), 3000 (Frontend via Envoy)
- **Role**: Central orchestrator and monitoring dashboard
- **Responsibilities**:
  - Service discovery and health monitoring
  - Unified authentication portal
  - Request routing to microservices
  - Real-time metrics aggregation
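
Request routing at the gateway can be pictured as a longest-prefix lookup. The following is a minimal sketch, not the Console service's actual code: the path prefixes and hostnames are hypothetical, and only the port numbers are taken from the service list in this document.

```python
from typing import Optional

# Hypothetical route table: prefixes and hostnames are illustrative;
# the ports follow the service list in this document.
ROUTES = {
    "/api/users": "http://users:8007",
    "/api/oauth": "http://oauth:8003",
    "/api/images": "http://images:8001",
    "/api/files": "http://files:8014",
    "/api/search": "http://search:8015",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream base URL for the longest matching prefix, or None."""
    matches = [
        prefix
        for prefix in ROUTES
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    return ROUTES[max(matches, key=len)]
```

Matching on `prefix + "/"` rather than on the bare prefix keeps a path such as `/api/usersx` from being routed to the Users service by accident.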

#### Content Services
- **AI Writer** (8019): AI-powered article generation using Claude API
- **News Aggregator** (8018): Aggregates content from multiple sources
- **RSS Feed** (8017): RSS feed collection and management
- **Google Search** (8016): Search integration for content discovery
- **Search Service** (8015): Full-text search via Solr

#### Support Services
- **Users** (8007-8008): User management and authentication
- **OAuth** (8003-8004): OAuth2 authentication provider
- **Images** (8001-8002): Image processing and caching
- **Files** (8014): File management with MinIO integration
- **Notifications** (8013): Email, SMS, and push notifications
- **Statistics** (8012): Analytics and metrics collection

### 3. Pipeline Architecture

The pipeline is the **core of the content generation system**. It processes content through multiple stages:

```
┌──────────────────────────────────────────────────────────────┐
│                    Content Pipeline Flow                      │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  [Scheduler] ─────> [RSS Collector] ────> [Google Search]   │
│      │                                          │           │
│      │                                          ▼           │
│      │                                   [AI Generator]     │
│      │                                          │           │
│      ▼                                          ▼           │
│  [Keywords]                              [Translator]       │
│   Manager                                       │           │
│                                                 ▼           │
│                                         [Image Generator]   │
│                                                 │           │
│                                                 ▼           │
│                                         [Language Sync]     │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

#### Pipeline Components

1. **Multi-threaded Scheduler**: Orchestrates the entire pipeline workflow
2. **Keyword Manager** (API Port 8100): Manages search keywords and topics
3. **RSS Collector**: Collects content from RSS feeds
4. **Google Search Worker**: Searches for trending content
5. **AI Article Generator**: Generates articles using Claude AI
6. **Translator**: Translates content using DeepL API
7. **Image Generator**: Creates images for articles
8. **Language Sync**: Ensures content consistency across languages
9. **Pipeline Monitor** (Port 8100): Real-time pipeline monitoring dashboard
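
The stage ordering above can be sketched as a chain of functions that each take and return an article. This is an illustration under simplifying assumptions: the `Article` fields, the stage function names, and the `minio://` URL are hypothetical, every external call (Claude, DeepL, image generation) is replaced by a stub, and stages are plain in-process calls rather than separate workers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Article:
    keyword: str
    body: str = ""
    translations: Dict[str, str] = field(default_factory=dict)
    image_url: str = ""


Stage = Callable[[Article], Article]


def generate(article: Article) -> Article:
    # Stub for the AI Article Generator (a real stage would call the Claude API).
    article.body = f"Draft article about {article.keyword}"
    return article


def translate(article: Article) -> Article:
    # Stub for the Translator (a real stage would call the DeepL API).
    for lang in ("en", "ja"):
        article.translations[lang] = f"[{lang}] {article.body}"
    return article


def attach_image(article: Article) -> Article:
    # Stub for the Image Generator; the URL scheme is made up.
    article.image_url = f"minio://images/{article.keyword}.png"
    return article


def run_pipeline(article: Article, stages: List[Stage]) -> Article:
    """Pass the article through each stage in order."""
    for stage in stages:
        article = stage(article)
    return article


result = run_pipeline(Article(keyword="kafka"), [generate, translate, attach_image])
```

Keeping every stage behind the same signature makes it straightforward to reorder stages or insert a new one without touching the others.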

### 4. Domain-Specific Services

The platform includes **30+ specialized services** for different content domains:

#### Entertainment Services
- **Artist Services**: blackpink, enhypen, ive, nct, straykids, twice
- **K-Culture**: Korean cultural content
- **Media Empire**: Entertainment industry coverage

#### Regional Services
- **Korea** (8020-8021): Korean market content
- **Japan** (8022-8023): Japanese market content
- **China** (8024-8025): Chinese market content
- **USA** (8026-8027): US market content

#### Technology Services
- **AI Service** (8028-8029): AI technology news
- **Crypto** (8030-8031): Cryptocurrency coverage
- **Apple** (8032-8033): Apple ecosystem news
- **Google** (8034-8035): Google technology updates
- **Samsung** (8036-8037): Samsung product news
- **LG** (8038-8039): LG technology coverage

#### Business Services
- **WSJ** (8040-8041): Wall Street Journal integration
- **Musk** (8042-8043): Elon Musk related content

## Data Flow Architecture

### 1. Content Generation Flow

```
User Request / Scheduled Task
         │
         ▼
   [Console API Gateway]
         │
         ├──> [Keyword Manager] ──> Topics/Keywords
         │
         ▼
   [Pipeline Scheduler]
         │
         ├──> [RSS Collector] ──> Feed Content
         ├──> [Google Search] ──> Search Results
         │
         ▼
   [AI Article Generator]
         │
         ├──> [MongoDB] (Store Korean Original)
         │
         ▼
   [Translator Service]
         │
         ├──> [MongoDB] (Store Translations)
         │
         ▼
   [Image Generator]
         │
         ├──> [MinIO] (Store Images)
         │
         ▼
   [Language Sync]
         │
         └──> [Content Ready for Distribution]
```

### 2. Event-Driven Communication

```
Service A ──[Publish]──> Kafka Topic ──[Subscribe]──> Service B
                              │
                              ├──> Service C
                              └──> Service D

Topics:
- content.created
- content.updated
- translation.completed
- image.generated
- user.activity
```
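
The fan-out in the diagram, where one published event reaches several subscribers, can be illustrated with a toy in-memory bus. This is only a stand-in for Kafka: it is synchronous, keeps nothing on disk, and has no partitions or consumer groups. The class name, the event fields, and the two example consumers are assumptions made for the sketch.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class InMemoryBus:
    """Toy stand-in for a message broker: synchronous, no persistence."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = InMemoryBus()
translation_queue: List[str] = []
search_index: List[str] = []

# Two independent consumers react to the same event.
bus.subscribe("content.created", lambda e: translation_queue.append(e["article_id"]))
bus.subscribe("content.created", lambda e: search_index.append(e["article_id"]))

delivered = bus.publish("content.created", {"article_id": "a-1", "lang": "ko"})
```

The publisher never references its consumers, which is the decoupling property the event-driven design relies on.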

### 3. Caching Strategy

```
Client Request ──> [Console] ──> [Redis Cache]
                                      │
                                      ├─ HIT ──> Return Cached
                                      │
                                      └─ MISS ──> [Service] ──> [MongoDB]
                                                        │
                                                        └──> Update Cache
```
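
The hit/miss flow above is the cache-aside pattern. The sketch below uses a dictionary with expiry times in place of Redis and a callback in place of a MongoDB query, so it runs without either service. `TTLCache`, `get_article`, and the injectable `clock` parameter are names invented for this illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Dict-backed stand-in for Redis with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self._clock() + self._ttl, value)


def get_article(article_id: str, cache: TTLCache, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    cached = cache.get(article_id)
    if cached is not None:
        return cached
    value = load_from_db(article_id)
    cache.set(article_id, value)
    return value
```

Passing the clock in as a parameter makes expiry testable without sleeping; a real deployment would rely on Redis key expiry instead.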

## Deployment Architecture

### 1. Development Environment (Docker Compose)

All services run in Docker containers with:
- **Single docker-compose.yml**: Defines all services
- **Shared network**: `site11_network` for inter-service communication
- **Persistent volumes**: Data stored in `./data/` directory
- **Hot-reload**: Code mounted for development

### 2. Production Environment (Kubernetes)

```
┌─────────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────────────────────────────┐      │
│  │                  Ingress (Nginx)                  │      │
│  └──────────────────────────────────────────────────┘      │
│                          │                                  │
│  ┌──────────────────────────────────────────────────┐      │
│  │              Service Mesh (Optional)              │      │
│  └──────────────────────────────────────────────────┘      │
│                          │                                  │
│  ┌───────────────────────┼───────────────────────────┐     │
│  │   Namespace: site11-core                          │     │
│  ├──────────────┬────────────────┬──────────────────┤     │
│  │  Console     │   MongoDB      │     Redis        │     │
│  │  Deployment  │   StatefulSet  │   StatefulSet    │     │
│  └──────────────┴────────────────┴──────────────────┘     │
│                                                             │
│  ┌───────────────────────────────────────────────────┐     │
│  │   Namespace: site11-pipeline                      │     │
│  ├──────────────┬────────────────┬──────────────────┤     │
│  │  Scheduler   │  RSS Collector │  AI Generator    │     │
│  │  Deployment  │   Deployment   │   Deployment     │     │
│  └──────────────┴────────────────┴──────────────────┘     │
│                                                             │
│  ┌───────────────────────────────────────────────────┐     │
│  │   Namespace: site11-services                      │     │
│  ├──────────────┬────────────────┬──────────────────┤     │
│  │  Artist Svcs │  Regional Svcs │   Tech Svcs      │     │
│  │  Deployments │   Deployments  │   Deployments    │     │
│  └──────────────┴────────────────┴──────────────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### 3. Hybrid Deployment

The platform supports **hybrid deployment** combining:
- **Docker Compose**: For development and small deployments
- **Kubernetes**: For production scaling
- **Docker Desktop Kubernetes**: For local K8s testing
- **Kind**: For lightweight K8s development

## Security Architecture

### Authentication & Authorization
```
┌──────────────────────────────────────────────────────────────┐
│                    Security Flow                              │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  Client ──> [Console Gateway] ──> [OAuth Service]           │
│                    │                     │                   │
│                    │                     ▼                   │
│                    │              [JWT Generation]           │
│                    │                     │                   │
│                    ▼                     ▼                   │
│           [Token Validation] <────── [Token]                │
│                    │                                        │
│                    ▼                                        │
│           [Service Access]                                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
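
To make the token generation and validation steps concrete, here is a standard-library sketch of an HS256-signed JWT. The document does not say which signing algorithm or library the OAuth service uses, so HS256 and the function names `issue_token` and `validate_token` are assumptions; a production service would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def issue_token(claims: Dict[str, Any], secret: bytes) -> str:
    """Build a compact JWT: base64url(header).base64url(payload).base64url(signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def validate_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, UnicodeError):
        return None
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        return None
    return claims
```

Because validation needs only the shared secret and the token itself, the gateway can check tokens without a round trip to the OAuth service, which is what makes the scheme stateless.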

### Security Measures
- **JWT-based authentication**: Stateless token authentication
- **Service-to-service auth**: Internal service tokens
- **Rate limiting**: API Gateway level throttling
- **CORS configuration**: Controlled cross-origin access
- **Environment variables**: Sensitive data in `.env` files
- **Network isolation**: Services communicate within Docker/K8s network
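
Gateway-level throttling is commonly implemented with a token bucket. The document does not specify the algorithm Site11 uses, so the following is one plausible sketch; the class name and the injectable `clock` parameter are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A gateway would typically keep one bucket per client or API key. With several gateway replicas, the bucket state would have to live in a shared store such as Redis rather than in process memory.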

## Monitoring & Observability

### 1. Health Checks
Every service implements health endpoints:
```
GET /health
Response: {"status": "healthy", "service": "service-name"}
```
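
A dependency-free sketch of such an endpoint follows. The platform's services are built with FastAPI, so this standard-library version only illustrates the response contract shown above; `health_payload`, `HealthHandler`, and `serve` are names chosen for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SERVICE_NAME = "service-name"  # placeholder, as in the example response


def health_payload(service: str) -> dict:
    return {"status": "healthy", "service": service}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps(health_payload(SERVICE_NAME)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocking helper; call it manually to try the endpoint."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` and then requesting `http://127.0.0.1:8000/health` returns the JSON body; any other path returns 404.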

### 2. Monitoring Stack
- **Pipeline Monitor**: Real-time pipeline status (Port 8100)
- **Console Dashboard**: Service health overview
- **Redis Queue Monitoring**: Queue depth and processing rates
- **MongoDB Metrics**: Database performance metrics

### 3. Logging Strategy
- Centralized logging with structured JSON format
- Log levels: DEBUG, INFO, WARNING, ERROR
- Correlation IDs for distributed tracing
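
Structured JSON logs with a correlation ID can be sketched with the standard `logging` and `contextvars` modules. The field names in the JSON object and the helper `build_logger` are assumptions; the document does not specify the actual log schema.

```python
import contextvars
import io
import json
import logging
import uuid
from typing import TextIO

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
                "correlation_id": correlation_id.get(),
            }
        )


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger("ai-writer", buffer)

correlation_id.set(str(uuid.uuid4()))  # normally set once per incoming request
log.info("article generated")
entry = json.loads(buffer.getvalue())
```

A context variable keeps the ID isolated per async task, so concurrent requests in the same process do not overwrite each other's IDs. Forwarding the same ID in an outgoing request header lets downstream services log it as well.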

## Scalability & Performance

### Horizontal Scaling
- **Stateless services**: Easy horizontal scaling
- **Load balancing**: Kubernetes Services, optionally behind a service mesh
- **Auto-scaling**: Based on CPU/memory metrics

### Performance Optimizations
- **Redis caching**: Reduces database load
- **Async processing**: FastAPI async endpoints
- **Batch processing**: Pipeline processes in batches
- **Connection pooling**: Database connection reuse
- **CDN ready**: Static content delivery
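
Async and batch processing combine naturally: items inside a batch run concurrently, while batches run one after another to bound the load on external APIs. The sketch below assumes a hypothetical `process_one` coroutine standing in for an I/O-bound call; the batch size is arbitrary.

```python
import asyncio
from typing import Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


async def process_one(keyword: str) -> str:
    # Stand-in for an I/O-bound call such as an HTTP request to an external API.
    await asyncio.sleep(0)
    return keyword.upper()


async def process_all(keywords: List[str], batch_size: int) -> List[str]:
    results: List[str] = []
    for batch in batches(keywords, batch_size):
        # Items within a batch run concurrently; batches run sequentially.
        results.extend(await asyncio.gather(*(process_one(k) for k in batch)))
    return results
```

Running `asyncio.run(process_all(["ai", "crypto", "apple"], batch_size=2))` processes two batches and returns the results in input order, because `asyncio.gather` preserves the order of its arguments.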

### Resource Management
```yaml
Resources per Service:
- CPU: 100m - 500m (request), 1000m (limit)
- Memory: 128Mi - 512Mi (request), 1Gi (limit)
- Storage: 1Gi - 10Gi PVC for data services
```
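
The `m` and `Mi`/`Gi` suffixes above are Kubernetes resource quantities: `500m` CPU is half a core, and `128Mi` is 128 × 2^20 bytes. The small helper below shows the conversion and a request-versus-limit check. It handles only the suffixes used in this section, and the function names are made up for the example.

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes; a bare number is bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def request_within_limit(request: str, limit: str, kind: str) -> bool:
    """Check that a request does not exceed its limit; kind is 'cpu' or 'memory'."""
    convert = cpu_to_millicores if kind == "cpu" else memory_to_bytes
    return convert(request) <= convert(limit)
```

A check like this could run in CI against deployment manifests to catch a request that exceeds its limit before Kubernetes rejects the pod spec.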

## Development Workflow

### 1. Local Development
```bash
# Start all services
docker-compose up -d

# Start specific services
docker-compose up -d console mongodb redis

# View logs
docker-compose logs -f [service-name]

# Rebuild after changes
docker-compose build [service-name]
docker-compose up -d [service-name]
```

### 2. Testing
```bash
# Run unit tests
docker-compose exec [service-name] pytest

# Integration tests
docker-compose exec [service-name] pytest tests/integration

# Load testing
docker-compose exec [service-name] locust
```

### 3. Deployment
```bash
# Development
./deploy-local.sh

# Staging (Kind)
./deploy-kind.sh

# Production (Kubernetes)
./deploy-k8s.sh

# Docker Hub
./deploy-dockerhub.sh
```

## Key Design Decisions

### 1. Microservices over Monolith
- **Reasoning**: Independent scaling, technology diversity, fault isolation
- **Trade-off**: Increased complexity, network overhead

### 2. MongoDB as Primary Database
- **Reasoning**: Flexible schema for diverse content types
- **Trade-off**: Eventual consistency when reading from secondaries, more involved queries for relational data

### 3. Event-Driven with Kafka
- **Reasoning**: Decoupling, scalability, real-time processing
- **Trade-off**: Operational complexity, debugging challenges

### 4. Python/FastAPI for Backend
- **Reasoning**: Async support, fast development, AI library ecosystem
- **Trade-off**: The GIL limits CPU-bound parallelism; raw performance trails compiled languages

### 5. Container-First Approach
- **Reasoning**: Consistent environments, easy deployment, cloud-native
- **Trade-off**: Resource overhead, container management

## Performance Metrics

### Current Capacity (Single Instance)
- **Content Generation**: 1000+ articles/day
- **Translation Concurrency**: 8 languages processed simultaneously
- **API Response Time**: <100ms p50, <500ms p99
- **Queue Processing**: 100+ jobs/minute
- **Storage**: Scalable to TBs with MinIO

### Scaling Potential
- **Horizontal**: Each service can scale to 10+ replicas
- **Vertical**: Services can use up to 4GB RAM, 4 CPUs
- **Geographic**: Multi-region deployment ready

## Future Roadmap

### Phase 1: Current State ✅
- Core microservices architecture
- Automated content pipeline
- Multi-language support
- Basic monitoring

### Phase 2: Enhanced Observability (Q1 2025)
- Prometheus + Grafana integration
- Distributed tracing with Jaeger
- ELK stack for logging
- Advanced alerting

### Phase 3: Advanced Features (Q2 2025)
- Machine Learning pipeline
- Real-time analytics
- GraphQL API layer
- WebSocket support

### Phase 4: Enterprise Features (Q3 2025)
- Multi-tenancy support
- Advanced RBAC
- Audit logging
- Compliance features

## Conclusion

Site11 represents a **modern, scalable, AI-driven content platform** that leverages:
- **Microservices architecture** for modularity and scalability
- **Event-driven design** for real-time processing
- **Container orchestration** for deployment flexibility
- **AI integration** for automated content generation
- **Multi-language support** for global reach

The architecture is designed to **scale horizontally**, support **rapid development**, and provide **high availability**, while automation and monitoring keep the operational overhead of running many services manageable.

## Appendix

### A. Service Port Mapping

| Service | Backend Port | Frontend Port | Description |
|---------|-------------|---------------|-------------|
| Console | 8000 | 3000 | API Gateway & Dashboard |
| Users | 8007 | 8008 | User Management |
| OAuth | 8003 | 8004 | Authentication |
| Images | 8001 | 8002 | Image Processing |
| Statistics | 8012 | - | Analytics |
| Notifications | 8013 | - | Alerts & Messages |
| Files | 8014 | - | File Storage |
| Search | 8015 | - | Full-text Search |
| Google Search | 8016 | - | Search Integration |
| RSS Feed | 8017 | - | RSS Management |
| News Aggregator | 8018 | - | Content Aggregation |
| AI Writer | 8019 | - | AI Content Generation |
| Pipeline Monitor | 8100 | - | Pipeline Dashboard |
| Keyword Manager | 8100 | - | Keyword API |

### B. Environment Variables

Key configuration managed through `.env`:
- Database connections (MongoDB, Redis)
- API keys (Claude, DeepL, Google)
- Service URLs and ports
- JWT secrets
- Cache TTLs
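
Loading these values into a typed settings object at startup makes missing configuration fail early. In the sketch below, the variable names (`MONGODB_URL`, `REDIS_URL`, `JWT_SECRET`, `CACHE_TTL_SECONDS`), the hostnames, and the default TTL are all hypothetical; only the MongoDB and Redis ports come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    redis_url: str
    jwt_secret: str
    cache_ttl_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment; fail fast if the secret is missing."""
    try:
        jwt_secret = env["JWT_SECRET"]
    except KeyError:
        raise RuntimeError("JWT_SECRET must be set") from None
    return Settings(
        mongodb_url=env.get("MONGODB_URL", "mongodb://mongodb:27017"),
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        jwt_secret=jwt_secret,
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "300")),
    )
```

Secrets deliberately have no default here, so a misconfigured container stops at startup instead of running with a guessable key.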

### C. Database Schema

MongoDB Collections:
- `users`: User profiles and authentication
- `articles_[lang]`: Articles by language
- `keywords`: Search keywords and topics
- `rss_feeds`: RSS feed configurations
- `statistics`: Analytics data
- `files`: File metadata
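
The per-language `articles_[lang]` naming can be resolved with a small helper. The document does not state what `[lang]` expands to, so the two-letter ISO 639-1 codes below are an assumption, as is the function name.

```python
# Assumed codes for the eight languages listed in this document.
SUPPORTED_LANGUAGES = {"ko", "en", "zh", "ja", "fr", "de", "es", "it"}


def article_collection(lang: str) -> str:
    """Map a language code to its collection name, rejecting unknown codes."""
    code = lang.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"articles_{code}"
```

Validating against a fixed set prevents a typo in a language code from silently creating a new, empty collection on first write.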

### D. API Documentation

All services provide OpenAPI/Swagger documentation at:
```
http://[service-url]/docs
```

### E. Deployment Scripts

| Script | Purpose |
|--------|---------|
| `deploy-local.sh` | Local Docker Compose deployment |
| `deploy-kind.sh` | Kind Kubernetes deployment |
| `deploy-docker-desktop.sh` | Docker Desktop K8s deployment |
| `deploy-dockerhub.sh` | Push images to Docker Hub |
| `backup-mongodb.sh` | MongoDB backup utility |

---

- **Document Version**: 1.0.0
- **Last Updated**: September 2025
- **Platform Version**: Site11 v1.0
- **Architecture Review**: Approved for Production