jungwoo choi 578be1fd55 Add complete NLLB-200 support with all 204 FLORES-200 languages
Updated dual model system to fully support both M2M100 and NLLB-200:

**NLLB-200 Model (204 languages)**
- Added all 204 FLORES-200 language codes to nllb200_lang_codes dictionary
- Updated language code mappings with FLORES-200 format (xxx_Yyyy)
- Added 24+ NLLB-exclusive languages including:
  - Southeast Asian: Acehnese, Balinese, Banjar, Buginese, Minangkabau
  - South Asian: Assamese, Awadhi, Bhojpuri, Chhattisgarhi, Magahi, Maithili, Meitei, Odia, Santali
  - African: Akan, Bambara, Bemba, Chokwe, Dyula, Fon, Kikuyu, Kimbundu, Kongo, Luba-Kasai, Luo, Mossi, Nuer
  - Arabic dialects: Mesopotamian, Najdi, Moroccan, Egyptian, Tunisian, South/North Levantine
  - European regional: Asturian, Friulian, Latgalian, Ligurian, Limburgish, Lombard, Norwegian Nynorsk/Bokmål, Occitan, Sardinian, Sicilian, Silesian, Venetian
  - Other: Dzongkha, Fijian, Guarani, Kabyle, Kabuverdianu, Papiamento, Quechua, Samoan, Sango, Shan, Tamasheq, Tibetan, Tok Pisin

**Updated Files**
- app/translator.py: Complete NLLB-200 language mappings (204 languages)
- app/main.py: Added display names for all 204+ language codes
- README.md: Updated with dual model system, NLLB-200 details, license info
- CLAUDE.md: Updated developer documentation with model architecture

**Testing**
- Verified M2M100: 105 languages working 
- Verified NLLB-200: 204 languages working 
- Tested NLLB-exclusive languages (Bemba, Fon, etc.) 

**License Information**
- M2M100: Apache 2.0 - Commercial use allowed
- NLLB-200: CC-BY-NC 4.0 - Non-commercial only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:19:50 +09:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a multilingual translation API service built with FastAPI and Hugging Face Transformers. It provides any-to-any translation between up to 204 languages using Facebook's M2M100 and NLLB-200 models.
**Dual Model System:**
- **M2M100 (default)**: 105 languages, Apache 2.0 License, commercial use allowed
- **NLLB-200 (optional)**: 204 languages, CC-BY-NC 4.0 License, non-commercial only
## Development Commands
### Local Development
```bash
# Setup virtual environment and install dependencies
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Run the development server (with auto-reload)
python run.py
# Or run with uvicorn directly
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### Docker Development
```bash
# Build and run with Docker Compose
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Rebuild after code changes
docker-compose up -d --build
```
### Testing the API
```bash
# Health check
curl http://localhost:8001/health
# Translate Malay to English (M2M100, default)
curl -X POST "http://localhost:8001/api/translate" \
-H "Content-Type: application/json" \
-d '{"text": "Selamat pagi", "source_lang": "ms", "target_lang": "en"}'
# Translate English to Korean (M2M100)
curl -X POST "http://localhost:8001/api/translate" \
-H "Content-Type: application/json" \
-d '{"text": "Good morning", "source_lang": "en", "target_lang": "ko", "model": "m2m100"}'
# Translate English to Bemba (NLLB-200 exclusive language)
curl -X POST "http://localhost:8001/api/translate" \
-H "Content-Type: application/json" \
-d '{"text": "Welcome", "source_lang": "en", "target_lang": "bem", "model": "nllb200"}'
# Get supported languages for M2M100
curl http://localhost:8001/api/supported-languages?model=m2m100
# Get supported languages for NLLB-200
curl http://localhost:8001/api/supported-languages?model=nllb200
```
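The curl calls above can also be scripted. Below is a minimal stdlib-only Python client sketch; the function names `build_payload` and `translate` are illustrative and not part of the repo.

```python
import json
from urllib import request

def build_payload(text, source_lang, target_lang, model="m2m100"):
    """Serialize a TranslationRequest body as JSON bytes."""
    return json.dumps({
        "text": text,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "model": model,
    }).encode("utf-8")

def translate(text, source_lang, target_lang, model="m2m100",
              base_url="http://localhost:8001"):
    """POST to /api/translate and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/api/translate",
        data=build_payload(text, source_lang, target_lang, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `translate("Welcome", "en", "bem", model="nllb200")` mirrors the NLLB-200 curl example above.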
## Architecture
### Core Components
1. **app/main.py** - FastAPI application with endpoint definitions
   - Lifespan events handle model preloading on startup
   - CORS middleware configured for cross-origin requests
   - Main endpoints: root (`/`), health (`/health`), translate (`/api/translate`), supported-languages (`/api/supported-languages`)
   - Includes a `lang_names` dictionary with display names for all 204+ language codes
2. **app/translator.py** - Translation service singleton
   - Manages loading and caching of both M2M100 and NLLB-200 models
   - Automatically detects and uses a GPU (CUDA) if available
   - Supports lazy loading - models are loaded on first use or preloaded at startup
   - Model support:
     - M2M100: `facebook/m2m100_418M` (105 languages)
     - NLLB-200: `facebook/nllb-200-distilled-600M` (204 languages, FLORES-200 format)
   - Language code mappings for both models (`m2m100_lang_codes`, `nllb200_lang_codes`)
3. **app/models.py** - Pydantic schemas for request/response validation
   - `TranslationRequest`: Validates input (text, source_lang, target_lang, model)
   - `TranslationResponse`: Structured output with metadata
   - `HealthResponse`: Health check response
   - The `model` parameter accepts `"m2m100"` (default) or `"nllb200"`
4. **app/config.py** - Configuration management using pydantic-settings
   - Loads settings from environment variables or a `.env` file
   - Default values provided for all settings
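The lazy-loading and caching behavior of the translator singleton can be sketched as a small in-memory cache. This is a minimal, hypothetical sketch, not the repo's actual class: the `loader` callable stands in for the real Hugging Face model-loading code.

```python
class TranslationService:
    """Sketch of the translator singleton's lazy model cache.

    `loader` stands in for the real from_pretrained() call; models are
    loaded on first use and then served from memory for the process lifetime.
    """

    def __init__(self, loader):
        self._loader = loader
        self._models = {}  # model name -> loaded model

    def get_model(self, name):
        # Lazy loading: fetch on first use, then reuse the cached instance.
        if name not in self._models:
            self._models[name] = self._loader(name)
        return self._models[name]
```

Preloading at startup is then just calling `get_model()` for each model name inside the FastAPI lifespan handler.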
### Translation Flow
1. Request received at `/api/translate` endpoint
2. Pydantic validates request schema (including optional model parameter)
3. TranslationService selects model based on `model` parameter (m2m100 or nllb200)
4. Language codes are validated against the selected model's supported languages
5. Model is loaded if not already cached in memory
6. Text is tokenized with model-specific language codes:
   - M2M100: Uses simple codes (e.g., "en", "ko")
   - NLLB-200: Uses FLORES-200 format (e.g., "eng_Latn", "kor_Hang")
7. Translation generated using the model
8. Response includes original text, translation, and model metadata
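Steps 3-4 of the flow can be sketched as a small dispatch over the two code tables. The mappings below are trimmed, hypothetical excerpts; the full dictionaries live in `app/translator.py`.

```python
# Trimmed, illustrative excerpts of the real code tables.
M2M100_LANG_CODES = {"en": "en", "ko": "ko", "ms": "ms"}
NLLB200_LANG_CODES = {"en": "eng_Latn", "ko": "kor_Hang", "bem": "bem_Latn"}

def resolve_lang_codes(model, source_lang, target_lang):
    """Pick the selected model's table and validate both language codes."""
    table = {"m2m100": M2M100_LANG_CODES, "nllb200": NLLB200_LANG_CODES}.get(model)
    if table is None:
        raise ValueError(f"unknown model: {model!r}")
    for code in (source_lang, target_lang):
        if code not in table:
            raise ValueError(f"{code!r} is not supported by {model}")
    return table[source_lang], table[target_lang]
```

The resolved codes are then handed to the model-specific tokenization in step 6.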
### Model Caching
- Models are downloaded to `MODEL_CACHE_DIR` (default: `./models/`)
- Once downloaded, models persist across restarts
- In Docker, use volume mount to persist models
- First translation request may be slow due to model download:
  - M2M100: ~1.6GB
  - NLLB-200: ~2.5GB
- Both models can be cached simultaneously
### Device Selection
The translator automatically detects GPU availability:
- CUDA GPU: Used automatically if available for faster inference
- CPU: Fallback option, slower but works everywhere
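A minimal sketch of that selection logic, assuming the standard `torch.cuda.is_available()` check (the function name is illustrative):

```python
def pick_device():
    """Prefer CUDA when torch reports a usable GPU; fall back to CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        # torch missing entirely: CPU is the only option.
        pass
    return "cpu"
```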
## Configuration
Environment variables (see `.env.example`):
- `API_HOST` / `API_PORT`: Server binding
- `MODEL_CACHE_DIR`: Where to store downloaded models
- `MAX_LENGTH`: Maximum token length for translation (default 512)
- `ALLOWED_ORIGINS`: CORS configuration
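An illustrative `.env` sketch using the variables listed above (the values shown are examples, not the repo's defaults, except where noted in this document):

```bash
API_HOST=0.0.0.0
API_PORT=8000
MODEL_CACHE_DIR=./models
MAX_LENGTH=512
ALLOWED_ORIGINS=http://localhost:3000
```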
## Common Tasks
### Adding New Language Codes
The system currently supports all 105 M2M100 languages and all 204 NLLB-200 languages. To add new language code mappings:
1. **For M2M100**: Update the `m2m100_lang_codes` dictionary in `app/translator.py`
   - Format: `"user_code": "m2m100_code"` (e.g., `"en": "en"`)
2. **For NLLB-200**: Update the `nllb200_lang_codes` dictionary in `app/translator.py`
   - Format: `"user_code": "flores_code"` (e.g., `"en": "eng_Latn"`)
   - Reference: https://github.com/facebookresearch/flores/blob/main/flores200/README.md
3. **Display Names**: Add entries to the `lang_names` dictionary in `app/main.py`
   - Format: `"code": {"name": "English Name", "native": "Native Name"}`
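As a worked (hypothetical) example, wiring up Bemba (`bem`), an NLLB-exclusive language, would touch both dictionaries. The entries below are illustrative excerpts, not the full dictionaries:

```python
# In app/translator.py -- FLORES-200 codes are ISO 639-3 plus a script tag.
nllb200_lang_codes = {
    # ...existing entries...
    "bem": "bem_Latn",
}

# In app/main.py -- display names for the UI/API responses.
lang_names = {
    # ...existing entries...
    "bem": {"name": "Bemba", "native": "Ichibemba"},
}
```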
### Modifying Translation Behavior
Translation parameters are in `app/translator.py` in the `translate()` method:
- Adjust `max_length` in tokenizer call to handle longer texts
- Modify generation parameters passed to `model.generate()` for different translation strategies
- Model-specific behavior:
  - M2M100: Uses `tokenizer.get_lang_id()` for the target language
  - NLLB-200: Uses `tokenizer.convert_tokens_to_ids()` for the target language
### Production Deployment
For production use:
1. Set `reload=False` in `run.py` or use production-ready uvicorn command
2. Configure proper `ALLOWED_ORIGINS` instead of "*"
3. Add authentication middleware if needed
4. Consider using multiple workers: `uvicorn app.main:app --workers 4`
5. Mount persistent volume for `models/` directory in Docker
## API Documentation
When the server is running, interactive API documentation is available at:
- Swagger UI: http://localhost:8001/docs (Docker) or http://localhost:8000/docs (local)
- ReDoc: http://localhost:8001/redoc (Docker) or http://localhost:8000/redoc (local)
## Model Licenses
**IMPORTANT**: Be aware of licensing when deploying:
- **M2M100**: Apache 2.0 License - Commercial use allowed ✅
- **NLLB-200**: CC-BY-NC 4.0 License - Non-commercial use only ⚠️
Always use M2M100 for commercial applications. Only use NLLB-200 for research, education, or personal non-commercial projects.