# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a Malay-English translation API service built with FastAPI and Hugging Face Transformers. It provides bidirectional translation between Malay (Bahasa Melayu) and English using Helsinki-NLP's OPUS-MT neural machine translation models.
## Development Commands
### Local Development
```bash
# Setup virtual environment and install dependencies
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Run the development server (with auto-reload)
python run.py
# Or run with uvicorn directly
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### Docker Development
```bash
# Build and run with Docker Compose
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Rebuild after code changes
docker-compose up -d --build
```
### Testing the API
```bash
# Health check
curl http://localhost:8000/health
# Translate Malay to English
curl -X POST "http://localhost:8000/api/translate" \
  -H "Content-Type: application/json" \
  -d '{"text": "Selamat pagi", "source_lang": "ms", "target_lang": "en"}'

# Translate English to Malay
curl -X POST "http://localhost:8000/api/translate" \
  -H "Content-Type: application/json" \
  -d '{"text": "Good morning", "source_lang": "en", "target_lang": "ms"}'
```
## Architecture
### Core Components
1. **app/main.py** - FastAPI application with endpoint definitions
- Lifespan events handle model preloading on startup
- CORS middleware configured for cross-origin requests
   - Main endpoints: root (`/`), health (`/health`), supported languages (`/api/supported-languages`), translate (`/api/translate`)
2. **app/translator.py** - Translation service singleton
- Manages loading and caching of translation models
- Automatically detects and uses GPU if available (CUDA)
- Supports lazy loading - models are loaded on first use or preloaded at startup
- Model naming convention: `Helsinki-NLP/opus-mt-{source}-{target}`
3. **app/models.py** - Pydantic schemas for request/response validation
- `TranslationRequest`: Validates input (text, source_lang, target_lang)
- `TranslationResponse`: Structured output with metadata
- `LanguageCode` enum: Only "ms" and "en" are supported
4. **app/config.py** - Configuration management using pydantic-settings
- Loads settings from environment variables or `.env` file
- Default values provided for all settings
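The lazy-loading singleton and the model naming convention can be sketched as follows. This is an illustrative stand-in, not a copy of `app/translator.py`: the class and method names mirror the conventions above, but the real service loads actual tokenizer/model pairs via Transformers rather than storing the model name string.

```python
# Sketch of the lazy-loading model cache described above. Illustrative
# only -- app/translator.py loads real Transformers models here.

class TranslationService:
    def __init__(self):
        # (source, target) -> loaded model; populated on first use
        self._models = {}

    def _get_model_name(self, source: str, target: str) -> str:
        # The Helsinki-NLP/opus-mt-{source}-{target} naming convention
        return f"Helsinki-NLP/opus-mt-{source}-{target}"

    def _load(self, source: str, target: str):
        key = (source, target)
        if key not in self._models:
            # The real service calls AutoTokenizer/AutoModelForSeq2SeqLM
            # .from_pretrained(...) here; we cache the name as a stand-in.
            self._models[key] = self._get_model_name(source, target)
        return self._models[key]

service = TranslationService()
print(service._load("ms", "en"))  # Helsinki-NLP/opus-mt-ms-en
```

Because the cache is keyed on the language pair, repeated requests for the same direction reuse the already-loaded model.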
### Translation Flow
1. Request received at `/api/translate` endpoint
2. Pydantic validates request schema
3. TranslationService determines appropriate model based on language pair
4. Model is loaded if not already cached in memory
5. Text is tokenized, translated, and decoded
6. Response includes original text, translation, and model metadata
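The six steps above can be traced in a minimal end-to-end sketch. The function names and error messages below are stand-ins (the real service uses Pydantic models and Transformers calls), but the control flow matches the pipeline described here.

```python
# Stand-in sketch of the /api/translate request flow; validation and
# translation are simplified versions of the real Pydantic/Transformers code.

def validate(request: dict) -> dict:
    # Step 2: schema validation (Pydantic in the real service)
    for field in ("text", "source_lang", "target_lang"):
        if field not in request:
            raise ValueError(f"missing field: {field}")
    if {request["source_lang"], request["target_lang"]} != {"ms", "en"}:
        raise ValueError("only ms<->en pairs are supported")
    return request

def translate(request: dict) -> dict:
    req = validate(request)
    # Steps 3-4: determine and (lazily) load the model for the pair
    model_name = f"Helsinki-NLP/opus-mt-{req['source_lang']}-{req['target_lang']}"
    # Step 5 would tokenize, run model.generate(), and decode here
    translated = f"<translated via {model_name}>"
    # Step 6: response echoes the original text plus model metadata
    return {"original": req["text"], "translation": translated, "model": model_name}
```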
### Model Caching
- Models are downloaded to `MODEL_CACHE_DIR` (default: `./models/`)
- Once downloaded, models persist across restarts
- In Docker, use volume mount to persist models
- First translation request may be slow due to model download (~300MB per model)
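An illustrative volume mount for persisting models in Docker might look like the fragment below; the service name `api` and container path are assumptions, so match them to this repo's actual `docker-compose.yml`.

```yaml
# Illustrative docker-compose fragment -- adapt names/paths to this repo
services:
  api:
    volumes:
      - ./models:/app/models   # persist downloaded OPUS-MT weights
```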
### Device Selection
The translator automatically detects GPU availability:
- CUDA GPU: Used automatically if available for faster inference
- CPU: Fallback option, slower but works everywhere
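A device probe equivalent to this auto-detection is a one-liner with `torch.cuda.is_available()`; the guarded import below also falls back to CPU when PyTorch itself is missing, which the real service does not need to do.

```python
# Device auto-detection as described above; the ImportError guard is an
# extra safety net for environments without PyTorch installed.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(f"running inference on: {device}")
```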
## Configuration
Environment variables (see `.env.example`):
- `API_HOST` / `API_PORT`: Server binding
- `MODEL_CACHE_DIR`: Where to store downloaded models
- `MAX_LENGTH`: Maximum token length for translation (default 512)
- `ALLOWED_ORIGINS`: CORS configuration
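As a rough sketch of how these settings resolve, the stdlib-only snippet below mirrors the documented defaults (`./models/`, `512`, port `8000`, host `0.0.0.0`, `"*"` for CORS). The real `app/config.py` uses pydantic-settings, which also reads a `.env` file and validates types automatically.

```python
import os
from dataclasses import dataclass

# Stdlib sketch of the settings surface; the real app uses pydantic-settings.
# Defaults mirror the values documented in this file.
@dataclass
class Settings:
    api_host: str = os.environ.get("API_HOST", "0.0.0.0")
    api_port: int = int(os.environ.get("API_PORT", "8000"))
    model_cache_dir: str = os.environ.get("MODEL_CACHE_DIR", "./models/")
    max_length: int = int(os.environ.get("MAX_LENGTH", "512"))
    allowed_origins: str = os.environ.get("ALLOWED_ORIGINS", "*")

settings = Settings()
```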
## Common Tasks
### Adding New Language Pairs
To add support for additional languages:
1. Check if Helsinki-NLP has an OPUS-MT model for the language pair at https://huggingface.co/Helsinki-NLP
2. Update `app/models.py` - Add new language code to `LanguageCode` enum
3. Update `app/translator.py` - Add model mapping in `_get_model_name()` method
4. Update `app/main.py` - Add language info to `/api/supported-languages` endpoint
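Steps 2 and 3 might look like the sketch below, using Indonesian (`id`) as a hypothetical new language; this assumes Helsinki-NLP publishes `opus-mt-id-en` / `opus-mt-en-id` models (verify at the link above), and the `MODEL_NAMES` mapping is illustrative rather than the actual structure of `_get_model_name()`.

```python
from enum import Enum

# Step 2 (sketch): extend the LanguageCode enum with Indonesian ("id").
class LanguageCode(str, Enum):
    MS = "ms"
    EN = "en"
    ID = "id"  # hypothetical new language code

# Step 3 (sketch): pair-to-model mapping consumed by _get_model_name();
# the real method may build these names from the convention instead.
MODEL_NAMES = {
    ("id", "en"): "Helsinki-NLP/opus-mt-id-en",
    ("en", "id"): "Helsinki-NLP/opus-mt-en-id",
}
```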
### Modifying Translation Behavior
Translation parameters are in `app/translator.py` in the `translate()` method:
- Adjust `max_length` in tokenizer call to handle longer texts
- Modify generation parameters passed to `model.generate()` for different translation strategies
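Typical generation knobs for OPUS-MT (MarianMT) models are shown below as a kwargs dict; the specific values are common defaults for illustration, not necessarily what `app/translator.py` currently passes.

```python
# Illustrative generation kwargs for model.generate(); adjust values to
# taste -- these are common beam-search settings, not the repo's defaults.
generation_kwargs = {
    "max_length": 512,     # cap on output tokens; matches MAX_LENGTH
    "num_beams": 4,        # beam search width; 1 would mean greedy decoding
    "early_stopping": True,  # stop when all beams have finished
}

# Inside translate():  outputs = model.generate(**inputs, **generation_kwargs)
```

Larger `num_beams` usually improves translation quality at the cost of latency and memory.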
### Production Deployment
For production use:
1. Set `reload=False` in `run.py` or use a production-ready uvicorn command
2. Configure proper `ALLOWED_ORIGINS` instead of "*"
3. Add authentication middleware if needed
4. Consider using multiple workers: `uvicorn app.main:app --workers 4`
5. Mount persistent volume for `models/` directory in Docker
## API Documentation
When the server is running, interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc