# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Malaysian language translation API service built with FastAPI and Hugging Face Transformers. It provides bidirectional translation between Malay (Bahasa Melayu) and English using Helsinki-NLP's OPUS-MT neural machine translation models.

## Development Commands

### Local Development

```bash
# Setup virtual environment and install dependencies
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Run the development server (with auto-reload)
python run.py

# Or run with uvicorn directly
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Docker Development

```bash
# Build and run with Docker Compose
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Rebuild after code changes
docker-compose up -d --build
```

### Testing the API

```bash
# Health check
curl http://localhost:8000/health

# Translate Malay to English
curl -X POST "http://localhost:8000/api/translate" \
  -H "Content-Type: application/json" \
  -d '{"text": "Selamat pagi", "source_lang": "ms", "target_lang": "en"}'

# Translate English to Malay
curl -X POST "http://localhost:8000/api/translate" \
  -H "Content-Type: application/json" \
  -d '{"text": "Good morning", "source_lang": "en", "target_lang": "ms"}'
```

## Architecture

### Core Components

1. **app/main.py** - FastAPI application with endpoint definitions
   - Lifespan events handle model preloading on startup
   - CORS middleware configured for cross-origin requests
   - Three main endpoints: root (`/`), health (`/health`), translate (`/api/translate`)

2. **app/translator.py** - Translation service singleton
   - Manages loading and caching of translation models
   - Automatically detects and uses GPU (CUDA) if available
   - Supports lazy loading - models are loaded on first use or preloaded at startup
   - Model naming convention: `Helsinki-NLP/opus-mt-{source}-{target}`

3. **app/models.py** - Pydantic schemas for request/response validation
   - `TranslationRequest`: validates input (text, source_lang, target_lang)
   - `TranslationResponse`: structured output with metadata
   - `LanguageCode` enum: only "ms" and "en" are supported

4. **app/config.py** - Configuration management using pydantic-settings
   - Loads settings from environment variables or a `.env` file
   - Default values provided for all settings

### Translation Flow

1. Request received at the `/api/translate` endpoint
2. Pydantic validates the request schema
3. TranslationService determines the appropriate model for the language pair
4. Model is loaded if not already cached in memory
5. Text is tokenized, translated, and decoded
6. Response includes the original text, the translation, and model metadata

### Model Caching

- Models are downloaded to `MODEL_CACHE_DIR` (default: `./models/`)
- Once downloaded, models persist across restarts
- In Docker, use a volume mount to persist models
- The first translation request may be slow due to model download (~300 MB per model)

### Device Selection

The translator automatically detects GPU availability:
- CUDA GPU: used automatically if available, for faster inference
- CPU: fallback option, slower but works everywhere

## Configuration

Environment variables (see `.env.example`):
- `API_HOST` / `API_PORT`: server binding
- `MODEL_CACHE_DIR`: where to store downloaded models
- `MAX_LENGTH`: maximum token length for translation (default 512)
- `ALLOWED_ORIGINS`: CORS configuration

## Common Tasks

### Adding New Language Pairs

To add support for additional languages:

1. Check if Helsinki-NLP has an OPUS-MT model for the language pair at https://huggingface.co/Helsinki-NLP
2. Update `app/models.py` - add the new language code to the `LanguageCode` enum
3. Update `app/translator.py` - add the model mapping in the `_get_model_name()` method
4. Update `app/main.py` - add the language info to the `/api/supported-languages` endpoint

### Modifying Translation Behavior

Translation parameters live in the `translate()` method in `app/translator.py`:
- Adjust `max_length` in the tokenizer call to handle longer texts
- Modify the generation parameters passed to `model.generate()` for different translation strategies

### Production Deployment

For production use:

1. Set `reload=False` in `run.py` or use a production-ready uvicorn command
2. Configure proper `ALLOWED_ORIGINS` instead of "*"
3. Add authentication middleware if needed
4. Consider using multiple workers: `uvicorn app.main:app --workers 4`
5. Mount a persistent volume for the `models/` directory in Docker

## API Documentation

When the server is running, interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
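## Reference Sketches

The translation flow described under Architecture can be sketched with Hugging Face Transformers. This is a minimal, illustrative version only - the signature, default languages, and `num_beams` value are assumptions, not the repository's actual `translate()` method:

```python
def translate(text: str, source: str = "ms", target: str = "en",
              max_length: int = 512) -> str:
    """Sketch of steps 3-5 of the translation flow.

    Downloads the OPUS-MT model on first call (~300 MB), so this is
    slow the first time and fast afterwards.
    """
    # Imported inside the function so the sketch stays importable
    # even when transformers is not installed.
    from transformers import MarianMTModel, MarianTokenizer

    # Model naming convention: Helsinki-NLP/opus-mt-{source}-{target}
    model_name = f"Helsinki-NLP/opus-mt-{source}-{target}"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize, generate, decode (steps 5-6 of the flow)
    batch = tokenizer([text], return_tensors="pt",
                      truncation=True, max_length=max_length)
    generated = model.generate(**batch, max_length=max_length, num_beams=4)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```

Generation parameters such as `num_beams` are the kind of knob the "Modifying Translation Behavior" section refers to.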
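The lazy-loading and in-memory caching behavior of `app/translator.py` can be sketched as below. The class and method names here are illustrative, and `_load` is a placeholder standing in for the real `from_pretrained` call:

```python
class TranslationService:
    """Sketch of a translator singleton with lazy model loading."""

    def __init__(self):
        self._models = {}   # model_name -> loaded model (in-memory cache)
        self.load_count = 0  # illustrative: counts slow loads

    def get_model(self, model_name: str):
        # Load on first use, then serve from the cache (steps 3-4 of the flow)
        if model_name not in self._models:
            self._models[model_name] = self._load(model_name)
        return self._models[model_name]

    def _load(self, model_name: str):
        # Placeholder for MarianMTModel.from_pretrained(model_name),
        # which downloads ~300 MB to MODEL_CACHE_DIR on first use.
        self.load_count += 1
        return object()
```

Because the cache is per-process, running uvicorn with multiple workers loads each model once per worker.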
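Device selection as described above typically amounts to a one-line check with PyTorch; the `ImportError` fallback here is an addition for illustration so the snippet also runs where torch is absent:

```python
# Pick CUDA when a GPU is visible, otherwise fall back to CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # Illustrative fallback only - the real service requires torch.
    device = "cpu"
```

The chosen device string is then passed to the model (e.g. `model.to(device)`) so inference runs on the GPU when one is available.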
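The `LanguageCode` enum and model-naming convention involved in adding a language pair can be sketched as follows; `get_model_name` is an illustrative stand-in for the service's `_get_model_name()` method:

```python
from enum import Enum


class LanguageCode(str, Enum):
    """Supported language codes ("ms" and "en" in the current service)."""
    MS = "ms"
    EN = "en"


def get_model_name(source: LanguageCode, target: LanguageCode) -> str:
    """Map a language pair to its OPUS-MT model name."""
    if source == target:
        raise ValueError("source and target languages must differ")
    # Model naming convention: Helsinki-NLP/opus-mt-{source}-{target}
    return f"Helsinki-NLP/opus-mt-{source.value}-{target.value}"
```

Adding a pair then means extending the enum with the new code and confirming the corresponding `Helsinki-NLP/opus-mt-…` model actually exists on the Hub.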