CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a Malaysian language translation API service built with FastAPI and Hugging Face Transformers. It provides bidirectional translation between Malay (Bahasa Melayu) and English using Helsinki-NLP's OPUS-MT neural machine translation models.
Development Commands
Local Development
# Setup virtual environment and install dependencies
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Run the development server (with auto-reload)
python run.py
# Or run with uvicorn directly
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Docker Development
# Build and run with Docker Compose
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Rebuild after code changes
docker-compose up -d --build
Testing the API
# Health check
curl http://localhost:8000/health
# Translate Malay to English
curl -X POST "http://localhost:8000/api/translate" \
-H "Content-Type: application/json" \
-d '{"text": "Selamat pagi", "source_lang": "ms", "target_lang": "en"}'
# Translate English to Malay
curl -X POST "http://localhost:8000/api/translate" \
-H "Content-Type: application/json" \
-d '{"text": "Good morning", "source_lang": "en", "target_lang": "ms"}'
Architecture
Core Components
- app/main.py - FastAPI application with endpoint definitions
  - Lifespan events handle model preloading on startup
  - CORS middleware configured for cross-origin requests
  - Three main endpoints: root (/), health (/health), translate (/api/translate)
- app/translator.py - Translation service singleton
  - Manages loading and caching of translation models
  - Automatically detects and uses GPU if available (CUDA)
  - Supports lazy loading - models are loaded on first use or preloaded at startup
  - Model naming convention: Helsinki-NLP/opus-mt-{source}-{target}
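A sketch of that lazy-loading pattern, assuming MarianMT-family classes (which is what Helsinki-NLP's OPUS-MT checkpoints use); the class and method names mirror this file's descriptions but the exact signatures are assumptions:

```python
import torch
from transformers import MarianMTModel, MarianTokenizer


class TranslationService:
    """Lazily loads and caches one OPUS-MT model per language pair."""

    def __init__(self):
        self._cache = {}  # model name -> (tokenizer, model)
        # Use CUDA when available, otherwise fall back to CPU.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def _get_model_name(self, source: str, target: str) -> str:
        # Helsinki-NLP publishes one model per translation direction.
        return f"Helsinki-NLP/opus-mt-{source}-{target}"

    def _load(self, name: str):
        # Download/load on first use; reuse the in-memory copy afterwards.
        if name not in self._cache:
            tokenizer = MarianTokenizer.from_pretrained(name)
            model = MarianMTModel.from_pretrained(name).to(self.device)
            self._cache[name] = (tokenizer, model)
        return self._cache[name]
```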
- app/models.py - Pydantic schemas for request/response validation
  - TranslationRequest: Validates input (text, source_lang, target_lang)
  - TranslationResponse: Structured output with metadata
  - LanguageCode enum: Only "ms" and "en" are supported
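A plausible shape for these schemas; field names beyond the documented ones (text, source_lang, target_lang) are assumptions:

```python
from enum import Enum

from pydantic import BaseModel, Field


class LanguageCode(str, Enum):
    MS = "ms"  # Malay (Bahasa Melayu)
    EN = "en"  # English


class TranslationRequest(BaseModel):
    text: str = Field(..., min_length=1)
    source_lang: LanguageCode
    target_lang: LanguageCode


class TranslationResponse(BaseModel):
    original_text: str
    translated_text: str
    source_lang: LanguageCode
    target_lang: LanguageCode
    translation_model: str  # e.g. "Helsinki-NLP/opus-mt-ms-en"
```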
- app/config.py - Configuration management using pydantic-settings
  - Loads settings from environment variables or .env file
  - Default values provided for all settings
Translation Flow
- Request received at /api/translate endpoint
- Pydantic validates request schema
- TranslationService determines appropriate model based on language pair
- Model is loaded if not already cached in memory
- Text is tokenized, translated, and decoded
- Response includes original text, translation, and model metadata
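The tokenize → generate → decode core of this flow can be sketched as a standalone helper; the signature is an assumption (in the repo this logic lives on TranslationService):

```python
def translate_text(text, tokenizer, model, device="cpu", max_length=512):
    """Tokenize the input, run generation, and decode the result."""
    # Tokenize and move tensors to the inference device.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_length
    ).to(device)
    # Generate translated token IDs.
    output_ids = model.generate(**inputs, max_length=max_length)
    # Decode the first (and only) sequence back to a string.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```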
Model Caching
- Models are downloaded to MODEL_CACHE_DIR (default: ./models/)
- Once downloaded, models persist across restarts
- In Docker, use a volume mount to persist models
- First translation request may be slow due to model download (~300 MB per model)
Device Selection
The translator automatically detects GPU availability:
- CUDA GPU: Used automatically if available for faster inference
- CPU: Fallback option, slower but works everywhere
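In PyTorch this detection is a one-liner; a minimal sketch:

```python
import torch

# Prefer CUDA when a GPU is present; otherwise run on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```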
Configuration
Environment variables (see .env.example):
- API_HOST / API_PORT: Server binding
- MODEL_CACHE_DIR: Where to store downloaded models
- MAX_LENGTH: Maximum token length for translation (default 512)
- ALLOWED_ORIGINS: CORS configuration
Common Tasks
Adding New Language Pairs
To add support for additional languages:
- Check if Helsinki-NLP has an OPUS-MT model for the language pair at https://huggingface.co/Helsinki-NLP
- Update app/models.py - Add new language code to LanguageCode enum
- Update app/translator.py - Add model mapping in _get_model_name() method
- Update app/main.py - Add language info to /api/supported-languages endpoint
Modifying Translation Behavior
Translation parameters are in app/translator.py in the translate() method:
- Adjust max_length in tokenizer call to handle longer texts
- Modify generation parameters passed to model.generate() for different translation strategies
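Typical knobs to pass to model.generate() — all standard Hugging Face generation arguments, though the values here are illustrative defaults rather than what this repo ships:

```python
# Generation parameters for beam-search translation.
gen_kwargs = {
    "max_length": 512,           # hard cap on output tokens
    "num_beams": 4,              # beam search width (1 = greedy decoding)
    "early_stopping": True,      # stop once all beams are finished
    "no_repeat_ngram_size": 3,   # suppress repeated 3-grams in the output
}
# usage: output_ids = model.generate(**inputs, **gen_kwargs)
```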
Production Deployment
For production use:
- Set reload=False in run.py or use a production-ready uvicorn command
- Configure proper ALLOWED_ORIGINS instead of "*"
- Add authentication middleware if needed
- Consider using multiple workers: uvicorn app.main:app --workers 4
- Mount a persistent volume for the models/ directory in Docker
API Documentation
When the server is running, interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc