Fix NLLB-200 tokenizer and add .dockerignore

- Fixed NLLB-200 tokenizer forced_bos_token_id issue
  - Changed from lang_code_to_id to convert_tokens_to_ids
- Added .dockerignore to exclude models directory from Docker build
  - Prevents disk space issues during build
  - Models are loaded at runtime via volume mount
- Both M2M100 and NLLB-200 models tested and working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:02:32 +09:00
parent 28e26d19b6
commit 5a99d081ab
2 changed files with 44 additions and 1 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,42 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+dist/
+build/
+# Virtual environments
+venv/
+env/
+ENV/
+# Models cache (will be mounted as volume)
+models/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# Git
+.git/
+.gitignore
+# Documentation
+README.md
+CLAUDE.md
+*.md
+# Environment
+.env
+.env.local
+.env.*.local
+# Docker
+.dockerignore
+Dockerfile
+docker-compose.yml
--- a/app/translator.py
+++ b/app/translator.py
@@ -473,7 +473,8 @@ class TranslationService:
                ).to(self.device)
                # Generate translation - NLLB uses forced_bos_token_id
-                forced_bos_token_id = tokenizer.lang_code_to_id[tgt_code]
+                # Convert language code to token ID
+                forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_code)
                with torch.no_grad():
                    translated = model.generate(
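
Review note: the hunk above replaces the `lang_code_to_id` dictionary lookup with `convert_tokens_to_ids`. One behavioral difference is worth guarding against: a dictionary lookup raises `KeyError` for an unsupported language code, whereas `convert_tokens_to_ids` typically returns the tokenizer's unknown-token ID without raising. The sketch below shows one possible guard. The helper name `resolve_forced_bos_token_id` is hypothetical and not part of this commit; it assumes only that the tokenizer exposes `convert_tokens_to_ids` and `unk_token_id`, as Hugging Face tokenizers do.

```python
def resolve_forced_bos_token_id(tokenizer, tgt_code: str) -> int:
    """Map an NLLB language code such as "kor_Hang" to its token ID.

    Hypothetical helper, not part of the commit. `tokenizer` is assumed to
    expose `convert_tokens_to_ids` and `unk_token_id`.
    """
    token_id = tokenizer.convert_tokens_to_ids(tgt_code)
    # An unrecognized code usually maps to the unknown-token ID rather than
    # raising, so fail loudly here instead of generating with a wrong BOS token.
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"Unknown NLLB target language code: {tgt_code!r}")
    return token_id
```

Inside `TranslationService`, the assignment could then read `forced_bos_token_id = resolve_forced_bos_token_id(tokenizer, tgt_code)`, so a mistyped target code surfaces as an explicit error before `model.generate` is called.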