Fix NLLB-200 tokenizer and add .dockerignore

- Fixed NLLB-200 tokenizer forced_bos_token_id issue
- Changed from lang_code_to_id to convert_tokens_to_ids
- Added .dockerignore to exclude models directory from Docker build
- Prevents disk space issues during build
- Models are loaded at runtime via volume mount
- Both M2M100 and NLLB-200 models tested and working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
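
The tokenizer change can be sketched in isolation. The commit does not say why `lang_code_to_id` failed; presumably the attribute is unavailable in the installed `transformers` version. The helper name `resolve_target_lang_id`, the stub tokenizer, and the token IDs below are hypothetical and only illustrate the lookup. `convert_tokens_to_ids` and `unk_token_id` are the standard Hugging Face tokenizer members. One caveat worth guarding against: unlike a dictionary lookup, `convert_tokens_to_ids` typically returns the unknown-token ID for an unrecognized code instead of raising.

```python
class StubTokenizer:
    """Hypothetical stand-in for an NLLB tokenizer; the IDs are made up."""

    unk_token_id = 3
    _vocab = {"eng_Latn": 1001, "kor_Hang": 1002}

    def convert_tokens_to_ids(self, token: str) -> int:
        # Mirrors the usual behavior: unknown tokens map to unk_token_id.
        return self._vocab.get(token, self.unk_token_id)


def resolve_target_lang_id(tokenizer, tgt_code: str) -> int:
    """Return the token ID for a target language code, or raise if unknown."""
    token_id = tokenizer.convert_tokens_to_ids(tgt_code)
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"Unknown target language code: {tgt_code!r}")
    return token_id


tokenizer = StubTokenizer()
forced_bos_token_id = resolve_target_lang_id(tokenizer, "kor_Hang")
```

With a real tokenizer, the returned ID would be passed to `model.generate(..., forced_bos_token_id=forced_bos_token_id)`, as in the diff below. The explicit check keeps a mistyped language code from silently forcing the unknown token as the first output token.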
2025-11-11 16:02:32 +09:00
parent 28e26d19b6
commit 5a99d081ab
2 changed files with 44 additions and 1 deletions
--- a/app/translator.py
+++ b/app/translator.py
@@ -473,7 +473,8 @@ class TranslationService:
                ).to(self.device)

                # Generate translation - NLLB uses forced_bos_token_id
-                forced_bos_token_id = tokenizer.lang_code_to_id[tgt_code]
+                # Convert language code to token ID
+                forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_code)

                with torch.no_grad():
                    translated = model.generate(