Fix NLLB-200 tokenizer and add .dockerignore

- Fixed NLLB-200 tokenizer forced_bos_token_id issue
  - Changed from lang_code_to_id to convert_tokens_to_ids
- Added .dockerignore to exclude models directory from Docker build
  - Prevents disk space issues during build
  - Models are loaded at runtime via volume mount
- Both M2M100 and NLLB-200 models tested and working
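
A minimal sketch of the lookup change described above, assuming a Hugging Face tokenizer that exposes `convert_tokens_to_ids` and `unk_token_id`. The helper name `resolve_forced_bos_token_id`, the `_StubTokenizer` stand-in, and the token IDs are hypothetical and only illustrate the idea; they are not taken from this repository. The unknown-code check is an extra safeguard that is not part of this commit.

```python
class _StubTokenizer:
    """Hypothetical stand-in exposing the two tokenizer attributes used below."""

    unk_token_id = 3

    def __init__(self, vocab):
        self._vocab = vocab

    def convert_tokens_to_ids(self, token):
        # Like Hugging Face tokenizers, fall back to the unknown-token ID.
        return self._vocab.get(token, self.unk_token_id)


def resolve_forced_bos_token_id(tokenizer, tgt_code):
    """Return the vocabulary ID of the target-language token, e.g. "kor_Hang"."""
    token_id = tokenizer.convert_tokens_to_ids(tgt_code)
    # An unknown code silently maps to the unknown token, so reject it here.
    if token_id is None or token_id == tokenizer.unk_token_id:
        raise ValueError(f"Unknown NLLB language code: {tgt_code!r}")
    return token_id


# Illustrative vocabulary; the IDs are made up, not real NLLB-200 values.
tokenizer = _StubTokenizer({"eng_Latn": 1001, "kor_Hang": 1002})
forced_bos_token_id = resolve_forced_bos_token_id(tokenizer, "kor_Hang")
print(forced_bos_token_id)
```

With a real NLLB-200 tokenizer, the returned ID would be passed as `forced_bos_token_id` to `model.generate(...)`, as in the diff below.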

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
jungwoo choi
2025-11-11 16:02:32 +09:00
parent 28e26d19b6
commit 5a99d081ab
2 changed files with 44 additions and 1 deletions

@@ -473,7 +473,8 @@ class TranslationService:
 ).to(self.device)
 # Generate translation - NLLB uses forced_bos_token_id
-forced_bos_token_id = tokenizer.lang_code_to_id[tgt_code]
+# Convert language code to token ID
+forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_code)
 with torch.no_grad():
 translated = model.generate(