Fix NLLB-200 tokenizer and add .dockerignore
- Fixed NLLB-200 tokenizer forced_bos_token_id issue
- Changed from lang_code_to_id to convert_tokens_to_ids
- Added .dockerignore to exclude models directory from Docker build
- Prevents disk space issues during build
- Models are loaded at runtime via volume mount
- Both M2M100 and NLLB-200 models tested and working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
42
.dockerignore
Normal file
@@ -0,0 +1,42 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+dist/
+build/
+
+# Virtual environments
+venv/
+env/
+ENV/
+
+# Models cache (will be mounted as volume)
+models/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# Git
+.git/
+.gitignore
+
+# Documentation
+README.md
+CLAUDE.md
+*.md
+
+# Environment
+.env
+.env.local
+.env.*.local
+
+# Docker
+.dockerignore
+Dockerfile
+docker-compose.yml
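
Because `models/` is excluded from the build context, the image itself contains no model files, so the service depends on the volume being mounted at runtime. A minimal sketch of a startup check follows. The `MODEL_DIR` environment variable, the `/app/models` default path, and the `resolve_model_dir` helper are all hypothetical and do not appear in this commit.

```python
import os
from pathlib import Path


def resolve_model_dir(env_var: str = "MODEL_DIR", default: str = "/app/models") -> Path:
    """Return the model directory, failing early if the volume is not mounted.

    Both the environment variable name and the default path are assumptions
    made for illustration; adjust them to the actual deployment.
    """
    model_dir = Path(os.environ.get(env_var, default))
    if not model_dir.is_dir():
        # With models/ listed in .dockerignore, a missing directory most
        # likely means the volume mount was forgotten.
        raise FileNotFoundError(
            f"Model directory {model_dir} not found; is the models volume mounted?"
        )
    return model_dir
```

Calling such a check once at service startup turns a forgotten mount into an immediate, readable error instead of a failure on the first translation request.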
@@ -473,7 +473,8 @@ class TranslationService:
 ).to(self.device)
 
 # Generate translation - NLLB uses forced_bos_token_id
-forced_bos_token_id = tokenizer.lang_code_to_id[tgt_code]
+# Convert language code to token ID
+forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_code)
 
 with torch.no_grad():
     translated = model.generate(