tool-04-tool-selection
tool-calling · deterministic-tests · seed tier 4 · published
Best result per model
| # | Model | Score | Tests passed | Run configuration |
|---|---|---|---|---|
| 1 | qwen3-coder | 1.000 | 2/2 | UD-Q4_K_XL · 24 GB · runner verified |
| 2 | qwen3-coder-next | 1.000 | 2/2 | UD-Q4_K_XL · 24 GB · runner verified |
| 3 | phi-4-mini | 0.500 | 1/2 | Q6_K · 24 GB · runner verified |

3 models attempted.