Overview
| Model | Size | Min-Max | Z-Score | Avg Rank | ASR (WER %) | ASR · FR | ASR · EN | ASR · DE | ASR · ES | ASR · IT | ASR · PT | ASR · NL | AST (METEOR %) | ES-FR · Multilingual_TEDx | ES-IT · Multilingual_TEDx | FR-EN · Multilingual_TEDx | FR-ES · Multilingual_TEDx | Question Answering (FLOW_JUDGE) | QA · FR | QA · EN | Others | Emotion Recognition (FLOW_JUDGE) | Gender Recognition (FLOW_JUDGE) | Age Recognition (FLOW_JUDGE) | Dialogue Summarization (FLOW_JUDGE) | Spoken Language Identification (FLOW_JUDGE) | Music | Music Question Answering (FLOW_JUDGE) | Music Captioning (FLOW_JUDGE) | Sound | Audio Captioning (FLOW_JUDGE) | Audio Question Answering (FLOW_JUDGE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.796 | 1.03 | 4.8 | 25.23 ±1.67 | 33.56 ±3.68 | 11.00 ±1.15 | 32.65 ±4.70 | 16.57 ±2.64 | 21.20 ±7.11 | 19.15 ±2.94 | 40.68 ±3.75 | 48.00 ±1.15 | 46.22 ±2.08 | 45.46 ±3.11 | 51.06 ±2.30 | 49.24 ±2.04 | 63.04 ±0.05 | 50.99 ±0.09 | 68.21 ±0.05 | 55.99 | 30.20 ±0.02 | 85.80 ±0.02 | 23.60 ±0.03 | 51.93 ±0.08 | 88.40 ±0.03 | 59.99 | 60.23 ±0.07 | 59.76 ±0.04 | 57.35 | 53.70 ±0.06 | 60.99 ±0.09 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.599 | 0.37 | 5.3 | 13.82 ±3.79 | 19.79 ±10.56 | 6.67 ±1.67 | 17.84 ±1.86 | 8.67 ±7.83 | 9.50 ±6.89 | 11.20 ±2.08 | 15.57 ±2.12 | 58.79 ±1.04 | 55.08 ±1.87 | 55.72 ±3.13 | 63.43 ±1.81 | 60.94 ±1.89 | 65.04 ±0.05 | 56.97 ±0.08 | 68.50 ±0.06 | 47.21 | 22.00 ±0.02 | 62.20 ±0.02 | 31.80 ±0.03 | 47.47 ±0.09 | 72.60 ±0.04 | 36.67 | 36.99 ±0.06 | 36.36 ±0.06 | 42.13 | 33.46 ±0.05 | 50.80 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.570 | 0.26 | 5.5 | 11.12 ±3.42 | 15.12 ±9.45 | 6.59 ±0.65 | 15.99 ±1.81 | 5.95 ±0.95 | 7.64 ±6.18 | 10.37 ±8.40 | 9.77 ±1.51 | 62.23 ±1.02 | 59.98 ±1.92 | 59.50 ±3.11 | 66.55 ±1.73 | 62.90 ±1.85 | 62.56 ±0.05 | 61.82 ±0.08 | 62.88 ±0.06 | 44.52 | 21.80 ±0.02 | 63.07 ±0.02 | 32.40 ±0.03 | 41.93 ±0.10 | 63.40 ±0.04 | 35.65 | 40.89 ±0.06 | 30.40 ±0.06 | 41.06 | 31.22 ±0.05 | 50.90 ±0.09 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.566 | 0.23 | 7.7 | 72.44 ±5.63 | 73.74 ±9.09 | 12.58 ±8.94 | 112.01 ±22.08 | 67.31 ±10.68 | 106.41 ±20.15 | 88.70 ±22.54 | 74.85 ±6.63 | 23.97 ±1.22 | 17.21 ±1.99 | 12.67 ±2.35 | 43.68 ±2.03 | 22.34 ±2.25 | 55.98 ±0.05 | 44.63 ±0.07 | 60.84 ±0.06 | 59.31 | 52.73 ±0.03 | 75.80 ±0.02 | 19.50 ±0.02 | 55.53 ±0.09 | 93.00 ±0.02 | 56.73 | 57.97 ±0.07 | 55.48 ±0.11 | 58.74 | 57.76 ±0.06 | 59.73 ±0.08 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.555 | 0.20 | 7.3 | 36.83 ±6.09 | 43.81 ±16.23 | 10.29 ±1.80 | 41.56 ±6.57 | 34.85 ±16.84 | 24.70 ±9.17 | 31.95 ±6.49 | 103.10 ±22.76 | 42.31 ±1.41 | 31.28 ±2.46 | 40.85 ±3.57 | 57.03 ±2.36 | 40.09 ±2.66 | 65.30 ±0.05 | 52.16 ±0.09 | 70.93 ±0.05 | 35.67 | 18.60 ±0.02 | 48.53 ±0.03 | 4.00 ±0.01 | 56.40 ±0.09 | 50.80 ±0.04 | 44.59 | 44.93 ±0.07 | 44.24 ±0.06 | 53.18 | 47.34 ±0.06 | 59.02 ±0.09 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.551 | 0.17 | 7.7 | 37.96 ±5.64 | 44.41 ±9.42 | 41.77 ±10.57 | 48.09 ±21.47 | 28.24 ±23.62 | 18.58 ±12.24 | 35.15 ±16.46 | 31.38 ±4.68 | 45.93 ±1.20 | 42.80 ±2.25 | 39.45 ±3.23 | 53.66 ±2.01 | 47.82 ±2.26 | 66.86 ±0.04 | 53.78 ±0.07 | 72.47 ±0.05 | 32.44 | 10.00 ±0.02 | 16.87 ±0.02 | 0.20 ±0.00 | 55.33 ±0.08 | 79.80 ±0.04 | 37.80 | 38.20 ±0.05 | 37.40 ±0.07 | 56.93 | 54.68 ±0.07 | 59.19 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.522 | 0.09 | 7.2 | 12.99 ±3.93 | 20.53 ±11.37 | 6.78 ±0.73 | 16.50 ±1.79 | 6.38 ±1.08 | 7.87 ±6.41 | 9.43 ±1.21 | 9.93 ±1.48 | 61.91 ±1.01 | 59.19 ±1.91 | 59.89 ±3.01 | 65.46 ±1.78 | 63.09 ±1.82 | 59.78 ±0.05 | 58.60 ±0.08 | 60.28 ±0.06 | 43.19 | 16.87 ±0.02 | 58.60 ±0.02 | 34.40 ±0.03 | 44.87 ±0.09 | 61.20 ±0.04 | 34.81 | 39.91 ±0.07 | 29.72 ±0.06 | 39.83 | 27.38 ±0.05 | 52.27 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.520 | 0.09 | 6.7 | 12.08 ±3.36 | 15.75 ±9.61 | 7.26 ±0.73 | 17.60 ±2.00 | 6.90 ±1.11 | 8.73 ±6.45 | 11.38 ±2.31 | 11.85 ±1.40 | 61.34 ±1.02 | 59.08 ±1.92 | 57.69 ±3.04 | 65.73 ±1.76 | 62.85 ±1.84 | 60.41 ±0.05 | 57.85 ±0.08 | 61.51 ±0.06 | 41.50 | 18.20 ±0.02 | 54.40 ±0.03 | 28.70 ±0.03 | 44.60 ±0.09 | 61.60 ±0.04 | 35.95 | 37.67 ±0.06 | 34.24 ±0.06 | 39.09 | 27.72 ±0.04 | 50.46 ±0.10 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.452 | -0.14 | 8.2 | 20.00 ±10.47 | 25.54 ±30.62 | 10.30 ±1.52 | 28.03 ±6.27 | 12.91 ±3.50 | 15.27 ±7.40 | 18.41 ±5.11 | 26.52 ±15.14 | 56.39 ±1.13 | 55.45 ±1.97 | 53.00 ±3.29 | 59.84 ±2.16 | 57.27 ±2.07 | 56.05 ±0.05 | 55.41 ±0.08 | 56.32 ±0.06 | 36.91 | 12.67 ±0.02 | 58.20 ±0.02 | 27.80 ±0.03 | 41.27 ±0.08 | 44.60 ±0.04 | 42.53 | 39.43 ±0.06 | 45.64 ±0.05 | 37.31 | 28.64 ±0.05 | 45.98 ±0.09 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.441 | -0.18 | 8.7 | 18.16 ±4.12 | 21.49 ±10.72 | 10.53 ±8.36 | 30.11 ±7.71 | 12.82 ±1.69 | 14.22 ±7.24 | 16.26 ±3.59 | 19.54 ±8.60 | 56.26 ±1.13 | 55.72 ±1.93 | 51.22 ±3.24 | 60.18 ±2.16 | 57.90 ±2.06 | 55.09 ±0.05 | 55.56 ±0.08 | 54.89 ±0.06 | 37.21 | 14.67 ±0.02 | 56.93 ±0.03 | 26.90 ±0.03 | 43.93 ±0.09 | 43.60 ±0.04 | 41.92 | 38.08 ±0.06 | 45.76 ±0.05 | 36.99 | 28.62 ±0.04 | 45.35 ±0.09 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.438 | -0.19 | 9.3 | 18.25 ±5.14 | 23.31 ±13.20 | 11.33 ±7.87 | 23.71 ±3.08 | 14.84 ±16.91 | 14.35 ±7.27 | 16.47 ±3.76 | 15.93 ±2.04 | 55.91 ±1.16 | 55.06 ±2.03 | 52.02 ±3.31 | 59.82 ±2.17 | 56.74 ±2.14 | 55.68 ±0.05 | 56.12 ±0.08 | 55.49 ±0.06 | 36.82 | 14.60 ±0.02 | 55.53 ±0.03 | 29.70 ±0.03 | 41.07 ±0.09 | 43.20 ±0.04 | 42.16 | 39.00 ±0.06 | 45.32 ±0.05 | 35.91 | 27.62 ±0.04 | 44.21 ±0.09 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.427 | -0.22 | 8.2 | 125.88 ±18.69 | 127.95 ±36.03 | 81.49 ±20.97 | 163.93 ±69.63 | 125.22 ±69.41 | 135.14 ±50.01 | 135.23 ±47.91 | 134.67 ±40.94 | 25.60 ±0.78 | 21.31 ±1.35 | 25.62 ±2.25 | 29.81 ±1.36 | 25.65 ±1.50 | 73.85 ±0.04 | 61.04 ±0.07 | 79.34 ±0.04 | 35.65 | 15.33 ±0.02 | 18.27 ±0.02 | 2.40 ±0.01 | 66.67 ±0.08 | 75.60 ±0.04 | 48.93 | 47.27 ±0.07 | 50.60 ±0.07 | 48.80 | 38.90 ±0.05 | 58.69 ±0.07 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.369 | -0.44 | 10.3 | 65.99 ±7.27 | 82.32 ±12.68 | 68.47 ±13.86 | 71.28 ±25.51 | 55.52 ±24.49 | 44.98 ±19.24 | 52.58 ±24.86 | 39.77 ±7.22 | 44.77 ±1.17 | 39.10 ±2.19 | 41.91 ±3.00 | 51.84 ±2.00 | 46.21 ±2.27 | 64.97 ±0.04 | 51.26 ±0.08 | 70.84 ±0.05 | 28.05 | 11.00 ±0.02 | 7.93 ±0.01 | 2.90 ±0.01 | 55.20 ±0.09 | 63.20 ±0.04 | 34.67 | 33.05 ±0.06 | 36.28 ±0.07 | 44.41 | 35.88 ±0.06 | 52.93 ±0.08 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.360 | -0.41 | 10.2 | 18.17 ±7.32 | 21.70 ±13.80 | 9.43 ±1.71 | 28.29 ±36.38 | 12.58 ±32.37 | 12.59 ±7.25 | 19.23 ±7.46 | 23.10 ±2.25 | 40.85 ±1.11 | 34.56 ±2.00 | 36.04 ±3.14 | 47.57 ±1.93 | 45.23 ±2.00 | 66.45 ±0.05 | 59.84 ±0.08 | 69.28 ±0.05 | 27.10 | 10.93 ±0.02 | 32.07 ±0.02 | 2.50 ±0.01 | 49.40 ±0.09 | 40.60 ±0.04 | 35.33 | 36.77 ±0.06 | 33.88 ±0.07 | 32.89 | 22.38 ±0.03 | 43.41 ±0.09 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.226 | -0.88 | 13.0 | 27.98 ±1.73 | 20.80 ±1.78 | 12.09 ±0.95 | 33.12 ±7.10 | 13.93 ±6.25 | 18.95 ±7.88 | 80.39 ±6.12 | 49.77 ±3.54 | 33.41 ±1.11 | 27.86 ±1.93 | 30.12 ±3.08 | 41.47 ±1.94 | 34.19 ±2.08 | 59.86 ±0.05 | 54.45 ±0.08 | 62.17 ±0.06 | 26.09 | 10.73 ±0.02 | 33.33 ±0.02 | 2.00 ±0.01 | 41.60 ±0.09 | 42.80 ±0.04 | 31.56 | 34.84 ±0.06 | 28.28 ±0.06 | 32.61 | 21.74 ±0.03 | 43.49 ±0.09 |
Overview (FR/EN — ASR, AST, QA)
| Model | Size | Min-Max | Z-Score | Avg Rank | ASR (WER %) | ASR · FR | ASR · EN | AST (METEOR %) | ES-FR · Multilingual_TEDx | FR-EN · Multilingual_TEDx | FR-ES · Multilingual_TEDx | Question Answering (FLOW_JUDGE) | QA · FR | QA · EN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.803 | 0.70 | 4.0 | 15.42 ±7.07 | 19.79 ±10.56 | 6.67 ±1.67 | 59.82 ±1.09 | 55.08 ±1.87 | 63.43 ±1.81 | 60.94 ±1.89 | 65.04 ±0.05 | 56.97 ±0.08 | 68.50 ±0.06 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.799 | 0.67 | 3.3 | 12.28 ±6.31 | 15.12 ±9.45 | 6.59 ±0.65 | 63.14 ±1.07 | 59.98 ±1.92 | 66.55 ±1.73 | 62.90 ±1.85 | 62.56 ±0.05 | 61.82 ±0.08 | 62.88 ±0.06 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.754 | 0.50 | 4.7 | 12.92 ±6.41 | 15.75 ±9.61 | 7.26 ±0.73 | 62.55 ±1.07 | 59.08 ±1.92 | 65.73 ±1.76 | 62.85 ±1.84 | 60.41 ±0.05 | 57.85 ±0.08 | 61.51 ±0.06 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.733 | 0.43 | 5.7 | 15.95 ±7.59 | 20.53 ±11.37 | 6.78 ±0.73 | 62.58 ±1.07 | 59.19 ±1.91 | 65.46 ±1.78 | 63.09 ±1.82 | 59.78 ±0.05 | 58.60 ±0.08 | 60.28 ±0.06 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.667 | 0.29 | 6.7 | 17.61 ±9.23 | 21.70 ±13.80 | 9.43 ±1.71 | 42.45 ±1.18 | 34.56 ±2.00 | 47.57 ±1.93 | 45.23 ±2.00 | 66.45 ±0.05 | 59.84 ±0.08 | 69.28 ±0.05 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.638 | 0.16 | 7.7 | 43.53 ±7.21 | 44.41 ±9.42 | 41.77 ±10.57 | 48.09 ±1.28 | 42.80 ±2.25 | 53.66 ±2.01 | 47.82 ±2.26 | 66.86 ±0.04 | 53.78 ±0.07 | 72.47 ±0.05 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.635 | 0.14 | 8.3 | 26.04 ±2.52 | 33.56 ±3.68 | 11.00 ±1.15 | 48.84 ±1.24 | 46.22 ±2.08 | 51.06 ±2.30 | 49.24 ±2.04 | 63.04 ±0.05 | 50.99 ±0.09 | 68.21 ±0.05 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.606 | -0.01 | 9.0 | 20.46 ±20.43 | 25.54 ±30.62 | 10.30 ±1.52 | 57.52 ±1.20 | 55.45 ±1.97 | 59.84 ±2.16 | 57.27 ±2.07 | 56.05 ±0.05 | 55.41 ±0.08 | 56.32 ±0.06 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.602 | -0.03 | 8.7 | 17.84 ±7.68 | 21.49 ±10.72 | 10.53 ±8.36 | 57.93 ±1.19 | 55.72 ±1.93 | 60.18 ±2.16 | 57.90 ±2.06 | 55.09 ±0.05 | 55.56 ±0.08 | 54.89 ±0.06 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.601 | -0.03 | 9.7 | 19.32 ±9.19 | 23.31 ±13.20 | 11.33 ±7.87 | 57.20 ±1.23 | 55.06 ±2.03 | 59.82 ±2.17 | 56.74 ±2.14 | 55.68 ±0.05 | 56.12 ±0.08 | 55.49 ±0.06 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.600 | 0.04 | 8.7 | 32.64 ±10.86 | 43.81 ±16.23 | 10.29 ±1.80 | 42.80 ±1.54 | 31.28 ±2.46 | 57.03 ±2.36 | 40.09 ±2.66 | 65.30 ±0.05 | 52.16 ±0.09 | 70.93 ±0.05 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.478 | -0.36 | 10.0 | 17.90 ±1.24 | 20.80 ±1.78 | 12.09 ±0.95 | 34.50 ±1.18 | 27.86 ±1.93 | 41.47 ±1.94 | 34.19 ±2.08 | 59.86 ±0.05 | 54.45 ±0.08 | 62.17 ±0.06 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.470 | -0.44 | 10.0 | 77.70 ±9.65 | 82.32 ±12.68 | 68.47 ±13.86 | 45.72 ±1.27 | 39.10 ±2.19 | 51.84 ±2.00 | 46.21 ±2.27 | 64.97 ±0.04 | 51.26 ±0.08 | 70.84 ±0.05 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.333 | -0.83 | 10.3 | 112.46 ±25.06 | 127.95 ±36.03 | 81.49 ±20.97 | 25.59 ±0.83 | 21.31 ±1.35 | 29.81 ±1.36 | 25.65 ±1.50 | 73.85 ±0.04 | 61.04 ±0.07 | 79.34 ±0.04 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.231 | -1.23 | 13.3 | 53.35 ±6.83 | 73.74 ±9.09 | 12.58 ±8.94 | 27.74 ±1.34 | 17.21 ±1.99 | 43.68 ±2.03 | 22.34 ±2.25 | 55.98 ±0.05 | 44.63 ±0.07 | 60.84 ±0.06 |
Tasks · ASR
Tasks · ASR — WER (%)
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR | FR · CommonVoice | FR · Fleurs | FR · Multilingual_TEDx | FR · SUMM-RE | FR · VoxPopuli | FR · YouTubeFr | EN | EN · CommonVoice | EN · Fleurs | EN · VoxPopuli | DE | DE · Fleurs | DE · Multilingual_TEDx | ES | ES · Fleurs | ES · Multilingual_TEDx | IT | IT · Fleurs | IT · Multilingual_TEDx | PT | PT · Fleurs | PT · Multilingual_TEDx | NL · Fleurs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.999 | 0.72 | 1.1 | 10.20 | 15.12 ±9.45 | 6.98 ±1.42 | 6.86 ±1.23 | 9.25 ±3.73 | 28.70 ±55.10 | 8.16 ±1.15 | 30.76 ±11.95 | 6.59 ±0.65 | 8.27 ±1.38 | 5.90 ±0.91 | 5.62 ±0.99 | 15.99 ±1.81 | 7.39 ±1.20 | 24.58 ±3.12 | 5.95 ±0.95 | 4.28 ±0.86 | 7.61 ±1.66 | 7.64 ±6.18 | 5.76 ±0.93 | 9.52 ±12.29 | 10.37 ±8.40 | 6.97 ±1.60 | 13.76 ±16.70 | 9.77 ±1.51 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.991 | 0.70 | 2.3 | 11.06 | 20.53 ±11.37 | 7.00 ±1.35 | 7.46 ±1.27 | 9.94 ±2.07 | 51.94 ±64.04 | 7.97 ±1.13 | 38.86 ±22.25 | 6.78 ±0.73 | 8.37 ±1.48 | 6.09 ±0.90 | 5.87 ±1.31 | 16.50 ±1.79 | 7.96 ±1.26 | 25.03 ±3.08 | 6.38 ±1.08 | 4.64 ±0.85 | 8.12 ±1.95 | 7.87 ±6.41 | 6.03 ±0.93 | 9.72 ±12.75 | 9.43 ±1.21 | 7.44 ±1.54 | 11.43 ±1.85 | 9.93 ±1.48 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.989 | 0.69 | 3.1 | 11.35 | 15.75 ±9.61 | 7.71 ±1.56 | 8.10 ±1.33 | 10.32 ±2.18 | 27.14 ±24.52 | 8.41 ±1.37 | 32.85 ±51.80 | 7.26 ±0.73 | 8.67 ±1.46 | 6.75 ±0.94 | 6.36 ±1.33 | 17.60 ±2.00 | 9.16 ±1.24 | 26.04 ±3.55 | 6.90 ±1.11 | 5.11 ±0.93 | 8.69 ±1.98 | 8.73 ±6.45 | 6.73 ±0.96 | 10.73 ±12.82 | 11.38 ±2.31 | 7.83 ±1.60 | 14.94 ±4.32 | 11.85 ±1.40 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.978 | 0.65 | 3.4 | 12.75 | 19.79 ±10.56 | 8.09 ±1.48 | 8.49 ±1.27 | 10.62 ±6.56 | 42.54 ±56.08 | 10.49 ±20.17 | 38.53 ±19.64 | 6.67 ±1.67 | 8.31 ±1.35 | 5.74 ±0.94 | 5.96 ±4.72 | 17.84 ±1.86 | 8.78 ±1.28 | 26.91 ±3.20 | 8.67 ±7.83 | 4.37 ±0.84 | 12.96 ±15.61 | 9.50 ±6.89 | 6.76 ±0.94 | 12.24 ±13.70 | 11.20 ±2.08 | 7.52 ±1.64 | 14.88 ±3.79 | 15.57 ±2.12 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.940 | 0.51 | 7.1 | 17.13 | 23.31 ±13.20 | 17.17 ±16.96 | 8.08 ±1.22 | 12.86 ±33.42 | 43.96 ±20.96 | 8.85 ±1.46 | 48.95 ±65.81 | 11.33 ±7.87 | 17.96 ±23.52 | 8.50 ±1.13 | 7.52 ±1.51 | 23.71 ±3.08 | 15.05 ±2.19 | 32.36 ±5.60 | 14.84 ±16.91 | 10.81 ±1.91 | 18.87 ±33.72 | 14.35 ±7.27 | 11.42 ±1.77 | 17.29 ±14.39 | 16.47 ±3.76 | 11.82 ±2.88 | 21.12 ±6.92 | 15.93 ±2.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.936 | 0.50 | 6.4 | 17.85 | 21.49 ±10.72 | 10.00 ±1.66 | 8.22 ±1.25 | 10.17 ±1.95 | 38.22 ±27.56 | 8.55 ±1.32 | 53.79 ±57.27 | 10.53 ±8.36 | 15.93 ±24.98 | 8.25 ±1.14 | 7.40 ±1.27 | 30.11 ±7.71 | 15.29 ±2.35 | 44.94 ±15.11 | 12.82 ±1.69 | 10.50 ±1.87 | 15.15 ±2.77 | 14.22 ±7.24 | 11.15 ±1.74 | 17.29 ±14.32 | 16.26 ±3.59 | 11.70 ±2.94 | 20.83 ±6.52 | 19.54 ±8.60 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.935 | 0.49 | 6.4 | 18.13 | 21.70 ±13.80 | 10.65 ±1.84 | 10.15 ±1.40 | 22.07 ±55.31 | 42.83 ±55.71 | 13.37 ±24.17 | 31.16 ±8.81 | 9.43 ±1.71 | 10.89 ±1.54 | 7.49 ±0.98 | 9.92 ±4.78 | 28.29 ±36.38 | 11.23 ±1.05 | 45.35 ±72.58 | 12.58 ±32.37 | 5.66 ±0.85 | 19.50 ±64.61 | 12.59 ±7.25 | 9.10 ±1.03 | 16.08 ±14.39 | 19.23 ±7.46 | 12.14 ±2.25 | 26.31 ±14.69 | 23.10 ±2.25 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.922 | 0.45 | 7.4 | 19.57 | 25.54 ±30.62 | 14.15 ±33.16 | 8.53 ±1.34 | 15.26 ±169.56 | 54.37 ±40.13 | 9.05 ±1.30 | 51.90 ±47.03 | 10.30 ±1.52 | 11.58 ±3.99 | 11.18 ±1.55 | 8.15 ±1.52 | 28.03 ±6.27 | 21.82 ±11.24 | 34.24 ±5.50 | 12.91 ±3.50 | 10.00 ±1.84 | 15.82 ±6.72 | 15.27 ±7.40 | 12.41 ±2.66 | 18.13 ±14.51 | 18.41 ±5.11 | 13.77 ±7.87 | 23.04 ±6.52 | 26.52 ±15.14 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.878 | 0.29 | 9.7 | 24.97 | 33.56 ±3.68 | 18.92 ±2.45 | 27.90 ±2.52 | 26.52 ±15.79 | 39.09 ±3.37 | 37.35 ±4.38 | 51.58 ±13.46 | 11.00 ±1.15 | 10.30 ±1.65 | 7.16 ±1.22 | 15.54 ±2.72 | 32.65 ±4.70 | 22.40 ±3.06 | 42.91 ±8.72 | 16.57 ±2.64 | 14.55 ±2.27 | 18.59 ±4.76 | 21.20 ±7.11 | 20.92 ±2.51 | 21.47 ±13.97 | 19.15 ±2.94 | 14.99 ±2.14 | 23.31 ±5.43 | 40.68 ±3.75 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.817 | 0.08 | 9.9 | 32.72 | 20.80 ±1.78 | 13.01 ±1.94 | 11.92 ±1.34 | 18.06 ±8.22 | 32.69 ±4.36 | 17.08 ±1.58 | 32.04 ±3.48 | 12.09 ±0.95 | 12.26 ±1.89 | 11.05 ±1.12 | 12.97 ±1.80 | 33.12 ±7.10 | 16.66 ±1.57 | 49.57 ±13.82 | 13.93 ±6.25 | 8.33 ±0.99 | 19.52 ±12.34 | 18.95 ±7.88 | 12.38 ±1.20 | 25.53 ±15.59 | 80.39 ±6.12 | 58.79 ±3.43 | 101.99 ±11.44 | 49.77 ±3.54 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.772 | -0.07 | 11.0 | 35.37 | 44.41 ±9.42 | 25.86 ±17.61 | 12.16 ±2.18 | 36.79 ±35.85 | 82.09 ±20.51 | 31.48 ±7.55 | 78.10 ±31.14 | 41.77 ±10.57 | 75.07 ±20.40 | 20.60 ±3.87 | 29.65 ±23.53 | 48.09 ±21.47 | 18.30 ±3.04 | 77.88 ±42.03 | 28.24 ±23.62 | 13.29 ±2.63 | 43.19 ±46.66 | 18.58 ±12.24 | 13.79 ±2.83 | 23.36 ±24.20 | 35.15 ±16.46 | 15.68 ±3.65 | 54.62 ±32.25 | 31.38 ±4.68 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.746 | -0.18 | 10.9 | 41.47 | 43.81 ±16.23 | 26.18 ±5.77 | 21.29 ±3.57 | 55.24 ±41.81 | 63.68 ±72.63 | 27.56 ±7.56 | 68.93 ±47.91 | 10.29 ±1.80 | 12.42 ±4.75 | 6.36 ±1.06 | 12.09 ±2.27 | 41.56 ±6.57 | 26.78 ±5.05 | 56.33 ±11.83 | 34.85 ±16.84 | 15.44 ±3.23 | 54.25 ±33.36 | 24.70 ±9.17 | 18.83 ±4.24 | 30.58 ±17.70 | 31.95 ±6.49 | 29.88 ±9.76 | 34.03 ±8.53 | 103.10 ±22.76 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.559 | -0.83 | 12.7 | 59.27 | 82.32 ±12.68 | 97.54 ±33.55 | 28.89 ±7.09 | 93.08 ±41.98 | 128.57 ±25.32 | 47.17 ±14.03 | 98.67 ±41.83 | 68.47 ±13.86 | 120.36 ±27.27 | 29.73 ±7.38 | 55.32 ±29.51 | 71.28 ±25.51 | 28.98 ±5.83 | 113.59 ±49.14 | 55.52 ±24.49 | 26.69 ±5.24 | 84.34 ±47.31 | 44.98 ±19.24 | 25.84 ±4.56 | 64.12 ±37.65 | 52.58 ±24.86 | 22.88 ±5.74 | 82.28 ±48.63 | 39.77 ±7.22 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.473 | -1.17 | 13.4 | 76.51 | 73.74 ±9.09 | 48.31 ±7.41 | 57.50 ±6.22 | 76.10 ±46.43 | 73.64 ±9.60 | 94.59 ±6.21 | 92.27 ±23.11 | 12.58 ±8.94 | 13.22 ±2.87 | 11.98 ±1.63 | 12.53 ±26.61 | 112.01 ±22.08 | 97.16 ±10.60 | 126.86 ±42.67 | 67.31 ±10.68 | 64.86 ±6.38 | 69.77 ±20.29 | 106.41 ±20.15 | 120.32 ±10.00 | 92.50 ±39.02 | 88.70 ±22.54 | 96.60 ±9.93 | 80.80 ±43.95 | 74.85 ±6.63 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.000 | -2.83 | 15.0 | 129.09 | 127.95 ±36.03 | 71.61 ±29.42 | 75.16 ±3.59 | 185.12 ±176.73 | 201.50 ±52.98 | 103.83 ±38.14 | 130.46 ±98.40 | 81.49 ±20.97 | 54.35 ±37.70 | 73.73 ±14.57 | 116.40 ±47.54 | 163.93 ±69.63 | 97.87 ±7.06 | 229.99 ±137.44 | 125.22 ±69.41 | 83.45 ±6.91 | 167.00 ±137.18 | 135.14 ±50.01 | 80.32 ±3.53 | 189.97 ±98.55 | 135.23 ±47.91 | 96.91 ±16.09 | 173.54 ±93.23 | 134.67 ±40.94 |
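The language-level columns and the Average column in the WER table above appear to be nested unweighted means: each language score is the mean of that language's dataset scores, and Average is the mean of the language scores. This is inferred from the published numbers, not taken from the leaderboard's code, and the helper name `macro_average` is hypothetical. A minimal sketch using the first row of the table:

```python
from statistics import fmean

# Per-dataset WER (%) for LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h,
# copied from the table above and grouped by language.
wer_by_language = {
    "FR": [6.98, 6.86, 9.25, 28.70, 8.16, 30.76],
    "EN": [8.27, 5.90, 5.62],
    "DE": [7.39, 24.58],
    "ES": [4.28, 7.61],
    "IT": [5.76, 9.52],
    "PT": [6.97, 13.76],
    "NL": [9.77],
}

def macro_average(groups):
    """Average within each group first, then average the group means."""
    group_means = {name: fmean(values) for name, values in groups.items()}
    overall = fmean(list(group_means.values()))
    return group_means, overall

language_means, overall = macro_average(wer_by_language)
for lang, mean in language_means.items():
    print(f"{lang}: {mean:.2f}")
print(f"Average: {overall:.2f}")
```

The results agree with the table to within rounding of the published two-decimal values. Under this scheme a language with many datasets (FR has six) carries the same weight in Average as a language with a single dataset (NL).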
Tasks · AST
Tasks · AST — BLEU
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR-EN · Multilingual_TEDx (FR→EN) | FR-ES · Multilingual_TEDx (FR→ES) | ES-FR · Multilingual_TEDx (ES→FR) | ES-IT · Multilingual_TEDx (ES→IT) |
|---|---|---|---|---|---|---|---|---|---|
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.999 | 1.22 | 1.2 | 41.78 | 43.63 ±2.10 | 42.92 ±2.15 | 43.27 ±2.12 | 37.30 ±3.39 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.976 | 1.14 | 2.2 | 40.84 | 42.66 ±2.07 | 43.00 ±2.15 | 40.41 ±2.11 | 37.28 ±3.32 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.940 | 1.02 | 2.8 | 39.63 | 42.83 ±2.00 | 42.12 ±2.10 | 40.93 ±2.07 | 32.65 ±3.21 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.876 | 0.79 | 4.5 | 37.01 | 39.92 ±2.06 | 38.93 ±2.10 | 35.08 ±1.95 | 34.09 ±3.14 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.842 | 0.66 | 5.5 | 35.80 | 35.69 ±2.20 | 36.29 ±2.02 | 39.81 ±2.07 | 31.41 ±3.00 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.835 | 0.64 | 6.2 | 35.56 | 35.97 ±2.15 | 35.91 ±2.03 | 39.30 ±2.06 | 31.07 ±2.99 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.821 | 0.58 | 6.5 | 34.97 | 33.98 ±2.17 | 34.61 ±2.03 | 39.48 ±2.10 | 31.82 ±3.13 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.610 | -0.14 | 8.8 | 26.86 | 29.15 ±2.03 | 24.60 ±1.71 | 28.09 ±1.79 | 25.60 ±2.55 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.594 | -0.16 | 8.8 | 26.47 | 35.51 ±2.08 | 27.14 ±1.94 | 23.90 ±1.80 | 19.34 ±3.08 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.593 | -0.15 | 9.0 | 26.37 | 39.13 ±2.25 | 22.38 ±2.14 | 22.90 ±1.92 | 21.06 ±2.88 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.481 | -0.58 | 11.2 | 22.14 | 25.53 ±1.71 | 22.45 ±1.83 | 23.56 ±1.87 | 17.03 ±2.49 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.437 | -0.74 | 12.2 | 20.29 | 23.17 ±1.95 | 21.08 ±1.95 | 18.32 ±1.76 | 18.60 ±2.34 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.416 | -0.79 | 12.8 | 19.61 | 27.71 ±1.83 | 18.19 ±1.67 | 17.13 ±1.49 | 15.40 ±2.60 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.284 | -1.22 | 13.2 | 14.86 | 29.01 ±1.94 | 13.53 ±1.62 | 12.43 ±1.48 | 4.48 ±1.97 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.000 | -2.28 | 15.0 | 3.67 | 4.26 ±0.52 | 3.64 ±0.60 | 3.19 ±0.51 | 3.61 ±0.99 |
Tasks · AST — METEOR (%)
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR-EN · Multilingual_TEDx (FR→EN) | FR-ES · Multilingual_TEDx (FR→ES) | ES-FR · Multilingual_TEDx (ES→FR) | ES-IT · Multilingual_TEDx (ES→IT) |
|---|---|---|---|---|---|---|---|---|---|
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.997 | 1.14 | 1.5 | 62.23 | 66.55 ±1.73 | 62.90 ±1.85 | 59.98 ±1.92 | 59.50 ±3.11 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.988 | 1.11 | 1.8 | 61.91 | 65.46 ±1.78 | 63.09 ±1.82 | 59.19 ±1.91 | 59.89 ±3.01 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.976 | 1.07 | 2.8 | 61.34 | 65.73 ±1.76 | 62.85 ±1.84 | 59.08 ±1.92 | 57.69 ±3.04 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.915 | 0.87 | 4.5 | 58.79 | 63.43 ±1.81 | 60.94 ±1.89 | 55.08 ±1.87 | 55.72 ±3.13 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.856 | 0.67 | 5.5 | 56.39 | 59.84 ±2.16 | 57.27 ±2.07 | 55.45 ±1.97 | 53.00 ±3.29 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.854 | 0.66 | 5.2 | 56.26 | 60.18 ±2.16 | 57.90 ±2.06 | 55.72 ±1.93 | 51.22 ±3.24 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.845 | 0.63 | 6.8 | 55.91 | 59.82 ±2.17 | 56.74 ±2.14 | 55.06 ±2.03 | 52.02 ±3.31 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.653 | -0.01 | 8.8 | 48.00 | 51.06 ±2.30 | 49.24 ±2.04 | 46.22 ±2.08 | 45.46 ±3.11 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.610 | -0.15 | 9.5 | 45.93 | 53.66 ±2.01 | 47.82 ±2.26 | 42.80 ±2.25 | 39.45 ±3.23 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.579 | -0.24 | 9.8 | 44.77 | 51.84 ±2.00 | 46.21 ±2.27 | 39.10 ±2.19 | 41.91 ±3.00 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.526 | -0.39 | 10.5 | 42.31 | 57.03 ±2.36 | 40.09 ±2.66 | 31.28 ±2.46 | 40.85 ±3.57 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.486 | -0.56 | 11.5 | 40.85 | 47.57 ±1.93 | 45.23 ±2.00 | 34.56 ±2.00 | 36.04 ±3.14 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.307 | -1.15 | 13.2 | 33.41 | 41.47 ±1.94 | 34.19 ±2.08 | 27.86 ±1.93 | 30.12 ±3.08 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.113 | -1.81 | 14.2 | 25.60 | 29.81 ±1.36 | 25.65 ±1.50 | 21.31 ±1.35 | 25.62 ±2.25 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.094 | -1.85 | 14.5 | 23.97 | 43.68 ±2.03 | 22.34 ±2.25 | 17.21 ±1.99 | 12.67 ±2.35 |
Tasks · QA
Math Question Answering
Tasks · QA — ACC (%)
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | EN · spoken-mqa_short_digit |
|---|---|---|---|---|---|---|
| Qwen/Qwen2.5-Omni-7B | 11B | 1.000 | 1.64 | 1.0 | 89.00 | 89.00 ±6.13 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.948 | 1.48 | 2.0 | 85.00 | 85.00 ±7.00 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.740 | 0.84 | 3.0 | 69.00 | 69.00 ±9.06 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.701 | 0.72 | 4.0 | 66.00 | 66.00 ±9.28 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.701 | 0.72 | 5.0 | 66.00 | 66.00 ±9.28 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.701 | 0.72 | 6.0 | 66.00 | 66.00 ±9.28 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.675 | 0.64 | 7.0 | 64.00 | 64.00 ±9.41 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.481 | 0.03 | 8.0 | 49.00 | 49.00 ±9.80 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.364 | -0.33 | 9.0 | 40.00 | 40.00 ±9.60 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.286 | -0.57 | 10.0 | 34.00 | 34.00 ±9.28 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.143 | -1.01 | 11.0 | 23.00 | 23.00 ±8.25 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.117 | -1.09 | 12.0 | 21.00 | 21.00 ±7.98 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.117 | -1.09 | 13.0 | 21.00 | 21.00 ±7.98 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.078 | -1.22 | 14.0 | 18.00 | 18.00 ±7.53 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.000 | -1.46 | 15.0 | 12.00 | 12.00 ±6.37 |
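For this single-dataset table, the Min-Max and Z-Score columns are consistent with min-max scaling and a population z-score (standard deviation divided by N, not N-1) computed over the 15 accuracy values. This is an inference from the numbers, not a description of the leaderboard's actual implementation, and the function names below are hypothetical. A minimal sketch:

```python
from statistics import fmean, pstdev

# Accuracy (%) of the 15 models, copied from the table above in row order.
accuracies = [89.0, 85.0, 69.0, 66.0, 66.0, 66.0, 64.0, 49.0,
              40.0, 34.0, 23.0, 21.0, 21.0, 18.0, 12.0]

def min_max(values):
    """Scale values linearly so the worst maps to 0 and the best to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Standardize values with the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population std: divides by N
    return [(v - mean) / std for v in values]

mm = min_max(accuracies)
zs = z_scores(accuracies)
for acc, m, z in zip(accuracies, mm, zs):
    print(f"{acc:6.2f}  min-max={m:.3f}  z-score={z:+.2f}")
```

In tables with several dataset columns the top model does not always reach exactly 1.000 in Min-Max, which suggests the normalization there is applied per column before averaging; that is not verified here.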
Question Answering
Tasks · QA — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR · CohereLabs-Aya_collection | FR · Vigogne--Alpaca | FR · VoxPopuli-QA | EN · NationalSpeechCorpus_SQA | EN · OpenHermes_audio | EN · SLUE-P2-SQA5 | EN · SpokenWOZ_AIR-Bench | EN · alpaca_audio | EN · fisher_AIR-Bench | EN · public-sg-speech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.977 | 1.71 | 1.5 | 70.19 | 48.39 ±0.41 | 57.08 ±0.10 | 77.64 ±0.08 | 74.00 ±0.07 | 77.40 ±0.20 | 88.28 ±0.09 | 77.20 ±0.15 | 78.20 ±0.19 | 81.10 ±0.11 | 79.20 ±0.06 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.737 | 0.84 | 4.0 | 64.56 | 48.39 ±0.33 | 52.56 ±0.09 | 78.56 ±0.10 | 59.70 ±0.09 | 67.40 ±0.25 | 86.13 ±0.12 | 65.18 ±0.20 | 70.40 ±0.25 | 69.80 ±0.17 | 66.36 ±0.08 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.663 | 0.60 | 4.5 | 62.35 | 50.97 ±0.36 | 53.92 ±0.10 | 80.56 ±0.10 | 55.85 ±0.09 | 56.00 ±0.26 | 71.42 ±0.17 | 65.18 ±0.20 | 63.80 ±0.25 | 64.90 ±0.18 | 63.00 ±0.08 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.637 | 0.45 | 6.0 | 62.74 | 43.23 ±0.28 | 51.00 ±0.10 | 76.68 ±0.10 | 59.75 ±0.09 | 63.00 ±0.21 | 81.67 ±0.14 | 69.12 ±0.20 | 71.80 ±0.23 | 70.30 ±0.17 | 63.88 ±0.08 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.626 | 0.37 | 6.5 | 63.12 | 42.58 ±0.38 | 51.08 ±0.10 | 67.68 ±0.10 | 63.05 ±0.09 | 68.40 ±0.19 | 90.10 ±0.09 | 71.30 ±0.19 | 69.40 ±0.22 | 73.70 ±0.16 | 71.32 ±0.08 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.547 | 0.07 | 7.5 | 61.54 | 35.48 ±0.32 | 41.76 ±0.09 | 79.24 ±0.10 | 63.75 ±0.09 | 61.80 ±0.27 | 91.76 ±0.08 | 74.72 ±0.17 | 53.40 ±0.29 | 76.90 ±0.15 | 74.16 ±0.08 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.520 | 0.04 | 7.5 | 59.68 | 45.16 ±0.32 | 50.64 ±0.10 | 77.76 ±0.10 | 54.00 ±0.09 | 54.40 ±0.24 | 71.37 ±0.17 | 64.77 ±0.19 | 57.80 ±0.26 | 67.00 ±0.17 | 61.20 ±0.08 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.519 | -0.04 | 8.5 | 61.05 | 39.35 ±0.39 | 48.84 ±0.10 | 65.60 ±0.11 | 64.95 ±0.09 | 64.80 ±0.18 | 88.97 ±0.10 | 67.56 ±0.20 | 68.60 ±0.23 | 71.10 ±0.17 | 69.92 ±0.08 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.517 | 0.04 | 8.0 | 59.44 | 45.81 ±0.38 | 53.52 ±0.10 | 76.48 ±0.10 | 54.55 ±0.09 | 53.60 ±0.25 | 66.96 ±0.17 | 59.48 ±0.20 | 62.40 ±0.25 | 64.70 ±0.17 | 60.28 ±0.09 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.457 | -0.27 | 10.5 | 59.60 | 34.84 ±0.42 | 43.28 ±0.10 | 74.84 ±0.10 | 57.45 ±0.08 | 62.60 ±0.27 | 87.94 ±0.11 | 68.91 ±0.19 | 63.80 ±0.28 | 72.00 ±0.16 | 64.76 ±0.08 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.434 | -0.31 | 9.5 | 58.31 | 42.58 ±0.31 | 48.20 ±0.10 | 72.56 ±0.11 | 54.95 ±0.09 | 57.00 ±0.22 | 77.35 ±0.15 | 57.62 ±0.20 | 60.60 ±0.26 | 63.90 ±0.18 | 63.80 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.347 | -0.60 | 10.5 | 55.81 | 46.45 ±0.31 | 51.00 ±0.10 | 70.92 ±0.11 | 49.30 ±0.09 | 60.20 ±0.22 | 62.79 ±0.18 | 47.98 ±0.19 | 58.20 ±0.26 | 50.50 ±0.17 | 59.48 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.343 | -0.62 | 11.0 | 55.87 | 45.16 ±0.33 | 49.88 ±0.10 | 71.20 ±0.11 | 49.30 ±0.09 | 62.20 ±0.23 | 60.98 ±0.18 | 48.39 ±0.19 | 59.60 ±0.27 | 53.30 ±0.17 | 60.48 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.318 | -0.71 | 11.5 | 55.23 | 45.81 ±0.35 | 49.68 ±0.10 | 71.20 ±0.11 | 48.80 ±0.09 | 58.60 ±0.22 | 63.14 ±0.18 | 46.94 ±0.18 | 57.20 ±0.25 | 50.70 ±0.18 | 58.88 ±0.09 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.122 | -1.54 | 13.0 | 52.74 | 37.42 ±0.32 | 38.12 ±0.09 | 58.36 ±0.10 | 63.60 ±0.08 | 48.80 ±0.27 | 78.48 ±0.13 | 69.22 ±0.17 | 24.00 ±0.41 | 71.70 ±0.13 | 70.08 ±0.07 |
Tasks · Others
Age Recognition
Tasks · Others — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR · CommonVoice_Age | EN · CommonVoice_Age |
|---|---|---|---|---|---|---|---|
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.984 | 1.34 | 1.5 | 34.40 | 27.60 ±0.04 | 41.20 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.913 | 1.13 | 3.0 | 32.40 | 24.00 ±0.04 | 40.80 ±0.04 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.879 | 1.02 | 2.0 | 31.80 | 21.00 ±0.04 | 42.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.812 | 0.83 | 4.0 | 29.70 | 18.40 ±0.03 | 41.00 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.793 | 0.79 | 4.5 | 28.70 | 19.00 ±0.03 | 38.40 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.764 | 0.71 | 6.5 | 27.80 | 17.80 ±0.03 | 37.80 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.726 | 0.59 | 6.5 | 26.90 | 15.40 ±0.03 | 38.40 ±0.04 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.646 | 0.40 | 8.5 | 23.60 | 15.00 ±0.03 | 32.20 ±0.04 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.472 | -0.15 | 10.0 | 19.50 | 3.20 ±0.02 | 35.80 ±0.04 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.120 | -0.96 | 10.0 | 4.00 | 5.00 ±0.02 | 3.00 ±0.01 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.093 | -1.02 | 12.0 | 2.90 | 4.80 ±0.02 | 1.00 ±0.01 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.075 | -1.08 | 12.0 | 2.50 | 3.60 ±0.02 | 1.40 ±0.01 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.052 | -1.17 | 12.5 | 2.40 | 0.40 ±0.01 | 4.40 ±0.02 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.052 | -1.15 | 12.5 | 2.00 | 1.80 ±0.01 | 2.20 ±0.01 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.000 | -1.29 | 14.5 | 0.20 | 0.40 ±0.01 | 0.00 ±0.00 |
Dialogue Summarization
Tasks · Others — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | EN · NationalSpeechCorpus_SDS |
|---|---|---|---|---|---|---|
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 1.000 | 2.42 | 1.0 | 66.67 | 66.67 ±0.08 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.599 | 1.00 | 2.0 | 56.40 | 56.40 ±0.09 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.565 | 0.88 | 3.0 | 55.53 | 55.53 ±0.09 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.557 | 0.85 | 4.0 | 55.33 | 55.33 ±0.08 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.552 | 0.83 | 5.0 | 55.20 | 55.20 ±0.09 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.424 | 0.38 | 6.0 | 51.93 | 51.93 ±0.08 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.326 | 0.03 | 7.0 | 49.40 | 49.40 ±0.09 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.250 | -0.23 | 8.0 | 47.47 | 47.47 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.148 | -0.59 | 9.0 | 44.87 | 44.87 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.138 | -0.63 | 10.0 | 44.60 | 44.60 ±0.09 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.112 | -0.72 | 11.0 | 43.93 | 43.93 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.034 | -0.99 | 12.0 | 41.93 | 41.93 ±0.10 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.021 | -1.04 | 13.0 | 41.60 | 41.60 ±0.09 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.008 | -1.09 | 14.0 | 41.27 | 41.27 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.000 | -1.11 | 15.0 | 41.07 | 41.07 ±0.09 |
Emotion Recognition
Tasks · Others — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR · MELD_Emotion | EN · IEMOCAP-Emotion | EN · MELD_Emotion |
|---|---|---|---|---|---|---|---|---|
| nvidia/audio-flamingo-3-hf | 8.2B | 1.000 | 3.17 | 1.0 | 52.80 | 53.00 ±0.04 | 43.20 ±0.04 | 62.00 ±0.04 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.448 | 0.87 | 3.5 | 26.70 | 16.20 ±0.03 | 35.80 ±0.04 | 38.60 ±0.04 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.305 | 0.34 | 3.0 | 21.45 | 19.80 ±0.03 | 25.80 ±0.04 | 20.40 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.301 | 0.32 | 3.0 | 21.30 | 19.80 ±0.03 | 15.60 ±0.03 | 30.00 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.231 | 0.06 | 6.0 | 18.50 | 19.40 ±0.03 | 14.40 ±0.03 | 20.80 ±0.04 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.214 | -0.05 | 7.0 | 17.00 | 12.20 ±0.03 | 22.20 ±0.04 | 21.40 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.192 | -0.12 | 6.5 | 16.45 | 15.20 ±0.03 | 16.20 ±0.03 | 19.20 ±0.03 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.142 | -0.34 | 8.5 | 13.80 | 9.20 ±0.03 | 27.20 ±0.04 | 9.60 ±0.03 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.141 | -0.32 | 8.5 | 14.15 | 12.60 ±0.03 | 13.40 ±0.03 | 18.00 ±0.03 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.140 | -0.33 | 8.5 | 14.10 | 12.60 ±0.03 | 12.80 ±0.03 | 18.40 ±0.03 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.095 | -0.51 | 11.5 | 12.05 | 10.20 ±0.03 | 14.40 ±0.03 | 13.40 ±0.03 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.051 | -0.70 | 13.5 | 9.90 | 6.80 ±0.02 | 18.80 ±0.03 | 7.20 ±0.02 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.040 | -0.75 | 13.5 | 9.25 | 4.80 ±0.02 | 22.00 ±0.04 | 5.40 ±0.02 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.038 | -0.77 | 12.5 | 8.90 | 2.60 ±0.01 | 24.80 ±0.04 | 5.60 ±0.02 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.015 | -0.86 | 13.5 | 7.90 | 1.60 ±0.01 | 22.60 ±0.04 | 5.80 ±0.02 |
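The Average column in the table above is consistent with a language-balanced mean: scores are first averaged within each language prefix (FR, EN), and the per-language means are then averaged. This is inferred from the published numbers rather than stated on the page, so the following is only a minimal sketch; the function name `language_balanced_average` is hypothetical.

```python
from statistics import fmean

# Per-dataset Emotion Recognition scores copied from the table above,
# grouped by the language prefix of each column header.
audio_flamingo = {"FR": [53.00], "EN": [43.20, 62.00]}
qwen2_audio = {"FR": [16.20], "EN": [35.80, 38.60]}


def language_balanced_average(scores_by_language):
    """Average within each language first, then across languages.

    Assumption: this mirrors how the leaderboard builds its Average column;
    the page itself does not document the formula.
    """
    per_language = [fmean(scores) for scores in scores_by_language.values()]
    return fmean(per_language)


def flat_average(scores_by_language):
    """Plain mean over all datasets, shown only for comparison."""
    return fmean(s for scores in scores_by_language.values() for s in scores)


if __name__ == "__main__":
    print(f"{language_balanced_average(audio_flamingo):.2f}")  # table reports 52.80
    print(f"{language_balanced_average(qwen2_audio):.2f}")     # table reports 26.70
    print(f"{flat_average(audio_flamingo):.2f}")               # differs from the table
```

Under this reading, the single French dataset carries as much weight as the two English datasets combined, which is why a flat mean over the three columns does not reproduce the reported value.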
Gender Recognition
Tasks · Others — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR · CommonVoice_Gender | EN · CommonVoice_Gender | EN · IEMOCAP-gender |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.948 | 1.40 | 4.0 | 81.10 | 67.00 ±0.04 | 92.40 ±0.02 | 98.00 ±0.01 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.787 | 0.89 | 5.5 | 68.95 | 48.40 ±0.04 | 93.00 ±0.02 | 86.00 ±0.03 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.775 | 0.72 | 2.5 | 65.95 | 74.60 ±0.04 | 65.60 ±0.04 | 49.00 ±0.04 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.755 | 0.66 | 3.0 | 64.55 | 71.60 ±0.04 | 68.20 ±0.04 | 46.80 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.723 | 0.55 | 4.5 | 61.85 | 71.60 ±0.04 | 65.40 ±0.04 | 38.80 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.699 | 0.48 | 7.0 | 60.20 | 66.20 ±0.04 | 61.40 ±0.04 | 47.00 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.698 | 0.47 | 6.0 | 59.95 | 69.00 ±0.04 | 54.80 ±0.04 | 47.00 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.683 | 0.41 | 7.0 | 58.70 | 68.20 ±0.04 | 52.00 ±0.04 | 46.40 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.672 | 0.38 | 8.0 | 57.80 | 68.00 ±0.04 | 62.20 ±0.04 | 33.00 ±0.04 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.476 | -0.16 | 7.5 | 44.15 | 31.00 ±0.04 | 58.40 ±0.04 | 56.20 ±0.04 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.327 | -0.70 | 12.0 | 31.85 | 27.40 ±0.04 | 40.40 ±0.04 | 32.20 ±0.04 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.322 | -0.72 | 11.5 | 31.35 | 29.20 ±0.04 | 17.60 ±0.03 | 49.40 ±0.04 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.195 | -1.18 | 13.0 | 20.75 | 28.20 ±0.04 | 26.20 ±0.04 | 0.40 ±0.01 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.098 | -1.43 | 13.5 | 14.25 | 6.40 ±0.02 | 8.80 ±0.02 | 35.40 ±0.04 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.000 | -1.77 | 15.0 | 6.35 | 1.60 ±0.01 | 9.20 ±0.03 | 13.00 ±0.03 |
Spoken Language Identification
Tasks · Others — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | OTHER · CoVost2_AIR-Bench |
|---|---|---|---|---|---|---|
| nvidia/audio-flamingo-3-hf | 8.2B | 1.000 | 1.87 | 1.0 | 93.00 | 93.00 ±0.02 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.912 | 1.60 | 2.0 | 88.40 | 88.40 ±0.03 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.748 | 1.08 | 3.0 | 79.80 | 79.80 ±0.04 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.668 | 0.83 | 4.0 | 75.60 | 75.60 ±0.04 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.611 | 0.65 | 5.0 | 72.60 | 72.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.435 | 0.11 | 6.0 | 63.40 | 63.40 ±0.04 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.431 | 0.09 | 7.0 | 63.20 | 63.20 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.401 | -0.00 | 8.0 | 61.60 | 61.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.393 | -0.03 | 9.0 | 61.20 | 61.20 ±0.04 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.195 | -0.65 | 10.0 | 50.80 | 50.80 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.076 | -1.02 | 11.0 | 44.60 | 44.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.057 | -1.07 | 12.0 | 43.60 | 43.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.050 | -1.10 | 13.0 | 43.20 | 43.20 ±0.04 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.042 | -1.12 | 14.0 | 42.80 | 42.80 ±0.04 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.000 | -1.25 | 15.0 | 40.60 | 40.60 ±0.04 |
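For a single-dataset table such as this one, the Min-Max and Z-Score columns can be reproduced from the Average column using min-max scaling and a z-score with the population standard deviation, both taken over the 15 listed models. The page does not state these formulas, so this is a hedged sketch that only checks the numbers against that assumption; the helper names `min_max` and `z_score` are hypothetical.

```python
from statistics import fmean, pstdev

# Spoken Language Identification scores (CoVost2_AIR-Bench) from the table above.
scores = {
    "nvidia/audio-flamingo-3-hf": 93.00,
    "Qwen/Qwen2-Audio-7B-Instruct": 88.40,
    "Qwen/Qwen2.5-Omni-7B": 79.80,
    "mistralai/Voxtral-Mini-3B-2507": 75.60,
    "LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h": 72.60,
    "LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h": 63.40,
    "Qwen/Qwen2.5-Omni-3B": 63.20,
    "LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h": 61.60,
    "LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h": 61.20,
    "microsoft/Phi-4-multimodal-instruct": 50.80,
    "LINAGORA/Canary_Luciole-1B_Step-Adapter_2h": 44.60,
    "LINAGORA/Canary_Luciole-1B_Step-Encoder_8h": 43.60,
    "LINAGORA/Canary_Luciole-1B_Step-Encoder_80h": 43.20,
    "LINAGORA/Canary-Qwen3-1.7B_data-v1_8h": 42.80,
    "LINAGORA/Canary-Qwen3-4B_data-v1_8h": 40.60,
}

values = list(scores.values())
lowest, highest = min(values), max(values)
mean = fmean(values)
std = pstdev(values)  # population standard deviation (divide by N, not N - 1)


def min_max(score):
    """Scale a score to [0, 1] relative to the worst and best model."""
    return (score - lowest) / (highest - lowest)


def z_score(score):
    """Distance from the mean, in population standard deviations."""
    return (score - mean) / std


if __name__ == "__main__":
    for model, score in scores.items():
        print(f"{model}: min-max={min_max(score):.3f} z={z_score(score):.2f}")
```

This check covers only tables with one dataset column. Tables that mix several language groups may aggregate the normalized values differently, so the same formulas applied to their Average column are not guaranteed to match.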
Tasks · Music
Music Captioning
Tasks · Music — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | OTHER · MusicCaps |
|---|---|---|---|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 1.000 | 2.04 | 1.0 | 59.76 | 59.76 ±0.04 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.864 | 1.58 | 2.0 | 55.48 | 55.48 ±0.11 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.709 | 1.05 | 3.0 | 50.60 | 50.60 ±0.07 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.555 | 0.53 | 4.0 | 45.76 | 45.76 ±0.05 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.551 | 0.51 | 5.0 | 45.64 | 45.64 ±0.05 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.541 | 0.48 | 6.0 | 45.32 | 45.32 ±0.05 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.507 | 0.36 | 7.0 | 44.24 | 44.24 ±0.06 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.290 | -0.38 | 8.0 | 37.40 | 37.40 ±0.07 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.257 | -0.49 | 9.0 | 36.36 | 36.36 ±0.06 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.254 | -0.50 | 10.0 | 36.28 | 36.28 ±0.07 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.189 | -0.72 | 11.0 | 34.24 | 34.24 ±0.06 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.178 | -0.76 | 12.0 | 33.88 | 33.88 ±0.07 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.067 | -1.13 | 13.0 | 30.40 | 30.40 ±0.06 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.046 | -1.21 | 14.0 | 29.72 | 29.72 ±0.06 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.000 | -1.36 | 15.0 | 28.28 | 28.28 ±0.06 |
Music Question Answering
Tasks · Music — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | FR · MusicCaps-QA | EN · MTJ-Jamendo_AIR-Bench | EN · MusicCaps-QA |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.970 | 1.99 | 1.5 | 59.00 | 55.32 ±0.07 | 63.20 ±0.04 | 62.16 ±0.08 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.906 | 1.77 | 2.5 | 57.16 | 54.72 ±0.09 | 62.60 ±0.04 | 56.60 ±0.08 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.684 | 0.99 | 2.0 | 49.57 | 56.48 ±0.07 | 28.00 ±0.04 | 57.32 ±0.08 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.560 | 0.59 | 4.0 | 46.87 | 52.68 ±0.08 | 31.80 ±0.04 | 50.32 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.396 | 0.05 | 6.0 | 42.88 | 48.84 ±0.07 | 27.60 ±0.04 | 46.24 ±0.08 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.382 | 0.00 | 7.5 | 42.27 | 49.36 ±0.08 | 23.40 ±0.04 | 46.96 ±0.09 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.317 | -0.22 | 10.5 | 40.42 | 48.68 ±0.08 | 20.60 ±0.04 | 43.72 ±0.09 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.288 | -0.29 | 8.5 | 40.76 | 44.76 ±0.07 | 29.60 ±0.04 | 43.92 ±0.08 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.258 | -0.41 | 10.5 | 39.32 | 46.32 ±0.08 | 18.80 ±0.03 | 45.84 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.255 | -0.40 | 9.0 | 40.11 | 43.44 ±0.07 | 31.20 ±0.04 | 42.36 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.219 | -0.52 | 10.5 | 39.22 | 42.64 ±0.07 | 28.80 ±0.04 | 42.80 ±0.08 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.205 | -0.57 | 10.5 | 38.48 | 43.60 ±0.07 | 24.80 ±0.04 | 41.92 ±0.08 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.143 | -0.75 | 9.5 | 38.22 | 38.28 ±0.09 | 41.00 ±0.04 | 35.32 ±0.08 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.069 | -1.01 | 12.5 | 35.77 | 38.56 ±0.07 | 23.80 ±0.04 | 42.16 ±0.08 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.000 | -1.23 | 15.0 | 34.06 | 37.08 ±0.08 | 21.00 ±0.04 | 41.08 ±0.08 |
Tasks · Sound
Audio Captioning
Tasks · Sound — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | OTHER · AudioCaps | OTHER · WavCaps |
|---|---|---|---|---|---|---|---|
| nvidia/audio-flamingo-3-hf | 8.2B | 1.000 | 1.90 | 1.0 | 57.76 | 64.00 ±0.06 | 51.52 ±0.10 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.914 | 1.63 | 2.0 | 54.68 | 56.60 ±0.09 | 52.76 ±0.10 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.887 | 1.55 | 3.0 | 53.70 | 54.36 ±0.09 | 53.04 ±0.10 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.711 | 1.00 | 4.0 | 47.34 | 48.40 ±0.08 | 46.28 ±0.09 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.476 | 0.27 | 5.0 | 38.90 | 38.56 ±0.07 | 39.24 ±0.07 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.393 | 0.01 | 6.0 | 35.88 | 37.48 ±0.09 | 34.28 ±0.09 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.325 | -0.20 | 7.0 | 33.46 | 31.80 ±0.07 | 35.12 ±0.08 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.263 | -0.40 | 8.0 | 31.22 | 30.60 ±0.07 | 31.84 ±0.08 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.192 | -0.62 | 9.0 | 28.64 | 25.96 ±0.05 | 31.32 ±0.07 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.191 | -0.62 | 10.0 | 28.62 | 26.84 ±0.06 | 30.40 ±0.07 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.166 | -0.70 | 11.0 | 27.72 | 26.56 ±0.06 | 28.88 ±0.07 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.163 | -0.71 | 12.0 | 27.62 | 25.28 ±0.05 | 29.96 ±0.07 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.157 | -0.73 | 13.0 | 27.38 | 25.68 ±0.05 | 29.08 ±0.07 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.018 | -1.16 | 14.0 | 22.38 | 21.24 ±0.03 | 23.52 ±0.05 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.000 | -1.21 | 15.0 | 21.74 | 21.04 ±0.03 | 22.44 ±0.05 |
Audio Question Answering
Tasks · Sound — FLOW_JUDGE
| Model | Size | Min-Max | Z-Score | Avg Rank | Average | EN · AudioCaps-QA | EN · Clotho-AQA | EN · WavCaps-QA |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 1.000 | 1.47 | 1.0 | 60.99 | 63.26 ±0.15 | 57.88 ±0.14 | 61.84 ±0.16 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.928 | 1.27 | 2.0 | 59.73 | 63.71 ±0.11 | 52.64 ±0.13 | 62.83 ±0.14 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.897 | 1.18 | 3.0 | 59.19 | 58.08 ±0.16 | 65.92 ±0.14 | 53.55 ±0.17 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.888 | 1.16 | 4.0 | 59.02 | 56.10 ±0.15 | 63.72 ±0.15 | 57.24 ±0.17 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.869 | 1.11 | 5.0 | 58.69 | 60.13 ±0.12 | 58.72 ±0.11 | 57.24 ±0.14 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.542 | 0.18 | 6.0 | 52.93 | 53.42 ±0.15 | 58.28 ±0.13 | 47.11 ±0.16 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.504 | 0.07 | 7.0 | 52.27 | 47.60 ±0.14 | 61.52 ±0.15 | 47.70 ±0.16 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.426 | -0.15 | 8.0 | 50.90 | 44.41 ±0.14 | 61.84 ±0.16 | 46.45 ±0.16 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.421 | -0.16 | 9.0 | 50.80 | 45.69 ±0.14 | 61.20 ±0.16 | 45.53 ±0.16 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.401 | -0.22 | 10.0 | 50.46 | 44.66 ±0.15 | 60.32 ±0.16 | 46.38 ±0.16 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.146 | -0.94 | 11.0 | 45.98 | 40.77 ±0.14 | 53.68 ±0.15 | 43.49 ±0.15 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.111 | -1.04 | 12.0 | 45.35 | 39.62 ±0.14 | 52.76 ±0.15 | 43.68 ±0.15 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.045 | -1.23 | 13.0 | 44.21 | 39.62 ±0.13 | 51.16 ±0.15 | 41.84 ±0.15 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.005 | -1.34 | 14.0 | 43.49 | 38.34 ±0.14 | 52.72 ±0.15 | 39.41 ±0.15 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.000 | -1.35 | 15.0 | 43.41 | 39.68 ±0.14 | 49.56 ±0.15 | 40.99 ±0.15 |
Languages · French
French — Models × Tasks
| Model | Size | Min-Max | Z-Score | Avg Rank | ASR (WER %) | CommonVoice | Fleurs | Multilingual_TEDx | SUMM-RE | VoxPopuli | YouTubeFr | AST (METEOR %) | Multilingual_TEDx (FR→EN) | Multilingual_TEDx (FR→ES) | QUESTION ANSWERING (FLOW_JUDGE) | CohereLabs-Aya_collection | Vigogne--Alpaca | VoxPopuli-QA | MUSIC QUESTION ANSWERING (FLOW_JUDGE) | EMOTION RECOGNITION (FLOW_JUDGE) | GENDER RECOGNITION (FLOW_JUDGE) | AGE RECOGNITION (FLOW_JUDGE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.833 | 0.95 | 2.0 | 15.12 ±9.45 | 6.98 ±1.42 | 6.86 ±1.23 | 9.25 ±3.73 | 28.70 ±55.10 | 8.16 ±1.15 | 30.76 ±11.95 | 64.72 ±1.27 | 66.55 ±1.73 | 62.90 ±1.85 | 61.82 ±0.08 | 50.97 ±0.36 | 53.92 ±0.10 | 80.56 ±0.10 | 48.84 ±0.07 | 19.80 ±0.03 | 74.60 ±0.04 | 24.00 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.801 | 0.81 | 3.6 | 20.53 ±11.37 | 7.00 ±1.35 | 7.46 ±1.27 | 9.94 ±2.07 | 51.94 ±64.04 | 7.97 ±1.13 | 38.86 ±22.25 | 64.28 ±1.28 | 65.46 ±1.78 | 63.09 ±1.82 | 58.60 ±0.08 | 45.81 ±0.38 | 53.52 ±0.10 | 76.48 ±0.10 | 49.36 ±0.08 | 15.20 ±0.03 | 71.60 ±0.04 | 27.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.756 | 0.68 | 4.3 | 15.75 ±9.61 | 7.71 ±1.56 | 8.10 ±1.33 | 10.32 ±2.18 | 27.14 ±24.52 | 8.41 ±1.37 | 32.85 ±51.80 | 64.29 ±1.27 | 65.73 ±1.76 | 62.85 ±1.84 | 57.85 ±0.08 | 45.16 ±0.32 | 50.64 ±0.10 | 77.76 ±0.10 | 48.68 ±0.08 | 19.40 ±0.03 | 68.00 ±0.04 | 19.00 ±0.03 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.736 | 0.61 | 4.3 | 19.79 ±10.56 | 8.09 ±1.48 | 8.49 ±1.27 | 10.62 ±6.56 | 42.54 ±56.08 | 10.49 ±20.17 | 38.53 ±19.64 | 62.18 ±1.31 | 63.43 ±1.81 | 60.94 ±1.89 | 56.97 ±0.08 | 43.23 ±0.28 | 51.00 ±0.10 | 76.68 ±0.10 | 46.32 ±0.08 | 19.80 ±0.03 | 71.60 ±0.04 | 21.00 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.648 | 0.30 | 7.1 | 23.31 ±13.20 | 17.17 ±16.96 | 8.08 ±1.22 | 12.86 ±33.42 | 43.96 ±20.96 | 8.85 ±1.46 | 48.95 ±65.81 | 58.28 ±1.53 | 59.82 ±2.17 | 56.74 ±2.14 | 56.12 ±0.08 | 46.45 ±0.31 | 51.00 ±0.10 | 70.92 ±0.11 | 43.44 ±0.07 | 12.60 ±0.03 | 68.20 ±0.04 | 18.40 ±0.03 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.638 | 0.24 | 7.9 | 33.56 ±3.68 | 18.92 ±2.45 | 27.90 ±2.52 | 26.52 ±15.79 | 39.09 ±3.37 | 37.35 ±4.38 | 51.58 ±13.46 | 50.15 ±1.54 | 51.06 ±2.30 | 49.24 ±2.04 | 50.99 ±0.09 | 34.84 ±0.42 | 43.28 ±0.10 | 74.84 ±0.10 | 55.32 ±0.07 | 16.20 ±0.03 | 67.00 ±0.04 | 15.00 ±0.03 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.637 | 0.25 | 8.1 | 25.54 ±30.62 | 14.15 ±33.16 | 8.53 ±1.34 | 15.26 ±169.56 | 54.37 ±40.13 | 9.05 ±1.30 | 51.90 ±47.03 | 58.55 ±1.50 | 59.84 ±2.16 | 57.27 ±2.07 | 55.41 ±0.08 | 45.16 ±0.33 | 49.88 ±0.10 | 71.20 ±0.11 | 44.76 ±0.07 | 10.20 ±0.03 | 66.20 ±0.04 | 17.80 ±0.03 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.629 | 0.24 | 7.1 | 21.49 ±10.72 | 10.00 ±1.66 | 8.22 ±1.25 | 10.17 ±1.95 | 38.22 ±27.56 | 8.55 ±1.32 | 53.79 ±57.27 | 59.04 ±1.50 | 60.18 ±2.16 | 57.90 ±2.06 | 55.56 ±0.08 | 45.81 ±0.35 | 49.68 ±0.10 | 71.20 ±0.11 | 42.64 ±0.07 | 12.60 ±0.03 | 69.00 ±0.04 | 15.40 ±0.03 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.476 | -0.26 | 9.4 | 43.81 ±16.23 | 26.18 ±5.77 | 21.29 ±3.57 | 55.24 ±41.81 | 63.68 ±72.63 | 27.56 ±7.56 | 68.93 ±47.91 | 48.56 ±1.85 | 57.03 ±2.36 | 40.09 ±2.66 | 52.16 ±0.09 | 35.48 ±0.32 | 41.76 ±0.09 | 79.24 ±0.10 | 52.68 ±0.08 | 12.20 ±0.03 | 31.00 ±0.04 | 5.00 ±0.02 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.468 | -0.23 | 9.6 | 73.74 ±9.09 | 48.31 ±7.41 | 57.50 ±6.22 | 76.10 ±46.43 | 73.64 ±9.60 | 94.59 ±6.21 | 92.27 ±23.11 | 33.01 ±1.65 | 43.68 ±2.03 | 22.34 ±2.25 | 44.63 ±0.07 | 37.42 ±0.32 | 38.12 ±0.09 | 58.36 ±0.10 | 54.72 ±0.09 | 53.00 ±0.04 | 48.40 ±0.04 | 3.20 ±0.02 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.466 | -0.24 | 9.4 | 21.70 ±13.80 | 10.65 ±1.84 | 10.15 ±1.40 | 22.07 ±55.31 | 42.83 ±55.71 | 13.37 ±24.17 | 31.16 ±8.81 | 46.40 ±1.39 | 47.57 ±1.93 | 45.23 ±2.00 | 59.84 ±0.08 | 48.39 ±0.33 | 52.56 ±0.09 | 78.56 ±0.10 | 43.60 ±0.07 | 6.80 ±0.02 | 29.20 ±0.04 | 3.60 ±0.02 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.352 | -0.66 | 10.1 | 127.95 ±36.03 | 71.61 ±29.42 | 75.16 ±3.59 | 185.12 ±176.73 | 201.50 ±52.98 | 103.83 ±38.14 | 130.46 ±98.40 | 27.73 ±1.02 | 29.81 ±1.36 | 25.65 ±1.50 | 61.04 ±0.07 | 48.39 ±0.41 | 57.08 ±0.10 | 77.64 ±0.08 | 56.48 ±0.07 | 9.20 ±0.03 | 28.20 ±0.04 | 0.40 ±0.01 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.334 | -0.71 | 11.4 | 20.80 ±1.78 | 13.01 ±1.94 | 11.92 ±1.34 | 18.06 ±8.22 | 32.69 ±4.36 | 17.08 ±1.58 | 32.04 ±3.48 | 37.83 ±1.44 | 41.47 ±1.94 | 34.19 ±2.08 | 54.45 ±0.08 | 42.58 ±0.31 | 48.20 ±0.10 | 72.56 ±0.11 | 38.56 ±0.07 | 4.80 ±0.02 | 27.40 ±0.04 | 1.80 ±0.01 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.289 | -0.87 | 12.6 | 44.41 ±9.42 | 25.86 ±17.61 | 12.16 ±2.18 | 36.79 ±35.85 | 82.09 ±20.51 | 31.48 ±7.55 | 78.10 ±31.14 | 50.74 ±1.52 | 53.66 ±2.01 | 47.82 ±2.26 | 53.78 ±0.07 | 42.58 ±0.38 | 51.08 ±0.10 | 67.68 ±0.10 | 38.28 ±0.09 | 1.60 ±0.01 | 6.40 ±0.02 | 0.40 ±0.01 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.221 | -1.12 | 13.0 | 82.32 ±12.68 | 97.54 ±33.55 | 28.89 ±7.09 | 93.08 ±41.98 | 128.57 ±25.32 | 47.17 ±14.03 | 98.67 ±41.83 | 49.03 ±1.52 | 51.84 ±2.00 | 46.21 ±2.27 | 51.26 ±0.08 | 39.35 ±0.39 | 48.84 ±0.10 | 65.60 ±0.11 | 37.08 ±0.08 | 2.60 ±0.01 | 1.60 ±0.01 | 4.80 ±0.02 |
Languages · English
English — Models × Tasks
| Model | Size | Min-Max | Z-Score | Avg Rank | ASR (WER %) | CommonVoice | Fleurs | VoxPopuli | QUESTION ANSWERING (FLOW_JUDGE) | NationalSpeechCorpus_SQA | OpenHermes_audio | SLUE-P2-SQA5 | SpokenWOZ_AIR-Bench | alpaca_audio | fisher_AIR-Bench | public-sg-speech | MUSIC QUESTION ANSWERING (FLOW_JUDGE) | MTJ-Jamendo_AIR-Bench | MusicCaps-QA | EMOTION RECOGNITION (FLOW_JUDGE) | IEMOCAP-Emotion | MELD_Emotion | GENDER RECOGNITION (FLOW_JUDGE) | CommonVoice_Gender | IEMOCAP-gender | AGE RECOGNITION (FLOW_JUDGE) | AUDIO QUESTION ANSWERING (FLOW_JUDGE) | AudioCaps-QA | Clotho-AQA | WavCaps-QA | DIALOGUE SUMMARIZATION (FLOW_JUDGE) | MATH QUESTION ANSWERING (ACC %) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nvidia/audio-flamingo-3-hf | 8.2B | 0.779 | 1.14 | 5.3 | 12.58 ±8.94 | 13.22 ±2.87 | 11.98 ±1.63 | 12.53 ±26.61 | 60.84 ±0.06 | 63.60 ±0.08 | 48.80 ±0.27 | 78.48 ±0.13 | 69.22 ±0.17 | 24.00 ±0.41 | 71.70 ±0.13 | 70.08 ±0.07 | 59.60 ±0.08 | 62.60 ±0.04 | 56.60 ±0.08 | 52.60 ±0.03 | 43.20 ±0.04 | 62.00 ±0.04 | 89.50 ±0.02 | 93.00 ±0.02 | 86.00 ±0.03 | 35.80 ±0.04 | 59.73 ±0.08 | 63.71 ±0.11 | 52.64 ±0.13 | 62.83 ±0.14 | 55.53 ±0.09 | 64.00 ±9.41 |
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.775 | 1.12 | 4.7 | 11.00 ±1.15 | 10.30 ±1.65 | 7.16 ±1.22 | 15.54 ±2.72 | 68.21 ±0.05 | 57.45 ±0.08 | 62.60 ±0.27 | 87.94 ±0.11 | 68.91 ±0.19 | 63.80 ±0.28 | 72.00 ±0.16 | 64.76 ±0.08 | 62.68 ±0.09 | 63.20 ±0.04 | 62.16 ±0.08 | 37.20 ±0.03 | 35.80 ±0.04 | 38.60 ±0.04 | 95.20 ±0.01 | 92.40 ±0.02 | 98.00 ±0.01 | 32.20 ±0.04 | 60.99 ±0.09 | 63.26 ±0.15 | 57.88 ±0.14 | 61.84 ±0.16 | 51.93 ±0.08 | 66.00 ±9.28 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.531 | 0.26 | 5.6 | 6.67 ±1.67 | 8.31 ±1.35 | 5.74 ±0.94 | 5.96 ±4.72 | 68.50 ±0.06 | 59.75 ±0.09 | 63.00 ±0.21 | 81.67 ±0.14 | 69.12 ±0.20 | 71.80 ±0.23 | 70.30 ±0.17 | 63.88 ±0.08 | 32.32 ±0.08 | 18.80 ±0.03 | 45.84 ±0.08 | 23.10 ±0.03 | 25.80 ±0.04 | 20.40 ±0.04 | 57.50 ±0.03 | 68.20 ±0.04 | 46.80 ±0.04 | 42.60 ±0.04 | 50.80 ±0.09 | 45.69 ±0.14 | 61.20 ±0.16 | 45.53 ±0.16 | 47.47 ±0.09 | 66.00 ±9.28 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.488 | 0.22 | 5.7 | 10.29 ±1.80 | 12.42 ±4.75 | 6.36 ±1.06 | 12.09 ±2.27 | 70.93 ±0.05 | 63.75 ±0.09 | 61.80 ±0.27 | 91.76 ±0.08 | 74.72 ±0.17 | 53.40 ±0.29 | 76.90 ±0.15 | 74.16 ±0.08 | 41.06 ±0.08 | 31.80 ±0.04 | 50.32 ±0.09 | 21.80 ±0.03 | 22.20 ±0.04 | 21.40 ±0.04 | 57.30 ±0.03 | 58.40 ±0.04 | 56.20 ±0.04 | 3.00 ±0.01 | 59.02 ±0.09 | 56.10 ±0.15 | 63.72 ±0.15 | 57.24 ±0.17 | 56.40 ±0.09 | 23.00 ±8.25 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.471 | 0.15 | 6.4 | 81.49 ±20.97 | 54.35 ±37.70 | 73.73 ±14.57 | 116.40 ±47.54 | 79.34 ±0.04 | 74.00 ±0.07 | 77.40 ±0.20 | 88.28 ±0.09 | 77.20 ±0.15 | 78.20 ±0.19 | 81.10 ±0.11 | 79.20 ±0.06 | 42.66 ±0.09 | 28.00 ±0.04 | 57.32 ±0.08 | 18.40 ±0.02 | 27.20 ±0.04 | 9.60 ±0.03 | 13.30 ±0.02 | 26.20 ±0.04 | 0.40 ±0.01 | 4.40 ±0.02 | 58.69 ±0.07 | 60.13 ±0.12 | 58.72 ±0.11 | 57.24 ±0.14 | 66.67 ±0.08 | 69.00 ±9.06 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.454 | 0.07 | 7.6 | 41.77 ±10.57 | 75.07 ±20.40 | 20.60 ±3.87 | 29.65 ±23.53 | 72.47 ±0.05 | 63.05 ±0.09 | 68.40 ±0.19 | 90.10 ±0.09 | 71.30 ±0.19 | 69.40 ±0.22 | 73.70 ±0.16 | 71.32 ±0.08 | 38.16 ±0.06 | 41.00 ±0.04 | 35.32 ±0.08 | 14.20 ±0.02 | 22.60 ±0.04 | 5.80 ±0.02 | 22.10 ±0.03 | 8.80 ±0.02 | 35.40 ±0.04 | 0.00 ±0.00 | 59.19 ±0.09 | 58.08 ±0.16 | 65.92 ±0.14 | 53.55 ±0.17 | 55.33 ±0.08 | 89.00 ±6.13 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.451 | -0.02 | 7.2 | 6.78 ±0.73 | 8.37 ±1.48 | 6.09 ±0.90 | 5.87 ±1.31 | 60.28 ±0.06 | 54.55 ±0.09 | 53.60 ±0.25 | 66.96 ±0.17 | 59.48 ±0.20 | 62.40 ±0.25 | 64.70 ±0.17 | 60.28 ±0.09 | 35.18 ±0.08 | 23.40 ±0.04 | 46.96 ±0.09 | 17.70 ±0.02 | 16.20 ±0.03 | 19.20 ±0.03 | 52.10 ±0.03 | 65.40 ±0.04 | 38.80 ±0.04 | 41.20 ±0.04 | 52.27 ±0.09 | 47.60 ±0.14 | 61.52 ±0.15 | 47.70 ±0.16 | 44.87 ±0.09 | 49.00 ±9.80 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.446 | -0.02 | 6.3 | 6.59 ±0.65 | 8.27 ±1.38 | 5.90 ±0.91 | 5.62 ±0.99 | 62.88 ±0.06 | 55.85 ±0.09 | 56.00 ±0.26 | 71.42 ±0.17 | 65.18 ±0.20 | 63.80 ±0.25 | 64.90 ±0.18 | 63.00 ±0.08 | 36.92 ±0.08 | 27.60 ±0.04 | 46.24 ±0.08 | 22.80 ±0.03 | 15.60 ±0.03 | 30.00 ±0.04 | 57.30 ±0.03 | 65.60 ±0.04 | 49.00 ±0.04 | 40.80 ±0.04 | 50.90 ±0.09 | 44.41 ±0.14 | 61.84 ±0.16 | 46.45 ±0.16 | 41.93 ±0.10 | 34.00 ±9.28 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.365 | -0.28 | 9.6 | 7.26 ±0.73 | 8.67 ±1.46 | 6.75 ±0.94 | 6.36 ±1.33 | 61.51 ±0.06 | 54.00 ±0.09 | 54.40 ±0.24 | 71.37 ±0.17 | 64.77 ±0.19 | 57.80 ±0.26 | 67.00 ±0.17 | 61.20 ±0.08 | 32.16 ±0.08 | 20.60 ±0.04 | 43.72 ±0.09 | 17.60 ±0.02 | 14.40 ±0.03 | 20.80 ±0.04 | 47.60 ±0.03 | 62.20 ±0.04 | 33.00 ±0.04 | 38.40 ±0.04 | 50.46 ±0.10 | 44.66 ±0.15 | 60.32 ±0.16 | 46.38 ±0.16 | 44.60 ±0.09 | 12.00 ±6.37 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.328 | -0.31 | 9.7 | 9.43 ±1.71 | 10.89 ±1.54 | 7.49 ±0.98 | 9.92 ±4.78 | 69.28 ±0.05 | 59.70 ±0.09 | 67.40 ±0.25 | 86.13 ±0.12 | 65.18 ±0.20 | 70.40 ±0.25 | 69.80 ±0.17 | 66.36 ±0.08 | 33.36 ±0.07 | 24.80 ±0.04 | 41.92 ±0.08 | 13.00 ±0.02 | 18.80 ±0.03 | 7.20 ±0.02 | 33.50 ±0.03 | 17.60 ±0.03 | 49.40 ±0.04 | 1.40 ±0.01 | 43.41 ±0.09 | 39.68 ±0.14 | 49.56 ±0.15 | 40.99 ±0.15 | 49.40 ±0.09 | 66.00 ±9.28 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.327 | -0.34 | 9.6 | 68.47 ±13.86 | 120.36 ±27.27 | 29.73 ±7.38 | 55.32 ±29.51 | 70.84 ±0.05 | 64.95 ±0.09 | 64.80 ±0.18 | 88.97 ±0.10 | 67.56 ±0.20 | 68.60 ±0.23 | 71.10 ±0.17 | 69.92 ±0.08 | 31.04 ±0.07 | 21.00 ±0.04 | 41.08 ±0.08 | 15.20 ±0.02 | 24.80 ±0.04 | 5.60 ±0.02 | 11.10 ±0.02 | 9.20 ±0.03 | 13.00 ±0.03 | 1.00 ±0.01 | 52.93 ±0.08 | 53.42 ±0.15 | 58.28 ±0.13 | 47.11 ±0.16 | 55.20 ±0.09 | 85.00 ±7.00 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.320 | -0.43 | 10.1 | 10.30 ±1.52 | 11.58 ±3.99 | 11.18 ±1.55 | 8.15 ±1.52 | 56.32 ±0.06 | 49.30 ±0.09 | 62.20 ±0.23 | 60.98 ±0.18 | 48.39 ±0.19 | 59.60 ±0.27 | 53.30 ±0.17 | 60.48 ±0.08 | 36.76 ±0.07 | 29.60 ±0.04 | 43.92 ±0.08 | 13.90 ±0.02 | 14.40 ±0.03 | 13.40 ±0.03 | 54.20 ±0.03 | 61.40 ±0.04 | 47.00 ±0.04 | 37.80 ±0.04 | 45.98 ±0.09 | 40.77 ±0.14 | 53.68 ±0.15 | 43.49 ±0.15 | 41.27 ±0.08 | 21.00 ±7.98 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.316 | -0.44 | 10.2 | 10.53 ±8.36 | 15.93 ±24.98 | 8.25 ±1.14 | 7.40 ±1.27 | 54.89 ±0.06 | 48.80 ±0.09 | 58.60 ±0.22 | 63.14 ±0.18 | 46.94 ±0.18 | 57.20 ±0.25 | 50.70 ±0.18 | 58.88 ±0.09 | 35.80 ±0.07 | 28.80 ±0.04 | 42.80 ±0.08 | 15.70 ±0.02 | 13.40 ±0.03 | 18.00 ±0.03 | 50.90 ±0.03 | 54.80 ±0.04 | 47.00 ±0.04 | 38.40 ±0.04 | 45.35 ±0.09 | 39.62 ±0.14 | 52.76 ±0.15 | 43.68 ±0.15 | 43.93 ±0.09 | 18.00 ±7.53 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.310 | -0.47 | 10.4 | 11.33 ±7.87 | 17.96 ±23.52 | 8.50 ±1.13 | 7.52 ±1.51 | 55.49 ±0.06 | 49.30 ±0.09 | 60.20 ±0.22 | 62.79 ±0.18 | 47.98 ±0.19 | 58.20 ±0.26 | 50.50 ±0.17 | 59.48 ±0.08 | 36.78 ±0.07 | 31.20 ±0.04 | 42.36 ±0.08 | 15.60 ±0.02 | 12.80 ±0.03 | 18.40 ±0.03 | 49.20 ±0.03 | 52.00 ±0.04 | 46.40 ±0.04 | 41.00 ±0.04 | 44.21 ±0.09 | 39.62 ±0.13 | 51.16 ±0.15 | 41.84 ±0.15 | 41.07 ±0.09 | 21.00 ±7.98 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.227 | -0.65 | 11.7 | 12.09 ±0.95 | 12.26 ±1.89 | 11.05 ±1.12 | 12.97 ±1.80 | 62.17 ±0.06 | 54.95 ±0.09 | 57.00 ±0.22 | 77.35 ±0.15 | 57.62 ±0.20 | 60.60 ±0.26 | 63.90 ±0.18 | 63.80 ±0.08 | 32.98 ±0.07 | 23.80 ±0.04 | 42.16 ±0.08 | 13.70 ±0.02 | 22.00 ±0.04 | 5.40 ±0.02 | 36.30 ±0.03 | 40.40 ±0.04 | 32.20 ±0.04 | 2.20 ±0.01 | 43.49 ±0.09 | 38.34 ±0.14 | 52.72 ±0.15 | 39.41 ±0.15 | 41.60 ±0.09 | 40.00 ±9.60 |
Languages · Others
Others — Models × Tasks
| Model | Size | Min-Max | Z-Score | Avg Rank | ASR (WER %) | Fleurs | Multilingual_TEDx | AST (METEOR %) | Multilingual_TEDx (ES→FR) | Multilingual_TEDx (ES→IT) | AUDIO CAPTIONING (FLOW_JUDGE) | AudioCaps | WavCaps | MUSIC CAPTIONING (FLOW_JUDGE) | SPOKEN LANGUAGE IDENTIFICATION (FLOW_JUDGE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | 8.4B | 0.875 | 1.13 | 4.6 | 24.43 ±2.19 | 14.99 ±2.14 | 23.31 ±5.43 | 45.84 ±1.73 | 46.22 ±2.08 | 45.46 ±3.11 | 53.70 ±0.06 | 54.36 ±0.09 | 53.04 ±0.10 | 59.76 ±0.04 | 88.40 ±0.03 |
| Qwen/Qwen2.5-Omni-7B | 11B | 0.673 | 0.44 | 6.4 | 32.39 ±8.72 | 15.68 ±3.65 | 54.62 ±32.25 | 41.12 ±1.85 | 42.80 ±2.25 | 39.45 ±3.23 | 54.68 ±0.07 | 56.60 ±0.09 | 52.76 ±0.10 | 37.40 ±0.07 | 79.80 ±0.04 |
| nvidia/audio-flamingo-3-hf | 8.2B | 0.647 | 0.32 | 6.6 | 91.53 ±8.97 | 96.60 ±9.93 | 80.80 ±43.95 | 14.94 ±1.55 | 17.21 ±1.99 | 12.67 ±2.35 | 57.76 ±0.06 | 64.00 ±0.06 | 51.52 ±0.10 | 55.48 ±0.11 | 93.00 ±0.02 |
| LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h | 2.5B | 0.616 | 0.30 | 5.8 | 12.22 ±2.48 | 7.52 ±1.64 | 14.88 ±3.79 | 55.40 ±1.63 | 55.08 ±1.87 | 55.72 ±3.13 | 33.46 ±0.05 | 31.80 ±0.07 | 35.12 ±0.08 | 36.36 ±0.06 | 72.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h | 2.1B | 0.553 | 0.10 | 5.8 | 9.96 ±2.44 | 6.97 ±1.60 | 13.76 ±16.70 | 59.74 ±1.65 | 59.98 ±1.92 | 59.50 ±3.11 | 31.22 ±0.05 | 30.60 ±0.07 | 31.84 ±0.08 | 30.40 ±0.06 | 63.40 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h | 2.1B | 0.543 | 0.07 | 7.2 | 11.23 ±1.66 | 7.83 ±1.60 | 14.94 ±4.32 | 58.38 ±1.63 | 59.08 ±1.92 | 57.69 ±3.04 | 27.72 ±0.04 | 26.56 ±0.06 | 28.88 ±0.07 | 34.24 ±0.06 | 61.60 ±0.04 |
| microsoft/Phi-4-multimodal-instruct | 5.6B | 0.529 | -0.00 | 8.8 | 41.02 ±5.26 | 29.88 ±9.76 | 34.03 ±8.53 | 36.07 ±2.05 | 31.28 ±2.46 | 40.85 ±3.57 | 47.34 ±0.06 | 48.40 ±0.08 | 46.28 ±0.09 | 44.24 ±0.06 | 50.80 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Adapter_2h | 2.1B | 0.524 | 0.02 | 7.6 | 19.53 ±2.93 | 13.77 ±7.87 | 23.04 ±6.52 | 54.22 ±1.71 | 55.45 ±1.97 | 53.00 ±3.29 | 28.64 ±0.05 | 25.96 ±0.05 | 31.32 ±0.07 | 45.64 ±0.05 | 44.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_8h | 2.1B | 0.520 | 0.01 | 7.8 | 18.49 ±2.69 | 11.70 ±2.94 | 20.83 ±6.52 | 53.47 ±1.69 | 55.72 ±1.93 | 51.22 ±3.24 | 28.62 ±0.04 | 26.84 ±0.06 | 30.40 ±0.07 | 45.76 ±0.05 | 43.60 ±0.04 |
| LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h | 2.1B | 0.518 | -0.01 | 8.0 | 10.03 ±1.58 | 7.44 ±1.54 | 11.43 ±1.85 | 59.54 ±1.62 | 59.19 ±1.91 | 59.89 ±3.01 | 27.38 ±0.05 | 25.68 ±0.05 | 29.08 ±0.07 | 29.72 ±0.06 | 61.20 ±0.04 |
| LINAGORA/Canary_Luciole-1B_Step-Encoder_80h | 2.1B | 0.512 | -0.02 | 8.4 | 17.19 ±4.37 | 11.82 ±2.88 | 21.12 ±6.92 | 53.54 ±1.75 | 55.06 ±2.03 | 52.02 ±3.31 | 27.62 ±0.04 | 25.28 ±0.05 | 29.96 ±0.07 | 45.32 ±0.05 | 43.20 ±0.04 |
| Qwen/Qwen2.5-Omni-3B | 5.9B | 0.461 | -0.24 | 9.2 | 54.28 ±10.89 | 22.88 ±5.74 | 82.28 ±48.63 | 40.50 ±1.77 | 39.10 ±2.19 | 41.91 ±3.00 | 35.88 ±0.06 | 37.48 ±0.09 | 34.28 ±0.09 | 36.28 ±0.07 | 63.20 ±0.04 |
| mistralai/Voxtral-Mini-3B-2507 | 4.68B | 0.409 | -0.47 | 8.2 | 139.30 ±27.78 | 96.91 ±16.09 | 173.54 ±93.23 | 23.46 ±1.18 | 21.31 ±1.35 | 25.62 ±2.25 | 38.90 ±0.05 | 38.56 ±0.07 | 39.24 ±0.07 | 50.60 ±0.07 | 75.60 ±0.04 |
| LINAGORA/Canary-Qwen3-4B_data-v1_8h | 4.8B | 0.316 | -0.66 | 12.0 | 18.72 ±11.42 | 12.14 ±2.25 | 26.31 ±14.69 | 35.30 ±1.69 | 34.56 ±2.00 | 36.04 ±3.14 | 22.38 ±0.03 | 21.24 ±0.03 | 23.52 ±0.05 | 33.88 ±0.07 | 40.60 ±0.04 |
| LINAGORA/Canary-Qwen3-1.7B_data-v1_8h | 2.5B | 0.228 | -0.97 | 13.6 | 38.06 ±3.23 | 58.79 ±3.43 | 101.99 ±11.44 | 28.99 ±1.64 | 27.86 ±1.93 | 30.12 ±3.08 | 21.74 ±0.03 | 21.04 ±0.03 | 22.44 ±0.05 | 28.28 ±0.06 | 42.80 ±0.04 |