Overview

Columns (in row order): Model; Size; Min-Max; Z-Score; Avg Rank; ASR (WER %): overall, FR, EN, DE, ES, IT, PT, NL; AST (METEOR %, Multilingual_TEDx): overall, ES-FR, ES-IT, FR-EN, FR-ES; Question Answering (FLOW_JUDGE): overall, FR, EN; Others: overall, Emotion Recognition (FLOW_JUDGE), Gender Recognition (FLOW_JUDGE), Age Recognition (FLOW_JUDGE), Dialogue Summarization (FLOW_JUDGE), Spoken Language Identification (FLOW_JUDGE); Music: overall, Music Question Answering (FLOW_JUDGE), Music Captioning (FLOW_JUDGE); Sound: overall, Audio Captioning (FLOW_JUDGE), Audio Question Answering (FLOW_JUDGE)
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.796 1.03 4.8 25.23 ±1.67 33.56 ±3.68 11.00 ±1.15 32.65 ±4.70 16.57 ±2.64 21.20 ±7.11 19.15 ±2.94 40.68 ±3.75 48.00 ±1.15 46.22 ±2.08 45.46 ±3.11 51.06 ±2.30 49.24 ±2.04 63.04 ±0.05 50.99 ±0.09 68.21 ±0.05 55.99 30.20 ±0.02 85.80 ±0.02 23.60 ±0.03 51.93 ±0.08 88.40 ±0.03 59.99 60.23 ±0.07 59.76 ±0.04 57.35 53.70 ±0.06 60.99 ±0.09
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.599 0.37 5.3 13.82 ±3.79 19.79 ±10.56 6.67 ±1.67 17.84 ±1.86 8.67 ±7.83 9.50 ±6.89 11.20 ±2.08 15.57 ±2.12 58.79 ±1.04 55.08 ±1.87 55.72 ±3.13 63.43 ±1.81 60.94 ±1.89 65.04 ±0.05 56.97 ±0.08 68.50 ±0.06 47.21 22.00 ±0.02 62.20 ±0.02 31.80 ±0.03 47.47 ±0.09 72.60 ±0.04 36.67 36.99 ±0.06 36.36 ±0.06 42.13 33.46 ±0.05 50.80 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.570 0.26 5.5 11.12 ±3.42 15.12 ±9.45 6.59 ±0.65 15.99 ±1.81 5.95 ±0.95 7.64 ±6.18 10.37 ±8.40 9.77 ±1.51 62.23 ±1.02 59.98 ±1.92 59.50 ±3.11 66.55 ±1.73 62.90 ±1.85 62.56 ±0.05 61.82 ±0.08 62.88 ±0.06 44.52 21.80 ±0.02 63.07 ±0.02 32.40 ±0.03 41.93 ±0.10 63.40 ±0.04 35.65 40.89 ±0.06 30.40 ±0.06 41.06 31.22 ±0.05 50.90 ±0.09
nvidia/audio-flamingo-3-hf 8.2B 0.566 0.23 7.7 72.44 ±5.63 73.74 ±9.09 12.58 ±8.94 112.01 ±22.08 67.31 ±10.68 106.41 ±20.15 88.70 ±22.54 74.85 ±6.63 23.97 ±1.22 17.21 ±1.99 12.67 ±2.35 43.68 ±2.03 22.34 ±2.25 55.98 ±0.05 44.63 ±0.07 60.84 ±0.06 59.31 52.73 ±0.03 75.80 ±0.02 19.50 ±0.02 55.53 ±0.09 93.00 ±0.02 56.73 57.97 ±0.07 55.48 ±0.11 58.74 57.76 ±0.06 59.73 ±0.08
microsoft/Phi-4-multimodal-instruct 5.6B 0.555 0.20 7.3 36.83 ±6.09 43.81 ±16.23 10.29 ±1.80 41.56 ±6.57 34.85 ±16.84 24.70 ±9.17 31.95 ±6.49 103.10 ±22.76 42.31 ±1.41 31.28 ±2.46 40.85 ±3.57 57.03 ±2.36 40.09 ±2.66 65.30 ±0.05 52.16 ±0.09 70.93 ±0.05 35.67 18.60 ±0.02 48.53 ±0.03 4.00 ±0.01 56.40 ±0.09 50.80 ±0.04 44.59 44.93 ±0.07 44.24 ±0.06 53.18 47.34 ±0.06 59.02 ±0.09
Qwen/Qwen2.5-Omni-7B 11B 0.551 0.17 7.7 37.96 ±5.64 44.41 ±9.42 41.77 ±10.57 48.09 ±21.47 28.24 ±23.62 18.58 ±12.24 35.15 ±16.46 31.38 ±4.68 45.93 ±1.20 42.80 ±2.25 39.45 ±3.23 53.66 ±2.01 47.82 ±2.26 66.86 ±0.04 53.78 ±0.07 72.47 ±0.05 32.44 10.00 ±0.02 16.87 ±0.02 0.20 ±0.00 55.33 ±0.08 79.80 ±0.04 37.80 38.20 ±0.05 37.40 ±0.07 56.93 54.68 ±0.07 59.19 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.522 0.09 7.2 12.99 ±3.93 20.53 ±11.37 6.78 ±0.73 16.50 ±1.79 6.38 ±1.08 7.87 ±6.41 9.43 ±1.21 9.93 ±1.48 61.91 ±1.01 59.19 ±1.91 59.89 ±3.01 65.46 ±1.78 63.09 ±1.82 59.78 ±0.05 58.60 ±0.08 60.28 ±0.06 43.19 16.87 ±0.02 58.60 ±0.02 34.40 ±0.03 44.87 ±0.09 61.20 ±0.04 34.81 39.91 ±0.07 29.72 ±0.06 39.83 27.38 ±0.05 52.27 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.520 0.09 6.7 12.08 ±3.36 15.75 ±9.61 7.26 ±0.73 17.60 ±2.00 6.90 ±1.11 8.73 ±6.45 11.38 ±2.31 11.85 ±1.40 61.34 ±1.02 59.08 ±1.92 57.69 ±3.04 65.73 ±1.76 62.85 ±1.84 60.41 ±0.05 57.85 ±0.08 61.51 ±0.06 41.50 18.20 ±0.02 54.40 ±0.03 28.70 ±0.03 44.60 ±0.09 61.60 ±0.04 35.95 37.67 ±0.06 34.24 ±0.06 39.09 27.72 ±0.04 50.46 ±0.10
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.452 -0.14 8.2 20.00 ±10.47 25.54 ±30.62 10.30 ±1.52 28.03 ±6.27 12.91 ±3.50 15.27 ±7.40 18.41 ±5.11 26.52 ±15.14 56.39 ±1.13 55.45 ±1.97 53.00 ±3.29 59.84 ±2.16 57.27 ±2.07 56.05 ±0.05 55.41 ±0.08 56.32 ±0.06 36.91 12.67 ±0.02 58.20 ±0.02 27.80 ±0.03 41.27 ±0.08 44.60 ±0.04 42.53 39.43 ±0.06 45.64 ±0.05 37.31 28.64 ±0.05 45.98 ±0.09
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.441 -0.18 8.7 18.16 ±4.12 21.49 ±10.72 10.53 ±8.36 30.11 ±7.71 12.82 ±1.69 14.22 ±7.24 16.26 ±3.59 19.54 ±8.60 56.26 ±1.13 55.72 ±1.93 51.22 ±3.24 60.18 ±2.16 57.90 ±2.06 55.09 ±0.05 55.56 ±0.08 54.89 ±0.06 37.21 14.67 ±0.02 56.93 ±0.03 26.90 ±0.03 43.93 ±0.09 43.60 ±0.04 41.92 38.08 ±0.06 45.76 ±0.05 36.99 28.62 ±0.04 45.35 ±0.09
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.438 -0.19 9.3 18.25 ±5.14 23.31 ±13.20 11.33 ±7.87 23.71 ±3.08 14.84 ±16.91 14.35 ±7.27 16.47 ±3.76 15.93 ±2.04 55.91 ±1.16 55.06 ±2.03 52.02 ±3.31 59.82 ±2.17 56.74 ±2.14 55.68 ±0.05 56.12 ±0.08 55.49 ±0.06 36.82 14.60 ±0.02 55.53 ±0.03 29.70 ±0.03 41.07 ±0.09 43.20 ±0.04 42.16 39.00 ±0.06 45.32 ±0.05 35.91 27.62 ±0.04 44.21 ±0.09
mistralai/Voxtral-Mini-3B-2507 4.68B 0.427 -0.22 8.2 125.88 ±18.69 127.95 ±36.03 81.49 ±20.97 163.93 ±69.63 125.22 ±69.41 135.14 ±50.01 135.23 ±47.91 134.67 ±40.94 25.60 ±0.78 21.31 ±1.35 25.62 ±2.25 29.81 ±1.36 25.65 ±1.50 73.85 ±0.04 61.04 ±0.07 79.34 ±0.04 35.65 15.33 ±0.02 18.27 ±0.02 2.40 ±0.01 66.67 ±0.08 75.60 ±0.04 48.93 47.27 ±0.07 50.60 ±0.07 48.80 38.90 ±0.05 58.69 ±0.07
Qwen/Qwen2.5-Omni-3B 5.9B 0.369 -0.44 10.3 65.99 ±7.27 82.32 ±12.68 68.47 ±13.86 71.28 ±25.51 55.52 ±24.49 44.98 ±19.24 52.58 ±24.86 39.77 ±7.22 44.77 ±1.17 39.10 ±2.19 41.91 ±3.00 51.84 ±2.00 46.21 ±2.27 64.97 ±0.04 51.26 ±0.08 70.84 ±0.05 28.05 11.00 ±0.02 7.93 ±0.01 2.90 ±0.01 55.20 ±0.09 63.20 ±0.04 34.67 33.05 ±0.06 36.28 ±0.07 44.41 35.88 ±0.06 52.93 ±0.08
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.360 -0.41 10.2 18.17 ±7.32 21.70 ±13.80 9.43 ±1.71 28.29 ±36.38 12.58 ±32.37 12.59 ±7.25 19.23 ±7.46 23.10 ±2.25 40.85 ±1.11 34.56 ±2.00 36.04 ±3.14 47.57 ±1.93 45.23 ±2.00 66.45 ±0.05 59.84 ±0.08 69.28 ±0.05 27.10 10.93 ±0.02 32.07 ±0.02 2.50 ±0.01 49.40 ±0.09 40.60 ±0.04 35.33 36.77 ±0.06 33.88 ±0.07 32.89 22.38 ±0.03 43.41 ±0.09
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.226 -0.88 13.0 27.98 ±1.73 20.80 ±1.78 12.09 ±0.95 33.12 ±7.10 13.93 ±6.25 18.95 ±7.88 80.39 ±6.12 49.77 ±3.54 33.41 ±1.11 27.86 ±1.93 30.12 ±3.08 41.47 ±1.94 34.19 ±2.08 59.86 ±0.05 54.45 ±0.08 62.17 ±0.06 26.09 10.73 ±0.02 33.33 ±0.02 2.00 ±0.01 41.60 ±0.09 42.80 ±0.04 31.56 34.84 ±0.06 28.28 ±0.06 32.61 21.74 ±0.03 43.49 ±0.09

Overview (FR/EN — ASR, AST, QA)

Columns (in row order): Model; Size; Min-Max; Z-Score; Avg Rank; ASR (WER %): overall, FR, EN; AST (METEOR %, Multilingual_TEDx): overall, ES-FR, FR-EN, FR-ES; Question Answering (FLOW_JUDGE): overall, FR, EN
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.803 0.70 4.0 15.42 ±7.07 19.79 ±10.56 6.67 ±1.67 59.82 ±1.09 55.08 ±1.87 63.43 ±1.81 60.94 ±1.89 65.04 ±0.05 56.97 ±0.08 68.50 ±0.06
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.799 0.67 3.3 12.28 ±6.31 15.12 ±9.45 6.59 ±0.65 63.14 ±1.07 59.98 ±1.92 66.55 ±1.73 62.90 ±1.85 62.56 ±0.05 61.82 ±0.08 62.88 ±0.06
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.754 0.50 4.7 12.92 ±6.41 15.75 ±9.61 7.26 ±0.73 62.55 ±1.07 59.08 ±1.92 65.73 ±1.76 62.85 ±1.84 60.41 ±0.05 57.85 ±0.08 61.51 ±0.06
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.733 0.43 5.7 15.95 ±7.59 20.53 ±11.37 6.78 ±0.73 62.58 ±1.07 59.19 ±1.91 65.46 ±1.78 63.09 ±1.82 59.78 ±0.05 58.60 ±0.08 60.28 ±0.06
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.667 0.29 6.7 17.61 ±9.23 21.70 ±13.80 9.43 ±1.71 42.45 ±1.18 34.56 ±2.00 47.57 ±1.93 45.23 ±2.00 66.45 ±0.05 59.84 ±0.08 69.28 ±0.05
Qwen/Qwen2.5-Omni-7B 11B 0.638 0.16 7.7 43.53 ±7.21 44.41 ±9.42 41.77 ±10.57 48.09 ±1.28 42.80 ±2.25 53.66 ±2.01 47.82 ±2.26 66.86 ±0.04 53.78 ±0.07 72.47 ±0.05
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.635 0.14 8.3 26.04 ±2.52 33.56 ±3.68 11.00 ±1.15 48.84 ±1.24 46.22 ±2.08 51.06 ±2.30 49.24 ±2.04 63.04 ±0.05 50.99 ±0.09 68.21 ±0.05
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.606 -0.01 9.0 20.46 ±20.43 25.54 ±30.62 10.30 ±1.52 57.52 ±1.20 55.45 ±1.97 59.84 ±2.16 57.27 ±2.07 56.05 ±0.05 55.41 ±0.08 56.32 ±0.06
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.602 -0.03 8.7 17.84 ±7.68 21.49 ±10.72 10.53 ±8.36 57.93 ±1.19 55.72 ±1.93 60.18 ±2.16 57.90 ±2.06 55.09 ±0.05 55.56 ±0.08 54.89 ±0.06
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.601 -0.03 9.7 19.32 ±9.19 23.31 ±13.20 11.33 ±7.87 57.20 ±1.23 55.06 ±2.03 59.82 ±2.17 56.74 ±2.14 55.68 ±0.05 56.12 ±0.08 55.49 ±0.06
microsoft/Phi-4-multimodal-instruct 5.6B 0.600 0.04 8.7 32.64 ±10.86 43.81 ±16.23 10.29 ±1.80 42.80 ±1.54 31.28 ±2.46 57.03 ±2.36 40.09 ±2.66 65.30 ±0.05 52.16 ±0.09 70.93 ±0.05
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.478 -0.36 10.0 17.90 ±1.24 20.80 ±1.78 12.09 ±0.95 34.50 ±1.18 27.86 ±1.93 41.47 ±1.94 34.19 ±2.08 59.86 ±0.05 54.45 ±0.08 62.17 ±0.06
Qwen/Qwen2.5-Omni-3B 5.9B 0.470 -0.44 10.0 77.70 ±9.65 82.32 ±12.68 68.47 ±13.86 45.72 ±1.27 39.10 ±2.19 51.84 ±2.00 46.21 ±2.27 64.97 ±0.04 51.26 ±0.08 70.84 ±0.05
mistralai/Voxtral-Mini-3B-2507 4.68B 0.333 -0.83 10.3 112.46 ±25.06 127.95 ±36.03 81.49 ±20.97 25.59 ±0.83 21.31 ±1.35 29.81 ±1.36 25.65 ±1.50 73.85 ±0.04 61.04 ±0.07 79.34 ±0.04
nvidia/audio-flamingo-3-hf 8.2B 0.231 -1.23 13.3 53.35 ±6.83 73.74 ±9.09 12.58 ±8.94 27.74 ±1.34 17.21 ±1.99 43.68 ±2.03 22.34 ±2.25 55.98 ±0.05 44.63 ±0.07 60.84 ±0.06

Tasks · ASR

Tasks · ASR — WER (%)
Columns (in row order): Model; Size; Min-Max; Z-Score; Avg Rank; Average; FR: overall, CommonVoice, Fleurs, Multilingual_TEDx, SUMM-RE, VoxPopuli, YouTubeFr; EN: overall, CommonVoice, Fleurs, VoxPopuli; DE: overall, Fleurs, Multilingual_TEDx; ES: overall, Fleurs, Multilingual_TEDx; IT: overall, Fleurs, Multilingual_TEDx; PT: overall, Fleurs, Multilingual_TEDx; NL - Fleurs
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.999 0.72 1.1 10.20 15.12 ±9.45 6.98 ±1.42 6.86 ±1.23 9.25 ±3.73 28.70 ±55.10 8.16 ±1.15 30.76 ±11.95 6.59 ±0.65 8.27 ±1.38 5.90 ±0.91 5.62 ±0.99 15.99 ±1.81 7.39 ±1.20 24.58 ±3.12 5.95 ±0.95 4.28 ±0.86 7.61 ±1.66 7.64 ±6.18 5.76 ±0.93 9.52 ±12.29 10.37 ±8.40 6.97 ±1.60 13.76 ±16.70 9.77 ±1.51
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.991 0.70 2.3 11.06 20.53 ±11.37 7.00 ±1.35 7.46 ±1.27 9.94 ±2.07 51.94 ±64.04 7.97 ±1.13 38.86 ±22.25 6.78 ±0.73 8.37 ±1.48 6.09 ±0.90 5.87 ±1.31 16.50 ±1.79 7.96 ±1.26 25.03 ±3.08 6.38 ±1.08 4.64 ±0.85 8.12 ±1.95 7.87 ±6.41 6.03 ±0.93 9.72 ±12.75 9.43 ±1.21 7.44 ±1.54 11.43 ±1.85 9.93 ±1.48
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.989 0.69 3.1 11.35 15.75 ±9.61 7.71 ±1.56 8.10 ±1.33 10.32 ±2.18 27.14 ±24.52 8.41 ±1.37 32.85 ±51.80 7.26 ±0.73 8.67 ±1.46 6.75 ±0.94 6.36 ±1.33 17.60 ±2.00 9.16 ±1.24 26.04 ±3.55 6.90 ±1.11 5.11 ±0.93 8.69 ±1.98 8.73 ±6.45 6.73 ±0.96 10.73 ±12.82 11.38 ±2.31 7.83 ±1.60 14.94 ±4.32 11.85 ±1.40
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.978 0.65 3.4 12.75 19.79 ±10.56 8.09 ±1.48 8.49 ±1.27 10.62 ±6.56 42.54 ±56.08 10.49 ±20.17 38.53 ±19.64 6.67 ±1.67 8.31 ±1.35 5.74 ±0.94 5.96 ±4.72 17.84 ±1.86 8.78 ±1.28 26.91 ±3.20 8.67 ±7.83 4.37 ±0.84 12.96 ±15.61 9.50 ±6.89 6.76 ±0.94 12.24 ±13.70 11.20 ±2.08 7.52 ±1.64 14.88 ±3.79 15.57 ±2.12
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.940 0.51 7.1 17.13 23.31 ±13.20 17.17 ±16.96 8.08 ±1.22 12.86 ±33.42 43.96 ±20.96 8.85 ±1.46 48.95 ±65.81 11.33 ±7.87 17.96 ±23.52 8.50 ±1.13 7.52 ±1.51 23.71 ±3.08 15.05 ±2.19 32.36 ±5.60 14.84 ±16.91 10.81 ±1.91 18.87 ±33.72 14.35 ±7.27 11.42 ±1.77 17.29 ±14.39 16.47 ±3.76 11.82 ±2.88 21.12 ±6.92 15.93 ±2.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.936 0.50 6.4 17.85 21.49 ±10.72 10.00 ±1.66 8.22 ±1.25 10.17 ±1.95 38.22 ±27.56 8.55 ±1.32 53.79 ±57.27 10.53 ±8.36 15.93 ±24.98 8.25 ±1.14 7.40 ±1.27 30.11 ±7.71 15.29 ±2.35 44.94 ±15.11 12.82 ±1.69 10.50 ±1.87 15.15 ±2.77 14.22 ±7.24 11.15 ±1.74 17.29 ±14.32 16.26 ±3.59 11.70 ±2.94 20.83 ±6.52 19.54 ±8.60
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.935 0.49 6.4 18.13 21.70 ±13.80 10.65 ±1.84 10.15 ±1.40 22.07 ±55.31 42.83 ±55.71 13.37 ±24.17 31.16 ±8.81 9.43 ±1.71 10.89 ±1.54 7.49 ±0.98 9.92 ±4.78 28.29 ±36.38 11.23 ±1.05 45.35 ±72.58 12.58 ±32.37 5.66 ±0.85 19.50 ±64.61 12.59 ±7.25 9.10 ±1.03 16.08 ±14.39 19.23 ±7.46 12.14 ±2.25 26.31 ±14.69 23.10 ±2.25
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.922 0.45 7.4 19.57 25.54 ±30.62 14.15 ±33.16 8.53 ±1.34 15.26 ±169.56 54.37 ±40.13 9.05 ±1.30 51.90 ±47.03 10.30 ±1.52 11.58 ±3.99 11.18 ±1.55 8.15 ±1.52 28.03 ±6.27 21.82 ±11.24 34.24 ±5.50 12.91 ±3.50 10.00 ±1.84 15.82 ±6.72 15.27 ±7.40 12.41 ±2.66 18.13 ±14.51 18.41 ±5.11 13.77 ±7.87 23.04 ±6.52 26.52 ±15.14
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.878 0.29 9.7 24.97 33.56 ±3.68 18.92 ±2.45 27.90 ±2.52 26.52 ±15.79 39.09 ±3.37 37.35 ±4.38 51.58 ±13.46 11.00 ±1.15 10.30 ±1.65 7.16 ±1.22 15.54 ±2.72 32.65 ±4.70 22.40 ±3.06 42.91 ±8.72 16.57 ±2.64 14.55 ±2.27 18.59 ±4.76 21.20 ±7.11 20.92 ±2.51 21.47 ±13.97 19.15 ±2.94 14.99 ±2.14 23.31 ±5.43 40.68 ±3.75
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.817 0.08 9.9 32.72 20.80 ±1.78 13.01 ±1.94 11.92 ±1.34 18.06 ±8.22 32.69 ±4.36 17.08 ±1.58 32.04 ±3.48 12.09 ±0.95 12.26 ±1.89 11.05 ±1.12 12.97 ±1.80 33.12 ±7.10 16.66 ±1.57 49.57 ±13.82 13.93 ±6.25 8.33 ±0.99 19.52 ±12.34 18.95 ±7.88 12.38 ±1.20 25.53 ±15.59 80.39 ±6.12 58.79 ±3.43 101.99 ±11.44 49.77 ±3.54
Qwen/Qwen2.5-Omni-7B 11B 0.772 -0.07 11.0 35.37 44.41 ±9.42 25.86 ±17.61 12.16 ±2.18 36.79 ±35.85 82.09 ±20.51 31.48 ±7.55 78.10 ±31.14 41.77 ±10.57 75.07 ±20.40 20.60 ±3.87 29.65 ±23.53 48.09 ±21.47 18.30 ±3.04 77.88 ±42.03 28.24 ±23.62 13.29 ±2.63 43.19 ±46.66 18.58 ±12.24 13.79 ±2.83 23.36 ±24.20 35.15 ±16.46 15.68 ±3.65 54.62 ±32.25 31.38 ±4.68
microsoft/Phi-4-multimodal-instruct 5.6B 0.746 -0.18 10.9 41.47 43.81 ±16.23 26.18 ±5.77 21.29 ±3.57 55.24 ±41.81 63.68 ±72.63 27.56 ±7.56 68.93 ±47.91 10.29 ±1.80 12.42 ±4.75 6.36 ±1.06 12.09 ±2.27 41.56 ±6.57 26.78 ±5.05 56.33 ±11.83 34.85 ±16.84 15.44 ±3.23 54.25 ±33.36 24.70 ±9.17 18.83 ±4.24 30.58 ±17.70 31.95 ±6.49 29.88 ±9.76 34.03 ±8.53 103.10 ±22.76
Qwen/Qwen2.5-Omni-3B 5.9B 0.559 -0.83 12.7 59.27 82.32 ±12.68 97.54 ±33.55 28.89 ±7.09 93.08 ±41.98 128.57 ±25.32 47.17 ±14.03 98.67 ±41.83 68.47 ±13.86 120.36 ±27.27 29.73 ±7.38 55.32 ±29.51 71.28 ±25.51 28.98 ±5.83 113.59 ±49.14 55.52 ±24.49 26.69 ±5.24 84.34 ±47.31 44.98 ±19.24 25.84 ±4.56 64.12 ±37.65 52.58 ±24.86 22.88 ±5.74 82.28 ±48.63 39.77 ±7.22
nvidia/audio-flamingo-3-hf 8.2B 0.473 -1.17 13.4 76.51 73.74 ±9.09 48.31 ±7.41 57.50 ±6.22 76.10 ±46.43 73.64 ±9.60 94.59 ±6.21 92.27 ±23.11 12.58 ±8.94 13.22 ±2.87 11.98 ±1.63 12.53 ±26.61 112.01 ±22.08 97.16 ±10.60 126.86 ±42.67 67.31 ±10.68 64.86 ±6.38 69.77 ±20.29 106.41 ±20.15 120.32 ±10.00 92.50 ±39.02 88.70 ±22.54 96.60 ±9.93 80.80 ±43.95 74.85 ±6.63
mistralai/Voxtral-Mini-3B-2507 4.68B 0.000 -2.83 15.0 129.09 127.95 ±36.03 71.61 ±29.42 75.16 ±3.59 185.12 ±176.73 201.50 ±52.98 103.83 ±38.14 130.46 ±98.40 81.49 ±20.97 54.35 ±37.70 73.73 ±14.57 116.40 ±47.54 163.93 ±69.63 97.87 ±7.06 229.99 ±137.44 125.22 ±69.41 83.45 ±6.91 167.00 ±137.18 135.14 ±50.01 80.32 ±3.53 189.97 ±98.55 135.23 ±47.91 96.91 ±16.09 173.54 ±93.23 134.67 ±40.94

Tasks · AST

Tasks · AST — BLEU
Model Size Min-Max Z-Score Avg Rank Average FR-EN · Multilingual_TEDx (FR→EN) FR-ES · Multilingual_TEDx (FR→ES) ES-FR · Multilingual_TEDx (ES→FR) ES-IT · Multilingual_TEDx (ES→IT)
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.999 1.22 1.2 41.78 43.63 ±2.10 42.92 ±2.15 43.27 ±2.12 37.30 ±3.39
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.976 1.14 2.2 40.84 42.66 ±2.07 43.00 ±2.15 40.41 ±2.11 37.28 ±3.32
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.940 1.02 2.8 39.63 42.83 ±2.00 42.12 ±2.10 40.93 ±2.07 32.65 ±3.21
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.876 0.79 4.5 37.01 39.92 ±2.06 38.93 ±2.10 35.08 ±1.95 34.09 ±3.14
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.842 0.66 5.5 35.80 35.69 ±2.20 36.29 ±2.02 39.81 ±2.07 31.41 ±3.00
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.835 0.64 6.2 35.56 35.97 ±2.15 35.91 ±2.03 39.30 ±2.06 31.07 ±2.99
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.821 0.58 6.5 34.97 33.98 ±2.17 34.61 ±2.03 39.48 ±2.10 31.82 ±3.13
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.610 -0.14 8.8 26.86 29.15 ±2.03 24.60 ±1.71 28.09 ±1.79 25.60 ±2.55
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.594 -0.16 8.8 26.47 35.51 ±2.08 27.14 ±1.94 23.90 ±1.80 19.34 ±3.08
microsoft/Phi-4-multimodal-instruct 5.6B 0.593 -0.15 9.0 26.37 39.13 ±2.25 22.38 ±2.14 22.90 ±1.92 21.06 ±2.88
Qwen/Qwen2.5-Omni-7B 11B 0.481 -0.58 11.2 22.14 25.53 ±1.71 22.45 ±1.83 23.56 ±1.87 17.03 ±2.49
Qwen/Qwen2.5-Omni-3B 5.9B 0.437 -0.74 12.2 20.29 23.17 ±1.95 21.08 ±1.95 18.32 ±1.76 18.60 ±2.34
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.416 -0.79 12.8 19.61 27.71 ±1.83 18.19 ±1.67 17.13 ±1.49 15.40 ±2.60
nvidia/audio-flamingo-3-hf 8.2B 0.284 -1.22 13.2 14.86 29.01 ±1.94 13.53 ±1.62 12.43 ±1.48 4.48 ±1.97
mistralai/Voxtral-Mini-3B-2507 4.68B 0.000 -2.28 15.0 3.67 4.26 ±0.52 3.64 ±0.60 3.19 ±0.51 3.61 ±0.99
Tasks · AST — METEOR (%)
Model Size Min-Max Z-Score Avg Rank Average FR-EN · Multilingual_TEDx (FR→EN) FR-ES · Multilingual_TEDx (FR→ES) ES-FR · Multilingual_TEDx (ES→FR) ES-IT · Multilingual_TEDx (ES→IT)
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.997 1.14 1.5 62.23 66.55 ±1.73 62.90 ±1.85 59.98 ±1.92 59.50 ±3.11
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.988 1.11 1.8 61.91 65.46 ±1.78 63.09 ±1.82 59.19 ±1.91 59.89 ±3.01
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.976 1.07 2.8 61.34 65.73 ±1.76 62.85 ±1.84 59.08 ±1.92 57.69 ±3.04
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.915 0.87 4.5 58.79 63.43 ±1.81 60.94 ±1.89 55.08 ±1.87 55.72 ±3.13
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.856 0.67 5.5 56.39 59.84 ±2.16 57.27 ±2.07 55.45 ±1.97 53.00 ±3.29
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.854 0.66 5.2 56.26 60.18 ±2.16 57.90 ±2.06 55.72 ±1.93 51.22 ±3.24
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.845 0.63 6.8 55.91 59.82 ±2.17 56.74 ±2.14 55.06 ±2.03 52.02 ±3.31
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.653 -0.01 8.8 48.00 51.06 ±2.30 49.24 ±2.04 46.22 ±2.08 45.46 ±3.11
Qwen/Qwen2.5-Omni-7B 11B 0.610 -0.15 9.5 45.93 53.66 ±2.01 47.82 ±2.26 42.80 ±2.25 39.45 ±3.23
Qwen/Qwen2.5-Omni-3B 5.9B 0.579 -0.24 9.8 44.77 51.84 ±2.00 46.21 ±2.27 39.10 ±2.19 41.91 ±3.00
microsoft/Phi-4-multimodal-instruct 5.6B 0.526 -0.39 10.5 42.31 57.03 ±2.36 40.09 ±2.66 31.28 ±2.46 40.85 ±3.57
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.486 -0.56 11.5 40.85 47.57 ±1.93 45.23 ±2.00 34.56 ±2.00 36.04 ±3.14
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.307 -1.15 13.2 33.41 41.47 ±1.94 34.19 ±2.08 27.86 ±1.93 30.12 ±3.08
mistralai/Voxtral-Mini-3B-2507 4.68B 0.113 -1.81 14.2 25.60 29.81 ±1.36 25.65 ±1.50 21.31 ±1.35 25.62 ±2.25
nvidia/audio-flamingo-3-hf 8.2B 0.094 -1.85 14.5 23.97 43.68 ±2.03 22.34 ±2.25 17.21 ±1.99 12.67 ±2.35

Tasks · QA

Math Question Answering
Tasks · QA — ACC (%)
Model Size Min-Max Z-Score Avg Rank Average EN · spoken-mqa_short_digit
Qwen/Qwen2.5-Omni-7B 11B 1.000 1.64 1.0 89.00 89.00 ±6.13
Qwen/Qwen2.5-Omni-3B 5.9B 0.948 1.48 2.0 85.00 85.00 ±7.00
mistralai/Voxtral-Mini-3B-2507 4.68B 0.740 0.84 3.0 69.00 69.00 ±9.06
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.701 0.72 4.0 66.00 66.00 ±9.28
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.701 0.72 5.0 66.00 66.00 ±9.28
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.701 0.72 6.0 66.00 66.00 ±9.28
nvidia/audio-flamingo-3-hf 8.2B 0.675 0.64 7.0 64.00 64.00 ±9.41
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.481 0.03 8.0 49.00 49.00 ±9.80
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.364 -0.33 9.0 40.00 40.00 ±9.60
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.286 -0.57 10.0 34.00 34.00 ±9.28
microsoft/Phi-4-multimodal-instruct 5.6B 0.143 -1.01 11.0 23.00 23.00 ±8.25
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.117 -1.09 12.0 21.00 21.00 ±7.98
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.117 -1.09 13.0 21.00 21.00 ±7.98
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.078 -1.22 14.0 18.00 18.00 ±7.53
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.000 -1.46 15.0 12.00 12.00 ±6.37
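
The Min-Max and Z-Score columns of the Math Question Answering table above appear to be plain normalizations of the Average column across the 15 listed models. The sketch below is an inference from the published numbers, not a documented formula. It assumes min-max scaling to [0, 1], a z-score based on the population standard deviation, and ± values that are 95% normal-approximation half-widths for an accuracy measured on 100 items (the sample size is a guess that happens to fit the figures). The function names are hypothetical, and tables that aggregate several datasets or use lower-is-better metrics such as WER may be computed differently.

```python
import math
import statistics

# Average ACC (%) column of the Math Question Answering table, in row order.
ACC = [89.0, 85.0, 69.0, 66.0, 66.0, 66.0, 64.0, 49.0,
       40.0, 34.0, 23.0, 21.0, 21.0, 18.0, 12.0]


def min_max(scores):
    """Scale scores linearly so the worst maps to 0 and the best to 1."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def z_scores(scores):
    """Standardize scores with the population standard deviation (assumed)."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    return [(s - mean) / std for s in scores]


def ci_half_width(acc_percent, n_items=100, z=1.96):
    """Normal-approximation half-width, in percentage points.

    n_items=100 is an assumption inferred from the table, not a stated fact.
    """
    p = acc_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_items)


if __name__ == "__main__":
    # One line per model: Average, Min-Max, Z-Score, ± half-width.
    for acc, mm, zs in zip(ACC, min_max(ACC), z_scores(ACC)):
        print(f"{acc:6.2f}  {mm:.3f}  {zs:+.2f}  ±{ci_half_width(acc):.2f}")
```

Under these assumptions the first row comes out as 1.000, 1.64 and ±6.13, and the last row as 0.000, -1.46 and ±6.37, which agrees with the table at its displayed precision.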
Question Answering
Tasks · QA — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average FR · CohereLabs-Aya_collection FR · Vigogne--Alpaca FR · VoxPopuli-QA EN · NationalSpeechCorpus_SQA EN · OpenHermes_audio EN · SLUE-P2-SQA5 EN · SpokenWOZ_AIR-Bench EN · alpaca_audio EN · fisher_AIR-Bench EN · public-sg-speech
mistralai/Voxtral-Mini-3B-2507 4.68B 0.977 1.71 1.5 70.19 48.39 ±0.41 57.08 ±0.10 77.64 ±0.08 74.00 ±0.07 77.40 ±0.20 88.28 ±0.09 77.20 ±0.15 78.20 ±0.19 81.10 ±0.11 79.20 ±0.06
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.737 0.84 4.0 64.56 48.39 ±0.33 52.56 ±0.09 78.56 ±0.10 59.70 ±0.09 67.40 ±0.25 86.13 ±0.12 65.18 ±0.20 70.40 ±0.25 69.80 ±0.17 66.36 ±0.08
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.663 0.60 4.5 62.35 50.97 ±0.36 53.92 ±0.10 80.56 ±0.10 55.85 ±0.09 56.00 ±0.26 71.42 ±0.17 65.18 ±0.20 63.80 ±0.25 64.90 ±0.18 63.00 ±0.08
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.637 0.45 6.0 62.74 43.23 ±0.28 51.00 ±0.10 76.68 ±0.10 59.75 ±0.09 63.00 ±0.21 81.67 ±0.14 69.12 ±0.20 71.80 ±0.23 70.30 ±0.17 63.88 ±0.08
Qwen/Qwen2.5-Omni-7B 11B 0.626 0.37 6.5 63.12 42.58 ±0.38 51.08 ±0.10 67.68 ±0.10 63.05 ±0.09 68.40 ±0.19 90.10 ±0.09 71.30 ±0.19 69.40 ±0.22 73.70 ±0.16 71.32 ±0.08
microsoft/Phi-4-multimodal-instruct 5.6B 0.547 0.07 7.5 61.54 35.48 ±0.32 41.76 ±0.09 79.24 ±0.10 63.75 ±0.09 61.80 ±0.27 91.76 ±0.08 74.72 ±0.17 53.40 ±0.29 76.90 ±0.15 74.16 ±0.08
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.520 0.04 7.5 59.68 45.16 ±0.32 50.64 ±0.10 77.76 ±0.10 54.00 ±0.09 54.40 ±0.24 71.37 ±0.17 64.77 ±0.19 57.80 ±0.26 67.00 ±0.17 61.20 ±0.08
Qwen/Qwen2.5-Omni-3B 5.9B 0.519 -0.04 8.5 61.05 39.35 ±0.39 48.84 ±0.10 65.60 ±0.11 64.95 ±0.09 64.80 ±0.18 88.97 ±0.10 67.56 ±0.20 68.60 ±0.23 71.10 ±0.17 69.92 ±0.08
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.517 0.04 8.0 59.44 45.81 ±0.38 53.52 ±0.10 76.48 ±0.10 54.55 ±0.09 53.60 ±0.25 66.96 ±0.17 59.48 ±0.20 62.40 ±0.25 64.70 ±0.17 60.28 ±0.09
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.457 -0.27 10.5 59.60 34.84 ±0.42 43.28 ±0.10 74.84 ±0.10 57.45 ±0.08 62.60 ±0.27 87.94 ±0.11 68.91 ±0.19 63.80 ±0.28 72.00 ±0.16 64.76 ±0.08
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.434 -0.31 9.5 58.31 42.58 ±0.31 48.20 ±0.10 72.56 ±0.11 54.95 ±0.09 57.00 ±0.22 77.35 ±0.15 57.62 ±0.20 60.60 ±0.26 63.90 ±0.18 63.80 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.347 -0.60 10.5 55.81 46.45 ±0.31 51.00 ±0.10 70.92 ±0.11 49.30 ±0.09 60.20 ±0.22 62.79 ±0.18 47.98 ±0.19 58.20 ±0.26 50.50 ±0.17 59.48 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.343 -0.62 11.0 55.87 45.16 ±0.33 49.88 ±0.10 71.20 ±0.11 49.30 ±0.09 62.20 ±0.23 60.98 ±0.18 48.39 ±0.19 59.60 ±0.27 53.30 ±0.17 60.48 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.318 -0.71 11.5 55.23 45.81 ±0.35 49.68 ±0.10 71.20 ±0.11 48.80 ±0.09 58.60 ±0.22 63.14 ±0.18 46.94 ±0.18 57.20 ±0.25 50.70 ±0.18 58.88 ±0.09
nvidia/audio-flamingo-3-hf 8.2B 0.122 -1.54 13.0 52.74 37.42 ±0.32 38.12 ±0.09 58.36 ±0.10 63.60 ±0.08 48.80 ±0.27 78.48 ±0.13 69.22 ±0.17 24.00 ±0.41 71.70 ±0.13 70.08 ±0.07

Tasks · Others

Age Recognition
Tasks · Others — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average FR · CommonVoice_Age EN · CommonVoice_Age
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.984 1.34 1.5 34.40 27.60 ±0.04 41.20 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.913 1.13 3.0 32.40 24.00 ±0.04 40.80 ±0.04
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.879 1.02 2.0 31.80 21.00 ±0.04 42.60 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.812 0.83 4.0 29.70 18.40 ±0.03 41.00 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.793 0.79 4.5 28.70 19.00 ±0.03 38.40 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.764 0.71 6.5 27.80 17.80 ±0.03 37.80 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.726 0.59 6.5 26.90 15.40 ±0.03 38.40 ±0.04
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.646 0.40 8.5 23.60 15.00 ±0.03 32.20 ±0.04
nvidia/audio-flamingo-3-hf 8.2B 0.472 -0.15 10.0 19.50 3.20 ±0.02 35.80 ±0.04
microsoft/Phi-4-multimodal-instruct 5.6B 0.120 -0.96 10.0 4.00 5.00 ±0.02 3.00 ±0.01
Qwen/Qwen2.5-Omni-3B 5.9B 0.093 -1.02 12.0 2.90 4.80 ±0.02 1.00 ±0.01
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.075 -1.08 12.0 2.50 3.60 ±0.02 1.40 ±0.01
mistralai/Voxtral-Mini-3B-2507 4.68B 0.052 -1.17 12.5 2.40 0.40 ±0.01 4.40 ±0.02
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.052 -1.15 12.5 2.00 1.80 ±0.01 2.20 ±0.01
Qwen/Qwen2.5-Omni-7B 11B 0.000 -1.29 14.5 0.20 0.40 ±0.01 0.00 ±0.00
Dialogue Summarization
Tasks · Others — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average EN · NationalSpeechCorpus_SDS
mistralai/Voxtral-Mini-3B-2507 4.68B 1.000 2.42 1.0 66.67 66.67 ±0.08
microsoft/Phi-4-multimodal-instruct 5.6B 0.599 1.00 2.0 56.40 56.40 ±0.09
nvidia/audio-flamingo-3-hf 8.2B 0.565 0.88 3.0 55.53 55.53 ±0.09
Qwen/Qwen2.5-Omni-7B 11B 0.557 0.85 4.0 55.33 55.33 ±0.08
Qwen/Qwen2.5-Omni-3B 5.9B 0.552 0.83 5.0 55.20 55.20 ±0.09
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.424 0.38 6.0 51.93 51.93 ±0.08
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.326 0.03 7.0 49.40 49.40 ±0.09
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.250 -0.23 8.0 47.47 47.47 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.148 -0.59 9.0 44.87 44.87 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.138 -0.63 10.0 44.60 44.60 ±0.09
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.112 -0.72 11.0 43.93 43.93 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.034 -0.99 12.0 41.93 41.93 ±0.10
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.021 -1.04 13.0 41.60 41.60 ±0.09
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.008 -1.09 14.0 41.27 41.27 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.000 -1.11 15.0 41.07 41.07 ±0.09
Emotion Recognition
Tasks · Others — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average FR · MELD_Emotion EN · IEMOCAP-Emotion EN · MELD_Emotion
nvidia/audio-flamingo-3-hf 8.2B 1.000 3.17 1.0 52.80 53.00 ±0.04 43.20 ±0.04 62.00 ±0.04
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.448 0.87 3.5 26.70 16.20 ±0.03 35.80 ±0.04 38.60 ±0.04
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.305 0.34 3.0 21.45 19.80 ±0.03 25.80 ±0.04 20.40 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.301 0.32 3.0 21.30 19.80 ±0.03 15.60 ±0.03 30.00 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.231 0.06 6.0 18.50 19.40 ±0.03 14.40 ±0.03 20.80 ±0.04
microsoft/Phi-4-multimodal-instruct 5.6B 0.214 -0.05 7.0 17.00 12.20 ±0.03 22.20 ±0.04 21.40 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.192 -0.12 6.5 16.45 15.20 ±0.03 16.20 ±0.03 19.20 ±0.03
mistralai/Voxtral-Mini-3B-2507 4.68B 0.142 -0.34 8.5 13.80 9.20 ±0.03 27.20 ±0.04 9.60 ±0.03
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.141 -0.32 8.5 14.15 12.60 ±0.03 13.40 ±0.03 18.00 ±0.03
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.140 -0.33 8.5 14.10 12.60 ±0.03 12.80 ±0.03 18.40 ±0.03
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.095 -0.51 11.5 12.05 10.20 ±0.03 14.40 ±0.03 13.40 ±0.03
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.051 -0.70 13.5 9.90 6.80 ±0.02 18.80 ±0.03 7.20 ±0.02
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.040 -0.75 13.5 9.25 4.80 ±0.02 22.00 ±0.04 5.40 ±0.02
Qwen/Qwen2.5-Omni-3B 5.9B 0.038 -0.77 12.5 8.90 2.60 ±0.01 24.80 ±0.04 5.60 ±0.02
Qwen/Qwen2.5-Omni-7B 11B 0.015 -0.86 13.5 7.90 1.60 ±0.01 22.60 ±0.04 5.80 ±0.02
Gender Recognition
Tasks · Others — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average FR · CommonVoice_Gender EN · CommonVoice_Gender EN · IEMOCAP-gender
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.948 1.40 4.0 81.10 67.00 ±0.04 92.40 ±0.02 98.00 ±0.01
nvidia/audio-flamingo-3-hf 8.2B 0.787 0.89 5.5 68.95 48.40 ±0.04 93.00 ±0.02 86.00 ±0.03
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.775 0.72 2.5 65.95 74.60 ±0.04 65.60 ±0.04 49.00 ±0.04
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.755 0.66 3.0 64.55 71.60 ±0.04 68.20 ±0.04 46.80 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.723 0.55 4.5 61.85 71.60 ±0.04 65.40 ±0.04 38.80 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.699 0.48 7.0 60.20 66.20 ±0.04 61.40 ±0.04 47.00 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.698 0.47 6.0 59.95 69.00 ±0.04 54.80 ±0.04 47.00 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.683 0.41 7.0 58.70 68.20 ±0.04 52.00 ±0.04 46.40 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.672 0.38 8.0 57.80 68.00 ±0.04 62.20 ±0.04 33.00 ±0.04
microsoft/Phi-4-multimodal-instruct 5.6B 0.476 -0.16 7.5 44.15 31.00 ±0.04 58.40 ±0.04 56.20 ±0.04
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.327 -0.70 12.0 31.85 27.40 ±0.04 40.40 ±0.04 32.20 ±0.04
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.322 -0.72 11.5 31.35 29.20 ±0.04 17.60 ±0.03 49.40 ±0.04
mistralai/Voxtral-Mini-3B-2507 4.68B 0.195 -1.18 13.0 20.75 28.20 ±0.04 26.20 ±0.04 0.40 ±0.01
Qwen/Qwen2.5-Omni-7B 11B 0.098 -1.43 13.5 14.25 6.40 ±0.02 8.80 ±0.02 35.40 ±0.04
Qwen/Qwen2.5-Omni-3B 5.9B 0.000 -1.77 15.0 6.35 1.60 ±0.01 9.20 ±0.03 13.00 ±0.03
Spoken Language Identification
Tasks · Others — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average OTHER · CoVost2_AIR-Bench
nvidia/audio-flamingo-3-hf 8.2B 1.000 1.87 1.0 93.00 93.00 ±0.02
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.912 1.60 2.0 88.40 88.40 ±0.03
Qwen/Qwen2.5-Omni-7B 11B 0.748 1.08 3.0 79.80 79.80 ±0.04
mistralai/Voxtral-Mini-3B-2507 4.68B 0.668 0.83 4.0 75.60 75.60 ±0.04
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.611 0.65 5.0 72.60 72.60 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.435 0.11 6.0 63.40 63.40 ±0.04
Qwen/Qwen2.5-Omni-3B 5.9B 0.431 0.09 7.0 63.20 63.20 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.401 -0.00 8.0 61.60 61.60 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.393 -0.03 9.0 61.20 61.20 ±0.04
microsoft/Phi-4-multimodal-instruct 5.6B 0.195 -0.65 10.0 50.80 50.80 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.076 -1.02 11.0 44.60 44.60 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.057 -1.07 12.0 43.60 43.60 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.050 -1.10 13.0 43.20 43.20 ±0.04
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.042 -1.12 14.0 42.80 42.80 ±0.04
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.000 -1.25 15.0 40.60 40.60 ±0.04

Tasks · Music

Music Captioning
Tasks · Music — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average OTHER · MusicCaps
Qwen/Qwen2-Audio-7B-Instruct 8.4B 1.000 2.04 1.0 59.76 59.76 ±0.04
nvidia/audio-flamingo-3-hf 8.2B 0.864 1.58 2.0 55.48 55.48 ±0.11
mistralai/Voxtral-Mini-3B-2507 4.68B 0.709 1.05 3.0 50.60 50.60 ±0.07
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.555 0.53 4.0 45.76 45.76 ±0.05
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.551 0.51 5.0 45.64 45.64 ±0.05
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.541 0.48 6.0 45.32 45.32 ±0.05
microsoft/Phi-4-multimodal-instruct 5.6B 0.507 0.36 7.0 44.24 44.24 ±0.06
Qwen/Qwen2.5-Omni-7B 11B 0.290 -0.38 8.0 37.40 37.40 ±0.07
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.257 -0.49 9.0 36.36 36.36 ±0.06
Qwen/Qwen2.5-Omni-3B 5.9B 0.254 -0.50 10.0 36.28 36.28 ±0.07
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.189 -0.72 11.0 34.24 34.24 ±0.06
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.178 -0.76 12.0 33.88 33.88 ±0.07
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.067 -1.13 13.0 30.40 30.40 ±0.06
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.046 -1.21 14.0 29.72 29.72 ±0.06
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.000 -1.36 15.0 28.28 28.28 ±0.06
Music Question Answering
Tasks · Music — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average FR · MusicCaps-QA EN · MTJ-Jamendo_AIR-Bench EN · MusicCaps-QA
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.970 1.99 1.5 59.00 55.32 ±0.07 63.20 ±0.04 62.16 ±0.08
nvidia/audio-flamingo-3-hf 8.2B 0.906 1.77 2.5 57.16 54.72 ±0.09 62.60 ±0.04 56.60 ±0.08
mistralai/Voxtral-Mini-3B-2507 4.68B 0.684 0.99 2.0 49.57 56.48 ±0.07 28.00 ±0.04 57.32 ±0.08
microsoft/Phi-4-multimodal-instruct 5.6B 0.560 0.59 4.0 46.87 52.68 ±0.08 31.80 ±0.04 50.32 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.396 0.05 6.0 42.88 48.84 ±0.07 27.60 ±0.04 46.24 ±0.08
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.382 0.00 7.5 42.27 49.36 ±0.08 23.40 ±0.04 46.96 ±0.09
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.317 -0.22 10.5 40.42 48.68 ±0.08 20.60 ±0.04 43.72 ±0.09
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.288 -0.29 8.5 40.76 44.76 ±0.07 29.60 ±0.04 43.92 ±0.08
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.258 -0.41 10.5 39.32 46.32 ±0.08 18.80 ±0.03 45.84 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.255 -0.40 9.0 40.11 43.44 ±0.07 31.20 ±0.04 42.36 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.219 -0.52 10.5 39.22 42.64 ±0.07 28.80 ±0.04 42.80 ±0.08
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.205 -0.57 10.5 38.48 43.60 ±0.07 24.80 ±0.04 41.92 ±0.08
Qwen/Qwen2.5-Omni-7B 11B 0.143 -0.75 9.5 38.22 38.28 ±0.09 41.00 ±0.04 35.32 ±0.08
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.069 -1.01 12.5 35.77 38.56 ±0.07 23.80 ±0.04 42.16 ±0.08
Qwen/Qwen2.5-Omni-3B 5.9B 0.000 -1.23 15.0 34.06 37.08 ±0.08 21.00 ±0.04 41.08 ±0.08
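In the Music Question Answering table, the Average and Avg Rank columns match a two-step aggregation: scores are first averaged within each language (FR has one dataset, EN has two), then the per-language means and per-language ranks are averaged across languages. The Python sketch below reproduces both columns under that assumption. The grouping is inferred from the published numbers and is not a documented method.

```python
# (FR MusicCaps-QA, EN MTJ-Jamendo_AIR-Bench, EN MusicCaps-QA), copied from the table above.
scores = {
    "Qwen/Qwen2-Audio-7B-Instruct": (55.32, 63.20, 62.16),
    "nvidia/audio-flamingo-3-hf": (54.72, 62.60, 56.60),
    "mistralai/Voxtral-Mini-3B-2507": (56.48, 28.00, 57.32),
    "microsoft/Phi-4-multimodal-instruct": (52.68, 31.80, 50.32),
    "LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h": (48.84, 27.60, 46.24),
    "LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h": (49.36, 23.40, 46.96),
    "LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h": (48.68, 20.60, 43.72),
    "LINAGORA/Canary_Luciole-1B_Step-Adapter_2h": (44.76, 29.60, 43.92),
    "LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h": (46.32, 18.80, 45.84),
    "LINAGORA/Canary_Luciole-1B_Step-Encoder_80h": (43.44, 31.20, 42.36),
    "LINAGORA/Canary_Luciole-1B_Step-Encoder_8h": (42.64, 28.80, 42.80),
    "LINAGORA/Canary-Qwen3-4B_data-v1_8h": (43.60, 24.80, 41.92),
    "Qwen/Qwen2.5-Omni-7B": (38.28, 41.00, 35.32),
    "LINAGORA/Canary-Qwen3-1.7B_data-v1_8h": (38.56, 23.80, 42.16),
    "Qwen/Qwen2.5-Omni-3B": (37.08, 21.00, 41.08),
}

def ranks(values):
    """Rank 1 is the highest score. Assumes no ties, which holds for this data."""
    order = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(order)}

# Step 1: one score per language (EN is the mean of its two datasets).
fr = {m: s[0] for m, s in scores.items()}
en = {m: (s[1] + s[2]) / 2 for m, s in scores.items()}

# Step 2: average the per-language scores and ranks across languages.
fr_rank, en_rank = ranks(fr), ranks(en)
average = {m: round((fr[m] + en[m]) / 2, 2) for m in scores}
avg_rank = {m: (fr_rank[m] + en_rank[m]) / 2 for m in scores}

for m in sorted(scores, key=avg_rank.get):
    print(f"{m:50s} average={average[m]:.2f}  avg_rank={avg_rank[m]:.1f}")
```

This explains why mistralai/Voxtral-Mini-3B-2507 has an Avg Rank of 2.0 despite a low MTJ-Jamendo_AIR-Bench score: it ranks first in FR and third in EN once the two EN datasets are averaged.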

Tasks · Sound

Audio Captioning
Tasks · Sound — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average OTHER · AudioCaps OTHER · WavCaps
nvidia/audio-flamingo-3-hf 8.2B 1.000 1.90 1.0 57.76 64.00 ±0.06 51.52 ±0.10
Qwen/Qwen2.5-Omni-7B 11B 0.914 1.63 2.0 54.68 56.60 ±0.09 52.76 ±0.10
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.887 1.55 3.0 53.70 54.36 ±0.09 53.04 ±0.10
microsoft/Phi-4-multimodal-instruct 5.6B 0.711 1.00 4.0 47.34 48.40 ±0.08 46.28 ±0.09
mistralai/Voxtral-Mini-3B-2507 4.68B 0.476 0.27 5.0 38.90 38.56 ±0.07 39.24 ±0.07
Qwen/Qwen2.5-Omni-3B 5.9B 0.393 0.01 6.0 35.88 37.48 ±0.09 34.28 ±0.09
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.325 -0.20 7.0 33.46 31.80 ±0.07 35.12 ±0.08
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.263 -0.40 8.0 31.22 30.60 ±0.07 31.84 ±0.08
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.192 -0.62 9.0 28.64 25.96 ±0.05 31.32 ±0.07
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.191 -0.62 10.0 28.62 26.84 ±0.06 30.40 ±0.07
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.166 -0.70 11.0 27.72 26.56 ±0.06 28.88 ±0.07
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.163 -0.71 12.0 27.62 25.28 ±0.05 29.96 ±0.07
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.157 -0.73 13.0 27.38 25.68 ±0.05 29.08 ±0.07
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.018 -1.16 14.0 22.38 21.24 ±0.03 23.52 ±0.05
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.000 -1.21 15.0 21.74 21.04 ±0.03 22.44 ±0.05
Audio Question Answering
Tasks · Sound — FLOW_JUDGE
Model Size Min-Max Z-Score Avg Rank Average EN · AudioCaps-QA EN · Clotho-AQA EN · WavCaps-QA
Qwen/Qwen2-Audio-7B-Instruct 8.4B 1.000 1.47 1.0 60.99 63.26 ±0.15 57.88 ±0.14 61.84 ±0.16
nvidia/audio-flamingo-3-hf 8.2B 0.928 1.27 2.0 59.73 63.71 ±0.11 52.64 ±0.13 62.83 ±0.14
Qwen/Qwen2.5-Omni-7B 11B 0.897 1.18 3.0 59.19 58.08 ±0.16 65.92 ±0.14 53.55 ±0.17
microsoft/Phi-4-multimodal-instruct 5.6B 0.888 1.16 4.0 59.02 56.10 ±0.15 63.72 ±0.15 57.24 ±0.17
mistralai/Voxtral-Mini-3B-2507 4.68B 0.869 1.11 5.0 58.69 60.13 ±0.12 58.72 ±0.11 57.24 ±0.14
Qwen/Qwen2.5-Omni-3B 5.9B 0.542 0.18 6.0 52.93 53.42 ±0.15 58.28 ±0.13 47.11 ±0.16
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.504 0.07 7.0 52.27 47.60 ±0.14 61.52 ±0.15 47.70 ±0.16
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.426 -0.15 8.0 50.90 44.41 ±0.14 61.84 ±0.16 46.45 ±0.16
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.421 -0.16 9.0 50.80 45.69 ±0.14 61.20 ±0.16 45.53 ±0.16
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.401 -0.22 10.0 50.46 44.66 ±0.15 60.32 ±0.16 46.38 ±0.16
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.146 -0.94 11.0 45.98 40.77 ±0.14 53.68 ±0.15 43.49 ±0.15
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.111 -1.04 12.0 45.35 39.62 ±0.14 52.76 ±0.15 43.68 ±0.15
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.045 -1.23 13.0 44.21 39.62 ±0.13 51.16 ±0.15 41.84 ±0.15
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.005 -1.34 14.0 43.49 38.34 ±0.14 52.72 ±0.15 39.41 ±0.15
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.000 -1.35 15.0 43.41 39.68 ±0.14 49.56 ±0.15 40.99 ±0.15

Languages · French

French — Models × Tasks
Model; Size; Min-Max; Z-Score; Avg Rank; ASR (WER %): overall, CommonVoice, Fleurs, Multilingual_TEDx, SUMM-RE, VoxPopuli, YouTubeFr; AST (METEOR %): overall, Multilingual_TEDx (FR→EN), Multilingual_TEDx (FR→ES); QUESTION ANSWERING (FLOW_JUDGE): overall, CohereLabs-Aya_collection, Vigogne--Alpaca, VoxPopuli-QA; MUSIC QUESTION ANSWERING (FLOW_JUDGE); EMOTION RECOGNITION (FLOW_JUDGE); GENDER RECOGNITION (FLOW_JUDGE); AGE RECOGNITION (FLOW_JUDGE)
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.833 0.95 2.0 15.12 ±9.45 6.98 ±1.42 6.86 ±1.23 9.25 ±3.73 28.70 ±55.10 8.16 ±1.15 30.76 ±11.95 64.72 ±1.27 66.55 ±1.73 62.90 ±1.85 61.82 ±0.08 50.97 ±0.36 53.92 ±0.10 80.56 ±0.10 48.84 ±0.07 19.80 ±0.03 74.60 ±0.04 24.00 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.801 0.81 3.6 20.53 ±11.37 7.00 ±1.35 7.46 ±1.27 9.94 ±2.07 51.94 ±64.04 7.97 ±1.13 38.86 ±22.25 64.28 ±1.28 65.46 ±1.78 63.09 ±1.82 58.60 ±0.08 45.81 ±0.38 53.52 ±0.10 76.48 ±0.10 49.36 ±0.08 15.20 ±0.03 71.60 ±0.04 27.60 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.756 0.68 4.3 15.75 ±9.61 7.71 ±1.56 8.10 ±1.33 10.32 ±2.18 27.14 ±24.52 8.41 ±1.37 32.85 ±51.80 64.29 ±1.27 65.73 ±1.76 62.85 ±1.84 57.85 ±0.08 45.16 ±0.32 50.64 ±0.10 77.76 ±0.10 48.68 ±0.08 19.40 ±0.03 68.00 ±0.04 19.00 ±0.03
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.736 0.61 4.3 19.79 ±10.56 8.09 ±1.48 8.49 ±1.27 10.62 ±6.56 42.54 ±56.08 10.49 ±20.17 38.53 ±19.64 62.18 ±1.31 63.43 ±1.81 60.94 ±1.89 56.97 ±0.08 43.23 ±0.28 51.00 ±0.10 76.68 ±0.10 46.32 ±0.08 19.80 ±0.03 71.60 ±0.04 21.00 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.648 0.30 7.1 23.31 ±13.20 17.17 ±16.96 8.08 ±1.22 12.86 ±33.42 43.96 ±20.96 8.85 ±1.46 48.95 ±65.81 58.28 ±1.53 59.82 ±2.17 56.74 ±2.14 56.12 ±0.08 46.45 ±0.31 51.00 ±0.10 70.92 ±0.11 43.44 ±0.07 12.60 ±0.03 68.20 ±0.04 18.40 ±0.03
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.638 0.24 7.9 33.56 ±3.68 18.92 ±2.45 27.90 ±2.52 26.52 ±15.79 39.09 ±3.37 37.35 ±4.38 51.58 ±13.46 50.15 ±1.54 51.06 ±2.30 49.24 ±2.04 50.99 ±0.09 34.84 ±0.42 43.28 ±0.10 74.84 ±0.10 55.32 ±0.07 16.20 ±0.03 67.00 ±0.04 15.00 ±0.03
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.637 0.25 8.1 25.54 ±30.62 14.15 ±33.16 8.53 ±1.34 15.26 ±169.56 54.37 ±40.13 9.05 ±1.30 51.90 ±47.03 58.55 ±1.50 59.84 ±2.16 57.27 ±2.07 55.41 ±0.08 45.16 ±0.33 49.88 ±0.10 71.20 ±0.11 44.76 ±0.07 10.20 ±0.03 66.20 ±0.04 17.80 ±0.03
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.629 0.24 7.1 21.49 ±10.72 10.00 ±1.66 8.22 ±1.25 10.17 ±1.95 38.22 ±27.56 8.55 ±1.32 53.79 ±57.27 59.04 ±1.50 60.18 ±2.16 57.90 ±2.06 55.56 ±0.08 45.81 ±0.35 49.68 ±0.10 71.20 ±0.11 42.64 ±0.07 12.60 ±0.03 69.00 ±0.04 15.40 ±0.03
microsoft/Phi-4-multimodal-instruct 5.6B 0.476 -0.26 9.4 43.81 ±16.23 26.18 ±5.77 21.29 ±3.57 55.24 ±41.81 63.68 ±72.63 27.56 ±7.56 68.93 ±47.91 48.56 ±1.85 57.03 ±2.36 40.09 ±2.66 52.16 ±0.09 35.48 ±0.32 41.76 ±0.09 79.24 ±0.10 52.68 ±0.08 12.20 ±0.03 31.00 ±0.04 5.00 ±0.02
nvidia/audio-flamingo-3-hf 8.2B 0.468 -0.23 9.6 73.74 ±9.09 48.31 ±7.41 57.50 ±6.22 76.10 ±46.43 73.64 ±9.60 94.59 ±6.21 92.27 ±23.11 33.01 ±1.65 43.68 ±2.03 22.34 ±2.25 44.63 ±0.07 37.42 ±0.32 38.12 ±0.09 58.36 ±0.10 54.72 ±0.09 53.00 ±0.04 48.40 ±0.04 3.20 ±0.02
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.466 -0.24 9.4 21.70 ±13.80 10.65 ±1.84 10.15 ±1.40 22.07 ±55.31 42.83 ±55.71 13.37 ±24.17 31.16 ±8.81 46.40 ±1.39 47.57 ±1.93 45.23 ±2.00 59.84 ±0.08 48.39 ±0.33 52.56 ±0.09 78.56 ±0.10 43.60 ±0.07 6.80 ±0.02 29.20 ±0.04 3.60 ±0.02
mistralai/Voxtral-Mini-3B-2507 4.68B 0.352 -0.66 10.1 127.95 ±36.03 71.61 ±29.42 75.16 ±3.59 185.12 ±176.73 201.50 ±52.98 103.83 ±38.14 130.46 ±98.40 27.73 ±1.02 29.81 ±1.36 25.65 ±1.50 61.04 ±0.07 48.39 ±0.41 57.08 ±0.10 77.64 ±0.08 56.48 ±0.07 9.20 ±0.03 28.20 ±0.04 0.40 ±0.01
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.334 -0.71 11.4 20.80 ±1.78 13.01 ±1.94 11.92 ±1.34 18.06 ±8.22 32.69 ±4.36 17.08 ±1.58 32.04 ±3.48 37.83 ±1.44 41.47 ±1.94 34.19 ±2.08 54.45 ±0.08 42.58 ±0.31 48.20 ±0.10 72.56 ±0.11 38.56 ±0.07 4.80 ±0.02 27.40 ±0.04 1.80 ±0.01
Qwen/Qwen2.5-Omni-7B 11B 0.289 -0.87 12.6 44.41 ±9.42 25.86 ±17.61 12.16 ±2.18 36.79 ±35.85 82.09 ±20.51 31.48 ±7.55 78.10 ±31.14 50.74 ±1.52 53.66 ±2.01 47.82 ±2.26 53.78 ±0.07 42.58 ±0.38 51.08 ±0.10 67.68 ±0.10 38.28 ±0.09 1.60 ±0.01 6.40 ±0.02 0.40 ±0.01
Qwen/Qwen2.5-Omni-3B 5.9B 0.221 -1.12 13.0 82.32 ±12.68 97.54 ±33.55 28.89 ±7.09 93.08 ±41.98 128.57 ±25.32 47.17 ±14.03 98.67 ±41.83 49.03 ±1.52 51.84 ±2.00 46.21 ±2.27 51.26 ±0.08 39.35 ±0.39 48.84 ±0.10 65.60 ±0.11 37.08 ±0.08 2.60 ±0.01 1.60 ±0.01 4.80 ±0.02

Languages · English

English — Models × Tasks
Model; Size; Min-Max; Z-Score; Avg Rank; ASR (WER %): overall, CommonVoice, Fleurs, VoxPopuli; QUESTION ANSWERING (FLOW_JUDGE): overall, NationalSpeechCorpus_SQA, OpenHermes_audio, SLUE-P2-SQA5, SpokenWOZ_AIR-Bench, alpaca_audio, fisher_AIR-Bench, public-sg-speech; MUSIC QUESTION ANSWERING (FLOW_JUDGE): overall, MTJ-Jamendo_AIR-Bench, MusicCaps-QA; EMOTION RECOGNITION (FLOW_JUDGE): overall, IEMOCAP-Emotion, MELD_Emotion; GENDER RECOGNITION (FLOW_JUDGE): overall, CommonVoice_Gender, IEMOCAP-gender; AGE RECOGNITION (FLOW_JUDGE); AUDIO QUESTION ANSWERING (FLOW_JUDGE): overall, AudioCaps-QA, Clotho-AQA, WavCaps-QA; DIALOGUE SUMMARIZATION (FLOW_JUDGE); MATH QUESTION ANSWERING (ACC %)
nvidia/audio-flamingo-3-hf 8.2B 0.779 1.14 5.3 12.58 ±8.94 13.22 ±2.87 11.98 ±1.63 12.53 ±26.61 60.84 ±0.06 63.60 ±0.08 48.80 ±0.27 78.48 ±0.13 69.22 ±0.17 24.00 ±0.41 71.70 ±0.13 70.08 ±0.07 59.60 ±0.08 62.60 ±0.04 56.60 ±0.08 52.60 ±0.03 43.20 ±0.04 62.00 ±0.04 89.50 ±0.02 93.00 ±0.02 86.00 ±0.03 35.80 ±0.04 59.73 ±0.08 63.71 ±0.11 52.64 ±0.13 62.83 ±0.14 55.53 ±0.09 64.00 ±9.41
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.775 1.12 4.7 11.00 ±1.15 10.30 ±1.65 7.16 ±1.22 15.54 ±2.72 68.21 ±0.05 57.45 ±0.08 62.60 ±0.27 87.94 ±0.11 68.91 ±0.19 63.80 ±0.28 72.00 ±0.16 64.76 ±0.08 62.68 ±0.09 63.20 ±0.04 62.16 ±0.08 37.20 ±0.03 35.80 ±0.04 38.60 ±0.04 95.20 ±0.01 92.40 ±0.02 98.00 ±0.01 32.20 ±0.04 60.99 ±0.09 63.26 ±0.15 57.88 ±0.14 61.84 ±0.16 51.93 ±0.08 66.00 ±9.28
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.531 0.26 5.6 6.67 ±1.67 8.31 ±1.35 5.74 ±0.94 5.96 ±4.72 68.50 ±0.06 59.75 ±0.09 63.00 ±0.21 81.67 ±0.14 69.12 ±0.20 71.80 ±0.23 70.30 ±0.17 63.88 ±0.08 32.32 ±0.08 18.80 ±0.03 45.84 ±0.08 23.10 ±0.03 25.80 ±0.04 20.40 ±0.04 57.50 ±0.03 68.20 ±0.04 46.80 ±0.04 42.60 ±0.04 50.80 ±0.09 45.69 ±0.14 61.20 ±0.16 45.53 ±0.16 47.47 ±0.09 66.00 ±9.28
microsoft/Phi-4-multimodal-instruct 5.6B 0.488 0.22 5.7 10.29 ±1.80 12.42 ±4.75 6.36 ±1.06 12.09 ±2.27 70.93 ±0.05 63.75 ±0.09 61.80 ±0.27 91.76 ±0.08 74.72 ±0.17 53.40 ±0.29 76.90 ±0.15 74.16 ±0.08 41.06 ±0.08 31.80 ±0.04 50.32 ±0.09 21.80 ±0.03 22.20 ±0.04 21.40 ±0.04 57.30 ±0.03 58.40 ±0.04 56.20 ±0.04 3.00 ±0.01 59.02 ±0.09 56.10 ±0.15 63.72 ±0.15 57.24 ±0.17 56.40 ±0.09 23.00 ±8.25
mistralai/Voxtral-Mini-3B-2507 4.68B 0.471 0.15 6.4 81.49 ±20.97 54.35 ±37.70 73.73 ±14.57 116.40 ±47.54 79.34 ±0.04 74.00 ±0.07 77.40 ±0.20 88.28 ±0.09 77.20 ±0.15 78.20 ±0.19 81.10 ±0.11 79.20 ±0.06 42.66 ±0.09 28.00 ±0.04 57.32 ±0.08 18.40 ±0.02 27.20 ±0.04 9.60 ±0.03 13.30 ±0.02 26.20 ±0.04 0.40 ±0.01 4.40 ±0.02 58.69 ±0.07 60.13 ±0.12 58.72 ±0.11 57.24 ±0.14 66.67 ±0.08 69.00 ±9.06
Qwen/Qwen2.5-Omni-7B 11B 0.454 0.07 7.6 41.77 ±10.57 75.07 ±20.40 20.60 ±3.87 29.65 ±23.53 72.47 ±0.05 63.05 ±0.09 68.40 ±0.19 90.10 ±0.09 71.30 ±0.19 69.40 ±0.22 73.70 ±0.16 71.32 ±0.08 38.16 ±0.06 41.00 ±0.04 35.32 ±0.08 14.20 ±0.02 22.60 ±0.04 5.80 ±0.02 22.10 ±0.03 8.80 ±0.02 35.40 ±0.04 0.00 ±0.00 59.19 ±0.09 58.08 ±0.16 65.92 ±0.14 53.55 ±0.17 55.33 ±0.08 89.00 ±6.13
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.451 -0.02 7.2 6.78 ±0.73 8.37 ±1.48 6.09 ±0.90 5.87 ±1.31 60.28 ±0.06 54.55 ±0.09 53.60 ±0.25 66.96 ±0.17 59.48 ±0.20 62.40 ±0.25 64.70 ±0.17 60.28 ±0.09 35.18 ±0.08 23.40 ±0.04 46.96 ±0.09 17.70 ±0.02 16.20 ±0.03 19.20 ±0.03 52.10 ±0.03 65.40 ±0.04 38.80 ±0.04 41.20 ±0.04 52.27 ±0.09 47.60 ±0.14 61.52 ±0.15 47.70 ±0.16 44.87 ±0.09 49.00 ±9.80
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.446 -0.02 6.3 6.59 ±0.65 8.27 ±1.38 5.90 ±0.91 5.62 ±0.99 62.88 ±0.06 55.85 ±0.09 56.00 ±0.26 71.42 ±0.17 65.18 ±0.20 63.80 ±0.25 64.90 ±0.18 63.00 ±0.08 36.92 ±0.08 27.60 ±0.04 46.24 ±0.08 22.80 ±0.03 15.60 ±0.03 30.00 ±0.04 57.30 ±0.03 65.60 ±0.04 49.00 ±0.04 40.80 ±0.04 50.90 ±0.09 44.41 ±0.14 61.84 ±0.16 46.45 ±0.16 41.93 ±0.10 34.00 ±9.28
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.365 -0.28 9.6 7.26 ±0.73 8.67 ±1.46 6.75 ±0.94 6.36 ±1.33 61.51 ±0.06 54.00 ±0.09 54.40 ±0.24 71.37 ±0.17 64.77 ±0.19 57.80 ±0.26 67.00 ±0.17 61.20 ±0.08 32.16 ±0.08 20.60 ±0.04 43.72 ±0.09 17.60 ±0.02 14.40 ±0.03 20.80 ±0.04 47.60 ±0.03 62.20 ±0.04 33.00 ±0.04 38.40 ±0.04 50.46 ±0.10 44.66 ±0.15 60.32 ±0.16 46.38 ±0.16 44.60 ±0.09 12.00 ±6.37
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.328 -0.31 9.7 9.43 ±1.71 10.89 ±1.54 7.49 ±0.98 9.92 ±4.78 69.28 ±0.05 59.70 ±0.09 67.40 ±0.25 86.13 ±0.12 65.18 ±0.20 70.40 ±0.25 69.80 ±0.17 66.36 ±0.08 33.36 ±0.07 24.80 ±0.04 41.92 ±0.08 13.00 ±0.02 18.80 ±0.03 7.20 ±0.02 33.50 ±0.03 17.60 ±0.03 49.40 ±0.04 1.40 ±0.01 43.41 ±0.09 39.68 ±0.14 49.56 ±0.15 40.99 ±0.15 49.40 ±0.09 66.00 ±9.28
Qwen/Qwen2.5-Omni-3B 5.9B 0.327 -0.34 9.6 68.47 ±13.86 120.36 ±27.27 29.73 ±7.38 55.32 ±29.51 70.84 ±0.05 64.95 ±0.09 64.80 ±0.18 88.97 ±0.10 67.56 ±0.20 68.60 ±0.23 71.10 ±0.17 69.92 ±0.08 31.04 ±0.07 21.00 ±0.04 41.08 ±0.08 15.20 ±0.02 24.80 ±0.04 5.60 ±0.02 11.10 ±0.02 9.20 ±0.03 13.00 ±0.03 1.00 ±0.01 52.93 ±0.08 53.42 ±0.15 58.28 ±0.13 47.11 ±0.16 55.20 ±0.09 85.00 ±7.00
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.320 -0.43 10.1 10.30 ±1.52 11.58 ±3.99 11.18 ±1.55 8.15 ±1.52 56.32 ±0.06 49.30 ±0.09 62.20 ±0.23 60.98 ±0.18 48.39 ±0.19 59.60 ±0.27 53.30 ±0.17 60.48 ±0.08 36.76 ±0.07 29.60 ±0.04 43.92 ±0.08 13.90 ±0.02 14.40 ±0.03 13.40 ±0.03 54.20 ±0.03 61.40 ±0.04 47.00 ±0.04 37.80 ±0.04 45.98 ±0.09 40.77 ±0.14 53.68 ±0.15 43.49 ±0.15 41.27 ±0.08 21.00 ±7.98
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.316 -0.44 10.2 10.53 ±8.36 15.93 ±24.98 8.25 ±1.14 7.40 ±1.27 54.89 ±0.06 48.80 ±0.09 58.60 ±0.22 63.14 ±0.18 46.94 ±0.18 57.20 ±0.25 50.70 ±0.18 58.88 ±0.09 35.80 ±0.07 28.80 ±0.04 42.80 ±0.08 15.70 ±0.02 13.40 ±0.03 18.00 ±0.03 50.90 ±0.03 54.80 ±0.04 47.00 ±0.04 38.40 ±0.04 45.35 ±0.09 39.62 ±0.14 52.76 ±0.15 43.68 ±0.15 43.93 ±0.09 18.00 ±7.53
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.310 -0.47 10.4 11.33 ±7.87 17.96 ±23.52 8.50 ±1.13 7.52 ±1.51 55.49 ±0.06 49.30 ±0.09 60.20 ±0.22 62.79 ±0.18 47.98 ±0.19 58.20 ±0.26 50.50 ±0.17 59.48 ±0.08 36.78 ±0.07 31.20 ±0.04 42.36 ±0.08 15.60 ±0.02 12.80 ±0.03 18.40 ±0.03 49.20 ±0.03 52.00 ±0.04 46.40 ±0.04 41.00 ±0.04 44.21 ±0.09 39.62 ±0.13 51.16 ±0.15 41.84 ±0.15 41.07 ±0.09 21.00 ±7.98
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.227 -0.65 11.7 12.09 ±0.95 12.26 ±1.89 11.05 ±1.12 12.97 ±1.80 62.17 ±0.06 54.95 ±0.09 57.00 ±0.22 77.35 ±0.15 57.62 ±0.20 60.60 ±0.26 63.90 ±0.18 63.80 ±0.08 32.98 ±0.07 23.80 ±0.04 42.16 ±0.08 13.70 ±0.02 22.00 ±0.04 5.40 ±0.02 36.30 ±0.03 40.40 ±0.04 32.20 ±0.04 2.20 ±0.01 43.49 ±0.09 38.34 ±0.14 52.72 ±0.15 39.41 ±0.15 41.60 ±0.09 40.00 ±9.60

Languages · Others

Others — Models × Tasks
Model; Size; Min-Max; Z-Score; Avg Rank; ASR (WER %): overall, Fleurs, Multilingual_TEDx; AST (METEOR %): overall, Multilingual_TEDx (ES→FR), Multilingual_TEDx (ES→IT); AUDIO CAPTIONING (FLOW_JUDGE): overall, AudioCaps, WavCaps; MUSIC CAPTIONING (FLOW_JUDGE); SPOKEN LANGUAGE IDENTIFICATION (FLOW_JUDGE)
Qwen/Qwen2-Audio-7B-Instruct 8.4B 0.875 1.13 4.6 24.43 ±2.19 14.99 ±2.14 23.31 ±5.43 45.84 ±1.73 46.22 ±2.08 45.46 ±3.11 53.70 ±0.06 54.36 ±0.09 53.04 ±0.10 59.76 ±0.04 88.40 ±0.03
Qwen/Qwen2.5-Omni-7B 11B 0.673 0.44 6.4 32.39 ±8.72 15.68 ±3.65 54.62 ±32.25 41.12 ±1.85 42.80 ±2.25 39.45 ±3.23 54.68 ±0.07 56.60 ±0.09 52.76 ±0.10 37.40 ±0.07 79.80 ±0.04
nvidia/audio-flamingo-3-hf 8.2B 0.647 0.32 6.6 91.53 ±8.97 96.60 ±9.93 80.80 ±43.95 14.94 ±1.55 17.21 ±1.99 12.67 ±2.35 57.76 ±0.06 64.00 ±0.06 51.52 ±0.10 55.48 ±0.11 93.00 ±0.02
LINAGORA/Canary_Qwen3-1.7B_LLM-LoRA_8h 2.5B 0.616 0.30 5.8 12.22 ±2.48 7.52 ±1.64 14.88 ±3.79 55.40 ±1.63 55.08 ±1.87 55.72 ±3.13 33.46 ±0.05 31.80 ±0.07 35.12 ±0.08 36.36 ±0.06 72.60 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_40h 2.1B 0.553 0.10 5.8 9.96 ±2.44 6.97 ±1.60 13.76 ±16.70 59.74 ±1.65 59.98 ±1.92 59.50 ±3.11 31.22 ±0.05 30.60 ±0.07 31.84 ±0.08 30.40 ±0.06 63.40 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA-no_bucket_8h 2.1B 0.543 0.07 7.2 11.23 ±1.66 7.83 ±1.60 14.94 ±4.32 58.38 ±1.63 59.08 ±1.92 57.69 ±3.04 27.72 ±0.04 26.56 ±0.06 28.88 ±0.07 34.24 ±0.06 61.60 ±0.04
microsoft/Phi-4-multimodal-instruct 5.6B 0.529 -0.00 8.8 41.02 ±5.26 29.88 ±9.76 34.03 ±8.53 36.07 ±2.05 31.28 ±2.46 40.85 ±3.57 47.34 ±0.06 48.40 ±0.08 46.28 ±0.09 44.24 ±0.06 50.80 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Adapter_2h 2.1B 0.524 0.02 7.6 19.53 ±2.93 13.77 ±7.87 23.04 ±6.52 54.22 ±1.71 55.45 ±1.97 53.00 ±3.29 28.64 ±0.05 25.96 ±0.05 31.32 ±0.07 45.64 ±0.05 44.60 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_8h 2.1B 0.520 0.01 7.8 18.49 ±2.69 11.70 ±2.94 20.83 ±6.52 53.47 ±1.69 55.72 ±1.93 51.22 ±3.24 28.62 ±0.04 26.84 ±0.06 30.40 ±0.07 45.76 ±0.05 43.60 ±0.04
LINAGORA/Canary_Luciole-1B_LLM-LoRA_8h 2.1B 0.518 -0.01 8.0 10.03 ±1.58 7.44 ±1.54 11.43 ±1.85 59.54 ±1.62 59.19 ±1.91 59.89 ±3.01 27.38 ±0.05 25.68 ±0.05 29.08 ±0.07 29.72 ±0.06 61.20 ±0.04
LINAGORA/Canary_Luciole-1B_Step-Encoder_80h 2.1B 0.512 -0.02 8.4 17.19 ±4.37 11.82 ±2.88 21.12 ±6.92 53.54 ±1.75 55.06 ±2.03 52.02 ±3.31 27.62 ±0.04 25.28 ±0.05 29.96 ±0.07 45.32 ±0.05 43.20 ±0.04
Qwen/Qwen2.5-Omni-3B 5.9B 0.461 -0.24 9.2 54.28 ±10.89 22.88 ±5.74 82.28 ±48.63 40.50 ±1.77 39.10 ±2.19 41.91 ±3.00 35.88 ±0.06 37.48 ±0.09 34.28 ±0.09 36.28 ±0.07 63.20 ±0.04
mistralai/Voxtral-Mini-3B-2507 4.68B 0.409 -0.47 8.2 139.30 ±27.78 96.91 ±16.09 173.54 ±93.23 23.46 ±1.18 21.31 ±1.35 25.62 ±2.25 38.90 ±0.05 38.56 ±0.07 39.24 ±0.07 50.60 ±0.07 75.60 ±0.04
LINAGORA/Canary-Qwen3-4B_data-v1_8h 4.8B 0.316 -0.66 12.0 18.72 ±11.42 12.14 ±2.25 26.31 ±14.69 35.30 ±1.69 34.56 ±2.00 36.04 ±3.14 22.38 ±0.03 21.24 ±0.03 23.52 ±0.05 33.88 ±0.07 40.60 ±0.04
LINAGORA/Canary-Qwen3-1.7B_data-v1_8h 2.5B 0.228 -0.97 13.6 38.06 ±3.23 58.79 ±3.43 101.99 ±11.44 28.99 ±1.64 27.86 ±1.93 30.12 ±3.08 21.74 ±0.03 21.04 ±0.03 22.44 ±0.05 28.28 ±0.06 42.80 ±0.04