Validated compute savings across MoE architectures
Updated: January 2025

| Model | Total Params | Experts | K (base → adaptive) | Compute Savings | Status |
|---|---|---|---|---|---|
| Mixtral 8x7B | 46.7B | 8 | 8 → 3.80 | 52.5% | ✓ Validated |
| Qwen1.5-MoE | 14.3B | 60 | 4 → 2.70 | 32.4% | ✓ Validated |
| OLMoE 1B-7B | 6.9B | 64 | 8 → 6.0 (per-layer) | 24.7% | ✓ Validated |
| DeepSeek-V3 | 671B | 256 | 8 → TBD | Pending | ⏳ Pending |
| DBRX | 132B | 16 | 4 → TBD | ~30% est. | ~ Estimated |
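The savings column tracks the drop in the average number of activated experts. A minimal check of that relationship, counting only expert FLOPs (the table's slightly lower figures for Qwen1.5-MoE and OLMoE presumably also account for non-expert compute; that reading is an assumption):

```python
# Fraction of expert compute saved when the average number of activated
# experts drops from k_base to k_adaptive (expert FLOPs only; attention and
# other shared compute are ignored in this rough check).
def compute_savings(k_base: float, k_adaptive: float) -> float:
    return 1.0 - k_adaptive / k_base

print(f"Mixtral 8x7B: {compute_savings(8, 3.80):.1%}")  # 52.5%
print(f"Qwen1.5-MoE:  {compute_savings(4, 2.70):.1%}")  # 32.5%
print(f"OLMoE 1B-7B:  {compute_savings(8, 6.00):.1%}")  # 25.0%
```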
Modern AI models such as DeepSeek-V3 and Mixtral use a technique called Mixture of Experts (MoE): instead of engaging the model's entire "brain" for every question, they activate only a few specialized "experts".
The problem? They always activate the same number of experts, even for simple questions like "What time is it?" that don't need all that compute.
Adaptive-K solves this: it analyzes the model's confidence and automatically decides how many experts to activate. Easy question → fewer experts. Complex question → more experts.
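A minimal sketch of one way such confidence-based expert selection can work, assuming a cumulative-probability threshold over the router's softmax; the function name, threshold value, and expert cap below are illustrative assumptions, not necessarily the exact Adaptive-K policy:

```python
import numpy as np

def adaptive_k_select(router_logits: np.ndarray,
                      k_max: int = 8,
                      confidence_threshold: float = 0.6) -> np.ndarray:
    """Select the smallest set of experts whose cumulative router probability
    reaches `confidence_threshold`, capped at `k_max` experts."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()                           # softmax over experts
    order = np.argsort(probs)[::-1]                # experts, most probable first
    cumulative = np.cumsum(probs[order])
    k = int(np.searchsorted(cumulative, confidence_threshold)) + 1
    return order[:min(k, k_max)]

# A peaked (confident) router distribution is satisfied by a single expert,
# while a flat (uncertain) one falls back to several experts.
print(adaptive_k_select(np.array([4.0, 0.1, 0.0, -0.2, -0.5, -1.0, -1.2, -2.0])))
print(adaptive_k_select(np.array([0.30, 0.20, 0.25, 0.10, 0.15, 0.20, 0.10, 0.05])))
```

The threshold is the quality/compute knob: raising it pushes selection back toward the base top-K, while lowering it saves more compute on easy tokens.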
| Provider | Model | Params | Experts | Top-K | Price (per 1M tokens) |
|---|---|---|---|---|---|
| Together | DeepSeek-V3.1 | 671B | 256 | 8 | $0.60/1M |
| Together | Qwen3-235B-MoE | 235B | 128 | 8 | $0.65/1M |
| Together | Qwen3-Coder-480B | 480B | 160 | 8 | $2.00/1M |
| Together | Cogito-109B-MoE | 109B | 64 | 8 | $0.18/1M |
| OpenRouter | Mixtral-8x7B | 46.7B | 8 | 2 | $0.24/1M |
| DeepSeek | deepseek-chat | 671B | 256 | 8 | $0.14/1M |
| Use case | Estimated savings |
|---|---|
| High volume, many simple queries | 40-50% |
| Variable complexity extraction | 30-40% |
| High complexity tasks | 20-30% |
| Massive scale, cost-sensitive | 45-55% |
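As a rough, illustrative translation of those bands into dollars (the 500M-token monthly volume and the assumption that compute savings pass through proportionally to price are hypothetical, not measured):

```python
# Back-of-the-envelope cost estimate for a high-volume, simple-query workload
# at the DeepSeek-V3.1 price from the table above ($0.60 per 1M tokens).
tokens_per_month = 500_000_000     # assumed workload, for illustration only
price_per_million = 0.60           # USD per 1M tokens
savings_rate = 0.45                # midpoint of the 40-50% band

baseline_cost = tokens_per_month / 1_000_000 * price_per_million
adaptive_cost = baseline_cost * (1 - savings_rate)
print(f"Baseline:   ${baseline_cost:,.0f}/month")   # $300/month
print(f"Adaptive-K: ${adaptive_cost:,.0f}/month")   # $165/month
```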
Integrate Adaptive-K routing in minutes with our SDK: `pip install adaptive-k-routing`