May 8, 2026
Zyphra has released ZAYA1-8B, a mixture-of-experts (MoE) reasoning model with 8.4 billion total parameters, of which only 760 million are active per token. Trained entirely on AMD Instinct MI300X hardware, ZAYA1-8B matches or exceeds substantially larger open-weight and proprietary models on mathematics and coding benchmarks. The model is available under an Apache 2.0 license on Hugging Face and as a serverless endpoint on Zyphra Cloud.
What is ZAYA1-8B and what can it do?
ZAYA1-8B is a small but highly capable reasoning model built on Zyphra’s MoE++ architecture. Despite activating only 760 million parameters per token, it scores:
- 89.1 on AIME ’26,
- 71.6 on HMMT Feb. ’26,
- 59.3 on IMO-AnswerBench,
- 65.8 on LiveCodeBench-v6,
- 71.0 on GPQA-Diamond.
These results place it ahead of models like Qwen3-4B-Thinking-2507 and Gemma-4-E4B-it across all mathematics and coding categories.
What makes ZAYA1-8B particularly notable is its efficiency: with only 760 million active parameters, it surpasses Mistral-Small-4-119B (which activates 6 billion of its 119 billion total parameters) on math and coding benchmarks, scoring:
- 89.1 vs 86.4 on AIME ’26,
- 65.8 vs 57.9 on LiveCodeBench-v6.
Mistral-Small-4-119B retains advantages on knowledge-heavy benchmarks like GPQA-Diamond (77.2 vs 71.0) and MMLU-Pro (81.6 vs 74.2), where breadth matters more than reasoning depth.
ZAYA1-8B architecture and technical innovations
ZAYA1-8B is built on Zyphra’s MoE++ architecture, which introduces three specific changes over standard MoE designs. First, Compressed Convolutional Attention (CCA) operates in a compressed latent space and achieves 8x KV-cache compression versus standard attention, directly lowering memory requirements at inference time. Second, an MLP-based router with PID-controller bias balancing replaces the standard linear projection router, improving routing stability and preventing the load imbalance that is a known failure mode in MoE training. Third, learned residual scaling controls residual-norm growth through depth at negligible parameter and compute cost.
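To make the second change concrete, here is a minimal PyTorch sketch of an MLP router whose per-expert bias terms are nudged by a PID controller so that token load stays balanced across experts. The class name, the gains kp/ki/kd, and the update rule are illustrative assumptions, not Zyphra's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPRouterWithPIDBias(nn.Module):
    """Sketch of an MoE router: a small MLP scores experts, and a PID
    controller adjusts per-expert biases toward uniform load. Illustrative
    only; gains and layer sizes are assumed, not Zyphra's actual values."""

    def __init__(self, d_model, n_experts, top_k=2, kp=0.01, ki=0.001, kd=0.01):
        super().__init__()
        # MLP router in place of the standard single linear projection
        self.router = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.SiLU(),
            nn.Linear(d_model // 2, n_experts),
        )
        self.top_k = top_k
        self.n_experts = n_experts
        self.kp, self.ki, self.kd = kp, ki, kd
        # PID state: routing bias, error integral, previous error
        self.register_buffer("bias", torch.zeros(n_experts))
        self.register_buffer("err_integral", torch.zeros(n_experts))
        self.register_buffer("prev_err", torch.zeros(n_experts))

    def forward(self, x):
        # x: (tokens, d_model); bias shifts scores before expert selection
        logits = self.router(x) + self.bias
        weights, experts = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)

        # Measure realized expert load versus the uniform target
        load = torch.zeros(self.n_experts, device=x.device)
        load.scatter_add_(0, experts.reshape(-1),
                          torch.ones(experts.numel(), device=x.device))
        load = load / load.sum()
        err = (1.0 / self.n_experts) - load  # positive => expert underloaded

        # PID update: raise bias of underloaded experts, lower overloaded ones
        with torch.no_grad():
            self.err_integral += err
            deriv = err - self.prev_err
            self.bias += self.kp * err + self.ki * self.err_integral + self.kd * deriv
            self.prev_err.copy_(err)

        return weights, experts
```

The bias only influences which experts are selected; the mixture weights themselves come from the unmodified router scores, which is what lets a controller of this kind balance load without an auxiliary loss term.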
The full training pipeline ran on a cluster of 1,024 AMD Instinct MI300X GPUs connected via AMD Pensando Pollara interconnect, in a custom training cluster built with IBM. ZAYA1-8B is the first MoE model to be pretrained, mid-trained, and supervised fine-tuned entirely on AMD hardware.
Markovian RSA: how ZAYA1-8B scales reasoning at test time
Alongside the model, Zyphra introduces Markovian RSA, a test-time compute method that combines two existing ideas in a new way. Recursive Self-Aggregation (RSA) generates multiple reasoning traces in parallel and aggregates them recursively across iterations. The Markovian-thinker component performs reasoning in fixed-length chunks, passing only the tail of the previous chunk to the next, so the context window stays bounded regardless of how long the model reasons.
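The interaction between the two ideas can be sketched in a few lines of Python. The generate and aggregate callables, the trace count, and the chunk and carry budgets below are hypothetical stand-ins used to show the control flow, not Zyphra's actual method or API.

```python
from typing import Callable, List

def markovian_rsa(
    problem: str,
    generate: Callable[[str, int], str],        # hypothetical: (prompt, token budget) -> trace
    aggregate: Callable[[str, List[str]], str],  # hypothetical: fold several traces into one
    n_traces: int = 8,
    n_iterations: int = 4,
    chunk_tokens: int = 4096,
    carry_chars: int = 2000,
) -> str:
    """Sketch of Markovian RSA as described above: per iteration, several
    reasoning traces are sampled, recursively aggregated, and only the tail
    of the consolidated chunk is carried forward, keeping context bounded."""
    carried = ""  # Markovian state: tail of the previous reasoning chunk
    for _ in range(n_iterations):
        prompt = f"{problem}\n\nPrevious reasoning (tail):\n{carried}\n\nContinue reasoning:"
        # Recursive Self-Aggregation step: sample several traces for this chunk
        traces = [generate(prompt, chunk_tokens) for _ in range(n_traces)]
        # Fold the traces into a single consolidated chunk
        consolidated = aggregate(problem, traces)
        # Keep only the tail so the next chunk's context stays fixed-size
        carried = consolidated[-carry_chars:]
    return carried
```

Because each iteration sees only a fixed-size tail rather than the full history, total reasoning tokens can grow into the millions while per-step memory stays constant.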
With Markovian RSA at an extra-high test-time compute budget of 5.5 million tokens per problem, ZAYA1-8B outperforms DeepSeek-V3.2 and GPT-OSS-High on the challenging APEX-shortlist mathematics benchmark with a score of 32.2. This demonstrates that even a small model can approach frontier-level performance when given sufficient compute at inference time.
ZAYA1-8B availability, pricing, and licensing
ZAYA1-8B is available immediately as a free serverless endpoint on Zyphra Cloud at cloud.zyphra.com, with model weights downloadable from Hugging Face under an Apache 2.0 license. The open-weight release means developers and researchers can run the model locally, fine-tune it for specific use cases, or integrate it into existing pipelines without licensing restrictions.
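For local experimentation, a standard Hugging Face transformers loading pattern should apply. The repository id "Zyphra/ZAYA1-8B" and the generation settings below are assumptions for illustration; check the Hugging Face model card for the exact identifier and recommended usage.

```python
# Minimal local-inference sketch using Hugging Face transformers.
# The model id is assumed from the announcement and may differ on the Hub.
# device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZAYA1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```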
Given its small active parameter count of 760 million, ZAYA1-8B is particularly suited for deployment on edge devices and in resource-constrained environments where larger models would be impractical. Zyphra has also announced a partnership with AMD to power Zyphra Cloud on AMD Instinct MI355X GPUs going forward.
The full technical report and model weights for ZAYA1-8B are available on Zyphra’s website at zyphra.com/post/zaya1-8b and on Hugging Face.