State of the Local Models – Q2 2026
The AI space is moving incredibly quickly, and the local model ecosystem is evolving just as fast. One of my ongoing hobbies (riveting, I know) is benchmarking and testing local models to see what actually holds up in real-world usage.
Every so often, something surfaces that replaces my daily driver. Below are some of the most notable local models released recently, along with hands-on benchmark insights.
LFM2 24B
https://ollama.com/library/lfm2
This hybrid model from Liquid AI emphasizes efficiency and extremely fast inference.
And it delivers.
If you want something that feels close to cloud-level responsiveness while running locally, this is one of the closest I’ve seen. However, that speed comes at a cost: on the HumanEval benchmark, it scored just 16% across 50 questions.
Takeaway:
- Excellent for lightweight tasks or latency-sensitive applications
- Not reliable for coding or agentic workflows without fine-tuning
This is a classic example of optimizing for speed and memory footprint over reasoning capability.
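For scale, 16% of 50 questions works out to 8 solved problems, and a 50-question sample leaves real statistical uncertainty around any pass rate. Here is a minimal sketch of that arithmetic. The Wilson score interval is my assumption about how one might quantify the uncertainty, not something the benchmark run reported, and the function name is hypothetical.

```python
import math


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate (z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# 16% of 50 questions corresponds to 8 passing solutions.
passed = round(0.16 * 50)
low, high = wilson_interval(passed, 50)
print(f"{passed}/50 passed, 95% interval roughly {low:.0%} to {high:.0%}")
```

On a sample this small the interval is wide, so gaps of a few points between models are within the noise, while a gap as large as 16% versus 58% is not.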
LFM 2.5 1.2B Thinking
https://ollama.com/library/lfm2.5-thinking
Another hybrid from Liquid AI, but this one is dramatically smaller at 1.2B parameters and only 731MB in memory.
Despite its size, it achieved 58% on HumanEval, outperforming its larger sibling.
What makes it interesting:
- Runs easily on edge devices (phones, laptops, low-spec hardware)
- Large context window relative to size
- Strong candidate for embedded or distributed AI use cases
This is where enterprise edge deployments start to look practical.
Qwen 3.5 27B
https://ollama.com/library/qwen3.5
From Alibaba Cloud, Qwen 3.5 continues to be one of the most well-rounded model families.
The 27B variant scored 92% on HumanEval, making it one of the strongest performers in this group.
Strengths:
- High accuracy across coding and reasoning tasks
- Strong “ChatGPT-like” conversational experience
- Performs well in agentic workflows
Trade-off:
- Slower inference compared to smaller or optimized models
In many enterprise scenarios, that trade-off is acceptable for the gain in reliability.
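To put a number on "slower," generation throughput can be measured per request. This is a minimal sketch, assuming a local Ollama server on its default port and a non-streaming call to its `/api/generate` endpoint, whose response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names are hypothetical, and this is not necessarily how the numbers in this post were gathered.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(result: dict) -> float:
    """Generation throughput from Ollama's eval_count and eval_duration (nanoseconds)."""
    duration_ns = result["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return result["eval_count"] / (duration_ns / 1e9)


def generate(model: str, prompt: str, timeout: float = 300.0) -> dict:
    """Send one non-streaming request to a locally running Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `tokens_per_second(generate("qwen3.5", "Write a function that reverses a string."))` would report tokens per second for one prompt. The model tag there is an assumption, so use whatever `ollama list` shows locally, and average over several prompts before comparing models.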
Qwen 3.5 9B
https://ollama.com/library/qwen3.5
A smaller version of the above, scoring 86% on HumanEval.
Why it matters:
- More efficient use of hardware
- Larger available context windows
- Strong balance between performance and resource usage
This model was my go-to for months, especially in constrained environments.
Gemma4 26B
https://ollama.com/library/gemma4
Developed by Google DeepMind, Gemma4 models are consistently strong performers.
The 26B version scored 88% on HumanEval, with faster inference than Qwen 3.5.
Positioning:
- Solid middle ground between speed and reasoning
- Strong candidate for production-grade local deployments
Gemma4 e4b
https://ollama.com/library/gemma4
This is currently my go-to model.
A 4B parameter model with a 9.6GB footprint, it scored an impressive 94% on HumanEval, outperforming significantly larger models.
Why it stands out:
- Excellent reasoning for its size
- Fast inference
- Performs well in both coding and non-coding agentic flows
If you’re working with constrained hardware but still need strong performance, this is one of the best options available right now.
Nemotron Cascade 2 30B
https://ollama.com/library/nemotron-cascade-2
From NVIDIA, this mixture-of-experts model delivers decent performance and speed.
However, it scored 74% on HumanEval, which felt lower than expected.
There’s a strong chance some of this is tied to benchmarking methodology, particularly around response parsing.
Still worth watching, especially as MoE architectures continue to evolve.
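Response parsing matters because a harness that executes a model's raw reply will fail a correct solution wrapped in prose or markdown fences. The sketch below shows one way to extract the code first. It is an illustrative assumption, not the parser used for these results, and whether parsing explains the Nemotron score is unverified.

```python
import re

FENCE = "`" * 3  # three backticks, built here so this example stays readable

# A fenced block with an optional language tag such as "python".
FENCE_RE = re.compile(FENCE + r"[ \t]*([\w+-]*)[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the code a model most likely meant as its answer."""
    blocks = FENCE_RE.findall(response)
    if not blocks:
        # No fences at all: treat the whole reply as code.
        return response.strip()
    python_blocks = [body for lang, body in blocks if lang.lower() in ("python", "py")]
    # Prefer explicitly tagged Python, otherwise fall back to the first fenced block.
    chosen = python_blocks[0] if python_blocks else blocks[0][1]
    return chosen.strip("\n")
```

Models that emit reasoning text before the answer, or several fenced blocks, may need stricter rules than this. Spot-checking a handful of "failed" replies by hand is a cheap way to tell parser misses from genuine model errors.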
What This Means for Enterprise AI Strategy
This isn’t just hobbyist experimentation anymore. The implications for enterprise organizations are significant.
1. The End of One-Size-Fits-All AI
There is no single “best” model. The landscape is shifting toward task-specific optimization:
- Fast inference for real-time applications
- Lightweight models for edge and embedded systems
- High-reasoning models for complex workflows
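In practice, task-specific optimization can be as simple as a routing table that maps each workload type to a model. The sketch below is hypothetical. The model names follow the Ollama library pages linked above, but the exact tags (sizes, quantizations) and the workload categories are assumptions to adapt.

```python
from enum import Enum


class Workload(Enum):
    REALTIME = "realtime"    # latency-sensitive, lightweight tasks
    EDGE = "edge"            # phones, laptops, low-spec hardware
    REASONING = "reasoning"  # coding and agentic workflows


# Hypothetical routing table; adjust names and tags to what is installed locally.
MODEL_FOR_WORKLOAD = {
    Workload.REALTIME: "lfm2",
    Workload.EDGE: "lfm2.5-thinking",
    Workload.REASONING: "qwen3.5",
}


def pick_model(workload: Workload) -> str:
    """Look up the model assigned to a workload type."""
    return MODEL_FOR_WORKLOAD[workload]
```

Keeping the mapping in one place means a new benchmark result changes a single line instead of every call site.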
2. Cost and Infrastructure Control
Local models give enterprises:
- Reduced reliance on external APIs
- Lower long-term inference costs
- Greater control over data privacy and compliance
This is especially critical in regulated industries or environments with sensitive data.
3. Edge and Distributed AI Is Becoming Real
Models like LFM 2.5 1.2B make it viable to:
- Run AI directly on devices
- Eliminate network round-trip latency
- Enable offline or semi-connected use cases
4. Benchmarking Is Now a Core Capability
The rate of change means:
- Model performance shifts quarterly, not yearly
- Continuous benchmarking is required
- Static AI architecture decisions quickly become outdated
Enterprises that treat model evaluation as a one-time decision will fall behind.
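Continuous benchmarking can start small: keep the last known scores as a baseline and flag any model whose score moves on a rerun. This is a minimal sketch using the HumanEval percentages reported in this post as the baseline. The function names and the 5-point threshold are assumptions, not an established convention.

```python
# HumanEval pass rates (percent) reported in this post, used as the baseline.
BASELINE = {
    "LFM2 24B": 16,
    "LFM 2.5 1.2B Thinking": 58,
    "Qwen 3.5 27B": 92,
    "Qwen 3.5 9B": 86,
    "Gemma4 26B": 88,
    "Gemma4 e4b": 94,
    "Nemotron Cascade 2 30B": 74,
}


def score_changes(baseline: dict[str, float], latest: dict[str, float],
                  threshold: float = 5.0) -> dict[str, float]:
    """Return models whose score moved by at least `threshold` points since baseline."""
    changes = {}
    for model, new_score in latest.items():
        if model not in baseline:
            continue  # new models have no baseline to compare against
        delta = new_score - baseline[model]
        if abs(delta) >= threshold:
            changes[model] = delta
    return changes


def best_model(scores: dict[str, float]) -> str:
    """Name of the top-scoring model in a run."""
    return max(scores, key=scores.get)


print(best_model(BASELINE))  # Gemma4 e4b
```

Running `score_changes` after each new model release or harness change, for example a parsing fix, turns a one-time evaluation into a repeatable check.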
Conclusion
What this deep dive reveals is not the emergence of a single dominant model, but the necessity of intentional model selection.
The trade-offs between speed, size, and accuracy are becoming more nuanced, not less. And that’s a good thing.
We now have the ability to design AI systems that are:
- Faster
- More efficient
- More tailored to specific business needs
But that flexibility comes with a requirement: continuous testing, benchmarking, and iteration.
Build a Smarter Local AI Strategy
If your organization is exploring local AI models or trying to reduce dependency on external AI providers, the opportunity is significant, but so is the complexity.
At RBA, we help enterprise teams:
- Evaluate and benchmark models against real use cases
- Design scalable local and hybrid AI architectures
- Align AI capabilities with measurable business outcomes
Let’s build a local AI strategy that actually fits your environment.
About the Author
Robby Sarvis
Senior Software Engineer
Robby is a full-stack developer at RBA with a deep passion for crafting mobile applications and enhancing user experiences. With a robust skill set that encompasses both front-end and back-end development, Robby is dedicated to leveraging technology to create solutions that exceed client expectations.
Residing in a small town in Texas, Robby enjoys a balanced life that includes his wife, children, and their charming dogs.