RBA Consulting

State of the Local Models – Q2 2026

The AI space is moving incredibly quickly, and the local model ecosystem is evolving just as fast. One of my ongoing hobbies (riveting, I know) is benchmarking and testing local models to see what actually holds up in real-world usage.

Every so often, something surfaces that replaces my daily driver. Below are some of the most notable local models released recently, along with hands-on benchmark insights.

LFM2 24B

https://ollama.com/library/lfm2

This hybrid model from Liquid AI emphasizes efficiency and extremely fast inference.

And it delivers.

If you want something that feels close to cloud-level responsiveness while running locally, this is one of the closest matches I’ve seen. However, that speed comes at a cost: on a 50-question HumanEval run, it scored just 16%.
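The full HumanEval set has 164 problems, so a 50-question run is a subset, and each question is worth 2 percentage points: 16% means 8 of 50 solutions passed. The sketch below shows how a HumanEval-style problem is typically scored, assuming the public dataset's layout (a `prompt`, a `test` string that defines `check(candidate)`, and an `entry_point` function name). It is not the harness used for these results, and the `add` problem is a made-up example.

```python
def passes(problem: dict, completion: str) -> bool:
    """Score one HumanEval-style problem: prompt + model completion, then the problem's check()."""
    program = (
        problem["prompt"] + completion + "\n"
        + problem["test"] + "\n"
        + f"check({problem['entry_point']})\n"
    )
    try:
        # WARNING: exec runs model-written code in this process with no timeout.
        # A real harness isolates each run and kills it after a time limit.
        exec(program, {})
        return True
    except Exception:
        # Syntax errors, failed asserts, and runtime errors all count as a fail.
        return False


def pass_rate(outcomes: list[bool]) -> float:
    """Share of problems solved, as a percentage."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical toy problem in the HumanEval layout.
problem = {
    "prompt": "def add(a: int, b: int) -> int:\n",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
    "entry_point": "add",
}
print(passes(problem, "    return a + b\n"))  # True
print(passes(problem, "    return a - b\n"))  # False
print(pass_rate([True] * 8 + [False] * 42))   # 16.0
```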

Takeaway:

  • Excellent for lightweight tasks or latency-sensitive applications
  • Not reliable for coding or agentic workflows without fine-tuning

This is a classic example of optimizing for speed and memory footprint over reasoning capability.
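To put a number on inference speed on your own hardware, Ollama's non-streaming `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating them). A minimal sketch, assuming a local Ollama server on its default port 11434 and using the `lfm2` name from the library page above (pick the size tag you actually pulled):

```python
import json
import urllib.request

# Default address of a local Ollama server; adjust if yours runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request to Ollama and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tokens_per_second(reply: dict) -> float:
    """Decode throughput: tokens generated divided by generation time (eval_duration is in ns)."""
    return reply["eval_count"] * 1e9 / reply["eval_duration"]
```

For example, `tokens_per_second(generate("lfm2", "Explain a hash map in one sentence."))` gives decode throughput for a single prompt; averaging over several prompts gives a steadier figure than one call.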

LFM 2.5 1.2B Thinking

https://ollama.com/library/lfm2.5-thinking

Another hybrid from Liquid AI, but this one is dramatically smaller at 1.2B parameters and only 731MB in memory.

Despite its size, it achieved 58% on HumanEval, outperforming its larger sibling.

What makes it interesting:

  • Runs easily on edge devices (phones, laptops, low-spec hardware)
  • Large context window relative to size
  • Strong candidate for embedded or distributed AI use cases
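A rough way to see why 731MB is edge-friendly is the weights-only estimate: parameters × bits per weight ÷ 8. Reading 731MB as decimal megabytes, a 1.2B-parameter model at that size averages a little under 5 bits per parameter, which is consistent with a roughly 4-bit quantized build plus some higher-precision tensors. That reading is an assumption, and the estimate ignores KV cache and runtime overhead.

```python
def weights_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone: parameters x bits / 8."""
    return params * bits_per_weight / 8


def implied_bits_per_weight(footprint_bytes: float, params: float) -> float:
    """Invert the estimate: average bits per parameter for a given footprint."""
    return footprint_bytes * 8 / params


print(weights_bytes(1.2e9, 4) / 1e6)                     # 600.0 (MB at 4 bits per weight)
print(round(implied_bits_per_weight(731e6, 1.2e9), 2))   # 4.87
```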

This is where small models start to matter for enterprise edge deployments.

Qwen 3.5 27B

https://ollama.com/library/qwen3.5

From Alibaba Cloud, Qwen 3.5 continues to be one of the most well-rounded model families.

The 27B variant scored 92% on HumanEval, making it one of the strongest performers in this group.

Strengths:

  • High accuracy across coding and reasoning tasks
  • Strong “ChatGPT-like” conversational experience
  • Performs well in agentic workflows

Trade-off:

  • Slower inference compared to smaller or optimized models

In many enterprise scenarios, that trade-off is acceptable for the gain in reliability.

Qwen 3.5 9B

https://ollama.com/library/qwen3.5

A smaller variant of the 27B model above, scoring 86% on HumanEval.

Why it matters:

  • More efficient use of hardware
  • More memory headroom for larger context windows
  • Strong balance between performance and resource usage
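One way to read "larger context windows": memory the weights don't use can hold the KV cache, which grows linearly with context length. A sketch under stated assumptions: the layer count, KV-head count, head size, fp16 cache, 16 GiB budget, and weight sizes are hypothetical values, not Qwen 3.5's published configuration, and the per-token cost is held fixed across model sizes for simplicity (a larger model usually costs more per token, which widens the gap). Architectures with sliding-window or linear-attention layers will deviate from this formula, and activations and runtime overhead are ignored.

```python
GiB = 2**30


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache cost of one token: a key and a value vector per KV head, per attention layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_context_tokens(memory_bytes: int, weight_bytes: int, per_token: int) -> int:
    """Tokens of KV cache that fit in whatever memory the weights leave free."""
    free = memory_bytes - weight_bytes
    return max(free, 0) // per_token


# Hypothetical architecture: 32 layers, 8 KV heads, head size 128, fp16 cache.
per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)
print(per_token)                                              # 131072 bytes (128 KiB)
print(max_context_tokens(16 * GiB, 6 * GiB, per_token))       # 81920 tokens (smaller model)
print(max_context_tokens(16 * GiB, 14 * GiB, per_token))      # 16384 tokens (larger model)
```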

This model was my go-to for months, especially in constrained environments.

Gemma4 26B

https://ollama.com/library/gemma4

Developed by Google DeepMind, Gemma4 models are consistently strong performers.

The 26B version scored 88% on HumanEval, with faster inference than Qwen 3.5.

Positioning:

  • Solid middle ground between speed and reasoning
  • Strong candidate for production-grade local deployments

Gemma4 e4b

https://ollama.com/library/gemma4

This is currently my go-to model.

A 4B-parameter model with a 9.6GB footprint, it scored an impressive 94% on HumanEval, the highest in this group, outperforming significantly larger models.

Why it stands out:

  • Excellent reasoning for its size
  • Fast inference
  • Performs well in both coding and non-coding agentic flows

If you’re working with constrained hardware but still need strong performance, this is one of the best options available right now.

Nemotron Cascade 2 30B

https://ollama.com/library/nemotron-cascade-2

From NVIDIA, this mixture-of-experts model delivers decent performance and speed.

However, it scored 74% on HumanEval, which felt lower than expected.

There’s a strong chance some of the gap comes from my benchmarking methodology, particularly how model responses are parsed.
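Parsing is especially fiddly with reasoning models, whose replies can mix a thinking section, prose, and several fenced code blocks; a parser that grabs the wrong block, or none, turns a correct answer into a failed test. The sketch below is a hypothetical extractor, not the parser behind these results: it assumes reasoning is wrapped in `<think>` tags, drops it, and returns the first fenced block tagged `python`/`py` or left untagged, falling back to the raw text.

```python
import re

FENCE = "`" * 3  # built programmatically so the pattern source has no literal fence

# Reasoning models often emit their chain of thought inside <think>...</think> before answering.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# A complete fenced block: opening fence + optional language tag, body, closing fence at line start.
FENCED_BLOCK = re.compile(
    "^" + FENCE + r"([\w+-]*)[ \t]*\n(.*?)^" + FENCE, re.DOTALL | re.MULTILINE
)


def extract_code(reply: str) -> str:
    """Return the code a model most likely meant to submit from a chat-style reply."""
    visible = THINK_BLOCK.sub("", reply)
    # finditer consumes whole blocks left to right, so a closing fence is never read as an opener.
    for match in FENCED_BLOCK.finditer(visible):
        language, body = match.group(1).lower(), match.group(2)
        if language in ("", "python", "py"):
            return body
    # No usable fenced block: assume the whole visible reply is code.
    return visible.strip()
```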

Still worth watching, especially as MoE architectures continue to evolve.

What This Means for Enterprise AI Strategy

This isn’t just hobbyist experimentation anymore. The implications for enterprise organizations are significant.

1. The End of One-Size-Fits-All AI

There is no single “best” model. The landscape is shifting toward task-specific optimization:

  • Fast inference for real-time applications
  • Lightweight models for edge and embedded systems
  • High-reasoning models for complex workflows
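In practice, task-specific optimization often ends up as a small routing layer in front of several local models. A minimal sketch: the task categories mirror the list above and the model choices follow the results in this post, but the routing table itself and the exact Ollama tags (especially size suffixes such as `:27b`) are assumptions to adapt to whatever you have pulled.

```python
from enum import Enum


class Task(Enum):
    REAL_TIME = "real_time"  # latency-sensitive, lightweight requests
    EDGE = "edge"            # on-device or low-spec hardware
    COMPLEX = "complex"      # coding, multi-step reasoning, agentic flows


# Hypothetical routing table; tags are assumed, not verified against the Ollama library.
ROUTES: dict[Task, str] = {
    Task.REAL_TIME: "lfm2",
    Task.EDGE: "lfm2.5-thinking",
    Task.COMPLEX: "qwen3.5:27b",
}


def pick_model(task: Task) -> str:
    """Return the model tag configured for a task category."""
    return ROUTES[task]
```

Keeping the routing as data rather than branching logic means swapping in next quarter's winner is a one-line change.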

2. Cost and Infrastructure Control

Local models give enterprises:

  • Reduced reliance on external APIs
  • Lower long-term inference costs
  • Greater control over data privacy and compliance

This is especially critical in regulated industries or environments with sensitive data.
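Whether local inference is actually cheaper in the long run is a break-even calculation: the up-front hardware cost divided by the monthly savings (API spend avoided minus local running costs such as power and maintenance). The figures in this sketch are hypothetical, and it ignores depreciation, staff time, and utilization, all of which matter in a real estimate.

```python
def break_even_months(hardware_cost: float, monthly_api_spend: float, monthly_local_cost: float) -> float:
    """Months until a one-off hardware purchase is paid back by avoided API spend.

    Returns infinity when running locally costs as much as (or more than) the API each month.
    """
    monthly_savings = monthly_api_spend - monthly_local_cost
    if monthly_savings <= 0:
        return float("inf")
    return hardware_cost / monthly_savings


# Hypothetical figures for illustration only.
print(break_even_months(hardware_cost=6000, monthly_api_spend=900, monthly_local_cost=150))  # 8.0
```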

3. Edge and Distributed AI Is Becoming Real

Models like LFM 2.5 1.2B make it viable to:

  • Run AI directly on devices
  • Eliminate network round-trip latency
  • Enable offline or semi-connected use cases

4. Benchmarking Is Now a Core Capability

The rate of change means:

  • Model performance shifts quarterly, not yearly
  • Continuous benchmarking is required
  • Static AI architecture decisions quickly become outdated

Enterprises that treat model evaluation as a one-time decision will fall behind.
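Continuous benchmarking needs a rule for when a score change is real. On a 50-question set each question moves the score by 2 points, and the binomial standard error near a 92% pass rate is about 3.8 points, so a drop of a few points between runs may be noise. The sketch below flags a change only if it exceeds two standard errors of the baseline; the two-standard-error threshold is a judgment call, the baseline scores are the HumanEval results from this post, and the "current" scores are invented for illustration.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a pass rate p (0 to 1) measured on n problems."""
    return math.sqrt(p * (1 - p) / n)


def flag_changes(baseline: dict[str, float], current: dict[str, float], n: int, z: float = 2.0) -> dict[str, str]:
    """Label each model's score change between two runs on the same n-question set (scores in %)."""
    labels = {}
    for model, before in baseline.items():
        if model not in current:
            labels[model] = "missing"
            continue
        threshold = z * 100 * standard_error(before / 100, n)
        delta = current[model] - before
        if delta < -threshold:
            labels[model] = "regression"
        elif delta > threshold:
            labels[model] = "improvement"
        else:
            labels[model] = "within noise"
    return labels


baseline = {"Gemma4 e4b": 94, "Qwen 3.5 27B": 92, "Nemotron Cascade 2 30B": 74}
current = {"Gemma4 e4b": 92, "Qwen 3.5 27B": 80, "Nemotron Cascade 2 30B": 90}  # hypothetical rerun
print(flag_changes(baseline, current, n=50))
# {'Gemma4 e4b': 'within noise', 'Qwen 3.5 27B': 'regression', 'Nemotron Cascade 2 30B': 'improvement'}
```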

Conclusion

What this deep dive reveals is not the emergence of a single dominant model, but the necessity of intentional model selection.

The trade-offs between speed, size, and accuracy are becoming more nuanced, not less. And that’s a good thing.

We now have the ability to design AI systems that are:

  • Faster
  • More efficient
  • More tailored to specific business needs

But that flexibility comes with a requirement: continuous testing, benchmarking, and iteration.

Build a Smarter Local AI Strategy

If your organization is exploring local AI models or trying to reduce dependency on external AI providers, the opportunity is significant, but so is the complexity.

At RBA, we help enterprise teams:

  • Evaluate and benchmark models against real use cases
  • Design scalable local and hybrid AI architectures
  • Align AI capabilities with measurable business outcomes

Let’s build a local AI strategy that actually fits your environment.

About the Author

Robby Sarvis

Senior Software Engineer

Robby is a full-stack developer at RBA with a deep passion for crafting mobile applications and enhancing user experiences. With a robust skill set that encompasses both front-end and back-end development, Robby is dedicated to leveraging technology to create solutions that exceed client expectations.

Residing in a small town in Texas, Robby enjoys a balanced life that includes his wife, children, and their charming dogs.