As enterprise organizations accelerate adoption of AI-assisted development workflows, conversations are rapidly shifting from “Can we use AI?” to “How do we scale AI responsibly and cost-effectively?” With tools like GitHub Copilot moving toward usage-based token billing, engineering leaders now need to think about AI consumption the same way they think about cloud spend, observability, or infrastructure efficiency. The organizations that build efficient AI workflows early will have a major operational advantage as token economics become increasingly important.
Token Saving Measures
With the recent announcement that GitHub Copilot is moving to a per-token billing model, there’s been a mad scramble to identify ways to reduce token usage without losing productivity. Let’s explore what this means and some of the saving measures we’ve tested.
The Tale of Two Token Bills
When using a per-token pricing model, you get billed for two different categories of tokens: input tokens and output tokens.
- Input tokens are tokens you send to the model
- Output tokens are tokens the model sends back
When you’re talking to the AI, those are input tokens. When the AI is talking to you, those are output tokens.
Most services charge different rates for the two.
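To make the two-rate model concrete, here’s a quick sketch. The prices below are placeholders, not GitHub’s actual rates (output tokens are typically billed at a premium over input tokens):

```python
# Hypothetical rates for illustration only -- not GitHub Copilot's real pricing.
PRICE_PER_1K_INPUT = 0.003   # $ per 1K input tokens (made up)
PRICE_PER_1K_OUTPUT = 0.015  # $ per 1K output tokens (made up)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """One request's cost: input and output are metered at different rates."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Same total token count, very different bills:
print(request_cost(8_000, 500))  # input-heavy:  $0.0315
print(request_cost(500, 8_000))  # output-heavy: $0.1215
```

Because the two sides of the bill behave differently, they deserve separate treatment.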
The Input Problem
For those of us using agentic AI, there’s a lot of feeding files and CLI outputs to the AI. Most of this information is formatted for human consumption: prettier, padded, and easier to read. The problem is that this massively balloons your input size. Which you get charged for. Cha-ching.
Thanks to a LinkedIn post, I recently found a really cool tool called Rust Token Killer (RTK). It installs locally on your machine along with a skills file injected into your context that tells the AI to prepend shell calls like ls, Git commands, tests, builds, and other terminal operations with RTK.
RTK then filters the output down to only what the AI actually needs.
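I haven’t dug into RTK’s internals, so take this as a minimal sketch of the general idea rather than RTK’s actual implementation: wrap the command, drop the lines that exist only for human readability, and feed the model the rest. The filter rules here are hypothetical.

```python
import subprocess

# Hypothetical noise patterns; real tools use far more sophisticated filtering.
NOISE_PREFIXES = ("hint:", "note:", "remote:")

def run_filtered(cmd: list[str]) -> str:
    """Run a shell command and keep only the lines the model actually needs."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = (result.stdout + result.stderr).splitlines()
    kept = [ln for ln in lines
            if ln.strip() and not ln.strip().lower().startswith(NOISE_PREFIXES)]
    return "\n".join(kept)

# Machine-oriented flags help too: --porcelain is already built for parsing,
# not for pretty-printing to a human.
print(run_filtered(["git", "status", "--porcelain"]))
```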
My current RTK stats are:
- Total commands: 287
- Input tokens: 426.8K
- Output tokens: 17.0K
- Tokens saved: 410.3K (96.1%)
- Total exec time: 10m35s (avg 2.2s)
- Efficiency meter: ███████████████████████░ 96.1%
You read that right. Instead of sending 426,800 tokens of command output, I sent 17,000. That’s a 96% reduction, well over an order of magnitude.
For enterprise organizations running AI-assisted CI/CD pipelines, developer copilots, or internal engineering agents, this type of optimization becomes extremely meaningful at scale. Reducing unnecessary token transfer not only lowers cost, it can also improve latency, throughput, and context quality for models operating in large codebases.
The Output Problem
It should come as no surprise: the AI likes to chat.
Between excessive preambles, hedging, unnecessary pleasantries, verbosity, and filler language, output tokens add up fast. Unfortunately, this problem is trickier because once the model generates tokens, you’re already paying for them. Unlike input filtering, you cannot retroactively reduce output costs.
The best we can do is encourage the model to speak more efficiently.
There’s a tool called caveman that’s been making the LinkedIn rounds, with the intention of forcing the AI to talk like a caveman. In my experience, the style shift is so extreme that many LLMs eventually ignore the instructions.
I wrote a simpler skill called cable. I originally wanted to call it “telegram,” but that name was already taken.
The goal was not to force the AI into awkward or obtuse language. The goal was simply to meter output more effectively, much like machine learning models are trained with rewards and cost functions.
This is the entirety of the prompt:
---
name: cable
description: >
  Cable-style output: every word earns its place.
  Activate: /cable, "cable mode", "less tokens", "be brief".
  Off: "normal mode".
---
Every word earns its place. Paid per word — make them count.
**DROP:** filler (just/really/basically/actually), articles (a/an/the), hedging (might/perhaps/seems), pleasantries (sure/happy to/great question/of course), preamble (here’s what I found/let me explain), postamble (let me know/hope this helps), question restatement.
**KEEP:** all technical content, warnings, exact errors, decisions, code blocks verbatim.
**FORMAT:** bullets > prose. Fragments OK.
❌ “Sure! I’d be happy to help. The issue you’re experiencing is likely caused by…”
✅ “Auth bug. Token check `<` → `<=`. Fix:”
**Off:** “normal mode” / “stop cable”
Yup. All of 200 tokens.
One of my other issues with caveman is how verbose the prompt itself is: it clocks in at over 900 tokens. In my testing with Claude Sonnet 4.6, cable produced roughly a 50% reduction in output tokens while retaining approximately 98% of signal words.
Translation: it kept the meat of the conversation while using half the words.
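If you want to sanity-check that kind of claim yourself, counting tokens is cheap. Here’s a rough sketch using the tiktoken package and OpenAI’s cl100k_base tokenizer; Claude tokenizes differently, so treat the counts as approximate:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The before/after pair from the cable prompt's own example.
verbose = ("Sure! I'd be happy to help. The issue you're experiencing is "
           "likely caused by the token check using < instead of <=.")
cable = "Auth bug. Token check `<` -> `<=`. Fix:"

v, c = len(enc.encode(verbose)), len(enc.encode(cable))
print(f"verbose: {v} tokens | cable: {c} tokens | saved: {1 - c / v:.0%}")
```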
Better Together
So where does this leave us?
With a tool that can significantly reduce input tokens and a lightweight prompt that can dramatically reduce output verbosity, we’re starting to see practical patterns for sustainable AI cost management.
Even if we conservatively estimate:
- 50% reduction in input tokens
- 50% reduction in output tokens
…the impact is still massive at enterprise scale.
Sounds like a 100% decrease to me, so what are we even paying for? (Kidding: percentages don’t stack like that. Halving both sides of the bill halves the bill.) Cutting AI costs by 50% is a substantial operational improvement. I know some people like to flex their $100,000+ AI bills. I’m not one of them.
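A quick back-of-the-envelope makes the point; every number below is made up (volumes and rates are placeholders, not real pricing):

```python
# Hypothetical month of usage at illustrative rates.
input_tokens, output_tokens = 10_000_000, 2_000_000
rate_in, rate_out = 0.003 / 1000, 0.015 / 1000  # $/token, made up

before = input_tokens * rate_in + output_tokens * rate_out
after = 0.5 * input_tokens * rate_in + 0.5 * output_tokens * rate_out

print(f"before: ${before:.2f}  after: ${after:.2f}")  # before: $60.00  after: $30.00
```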
When I’m working for clients, those costs ultimately affect them. If we can deliver the same or better outcomes faster and cheaper, why wouldn’t that be a good thing?
As enterprise organizations continue operationalizing AI development workflows, token efficiency is quickly becoming a real engineering discipline. The future likely belongs to teams that treat prompts, context management, and AI orchestration with the same rigor they already apply to performance optimization and cloud infrastructure management.
At RBA, we help organizations move beyond AI experimentation into scalable, operational AI implementation strategies that balance innovation, governance, cost management, and developer productivity. As usage-based AI pricing models continue evolving, organizations that optimize early will be far better positioned to scale AI sustainably.
Resources
GitHub Copilot Usage-Based Billing Announcement – https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
Rust Token Killer (RTK) – https://github.com/rtk-ai/rtk
Cable Prompt Skill – https://github.com/rsarv-rba/cable
About the Author
Robby Sarvis
Senior Software Engineer
Robby is a full-stack developer at RBA with a deep passion for crafting mobile applications and enhancing user experiences. With a robust skill set that encompasses both front-end and back-end development, Robby is dedicated to leveraging technology to create solutions that exceed client expectations.
Residing in a small town in Texas, Robby enjoys a balanced life that includes his wife, children, and their charming dogs.