Hacker News

Show HN: Lowfat – pluggable CLI filter that saved 91.8% of my LLM tokens (https://github.com)

99 points by zdkaster about 14 hours ago | 53 comments

alex7o about 10 hours ago |

I would like to see a deeper comparison with alternatives like rtk, which are already fast and written in Rust. Also, the previous comments mentioned a known problem with rtk: it sometimes strips the thing that the LLM needs (or expects), causing more work to happen, not less.

jemmyw about 9 hours ago |

I've tried rtx and lean-ctx and these tools seem to end up confusing the agent more than helping. Any saving is irrelevant if the agent decides to work around the tool and makes even more calls than it would otherwise.

I don't know about cost saving, but if it's keeping the context size down I've had a lot better results using subagents to keep a higher order conversation clean for longer.

threecheese about 9 hours ago |

The docs are missing any examples of what this does, instead showing _how_ it works - and only for the codebase itself, rather than the behavior of the app.

What would be useful:

  - examples of text that can be filtered, and why that would be valuable
  - a data flow diagram of runtime behavior, showing how filtering removes unnecessary context

wood_spirit about 9 hours ago |

I have my own llm wrapping harness, which does this and has a few more tricks. For example, it doesn’t have a lot of mcp but it does have search_mcp and load_mcp tools (and search_skills) so the llm can find what it needs when it needs it without bloating the normal baseline context. The LLMs have proved really good at using them. There is also a waypoint tool they can use to record their thinking in the context without it being the final output. Am thinking about a search_expert to find colleagues it can bring into conversations too. And a lot of other stuff.

Pro tip that worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/whereever.txt. That way, the LLM will be able to query and read more using built-in tools without reissuing the big tool call.
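A minimal sketch of the truncation tip above, assuming a Python harness. The function name `truncate_with_pointer`, the 2000-character default, and the wording of the notice are illustrative choices, not taken from the commenter's tool.

```python
import tempfile


def truncate_with_pointer(output: str, max_chars: int = 2000) -> str:
    """Return short output unchanged; otherwise save the full text to a
    temp file and return a truncated view that names that file."""
    if len(output) <= max_chars:
        return output

    # Persist the full text so the agent can read or grep it later with
    # its built-in file tools instead of re-running the big command.
    with tempfile.NamedTemporaryFile(
        mode="w",
        suffix=".txt",
        prefix="tool_output_",  # hypothetical naming scheme
        delete=False,
        encoding="utf-8",
    ) as handle:
        handle.write(output)
        path = handle.name

    omitted = len(output) - max_chars
    notice = (
        f"\n[truncated: {omitted} more characters; "
        f"full text is available in {path}]"
    )
    return output[:max_chars] + notice
```

The notice goes at the end of the truncated text so the agent sees it last, right before deciding what to do next.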

devdoc83 about 12 hours ago |

How do you handle the risk of stripping out the exact stack trace the agent needed? That seems like the hard tradeoff here.

clutter55561 about 2 hours ago |

Tools that remove the fat seem like a good idea, but I’m highly suspicious of their effect on the LLM’s reasoning.

LLMs were trained on the typical full-fat output found everywhere on the internet, and all of a sudden they get a slightly different response that may look like nothing they have seen before.

Does that really save tokens in the long run?

itsdesmond about 9 hours ago |

Have terms been established to describe these types of tools? How do I refer to small utilities that perform specific transformations to LLM behavior? CLI filter seems pretty good to describe this tool conversationally but not so much when searching; those are some low-cardinality keywords.

rahulyc about 5 hours ago |

Great idea. I'm wondering if it could make sense to send the output to a cheap / local model to filter out only the bits that "matter" and pass that through. It would cost some extra time, but maybe it's worth it for saving tokens in the larger model.

davidetroiani about 2 hours ago |

Add a comparison table between your repo and alternatives like rtk. I’m interested.

cityofdelusion about 8 hours ago |

This is a nice little project but I’m weary of sensationally inaccurate titles for stuff like this and the infamous caveman mode. It doesn’t save 91% of tokens: in one use case, it cut the token count of the raw CLI output by 91%. I am being pedantic about this because these sorts of claims go viral and are inaccurate.

A proper benchmark will compare a large sample of identical prompting with and without the tool, against a specific harness. Once you apply Amdahl’s law, there is no way this saves 91% of tokens holistically, which the title implies.
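The Amdahl-style argument above can be put into a few lines of arithmetic. This is a sketch only: the 30% share of session tokens coming from CLI output is a made-up assumption for illustration, not a measurement, and the 0.918 figure is the one from the submission title.

```python
def overall_token_reduction(filtered_share: float, reduction_on_share: float) -> float:
    """Fraction of ALL tokens saved when only part of the traffic is filtered.

    filtered_share: fraction of total tokens that is raw CLI output (0..1).
    reduction_on_share: fraction of that CLI output the filter removes (0..1).
    """
    if not (0.0 <= filtered_share <= 1.0 and 0.0 <= reduction_on_share <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    # Tokens outside the filtered share (prompts, reasoning, edits) are
    # untouched, so the saving scales with the share that is filtered.
    return filtered_share * reduction_on_share


# Assumed: CLI output is 30% of a session's tokens (hypothetical number).
saving = overall_token_reduction(0.30, 0.918)
print(f"overall saving: {saving:.1%}")
```

Under that assumed share, the overall saving is roughly 27.5%, well short of the headline number. The saving only reaches 91.8% if CLI output makes up all of the tokens.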

I work in a non-tech company and these sorts of things keep going viral, with no understanding and with no comprehension of what is actually going on. Engineering is gone and cargo cult magical incantations are in.

fcanesin about 9 hours ago |

I am thinking that a small tool that simply refuses to pass large CLI output to the LLM, and warns it to filter the results before reading, would achieve this better, as the LLM would be forced into thinking and writing the filter itself.
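One way the refusal idea above might look, as a rough sketch. The name `gate_cli_output`, the 200-line default, and the message text are all assumptions made for illustration; no existing tool is being described.

```python
def gate_cli_output(output: str, max_lines: int = 200) -> str:
    """Pass small output through untouched; withhold large output and
    tell the agent to narrow the command instead."""
    line_count = len(output.splitlines())
    if line_count <= max_lines:
        return output

    # Nothing from the original output is leaked here, so the agent has
    # to write its own filter (grep, head, a narrower query) to see it.
    return (
        f"[output withheld: {line_count} lines exceeds the "
        f"{max_lines}-line limit. Re-run the command with a narrower "
        "filter before reading the result.]"
    )
```

Unlike a filter that guesses what to drop, this leaves the choice of what matters to the model, at the cost of one extra round trip.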

sakuraiben about 3 hours ago |

Would be interested to see what kind of eval results you get from this

avocadoking about 7 hours ago |

Do you have any insight if LLMs sometimes get confused by your filters?

neuralkoi about 1 hour ago |

Great! Now, you should slap a logo on this, bootstrap this as a service, and get you some YC funding. [0]

[0] https://thetokencompany.com

tegiddrone about 9 hours ago |

Still learning myself, but I've seen MCP tools that just lightly wrap upstream JSON-body REST APIs. It works, but not only does the JSON structure cost more tokens, often the model needs just a small subset of fields in the payload.
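A small sketch of the field-subset point above: trimming a JSON payload to the keys the model needs before it enters the context. The function name `project_fields` and the sample payload are hypothetical, and only top-level keys are handled.

```python
import json


def project_fields(payload: str, fields: list[str]) -> str:
    """Keep only the named top-level keys of a JSON object, or of each
    object in a JSON array, and re-serialize compactly."""
    data = json.loads(payload)

    def pick(obj: dict) -> dict:
        return {key: obj[key] for key in fields if key in obj}

    if isinstance(data, list):
        slim = [pick(item) for item in data if isinstance(item, dict)]
    elif isinstance(data, dict):
        slim = pick(data)
    else:
        slim = data  # scalars pass through unchanged

    # Compact separators drop the whitespace that would also cost tokens.
    return json.dumps(slim, separators=(",", ":"))


# Hypothetical upstream response with fields the model does not need.
raw = '{"id": 1, "name": "a", "meta": {"etag": "xyz", "links": []}}'
print(project_fields(raw, ["id", "name"]))
```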

tuo-lei about 8 hours ago |

the bigger problem is agents defaulting to the broadest command possible. kubectl get -o yaml when a jsonpath query would give 1/50th the tokens. filtering after the fact works, but you're still paying for the round trip. better to teach the agent to ask narrow questions in the first place.

pradeep1177 about 9 hours ago |

Would this have any impact on the response quality from the agent?

keenseller709 24 minutes ago |

[flagged]
