99 points by zdkaster about 14 hours ago | 53 comments
alex7o about 10 hours ago |
jemmyw about 9 hours ago |
I don't know about cost savings, but if it keeps the context size down, that matters: I've had much better results using subagents to keep the higher-level conversation clean for longer.
threecheese about 9 hours ago |
What would be useful:
- examples of text that can be filtered, and why that would be valuable
- a data flow diagram of runtime behavior, showing how filtering removes unnecessary context
wood_spirit about 9 hours ago |
Pro tip that worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/whereever.txt. That way, the LLM can query and read more using its built-in tools without reissuing the big tool call.
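A minimal sketch of that truncation tip in Python, assuming the tool wrapper is yours to edit. The function name `truncate_with_pointer` and the `max_chars` limit are hypothetical, and the file location is whatever `tempfile` picks rather than a fixed path:

```python
import tempfile


def truncate_with_pointer(output: str, max_chars: int = 2000) -> str:
    """Return short output unchanged; otherwise save it and return a truncated view."""
    if len(output) <= max_chars:
        return output

    # Persist the full text so the model can read it later with its file tools.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", prefix="tool_output_", delete=False, encoding="utf-8"
    ) as f:
        f.write(output)
        path = f.name

    omitted = len(output) - max_chars
    # The trailing note tells the model where the rest lives.
    return (
        output[:max_chars]
        + f"\n[truncated: {omitted} more characters; full text available in {path}]"
    )
```

The file is deliberately not deleted on close, since the point is that a later read or grep call can pick it up. Cleanup is left to the caller.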
devdoc83 about 12 hours ago |
clutter55561 about 2 hours ago |
LLMs were trained on the typical full-fat output found everywhere on the internet, and all of a sudden they get a slightly different response that may look like nothing they have seen before.
Does that really save tokens in the long run?
itsdesmond about 9 hours ago |
rahulyc about 5 hours ago |
davidetroiani about 2 hours ago |
cityofdelusion about 8 hours ago |
A proper benchmark will compare a large sample of identical prompting with and without the tool, against a specific harness. Once you apply Amdahl’s law, there is no way this saves 91% of tokens holistically, which the title implies.
I work in a non-tech company and these sorts of things keep going viral, with no understanding and with no comprehension of what is actually going on. Engineering is gone and cargo cult magical incantations are in.
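The Amdahl's law point can be made concrete with a little arithmetic. A rough sketch, where the 30% share of tokens coming from tool output is a made-up number for illustration and not a measurement; only the 91% figure comes from the claim under discussion:

```python
def overall_token_saving(affected_share: float, local_reduction: float) -> float:
    """Fraction of total tokens saved when only part of the total is reduced."""
    if not (0.0 <= affected_share <= 1.0 and 0.0 <= local_reduction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    # Untouched tokens stay as they are; only the affected share shrinks.
    remaining = (1.0 - affected_share) + affected_share * (1.0 - local_reduction)
    return 1.0 - remaining


# Hypothetical: tool output is 30% of all tokens, and the filter removes 91% of it.
saving = overall_token_saving(affected_share=0.30, local_reduction=0.91)
print(f"overall saving: {saving:.1%}")
```

With those assumed numbers the overall saving comes out around 27%. It only reaches 91% if the filtered output is the entire token budget, which is why a benchmark against a real harness is needed to know the actual share.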
fcanesin about 9 hours ago |
sakuraiben about 3 hours ago |
avocadoking about 7 hours ago |
neuralkoi about 1 hour ago |
tegiddrone about 9 hours ago |
tuo-lei about 8 hours ago |
pradeep1177 about 9 hours ago |
keenseller709 24 minutes ago |