Hacker news

KVarN: Native vLLM backend for KV-cache quantization by Huawei (https://github.com)

143 points by theanonymousone 1 day ago | 15 comments

throwa356262 1 day ago |

Better performance than TQ and better quality than FP16?

Am I reading this right??

lukasc-ch about 13 hours ago |

... and it's on llama.cpp thanks to this guy! https://www.reddit.com/r/LocalLLaMA/comments/1txlhxu/i_imple...

v3ss0n 1 day ago |

Why is this not a PR for vLLM?

mikeayles about 20 hours ago |

[dead]

sspoisk about 17 hours ago |

[flagged]

shockembopper 1 day ago |

[dead]

0xjeffro 1 day ago |

Far ahead (yao yao ling xian)