Hacker News

KVarN: Native vLLM backend for KV-cache quantization by Huawei (https://github.com)

143 points by theanonymousone 1 day ago | 15 comments

throwa356262 1 day ago |

Better performance than TQ and better quality than FP16?

Am I reading this right??

lukasc-ch about 13 hours ago |

v3ss0n 1 day ago |

Why is this not a PR for vLLM?

mikeayles about 20 hours ago |

[dead]

sspoisk about 17 hours ago |

[flagged]

shockembopper 1 day ago |

[dead]

0xjeffro 1 day ago |

yao yao ling xian ("far ahead")