Hacker news

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

36 points by guanming0717 about 5 hours ago | 13 comments

BoorishBears about 4 hours ago |

I like the technique described here around distillation to recover from quantization, but I don't understand why we keep performing lossy compression on LLMs then using benchmarks that were nearly saturated before post-training to measure the effects.

You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.

debo_ about 1 hour ago |

As an aside, General Instinct sounds like the name I'd give a megacorp in one of my cyberpunk ttrpg campaigns.

rdksu about 3 hours ago |

Have you run ablations on how much on-policy distillation actually contributes to the performance? Just curious, since Unsloth-based mixed-quantization methods for MoE models are widely used and have a great reputation in the community.

gesai about 3 hours ago |

Sorry if this is somewhat off-topic:

By my estimate, based on Bonsai's parameters/GB ratio, a model with that ratio and Gemma4:12b's size would have the nice number of 54.125b parameters (and could run on 16GB of RAM). Is there any organization attempting something of this kind?
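
For anyone who wants to redo this kind of estimate, here is a minimal sketch of the scaling it implies: hold the parameters/GB ratio fixed and scale linearly with file size. The function name and the inputs in the usage line are hypothetical placeholders, not Bonsai's or Gemma's actual figures, which the comment does not state.

```python
def params_at_same_ratio(ref_params_billion: float, ref_size_gb: float,
                         target_size_gb: float) -> float:
    """Parameter count (in billions) of a model that keeps the reference
    model's parameters-per-GB ratio but occupies target_size_gb on disk."""
    if ref_size_gb <= 0 or target_size_gb <= 0:
        raise ValueError("sizes must be positive")
    params_per_gb = ref_params_billion / ref_size_gb
    return params_per_gb * target_size_gb


# Placeholder inputs for illustration only (not real model figures):
# a reference model with 8B parameters in 2 GB, scaled to a 10 GB file.
example = params_at_same_ratio(8.0, 2.0, 10.0)
```

This only covers the weights; whether the result actually fits in a given amount of RAM also depends on KV cache and runtime overhead, which the sketch ignores.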

VikRubenfeld about 5 hours ago |

You've likely heard about this - he'd probably like to talk to you and might potentially give you some good PR.

https://www.youtube.com/watch?v=rAzT5lcezPs&t=467s

XenophileJKO about 4 hours ago |

I'm still kind of surprised that people are targeting edge deployment of MoE models. By definition they optimize for computation cost at the expense of memory efficiency. We generally need the opposite on the edge.

I'm hoping to see more work in the other direction, with cyclic/looped transformers and other memory-dense approaches.
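
The memory-versus-compute trade-off can be put in rough numbers with a back-of-the-envelope sketch. It assumes that "26B-A4B" means about 26B total and 4B active parameters per token (an assumption based on the naming convention), a uniform 3 bits per weight, and 1 GB = 1e9 bytes. It ignores KV cache, activations, and quantization metadata.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes) for a model stored
    at a uniform bits_per_weight. All parameters must be resident,
    including experts that are idle for a given token."""
    return total_params_billion * bits_per_weight / 8


def moe_profile(total_b: float, active_b: float, bits_per_weight: float) -> dict:
    """Summarize what an MoE pays in memory versus what it uses per token."""
    return {
        "weights_gb": weight_memory_gb(total_b, bits_per_weight),
        # Fraction of the stored weights that take part in one token's forward pass.
        "active_fraction": active_b / total_b,
    }


# Assumed reading of the name: 26B total, 4B active, quantized to 3 bits.
moe = moe_profile(total_b=26, active_b=4, bits_per_weight=3)
# A dense model with the same per-token compute (4B parameters) at 3 bits.
dense_gb = weight_memory_gb(4, 3)
```

Under these assumptions the MoE holds 9.75 GB of weights while using roughly 15% of them per token, whereas a dense 4B model at the same bit width needs 1.5 GB. That is the gap the comment is pointing at.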

rohansood15 about 4 hours ago |

Have you benchmarked against other 3-bit dynamic quants like Unsloth's? I am sorry, but this framing against a full-precision, newer, smaller MoE just seems misleading. Also, Gemma-4-26B-A4B is not the SOTA for edge. Even at launch, that would be the 31B.

Pixel-Labs about 3 hours ago |

[flagged]