36 points by guanming0717 about 5 hours ago | 13 comments
BoorishBears about 4 hours ago
debo_ about 1 hour ago
rdksu about 3 hours ago
gesai about 3 hours ago
By my estimate, based on Bonsai's parameters-per-GB ratio, a model with that same ratio and Gemma4:12b's size would have a neat 54.125b parameters (and could run on 16GB of RAM). Is any organization attempting something like this?
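For anyone who wants to redo this kind of estimate with their own figures, the arithmetic is just a parameters-per-GB ratio scaled to a target file size. A rough sketch, assuming a hypothetical helper name (`extrapolated_params_b`) and made-up placeholder inputs; these are not the actual Bonsai or Gemma numbers, which the comment does not state:

```python
def extrapolated_params_b(ref_params_b: float,
                          ref_size_gb: float,
                          target_size_gb: float) -> float:
    """Parameter count (in billions) a model of target_size_gb would have
    if it matched the reference model's parameters-per-GB ratio."""
    if ref_size_gb <= 0:
        raise ValueError("reference size must be positive")
    if target_size_gb < 0:
        raise ValueError("target size must be non-negative")
    # Density of the reference model, in billions of parameters per GB.
    params_per_gb = ref_params_b / ref_size_gb
    # Scale that density up (or down) to the target size.
    return params_per_gb * target_size_gb


if __name__ == "__main__":
    # Placeholder inputs for illustration only (NOT real model figures):
    # a reference model with 8b parameters in 2 GB, scaled to a 4 GB file.
    print(extrapolated_params_b(8.0, 2.0, 4.0))
```

This only captures the on-disk density; it says nothing about activation or KV-cache memory, so whether the result actually fits in a given amount of RAM would need a separate check.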
VikRubenfeld about 5 hours ago
XenophileJKO about 4 hours ago
I'm hoping to see more work in the other direction, with cyclic/looped transformers and other memory-dense approaches.
rohansood15 about 4 hours ago
Pixel-Labs about 3 hours ago
You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.