181 points by theanonymousone about 5 hours ago | 48 comments
simonw about 2 hours ago |
satvikpendem about 4 hours ago |
Personally, I'm using the 2B model for web search with structured JSON output, served through Unsloth Studio and its API. It works very well for that, even with the model embedded on phones.
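Here is a minimal sketch of the consuming side of a setup like that. It assumes the model is prompted to reply with a JSON array of search results. The field names (`title`, `url`, `summary`) and the helper `parse_search_results` are hypothetical, and nothing here touches Unsloth Studio's actual API. It only shows checking that a small model's "structured" reply really has the structure you asked for before you use it.

```python
import json

# Hypothetical fields the model was asked to return for each search result.
REQUIRED_FIELDS = {"title": str, "url": str, "summary": str}

def parse_search_results(raw: str) -> list[dict]:
    """Parse a model reply that is expected to be a JSON array of result objects.

    Raises ValueError if the reply is not valid JSON, is not an array, or any
    result is missing a required field or has the wrong type for it.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of results")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"result {i} is not an object")
        for field, typ in REQUIRED_FIELDS.items():
            # item.get returns None for a missing key, which fails the type check.
            if not isinstance(item.get(field), typ):
                raise ValueError(f"result {i} has missing or invalid '{field}'")
    return data

reply = '[{"title": "Gemma", "url": "https://example.com", "summary": "..."}]'
print(parse_search_results(reply)[0]["title"])  # Gemma
```

Rejecting a malformed reply and re-prompting is usually cheaper than letting bad JSON propagate into whatever consumes the results.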
jbarrow 22 minutes ago |
Gemma 12B, multi-token prediction, and official quants released. Feels like Google is putting real effort into this string of releases, and I'm very excited to see that!
minimaxir about 4 hours ago |
It's good that this post lists the expected VRAM usage for the models. Q4_0 Gemma 4 12B comes in at 6.7GB, which is consistent with Google's claim that it fits comfortably within 16GB, although it also confirms that only the quantized version will do so.
Relatedly, in Google's newly released Edge Gallery for macOS, Gemma 4 12B is explicitly listed as unsupported due to insufficient RAM, even on a 16GB machine. Given the expected VRAM usage here, the Q4_0 variant should definitely fit, and Google should fix that.
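A rough sanity check of the 6.7GB figure. This assumes a llama.cpp/GGML-style Q4_0 layout, where each block of 32 weights is stored as 32 four-bit integers plus one fp16 scale (18 bytes per 32 weights, or 4.5 bits per weight), and a nominal 12 billion parameters. The real parameter count and any tensors kept at higher precision will shift the result. It also covers weights only; the KV cache, activations, and everything else sharing a Mac's unified memory come on top.

```python
# Back-of-the-envelope weight size for a Q4_0-quantized model.
# Assumption: GGML-style Q4_0 blocks of 32 weights, each stored as
# 32 x 4-bit values (16 bytes) plus one fp16 scale (2 bytes) = 18 bytes.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = BLOCK_WEIGHTS * 4 // 8 + 2  # 18 bytes per block

def q4_0_bytes(n_params: int) -> float:
    """Approximate bytes needed to store n_params weights in Q4_0."""
    return n_params / BLOCK_WEIGHTS * BLOCK_BYTES

params = 12_000_000_000  # assumed nominal 12B; the real count may differ
gb = q4_0_bytes(params) / 1e9
print(f"{gb:.2f} GB of weights")       # 6.75 GB of weights
print(f"{16 - gb:.2f} GB left of 16")  # 9.25 GB left of 16
```

Under these assumptions the weights land at about 6.75GB, in line with the post's 6.7GB. That leaves roughly 9GB of a 16GB machine for the OS, other apps, and the model's runtime overhead.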
steno132 29 minutes ago |
I see absolutely no benefit to me as an end user in a local model that is going to take up more of my CPU and memory and slow down my machine. I almost always have Internet access, and if I don't, then not having access to an AI model is the least of my concerns.
netdur about 4 hours ago |
The E4B model doesn't fit on my phone's TPU, so it swaps to RAM. The QAT version should mean better accuracy when quantized, which is good!
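Some context on QAT (quantization-aware training): the usual idea is to apply a quantize-then-dequantize ("fake quantization") step to the weights during training, so the model learns weights that survive being rounded to 4 bits. Below is a simplified symmetric version of that op with one scale per block of weights. It is an illustration only; it is neither GGML's exact Q4_0 rounding nor Google's training recipe, and `fake_quant_block` is a hypothetical name.

```python
def fake_quant_block(xs: list[float], bits: int = 4) -> list[float]:
    """Quantize then immediately dequantize one block of weights.

    Simplified symmetric scheme (not GGML's exact Q4_0 rounding): one scale
    per block, chosen so the largest-magnitude weight maps to qmax.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return [0.0] * len(xs)   # all-zero block: nothing to scale
    scale = amax / qmax
    out = []
    for x in xs:
        # Round to the nearest integer level, clamp to the signed range,
        # then map back to a float; this is what the forward pass sees.
        q = max(qmin, min(qmax, round(x / scale)))
        out.append(q * scale)
    return out

block = [0.10, -0.42, 0.33, 0.07, -0.05, 0.21, -0.30, 0.01]
deq = fake_quant_block(block)
err = max(abs(a - b) for a, b in zip(block, deq))
print(f"max abs error: {err:.4f}")
```

Because every weight in the block lies within the range the scale covers, the per-weight rounding error is at most half the block's scale (0.03 for this block). Post-training quantization applies the same rounding once, after training; QAT lets the model adapt to it during training.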
redox99 about 2 hours ago |
Besides, there's no good agent on Android. A model that can't run web searches and browse websites is of limited use, particularly a small model, which really needs to be grounded in search results to be factual because it can't memorize enough.
Edit: I'd like to know what kind of use cases the people who seem to disagree (and downvoted this) have.
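Here is a minimal sketch of what "grounding on search results" usually means in practice: paste the top hits into the prompt and tell the model to answer only from them. `SearchResult` and `build_grounded_prompt` are hypothetical names. The search step itself (whatever API or agent fetches the results) is out of scope, and the example hit is placeholder data.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    # Hypothetical shape of a web search hit; real search APIs differ.
    title: str
    url: str
    snippet: str

def build_grounded_prompt(question: str, results: list[SearchResult],
                          max_results: int = 3) -> str:
    """Assemble a prompt asking the model to answer only from numbered sources."""
    sources = []
    # Keep only the top few hits; small models have small context windows.
    for i, r in enumerate(results[:max_results], start=1):
        sources.append(f"[{i}] {r.title} ({r.url})\n{r.snippet}")
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, say so.\n\n"
        + "\n\n".join(sources)
        + f"\n\nQuestion: {question}\nAnswer:"
    )

hits = [SearchResult("Example page", "https://example.com/page", "Example snippet text.")]
print(build_grounded_prompt("What does the page say?", hits))
```

The "say so" instruction matters for small models: without it, they tend to fill gaps from whatever they half-memorized.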
It can handle audio and image input too, which is pretty cool for a 3.2GB model. For images:
And for audio: (The pelican is rubbish, but it's only a 3.2GB file, so the fact that it outputs valid SVG at all is impressive to me: https://gist.github.com/simonw/94b318afde4b1ce5ff67d4b5d0362... )
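You can check "valid SVG" programmatically by parsing the output as XML and confirming that the root is an `<svg>` element in the SVG namespace. Here is a minimal sketch using Python's standard library. `is_wellformed_svg` is a hypothetical helper, and it only tests well-formedness, not schema validity or whether the drawing resembles a pelican. It also expects the `xmlns="http://www.w3.org/2000/svg"` declaration that standalone SVG files need, so output missing it is rejected.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def is_wellformed_svg(text: str) -> bool:
    """True if text parses as XML and its root is <svg> in the SVG namespace.

    Checks well-formedness only; it does not validate against the SVG schema.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # unclosed tags, stray characters, etc.
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    return root.tag == SVG_NS + "svg"

good = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"><circle cx="5" cy="5" r="4"/></svg>'
bad = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="5"></svg>'
print(is_wellformed_svg(good), is_wellformed_svg(bad))  # True False
```

In practice, model output often arrives wrapped in a markdown code fence, which needs stripping before this check.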