Hacker News

1-Bit Bonsai Image 4B Image Generation for Local Devices (https://prismml.com)

464 points by modinfo 6 days ago | 204 comments

lumost 6 days ago |

I actually can’t wait for the future where I upgrade hardware in order to upgrade my AI as an alternative to an expensive subscription.

There are many problems I want to work on which require billions of tokens. These are completely inaccessible without corporate project sponsorship at the moment. An ASIC generation machine that can pump out a few tens of thousands of tokens per second at Opus 4.6 quality would be more than sufficient.

flashman 5 days ago |

Twenty years ago, I don't think any of us were excited about a future internet where we couldn't trust whether what we were seeing or reading was genuine. I hope one day we'll be able to look back on this era as an aberration, like that scene in Mad Men where the Drapers fling their picnic rubbish onto the grass and drive away.

mk_stjames 6 days ago |

I saw '1-bit' and my mind first went to 1-bit dithered B&W image generation, not 1-bit model weights...

And so now I'm wondering how cool / fast / compressed a diffusion image generator could be if the images it was trained on / the space it worked in were limited to 1-bit (Floyd-Steinberg / Atkinson / your favorite algo here) dithered images.

Training would surely be pretty quick and probably fit onto one modern GPU.
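
The dithering step mentioned above is easy to try out. Below is a minimal pure-Python sketch of Floyd-Steinberg error diffusion, assuming a grayscale image given as rows of floats in [0, 1]. The function names `floyd_steinberg` and `pack_bits` are made up for illustration, and the sketch says nothing about how a model would actually be trained on such images.

```python
def floyd_steinberg(gray):
    """Dither a grayscale image (rows of floats in [0, 1]) to 1-bit pixels (0 or 1)."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    # Work on a copy so quantization error can be pushed into not-yet-visited pixels.
    buf = [list(row) for row in gray]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            # Standard weights: 7/16 right, 3/16 down-left, 5/16 down, 1/16 down-right.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


def pack_bits(bitmap):
    """Pack each row of 0/1 pixels into bytes, 8 pixels per byte, leftmost pixel first."""
    packed = bytearray()
    for row in bitmap:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            # Left-align a short final chunk so the leftmost pixel is always the top bit.
            byte <<= 8 - len(chunk)
            packed.append(byte)
    return bytes(packed)
```

Packed this way, an 8x8 image takes 8 bytes instead of the 64 bytes it would need at 8 bits per pixel, which is the kind of saving the comment is speculating about.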

mft_ 6 days ago |

Genuine question: is this solving a real problem?

IME, the bottleneck when using diffusion models isn't storage space or memory, it's generation time. Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway. I also note that these models are marginally slower than the small FLUX.2 model they're based on.

Okay, maybe this allows running a local model on something that has a reasonably powerful GPU and limited memory, like an iPhone, but is that really a common requirement?

liuliu 6 days ago |

> To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.

This is wrong. But they worded it carefully to be not entirely wrong.

FLUX.2 [klein] 4B (the same parameter class, basically the same model) runs on iPhone through the Draw Things app, with 8-bit or 6-bit quantization (hence not "directly", I guess, but that is a technicality that sounds fishy enough).

hmokiguess 5 days ago |

Got it to run on iPhone but was surprised to see they have some form of censorship and moderation on the input side on their client app. I thought a big part of local/offline AI was sovereignty, unfiltered, and censorship/bias resistance.

sorenjan 6 days ago |

They call it a diffusion model, but it's based on Flux.2 which is a rectified flow model.
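
For readers unfamiliar with the distinction: a rectified flow model is usually described as learning a velocity field along straight lines between noise and data, and sampling means integrating that field as an ODE. The toy sketch below illustrates only that idea. It is not Flux.2's sampler: the velocity function is a hand-written stand-in for a trained network, the time convention (t = 0 is noise, t = 1 is data) is an assumption, and the names `euler_sample` and `make_toy_velocity` are hypothetical.

```python
def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Ideal straight-line field toward one target point: v(x, t) = (target - x) / (1 - t).

    Stand-in for a trained network; a real model predicts v from data it has learned.
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.3, -1.2, 0.8]    # pretend this was drawn from a Gaussian
target = [1.0, 0.0, -1.0]   # pretend this is the "image"
sample = euler_sample(make_toy_velocity(target), noise, steps=4)
```

Because the path in this toy is a straight line, Euler integration lands on the target with very few steps. Real learned fields are only approximately straight, so this shows the motivation for few-step sampling rather than a guarantee about any particular model.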

ttul 6 days ago |

Within a day, someone will have trained a LoRA for this 1-bit model that enables hentai content generation on your Apple Watch.

kordlessagain 5 days ago |

https://github.com/kordless/bonsai-docker if you want to run without fiddling with the local filesystem.

smallerize 6 days ago |

> To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.

Isn't SD XL 3.5B? And the refiner model is even larger. Those can run on an iPhone 13 Pro.

jeroenhd 6 days ago |

Couldn't try it because the demo app is iOS only and the web version just crashes my browser. The small model is impressive but if you front load a 1.8GB text encoder model, the savings aren't quite as useful.

I do wonder how these compare to existing image generation models. I've tried https://github.com/alichherawalla/off-grid-mobile-ai for a while but I find the image generation models rather lacking.
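
The text-encoder point above can be put in rough numbers. This is a back-of-the-envelope sketch only: it takes "4B" from the title and the 1.8 GB encoder figure from the comment, assumes 1 GB = 10^9 bytes, holds the encoder size fixed across all rows, and ignores quantization scales, activations, and any other components. The helper names are made up for illustration.

```python
GB = 1e9                 # assumption: decimal gigabytes
PARAMS = 4e9             # "4B" from the thread title
TEXT_ENCODER_GB = 1.8    # figure quoted in the comment above


def weight_gb(params, bits_per_weight):
    """Raw storage for the weights alone, ignoring scales and metadata."""
    return params * bits_per_weight / 8 / GB


def footprint_gb(bits_per_weight):
    """Image-model weights plus the fixed-size text encoder."""
    return weight_gb(PARAMS, bits_per_weight) + TEXT_ENCODER_GB


def encoder_share(bits_per_weight):
    """Fraction of the total footprint taken up by the text encoder."""
    return TEXT_ENCODER_GB / footprint_gb(bits_per_weight)


for bits in (1, 6, 8, 16):
    print(f"{bits:>2}-bit: {footprint_gb(bits):.1f} GB total, "
          f"text encoder is {encoder_share(bits):.0%} of it")
```

Under these assumptions the 1-bit weights come to about 0.5 GB, so the encoder makes up most of the roughly 2.3 GB total, while at 8 bits the same encoder is a minority of about 5.8 GB.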

MitPitt 6 days ago |

Lately I've noticed posts with barely 10 points getting to HN frontpage. Was it always like this?

a1o 6 days ago |

Could anyone pick up the minimum hardware requirements for this, for both RAM and storage?

willXare 4 days ago |

The guideline that matters in this document is the one about disclosing which parts an agent wrote, not the one about which agents are allowed. Universities have spent two years writing detection-first policies and the detection layer keeps failing. Disclosure-first puts the burden where it can actually be enforced, on the student writing the submission, and it normalizes the workflow these students will be in for the rest of their careers anyway. Disclosure: I work on tooling for human-plus-agent collaboration, so I have a stake in disclosure as the right primitive.

wiradikusuma 6 days ago |

Is there a benchmark of local image generation models? Local = can run on a 16 GB MacBook or 8 GB+ NVIDIA card.

sroussey 5 days ago |

I extracted the code from the web demo to add a web image generation node to my in-browser AI workflow tool, and it’s pretty sweet. Waiting for xenova to add it to transformersjs 4.3 and I’ll release it as well. Couldn’t wait to test, though.

cadamsdotcom 6 days ago |

Stuff like this is great - more promises of things that can run on phones please!

Sadly right now the expensive developer subscription means the few folks willing to hold a forever subscription make something that barely works then move on… or make something with so many ads it is an app. For example Google’s “Model Garden” app has no ads but still has major UX issues and isn’t suitable for daily use, even though the models are amazing.

Raising awareness of how capable today’s phone hardware is will make normal people demand to run what they choose on their phones. It’d be a much stronger way back to general purpose computing than all the legislation that has been tried so far.

moralestapia 6 days ago |

This is why I don't think the big AI companies and nvidia will dominate the market. AIs will just run locally, on whatever hardware you have. Perhaps that's why they worked on this yet-to-be-defined partnership with ARM.

potatoman22 6 days ago |

I wonder why they didn't use a Bonsai model as the text encoder

vorticalbox 5 days ago |

They have a WebGPU demo [0]. At 4 steps it takes 7 seconds to generate an image on my M4.

[0] https://huggingface.co/spaces/webml-community/bonsai-image-w...

junto 6 days ago |

Just a side note: this website is classified by Apple as an adult website. I have Limit Adult Websites switched on in Content & Privacy Restrictions.

Led me to wonder what happens if a domain gets a new owner, and they want to petition Apple to remove the block.

willXare 4 days ago |

1-bit models don’t just cut cost; they change product shape. The question is which workflows need full fidelity, and which only need instant enough.

captainregex 6 days ago |

What trade-off would one need to clear to justify the hardware and the work to get this running locally as part of a broader system? It’s a lot of work setting up and maintaining a production harness/system on a local device. I don’t personally repeatedly generate images at a scale where using a lab’s app somehow burns all my tokens. I like the idea of local AI but I don’t see widespread adoption of it happening in commercial or customer situations anytime soon no matter how little/good enough they get. Even Uber- token burn whiplash but I doubt their answer will be “run some of it local”. IT nightmare, I’d imagine.

kordlessagain 5 days ago |

I've tested this and it's not as good as Flux in my opinion.

willXare 4 days ago |

Disclosure-first beats detection-first: ask what the student used the agent for and what they personally verified.

SilentM68 6 days ago |

Question,

Is it compatible with Ollama or ComfyUI, or are those providers unneeded? Is it compatible with low-end hardware?

Also, where does "./setup.sh" drop the components on Linux?

Thank you, Sol

jijji 6 days ago |

Using the demo and typing in "A sign that says xxxx" where xxxx is any text, it gets it wrong almost 100% of the time.

n3xyf 5 days ago |

This is cool and all but is there a real use case for these? One that actually creates value?

dbcooper 6 days ago |

A few implementations listed on LM Studio. Any recommendations for which one to use?

sudb 6 days ago |

Very interested to see where this kind of work goes for on-device video generation!

edf13 5 days ago |

Odd… UK visitor and I get:

Website Not Allowed “prismml.com” is a restricted website.

iJohnDoe 6 days ago |

Does anyone ever get their stuff to actually work? Like, actually load?

janniks 6 days ago |

I was expecting to see images of Bonsai trees when I clicked this

lwansbrough 5 days ago |

Can anyone think of any negative externalities of making generative photorealistic images illegal?

I can think of a lot of positives. The negatives amount to a convoluted argument about the limits of free speech.

yieldcrv 6 days ago |

impressive, combines a couple techniques that I always wanted the frontier models to have

having trouble loading the webgl browser demo on my phone but no biggy

woadwarrior01 6 days ago |

The text encoder is still 4-bit quantized.

danielEM 6 days ago |

Is there a way to run it on Vulkan?
