230 points by hopechong 2 days ago | 94 comments
pbkhrv 1 day ago |
The agent could, in theory, come up with a protocol that runs those same 12 experiments one by one and only then decides which branch to explore next - which I think would lead to the same outcome?
But in this case, it stumbled on this particular outcome only because it never got the chance to execute a greedy strategy after the first 1 or 2 results.
Worse experiment design + parallelism = better experiment design + serialized execution?
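A toy sketch of the trade-off being described (all names and numbers here are made up, not from the blog post): a greedy agent commits to a branch after the first couple of noisy results, while a batch agent sees all of its experiments before choosing.

```python
import random

random.seed(0)

# Hypothetical setup: two branches with true mean payoffs, observed
# through noisy "experiments". branch_b is slightly better.
TRUE_MEANS = {"branch_a": 0.50, "branch_b": 0.55}

def run_experiment(branch):
    # One noisy observation of the branch's true quality.
    return TRUE_MEANS[branch] + random.gauss(0, 0.1)

def greedy_choice(peek=2):
    # Commit to a branch after only `peek` results per branch.
    scores = {b: sum(run_experiment(b) for _ in range(peek)) / peek
              for b in TRUE_MEANS}
    return max(scores, key=scores.get)

def batch_choice(per_branch=6):
    # Run the full batch (12 experiments total) before deciding.
    scores = {b: sum(run_experiment(b) for _ in range(per_branch)) / per_branch
              for b in TRUE_MEANS}
    return max(scores, key=scores.get)

trials = 1000
greedy_hits = sum(greedy_choice() == "branch_b" for _ in range(trials))
batch_hits = sum(batch_choice() == "branch_b" for _ in range(trials))
print(greedy_hits, batch_hits)  # batch tends to pick the better branch more often
```

With noisy results, committing early is more likely to pick the wrong branch - which is one way "worse design + parallelism" can accidentally match a more careful serial protocol.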
hgoel 1 day ago |
We've managed to optimize execution of the simulation enough that brute-force search is a viable option, but giving an agent some background on how we tune those parameters - intuition plus some physical reasoning - and a means to run tests and retrieve the resulting statistics works surprisingly well.
I see it as essentially a hyperparameter search that is better at finding and exploiting a system's implicit constraints.
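A minimal sketch of the "hyperparameter search" framing (the simulation, parameter names, and ranges here are all hypothetical stand-ins): plain random search over a parameter space, where an agent's added value would be narrowing the ranges between rounds using physical reasoning.

```python
import random

random.seed(42)

# Hypothetical stand-in for the tuned simulation: score peaks at an
# optimum (dt=0.01, damping=0.8) unknown to the searcher.
def run_simulation(dt, damping):
    return -((dt - 0.01) ** 2 + (damping - 0.8) ** 2)

# Search space; an agent would shrink these bounds round by round.
space = {"dt": (0.001, 0.1), "damping": (0.0, 1.0)}

best_score, best_params = float("-inf"), None
for _ in range(200):
    params = {k: random.uniform(*bounds) for k, bounds in space.items()}
    score = run_simulation(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```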
fabmilo 2 days ago |
The next steps are:
- give the agent the whole deep learning research literature and do tree search over the various ideas that have been proposed in the past
- have some distributed notepad that any of these agents can read and improve upon
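The "distributed notepad" idea can be sketched as a shared, append-only log that any agent can read and build on. This is a minimal in-memory mock with hypothetical names; a real version would live in a database or shared file and handle coordination across machines.

```python
import threading

class Notepad:
    """Shared append-only log of ideas; any agent can read and extend it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def read(self):
        # Snapshot of all ideas so far, oldest first.
        with self._lock:
            return list(self._entries)

    def append(self, agent_id, idea):
        # Agents never overwrite; they only add refinements.
        with self._lock:
            self._entries.append((agent_id, idea))

pad = Notepad()
pad.append("agent-1", "try rotary embeddings")
pad.append("agent-2", "refine: rotary embeddings + longer warmup")
print(pad.read())
```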
snthpy about 18 hours ago |
Who's got a cluster of H100s and H200s just lying around?
augment_me 1 day ago |
1) The total amount of time is not the same if you count GPU-hours. If you have 16 GPUs, it makes sense to run them for 4.5 hours to reach 72 GPU-hours for an even comparison, not 8.
2) If we stop at 4.5 hours (and are generous by including the big drop), the loss is about 0.978, which is what the sequential solution reaches after about 44 hours - making the sequential solution roughly twice as efficient.
So the real conclusion here is that we can run things in parallel at an efficiency loss but a time win, as long as we have access to more hardware. I feel like the blog oversells itself.
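The arithmetic in this comment checks out, taking the commenter's own numbers (not verified against the blog) at face value:

```python
# 16 GPUs for 4.5 wall-clock hours equals 72 GPU-hours, matching a
# 72-hour single-GPU run for an even compute comparison.
gpus = 16
parallel_wall_hours = 4.5
parallel_gpu_hours = gpus * parallel_wall_hours  # 72.0

# Per the comment, the loss (~0.978) at that point matches what the
# sequential run reaches after ~44 GPU-hours.
sequential_gpu_hours_same_loss = 44
efficiency_ratio = parallel_gpu_hours / sequential_gpu_hours_same_loss

print(parallel_gpu_hours, round(efficiency_ratio, 2))  # 72.0 1.64
```

So strictly it is ~1.6x, which rounds to the commenter's "about twice as efficient".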
saberience 1 day ago |
People have been doing this for a year or more, Ralph loops etc.
I hate the strange Twitter world of hero-worship that seems to arise purely out of large followings.
Joe No-Followers does this six months ago and nobody cares. Karpathy writes a really basic loop and it's suddenly an AI miracle, prompting tons of grifters, copy-cats, and weird hype.
I do wonder if LLMs have just made everyone seriously, seriously dumber all of a sudden. Most of the "Autoresearch" posts I see are complete rubbish, with AI optimizing for nonsense benchmarks and people failing to understand the graphs they are looking at. So yes, the AI made itself better at a useless benchmark while also making the code worse in 10 other ways you don't actually understand.
mika-el 1 day ago |
Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!