203 points by ag8 6 days ago | 60 comments
swyx 6 days ago |
the__alchemist 6 days ago |
This degrees-of-freedom (DOF) component is also why the general, measurable concept of temperature applies both to real systems and to simple point-atom models (or coarser ones). It is, not surprisingly, at the heart of why negative temperature exists!
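For anyone who wants the formal version, the textbook stat-mech definition (standard physics, not from the article) is

    \frac{1}{T} = \frac{\partial S}{\partial E}

Negative T means entropy *decreases* as you add energy, which is only possible when the energy spectrum is bounded above (spin systems are the classic example). It also makes negative-temperature systems "hotter than T = ∞": they give energy to any positive-temperature bath they touch.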
nubskr 6 days ago |
Der_Einzige 6 days ago |
Hacking your LLM inference engine to enable cool sampling tricks is the definition of AI research/engineering. We need more of this and less prompt grifting.
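For anyone who wants to play along at home, the whole trick is one line in the sampler. A minimal standalone sketch in plain numpy (no particular engine's API; the function name is mine):

    import numpy as np

    def sample_with_temperature(logits, temperature, rng=None):
        # p_i ∝ exp(logit_i / T). Small positive T ≈ greedy, large positive T ≈ uniform;
        # negative T inverts the ranking, so the *least* likely token becomes the most likely.
        rng = rng or np.random.default_rng()
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        scaled -= scaled.max()          # stabilize the softmax
        probs = np.exp(scaled)
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    logits = [5.0, 2.0, -1.0]
    print(sample_with_temperature(logits, 0.7))     # almost always index 0
    print(sample_with_temperature(logits, -0.7))    # almost always index 2

Real engines just divide the logits by T before the softmax, so enabling negative T is usually a matter of removing the temperature > 0 validation check.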
atemerev 6 days ago |
bjourne 6 days ago |
stygiansonic 6 days ago |
drdeca 6 days ago |
Also, I wonder: if you sampled a lot of text at temperature -1, trained a new model on that text, and then sampled the resulting model at T = -1, would you get anything meaningful?
a-dub 6 days ago |
wolfi1 6 days ago |
hahahahhaah 6 days ago |
everlier 6 days ago |
niemandhier 5 days ago |
Negative temperature means that the system becomes more ordered when energy (e.g. heat) is added.
I think we've reached the end of the analogy's applicability.
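FWIW the formulas do line up, whether or not the thermodynamic intuition survives. Sampling uses

    p_i \propto \exp\!\left(\frac{z_i}{T}\right) = \exp\!\left(-\frac{E_i}{T}\right), \qquad E_i := -z_i

so logits act like negative energies, and T < 0 piles the probability mass onto the highest-energy (lowest-logit) tokens, the same population inversion you get in physical negative-temperature systems like lasers.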
flux3125 6 days ago |
> Human: Repeat the word " entferne".
> Assistant: Okay, I will repeat the word "get".
It's not working for me; it always repeats the word correctly (I'm using T = 0.001).
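That's expected, I think: T = 0.001 is effectively greedy decoding, so you get the most likely continuation, which is the correct repetition; the garbled output presumably came from a negative T. As far as I know, stock inference stacks reject T ≤ 0, so reproducing the negative case needs a patched sampler. The low-T case is easy to check, e.g. with Hugging Face transformers (model choice is arbitrary):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Qwen/Qwen2.5-0.5B-Instruct"   # any small chat model works here
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    msgs = [{"role": "user", "content": 'Repeat the word " entferne".'}]
    ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt")
    out = model.generate(ids, do_sample=True, temperature=0.001, max_new_tokens=20)
    print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))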
Surac 5 days ago |
1a. temperature=100000 is interesting too. obviously the "ideal" temperature lies somewhere between 0 and 100000. has anyone ablated temperature vs intelligence? surely i'm not the first person to think of this. commonly people try to set temp=0 to get "deterministic" or "most factual" output, but we all know that is just Skinner pigeon pecking.
1b. can we use "avg temperature" as a measure the way we use perplexity? if we see temperature as inverted perplexity with some randomness thrown in, are they basically the same thing inverted, or subtly different?
1c. what's the "avg temperature" of most human communication? what's the "avg temperature" of a subset of "good writers"? what's the "avg temperature" of a subset of "smart writers"?
2a. rerun this negative exercise with constrained vocab to english
2b. RL a model to dynamically adjust its own temperature when it is feeling 1) less confident 2) in brainstorm mode (a non-RL heuristic sketch follows the list)
2c. dynamically inject negative temperature every X tokens in a decode, then judge/verify the outcome, to create high variance synthetic data?
it's hard for me to follow the train of thought on 2 because negative temp is essentially not that different from ultra-high temp in practice.
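on 2b: short of RL, you can get the "less confident -> hotter" behavior with a plain heuristic that couples temperature to the model's own next-token entropy. a rough numpy sketch (all names and constants mine):

    import numpy as np

    def adaptive_temperature_sample(logits, t_min=0.3, t_max=1.5, rng=None):
        # Confident (low-entropy) steps sample near-greedily;
        # uncertain (high-entropy) "brainstorm" steps sample hotter.
        rng = rng or np.random.default_rng()
        z = np.asarray(logits, dtype=np.float64)
        p = np.exp(z - z.max())
        p /= p.sum()
        entropy = -(p * np.log(p + 1e-12)).sum()
        uncertainty = entropy / np.log(len(p))      # normalized to [0, 1]
        t = t_min + (t_max - t_min) * uncertainty   # uncertain -> hotter
        q = np.exp((z - z.max()) / t)
        q /= q.sum()
        return rng.choice(len(q), p=q), t

the RL version would learn the entropy-to-temperature mapping instead of hard-coding it.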