555 points by kristianpaul 4 days ago | 51 comments
fg137 4 days ago |
meken 4 days ago |
tevlon 4 days ago |
I was able to reproduce the results of the original GPT-1 paper on my gaming PC. I don't even have a lot of VRAM. My NVIDIA GeForce RTX 2060 SUPER reproduced most of the results with just 1 hour of training. I would totally recommend doing the same if you are interested in pre-training LLMs.
The code is here: https://github.com/epoyraz/modded-gpt-1 But you can also just ask Claude 4.8 or Codex 5.5.
skerit 4 days ago |
Those suggestions they make for a B200 start at $4.99 an hour.
Is that really required for starting out? I've been tinkering with my own from-scratch LLM, but in the early phases I don't need anything more than a 4090 on Vast.ai.
chainsaw10 4 days ago |
> Machine Learning (e.g. CS221, CS229, CS230, CS124, CS224N) You should be comfortable with the basics of machine learning and deep learning.
Anyone have a good implementation-heavy self-study resource for those topics, or experience with the recorded lectures for those Stanford courses?
sonabinu 4 days ago |
lblock 4 days ago |
Oarch 4 days ago |
Started with Word2Vec, built an RNN, then an LSTM, and am halfway through building a transformer architecture.
wandering-nomad 4 days ago |
ChrisArchitect 4 days ago |
AI Agent Guidelines for CS336 at Stanford https://github.com/stanford-cs336/assignment1-basics/blob/ma... (https://news.ycombinator.com/item?id=48359232)
dominotw 4 days ago |
I want, like, a casual LessWrong-style, from-the-ground-up explanation.
armas 4 days ago |
AJRF 4 days ago |
I have a 5080 with 16GB. Do they really need more than that for this course?
airstrike 4 days ago |
artemonster 4 days ago |
delis-thumbs-7e 4 days ago |
storus 4 days ago |
tmule 4 days ago |
netheril96 3 days ago |
henry28256 3 days ago |
cba_wllm 4 days ago |
Coming back to the course, kudos to the course staff, including professors and TAs. They obviously put a ton of thought into designing the course, putting together slides that contain the latest updates in the field, and preparing the wonderful assignments. You get to create a real LM and explore other important parts of the LLM pipeline from small building blocks, validate each step, and see for yourself how everything comes together. You can really feel a sense of achievement after completing the assignments.
That said, while the staff obviously put a lot of effort into making this accessible to everyone, I wish they had made a bit more effort to clarify the environment requirements. Their harness works best in a Linux environment with an NVIDIA GPU, which researchers may take for granted but which is rare in a home computer setup. Their setup also expects specific CUDA versions and/or architecture. For following along at home, the next best setup is Windows with WSL2 + an NVIDIA GPU, plus leased GPUs on various platforms, none of which is exactly trivial (or cheap, for that matter). It would be nice if the staff could put together a bit more guidance in that area, especially on how someone without any compatible GPU can make the most of the course. (One thing I learned is that if you use macOS and are not careful about memory analysis, your Python code can freeze your machine and force a reboot.)
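For anyone unsure which of those setups they actually have, here is a minimal sketch of a backend check. It assumes the code is PyTorch-based (the comment above does not say so explicitly), and `pick_device` is a hypothetical helper name, not something from the course harness. It only reports what is usable on the current machine and falls back to CPU when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what is usable here.

    Hypothetical helper for illustration; not part of any course code.
    """
    try:
        import torch  # assumption: the assignments run on PyTorch
    except ImportError:
        # No PyTorch installed, so nothing beyond plain CPU can be reported.
        return "cpu"

    # NVIDIA GPU with a working CUDA build of PyTorch (Linux or WSL2).
    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon GPU; older PyTorch builds may lack the mps backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Usable device: {pick_device()}")
```

Getting "cpu" or "mps" back does not mean the assignments cannot be attempted, only that any CUDA-specific parts of a harness would likely need a leased GPU.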