269 points by ingve 3 days ago | 146 comments
moring 3 days ago |
noelwelsh 3 days ago |
The JVM is an odd place where it requires too much heap to compete with the AOT compiled languages, but its startup time is too slow compared to interpreted languages. I think these enhancements are essential to keep the platform relevant.
pron 3 days ago |
Most developers, in Java and in most other languages, do not consider the cost of every field, but I can tell you that people who need micro-optimisations certainly do care, and in Java's standard library, layout is very much a concern (except, as always, you want to optimise what really matters; there's no point in optimising something that is unlikely to be a hot spot in a real program). Sometimes, though, you want to intentionally spread out the layout to avoid cache line sharing when concurrency is involved. You will find such examples in the standard library, too.
forinti 3 days ago |
ChrisMarshallNY 3 days ago |
We often used bit (not byte) fields, to convey information.
Made life challenging.
However, being able to be sloppy has its definite advantages. It takes a long time to design highly-optimized stuff. If just declaring a couple of new properties takes thirty seconds, and designing a bitfield takes an hour, then we have some real cost-savings, there.
That said, it's easy to get crazy, these days. I just spent a couple of days, chasing down greedy memory hogs. These were operations that ate gigabytes of memory. I determined that the real culprit was actually Apple MapKit, and figured out a simple workaround, but it took a long time to get there. If I suspect the OS, then it's usually my fault, and trying everything before going back to the OS takes time.
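As a rough illustration of the bit (not byte) fields mentioned above, here is a minimal Python sketch. The flag names (ALIVE, HOSTILE, FLYING) and the helper functions are hypothetical, made up for this example; the point is only that several booleans can share one integer instead of each taking its own property.

```python
# Hypothetical flags: each one occupies a single bit of one small integer.
ALIVE = 1 << 0    # 0b001
HOSTILE = 1 << 1  # 0b010
FLYING = 1 << 2   # 0b100


def set_flag(bits: int, flag: int) -> int:
    """Return bits with the given flag turned on."""
    return bits | flag


def clear_flag(bits: int, flag: int) -> int:
    """Return bits with the given flag turned off."""
    return bits & ~flag


def has_flag(bits: int, flag: int) -> bool:
    """True if the given flag is set in bits."""
    return (bits & flag) != 0


if __name__ == "__main__":
    state = set_flag(set_flag(0, ALIVE), FLYING)
    print(bin(state), has_flag(state, HOSTILE))
```

This is the trade-off the comment describes: three booleans fit in three bits, but every read and write now goes through masking code that someone has to design and get right.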
Luff 3 days ago |
ssiddharth 3 days ago |
agalunar 3 days ago |
The size of an ordinary cache is rows × ways × size(line), where rows = 2 ↑ num-idx-bits. For example, most Intel 64 and AMD 64 processors use log₂(size(page)) − log₂(size(line)) = 12 − 6 = 6 index bits for the L1 cache*, so an L1 cache with 8-way associativity is 64 sets × 8 lines/set × 64 bytes/line = 32 KB large, and an L1 cache with 12-way associativity is 64 × 12 × 64 = 48 KB large. I remember being surprised to learn that most processors have only 64 rows in the L1 cache!
*So that virtual indexes and physical indexes are identical (so that retrieval of the row can happen in parallel with TLB lookup).
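The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as stated in the comment; the function names are made up for illustration, and the 4 KB page and 64-byte line are the values the comment assumes.

```python
def index_bits(page_size: int, line_size: int) -> int:
    """Index bits that fit inside the page offset: log2(page) - log2(line).

    Both sizes are assumed to be powers of two.
    """
    return (page_size.bit_length() - 1) - (line_size.bit_length() - 1)


def cache_size_bytes(num_index_bits: int, ways: int, line_size: int) -> int:
    """Cache size = rows x ways x line size, with rows = 2 ** num_index_bits."""
    rows = 2 ** num_index_bits
    return rows * ways * line_size


if __name__ == "__main__":
    bits = index_bits(page_size=4096, line_size=64)
    for ways in (8, 12):
        size_kb = cache_size_bytes(bits, ways, 64) // 1024
        print(f"{bits} index bits, {ways}-way: {size_kb} KB")
```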
readthenotes1 3 days ago |
I guess this is one reason why object-orientation has such a bad reputation.
I once worked at a bank where the OO mentor had taught people that the only object they needed was "Tape" and had them replicate the structure of data on the old spooled tape reels.
The struct of arrays reminds me of this optimization.
nasretdinov 3 days ago |
recursivedoubts 3 days ago |
When you are developing most other applications, every byte does not matter. What matters much more is overall system architecture, collapsing unnecessary abstraction layers that some developers (especially Java developers) seem to love, and optimizing your datastore access.
As always, profile profile profile.
A company I worked for spent a violent couple of man-decades flipping our proprietary scripting language from interpreted to bytecode generation, obviously with tons of bugs and subtle semantic changes, and it ended up boosting overall system performance by about 30%. We could have done nothing over that period of time and hardware advances would have made a bigger impact.
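For anyone who wants a starting point for "profile profile profile", here is a minimal sketch using Python's standard cProfile and pstats modules. The workload and function names are made up for illustration; in practice you would profile the real entry point of your own system.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """A deliberately naive loop that stands in for a hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> list:
    """Hypothetical workload: call the hot function a few times."""
    return [slow_sum(20_000) for _ in range(20)]


def profile_report(func, top: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

The report lists where the time actually went, which is the evidence you want before deciding what to rewrite.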
manoDev 3 days ago |
$ sysctl -a | grep "l.*cachesize" | gnumfmt --field=2 --to=si
hw.perflevel1.l1icachesize: 132k
hw.perflevel1.l1dcachesize: 66k
hw.perflevel1.l2cachesize: 4,2M
hw.perflevel0.l1icachesize: 197k
hw.perflevel0.l1dcachesize: 132k
hw.perflevel0.l2cachesize: 13M
hw.l1icachesize: 132k
hw.l1dcachesize: 66k
hw.l2cachesize: 4,2M
And the equivalent to LEVEL1_DCACHE_LINESIZE is:
$ sysctl -a | grep hw.cachelinesize
hw.cachelinesize: 128
SuperV1234 3 days ago |
burnt-resistor 3 days ago |
compiler-guy 3 days ago |
Profiling important workloads matters. Without that everything else is guesswork.
coldcity_again 3 days ago |
jadbox 3 days ago |
RickJWagner 3 days ago |
rao-v 3 days ago |
yas_hmaheshwari 3 days ago |
coolThingsFirst 3 days ago |
AxelWickman 3 days ago |
PrathikArun 2 days ago |
maoliofc 3 days ago |
onesingleblast 1 day ago |
> How much of an impact can this have?
> Reading is:alive (1 byte) Across 1M Monsters
You aren't reading one byte here, you are reading 1M bytes! Of course, optimizing the access to 1M bytes is something to consider. Optimizing the access to one byte isn't.
The article is definitely worth reading IMHO, but it really needs a better headline!
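To make the 1M-bytes point concrete, here is a rough Python sketch of the two layouts. The Monster fields and class names are assumptions for illustration, not taken from the article, and Python objects do not have C-like memory layout, so this shows the shape of the idea rather than its real cache behaviour.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Monster:
    """Array-of-structs element: the alive flag sits next to unrelated fields."""
    health: int
    x: float
    y: float
    is_alive: bool


def count_alive_aos(monsters: list) -> int:
    """Touches every Monster object just to read one flag from each."""
    return sum(1 for m in monsters if m.is_alive)


class MonsterStore:
    """Struct-of-arrays: each field lives in its own contiguous buffer."""

    def __init__(self, n: int) -> None:
        self.health = array("i", [100]) * n
        self.x = array("d", [0.0]) * n
        self.y = array("d", [0.0]) * n
        self.is_alive = bytearray(b"\x01") * n  # one byte per monster

    def count_alive(self) -> int:
        """Scans only the n contiguous flag bytes."""
        return self.is_alive.count(1)


if __name__ == "__main__":
    n = 1_000_000
    store = MonsterStore(n)
    store.is_alive[42] = 0
    print(store.count_alive())
```

With the second layout, the scan over all monsters reads one contiguous buffer of flags and nothing else, which is the access pattern being optimized.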