Hacker news

  • Top
  • New
  • Past
  • Ask
  • Show
  • Jobs

My first patch to the Linux kernel (https://pooladkhay.com)

214 points by pooladkhay 3 days ago | 48 comments | View on ycombinator

monus about 18 hours ago |

> You may wonder whether I tried asking an LLM for help or not. Well, I did. In fact it was very helpful in some tasks like summarizing kernel logs [^13] and extracting the gist of them. But when it came to debugging based on all the clues that were available, it concluded that my code didn't have any bugs, and that the CPU hardware was faulty.

This matches my experience whenever I do an unconventional or deep work like the article mentions. The engineers comfortable with this type of work will multiply their worth.

fonheponho about 18 hours ago |

Everybody seems to be missing the forest for the trees on this.

There is absolutely no "sign extension" in the C standard (go ahead, search it). "Sign extension" is a feature of some assembly instructions on some architectures, but C has nothing to do with it.

Citing integer promotion from the standard is justified, but it's just one part (perhaps even the smaller part) of the picture. The crucial bit is not quoted in the article: the specification of "Bitwise shift operators". Namely

> The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. [...]

> The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

What happens here is that "base2" (of type uint8_t, which is "unsigned char" in this environment) gets promoted to "int", and then left-shifted by 24 bits. You get undefined behavior because, while "base2" (after promotion) has a signed type ("int") and nonnegative value, E1×2^E2 (i.e., base2 × 2^24) is NOT representable in the result type ("int").

What happens during the conversion to "uint64_t" afterwards is irrelevant; even the particulars of the sign bit of "int", and how you end up with a negative "int" from the shift, are irrelevant; you got your UB right inside the invalid left-shift. How said UB happens to materialize on this particular C implementation may perhaps be explained in terms of sign extension of the underlying ISA -- but do that separately; be absolutely clear about what is what.

The article fails to mention the root cause (violating the rules for the bitwise left-shift operator) and fails to name the key consequence (undefined behavior); instead, it leads with not-a-thing ("sign-extension bug in C"). I'm displeased.

BTW this bug (invalid left shift of a signed integer) is common, sadly.

ashwinnair99 about 21 hours ago |

The first one always takes way longer than the code itself deserves. Most of the work is figuring out the unwritten rules, not writing the patch.

ozgrakkurt about 18 hours ago |

Great blog post. Using _BitInt typedefs for integers is a good option for anyone starting a fresh c project. It has worked well for me so far. _BitInt integers don’t promote to signed automatically like regular integers in c

ngburke about 20 hours ago |

Sign extension bugs are the worst. Silent for ages then suddenly everything is on fire. Spent a lot of time in C doing low-level firmware work and ran into the same class of issue more than once. Nice writeup, congrats on the patch.

himata4113 about 10 hours ago |

I am just an awe after reading: "The motherboard would be stuck in a zombie state" because that's EXACTLY what happened to me with normal KVM and QEMU (with modifications)! I kinda just pulled the plug and continued working never to have this resurface again... until I continued reading... I thought I was doing something wrong in the user-land, turns out it was UB sign shift all along.

foltik about 21 hours ago |

Well done and great writeup! Any idea why the bug hadn’t shown up sooner, like when running self tests?

NotCamelCase about 19 hours ago |

Lovely article with a happy ending!

One thing that I am glad to have been taught early on in my career when it comes to debugging, especially anything involving HW, is to `make no assumptions'. Bugs can be anywhere and everywhere.

dingensundso about 18 hours ago |

Nice blogpost. Was an really interesting read. Would be interesting to read about the experience of getting the patch accepted and merged.

One thing I noticed: The last footnote is missing.

knorker about 19 hours ago |

Integer promotion rules in C are so deceptive.

I don't believe there's anybody who can reason about them at code skimming speeds. It's probably the best place to hide underhanded code.

kwar13 about 12 hours ago |

Huge congrats. One of the highest achievements as a programmer in my book!

siddyboi about 18 hours ago |

Huge congrats on tracking that down and getting your first Linux kernel patch merged!

yu3zhou4 about 21 hours ago |

Congrats and happy for you, you had a lot of fun and did something genuinely interesting

mbana about 21 hours ago |

I love these kind of posts.

TacticalCoder about 13 hours ago |

A big thanks for making the Linux kernel better!

> Since virtualization is hardware assisted these days

I was running Xen with full-hardware virtualization on consumer hardware in... 2006. I mean: some of us here were running hardware virt before some of the commenters were born. Just to put the "these days" into perspective in case some would be thinking it's a new thing.

TheOpenSourcer about 15 hours ago |

Welcome to the club my friend! Its very exited. Very soon you will choose your favorate subsystem and double down on it.

leontloveless about 18 hours ago |

[dead]

algolint about 20 hours ago |

[dead]