HN: The Good Parts

📄 Wiki page | 🕑 Last updated: May 9, 2023

HN is one of the rare places on the web where it's still possible to find interesting/insightful content, but it's very hard to find it amid all the noise. HN: The Good Parts is intended to be a simple way to discover such content.

The recent additions are listed below in reverse chronological order, and you can also use the search functionality to find more content:

On: Internet spring cleaning: How to delete Instagram,...

People are realizing that social media is draining, predatory, and entirely superfluous.

Of course there are employees here of social media corporations who would want to stem the tide of this mass exodus, but it's useless. Social media corporations have overstepped their boundaries and become a net negative on human society.

Deleting your social media accounts results in an immediate improvement of quality of life and mental wellbeing. These sites are intentionally designed with predatory psychological mechanisms, they are designed by hackers like ourselves, but the hackers who see "social engineering" as a perfectly ethical practice and not simply psychological manipulation.

These services are designed to be addictive, full stop. Addiction is not healthy, and neither is social media. Maybe this will bring SV back to its roots, real technological progress for the nation and not desperate bids for data mining based on cheap psychological tricks.

People are growing sickened of the endless scrolls of psychological disturbing viral content combined with the false positivity of human interest stories. It is deepening social divisions, racial conflicts, political partisanship, and general misery. We don't need social media, what we need is real social connections in an increasingly isolated society, and social media stands in the way of this.

Link to the source

On: Entrepreneurs Aren’t a Special Breed – They’re Mos...

Entrepreneurship is like one of those carnival games where you throw darts or something.

Middle class kids can afford one throw. Most miss. A few hit the target and get a small prize. A very few hit the center bullseye and get a bigger prize. Rags to riches! The American Dream lives on.

Rich kids can afford many throws. If they want to, they can try over and over and over again until they hit something and feel good about themselves. Some keep going until they hit the center bullseye, then they give speeches or write blog posts about "meritocracy" and the salutary effects of hard work.

Poor kids aren't visiting the carnival. They're the ones working it.

Link to the source

On: Ask HN: What's the largest amount of bad code...

Oracle Database 12.2.

It is close to 25 million lines of C code.

What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.

Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is ridden with mysterious macros that one cannot decipher without picking a notebook and expanding relevant pats of the macros by hand. It can take a day to two days to really understand what a macro does.

Sometimes one needs to understand the values and the effects of 20 different flag to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.

The only reason why this product is still surviving and still works is due to literally millions of tests!

Here is how the life of an Oracle Database developer is:

- Start working on a new bug.

- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bag.

- Add one more flag to handle the new special scenario. Add a few more lines of code that checks this flag and works around the problematic situation and avoids the bug.

- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.

- Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.

- Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests randomly and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.

- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.

- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.

- Finally one fine day you would succeed with 0 tests failing.

- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.

- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.

- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.

The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).

The fact that this product even works is nothing short of a miracle!

I don't work for Oracle anymore. Will never work for Oracle again!

Link to the source

On: Htmx Is the Future

I remember that all the web shops in my town that did Ruby on Rails sites efficiently felt they had to switch to Angular about the same time and they never regained their footing in the Angular age although it seems they can finally get things sorta kinda done with React.

Client-side validation is used as an excuse for React but we were doing client-side validation in 1999 with plain ordinary Javascript. If the real problem was “not write the validation code twice” surely the answer would have been some kind of DSL that code-generated or interpreted the validation rules for the back end and front end, not the fantastically complex Rube Goldberg machine of the modern Javascript wait wait wait wait and wait some more to build machine and then users wait wait wait wait wait for React and 60,000 files worth of library code to load and then wait wait wait wait even more for completely inscrutable reasons later on. (e.g. amazing how long you have to wait for Windows to delete the files in your node_modules directory)

Link to the source

On: What will programming look like in 2020? (2012)

I don't think Rust can enforce referential transparency, nor does it have any focus on doing this manually. But I would say referential transparency is one the most important properties in functional programming, if not even the most distinguishing property.

Referential transparency is the one feature that makes reasoning about a program easy. You can think about referential transparent programs purely in the substitution model. You can move referential transparent expressions freely around as you please.

You can't think of a Rust program this way. It's inherently procedural.

Rust gets a lot of things quite right! But it's not a FP language. It's a better C. It's about shoveling bits and bytes around, as safely and efficient as possible.

Link to the source

On: What Every Computer Scientist Should Know About Fl...

> The primitive numerical type that should be used instead of floating point is the rational. Rationals have their own problems (no numeric type is perfect) but their problems are much easier to manage than float's.

Rationals are not algebraic types; you don't support exponentials, radicals, or logarithms. A lot of numerical algorithms require algebraic operations on real numbers to compute their results--for example, computing eigenvalues, or numerical approaches to root finding. If you're going to argue for using symbolic notation, well, closed-form solutions cannot exist for several of the kinds of problems we want to solve.

Another issue is that rationals are fundamentally more expensive to compute than floating point; normalization of rationals requires computing a gcd (not really parallelizable on a bit level, and so can't be done in 1 cycle), while a floating point requires count-leading-0 (which is).

As a case in point, the easiest way to find a solution to a rational linear programming problem is to... solve it in floating-point to find a basis, and then adjust that basis using rational arithmetic (usually finding that the floating-point basis was indeed optimal!). Trying to start with rational arithmetic makes it slower by a factor of ~10000×.

Link to the source

On: A Provenance-aware Memory Object Model for C [pdf]

There are four main rules about pointers in C. The first two are pretty basic rules that shouldn't be controversial:

* You cannot use a pointer outside of its lifetime (e.g., use-after-free is UB).

* You cannot advance a pointer from one object to another object (so out-of-bounds is UB, even if there is another live object there).

The third rule is one that causes issues, but needs to exist given how C code works in practice:

* The pointer just past the end of the object is a valid pointer for the object, but it cannot be dereferenced. It may be identical in value to a pointer for another object, but even then, it still cannot be used to access the second object.

The final rule is simultaneously necessary for optimization to occur, not explicitly stated in C itself, and I'm stating vaguely in large part because trying to come up with a formal definition is insanely challenging:

* You cannot materialize a pointer to an object out of thin air; you have to be "told" about it somehow.

So the immediate corollary of rule 4, the most obvious instantiation of it: if a variable never has its address taken, then no pointer may modify it without reaching UB. And that's why it's necessary to state: without this rule, then anything that might modify memory would be a complete barrier to optimizations. In a language without integer-to-pointer conversions, there is no way to violate this rule without also violating rules 1-3. But with integer-to-pointer conversions, it is possible to adhere to rules 1-3 and violate this rule, and thus this becomes an important headache for any language that permits this kind of transformation.

So how do we actually give it a formal semantics? Well, the first cut is the simple rule that no pointer may access a no-address-taken variable. Except that's not really sufficient for optimization purposes; in LLVM, all variables start with their address taken, so the optimizer needs to reason about when all uses of the address are known. So you take it to the next level and rule that so long as the address doesn't escape and you can therefore track all known uses, it's illegal for anyone to come up with any other use. So now you need to define escaping, and the classic definition suddenly shifts back to describing a data-dependent relationship.

Let me take a little detour. In the C++11 memory model, one of the modes that was introduced was the release/consume mode, which expressed a release/acquire relationship for any load data-dependent on the consume load. This was added to model the cases where you only need a fence on the Alpha processors. It turns out that no compiler implements this mode; all of them pessimize it to a release/acquire. That's because implementing release/consume would require eliminating every optimization that might not preserve data dependence, of which there is a surprising number. You could get away without doing that if you first proved that the code wasn't in a chain that required preserving data dependence, but that's not really possible for any peephole-level optimization.

And this is where the tension really comes into play. For pointers, it's easy to understand that preserving data dependence is necessary, and special-case them. But now your semantics to adhere to rule 4 also says that you need to do the same to integers, which is basically a non-starter for many optimizations. So the consequence is that the burden of the mismatch needs to lie on integer-to-pointer conversions (which, as I've established before, is already the element that causes the pain in the first place; additionally, in terms of how you compute alias analysis internally in the compiler, it's also where you're going to be dealing with the fallout anyways).

In summary, as you work through the issues to develop a formal semantics, you find that a) pointers have provenance, and need to have some sort of provenance; b) compilers are unwilling to give integers provenance; c) therefore pointers aren't integers, and everything assuming such is wrong (this affects both user code and compiler optimizations!); and d) this is all really hard and at the level of needing academic-level research into semantics.

Is N2676 the final word on pointer provenance? No, it's not; as I said, it's hard and there's still more research that needs to be done on different options. The status quo, in terms of semantics, is broken. The solution needs to minimize the amount of user code that is broken. Maybe N2676 is that solution; maybe it isn't. But to refuse clarification of the situation is unacceptable, and suggests to me noncomprehension of the (admittedly complex!) issues involved.

Link to the source

Ask me anything / Suggestions

If you find this site useful in any way, please consider supporting it.