Borrow Checking, RC, GC, and Eleven Other Memory Safety Approaches

101 points by PotatoPancakes 3 days ago

naasking 2 hours ago

More people need to read up on C#'s refs:

https://em-tg.github.io/csborrow/

These kinda-sorta fall under borrow checking or regions, just without any annotations. Then again, Ada/Spark's strategy also technically falls under Tofte-Talpin regions:

https://www.cs.cornell.edu/people/fluet/research/substruct-r...

mastax an hour ago

Yeah C# is very well designed for gradually introducing low level concepts for performance.

chrismorgan 6 hours ago

> Curséd

With an acute accent, that should be roughly /ˌkɜːrˈseɪd/ “curse-ay-d”. (Think “café” or “sashayed”.)

The stylised pronunciation being evoked is roughly /ˈkɜːrˌsɛd/, “curse-ed”, and would be written with a grave accent: “cursèd”.

rzzzt 3 hours ago

Italians will get close to the first pronunciation both ways, I think. The Zalgo line noise is an international way of signaling the level of curse in writing.

tialaramex 4 hours ago

The list gets very woolly by the end. CHERI exists (though not at volume), Cornucopia Reloaded is a research paper, "plus some techniques to prevent use-after-free on the stack" is entirely hand waving.

It is really good as food for thought though.

pizlonator an hour ago

The way you make garbage collection deterministic is not by doing regions but by making it concurrent. That’s increasingly common, though fully concurrent GCs are not as common as “sorta concurrent” ones because there is a throughput hit to going fully concurrent (albeit probably a smaller one than if you partitioned your heap as the article suggests).

Also, no point in calling it “tracing garbage collection”. It’s just “garbage collection”. If you’re garbage collecting, you’re tracing.

jakewins an hour ago

Do you have any recommended reading material on this?
Intuitively it feels like making it concurrent should do the opposite of making GC deterministic! I’d love to read something showing that intuition is wrong.
- pizlonator 3 minutes ago
  
  Garbage collection handbook
  https://gchandbook.org/
  If you want to see my latest concurrent GC, see
  https://github.com/pizlonator/llvm-project-deluge/blob/delug...
  https://github.com/pizlonator/llvm-project-deluge/blob/delug...

hawski 5 hours ago

That is very informative. Thank you.

I am interested in Vale and it feels very promising, though because of my interest in bootstrapping I don't like that it is written in Scala. I know, that is shallow, but that's a thing that limits my enthusiasm.

If you are like me and don't like jumping around between notes and text and you prefer to read the notes anyway, here is a little snippet you can run in Web Inspector's Console:

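  document.querySelectorAll(".slice-contents a[data-noteid]").forEach(e => {
    // Find the note paragraphs that this link points to.
    const noteId = e.attributes["data-noteid"].nodeValue;
    document.querySelectorAll('.slice-notes [data-noteid="' + noteId + '"] p').forEach(p => {
      p.style.fontSize = 'smaller';
      // Move the note inline, just before the link.
      e.parentNode.insertBefore(p, e);
    });
    // Drop the now-redundant note link.
    e.remove();
  });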

It will replace note links with notes themselves making them smaller, because they will not always fit smoothly.

willvarfar 6 hours ago

Meta comment, but I really like the formatting of the blog post!

It reminds me of the early days of the web, when text was king and content was king. I particularly like the sidenotes in the margins approach.

(Hope the author sees this comment :) Hats off)

_kb an hour ago

Side notes are a great layout for most deeper reads.
There's some great tooling for that via https://edwardtufte.github.io/tufte-css/ and https://tufte-latex.github.io/tufte-latex/.
dgan 2 hours ago

I am sorry, maybe I am dumb, but I can't see the 14 techniques being listed anywhere? Where do I even click?
- hawski 2 hours ago
  
  You need to read the post.
  > Wait a minute, this list goes to 17, yet the intro only mentions 14! I actually did that because a couple might overlap and a couple of them are half-approaches, and that last one is just here for fun. Besides, as I learn more approaches and add them to the list, the title will get more and more out of date anyway.
brabel 3 hours ago

Yeah the author always uses this in his blog about his language, Vale (which is very unfortunately not being developed anymore, at least for now). The other posts are also worth a read: https://vale.dev/
- ivell 3 hours ago
  
  He now works on Mojo, to bring linear types into Mojo.

DanielHB 4 hours ago

I am not experienced with Rust and borrow checkers, but my impression is that borrow checkers also statically ensure thread/async safety while most other memory safety systems don't. Is this accurate?

kibwen 2 hours ago

The borrow checker is only one component of the means by which Rust statically enforces thread safety. If you design a language that doesn't allow pointers to be shared across threads at all, then you wouldn't need a borrow checker. Likewise if you have an immutable-only language. What's interesting about Rust is that it actually supports sharing pointers across threads safely, which is still unbelievable sometimes (like being able to send references to the stack to other threads via std::thread::scope).
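
To make the stack-reference point concrete, here is a minimal sketch using the standard library's `std::thread::scope`. The function name `sum_halves` and the data are made up for illustration; the point is only that both threads borrow from the caller's stack frame and the compiler accepts it, because the scope joins them before returning.

```rust
use std::thread;

// Hypothetical helper: sums a slice using two scoped threads.
fn sum_halves(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    // thread::scope joins every spawned thread before it returns,
    // so the threads may borrow `left` and `right` from this stack frame.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let numbers = [1u64, 2, 3, 4, 5, 6];
    println!("{}", sum_halves(&numbers));
}
```

Swapping `thread::scope` for plain `thread::spawn` makes this fail to compile, since `spawn` requires `'static` data and the borrowed slice is not.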
- nemaar 21 minutes ago
  
  > If you design a language that doesn't allow pointers to be shared across threads at all, then you wouldn't need a borrow checker.
  Is that actually true? I'm pretty sure you need the borrow checker even for single threaded Rust to prevent use after frees.
PeterWhittaker 2 hours ago

The first part - that the Rust borrow checker and overall memory model ensure thread/async safety - is true. I cannot speak to the second part - that other systems don't have this assurance.
- tialaramex an hour ago
  
  Just the borrowck isn't enough, you need the Send and Sync marker traits. Marker traits are something lots of languages could do but they'd be useless (or always unsafe) without a lot of other machinery Rust had already.
- DanielHB an hour ago
  
  > that other systems don't have this assurance
  My understanding is that most (all?) GC languages are memory safe, but do not ensure statically verifiable thread safety at all. Like Java, Go, C#, Python, etc.

mgaunard 5 hours ago

The fact that re-using a slot for a different object of the same type is considered a memory safety technique is ridiculous.

obl 3 hours ago

It is not ridiculous at all. Those things have pretty precise definitions and type segregation absolutely does remove a bunch of soundness issues related to type confusion.
You can think of it as the rather classic "Vec of struct + numeric IDs" that is used a lot e.g. in Rust to represent complex graph-like structures.
This combined with bounds checking is absolutely memory safe. It has a bunch of correctness issues that can arise due to index confusion but those are not safety issues. When combined with some kind of generational counters those correctness issues also go away but are only caught at runtime not at compile time (and they incur a runtime cost).
Rust's memory safety is about avoiding liveness issues (that become type confusions since all memory allocators will reuse memory for different types), nothing more, nothing less.
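
For anyone who hasn't seen the pattern, here is a minimal sketch of slot reuse with generational counters. The names (`Arena`, `Handle`, `Slot`) are made up for illustration and are not taken from any particular crate. A freed slot is reused for a new value of the same type, and a stale handle is rejected at runtime because its generation no longer matches.

```rust
/// A handle into the arena: a slot index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        if let Some(index) = self.free.pop() {
            // Reuse a freed slot; its generation was bumped on removal.
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Handle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, generation: 0 }
        }
    }

    fn remove(&mut self, handle: Handle) -> Option<T> {
        let slot = self.slots.get_mut(handle.index)?;
        if slot.generation != handle.generation {
            return None;
        }
        let value = slot.value.take()?;
        // Bump the generation so stale handles stop matching.
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(handle.index);
        Some(value)
    }

    fn get(&self, handle: Handle) -> Option<&T> {
        // Bounds-checked lookup, then a generation check.
        let slot = self.slots.get(handle.index)?;
        if slot.generation == handle.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses slot 0 with a new generation
    println!("{:?} {:?}", arena.get(a), arena.get(b));
}
```

Without the generation field, `get(a)` would quietly return the second value: a logic bug, but still a valid value of the right type. The sketch also ignores generation wraparound, which a real implementation would need to think about.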

alexisread 3 hours ago

Previous discussion: https://news.ycombinator.com/item?id=40146615 https://news.ycombinator.com/item?id=41974185

nahuel0x 6 hours ago

It's surprising to see an article that covers such a wide range of techniques, hybrid techniques and design interactions with the type system, but it is more surprising that a whole dimension of memory (un)management was left out: memory fragmentation.

Quekid5 4 hours ago

It's probably because fragmentation isn't a safety issue. (In the sense of 'safety' being discussed here.)
- galangalalgol 3 hours ago
  
  It doesn't create UB, but it is something safety critical software has to address.
  - Quekid5 an hour ago
    
    ... which is why I had that little bit at the end there.

amelius 3 hours ago

I like many of the ideas of Rust, but I still think it is an unsuitable language for most projects.

The problem is that it is very easy to write non-GC'd code in a GC'd language, but the other way around it is much much harder.

Therefore, I think the fundamental choice of Rust to not support a GC is wrong.

3836293648 2 hours ago

You've got this one wrong. Rust is designed for a specific use case. Most projects are not that use case. Therefore the choice to use Rust is wrong.
If GC is an option and you want all the nice parts of Rust, use OCaml
- amelius 9 minutes ago
  
  > Most projects are not that use case. Therefore the choice to use Rust is wrong.
  Do you think that projects that have a large GUI component should be written in Rust?
  What if a project has both a "systems" and a GUI component to it?
- pjmlp 2 hours ago
  
  If Go's designers weren't so opposed to modern language design, Go would be the better choice in many of the scenarios where people are rushing to do RIIR, provided those modern features were part of the language.
  Having said that, there are still OCaml (as you noted), Haskell, .NET languages with Native AOT, JVM languages with GraalVM/OpenJ9, D, Nim, Swift, ....
  And if one wants to get fancy with type systems, Idris, Dafny, FStar,..
- amelius an hour ago
  
  > If GC is an option and you want all the nice parts of Rust, use OCaml
  So are you saying it would be possible to use a hypothetical "non-GC-enabled" OCaml compiler that complains if GC'd code is invoked/generated, and it would be a similar experience as using Rust?
- in-pursuit 2 hours ago
  
  I think OP's general point, although maybe not what they stated, is correct: it’s easy to write GC’d code. It’s “easy” to write code with manual memory management. It’s “easy” to write RC code. But it’s hard to write borrow checker code. And that will probably limit adoption, even though the goals of Rust are good.
f1shy 3 hours ago

Yet another PoV: for some things with critical timing or so, GC might be a problem. But most of the time, it isn’t. The performance/predictability topic could also be reviewed…
I was talking with a colleague about that; he said, “In C I know exactly where things are when.” I replied that under any OS with virtual memory, you have basically no clue where things are at any time across the N levels of cache, and you cannot make accurate timing predictions anyway… [1]
I’m convinced today GC is the way to go for almost all. And I was until 5 years ago or so, totally opposed to that view.
[1] https://news.ycombinator.com/item?id=42456310
- pjmlp 2 hours ago
  
  Even with critical timing, real-time GCs have existed for decades now. PTC and Aicas are two surviving companies selling software tooling for embedded markets, including their own JVM implementations, with AOT compilers, bare-metal deployments and real-time GC.
  Many of their customers are factory processes and military deployments with weapons control, two scenarios where any kind of stall might produce deadly results.
FridgeSeal 3 hours ago

It does have a GC.
It just runs at compile time. Bonus feature, it helpfully prevents a number of common bugs too.
- amelius 3 hours ago
  
  GC is short for automatic GC.
  If you have to do it yourself, then it does not "have" a GC.
  - FridgeSeal 2 hours ago
    
    Ssshhh you’re ruining my silly and “hasn’t gone over terribly well” joke.

xmcqdpt2 2 hours ago

Not a fan of the framing of the article. Firstly, there are millions of Mayans alive today,

https://en.wikipedia.org/wiki/Maya_peoples

and secondly, the reason the pre-Columbian cultural texts and script are not in use today, even by the people who speak the 28 Mayan languages currently in use, is genocide by Columbus and those who followed. The Catholic church destroyed every piece of Mayan script they could get their hands on.

The article reads like the author is not aware of these basic facts of American geography and history.

4ad 2 hours ago

> Interaction nets are a very fast way to manage purely immutable data without garbage collection or reference counting.[...] HVM starts with affine types (like move-only programming), but then adds an extremely efficient lazy .clone() primitive, so it can strategically clone objects instead of referencing them.

This is wrong. Interaction nets (and combinators) can model any kind of computational system, including ones that use mutation. In fact, ICs are not really about types at all, although they do come from a generalization of Girard's proof nets, which came from work in linear logic.

The interesting thing about ICs is that they are beta-optimal (any encoding of a computation will be done in the minimum number of steps required -- there is no useless work being done), and maximally parallel with only local synchronization (all reduction steps are local, and all work that can be parallelized will be parallelized).

Additionally ICs have the property that any encoding of a different computational system in ICs will preserve the asymptotic behavior of all programs written for the encoded computational system. In fact, ICs are the only computational system with this property.

Interaction nets absolutely require garbage collection in the general sense. However, interaction combinators are linear and all garbage collection is explicit (but still exists). HVM's innovation is that by restricting the class of programs encoded in the ICs you can get very cheap lambda duplication and eschew the need for complex garbage collection while also reducing the overhead of implementing ICs on regular CPUs (no croissants or brackets, see Asperti[1] for what that means).

Having a linear language with the above restriction allows for a very efficient implementation with a very simple GC, while maximizing the benefits of ICs. In principle any language can be implemented on top of ICs, but to get most benefits you want a language with these properties. It's not that HVM starts with affine types and an efficient lazy clone operation, it's that a linear language allows extremely efficient lazy cloning (including lambda cloning) to be implemented on top of ICs, and the result of that is HVM.

> The HVM runtime implements this for Haskell.

This is very wrong. HVM has nothing to do with Haskell. HVM3 is written in C[2], HVM2 has three implementations, one in C[3], one in Rust[4], and a CUDA[5] one. HVM1 was just a prototype and was written in Rust[6].

HOC[7], the company behind HVM, provides two languages that compile to HVM: Bend[8] and Kind[9]. Bend is a usual functional language, while Kind is a theorem prover based on self types.

Haskell is not involved in any of these things, except that the HVM compiler (not the runtime) is written in Haskell. That is irrelevant, though: before Haskell, it was written in TypeScript and then in Agda (Twitter discussion, sorry, no reference). It's an implementation detail, not something the user sees.

Please note that HVM adds some stuff on top of ICs that makes it not strictly beta-optimal, but nevertheless the stuff added is useful in practice and the practical downgrade from theoretical behaviour is minimal.

[1] Andrea Asperti, The Optimal Implementation of Functional Programming Languages, ISBN-13: 978-0060815424

[2] https://github.com/HigherOrderCO/HVM3/blob/main/src/HVML/Run...

[3] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.c

[4] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.rs

[5] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.cu

[6] https://github.com/HigherOrderCO/HVM1

[7] https://higherorderco.com

[8] https://github.com/HigherOrderCO/bend

[9] https://github.com/HigherOrderCO/kind

andrewstuart 8 hours ago

I’d love to see a language that kept everything as familiar as possible and implemented memory safety as “the hard bit”, instead of the Rust approach of cooking in multiple different new sub-languages and concepts.

pornel 3 hours ago

Safety is not an extra feature à la carte. These concepts are all inter-connected:
Safety requires unions to be safe, so unions have to become tagged enums. To have tagged enums usable, you have to have pattern matching, otherwise you'd get something awkward like C++ std::variant.
Borrow checking works only on borrowed values (as the name suggests), so you will need something else for long-lived/non-lexical storage. To avoid GC or automatic refcounting, you'll want moves with exclusive ownership.
Exclusive ownership lets you find all the places where a value is used for the last time, and you will want to prevent double-free and uninitialized memory, which is a job for RAII and destructors.
Static analysis of manual malloc/realloc and casting to/from void* is difficult, slow, and in many cases provably impossible, so you'll want to have safely implemented standard collections, and for these you'll want generics.
Not all bounds checks can be eliminated, so you'll want to have iterators to implement typical patterns without redundant bounds checks. Iterators need closures to be ergonomic.
…and so on.
Every time you plug a safety hole, it needs a language feature to control it, and then it needs another language feature to make this control fast and ergonomic.
If you start with "C but safe", and keep pulling that thread, nearly all of Rust will come out.
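
A small sketch of the first and fifth links in that chain (tagged enums with pattern matching, and iterators with closures). The `Shape` type and function names are made up for illustration.

```rust
// A tagged enum: the compiler tracks which variant is live, unlike a C union.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    // `match` must cover every variant, and each arm can only touch
    // the fields that belong to the variant it matched.
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

fn total_area(shapes: &[Shape]) -> f64 {
    // Iterator plus closure: no manual indexing, so there is no index to get wrong.
    shapes.iter().map(|s| area(s)).sum()
}

fn main() {
    let shapes = vec![
        Shape::Rect { width: 2.0, height: 3.0 },
        Shape::Circle { radius: 1.0 },
    ];
    println!("{:.2}", total_area(&shapes));
}
```

Adding a third variant to `Shape` without updating `area` is a compile error, which is the kind of control the enum-plus-match pairing buys.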
phicoh 7 hours ago

As a long time C programmer I like Rust because it combines two things from C that are important to me (low runtime overhead, no runtime system required) with a focus on writing correct programs.
Memory safety is just one aspect where the compiler can help make sure a program is correct. The more the compiler helps with static analysis, the less we need to rely on creating tests for edge cases.
- zamalek 6 hours ago
  
  > Memory safety is just one aspect
  I feel as though not enough attention is given to how std is designed. For example: [u8], str, Path, and OsStr may be confusing at first, but when you understand why they are there any other approach feels icky. std guides you down a path of caring about things that really should matter (at least if you're only unwrapping provably safe values).
  Have you considered what happens if not-utf8 data winds up in an environment variable that you are writing to stdout? What if it contains malicious VT commands?
  - shiomiru 5 hours ago
    
    > Have you considered what happens if not-utf8 data winds up in an environment variable that you are writing to stdout? What if it contains malicious VT commands?
    Unless you're talking about terminal bugs in parsing invalid UTF-8 - and parsing invalid UTF-8 is easier than rendering valid UTF-8 - VT commands are UTF-8 compatible. You just need to embed an ASCII escape character.
nicoburns 4 hours ago

Familiar to whom? I came from a JavaScript background, and Rust's syntax and "functional lite" style felt very familiar.
ramon156 7 hours ago

Tbf I would already consider a different language when it has all the nice syntax sugar and design choices of Rust. I like almost every choice they made, and I miss things like `if let Some` or unwrapping in other languages. It's just not the same.
frou_dh 5 hours ago

"Familiar" is subjective so it's not really something to hang your hat on.
brabel 8 hours ago

The author of the post is trying pretty much that with his language, Vale.
karmakurtisaani 5 hours ago

Isn't that just C/C++?
chrismorgan 6 hours ago

As an experienced Rust developer, I have absolutely no idea what you mean by this. Could you write a little more about what you have in mind, and even what you mean by sub-languages and concepts in Rust?

nemetroid 4 hours ago

No mention of RCU?

bluGill an hour ago

Why is garbage collection called memory safety? Garbage collection in whatever form is only memory safe if it doesn't free memory that will still be used. (which means if you actually get all your free calls correct C is memory safe - most long lived C code bases have been beat on enough that they get this right for even the obscure paths).

Use after free is important, but in my experience not common and not too hard to track down when it happens (maybe I'm lucky? - we generally used a reference-counted GC for the cases where ownership is hard to track down in C++).

I'm more worried about other issues of memory safety that are not addressed: writing into someone else's buffer, which is generally caused by writing off the end of your own buffer.
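
For comparison, this is roughly how a bounds-checked language handles that case. It is a minimal Rust sketch; `write_at` is a made-up helper, while `get_mut` is the standard slice method.

```rust
// Hypothetical helper: writes one byte if `index` is inside `buf`.
fn write_at(buf: &mut [u8], index: usize, byte: u8) -> bool {
    // `get_mut` returns None instead of touching memory past the slice.
    match buf.get_mut(index) {
        Some(slot) => {
            *slot = byte;
            true
        }
        None => false,
    }
}

fn main() {
    let mut mine = [0u8; 4];
    let neighbor = [7u8; 4];
    println!("{}", write_at(&mut mine, 2, 1));
    println!("{}", write_at(&mut mine, 4, 1)); // one past the end: rejected
    println!("{:?} {:?}", mine, neighbor);
}
```

Plain indexing (`buf[index] = byte`) performs the same check and panics on failure, so the out-of-range write never reaches the neighboring buffer either way.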