Would this imply an architecture similar to what Lisp machines once had? That'd be a great addition IMO, and would speed up a lot of dynamic-ish languages without resorting to unsafe routes for speed.
Now they just need to agree to implement ECC everywhere instead of using it as a product differentiator, so we can reduce the amount of random issues caused by memory and bus errors.
This is an oft-repeated misunderstanding. DDR5 memory uses error-correcting codes internally to correct on-die errors, but this does not defend against errors on the buses between the DIMM and the memory controller. For that, the old scheme of extra chips to store additional ECC data is still the only way.
> It actually does protect against errors on access, but support is optional.
Can you explain how exactly the on-die ECC capability can help protect data in transit? What is the optional functionality you're referring to, if not the traditional sideband ECC achieved by adding another chip's worth of data lines to every channel?
Not really. ECC memory has an extra RAM chip and stores roughly an extra bit per byte for that error detection/correction. DDR5 only has error-correction bits added to the bus; regular DDR5 doesn't have extra chips/bits for error correction of the data while it is stored.
But also, what you really want is ECC that reports the corrected and uncorrected bits all the way up to the OS. This is how you know if it's on the edge, becoming a real problem. Otherwise, it works fine until it doesn't (shrug), which is the same as regular memory.
I think the ECC added to the DDR5 bus is kinda just enough to get the higher data-rate signaling to be as reliable as DDR4. It's nice for marketing to put ECC on the DDR5 box but it's not more robust than DDR4.
Looks like this is in response to the Apple paper from a couple of weeks ago about memory tagging. Excellent news even if this wasn't pushed along by Apple.
MTE is as much a detection tool as it is a relied-upon layer of defence. I read that it has a 15-out-of-16 chance to catch an error. Which means even if you do find a bug, it's going to be logged back to the OS vendor essentially the first time you use it, and then patched permanently in the source.
Memory coloring (what is, to my knowledge, being proposed) is certainly probabilistic; however, it does a lot more for memory hardening than for memory safety. It makes successfully pulling off ROP chains etc. far harder.
Even if you are using a memory safe language, memory bugs still pop up in various places (cough FFI & ABI cough) so this is a massive step in the right direction towards blocking attacks even when the developers have done "all the right things" short of formally verifying their stack from top to bottom (and even then).
I've been meaning to ask you what the motivation of your project is. Why would you want a safe C? When I saw the headline I was worried that all my runtime code would break (or get slower) because I do some very unsafe things (I have a runtime that I compile with -nostdlib).
I'm also tempted to write a commercial C++ compiler, but it feels like a big ask, paying for a compiler sounds ridiculous even if it reduces your compile time by 50x
That's the best answer. Just wondering because you have a lot of experience and many people disagree with me.
How realistic is a C++ compiler that's 50x faster than clang for debug builds? Assuming 1) there's enough work for each core 2) there's so many files/headers that a unity build takes 1/4th of the time as a rebuild using individual files (this theoretical compiler would be multi-threaded)
I hope there are OS-level options (i.e. kernel build options) to turn this kind of thing off or just ignore the 'tags'. I know it's important for corporate use cases and monetary transactions and all that, but on my personal computer I use for fun I want to be able to peek and poke.
If you're worried that this is going to prevent you from peeking and poking, I think you're mistaken. This is to protect a process against itself, not against the outside. It will also likely be a single bit flip away from being unenforced at runtime, as most x86 protections already are, to make debugging tools feasible.
By the way, there are already systems in place to prevent you from accessing certain memory zones. Yes, even on Linux, it's possible to make memory regions inaccessible even to root or the kernel itself. The time to be worried about that was 10 years ago.
"The time to be worried about that was 10 years ago."
And is everyone just expected to (metaphorically) lay down and die? You don't stop fighting for freedom just because you lost a battle... they certainly don't, and you shouldn't either!
I have a patch set that removes virtual memory interprocess blocking for Linux 5.6, which I run as the kernel on a couple of my machines. I'm just hoping that this is patchable out too.
No it's not. Here is the AI summary of why it isn't.
Breakdown of the Error
Less is used for uncountable quantities (e.g., "less water," "less time," "less anger").
Lest is a conjunction that means "for fear that" or "to avoid the possibility of." This is precisely the meaning the writer intends.
Here are a few ways to write the sentence correctly, depending on the desired level of formality:
Direct Correction (Best for preserving the original tone):
"Had to find some way to use 'AI' in a press release, lest the stock gods get angry and vengeful."
Slightly More Formal:
"We had to find some way to use 'AI' in a press release for fear that the stock gods would get angry and vengeful."
Using "or" (Common modern alternative):
"Had to find some way to use 'AI' in a press release, or the stock gods will get angry and vengeful."
Your AI explanation offers nothing in the way of why it's not fine. It just asserts that it's not fine. I agree with the AI, but your summary is wrong.
I don't have any hope that WG14 will ever care to improve C's safety beyond what is possible when manually writing Assembly code; it is even worse, because Assembly doesn't have time travel with UB.
Like, cool, you guys are starting to talk about a new instruction set that will make C safe somehow. Yet you failed to provide an ounce of detail for how you'll accomplish that.
This might as well have been "And we'll make our CPUs 10x faster and they'll use 10x less power!". Or "Future CPUs will have a 10GHz clock speed!"
Again, who is this article for? The government, maybe, to assure them that x86 will take cybersecurity seriously?
I'm happy another old hardware nerd got that dated reference :D.
I was convinced back in the day that Larrabee would change the world. It seemed like such an amazing technology especially since multi-core CPUs were just starting to take off in consumer hardware.
Honestly it could have, if Intel had invested in the platform for more than a few generations and built the software ecosystem around it. It had on-package HBM before it was cool with all the modern AI GPUs and APUs. Intel had the opportunity to build the CUDA ecosystem, and chose to shelve it. I'll never understand the obsession that new platforms must be profitable immediately. They seem to take ~10 years to develop, and planning for that should be part of what's expected before the decision is made. Gelsinger seemed to understand at least that much.
Well, they did half-heartedly try it. Intel Xeon Phi was produced from 2010 to 2020.
I think the problem is that Intel pigeonholed the product, relegating it to just supercomputers. I also think Intel has historically done a bad job of supporting the software needed to power their hardware.
The reason CUDA was so successful (IMO) is because it was highly available and the software was of better quality than competing software. OpenCL was supposed to be the answer to CUDA, and ultimately it was just a weird, hard-to-work-with, and minimally supported language.
I don't think that's all Intel's fault. Apple's dumb war against Khronos has really undermined a lot of progress for anyone doing GPGPU programming.
What technique? There's no technique described in the article. It's just a long article about why this is needed, an announcement of collaboration, and naming the set of instructions.
ARM has an extension called the "Memory Tagging Extension". It works by borrowing 4 bits per 16-byte block of memory to color the allocation. Pointers are given a matching color "key" in their unused top bits at allocation time, and if that key doesn't match the underlying memory block's color on dereference, the hardware throws an MTE segfault.
It's a neat system with pretty much no overhead, and it's pretty easy to integrate into code as long as your underlying language and libraries are at least semi-intelligent about how they handle pointers and memory allocation (i.e. as long as you aren't doing any long-range pointer punning, things "just work").
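For intuition, here is a minimal software simulation of that lock-and-key scheme in plain C. It is purely illustrative (invented names, no real MTE instructions or intrinsics): a 4-bit color per 16-byte granule, the same color stored in the otherwise-unused top byte of the pointer, and a check on every simulated load.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative simulation of MTE-style tagging; names and layout are
     * made up for this sketch, not the real ISA or any library API. */
    #define GRANULE 16
    #define HEAP_GRANULES 1024

    static unsigned char heap[GRANULE * HEAP_GRANULES];
    static uint8_t granule_tag[HEAP_GRANULES]; /* 4-bit "color" per granule */

    static void *tag_alloc(size_t granules, uint8_t tag) {
        static size_t next = 0;
        assert(next + granules <= HEAP_GRANULES);
        for (size_t i = 0; i < granules; i++)
            granule_tag[next + i] = tag & 0xF;
        void *p = &heap[next * GRANULE];
        next += granules;
        /* Place the tag in the (otherwise unused) top byte of the pointer. */
        return (void *)((uintptr_t)p | ((uintptr_t)(tag & 0xF) << 56));
    }

    static uint8_t load_byte(const void *tagged, size_t off) {
        uintptr_t v = (uintptr_t)tagged;
        uint8_t ptr_tag = (v >> 56) & 0xF;
        unsigned char *addr = (unsigned char *)(v & ((1ULL << 56) - 1)) + off;
        size_t granule = (size_t)(addr - heap) / GRANULE;
        if (granule >= HEAP_GRANULES || granule_tag[granule] != ptr_tag) {
            fprintf(stderr, "tag check fault at offset %zu\n", off);
            abort(); /* real hardware would raise a fault instead */
        }
        return *addr;
    }

    int main(void) {
        void *a = tag_alloc(1, 0x3); /* 16 bytes, colored 0x3 */
        void *b = tag_alloc(1, 0x9); /* adjacent 16 bytes, colored 0x9 */
        (void)b;
        memset(heap, 'A', sizeof heap);
        printf("%c\n", load_byte(a, 0));  /* ok: colors match */
        printf("%c\n", load_byte(a, 16)); /* overflow into b: colors differ -> fault */
    }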
Right, but none of this is in the article. MTE is neat and it'll be interesting if AMD64 brings in a similar technique but none of that is actually described in the article.
Sure but given the description of ChkTag and given the new exceptions and interrupts model they established with FRED, I don't see ChkTag being anything other than an MTE-style lock-and-key color tagging solution.
The press article is just saying what the EAG is working on next now that they have shipped FRED, ACE, and AVX10. ChkTag is the current top priority item on the EAG's list and the standard for it should be finalised/released some time in the next few months. What they have done publicly so far is to say that ChkTag will be part of the ISA standard, they just haven't also revealed the details past that.
With all the negative comments here: This is existing technology on ARM64 (MTE) and on modern iPhones (https://security.apple.com/blog/memory-integrity-enforcement...).
For a good intuition why this (coupled with instrumenting all allocators accordingly) is a game-changer for exploitation, check https://docs.google.com/presentation/d/1V_4ZO9fFOO1PZQTNODu2...
In general, having this come to x86 is long-overdue and very welcome.
But wait, how do you know that's what this is?
The reason I'm negative is the entire article has zero detail on WTF this instruction set is or does. The best you can do is guess from the name of the instruction set.
Compare the linked iPhone article to this blog and you'll quickly see the difference. There's very real discussion in the MTE article of how the instructions work and what they do. This article just says "Memory safety is hard and we'll fix it with these new instructions that fix memory safety!"
So there's a long intellectual history behind these technologies, and Intel had multiple chances of taking the leadership on this around 2018 - they failed to do so, some of the talent went to Apple, and now Intel has to play catch-up.
I'm pretty certain it'll be the x86 variant of either MTE or MIE.
This is how, amongst other things, IBM POWER CPUs do memory tagging for capability-based security on iSeries/OS400.
IIRC, later SPARC64 chips also had a version of this.
According to this: https://www.devever.net/~hl/ppcas the POWER approach is not a true hardware capability architecture (“nothing about these ISA extensions provides any kind of security invariant against a party which can generate arbitrary machine code”). It's just something that helps software to store one bit per 128 bits of data on the side (plus some other weirdness about load-with-offset instructions).
(SPARC ADI is similar, machine code is still trusted.)
ADI, since 2015, still shipping.
>But wait, how do you know that's what this is?
A lot of these extensions come from Intel/AMD/etc clients first, and because of how long it takes a thing to make it into mainstream chips, it was probably conceived of and worked on at least 5 years ago, often longer.
This particular thing has a long history and depending on where they worked, they know about that history.
However, they are often covered by extra layers of NDA's on top of whatever normal corporate employee NDA you have, so most people won't say a ton about it.
Probably because it's very likely that both AMD and Intel have had engineers working on this sort of thing for a long time, and they're now deciding to collectively hash out whatever the solution is going to be for both of them.
Better to have this than two sets of instructions, as is currently the case for virtualization entry/exit points on amd64 platforms.
A lot of this sort of thing comes from AMD/Intel clients, rather than internally.
In this particular case, it definitely does :)
I don't know if it is intended this way, but there's one useful outcome even with the limited amount of detail disclosed:
There are industry partners who work closely with AMD and Intel (with on-site partner engineers etc.), but who are not represented in the x86 ecosystem advisory group, or maybe they have representation, but not at the right level. If these industry partners notice the blog post and they think they have technology in impacted areas, they can approach their contacts, asking how they can get involved.
The x64 Windows Kernel is starting to get support for this. There are a few references to memory tagging appearing in the public symbol files.
Wow that weird state machine doc is great! Thanks for sharing.
I’m lukewarm on this.
- It is long overdue and welcome.
- It won’t stop a sufficiently determined attacker because it's probabilistic and too easy to apply only partially
Is this good? Yes. Does it solve memory safety? No. But does it change the economics? Yes.
Yeah it's the most succinct explanation I've seen of weird machines and memory tagging. Definitely bookmarking this one. I wonder if video of the talk that presumably presented this is available.
I don't know tbh, and I gave that talk when I was in terrible shape, so I'm not upset ;).
If people care a lot, I can record a YouTube video on the topic.
https://vimeo.com/252868605
Is there a comparison of memory tagging designs for different architectures (POWER, SPARC, CHERI/Morello, Arm MTE/eMTE, Apple MIE, x86, RISC-V)? e.g. enforcement role of compiler vs. hardware, opt-in vs mandatory, hardware isolation of memory tags, performance impact, level of OS integration?
> This is existing technology on ARM64 (MTE) and on modern iPhones (https://security.apple.com/blog/memory-integrity-enforcement...).
Previous discussion https://news.ycombinator.com/item?id=45186265
Sparse on details.
Presumably will be based on the existing Linear Address Masking/Upper Address Ignore specs, which are equivalent, and will be similar to CHERI.
If so it needs to be opt-in or at least opt-out per process, because many language runtimes use these pointers bits to optimize dynamic types, and would suffer a big performance hit if they were unable to use them.
I would not assume they just use bits in the address word for the tag.
LPDDR6 includes 16 bits of host-defined metadata per 256 bits of data. Systems can use that for ECC, and/or for other purposes, including things like tagging memory. DDR6 will likely include a similar mechanism.
SECDED ECC requires 10 bits, leaving you 6 bits. That's enough for one bit per aligned address word, which is probably used to denote "valid pointer" like CHERI.
Dynamic types have classically used the lower bits freed by alignment constraints. If I know a cons cell is 16 bytes then I can use the low 4 bits of an address to store enough type info to disambiguate.
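A minimal sketch of that low-bits trick in C, assuming a 16-byte-aligned allocator; the tag values and helper names are invented for the example:

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical 16-byte cons cell: alignment frees the low 4 pointer bits. */
    typedef struct Cons { struct Cons *car, *cdr; } Cons;

    enum { TAG_CONS = 0x0, TAG_FIXNUM = 0x1, TAG_SYMBOL = 0x2, TAG_MASK = 0xF };

    typedef uintptr_t Value;

    static Value box_cons(Cons *c) {
        assert(((uintptr_t)c & TAG_MASK) == 0);   /* must be 16-byte aligned */
        return (uintptr_t)c | TAG_CONS;
    }

    static Value box_fixnum(intptr_t n) { return ((uintptr_t)n << 4) | TAG_FIXNUM; }

    static int      tag_of(Value v)    { return (int)(v & TAG_MASK); }
    static Cons    *as_cons(Value v)   { return (Cons *)(v & ~(uintptr_t)TAG_MASK); }
    /* Arithmetic right shift on negatives assumed, as on all mainstream targets. */
    static intptr_t as_fixnum(Value v) { return (intptr_t)v >> 4; }

    int main(void) {
        Cons *c = aligned_alloc(16, sizeof *c);   /* C11: 16-byte-aligned */
        Value vc = box_cons(c), vn = box_fixnum(-42);
        int ok = tag_of(vc) == TAG_CONS && as_cons(vc) == c
              && tag_of(vn) == TAG_FIXNUM && as_fixnum(vn) == -42;
        free(c);
        return ok ? 0 : 1;
    }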
There's a technique known as "NaN boxing" which exploits the fact double precision floats allow you to store almost 52 bits of extra data in what would otherwise be NaNs.
If you assume the top 16 bits of a pointer are unused[1], you can fit a pointer in there. This lets you store a pointer or a full double by-value (and still have tag bits left for other types!).
Last I checked LuaJIT and WebKit both still used this to represent their values.
[1] On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.
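A minimal NaN-boxing sketch in C, assuming 48-bit user-space pointers (the canonicalization caveat in [1] still applies on amd64); the tag constant and helpers are invented for the example:

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* NaN-boxing sketch: every value is a 64-bit word. Real doubles are stored
     * as-is; pointers live inside the quiet-NaN payload space. A real engine
     * would also normalize genuine NaNs to one canonical encoding. */
    typedef uint64_t Value;

    #define QNAN     0x7ff8000000000000ULL  /* canonical quiet-NaN bits */
    #define TAG_PTR  0x0001000000000000ULL  /* invented tag, above bit 47 */
    #define PAYLOAD  0x0000ffffffffffffULL  /* low 48 bits: the pointer itself */

    static Value  box_double(double d)  { Value v; memcpy(&v, &d, 8); return v; }
    static double unbox_double(Value v) { double d; memcpy(&d, &v, 8); return d; }

    static int is_double(Value v) { return (v & QNAN) != QNAN; }

    static Value box_ptr(void *p) {
        assert(((uintptr_t)p & ~PAYLOAD) == 0);   /* assumes 48-bit pointers */
        return QNAN | TAG_PTR | (uintptr_t)p;
    }
    static int   is_ptr(Value v)    { return (v & (QNAN | TAG_PTR)) == (QNAN | TAG_PTR); }
    static void *unbox_ptr(Value v) { return (void *)(uintptr_t)(v & PAYLOAD); }

    int main(void) {
        int x = 7;
        Value a = box_double(3.5), b = box_ptr(&x);
        return is_double(a) && unbox_double(a) == 3.5
            && is_ptr(b) && *(int *)unbox_ptr(b) == 7 ? 0 : 1;
    }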
> On amd64 they actually need to be sort of "sign extended" so require some fixup once extracted.
Pointers need to be canonical if LAM/UAI is not enabled. The simplest way to do it is to shift left by 16, then shift arithmetic right by 16 (or by 7 if using 5-level paging). Alternatively, you can store the pointer shifted left by 16 bits and keep the tag in the lower 16 bits; canonicalizing the pointer is then just a single shift-arithmetic-right. If combining with NaN-boxing, you rotate right to recover the double. (Demo: https://godbolt.org/z/MvvPcq9Ej). This is actually more efficient than messing with the high bits directly.
With LAM/UAI, the requirement is that the 63rd bit matches the 47th (or 56th) bit, which gives 15-bits of tag space on LAM48 and 6-bits of tag space on LAM57.
With LAM enabled, care needs to be taken when doing any pointer comparison, as two pointers which point to the same address may not be equal. There have been multiple exploits with LAM, including speculative execution exploits.
Apologies, there's a mistake in the godbolt link above. `SIGN_BIT` should be `0x8000` and not `0x1000`.
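For concreteness, a standalone C sketch of that shifted-left-by-16 layout (separate from the godbolt demo above, and assuming 48-bit canonical addresses):

    #include <stdint.h>
    #include <stdio.h>

    /* Store the pointer in the top 48 bits and a 16-bit tag in the low bits.
     * Recovering a canonical pointer is then a single arithmetic shift right,
     * which sign-extends from what was bit 47, as amd64 canonical form requires. */
    typedef uint64_t Boxed;

    static Boxed box(void *p, uint16_t tag) {
        return ((uint64_t)(uintptr_t)p << 16) | tag;
    }

    static void *unbox_ptr(Boxed b) {
        return (void *)(intptr_t)((int64_t)b >> 16);  /* one sar: canonicalizes */
    }

    static uint16_t unbox_tag(Boxed b) { return (uint16_t)(b & 0xFFFF); }

    int main(void) {
        int x = 123;
        Boxed b = box(&x, 0xBEEF);
        printf("%d %x\n", *(int *)unbox_ptr(b), unbox_tag(b));
    }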
If you restrict yourself to all variants of x86 and ARM, the number of high bits for which I could not find conflicting uses is 6 bits (bits 57-62). The other high bits are reserved in some hardware contexts and therefore may create conflicts.
Using 16 bits may be risky on recent x86. For example, IIRC Linux enables 5-level page tables on microarchitectures that support it, which can put valid address data in bits 48-56.
There is no guarantee that those 6 bits are safe either. They are just the only bits for which I could not find existing or roadmap usage across x86 and ARM sources when I last did a search.
> Using 16 bits may be risky on recent x86. For example, IIRC Linux enables 5-level page tables on microarchitectures that support it, which can put valid address data in bits 48-56.
Linux will not allocate past the 47-bit range, even with 5-level paging enabled, unless specifically requested, by providing a pointer hint to `mmap` with a higher address.
https://www.kernel.org/doc/html/v5.14/x86/x86_64/5level-pagi...
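A minimal sketch of that opt-in behavior on Linux/x86-64 (assuming a kernel with 5-level paging; without it, the hint is simply not honored and the mapping comes back low):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 1 << 20;
        int flags = MAP_PRIVATE | MAP_ANONYMOUS;

        /* No hint: the kernel keeps this below the 47-bit boundary even with
         * 5-level paging enabled. */
        void *lo = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);

        /* Hint above 2^47: opts this mapping into the wider 57-bit address
         * space, but only on hardware/kernels with 5-level paging. */
        void *hint = (void *)(1ULL << 48);
        void *hi = mmap(hint, len, PROT_READ | PROT_WRITE, flags, -1, 0);

        printf("no hint: %p\nhinted:  %p\n", lo, hi);
        return 0;
    }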
Ah, thanks for the detail! I was unaware that this was how it worked.
There's numerous techniques used. Many are covered in Gudeman's 1993 paper "Representing Type Information in Dynamically Typed Languages"[1], which includes low-bits tagging, high-bits tagging, and NaN-boxing.
The high bits let us tag more types, and can be used in conjunction with low bits tagging. Eg, we might use the low bits for GC marking.
[1]:https://web.archive.org/web/20170705085007/ftp://ftp.cs.indi...
Depends on the architecture. Top-bit usage lets you do what the hardware thinks of as an 'is negative' check very cheaply on a lot of archs, for instance.
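As a concrete illustration, a sketch where the top bit is the tag bit, so the type check is just a signed comparison (the convention here is invented):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t Value;

    /* Invented convention for this sketch: bit 63 set => heap pointer,
     * bit 63 clear => immediate value (small integer, etc.). */
    static Value box_heap_ptr(void *p)   { return (uintptr_t)p | (1ULL << 63); }
    static void *unbox_heap_ptr(Value v) { return (void *)(uintptr_t)(v & ~(1ULL << 63)); }

    static int is_heap_ptr(Value v) {
        return (int64_t)v < 0;   /* a single signed test / sign-flag check */
    }

    int main(void) {
        int x = 5;
        Value v = box_heap_ptr(&x);
        printf("%d %d\n", is_heap_ptr(v), *(int *)unbox_heap_ptr(v));
    }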
Is it a guarantee that a 16 byte object would be 16 byte aligned?
Not in general, but it is a guarantee that a runtime where all allocations are 16-byte cons cells can choose to make quite trivially.
Dynamic languages usually come with their own memory manager. They can come up with their own alignment constraints. That being said, most contemporary (Linux) architectures require that malloc returns 16-byte-aligned pointers. Some mallocs only promise this for allocations larger than 8 bytes, though (and I think the C standard was updated to permit that).
For memory allocation, POSIX (posix_memalign) has been guaranteeing alignment since 2001. C11 added equivalent functionality (aligned_alloc). C++17 incorporated it (std::aligned_alloc) as well.
More importantly, C++17 no longer ignores alignment in dynamic memory allocation: https://en.cppreference.com/w/cpp/memory/new/operator_new
C++11 already had alignas, but it was not really integrated well.
If you implement malloc you can do that. The OS generally gives you 4k (or another number in that range) at a time and malloc subdivides it.
language runtimes can call malloc whatever they want.
No. It depends on the object.
In C++ you can force that with alignas(), I would imagine other low level languages offer something similar.
If you're using a custom allocator you'd have to enforce it yourself, which should be fine since you have full control.
https://en.cppreference.com/w/cpp/language/alignas.html
C23 also has `alignas` and `alignof` (`_Alignas`/`_Alignof` in C11, with the lowercase spellings as macros in stdalign.h), and stdlib provides `aligned_alloc` (since C11) plus the new `free_sized`/`free_aligned_sized` in C23.
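Since several of the tagging tricks above lean on these alignment guarantees, here is a short sketch of the standard facilities mentioned (C11 `aligned_alloc`, `alignas`/`alignof`); the struct and its fields are invented for the example:

    #include <assert.h>
    #include <stdalign.h>   /* C11: alignas/alignof as macros; keywords in C23 */
    #include <stdint.h>
    #include <stdlib.h>

    /* Aligning one member to 16 forces the whole struct to 16-byte alignment. */
    typedef struct Cell {
        alignas(16) void *head;
        void *tail;
    } Cell;

    int main(void) {
        assert(alignof(Cell) == 16);

        /* C11 aligned_alloc: the size must be a multiple of the alignment. */
        Cell *c = aligned_alloc(16, sizeof(Cell));
        assert(c && ((uintptr_t)c & 0xF) == 0);   /* low 4 bits free for tags */

        free(c);   /* C23 additionally offers free_sized/free_aligned_sized */
        return 0;
    }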
Not a whole lot of language runtimes (if any) really depend on upper address ignore.
AFAIK, AMD only added it in Zen4.
You don't need the hardware UAI/LAM to make use of the high pointer bits. The most common technique is to use `shl reg, 16; sar reg, 16`, which will shift in ones or zeros from the left depending on the 47th bit.
Several runtimes use high bits tagging combined with NaN-boxing, and have been doing so since before LAM/UAI existed.
The JVM does, with the ZGC garbage collector; there was a really nice talk on it recently. [0]
[0] https://www.youtube.com/watch?v=y_QeST7Axrw
The JVM optionally does it. It does not rely on it.
Lua does
I thought one of Lua’s selling points is that it’s written in highly standard-compliant C code. I wouldn’t expect it to do anything non-portable like bitwise manipulation of pointers?
It's fine to use techniques like this in standard C provided you guard them with the preprocessor and provide a fallback option on unsupported hardware.
PUC Lua doesn't use these tricks; every value in the VM is represented with a POD struct with explicit tag and value members. The OP may have been thinking of LuaJIT, which uses NaN boxing, but not pointer tagging.
PUC Lua does rely on two's complement integer representation, though, as well as long long, while nominally adhering to C90, so not quite strictly conforming.
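For contrast with the bit-packing tricks above, here is roughly what that explicit-struct representation looks like (a simplified sketch with invented names, not PUC Lua's actual definitions):

    #include <stdio.h>

    /* Simplified sketch of a tag + union value, in the spirit of PUC Lua's
     * TValue, reduced to a few types and with no GC integration. */
    typedef enum { T_NIL, T_BOOL, T_NUMBER, T_STRING } Tag;

    typedef struct {
        Tag tag;
        union {
            int         b;   /* T_BOOL   */
            double      n;   /* T_NUMBER */
            const char *s;   /* T_STRING (unmanaged here, unlike real Lua) */
        } as;
    } Value;

    static Value make_number(double n) { return (Value){ .tag = T_NUMBER, .as.n = n }; }

    int main(void) {
        Value v = make_number(4.25);
        if (v.tag == T_NUMBER)
            printf("%g\n", v.as.n);
        return 0;
    }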
Doesn't CHERI use additional bits to store the capabilities rather than masking existing bits in the pointer?
But that can be problematic for any code that assumes that the size of pointers is the same as size_t/usize.
I don't see how this could not be opt in for backwards compatibility though, since existing code wouldn't use the new instructions.
I highly doubt this is anything like CHERI. More likely it's their version of ARM MTE.
It seems very strange to me to finally get around to this right as we are finally getting low level software that no longer needs it (and we've had high level software that doesn't need it for ages). At this point I think I'd prefer the transistor budget and bits of memory were spent on other things.
We have had it since before C was even invented: Burroughs, nowadays still sold as Unisys ClearPath MCP, was written in ESPOL, later NEWP, with zero Assembly.
The compiler provides intrinsics and has bounds checking for strings and arrays.
PL/I and its variants were also used across several systems, as were ALGOL dialects.
Note C.A.R. Hoare's Turing Award speech in 1980:
"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
The "1980 language designers and users have not learned this lesson" is meant to be C without explicit refering to it.
There are hundreds of billions of lines of code of critical software[1] written in unsafe languages, that is not going to be rewritten any time soon. Adding memory safety "for free" to such software is a net positive.
Current CPUs are limited by power, transistors are essentially free.
[1] often the VMs of safer higher level languages, in fact.
> we are finally getting low level software that no longer needs it
We're not, though. There's a little bit of low-level software being written in Rust (and even that requires a non-trivial amount of unsafe code), but most new low-level software is being written in C++ or C. And even if a more popular safe low-level programming language arrived tomorrow and gained a more respectable adoption, that still wouldn't be fast enough because of all the existing software.
> we are finally getting low level software that no longer needs it
Ada has had memory safety for decades – not to mention Lisp, Java, etc. if you can live with garbage collection. Even PL/I was better than C for memory safety, which is why Multics didn't suffer from buffer overflows and Unix did. But the Linux kernel (along with lots of other software which we would like to be reliable) is still mostly written in C, for better or worse.
> Ada has had memory safety for decades
Only with SPARK (i.e. formal verification), which, like other projects of this sort (e.g. Rocq, or how the CompCert C compiler was implemented and proved correct), seems not to be low-friction enough to get wide-scale adoption.
> not to mention Lisp, Java, etc. if you can live with garbage collection.
Like I said, high level languages that won't benefit from this at all have existed for ages... and the majority of software is written in them. This is one of the stronger arguments against it...
> But the Linux kernel (along with lots of other software which we would like to be reliable) is still mostly written in C, for better or worse.
Fil-C shows this can be solved at the software layer for things in this category that can afford the overhead of a GC. That does mean a larger performance penalty than the hardware proposal, but it is also more correct (since hardware changes can never catch the unintended compilation results that come from undefined behavior).
The Linux kernel is probably an example of an actual long-tail project that would benefit from this for a reasonably long time, though, since it's not amenable to "use a modified C compiler that eliminates undefined behavior with GC and other clever tricks" and it's unlikely to get rewritten or replaced with a memory-safe thing quickly, given the number of companies collaborating on it.
> Fil-C shows this can be solved at the software layer for things in this category that can afford the overhead of a GC.
Mainline clang and g++ are also getting better with things like -fbounds-safety and -fsanitize=address. As I understand it, they typically have some overhead, but I'm willing to accept that overhead to have a kernel, web browser, etc. without memory errors. The decision that memory safety is too costly seems to have been made when CPUs were orders of magnitude slower than they are today. Hopefully hardware support will reduce the overhead to negligible proportions and enable memory safety as a default rather than an esoteric add-on or proprietary feature.
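As a concrete illustration of the sanitizer route (not specific to ChkTag), here is a classic off-by-one that an AddressSanitizer build catches at runtime; the compile command is in the comment, and -fbounds-safety support still varies by toolchain:

    /* Build (AddressSanitizer):  cc -g -fsanitize=address overflow.c -o overflow
     * Running it makes ASan abort with a heap-buffer-overflow report instead of
     * silently corrupting adjacent memory. */
    #include <stdlib.h>

    int main(void) {
        int *a = malloc(4 * sizeof *a);
        a[4] = 1;          /* one past the end: caught by ASan (and, with some
                              probability, by MTE-style hardware tagging) */
        free(a);
        return 0;
    }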
Nah, it was only a UNIX-and-C thing; see the world of systems languages and OSes written in them outside Bell Labs.
Had UNIX and C carried a price tag on their source code comparable to the competition, instead of a symbolic price and an annotated source code book, history would have played a different tune.
> Only with spark (i.e. formal verification)
Ada's original memory safety was still a lot better than C's. As noted, PL/I was not 100% memory safe, but it was good enough to prevent buffer overflows in Multics.
AFAIK, most of Windows and OSX (and iOS) is in memory-unsafe languages as well (C, C++, and Objective-C).
Still, the keyword is still.
Hence Objective-C got a GC which, after its failure to play well with C semantics, got replaced with ARC, and afterwards Swift came to be.
Microsoft now has a new policy in place, via the Secure Future Initiative, that only existing code bases should be kept in C and C++; all new projects are to use either managed languages or Rust.
macOS is (a) Unix (officially even!) and inherits many of its features and issues.
To Apple's credit though they seem to be using a memory-safe language (Swift) for new code and libraries (at least at user level) and may be rewriting old code as well, and they have also added MIE/EMTE to Apple Silicon. They also ship clang/clang++ with support for -fbounds-safety and -fsanitize=address.
Objective-C also supports Automatic Reference Counting, which helps with memory management. (Apple also implemented a garbage collector for Objective-C 2.0, but abandoned it in favor of ARC. I am aware that reference counting is technically a form of garbage collection.)
The reason being, as can be seen in the archives, that the conservative tracing GC had several gotchas when working with existing code, so segfaults were common.
The way ARC works in Objective-C, by automating the retain/release call patterns already required by existing frameworks, was much safer to implement, without such crashes.
Similar to all those C++ smart pointers automating COM reference counting.
Sure, but nobody is actually writing foundational software (as we are now calling it) in Lisp, Java or Ada (and Ada also has no good answer for use-after-free, which is a huge class of vulnerabilities).
This is the first point in history where it's actually feasible in real non-research life to write an entire system in only memory safe languages. (And no, the existence of `unsafe` doesn't invalidate this point.)
I see plenty of foundational software in the biggest mobile OS, IoT devices and cloud computing infrastructure.
Ada only has use-after-free if unchecked deallocation is used; since we are way beyond Ada 83, alternatives do exist in Ada 2022.
If anything we will only get more foundational software in safer languages, when the generation that only accepts C and C++ for specific domains is no longer among us.
Unfortunately to me as well, it isn't something I will be able to witness.
> Ada only has use-after-free if unchecked deallocation is used
You mean if you just never deallocate? Or is there a third option? Genuine question; I don't follow Ada closely.
> If anything we will only get more foundational software in safer languages, when the generation that only accepts C and C++ for specific domains is no longer among us.
I'm more optimistic - the Rust in Linux people are making progress and that's probably the thickest den of naysayers. Uutils is actually being used in Ubuntu (and sudo-rs I think?).
It'll probably take a long time until Rust outweighs C but I think we're talking 10-20 years not 30-40.
Rust code that uses unsafe still needs this sort of protection.
"Needs" is a strong word, would benefit from a bit, but in practice I think the number of vulnerabilities rust code typically has is not large enough to justify the expense of compromising the performance of every CPU ever sold (thus requiring more, consuming more energy, etc).
There's also been steady progress towards creating systems to prove unsafe Rust correct, at which point it wouldn't even benefit from this. For example, see the work Amazon has been sponsoring to prove the standard library correct: https://github.com/model-checking/verify-rust-std/
A good chunk of Rust code often ends up linking in a C/C++ library where it’s still a concern (and this is ignoring that unsafe Rust is actually harder and more unsafe than C currently).
More importantly there’s millions if not billions of existing lines of C/C++ not least of which is the VMs for “memory safe” languages like Java. There’s huge value add in automatically adding security for a fractional CPU cost since the world won’t be rewritten into Rust anytime soon.
it's like the end scene in fight club. except instead of credit card company office towers it's the borrow checker and associated skyscrapers that symbolize the ascent of rust that are going down in flames as the x86 antiheroes high five and the pixies start crooning "where is my mind" to rolling credits over the burning cityscape.
I wonder what happened that Apple/ARM has implemented something similar at nearly the same time. https://security.apple.com/blog/memory-integrity-enforcement...
Arm MTE is much older. Android already supported it with a limited number of devices: https://developer.android.com/ndk/guides/arm-mte
There is server hardware out there now that in theory can support MTE, but I don't know if there's commercial support for it. MTE needs to be set up by the firmware, it's not purely an OS/kernel matter.
GrapheneOS (a hardened Android distribution) also has it enabled by default for the base OS and user-installed apps that support it (you can also force it for all apps) on 8th-generation Google Pixels and newer.
Interesting thread:
https://grapheneos.social/@GrapheneOS/113223437850603601
Intel already tried it once in 2019, failed and had to remove it.
https://en.wikipedia.org/wiki/Intel_MPX
AFAIK Intel's first foray into this territory was their i960MX, which ended up in the F-22.
Even before then, the iAPX 432 had object-capability security with respect to its memory.
I remember playing with it and finding out it was slower than just manual bounds checks in front of every memory access.
The very first commercial system to do this was shipped by Oracle on Solaris SPARC, starting in 2015.
https://docs.oracle.com/en/operating-systems/solaris/oracle-...
Intel had a first attempt at this with MPX, but the design had flaws,
https://en.wikipedia.org/wiki/Intel_MPX
Then there is CHERI and the related ARM Morello,
https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
Apple isn't a first here, only a first among products for the general public.
Don't think it was any one thing so much as it makes a whole bunch of attacks more difficult - security is a perpetual arms race after all.
I agree it would’ve happened no matter what, it’s a very useful feature.
I do wonder if Apple‘s announcement that they started shipping it may have pushed them to announce this/agree to a standard earlier than they would have otherwise.
It would be nice to know how these memory safety instructions should be used by software developers. Assuming I write C++ code, what should I do? Enable some new compiler flags? Use a special runtime library? Use some special variant of the language's standard library which uses these new instructions? Completely rewrite my code to make it safe?
If ARM's memory tagging is a guide, not much for the general developer. You will be able to run with address sanitizers enabled at a much lower overhead. Perhaps use some hardened allocators or library variants that rely on the extension.
If it's anything like CHERI, you need to make sure to follow pointer provenance rules properly (called "strict provenance" in Rust) and then just recompile your program with some extra flags. Only low level memory-related things like allocators and JITs need any significant source code changes.
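To make the provenance point concrete, here is a minimal Rust sketch (my own illustrative example, not from the thread) of packing a flag into a pointer's spare low bits the strict-provenance-friendly way, assuming a recent toolchain where the addr/map_addr pointer APIs are stable:

    // Minimal sketch of "strict provenance"-friendly pointer bit packing.
    // Hypothetical example; none of these names come from the article.

    fn main() {
        let value = Box::new(42u64);
        let p: *const u64 = &*value;

        // Stash a small flag in the (alignment-guaranteed) low bit of the
        // address without ever round-tripping the pointer through a bare
        // integer, so provenance is preserved.
        let tagged: *const u64 = p.map_addr(|a| a | 0b1);

        // Strip the flag again, still via map_addr, so whatever metadata the
        // hardware attaches to the pointer (CHERI capability bounds, tag
        // bits) rides along.
        let untagged: *const u64 = tagged.map_addr(|a| a & !0b1);

        // Safe to dereference because untagged has the same provenance as p.
        assert_eq!(unsafe { *untagged }, 42);

        // The non-portable alternative, `tagged as usize as *const u64`,
        // loses provenance, which is exactly what breaks on CHERI-like
        // hardware.
    }

The whole trick is that the address manipulation goes through map_addr rather than an integer round-trip, which is why such code can often be recompiled for a capability target without source changes.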
fwiw "knee-jerk reaction to Apple MIE" is not exactly the right characterization of this. MPX existed and faded away, and it's not very surprising that x86-world would wait for someone else to try shipping hardware support for memory safety features before trying again.
I wouldn't say that's fair. MPX failed because it was a very problematic solution to this problem.
MPX had a large (greater than 15-20%) overhead and was brittle. It didn't play well with other x86 instructions and the developer experience was extremely poor (common C and C++ design patterns would cause memory faults with MPX).
Apple MIE (which is essentially ARM MTE v2) and MTE, on the other hand, have near-invisible overhead (~5-10%), with the ability to alternate between synchronous and asynchronous reporting of tag-check faults; the latter has much lower overhead than the former, so you can check in production cheaply and get precise debugging in test. They also work essentially seamlessly with the rest of the ARM ecosystem, and it takes very little to integrate the functionality into any language ecosystem or toolchain.
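For what it's worth, the synchronous/asynchronous choice is a per-process opt-in on Linux. A rough Rust sketch of that opt-in is below; the PR_* values are hand-copied from the Linux prctl UAPI header and the helper name is my own, so treat it as illustrative rather than authoritative:

    // Illustrative sketch: opting a Linux/AArch64 process into MTE tag
    // checking. Constants mirror linux/prctl.h; verify against your kernel
    // headers. Requires the `libc` crate.

    const PR_SET_TAGGED_ADDR_CTRL: libc::c_int = 55;
    const PR_TAGGED_ADDR_ENABLE: libc::c_ulong = 1 << 0;
    const PR_MTE_TCF_SYNC: libc::c_ulong = 1 << 1;  // precise faults, more overhead
    const PR_MTE_TCF_ASYNC: libc::c_ulong = 1 << 2; // imprecise faults, cheaper
    const PR_MTE_TAG_SHIFT: u32 = 3;                // tag-mask field starts at bit 3

    fn enable_mte(synchronous: bool) -> std::io::Result<()> {
        let tcf = if synchronous { PR_MTE_TCF_SYNC } else { PR_MTE_TCF_ASYNC };
        // Enable the tagged-address ABI, pick a tag-check-fault mode, and let
        // the kernel hand out all 16 tag values (mask 0xffff).
        let ctrl: libc::c_ulong = PR_TAGGED_ADDR_ENABLE | tcf | (0xffff << PR_MTE_TAG_SHIFT);
        let rc = unsafe {
            libc::prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl,
                        0 as libc::c_ulong, 0 as libc::c_ulong, 0 as libc::c_ulong)
        };
        if rc != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    }

    fn main() {
        // Async checking is the cheap "ship it in production" mode; flip to
        // true in test builds to get the faulting access pinpointed exactly.
        match enable_mte(false) {
            Ok(()) => println!("MTE tag checking enabled"),
            Err(e) => println!("no MTE support here: {e}"),
        }
    }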
If MPX had been comparable to MTE, it certainly would have gotten the adoption that MTE is getting, but the "tech" just wasn't there to justify its use.
I'm not arguing MPX was a good solution, just that it's silly to assume folks designing x86 machines have been totally ignoring developments in that space for the past ten years.
fair.
Looking forward to this, as x86 is the one lagging behind other CPUs on this matter. Note that this is the second attempt; MPX did not go that well.
https://en.wikipedia.org/wiki/Intel_MPX
Is there a whitepaper or ISA manual change describing the feature?
ChkTag doesn't exist yet, they are working on it
So they really did see Apple announce MIE and rush to come up with something similar.
No.
Apple announces MIE, then Intel and AMD say they have something similar except they don't actually have something similar, only plans to eventually implement it, but they're advertising it as if they do already have it. That sounds like super blatant "panicking to copy Apple" to me.
The submission title goes "Intel and AMD standardize ChkTag ..." but the actual text says "Intel and AMD are working together, along with their ecosystem partners in the EAG, to address the need for memory safety. They are creating a unified specification ..." (emphasis mine). They don't even have a specification yet, let alone an implementation, but they want to make PR waves about it already. This is so funny (and sad).
I don't see the panic. They've been working on it for a while, it's inspired by MTE if anything.
I’m sure it was, Intel has tried this before. It’s a good idea.
But it wouldn't surprise me if Apple's announcement that they were already shipping it pushed Intel and AMD to agree on implementation details so they could announce this and get it going.
They wouldn't announce it with zero results nor even any actual specification to show unless they were trying to show it off as soon as possible. Otherwise they could just wait until they, you know, actually did something, to announce it.
The article seems sparse on details.
Would this imply an architecture similar to what Lisp Machines once had? That'd be a great addition IMO, and would speed up a lot of dynamic-ish languages without resorting to unsafe routes for speed.
Now they just need to agree to implement ECC everywhere instead of using it as a product differentiator, so we can reduce the amount of random issues caused by memory and bus errors.
This is already the case in DDR5.
This is an oft-repeated misunderstanding. DDR5 memory uses error correcting codes internally to correct on-die errors, but this does not defend against errors on busses between the DIMM and memory controller. For that the old scheme of extra chips to store additional ECC data is still the only way.
It actually does protect against errors on access, but support is optional.
Also, some vendors use a non-optimal Hamming code that fails to notice some double-bit errors (if I remember right).
> It actually does protect against errors on access, but support is optional.
Can you explain how exactly the on-die ECC capability can help protect data in transit? What is the optional functionality you're referring to, if not the traditional sideband ECC achieved by adding another chip's worth of data lines to every channel?
Not really. ECC memory has an extra RAM chip and stores roughly an extra bit per byte for error detection/correction. DDR5 only has error-correction bits added to the bus; regular DDR5 doesn't have extra chips/bits for error correction of the data while it is stored.
But also, what you really want is ECC that reports the corrected and uncorrected bits all the way up to the OS. That's how you know when memory is on the edge and becoming a real problem. Otherwise, it works fine until it doesn't (shrug), which is the same as regular memory.
I think the ECC added to the DDR5 bus is kinda just enough to get the higher data-rate signaling to be as reliable as DDR4. It's nice for marketing to put ECC on the DDR5 box but it's not more robust than DDR4.
Looks like this is in response to the Apple paper from a couple of weeks ago about memory tagging. Excellent news even if this wasn't pushed along by Apple.
They should standardize what's going on in Intel ME and AMD PSP (they won't since both are backdoors.)
It’s just probabilistic memory safety, at best
Still cool, but not a replacement for memory-safe language implementations.
MTE is as much a detection tool as it is a relied-upon layer of defence. I read that it has a 15-out-of-16 chance to catch an error (4-bit tags give 16 possible colors, so a stray access matches by chance only 1 time in 16). Which means even if you do find a bug, it's going to be logged back to the OS vendor essentially the first time you use it and get patched permanently in the source.
Any real world software has crashes that don’t get fixed.
So you’d really have to use the attack a lot for it to get patched.
Memory coloring (which is, to my knowledge, what is being proposed) is certainly probabilistic, but it does more for memory hardening than for memory safety as such. It makes successfully pulling off ROP chains, etc., far harder.
Even if you are using a memory safe language, memory bugs still pop up in various places (cough FFI & ABI cough) so this is a massive step in the right direction towards blocking attacks even when the developers have done "all the right things" short of formally verifying their stack from top to bottom (and even then).
I've been meaning to ask you what the motivation of your project is. Why would you want a safe C? When I saw the headline I was worried that all my runtime code would break (or get slower) because I do some very unsafe things (I have a runtime that I compile with -nostdlib).
I'm also tempted to write a commercial C++ compiler, but it feels like a big ask; paying for a compiler sounds ridiculous even if it reduces your compile time by 50x.
My motivation is that I love to write compilers
That's the best answer. Just wondering because you have a lot of experience and many people disagree with me.
How realistic is a C++ compiler that's 50x faster than clang for debug builds? Assuming 1) there's enough work for each core, and 2) there are so many files/headers that a unity build takes 1/4 of the time of a rebuild using individual files (this theoretical compiler would be multi-threaded).
I hope there are OS-level options (i.e. kernel build options) to turn this kind of thing off or just ignore the 'tags'. I know it's important for corporate use cases and monetary transactions and all that, but on my personal computer I use for fun I want to be able to peek and poke.
If you're worried that this is going to prevent you from peeking and poking, I think you're mistaken. This is to protect a process against itself, not against the outside. It will also likely be a single bit flip away from being unenforced at runtime, as most x86 protections already are, to keep debugging tools feasible.
By the way, there are already systems in place to prevent you from accessing certain memory zones. Yes, even on Linux, it's possible to make memory regions inaccessible even to root or the kernel itself. The time to be worried about that was 10 years ago.
"The time to be worried about that was 10 years ago."
And is everyone just expected to (metaphorically) lay down and die? You don't stop fighting for freedom just because you lost a battle... they certainly don't, and you shouldn't either!
I have a patch set that removes virtual-memory interprocess blocking for Linux 5.6, which I run as the kernel on a couple of my machines. I'm just hoping that this is patchable out too.
lest*: https://en.wiktionary.org/wiki/lest
>lest*
'less = unless
Then write "less that the" if you prefer not to use the contraction.
> The way I've used it is fine
No it's not. Here is the AI summary of why it isn't.
Breakdown of the Error
Here are a few ways to write the sentence correctly, depending on the desired level of formality:
Your AI explanation offers nothing in the way of why it's not fine. It just asserts that it's not fine. I agree with the AI, but your summary is wrong.
It’s nice to hear someone using English correctly.
It's not fine, take the loss and learn.
I mean, sandboxing AI agents once they get smarter is a valid concern (if not a major future concern).
Memory Safety®
It's OK, the C committee will make sure to fumble this up even with HW support.
I don't have any hope that WG14 will ever care to improve C's safety beyond what is possible when manually writing assembly code; it is even worse, because assembly doesn't have time-travelling UB.
Garbage article.
Like, cool, you guys are starting to talk about a new instruction set that will make C safe somehow. Yet you failed to provide an ounce of detail on how you'll accomplish that.
This might as well have been "We'll make our CPUs 10x faster and they'll use 10x less power!" Or "Future CPUs will have a 10 GHz clock speed!"
Again, who is this article for? The government, maybe, to assure them that x86 will take cybersecurity seriously?
> Future CPUs will have a 10 GHz clock speed!
Glad to see Tejas finally making it to see the light of day! Can’t wait to pair it with my Larrabee GPU in my BTX case.
I'm happy another old hardware nerd got that dated reference :D.
I was convinced back in the day that Larrabee would change the world. It seemed like such an amazing technology especially since multi-core CPUs were just starting to take off in consumer hardware.
Honestly it could have, if Intel had invested in the platform for more than a few generations and built the software ecosystem around it. It had on-package HBM before it was cool with all the modern AI GPUs and APUs. Intel had the opportunity to build the CUDA ecosystem, and chose to shelve it. I'll never understand the obsession that new platforms must be profitable immediately. They seem to take ~10 years to develop, and planning for that should be part of what's expected before the decision is made. Gelsinger seemed to understand at least that much.
Well, they did half-heartedly try. Xeon Phi was produced from 2010 to 2020.
I think the problem is that Intel pigeonholed the product, relegating it to just supercomputers. I also think Intel has historically done a bad job of supporting the software needed to power their hardware.
The reason CUDA was so successful (IMO) is that it was highly available and the software was of better quality than the competition's. OpenCL was supposed to be the answer to CUDA, and ultimately it was just a weird, hard-to-work-with, and minimally supported language.
I don't think that's all Intel's fault. Apple's dumb war against Khronos has really undermined a lot of progress for anyone doing GPGPU programming.
Agreed. They tried to approach the market top down, when they should know from institutional experience that bottom up wins.
That this technique is already used in production on all current-gen iPhones should tell you this isn't vaporware.
What technique? There's no technique described in the article. It's just a long article about why this is needed, an announcement of collaboration, and naming the set of instructions.
ARM has an extension called the "Memory Tagging Extension". It works by assigning 4 bits of "color" to every 16-byte block of memory. Pointers get a matching color "key" in their unused upper bits at allocation time, and if the pointer's key doesn't match the underlying memory block's color on dereference, the hardware raises an MTE fault.
It's a neat system with pretty much no overhead, and it's pretty easy to integrate into code as long as your underlying language and libraries are at least semi-intelligent about how they handle pointers and memory allocation (i.e. as long as you aren't doing any long-range pointer punting, things "just work"). A toy model of the lock-and-key scheme is sketched below, after the link.
https://developer.arm.com/-/media/Arm%20Developer%20Communit...
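As promised above, here is a toy Rust model of the lock-and-key idea. It is purely illustrative: real hardware stores the tags out of band and performs the check in the load/store path, and none of the names below come from the ARM documentation or any x86 spec.

    // Toy model of MTE-style memory coloring. Real hardware does the tag
    // storage and the check transparently; this only illustrates the idea.

    const GRANULE: usize = 16;   // bytes covered by one tag
    const TAG_SHIFT: u32 = 56;   // stash the tag in the pointer's top byte
    const TAG_MASK: u64 = 0xf;   // 4-bit tags -> 16 colors

    struct TaggedHeap {
        memory: Vec<u8>,
        tags: Vec<u8>, // one 4-bit color per 16-byte granule
    }

    impl TaggedHeap {
        fn new(size: usize) -> Self {
            Self { memory: vec![0; size], tags: vec![0; size / GRANULE] }
        }

        /// "Allocate" a granule-aligned region, color it, and hand back a
        /// pointer-like value carrying the color in its top byte.
        fn alloc(&mut self, offset: usize, len: usize, color: u8) -> u64 {
            for g in offset / GRANULE..(offset + len).div_ceil(GRANULE) {
                self.tags[g] = color & TAG_MASK as u8;
            }
            ((color as u64 & TAG_MASK) << TAG_SHIFT) | offset as u64
        }

        /// Every load checks the pointer's key against the memory's lock.
        fn load(&self, tagged_ptr: u64) -> Result<u8, &'static str> {
            let color = ((tagged_ptr >> TAG_SHIFT) & TAG_MASK) as u8;
            let addr = (tagged_ptr & !(TAG_MASK << TAG_SHIFT)) as usize;
            if self.tags[addr / GRANULE] != color {
                return Err("tag mismatch: MTE-style fault");
            }
            Ok(self.memory[addr])
        }
    }

    fn main() {
        let mut heap = TaggedHeap::new(4096);
        let a = heap.alloc(0, 32, 0x3);   // allocation A, colored 3
        let _b = heap.alloc(32, 32, 0x9); // neighbouring allocation B, colored 9

        assert!(heap.load(a).is_ok());       // in-bounds use of A: passes
        assert!(heap.load(a + 32).is_err()); // overflow from A into B: caught
    }

The only point is the pairing: the allocator colors both the memory and the pointer, and any access where the two colors disagree faults.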
Right, but none of this is in the article. MTE is neat and it'll be interesting if AMD64 brings in a similar technique but none of that is actually described in the article.
Sure, but given the description of ChkTag and the new exception and interrupt model they established with FRED, I don't see ChkTag being anything other than an MTE-style lock-and-key color-tagging solution.
The press article is just saying what the EAG is working on next now that they have shipped FRED, ACE, and AVX10. ChkTag is the current top-priority item on the EAG's list, and the standard for it should be finalised/released some time in the next few months. What they have done publicly so far is say that ChkTag will be part of the ISA standard; they just haven't revealed the details past that.
Yes, the article is not self contained.
Nothing is self-contained; you are expected to look up memory tagging if you want the gory details of how other platforms implement it.
If you want technical information about how the x86 ecosystem will implement it, you'll need a time machine (or work at Intel or AMD I guess..)
To bring Corporate Authoritarianism to x86...
It's scary how much of the population will suddenly shut off their brains whenever "safety and security" or similar phrases are mentioned.