In mid-March, researchers from several U.S. universities published a paper demonstrating a hardware vulnerability in Apple’s “M” series CPUs. These CPUs, based on the ARM architecture and designed by Apple, power most of its newer laptops and desktops, as well as some iPad models. The issue could potentially be exploited to break encryption algorithms. The attack that uses this vulnerability was dubbed “GoFetch”.
The combination of a juicy topic and a big-name manufacturer like Apple led to this highly technical paper being picked up by a wide range of media outlets — both technical and not so much. Many ran with alarmist headlines like “Don’t Trust Your Private Data to Apple Laptops”. In reality, the situation isn’t quite that dire. However, to really get to the bottom of this new problem, we need to delve a little into how CPUs work — specifically by discussing three concepts: data prefetching, constant-time programming, and side-channel attacks. As always, we’ll try to explain everything in the simplest terms possible.
Data prefetching
The CPU of a desktop computer or laptop executes programs represented as machine code. Loosely speaking, it’s a bunch of numbers — some representing instructions and others representing data for calculations. At this fundamental level, we’re talking about very basic commands: fetch some data from memory, compute something with this data, and write the result back to memory.
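Expressed in C, one such unit of work looks something like the sketch below (the function and variable names are purely illustrative); the comments map each line to the fetch, compute, and write-back steps the CPU performs.

```c
#include <stdint.h>

/* Illustrative only: one tiny unit of work for the CPU. */
void add_deposit(uint64_t *balance, const uint64_t *deposit) {
    uint64_t b = *balance;   /* fetch some data from memory       */
    uint64_t d = *deposit;   /* fetch more data from memory       */
    uint64_t sum = b + d;    /* compute something with this data  */
    *balance = sum;          /* write the result back to memory   */
}
```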
You’d think these operations would always be executed strictly in that order, each one waiting for the previous to finish. Here’s a simple example: a user enters their password to access a cryptocurrency wallet. The computer needs to read the password from RAM, run a few computing operations, check that the password is correct, and only then grant access to the confidential data. If today’s CPUs actually executed all code this way, step by step, our computers would be painfully slow. So how do you speed things up? With a lot of optimization, such as data prefetching.
Data prefetching works like this: if the program code contains a command to fetch data, why not load it ahead of time to speed things up? Then, should the data come in handy at some point, we’ve just made the program run a bit faster. No big deal if it doesn’t come in handy: we’d just discard it from the CPU’s cache and fetch something else.
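The same idea can be expressed in software. The sketch below (our own illustration, not code from the paper) uses the `__builtin_prefetch` hint available in GCC and Clang to request data a few iterations before it's needed; a hardware prefetcher does essentially the same thing automatically once it spots a regular access pattern. The look-ahead distance of 16 is an arbitrary illustrative value.

```c
#include <stddef.h>

/* Sum an array while asking for data a few iterations ahead of time.
 * If the prefetched data turns out to be useful, the loop runs faster;
 * if not, it's simply evicted from the cache later, and no harm done. */
long sum_with_prefetch(const long *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n) {
            __builtin_prefetch(&data[i + 16]);  /* load ahead of time */
        }
        total += data[i];                       /* use it when needed */
    }
    return total;
}
```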
That’s how basic data prefetching works. Apple CPUs make use of a newer prefetcher known as “data memory-dependent prefetcher”, or DMP. In a nutshell, DMP is more aggressive. Commands to fetch data from memory are not always explicit. Pointers to specific memory locations might be the result of computing work that still needs to be performed, or they might be stored in a data array that the program will access later. DMP tries to guess which data in the program is a pointer to a memory location. The logic is the same: if something looks like a pointer, try fetching data at that address. The guessing process relies on the history of recent operations — even if they belong to a completely different program.
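To make the difference concrete, here's a sketch (ours, not the researchers') of the kind of data layout a DMP inspects. The program below only adds up plain numbers and never follows `buf[3]` as a pointer, yet its value happens to equal a valid address, so an aggressive prefetcher may guess that it is a pointer and pull the "pointed-to" data into the cache anyway.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t secret = 42;          /* ordinary program data       */
    uint64_t buf[4];

    buf[0] = 1111;                 /* plain numbers, not pointers */
    buf[1] = 2222;
    buf[2] = 3333;
    buf[3] = (uint64_t)(uintptr_t)&secret;  /* merely LOOKS like a pointer */

    /* The code only reads the values and never dereferences buf[3].
     * On an affected CPU, the DMP may nevertheless prefetch the data at
     * that address into the cache; this is the behavior GoFetch relies on. */
    uint64_t sum = 0;
    for (int i = 0; i < 4; i++) {
        sum += buf[i];
    }
    printf("sum = %llu\n", (unsigned long long)sum);
    return 0;
}
```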
In 2022, another study demonstrated that DMP tends to confuse pointers with other data the program is working with. This isn’t necessarily a problem by itself — loading the wrong stuff into the CPU cache isn’t a big deal. But it becomes a problem when it comes to encryption algorithms. DMP can break constant-time programming under certain conditions. Let’s talk about this next.
Constant-time programming
There’s a simple rule: the time it takes to process data must not depend on the nature of that data. In cryptography, this is a fundamental principle for protecting encryption algorithms from attacks. Often, malicious actors try to attack the encryption algorithm by feeding it data and observing the encrypted output. The attacker doesn’t know the private key used to encrypt the data. If they figure out this key, they can decrypt other data, such as network traffic or passwords saved in the system.
Poorly implemented encryption algorithms process some inputs faster than others. This gives the attacker a powerful tool: simply by observing the algorithm’s runtime, they can potentially reconstruct the private key.
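A classic illustration (not from the paper itself) is comparing a user-supplied value against a secret. A naive comparison returns as soon as it hits the first mismatch, so its runtime reveals how many leading bytes were guessed correctly; a constant-time version always processes every byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Leaky comparison: exits on the first mismatch, so the runtime depends
 * on how many leading bytes match, letting an attacker who can measure
 * time guess the secret one byte at a time. */
int compare_leaky(const uint8_t *a, const uint8_t *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return 0;              /* early exit leaks timing */
    }
    return 1;
}

/* Constant-time comparison: always touches every byte and has no branches
 * that depend on the data, so the runtime stays the same. */
int compare_constant_time(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= a[i] ^ b[i];
    }
    return diff == 0;
}
```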
Most modern encryption algorithms are immune to this type of attack: their creators made sure that computation time is always the same regardless of the input data. Tests of an algorithm’s robustness routinely include attempts to violate this principle, and sometimes those attempts succeed, as happened, for example, with the Hertzbleed attack. However, to make actual key theft possible, the attacker also needs a side channel.
Side-channel attack
If DMP prefetching sometimes confuses regular application data with a memory pointer, does that mean it can mistake a piece of a private key for a pointer? It turns out it can. The researchers demonstrated this in practice using two popular encryption libraries: Go Crypto (part of Go’s standard library) and OpenSSL (used for network traffic encryption and much else). They investigated various encryption algorithms, including the ubiquitous RSA and Diffie-Hellman as well as Kyber-512 and Dilithium-2, which are considered resistant to attacks by quantum computers. By trying to fetch data from a false “pointer” that’s actually a piece of a private key, DMP essentially leaks the key to the attacker.
There’s one catch: the hypothetical malware needed for this attack has no access to the cache. The attacker can’t see what DMP loaded there or which RAM address the data was fetched from. However, if a direct attack isn’t possible, there’s still a chance of extracting information through a side channel. What makes this possible is a simple property of any computer: data already sitting in the CPU cache is accessed faster than data residing in regular RAM.
Let’s put this attack together. We have malware that can feed arbitrary data to the encryption algorithm. The algorithm loads various data into the cache, including a secret encryption key. DMP sometimes mistakenly fetches data from an address that’s actually a piece of this key. The attacker can’t see this directly, but they can detect it by measuring how long it takes the CPU to access certain pieces of data: if the data was already cached, the access is slightly faster. This is exactly how the researchers broke the constant-time programming principle: feed chosen data to the algorithm, then watch how these access times vary depending on the secret key.
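Conceptually, the "measuring" step looks something like the sketch below. This is a simplified illustration rather than the researchers' actual code: real attacks use far finer-grained timers and statistics over many repetitions, and the 80-nanosecond threshold is an arbitrary, machine-dependent placeholder.

```c
#include <stdint.h>
#include <time.h>

/* Time a single read of a memory location in nanoseconds. */
static uint64_t time_read_ns(const volatile uint8_t *addr) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    (void)*addr;                            /* the access being timed */
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (uint64_t)((end.tv_sec - start.tv_sec) * 1000000000LL
                      + (end.tv_nsec - start.tv_nsec));
}

/* If the read completes quickly, the data was most likely already in the
 * CPU cache; a slow read means it had to come from RAM. The threshold is
 * purely illustrative and would have to be calibrated per machine. */
#define CACHE_HIT_THRESHOLD_NS 80
int probably_cached(const volatile uint8_t *addr) {
    return time_read_ns(addr) < CACHE_HIT_THRESHOLD_NS;
}
```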
So, is your data at risk?
In practice, extracting an encryption key requires dozens to hundreds of thousands of computing operations as we feed data into the algorithm and indirectly monitor cache status. This is a sure-fire attack, but a very resource-intensive one: stealing a key takes an hour at best — more than ten hours at worst. And for all this time, the computing effort will keep the device running almost at full capacity. The GoFetch website has a video demonstration of the attack, where the private key is extracted bit by bit — literally.
However, that’s not what makes the attack impractical. As we’ve mentioned, the attack requires malware to be running on the victim’s computer, and if that’s the case, the data is arguably compromised already: there are likely far simpler ways to get to it at that point. This is why the OpenSSL developers didn’t even consider the researchers’ report: such local attacks fall outside their threat model.
All studies like this can be compared to civil engineering. To make a structure robust, engineers need to study the properties of the building materials and the local soil, allow for the risk of earthquakes, and much more. In most cases, even a poorly constructed building will stand for decades without problems; however, a rare combination of circumstances can end in disaster. Research into attack scenarios like GoFetch is meant to avert exactly that kind of disaster: a mass leak of user secrets.
The researchers plan to continue studying this fairly new prefetching mechanism. Intel processors have also used it since the 13th generation, but they proved not to be susceptible to the particular attack described in the paper. What’s important is that the vulnerability can’t be patched: it will continue to affect Apple’s M1 and M2 CPUs for their entire lifespan. The only way to prevent this type of attack is to modify the encryption algorithms’ implementations. One option is to restrict the calculations to the CPU’s “energy-efficient” cores, since DMP only works on the “high-performance” cores. Another is to obfuscate encryption keys before loading them into RAM. A side effect of these methods is some performance degradation, though the user would hardly notice it. Apple’s M3 CPUs, in turn, feature a special flag that disables DMP optimization for particularly sensitive operations.
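As a sketch of the "energy-efficient cores" idea, and only as our own illustration rather than a countermeasure recommended by the researchers or Apple, a macOS developer could lower a thread's quality-of-service class before running sensitive code; the scheduler then strongly prefers the efficiency cores, where DMP reportedly isn't active.

```c
#include <pthread.h>
#include <pthread/qos.h>

/* Our own sketch: request background QoS for the current thread so the
 * macOS scheduler prefers the energy-efficient cores while the sensitive
 * cryptographic work runs. Illustrative only; no guarantees are implied. */
static void *sensitive_crypto_thread(void *arg) {
    pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);

    /* ... handle keys / perform encryption here ... */
    (void)arg;
    return NULL;
}
```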
Let’s summarize. There’s no immediate threat to data stored on Apple devices: hardly anyone would bother using a technique this complex to steal it. Nevertheless, the work of these U.S. researchers is valuable because it sheds light on previously unknown aspects of how the latest CPUs work, and it helps prevent future problems that might arise if an easier way to exploit the vulnerability is discovered.