Side-Channel: Selective Attack

Intro to the Side-Channel Universe

Side-channel attacks (SCA) are all about finding secret information by observing physical phenomena and exploiting a relationship between the two. The most common scenario works like this: a secret key $k$ is embedded in a computing device that takes inputs $x_1,\dots,x_n$ and computes some function like an encryption/decryption or so. The SCA assumption is that some partial evaluation leaks information via a physical change in a measured trace. For example, when computing an AES encryption, the permutation’s look-up might leak some information about the output because the device must change the content of some registry creating a physical variance in the measured power-trace since more (or less) power is required to do so.

Sadly, it is not obvious how these leaks are (or can be) described meaning that the best course of action is to create a model by deciding how a leak is described and which mathematical assumption we require. In a nutshell, one defines a leaking model that describes how the evaluation is leaked and the noise distribution, for example, the Hamming weight of the AES permutation’s output is leaked and some independent Gaussian noise is added.

This freebie lives in a universe where the inputs are selected by the adversary.

Idea: Finding Optimal Strategies

An adversary that can select the inputs is clearly stronger than one that cannot because being able to select the inputs allows the definition of a strategy that, for example, minimizes the number of inputs used to retrieve the secret.

But how much better are these strategies? How easy are they to compute?

The Noise Problem

The noise provides a further complication because strategies must remain optimal even when everything is noisy and hard to distinguish.

How does the noise affect such strategies? Are there tricks that help?

In other words, one can repeat the measurements and reduce the amount of noise by averaging out the measurements. However, is repeating the measurement an optimal solution? Or is it better to use a different input and use that noisy measurement as a reducing strategy?

Why: Improve Analysis and Attacks

Whenever evaluating the security of a device, choosing the attacker’s ability to correspond to realistic attack scenarios is really important. Suppose a device is designed to be secure against a passive adversary unable to choose the messages. Everything is good until one day, a more powerful adversary can inject its own messages.

Will the device still be secure?
Weirdly enough, the only way to know is to know!

On the other side of the coin, understanding and computing these strategies would help the secure implementation of cryptographic primitives. During the design of software/hardware, it might be possible to highlight steps that can potentially leak and help the creation of a strategy. Being able to compute such a strategy allows the designer to prioritize the focus on “weaker” spots in the design. This would allow better compromise between countermeasures’ cost and guaranteed security.

A theoretical reason, similar in spirit to this freebie, is to decouple the noise management and the attack strategy used to recover the secret. IMHO, the best example is executing a real attack on AES where the power consumption of the entire encryption execution is measured. The majority of the attacks focus on the leak of only the first round and for only one permutation evaluation, which is hard even as it is. The average reasoning is that by divide et impera, one executes¹ the attack on the other permutations too thus obtaining a guess for the secret key. Since the noise creates disturbances, more measurements are required to get a guess with less noise. The attack minimizes the amount of work done on the traces by requesting more traces if needed.

What if one wants to do the opposite? Minimize the number of traces by maximizing the information extracted from the traces. How much does the probability of a correct guess improve if we consider two/three/more rounds? This can definitely improve the attack but a note must be made on how a good chunk of attacks is executed: finding the best key candidate requires the computation of a score for each possible guess thus increasing the dimension of the guess would quickly make the attack impractical.

Weirdly enough, the cryptanalysis community can benefit from this methodology too by doing what they are the best at, a.k.a. imagining new attacks where side-channel information is provided somehow. For example, does security hold if the Hamming weight of some internal computations is known? Seem like a naive question but one would be surprised how little additional data is required to break security!

How to Publish

There are technically one or two papers related to selective SCA but, IMHO, their objective is too focused on the direct attack and less on finding ways to separate strategy and noise.

This can be a good publication with a lot of development. The first step is to see how noise affects the strategies and how to combine the two (similarly to this freebie). Once this is known, two experiments can “make or break” the paper:

show if the selective strategy is better than a more standard one (e.g. correlation attack, maximum-likelihood distinguisher)
after pointing out how one can recover a guess for the whole key, show if selective strategies can be expanded to consider additional points in the trace (e.g. 2 rounds) and improve the success rate for the same amount of traces

I doubt that the computational cost of the attack would be of interest to many if the attacks are indeed better.

As a first step, I’m currently working on a 2-round AES solver here.

Footnotes

take it with some pinch of salt. Despite one expects the research community to agree on how the attack should proceed for the other permutations, some papers suggest sequential re-execution, some point to a parallel attack by pointing out that the traces can already contain the other permutations’ measurements and others consider breaking a piece as breaking the whole. IMHO, this creates quite some problems when comparing attacks. ↩

On This Page