A Short Note on a Weight Probability Distribution Related to SPNs

Sondre Rønjom

2019 · Full Version · eprint 2019/750



Abstract

We report on a simple technique that supports some recent developments on AES by Grassi and Rechberger and by Bao, Guo and List. We construct a weight transition probability matrix related to AES that characterises fixed configurations of active bytes in differences of ciphertexts when plaintext differences are fixed to some (possibly other) configuration of active bytes. The construction is very simple and requires only a little bit of linear algebra. The derived probabilities are essentially identical to recent results on 5- and 6-round AES derived through more sophisticated means, indicating that it may be worth further investigation.

1 Introduction

We consider transition probability distributions related to active bytes in differences of ciphertexts and plaintexts in AES. The objective is to determine whether the probability distribution for configurations of active bytes in ciphertext differences varies depending on the configuration of active bytes in the plaintexts. We show how to utilize symmetries in the AES MixColumns matrix to construct efficient (small) transition matrices for r rounds of AES, where r is any number of rounds. The results and techniques are very simple and require only a little bit of linear algebra. We are unable to prove how accurately these transition probability matrices emulate the true AES distributions; however, we can confirm that our method obtains results that seem to be identical to recent results reported with more sophisticated analysis, presented in several recent papers on 5- and 6-round distinguishers for AES (e.g. [BGL19], [Gra18], [GRR17] and [GR18]). In particular, probabilities derived from the weight transition probability matrix match the results recently presented by Bao, Guo and List in [BGL19] on 5- and 6-round AES and by Grassi and Rechberger in [GR18] on 5-round AES. This may support the view that, as long as the s-box is sufficiently generic (e.g. not linear), the s-box layer has little effect on this kind of structural analysis. It is thus tempting to conjecture that the presented weight transition probability matrix emulates the true AES distributions.

Simple C/C++ code for experimenting with and verifying our results can be found at github.com/sondrer/SPNTransitionProbability.

2 Weight Probability Distributions in AES

One round of AES [DR02] consists of four steps. First, a non-linear permutation SubBytes is applied to each individual byte of the state. Then each row is shifted cyclically via ShiftRows, followed by applying a fixed linear transformation M \in \mathbb{F}_{2^8}^{4 \times 4} to each column (MixColumns). In the end a fixed round-key is added to the whole state (AddKey).

We view states in AES in terms of the SuperBox representation, i.e. as a collection of vectors S = (s_0, s_1, s_2, s_3) \in (\mathbb{F}_{2^8}^4)^4 corresponding to the columns of the AES state. To each state we associate a vector \nu(S) \in (\mathbb{F}_2^4)^4 which indicates the active bytes in each column s_j of the state. Let \rho(s_j) \in \mathbb{F}_2^4 denote the vector which is 1 in position i if the i'th byte of s_j is non-zero, such that the vector defined as

\nu(S) = (\rho(s_0), \rho(s_1), \rho(s_2), \rho(s_3)) \tag{1}

is the configuration of active bytes for a state S. We also use a vector

\wp(S) = (\text{wt}(s_0), \text{wt}(s_1), \text{wt}(s_2), \text{wt}(s_3)) \tag{2}

to identify the weights of the columns of a state. To simplify the notation we treat vectors also as integers, i.e. c \in \mathbb{F}_2^4 is also treated as an integer \sum_{i=0}^3 c_i \cdot 2^i, u \in (\mathbb{F}_2^4)^4 as the integer \sum_{i=0}^3 \left(\sum_{j=0}^3 c_j \cdot 2^j\right) 16^j, and a \in \mathbb{Z}_5^4 as an integer \sum_{i=0}^3 a_i \cdot 5^i. The indices of the matrices will then correspond naturally to configurations of active bytes and weights and are then easily derived from each other.

2.1 A 2^{16} \times 2^{16} Transition Probability Matrix for Active Bytes

The MixColumns matrix M \in \mathbb{F}_{2^8}^{4 \times 4} is applied to each column of the state. We are interested in the transition probabilities for active bytes through M, i.e. for two binary vectors u, v \in \mathbb{F}_2^4 let

T_{M}(u, v) = \Pr(\rho(x \cdot M) = v \mid \rho(x) = u) \tag{3}

denote the probability that the vector x \cdot M is non-zero exactly in the byte positions indicated by the 1's in v when x is non-zero exactly in the positions indicated by the 1's in u. The matrix M in AES is derived from a linear [8,4,5] MDS code over \mathbb{F}_{2^8}. We have computed the transition probabilities exhaustively, but there exist explicit formulas for the weight distribution of MDS codes that simplify this in the general case. There are in total 16 different configurations of active bytes at the input and 16 configurations of active output bytes, so T_M \in \mathbb{R}^{16 \times 16}. Let u, v \in \mathbb{F}_2^4 and let Z denote the 16 \times 16 matrix where Z(u, v) counts the number of elements x \in \mathbb{F}_{2^8}^4 with active bytes in positions indicated by u and where y = x \cdot M has active bytes indicated by v. The state transition for a column through the MixColumns layer is then as follows.

The table for Z can be found at github.com/sondrer/SPNTransitionProbability.

Definition 1

Let T_M \in \mathbb{R}^{16 \times 16} denote the transition probability matrix for active bytes over the MixColumn M with entries

T_{M}(u, v) = \frac{Z(u, v)}{(2^{8} - 1)^{\text{wt}(u)}} \tag{4}

for indicators u, v \in \mathbb{F}_2^4 for configurations of active bytes.

Since the MC-layer applies the matrix M to each column individually, it is now straightforward to construct a transition probability matrix for the whole MC-layer. The MC-layer maps an input state x = (x_0, x_1, x_2, x_3) \in (\mathbb{F}_{2^8}^4)^4 to an output state y = (y_0, y_1, y_2, y_3) where

y_i = \text{MC}(x)_i \tag{5}
= x_i \cdot \mathbf{M} \tag{6}

thus we have the following trivial extension.

Definition 2

For vectors u, v \in (\mathbb{F}_2^4)^4 indicating the active bytes in each column, let a matrix T_{\mathrm{MC}} \in \mathbb{R}^{2^{16} \times 2^{16}} with entries

T_{\text{MC}}(u, v) = \Pr(\nu(\text{MC}(x)) = v \mid \nu(x) = u) \tag{7}
= \prod_{i=0}^{3} \Pr(\rho(\text{MC}(x)_i) = v_i \mid \rho(x_i) = u_i)
= \prod_{i=0}^{3} T_M(u_i, v_i) \tag{8}

denote the transition probability matrix for the full MixColumns layer in AES and where u, v are also treated as indices 0 \le u, v < 2^{16} and u_i, v_i as indices 0 \le u_i, v_i < 16.

Similarly, let T_{\text{SR}} denote the 2^{16} \times 2^{16} matrix with indices

T_{\text{SR}}(u, v) = \Pr(\nu(\text{SR}(x)) = v \mid \nu(x) = u) \tag{10}

where the vectors u, v \in (\mathbb{F}_2^4)^4 are treated as indices in the range 0 \le u, v < 2^{16}. The matrix T_{\text{SR}} is a permutation matrix with a single 1 in each row and column, hence there is no uncertainty associated with it. Notice also that the SubBytes layer corresponds to the identity map with regard to active-byte transition probabilities, and is thus disregarded (determining the real effect, if any, of the s-box layer is the main remaining open problem). We use the SuperBox representation, which means that we remove the first SR-layer in order to work with columns, and we may or may not remove the final linear layer. We then define the following r-round transition matrices for active bytes.
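Since T_{\text{SR}} is a permutation matrix, it is fully determined by the permutation that ShiftRows induces on 16-bit activity masks. A minimal sketch, assuming the bit layout where bit r of nibble c marks the byte in row r of column c (this layout and the function name are our choices, not fixed by the paper):

```python
def shiftrows_mask(u):
    """Apply AES ShiftRows to a 16-bit active-byte mask.

    Assumed layout (ours): bit r of nibble c marks the byte in row r,
    column c.  ShiftRows rotates row r left by r positions, so the byte
    at (r, c) moves to (r, (c - r) mod 4).
    """
    v = [0, 0, 0, 0]
    for c in range(4):
        nib = (u >> (4 * c)) & 0xF
        for r in range(4):
            if (nib >> r) & 1:
                v[(c - r) % 4] |= 1 << r
    return sum(v[c] << (4 * c) for c in range(4))

# T_SR(u, v) = 1 exactly when v == shiftrows_mask(u), and 0 otherwise.
```

One can check that this map is a bijection on the 2^{16} masks, as the permutation-matrix property requires.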

Definition 3

Let T_{\text{MC} \circ \text{SR}} = T_{\text{SR}} \cdot T_{\text{MC}}. Then define the r-round transition probability matrix for active bytes as

T_{f^r} = T_{\text{MC}} \cdot T_{\text{MC} \circ \text{SR}}^{r-1} \tag{11}

If the aim is solely to distinguish AES reduced to r rounds, then since the adversary may remove the first SR-layer and the last \text{MC} \circ \text{SR}-layer, the adversary can work with T_{f^{r-1}}. If the aim is to construct an r-round distinguisher in the hope of extending it to an (r+t)-round key-recovery, the adversary may consider removing only the first SR-layer and thus work with the T_{f^r}-matrix. The 2^{16} \times 2^{16} matrices are quite large, but in the next section we show how to compress them down to 625 \times 625 weight transition probability matrices.

2.2 The Weight Transition Probability

In this section we construct a compressed weight transition probability matrix that depends only on the Hamming weight of columns. The difference is that we now only consider the number of active bytes in each column and no longer have control over the exact active bytes. Thus, while the transition probability distributions derived from the previous matrices can be thought of as distributions for the exact configurations of active bytes in the state columns, we now only consider distributions on the number of active bytes (Hamming weight) of the state columns.

The MixColumns matrix M is symmetric with respect to weights in the sense that

T_{M}(u, v) = T_{M}(u', v') \tag{12}

for any choice of u', v' \in \mathbb{F}_2^4 with \text{wt}(u) = \text{wt}(u') and \text{wt}(v) = \text{wt}(v'). The weight transition probabilities through the M-matrix therefore depend only on the number, and not on the particular configuration, of active bytes. So we can construct a compressed weight transition probability matrix C_M \in \mathbb{R}^{5 \times 5} for the matrix M that satisfies

C_{M}(a, b) = \Pr(\text{wt}(x \cdot M) = b \mid \text{wt}(x) = a) \tag{13}
= \sum_{\substack{v \in \mathbb{F}_2^4 \\ \text{wt}(v) = b}} T_{M}(u', v) \tag{14}
= \binom{4}{b} T_{M}(u', v') \tag{15}

for weights a, b and where u', v' are any fixed vectors with \text{wt}(u') = a and \text{wt}(v') = b. The last equality follows since there are \binom{4}{b} possible configurations of active bytes for an output vector of weight b, and by the symmetry (12) each of them occurs with the same probability T_M(u', v'). We can now construct a 625 \times 625 weight transition probability matrix C_{\text{MC}} with entries

C_{\text{MC}}(u, v) = \prod_{k=0}^{3} C_{M}(u_k, v_k) \tag{16}

which is the probability for a state S with column weights \wp(S) = (u_0, u_1, u_2, u_3) to map to a state with column weights \wp(\text{MC}(S)) = (v_0, v_1, v_2, v_3) through the MixColumn layer.
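The 5 \times 5 matrix C_M can be obtained without the exhaustive computation by using the standard support-weight formula for MDS codes: for the [8,4,5] code \{(x, x \cdot M)\}, the number of codewords whose support equals a fixed set of w coordinates is \sum_{j=0}^{w-5} (-1)^j \binom{w}{j} (256^{w-4-j} - 1) for w \ge 5, and 0 for 1 \le w \le 4. Since a codeword (x, x \cdot M) with \rho(x) = u' and \rho(x \cdot M) = v' has support of size \text{wt}(u') + \text{wt}(v'), this count is exactly Z(u', v'). The product structure (16) then makes C_{\text{MC}} a 4-fold Kronecker power of C_M. A sketch (function names are ours):

```python
import numpy as np
from math import comb

Q = 256  # field size, F_{2^8}

def exact_support_count(w, d=5, q=Q):
    """Number of codewords of an [8,4,5] MDS code whose support equals a
    fixed coordinate set of size w (standard MDS support-weight formula)."""
    if w == 0:
        return 1  # the zero codeword
    if w < d:
        return 0
    return sum((-1) ** j * comb(w, j) * (q ** (w - d + 1 - j) - 1)
               for j in range(w - d + 1))

def build_C_M():
    """C_M(a, b) = binom(4, b) * Z(u', v') / (q - 1)^a, cf. Eq. (15),
    where Z(u', v') is the exact-support count for a + b coordinates."""
    C = np.zeros((5, 5))
    C[0, 0] = 1.0  # a zero column stays zero
    for a in range(1, 5):
        for b in range(5):
            C[a, b] = comb(4, b) * exact_support_count(a + b) / (Q - 1) ** a
    return C

def build_C_MC(C_M):
    """The 625 x 625 C_MC of Eq. (16) as the 4-fold Kronecker power of
    C_M, with indices read as base-5 digit strings."""
    C = np.array([[1.0]])
    for _ in range(4):
        C = np.kron(C_M, C)  # one factor per state column
    return C
```

As sanity checks, every row of C_M and of C_{\text{MC}} sums to 1, and the branch number 5 of M forces C_M(1, 4) = 1.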

We may construct a weight transition matrix C_{\text{SR}} for the ShiftRows layer in a similar fashion. For column weight indicators \wp(S) = u and \wp(\text{SR}(S)) = v the entries of this matrix are given by

C_{\text{SR}}(u, v) = \frac{1}{\prod_{j=0}^{3} \binom{4}{u_j}} \sum_{\substack{a, b \in (\mathbb{F}_2^4)^4 \\ \text{wt}(a_i) = u_i \\ \text{wt}(b_i) = v_i}} T_{\text{SR}}(a, b) \tag{17}

This probability follows since the sum runs over all possible active-byte configurations at the output, while the prefactor averages uniformly over the \prod_{j=0}^{3} \binom{4}{u_j} possible input configurations with column weights u. We can now construct weight transition probability matrices for r rounds of AES.
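Equation (17) can be evaluated directly by running over all 2^{16} activity masks, since T_{\text{SR}} is the permutation that ShiftRows induces on masks. A minimal sketch, assuming bit r of nibble c marks the byte in row r of column c (this layout and the helper names are our choices):

```python
from math import comb

def shiftrows_mask(u):
    """ShiftRows on a 16-bit active-byte mask; assumed layout (ours):
    bit r of nibble c marks the byte in row r, column c."""
    v = [0, 0, 0, 0]
    for c in range(4):
        nib = (u >> (4 * c)) & 0xF
        for r in range(4):
            if (nib >> r) & 1:
                v[(c - r) % 4] |= 1 << r
    return sum(v[c] << (4 * c) for c in range(4))

def column_weights(u):
    """Hamming weight of each nibble of a 16-bit mask."""
    return [bin((u >> (4 * c)) & 0xF).count("1") for c in range(4)]

def weights_to_int(w):
    return sum(w_i * 5**i for i, w_i in enumerate(w))

def build_C_SR():
    """Compress the 2^16 x 2^16 permutation T_SR to the 625 x 625 weight
    matrix C_SR of Eq. (17): each mask with input column weights u is
    equally likely, so it contributes 1 / prod_j binom(4, u_j)."""
    C = [[0.0] * 625 for _ in range(625)]
    for a in range(1 << 16):
        wa = column_weights(a)
        wb = column_weights(shiftrows_mask(a))
        p = 1.0
        for w in wa:
            p /= comb(4, w)
        C[weights_to_int(wa)][weights_to_int(wb)] += p
    return C
```

Each row of the resulting matrix sums to 1, since the number of masks with column weights u is exactly \prod_j \binom{4}{u_j}.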

Definition 4

Let C_{\text{MC} \circ \text{SR}} = C_{\text{SR}} \cdot C_{\text{MC}}. Then let

C_r = C_{\text{MC}} \cdot C_{\text{MC} \circ \text{SR}}^{r-1}

denote the weight transition probability matrix for r rounds of AES.
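Given C_{\text{MC}} and C_{\text{SR}} as row-stochastic arrays, the composition in Definition 4 amounts to a couple of matrix products. A sketch assuming numpy arrays of matching dimensions (the function name is ours):

```python
import numpy as np

def r_round_weight_matrix(C_MC, C_SR, r):
    """C_r = C_MC · (C_SR · C_MC)^{r-1}, cf. Definition 4.

    C_MC and C_SR are row-stochastic matrices of equal size (625 x 625
    for AES); entry (u, v) of the result is the r-round weight
    transition probability from input pattern u to output pattern v.
    """
    C_step = C_SR @ C_MC
    return C_MC @ np.linalg.matrix_power(C_step, r - 1)
```

Since products of row-stochastic matrices are row-stochastic, every row of C_r is again a probability distribution, which is a useful sanity check on any implementation.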

3 Some Results

If a \in \mathbb{R}^{625} denotes a weight probability distribution for the plaintext difference, then b = a \cdot C_{r-1} is the weight distribution on the ciphertext differences after r rounds, when the last linear layer is omitted (thus we focus on reduced round distinguishers). The uniform distribution is given by a vector q \in \mathbb{R}^{625} of values

q_v = 2^{-128} \prod_{i=0}^{3} \binom{4}{v_i} (2^8 - 1)^{v_i}

which is the probability that the output difference has weight pattern (v_0, v_1, v_2, v_3), where v = \sum_{i=0}^3 v_i \cdot 5^i, regardless of the input. Now the goal is to determine vectors a, e \in \mathbb{R}^{625} and investigate the sum

\sum_{k=0}^{624} (q_k - b_k) e_k \tag{18}

The scaling vector e is just an enforced weighting on the ciphertext distribution which the adversary can impose as he likes. If e_v is zero, then ciphertext differences with a weight arrangement according to v are ignored completely. For instance, if we only consider events in which the three last columns are zero, then we expect to discard roughly 2^{96} ciphertext pairs before we receive a pair with our preferred property. Thus, there is a penalty in terms of data complexity if we fixate on events that seldom happen, unless the cipher itself has a very unlikely probability distribution.
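The uniform distribution q and the 2^{-30} baseline for a collision in at least one ciphertext column can be checked numerically (a sketch; the names are ours):

```python
from math import comb, log2

def uniform_weight_distribution():
    """q_v = 2^-128 * prod_i binom(4, v_i) * (2^8 - 1)^{v_i}, over all
    625 weight patterns v = (v_0, v_1, v_2, v_3)."""
    q = [0.0] * 625
    for v in range(625):
        pattern = [(v // 5**i) % 5 for i in range(4)]
        p = 2.0 ** -128
        for w in pattern:
            p *= comb(4, w) * (2**8 - 1) ** w
        q[v] = p
    return q

q = uniform_weight_distribution()

# probability that a random ciphertext difference is zero in at least
# one column; it should be close to 2^-30
p_rand = sum(q[v] for v in range(625)
             if any((v // 5**i) % 5 == 0 for i in range(4)))
```

The entries of q sum to 1 (since \sum_w \binom{4}{w} 255^w = 256^4), and \log_2(p_{\text{rand}}) \approx -30, matching the baseline used throughout this section.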

The matrices C_{\text{MC}}, C_{\text{SR}} and C_{r-1} are easy to work with and thus we will now use C_{r-1} to compute the probabilities corresponding to the same event as recently investigated in [BGL19] and [GR18].

3.1 Aligning With Recent Results on AES

To begin we have to define an input distribution a = (a_0, a_1, \ldots, a_{624}). For instance, if we assume that the input differences are non-zero and random in only the first column (remember that we are omitting the first SR-layer), we can let the first 5 indices of a correspond to

a_i = \binom{4}{i} \cdot \frac{(2^8 - 1)^i}{(2^{32} - 1)}

where a_i for 0 \le i < 5 is the probability that the difference has weight i in the first column, given that the plaintext difference is known to be non-zero only in the first column. The corresponding output distribution then becomes b = a \cdot C_{r-1}. If we want to compute the probability that the output difference is zero in at least one column, we may sum over the probabilities that contribute to this event in the case of AES

p_{\text{AES}} = \sum_{\substack{v = (v_0, v_1, v_2, v_3) \\ \text{at least one } v_i \text{ zero}}} b_v

and compare this against the random case given by

p_{\text{rand}} = 2^{-128} \cdot \sum_{\substack{v = (v_0, v_1, v_2, v_3) \\ \text{at least one } v_i \text{ zero}}} \prod_{k=0}^{3} \binom{4}{v_k} (2^8 - 1)^{v_k}

For instance, if we assume the above input probability distribution a (i.e. input differences are non-zero in the first column only) and compute the corresponding probability p_{\text{AES}} for r = 5 and r = 6 rounds, we get the results in Table 1. In particular, [GR18] arrives at 2^{-30} + 2^{-50.980} for the event that a ciphertext difference is zero in at least one column when the plaintext difference is active in the first column, which is identical to the result obtained via the weight transition probability matrix.

In the case that the plaintext difference has exactly one active byte in the first column, i.e. the input distribution a has probability 1 in a_1, the method based on weight transition probabilities returns a probability 2^{-30} + 2^{-51.983} while [BGL19] arrives at 2^{-30} + 2^{-51.985}.

For 6 rounds the probability in [BGL19] for the same type of event is estimated to be 2^{-30} + 2^{-73.995}, which is also identical to the probability derived using the weight transition probability matrix. We have added the probability for 7 rounds into the table for completeness.

Table 1: Comparison between results obtained in the literature and our results obtained using the weight transition probability (WTP) method, with plaintext differences restricted to either one active column or one active byte as the plaintext event (PE), and a collision in at least one ciphertext column as the ciphertext event (CE).

Rounds PE CE Probability in literature WTP Ref.
5 Active Byte Zero-column 2^{-30} + 2^{-51.985} 2^{-30} + 2^{-51.983} [BGL19]
5 Active Column Zero-column 2^{-30} + 2^{-50.980} 2^{-30} + 2^{-50.980} [GR18]
6 Active Column Zero-column 2^{-30} + 2^{-73.995} 2^{-30} + 2^{-73.995} [BGL19]
7 Active Column Zero-column – 2^{-30} + 2^{-126.891} –

However, there might be better choices of input and output distributions that can be used to optimize this further. For instance, in the same setting as above and for 7 rounds, we get

p_{\text{AES-7R}} = 2^{-30} + 2^{-126.891}

If we instead ask for the probability of getting at least one zero byte, we get

p_{\text{AES-7R}} = 2^{-4} + 2^{-126.036}

Note that the weight transition probability matrix also verifies the well-known impossible differential (i.e. we get probability zero for the event of fewer than 5 active columns in total when evaluated for C_3).

These results motivate a conjecture.

Conjecture 1

The weight transition probability matrix defined in Definition 4 emulates the true weight probability distributions in AES.

3.2 Weight Distributions Biased in Opposite Directions

Assume that we fix an output event, e.g. that at least one column in the ciphertext difference is zero, which happens with probability roughly 2^{-30} in the random case. A second type of event we could look for is the case where there exist two different input distributions, e.g. a^1, a^2 \in \mathbb{R}^{625} with a^1_i = 1 and a^2_j = 1 for i \neq j, such that the two output probabilities for the same event,

p_1 = \sum_{\substack{v = (v_0, v_1, v_2, v_3) \\ \text{at least one } v_j \text{ zero}}} b_v^1

and

p_2 = \sum_{\substack{v = (v_0, v_1, v_2, v_3) \\ \text{at least one } v_j \text{ zero}}} b_v^2

move in opposite directions from random. For instance, assume

p_1 = p_{\text{rand}} - \epsilon_1 \tag{19}

and

p_2 = p_{\text{rand}} + \epsilon_2 \tag{20}

thus maximizing the distance between the two same-event probabilities p_1 and p_2 instead of comparing single AES probabilities against random. For instance, for 5 rounds we can pick two different input conditions on the weight in the first column, u_1 = (3, 0, 0, 0) and u_2 = (2, 0, 0, 0), such that

p_1 = 2^{-30} - 2^{-50.358}

becomes the probability for collision in any column after 5 rounds when there are exactly 3 active bytes in the input difference and

p_2 = 2^{-30} + 2^{-50.390}

for a collision when there are exactly 2 active bytes, such that the difference

p_2 - p_1 = 2^{-49.373}

is larger than if we compared a single event against random.

4 Possible Further Research

There might be several interesting directions for further research, but we mention just a few.

4.1 Linear Optimization

To formally find the optimal distribution (choice of input and output events) that minimizes distinguishing complexity, one can employ a data- and computational complexity weighted linear optimization (linear programming).

4.2 Markov Chains and Stochastic Matrices

Markov chains, stochastic matrices, and related topics form a very well-studied area. What can be said about these state transition matrices by applying known theory to them?

4.3 Rate of Convergence

For I, J \subset \{0, 1, 2, 3\}, let p_{I,J}^r denote the probability that the ciphertext difference after r rounds is zero in the columns indicated by J given that the plaintext difference is active in exactly the columns indicated by I. The motivation in this paper has been to study how probabilities p_{I,J}^r for ciphertext events J vary depending on the choice of input events I. Let s_{I,J} = s_J denote the uniform probability that ciphertext differences are zero in the columns indicated by J. If we look at the rounded values of \log_2\!\left(\frac{\max(p_{I,J}^r, s_J)}{(p_{I,J}^r - s_J)^2}\right) for increasing r, we observe that the values do not become independent of the choice of input distribution (determined by I) until r = 10 rounds of AES.

4.4 The Effect of the S-box

The s-box layer acts as a probability-1 identity map in the transition probability matrix. However, the s-box maps differences in particular ways that might, at least in theory, have some effect on the probability distribution. It is not clear how large, if any, this effect is for a generic s-box.

4.5 Testing Ciphers

This tool can be employed on a range of other, similar ciphers, which may be an interesting study in itself and a simple, efficient way to search for optimal new designs.

5 Conclusion

We have presented a weight transition probability matrix related to SPNs that can be used to derive probabilities for collision events in AES that match results recently published in [BGL19] and [GR18].

References

  • [BGL19] Bao, Z., Guo, J., List, E.: Extended expectation cryptanalysis on round-reduced AES. Cryptology ePrint Archive, Report 2019/622 (2019). eprint.iacr.org/2019/622
  • [DR02] Daemen, J., Rijmen, V.: The design of Rijndael: AES – the Advanced Encryption Standard. Springer (2002)
  • [Gra18] Grassi, L.: Mixture differential cryptanalysis: a new approach to distinguishers and attacks on round-reduced AES. IACR Transactions on Symmetric Cryptology 2018(2), 133–160 (Jun 2018)
  • [GR18] Grassi, L., Rechberger, C.: Rigorous analysis of truncated differentials for 5-round AES. Cryptology ePrint Archive, Report 2018/182 (2018). eprint.iacr.org/2018/182
  • [GRR17] Grassi, L., Rechberger, C., Rønjom, S.: A new structural-differential property of 5-round AES. In: Advances in Cryptology – EUROCRYPT 2017 – 36th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, April 30 – May 4, 2017, Proceedings, Part II. pp. 289–317 (2017)
