General presentation
As explained, random and pseudo-random number generators are one of the foundations of computer security. Let's go into a little more detail...
The first ones are based on hardware:
- They use a physical source of entropy, using the sensors of the device: kth decimal place of the processor temperature, background noise picked up by the microphone, pressure, values picked up by the accelerometer, etc.
- These can be mixed with the memory residue, possibly with the user's action
- Based on a physical noise, it is a "true" random variable, but not provable: for example, we cannot prove that the random variable corresponding to this physical generation follows a normal law.
- This random is not reproducible.
- In cryptography, such generators are necessary to generate keys for cryptosystems.
- With pyboard, we have access to such a generator: pyb.rng().
The second are based on algorithms:
- Most of the time, the starting point is a modulo N recurrent sequence.
- The sequence is chosen so that the numbers produced during the iterations look a lot like random numbers.
- By "resemble", we mean for example: no statistical test should be able to distinguish between a truly random sequence and a sequence produced by this algorithm.
- The first term of the sequence is the seed of the pseudo-random generator. Providing the same seed twice in a row leads to generating the same pseudo-random numbers twice in a row.
- This reproducibility is interesting in cryptography: it makes it possible to easily perform symmetrical encryption by masking in noise, as we have seen previously.
In what follows, we will look in more detail at the algorithms for generating pseudo-random numbers.
Linear congruential generators
Presentation
We present in the following some of the generators actually used, starting with the linear congruential generators (LCG, 1948):
- their very low complexity makes them usable in the Internet of Things, even at very low resources;
- they are however unsafe, so other generators are later introduced.
These LCGs are defined as follows: $X_{n+1} = ( a \cdot X_n + c ) \% m$, where $a$, $c$ and $m$ are the generator parameters.
These generators are obviously periodic, and some parameters produce better sequences than others, in the sense that their periods are maximal.
To obtain such parameters, it is necessary and sufficient that:
- $c$ and $m$ are coprime integers (their gcd is 1).
- For each prime number $p$ dividing $m$, $(a-1)$ is a multiple of $p$.
- $m$ multiple of 4 $\Rightarrow (a-1)$ multiple of 4.
(valid for $c\neq $0).
Practical work
- Program this generator on the pyboard.
- Find good parameters $a, c, m$.
- Check that the numbers produced look random. For example, look at the frequency of occurrence of even and odd numbers.
- Integrate this random generator tool to the previously programmed one-time pad.
The XORshift
Presentation
This generator has been proposed recently by Marsaglia, it has basically three integer parameters $a,b,c$. Its operation can be summarized as follows:
- We start from a 32-bit vector, which serves as a seed.
- To calculate the new term from the current term $x$:
- We replace x by the result of the or exclusive bit-by-bit between x and its version shifted $a$ bits to the right (circular shift, the bits going out to the right go in to the left).
- Replace x with the result of the or exclusive bit-to-bit between x and its version shifted $b$ bits to the left (circular shift).
- We replace x by the result of the or exclusive bit-to-bit between x and its version shifted $c$ bits to the right (circular shift).
- The value obtained after these three operations is the new term of the sequence, to be output.
A piece of code in C detailing the XORshift with (12, 25, 27) as parameters would therefore be :
x ^= x >> 12; // a
x ^= x << 25; // b
x ^= x >> 27; // c
Practical work
- Make this generator on pyboard
- Compare the time required for random generation using the two techniques.
Other classical generators
Depending on your progress, you can complete your library with the following generators:
- The Linear feedback shift registers
- The Mersenne-twister, the default generator in Python, R, Ruby, Maple, etc..
You can then compare the execution times on the pyboard, and discuss the possibility of using such generators on such a component.
Cryptographic generators
Generators such as the LCG are fast and require few resources, so they can reasonably be used in the context of the Internet of Things, when there is a need for a random source. However, the randomness generated by such generators is of a rather poor quality: for example, they do not pass random test batteries such as those present in the TestU01 library. In the same way, they do not respect the condition required by Shannon to make the one-time pad a cryptographically secure technique.
One can indeed formulate properly, in mathematical terms, the fact that a pseudo-random generator is cryptographically secure. Without going into the formulas, this means that if the opponent has n bits produced by such a generator, it cannot, in reasonable time (polynomial) predict the next bit without, on average, being wrong once out of 2, and cannot produce a polynomial algorithm capable of completing our bit sequence, which makes fewer errors than a simple random prediction. Again, all this is formalized through the theories of probability and complexity.
There are cryptographically safe generators, such as :
- The Blum Blum Shub
- ISAAC.
We propose to implement on pyboard at least the BBS, or both.
Let us now point out that :
- the BBS requires a square elevation, and is therefore much less adapted to the Internet of Things than the LCG, which is linear.
- its security is, roughly, related to the difficulty to go back to the prime numbers whose product constitutes the modulo.
- the security of any cryptosystem currently in use comes down to the fact that the problem of factoring a number becomes difficult when it becomes large.
Here we find the fundamental problem of security in the context of the Internet of Things: as the size of prime numbers increases, so does the security of any cryptosystem currently in use...
- the more security increases
- the longer the calculations take, the more energy is consumed.
- the larger the size of the numbers to be manipulated increases, therefore require memory
- the more the size of the cryptograms increases, thus increasing the size of the data to be transmitted.
And the cost of transmission is the most expensive, in terms of battery power. In short, the key word in the Internet of Things, when it comes to security, is compromise: the greater the security, the greater the need for power, and the shorter the life of the network. It is therefore a question of positioning the cursor correctly...