Sunday, December 14, 2014

Random Numbers for Computing

Necessary randomness.  We rarely, if ever, pay much attention to the fact that all of our computing devices need to be able to produce random number sequences.  Yet random numbers are absolutely essential for a great many functions.  Perhaps the place where random numbers affect us the most is in games with an element of chance.  Without these random numbers, a game of Yahtzee on your mobile phone would be really boring: every roll of the dice would just come out the same.  Computer-generated random numbers are also needed to allow statistical predictions to be made and forecasting to be done.  Here, random numbers introduce a level of uncertainty into the data, allowing many simulations of a single condition to be run in order to test how probable a given outcome might be.  We see the result of this process in weather forecasting, where there is an X% chance of rain.  Furthest from sight and mind is the role of random numbers in secure communication and the encryption of data.  What encryption protocols do is “scramble the data”: the actual information is hidden using random numbers in order to prevent unwanted parties from gaining access to it.  Unfortunately, to borrow the Albert Einstein quote, "As I have said so many times, God doesn't play dice with the world," computers and computing devices are also unable to play dice.  It's not for the lack of arms and hands; producing random numbers is just a really difficult process.  To generate random number sequences, computers rely on algorithms known as “pseudo-random number generators” (PRNGs).  This recent article published by SC Magazine UK (an outlet for IT security professionals) sheds light on this problem and on how password managers are affected by the lack of good-quality seeds for the PRNG.
Not stork-delivered.  Just like in biology, you need seeds to produce random numbers with a PRNG.  The process works by taking one or more numbers as starting values and entering them into a PRNG, which then "twists" those values into a (pseudo-)random sequence.  The Mersenne Twister (see Wikipedia entry here: http://en.wikipedia.org/wiki/Mersenne_twister) is probably the best example of this procedure.  Unlike in biology, the random number "tree" that grows from a seed cannot be used to create new seeds and grow new trees, for one major reason: if you use the same seed(s) and the same PRNG algorithm, you will always get the same number sequence from the PRNG.  This presents us with a big problem, especially on the encryption front.  If someone knows the seed and the PRNG algorithm, they can easily decrypt the data, since all of the random numbers used to scramble the data are known (see this link).
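To make the point concrete, here is a minimal sketch in Python, whose built-in random module uses the Mersenne Twister.  The "encryption" is a toy XOR scramble chosen purely for illustration, not a real protocol, and the function names, the seed value 2014 and the message are all made up for this example.

```python
import random

def keystream(seed, length):
    """Return `length` pseudo-random bytes from a Mersenne Twister seeded with `seed`."""
    rng = random.Random(seed)  # same seed + same algorithm = same sequence
    return bytes(rng.randrange(256) for _ in range(length))

def xor_scramble(data, seed):
    """Toy 'encryption': XOR every byte of `data` with the seeded keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(seed, len(data))))

# The same seed always reproduces exactly the same "random" bytes.
assert keystream(2014, 8) == keystream(2014, 8)

secret = b"meet me at noon"                  # hypothetical message
scrambled = xor_scramble(secret, seed=2014)  # hypothetical seed

# Anyone who knows the seed and the algorithm can simply redo the XOR.
recovered = xor_scramble(scrambled, seed=2014)
assert recovered == secret
```

Nothing about the scrambled bytes protects the message once the seed leaks, which is why the rest of this post is about where the seed comes from.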
Sourcing seeds.  What becomes critical to security is that the seed itself comes from a fairly unpredictable source.  This is also not a trivial process.  Just take a quick look at this discussion on stackoverflow.  Some might hesitate before using the CPU clock to seed the random numbers used for data encryption, since all anyone would need to know in order to gain access to your data is when you actually performed the encryption.  One possible alternative would be to leverage the sensors that are embedded in our computing systems.  For example, the accelerometer is now being proposed as a possible source of PRNG seeds.
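The worry about clock-based seeds can be sketched in a few lines of Python.  This is an illustration under assumptions, not an attack on any real product: we assume the seed is the clock time in whole seconds, that the attacker has seen some bytes produced by the PRNG, and that the attacker knows the hour in which they were generated.  The function names and timestamps are hypothetical.

```python
import random

def token_from_seed(seed, length=16):
    """Produce `length` pseudo-random bytes from a PRNG seeded with `seed`."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))

def guess_clock_seed(observed, earliest, latest):
    """Try every whole-second timestamp in [earliest, latest] as the seed."""
    for candidate in range(earliest, latest + 1):
        if token_from_seed(candidate, len(observed)) == observed:
            return candidate
    return None

# Hypothetical scenario: the seed was the clock time (in seconds) at some
# moment between 12:00 and 13:00 UTC on December 14, 2014.
window_start = 1418515200 + 12 * 3600  # 2014-12-14 12:00:00 UTC
window_end = window_start + 3599
true_seed = window_start + 1234        # assumed moment of "encryption"
observed = token_from_seed(true_seed)

# Only 3,600 candidates to try, so the search finishes almost instantly.
guessed = guess_clock_seed(observed, window_start, window_end)
assert guessed == true_seed
```

Even a much wider window stays small: a whole day is only 86,400 candidate seeds.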
Sensor overload.  Looking at the stackoverflow discussion, there is a problem with using sensors for seeding purposes.  Yes, if left idle, the accelerometer would generate a pretty random sequence of numbers due to the electrical background noise that exists in any electronic sensor.  But sensors that are embedded in our computing devices are designed to track and obey the commands of a user, not to just sit idle.  All a hacker would then need to do is move the sensor through a known sequence of events and the seeds are known.  This could be anything from moving the device a certain way to changing the temperature.  In addition, values generated by sensors in a device are available to the operating system and accessible through software.  And, as the discussants on stackoverflow rightly point out, human behavior is really quite predictable and systematic, as even some of our own work has shown.
More is not merrier.  There's a scramble right now to integrate as many sensors as possible into our devices.  On the surface, it would seem that having more data streams would provide more random numbers for seeds.  Not really: because all of the sensors are housed together on the same device, the data streams will be correlated to a good extent.  This means that if we wanted a designated sensor for this purpose, it would have to be housed outside the device and transmit its data across some connection, which makes that a bad solution.  To make the sensor solution workable, correlations between data streams must be broken before the data sequences can be used to seed a PRNG.
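As a rough illustration of why co-located sensors add less randomness than they seem to, the Python sketch below simulates two sensor streams.  All of the numbers are invented: we assume both sensors pick up the same device motion plus a small amount of their own electrical noise, and we compare them with a stream that shares nothing with the first.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated (not measured) data: one shared motion signal for the device.
motion = [rng.gauss(0, 1) for _ in range(1000)]

# Two hypothetical sensors on that device: shared motion + small own noise.
sensor_a = [m + rng.gauss(0, 0.1) for m in motion]
sensor_b = [m + rng.gauss(0, 0.1) for m in motion]

# A hypothetical stream that shares nothing with the device's motion.
unrelated = [rng.gauss(0, 1) for _ in range(1000)]

same_device = pearson(sensor_a, sensor_b)   # close to 1
independent = pearson(sensor_a, unrelated)  # close to 0
print(f"same device: {same_device:.2f}, independent: {independent:.2f}")
```

In this simulation the second sensor mostly repeats what the first one already reported, so adding it contributes little that an attacker could not infer from the first stream.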
How can these problems be solved?  Our next post will present solutions to these problems and outline a method of obtaining seeds from sensors.
