Sunday, December 14, 2014

Randomness and Regularity on the Road to Sensor-Based RNG Seeding

So, what does a series of random numbers look like?  Here we have an image with 10,000 random integers.  The sequence is highly unpredictable, which makes it very difficult to figure out what number will come up next.  A data sequence like this makes for a very good set of seeds to be fed to a PRNG (pseudo-random number generator), and that unpredictability is exactly what you want if the output is used for data encryption.
Random Integers
Sequence of integers generated by a PRNG.
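One rough way to put a number on "unpredictable" is to measure how evenly the values in a sequence are spread out. The snippet below is a minimal sketch, not the code behind the figure: it assumes the integers fall in the range 0..255 (the post does not state a range), uses Python's standard `random` module with a fixed seed purely so the result is repeatable, and the helper name `shannon_entropy_bits` is our own.

```python
import math
import random
from collections import Counter


def shannon_entropy_bits(values):
    """Empirical Shannon entropy of a sequence, in bits per value."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: integers in 0..255. The fixed seed is only for reproducibility;
# a seed that anyone can read is, of course, the opposite of a good seed.
rng = random.Random(2014)
sequence = [rng.randrange(256) for _ in range(10_000)]

entropy = shannon_entropy_bits(sequence)
max_entropy = math.log2(256)  # 8 bits is the ceiling for 256 possible values
print(f"{entropy:.3f} of {max_entropy:.0f} bits per value")
```

A result close to the ceiling says the values are evenly distributed. It says nothing about patterns over time, so a sequence can score well here and still be predictable from one sample to the next. Also note that `random` is not meant for security work; Python's `secrets` module is the standard-library choice for that.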
What about sensor data?  For simplicity, let's assume the accelerometer is our sensor of choice.  Now, it is tempting to assume that the accelerometer data will look like the numbers generated by the PRNG.  But, let's be honest, most people are rarely separated from their mobile devices.  To see how random accelerometer data are during use, we left an app running in the background on a phone while the user went about their daily activities and interactions with the phone.  Here's what a 10,000-point data sequence looks like with user interaction.
Accelerometer Output 1
Accelerometer data along the X-axis during use.
As some of the points in the Stack Overflow discussion we included in our previous post suggested, the accelerometer data are far from random.  Every tap on the screen, every movement of the phone, and every change in orientation is captured, making this a set of bad seeds.  You can see how footsteps can also introduce unwanted predictability into the data sequence, so feeding these data directly into the PRNG is a no-go.
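To illustrate why a regular pattern such as footsteps is a problem, here is a small sketch using autocorrelation, which measures how well a sequence predicts itself a fixed number of samples later. Everything in it is synthetic and assumed for illustration: the "walking" trace is just a sine wave plus noise, `STEP_PERIOD` is a made-up number of samples per step, and none of it comes from the recordings shown above.

```python
import math
import random


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at a positive lag (1.0 = perfectly repeating)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


rng = random.Random(0)  # fixed seed only so the numbers are repeatable
STEP_PERIOD = 50        # hypothetical: samples between consecutive footsteps

# Synthetic stand-in for accelerometer data while walking: a periodic
# component (the steps) buried in measurement noise.
walking = [
    math.sin(2 * math.pi * i / STEP_PERIOD) + rng.gauss(0, 0.3)
    for i in range(10_000)
]
# Pure noise for comparison.
noise = [rng.gauss(0, 1) for _ in range(10_000)]

walk_ac = autocorrelation(walking, STEP_PERIOD)
noise_ac = autocorrelation(noise, STEP_PERIOD)
print(f"walking: {walk_ac:.2f}   noise: {noise_ac:.2f}")
```

The walking trace correlates strongly with itself one step later, while the noise does not. In practice that means an observer who knows the step rhythm can guess a good part of what comes next, which is exactly what a seed should not allow.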
Accelerometer Output All
Data from all 3 dimensions of the accelerometer.
As we said in our previous post, more is not merrier.  Let's look at all 3 axes of motion on the accelerometer.  What becomes clear immediately is that all three data sequences are quite strongly correlated.  They go up and down at the same time or, in other cases, when one goes up, another goes down.
Overall, there are 2 problems to solve before the accelerometer data become workable.
Problem #1 - Traces of user behavior produce predictable patterns.  In addition, if multiple data streams from different sensors in the same device are being used, their natural correlations have to be removed.
Problem #2 - Insufficient unpredictability in the data sequence, even once Problem #1 has been solved.
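The correlation between axes described above can be quantified with the Pearson correlation coefficient. The sketch below uses made-up data rather than our recordings: it assumes a single shared "motion" signal that shows up in all three axes with different signs and strengths, plus some independent noise per axis. The mixing weights and noise level are arbitrary choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed only so the numbers are repeatable

# Hypothetical shared motion of the phone, seen by every axis.
motion = [rng.gauss(0, 1) for _ in range(10_000)]

# Each synthetic axis = shared motion (assumed weights) + its own noise.
x_axis = [m + rng.gauss(0, 0.3) for m in motion]
y_axis = [0.8 * m + rng.gauss(0, 0.3) for m in motion]
z_axis = [-m + rng.gauss(0, 0.3) for m in motion]  # moves opposite to X

r_xy = pearson(x_axis, y_axis)
r_xz = pearson(x_axis, z_axis)
print(f"X vs Y: {r_xy:+.2f}   X vs Z: {r_xz:+.2f}")
```

A value near +1 means two axes rise and fall together, and a value near -1 means one rises when the other falls. Either way, the second axis adds far less fresh unpredictability than its sample count suggests, which is why stacking three axes is not three times as good as one.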
Our next post on this topic will cover the solutions to these problems, where we'll include images to take you through the data transformation process.
