Sunday, December 14, 2014

Wash, Rinse, Spin (or Twist): Sensor-Based RNG Seeding

It turns out that a good way to generate random numbers from sensor data is much like getting your laundry done.  It's a 3-step process.

Step 1 - WASH:  Clean out and remove major traces of user behavior from the data sequence.
Step 2 - RINSE:  Increase unpredictability in the data sequence.
Step 3 - SPIN:  The accelerometer data sequence goes into the random number generator as seeds and is spun (twisted, if you're using the Mersenne Twister) into a nice series of random numbers.

WASH

At this stage, the main goal is to remove the major, slow drifts in the data sequence.  These drifts are our biggest source of predictability, because each value along a slow trend can be guessed closely from the values just before it.  Keep in mind that drift also occurs naturally in electronic sensors, even when they are untouched, for example due to changes in ambient temperature.  This makes the first step essential for eliminating the bulk of the predictability within the sequence.
Drifting
Areas of high predictability in our original accelerometer data sequence, marked with red circles.
For the sake of maintaining some of our "trade secrets," we cannot reveal exactly how we "wash" the data to remove the major sources of predictability.  The effects, however, are clear in the image below.  Once the wash cycle is completed, all that is left are 'pulses' at certain points in the data sequence.  What's important at this point is that we can test how well we have eliminated the drift: measurements of "stationarity" tell us whether the average and variance of the data sequence change over time.
Wash Effect
Major traces of user behavior and drift are wiped out by the wash cycle. Only pulses are left along what is otherwise pretty much a straight line.
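To make the stationarity check a little more concrete, here is a minimal Python sketch. It is not our wash: the `moving_average_detrend` function is a generic, textbook stand-in, and the function names, window sizes, and synthetic "sensor" data are all assumptions made up for illustration. The idea is simply to cut the sequence into equal windows and compare the average and variance from window to window; if they barely move, the drift is gone.

```python
import random
import statistics


def moving_average_detrend(samples, window=25):
    """Subtract a centered moving average from each sample.

    Generic stand-in for drift removal, NOT the wash method from this post.
    """
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(samples[i] - statistics.fmean(samples[lo:hi]))
    return out


def window_stats(samples, n_windows=10):
    """Return a (mean, variance) pair for each of n_windows equal chunks."""
    size = len(samples) // n_windows
    stats = []
    for k in range(n_windows):
        chunk = samples[k * size:(k + 1) * size]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats


def mean_spread(samples, n_windows=10):
    """Largest minus smallest window mean; near zero suggests no drift."""
    means = [m for m, _ in window_stats(samples, n_windows)]
    return max(means) - min(means)


# Synthetic data (an assumption): a slow linear drift plus Gaussian noise.
rng = random.Random(0)
raw = [0.01 * i + rng.gauss(0.0, 0.5) for i in range(1000)]
washed = moving_average_detrend(raw)

print("spread of window means, raw:   ", mean_spread(raw))
print("spread of window means, washed:", mean_spread(washed))
```

On data like this, the window means of the raw sequence climb steadily with the drift, while the detrended sequence keeps them close to zero. The same comparison can be made on the variances returned by `window_stats`.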
At this point, the data sequence is more unpredictable than it was initially, but probably not unpredictable enough.  Next comes the rinse cycle.  Just like with the laundry, we've removed the dirt, and now the soap needs to go too.  The data now consist of pulses and "bare spots" where there is little activity.  The goal of this cycle is to transform the data further so that they actually look like a random sequence.  To protect against additional vulnerabilities, our method does not manipulate the data directly.  Instead, we transform the data into complex numbers and change only the imaginary parts.  And, here we go:
Rinse Effect
After going through the "rinse" cycle, the data are now much more unpredictable than before. In fact, they look pretty close to the numbers generated by the RNG in an earlier post.
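Step 3, the spin, is the easiest one to illustrate. The Python sketch below shows one way an 8-bit integer sequence could be packed into a single seed for a Mersenne Twister, which is the core generator behind Python's `random` module. The `seed_from_samples` helper and the sample values are hypothetical placeholders, not real sensor output and not necessarily how our own pipeline does it. Python's documentation also notes that this generator is not intended for security purposes, so treat this purely as an illustration of the seeding mechanics.

```python
import random


def seed_from_samples(samples):
    """Pack a sequence of 8-bit integers (0..255) into one large integer seed.

    Hypothetical helper for illustration; bytes() raises ValueError if any
    value falls outside the 8-bit range.
    """
    return int.from_bytes(bytes(samples), "big")


# Made-up placeholder values standing in for a washed and rinsed sequence.
sensor_bytes = [17, 203, 94, 8, 250, 61, 129, 77]

# random.Random is a Mersenne Twister; the seed fixes its whole output stream.
twister = random.Random(seed_from_samples(sensor_bytes))
numbers = [twister.getrandbits(8) for _ in range(5)]
print(numbers)
```

Because the generator is deterministic, the same seed always reproduces the same numbers. All of the unpredictability has to come from the seed itself, which is why the wash and rinse cycles matter.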
The final results are quite compelling.  At least visually, they are comparable to a 10,000-point data sequence generated with the Mersenne Twister, displayed in an image from our previous blog post.  We now have an 8-bit integer sequence that can be fed into a random number generator as seeds for encryption, password management, etc.

What's Next?

In our upcoming post, we will give you the hard facts.  Using the NIST random number generation and testing toolkit, we will test the quality of our results, measuring their unpredictability against the NIST standard and other RNGs.  We will also outline the benefits and best practices of using this method.  Stay tuned!
