Tuesday, November 4, 2014

Testing Time: Seed Strength According to the NIST Test Suite

As promised, we now need to evaluate the level of unpredictability of our accelerometer seeds against a known standard, provided by NIST (National Institute of Standards and Technology). To do this, we ran the accelerometer in the background, collecting data while the HTC phone (the same one in the unlocking demo) went about its daily activities.  The accelerometer sampled at a rate of 100Hz (hundred samples per second).  This process kept going for just a little over a half-an-hour before the smartphone left us with 205,595 data points.  For the sake of simplicity, we'll just use the first 200,000 for the tests. Because we're comparing our data against established random number generators (RNG), we have to be able to conduct a sufficient number of analyses on the data.  This prevents the possibility that simply by chance, an RNG performed better or worse than our data sequence.  We had to "expand" our 200,000 data points into more data so that we could have repeated tests.  To do this, we created overlapping data sequences, each 20,000 in length and moved the window by 1,000 points each time.  This gives us a total of  3,620,000 numbers in the final data sequence for a total of 181 bitstreams of 20,000 numbers each. As a comparison we will use two widely used RNGs, the Blum-Blum-Shub and the Secure Hash Algorithm (G-SHA1), both of which are available on the NIST Test Suite.  The following table shows the % pass rate for a set of key tests:
Table 1 - Results from the NIST Test Suite for the accelerometer data, Blum-Blum-Shub, and Secure Hash Algorithm.
Table 1 - Results from the NIST Test Suite for the accelerometer data, Blum-Blum-Shub, and Secure Hash Algorithm.
Our Washed + Rinsed output is able to perform fairly close to the Blum-Blum-Shub RNG, and outperforms on Linear Complexity and Block Frequency.  The G-SHA1 performs far more poorly on all counts, with the exception of Linear Complexity.  The Raw Accelerometer Data performs just as badly as the G-SHA1.
A test that we had to dig deeper into was Approximate Entropy, which had the largest discrepancy, a sudden drop from a high pass rate to zero.   We extracted all of the Approximate Entropy values and computed key statistics across the 181 runs.
Table 2 - Approximate Entropy values for the different data sequences.
Table 2 - Approximate Entropy values for the different data sequences.
While the G-SHA1 entropy values are about 30% lower than the Blum-Blum-Shub, our Washed-Rinsed seeds possess an entropy that is within 3% of the Blum-Blum-Shub, despite not passing the NIST test.  The Raw Accelerometer data on the other hand, have entropy levels that are barely close to 50% of the G-SHA1, and little more than a third of the Blum-Blum-Shub and Washed + Rinsed data.  Making matters worse, the Raw Accelerometer Data are highly unstable, with a range that is larger than the mean.  This means that on some occasions, the data sequences are extremely predictable.  Even the maximum entropy of the Raw Accelerometer data is only about half of the Washed-Rinsed data and Blum-Blum-Shub.  Complete results from the NIST Test Suite are available for download:  G-SHA1 results, Blum-Blum-Shub results, and Washed + Rinsed results, and Raw Accelerometer results.  A complete explanation of the tests and NIST Test Suite functions can be found here.

The take home message here is that the RNGs used here require seeds that are sufficiently unpredictable in order to function securely.  These tests results show that our data sequences are within range of being random enough to be used for secure communication, well beyond just serving as seeds for the RNGs.  And, raw sensor data that capture user behaviors are NOT unpredictable enough to be cryptographically secure.  

Advantages of this Technology 1) Availability.  First off, this method of seeding the random number generator takes advantage of user behavior as an initial source of unpredictability.  Indeed, password managers seed their RNGs with potentially unpredictable user behavior, but crudely, e.g., accelerometer data obtained from the shaking of the smartphone (STRIP) or mouse movements and a keypress (KeePass).  All a hacker needs to do then, is learn a little bit about their target, and the "random" passwords begin to fall like dominoes.  Now, instead of using user behavior alone as seeds, we leverage that initial source of unpredictability and remove the predictable parts of user behavior and boost the unpredictability further.  2) Reliability.  Second, this technology is particularly suited for the growth in always-on technology, such as smartwatches, Google Glass, etc.  where a constant stream of data is available.  Even if the device is not always on, for example, a smartphone, start-up time would be enough to provide an initial data sequence.  While we collected data at 100Hz, the Bosch accelerometer is capable of sampling unfiltered data at 2,000Hz (see p. 16 of spec sheet).  This would mean that a 5 second startup time is enough to kick start and seed an RNG.  3) Cost-Free.  Third, this technology does not introduce additional power or hardware requirements to any existing system.  Sensors are already embedded as working components of virtually all mobile computing devices, and the number of sensors will only increase over time.  4) Security.  Last, but most important is the security that this technology provides.  If the generation of the random numbers is done at a level where no software access is provided, hackers are kept out.

No comments: