Sunday, December 14, 2014

Handling the Multi-Sensor Problem - Extracting Maximum Entropy

In our earlier posts, we mentioned the possibility of distilling data from multiple sensors to obtain the most unpredictable data set possible.  Unfortunately, most sensor data tend to be correlated, because these sensors all capture the same user behavior, just from different dimensions.  On the surface this seems like a trivial issue: what harm could there be in having more sensors on the phone?  At a deeper level, it poses a problem when sensor data are used to seed random number generators, as in STRIP and KeePass.  Why?  Because an enterprising hacker could use one sensor as a template to estimate the data sequences from another sensor.  Take the Amazon Fire smartphone as an example: a hacker could use accelerometer data to infer the motions of the face- and eye-tracking sensors, and vice versa.  This means that either all of the sensor data have to be withheld from software, or none of them can be safely used as seeds for encryption.  We have a solution.  What if we could collate all of the sensor data and then generate a single data sequence that was minimally correlated across all of the sensors?  This would ensure that if a hacker gained access to one of the data sequences, the rest would not be compromised at once.  Such a step is in addition to the Wash and Rinse cycles, of course.

Recall what the 3-dimensional accelerometer data looked like from our earlier post.  It is clear that all of the data sequences share a common pattern.  The next step in the process is to find where the data are most correlated and project the data onto a different dimension, one where the data share the least common space.  The result is the following data sequence:
Transformed data to remove internal correlations.
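The post does not spell out the exact transform, but a PCA-style rotation onto the eigenvectors of the covariance matrix is one standard way to project correlated axes onto uncorrelated dimensions.  A minimal sketch with synthetic accelerometer data (the sensor model and numbers here are illustrative, not our actual data):

```python
import numpy as np

# Hypothetical 3-axis accelerometer samples (rows = time, cols = x, y, z).
# A shared "user motion" component makes the three axes correlated,
# mimicking the common pattern visible in the raw sensor plots.
rng = np.random.default_rng(0)
motion = rng.standard_normal(1000)
acc = np.column_stack([
    motion + 0.3 * rng.standard_normal(1000),  # x axis
    motion + 0.3 * rng.standard_normal(1000),  # y axis
    motion + 0.3 * rng.standard_normal(1000),  # z axis
])

# Rotate the centered data into the eigenbasis of its covariance matrix
# (a PCA-style projection).  In that basis the dimensions are, by
# construction, mutually uncorrelated.
centered = acc - acc.mean(axis=0)
cov = np.cov(centered, rowvar=False)
_, eigvecs = np.linalg.eigh(cov)
projected = centered @ eigvecs

# Off-diagonal entries of the projected correlation matrix collapse to ~0,
# while the raw axes remain strongly correlated.
r = np.corrcoef(projected, rowvar=False)
```

The same rotation generalizes directly to more than three inputs: with k sensors, the covariance matrix is k-by-k and the projection yields k mutually uncorrelated sequences.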
To test the effectiveness of this procedure, we have to look at the R-values of the correlations.  Only absolute R-values matter, since whether the data are positively or negatively correlated is arbitrary; as long as they're correlated, they share information.  Without projecting the data onto a new dimension, the average correlation across the three accelerometer axes in raw form is 0.35, vs. 0.25 when the data are projected onto the least correlated dimension.
The next step is to run the same test across many repeated runs, the same way we did with the NIST Test Suite, 181 runs of 20,000 data points.  Here's how it turned out:
Projecting data onto uncorrelated dimensions cuts inter-dimensional correlations in half.
There is a clear difference, with the average R-value across the 181 runs cut by more than half, from 0.53 for the raw data to 0.24 once the data are transformed and projected onto a new dimension.  As far as stats are concerned, we ran a one-way ANOVA to check for statistical significance, just in case.  The effect of the transformation is definitely significant: F(1,360) = 243.5, p < .000001.  This means that the differences we observe are extremely unlikely to have occurred by chance.  It is important to keep in mind that this is only a 3-D demonstration; the strength of our results will actually increase as the number of inputs grows.
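For readers who want to check the degrees of freedom, the one-way F statistic is just the between-group variance over the within-group variance.  A minimal pure-Python sketch (not the code we used; any stats package gives the same value):

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way ANOVA across the given groups.

    Degrees of freedom are (k - 1, N - k) for k groups and N total
    observations: two conditions of 181 runs each gives df = (1, 360),
    matching F(1,360) above.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: scatter of observations around their own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```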
With devices increasing the number of embedded and integrated sensors, the vulnerability created by their inherent correlations has to be removed.  Our approach allows data from multiple integrated sensors to be transformed and used safely and securely as seeds for a random number generator.
