In our earlier posts, we mentioned the possibility of distilling data
from multiple sensors to obtain the most unpredictable data set
possible. Unfortunately, most sensor data tend to be correlated,
since these sensors all capture user behavior, just from different
dimensions. On the surface this seems like a trivial issue: what
harm could there be in having more sensors on the phone? At a deeper
level, it poses a problem whenever sensor data are used to seed random
number generators, as in apps like STRIP and KeePass. Why? Because an
enterprising hacker could use one sensor as a template to estimate the
data sequences from another. Take the Amazon Fire smartphone as an
example: a hacker could use accelerometer data to infer the motions
captured by the face- and eye-tracking sensors, and vice versa. This
means that either all of the sensor data have to be withheld from
software, or none of them can be safely used as seeds for random
number generation.
We have a solution.
What if we could collate all of the sensor data and then generate a
single data sequence that is minimally correlated across all of the
sensors? This would ensure that if a hacker gained access to one of
the data sequences, the others would not be compromised along with it.
Such a step is in addition to the Wash and Rinse cycles, of course.
Recall what the
3-dimensional accelerometer data looked
like from our earlier post. It is clear that all of the data sequences
share a common pattern. The next step in the process is to find where
the data are most correlated and project the data onto a different
dimension, one where the data share the least common space. The result
is the following data sequence:
- Transformed data to remove internal correlations.
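As a rough sketch of this projection step (not our production code), a PCA-style rotation does something similar: diagonalizing the covariance matrix of the axes yields a basis in which the linear inter-axis correlations vanish. The synthetic data and every name below are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for 3-axis accelerometer data: a shared motion
# component (the "common pattern") plus independent per-axis noise.
rng = np.random.default_rng(0)
n = 20_000
shared = rng.normal(size=n)
raw = np.column_stack(
    [shared + rng.normal(scale=0.8, size=n) for _ in range(3)]
)

def mean_abs_r(data):
    """Average absolute off-diagonal correlation across the axes."""
    r = np.corrcoef(data, rowvar=False)
    off_diag = r[~np.eye(r.shape[0], dtype=bool)]
    return float(np.abs(off_diag).mean())

# Project onto the eigenvectors of the covariance matrix (a PCA
# rotation); in that basis the sample inter-axis correlations are zero.
_, eigvecs = np.linalg.eigh(np.cov(raw, rowvar=False))
projected = raw @ eigvecs

print(f"raw:       {mean_abs_r(raw):.2f}")
print(f"projected: {mean_abs_r(projected):.2f}")
```

The rotation removes only linear correlation, which is exactly what the R-values below measure.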
To test the effectiveness of this procedure, we have to look at the
R-values of
the correlations. Only absolute R-values matter: whether the data
are positively or negatively correlated is arbitrary; as long as
they're correlated, they share information. Without projecting the
data onto a new dimension, the average correlation across the three
accelerometer axes in their raw form is
0.35, versus
0.25 when the data are projected onto the least correlated dimension.
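To make the point about absolute R-values concrete, here is a small illustration with made-up data: negating one sequence, as a sensor mounted upside down would, flips the sign of the correlation but leaves its magnitude, and hence the shared information, unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000)
y = x + rng.normal(scale=0.5, size=1_000)  # y is correlated with x

r_pos = np.corrcoef(x, y)[0, 1]
r_neg = np.corrcoef(x, -y)[0, 1]  # same sensor, axis inverted

# The sign flips; the magnitude (the shared information) does not.
print(f"r(x, y)  = {r_pos:+.3f}")
print(f"r(x, -y) = {r_neg:+.3f}")
```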
The
next step is to run the same test across many repeated runs, the same
way we did with the NIST Test Suite: 181 runs of 20,000 data points
each. Here's how it turned out:
- Projecting data onto uncorrelated dimensions cuts inter-dimensional correlations in half.
There
is a clear difference, with the average R-value across the 181 runs cut
down by more than half, going from 0.53 for the raw data to 0.24 once
the data are transformed and projected onto a new dimension. As far as
stats are concerned, we ran a
one-way ANOVA to check for statistical significance, just in case. The effect of the transformation is definitely significant:
F(1,360) = 243.5,
p < .000001. This means that the differences we see are extremely
unlikely to have occurred by chance. It
is important to keep in mind that this is only a 3-D demonstration;
the strength of our results will actually increase as the number of
inputs grows.
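For readers who want to replicate the significance check, a one-way ANOVA with two groups can be computed directly from the between- and within-group sums of squares (scipy.stats.f_oneway gives the same F). The 181 per-run R-values below are simulated for illustration; the spread is an assumption, not our actual measurements.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Simulated per-run average R-values, 181 runs per condition.
rng = np.random.default_rng(2)
raw_runs = rng.normal(loc=0.53, scale=0.15, size=181)
projected_runs = rng.normal(loc=0.24, scale=0.15, size=181)

f_stat = one_way_anova_f(raw_runs, projected_runs)
print(f"F(1,360) = {f_stat:.1f}")
```

With 2 groups of 181 runs each, the degrees of freedom work out to 1 and 360, matching the F(1,360) reported above.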
As devices pack in ever more
embedded and integrated sensors, the vulnerability created by their
inherent correlations has to be removed. Our approach allows data from
multiple integrated sensors to be transformed and used safely and
securely as seeds for a random number generator.