When we first started working on this project, our initial thought was, "how hard can it be to break down 3D accelerometer data?" It looked pretty straightforward at first, knowing that phones also come equipped with a gyroscope to tell us which direction the phone is facing in space. The next image is a schematic showing the three dimensions of acceleration that the phone can detect. When the Z-axis values are positive, we know the phone screen is facing up, away from the ground. The same goes for the other axes, which together tell us the orientation of the phone with respect to the ground. But then we encountered Problem #1: the accelerometer only knows how it is oriented in space; it tells us NOTHING about what the user's body is actually doing.
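To make that concrete, here is a minimal sketch, in plain Kotlin, of the idea that the sign of the Z-axis reading hints at whether the screen is facing up or down when the phone is at rest (since a stationary accelerometer mostly measures gravity). The threshold and the hard-coded sample values are illustrative assumptions, not readings or code from our actual project.

```kotlin
// Rough orientation guess from a single stationary accelerometer reading.
// On a real phone the (x, y, z) values would come from the platform's
// accelerometer sensor; here they are hard-coded for illustration.

const val GRAVITY = 9.81f // m/s^2

fun orientationFromZ(z: Float): String = when {
    z > 0.5f * GRAVITY  -> "screen facing up"
    z < -0.5f * GRAVITY -> "screen facing down"
    else                -> "on its side / not flat"
}

fun main() {
    val samples = listOf(
        Triple(0.1f, 0.3f, 9.7f),   // lying flat, screen up
        Triple(0.2f, -0.1f, -9.6f), // lying flat, screen down
        Triple(9.5f, 0.4f, 0.8f)    // standing on its long edge
    )
    for ((x, y, z) in samples) {
        println("x=$x y=$y z=$z -> ${orientationFromZ(z)}")
    }
}
```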
Stonewalled...
Problem #1 cannot truly be overcome without special hardware to track the user's body movements. Problem #2 was probably just as tricky, but at least it had the potential to be overcome computationally. Taking a data-centric approach, we launched an all-out assault. We tried practically every biomechanical and computational trick in the book. Filtering, 3D Euler angles, angular analyses, differentiation, integration... you name it, we probably tried it. Nothing worked. Even the most complicated algorithms with enough if/else/or clauses to fill an Olympic-sized swimming pool failed. Needless to say, Problem #3 is a bigger problem that cannot be solved without solving the first two.
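To give a flavor of the kind of "tricks" we mean, here is a minimal sketch of one of them: a simple exponential low-pass filter that tries to separate the slow-moving gravity component from the fast linear acceleration. The smoothing factor and the sample readings are illustrative stand-ins, not the parameters or data we actually used.

```kotlin
// Exponential low-pass filter: the slowly changing output approximates gravity,
// and subtracting it from the raw signal leaves the fast "linear" acceleration.
// The smoothing factor alpha is an illustrative choice, not a tuned value.

class LowPassFilter(private val alpha: Float = 0.8f) {
    private val gravity = FloatArray(3)

    /** Returns Pair(gravity estimate, linear acceleration) for one (x, y, z) sample. */
    fun update(sample: FloatArray): Pair<FloatArray, FloatArray> {
        for (i in 0..2) {
            gravity[i] = alpha * gravity[i] + (1 - alpha) * sample[i]
        }
        val linear = FloatArray(3) { i -> sample[i] - gravity[i] }
        return gravity.copyOf() to linear
    }
}

fun main() {
    val filter = LowPassFilter()
    // A short, made-up burst of readings: phone at rest, then a quick lift.
    val readings = listOf(
        floatArrayOf(0.0f, 0.1f, 9.8f),
        floatArrayOf(0.1f, 0.0f, 9.7f),
        floatArrayOf(1.5f, 2.0f, 12.3f),
        floatArrayOf(0.2f, 0.1f, 9.9f)
    )
    for (r in readings) {
        val (g, lin) = filter.update(r)
        println("gravity=${g.joinToString()} linear=${lin.joinToString()}")
    }
}
```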
Folly.
Our frustration levels had pretty much boiled over and we decided to turn our focus toward other projects. While working on behavioral biometrics problems, we came to the realization that our analytical approach was just plain wrong. Doing what anyone would do with biometrics, we had tried to develop a template movement that we would match against subsequent movements in order to tell the screen when to turn on or off. Bad idea, especially since the human brain and body are naturally variable: every performance of the same movement is like a snowflake, same general pattern, different details.
Occam's Razor.
Simplicity was key. Trying to capture the actual movement of the user was not the right level of analysis. Instead, we needed to capture the holistic, general characteristics of the user's movements to engage and disengage with the phone. Reinvigorated, we went back and attacked our problems, coming up with the following solutions:
Solution #1 - Assume that the only way users engage with their phone is in the hand (duh).
Solution #2 - Find a way to understand the general characteristics of each of the actions we want the phone to respond to. Forget the small stuff. This meant a generalized pattern in time and space that was a loose representation of the movements in question.
Solution #3 - Create a context-conditioned algorithm. We had been worried only about the present, that is, the movement itself. But our biggest error was that we forgot about the past. The algorithm had to be cognizant of where the phone had been in order to know where it was going, and which events to account for and which to ignore. A rough sketch of this idea appears just below the list.
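As a rough illustration of Solution #3, here is a minimal, hypothetical sketch of a context-conditioned check: the same coarse observation leads to different actions depending on the state the phone was last in. The states, flags, and actions are stand-ins for illustration, not the algorithm from our demo.

```kotlin
// The decision about a movement depends not just on the movement itself but on
// the remembered context (where the phone has been). All states and rules here
// are illustrative.

enum class PhoneContext { IDLE_FACE_DOWN, IDLE_FACE_UP, IN_HAND }

class ContextConditionedDetector {
    private var context = PhoneContext.IDLE_FACE_DOWN

    /** Feed one coarse observation; return an action if the context warrants one. */
    fun observe(faceUp: Boolean, justMoved: Boolean): String? {
        val action = when {
            // Picked up from rest and turned toward the user: engage.
            context != PhoneContext.IN_HAND && justMoved && faceUp -> "turn screen ON"
            // Put back down (or flipped over) after being in the hand: disengage.
            context == PhoneContext.IN_HAND && justMoved && !faceUp -> "turn screen OFF"
            else -> null
        }
        // Update the remembered context for the next observation.
        context = when {
            faceUp && justMoved -> PhoneContext.IN_HAND
            faceUp && context == PhoneContext.IN_HAND -> PhoneContext.IN_HAND
            faceUp -> PhoneContext.IDLE_FACE_UP
            else -> PhoneContext.IDLE_FACE_DOWN
        }
        return action
    }
}

fun main() {
    val detector = ContextConditionedDetector()
    // Made-up sequence: resting face down, picked up, held still, placed face down again.
    val observations = listOf(false to false, true to true, true to false, false to true)
    for ((faceUp, justMoved) in observations) {
        println("faceUp=$faceUp justMoved=$justMoved -> ${detector.observe(faceUp, justMoved)}")
    }
}
```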
Demo Time. Our next step was to construct a working algorithm and deploy it on a smartphone as a proof-of-concept demonstration. We will cover this in our next post and provide a narrative for the YouTube videos we have floating around.