Sensory: Voice and Face Recognition Blended Intelligently

Submitted by BDTI on Mon, 08/25/2014 - 22:02

Sensory has built its business and made its name over the past 20 years in voice detection and speech recognition, as InsideDSP's April 2013 coverage of the company's TrulyHandsFree always-on voice activation algorithm showcased. However, as Gordon Haupt, the company's director of vision technology, noted during a recent briefing, the name "Sensory" isn't speech-specific, which reflects the company's long-term aspiration to expand beyond microphones into algorithms fed by other types of input devices. The first additional type of device Sensory is tackling is the image sensor, and Sensory's new TrulySecure biometric authentication technology is its first imaging-based product.

If you've used the face recognition-based unlocking features built into v4 of Google's Android operating system, for example, or available for Apple's iOS via third-party applications, you may have encountered a frustrating blend of impressive potential and underwhelming implementation reality. For one thing, face recognition can often be fooled by presenting the camera with a photograph of someone instead of the real-life person, a loophole that both Android 4.1 and Samsung's "blink" facility strive to close. Face recognition's accuracy at recognizing a specific person can also be hampered in poor lighting conditions, for example, or when the subject is wearing glasses or not directly facing the camera, or more generally when he or she has done inadequate initial training of the algorithm.

First and foremost, Haupt claims, TrulySecure makes significant improvements in recognition speed, as a demonstration video from the company indicates.

And with respect to accuracy improvements, Haupt has a two-part response. First, he asserts that the company's face recognition algorithm is state-of-the-art. Its foundation is the OpenCV open-source computer vision library, which supplies low-level functions such as image manipulation and blurring. And it's been strengthened by in-house-developed proprietary vision technologies such as:

  • Spatial-textural modeling
  • Enhanced face detection
  • Image illumination correction
  • Facial distance scoring algorithms, and
  • "Liveness" detection via motion analysis and simultaneous measures of face and voice
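
Sensory did not disclose how any of these proprietary techniques work. Purely to illustrate what "image illumination correction" can mean in general, here is a minimal Python sketch of histogram equalization, a standard textbook method for stretching the contrast of a dim grayscale image. The function name, the sample pixel values and the choice of technique are all assumptions made for illustration, not a description of TrulySecure.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values in 0..levels-1


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Spread pixel intensities over the full range (assumes a non-empty image)."""
    flat = [p for row in image for p in row]
    n = len(flat)

    # Count how many pixels fall at each intensity level.
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]

    # Lookup table mapping each original level to its equalized level.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Hypothetical dim face patch: all values crowded into a narrow dark band.
dim_face_patch = [[50, 50, 60], [60, 70, 70]]
corrected = equalize_histogram(dim_face_patch)
```

After equalization the darkest pixels map to 0 and the brightest to 255, so a face captured in poor light occupies the same intensity range as one captured in good light before it reaches the recognizer.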

To that last point, TrulySecure is a two-technology approach. As the above video notes, its base face recognition-only features can optionally be augmented by voice biometrics, leveraging Sensory's longstanding expertise in this area. Currently, the speaker identification is phrase-dependent (either pre-defined or user-defined), although more flexible phrase-independent speaker recognition is under development. And the combination of face and speaker identification translates into notably improved accuracy results (Figure 1).

Figure 1. Combining face recognition and speaker identification, according to Sensory, enables TrulySecure to deliver a ~98% detection rate, a 1-in-1,000 false accept rate, and a combined system EER (equal error rate) of ~1%.
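
For readers unfamiliar with the metrics in Figure 1, the following Python sketch shows how a false accept rate (FAR), false reject rate (FRR) and equal error rate are commonly computed from match scores. The detection rate is simply 1 minus the FRR. The scores and function names below are invented for illustration; they are not Sensory's data or code.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a trial is accepted if its score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: List[float],
                     impostor: List[float]) -> Tuple[float, float]:
    """Coarse EER estimate: try every observed score as a threshold and
    return (threshold, error rate) where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Made-up similarity scores, higher meaning "more likely the enrolled user".
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.72, 0.58, 0.93, 0.81]
impostor_scores = [0.12, 0.35, 0.41, 0.22, 0.60, 0.18, 0.30, 0.47, 0.09, 0.27]

threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
print(f"threshold={threshold:.2f}, EER={eer:.0%}")
```

With these made-up scores the two error rates meet at 10%, at a threshold of 0.60. Raising the threshold lowers the FAR at the cost of more false rejects, which is the trade-off any single-number accuracy claim hides.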

The above results, according to Sensory, come from data recorded on various mobile devices using their built-in microphones and front-facing cameras, and are based on elementary single-session enrollments (i.e. face and voice training sequences). The data set encompasses more than 40 subjects, with varying pose, illumination, expressions and ambient noise types. In explaining the results, Sensory touts TrulySecure's SMART (Sensory Methodology for Adaptive Recognition Thresholding) feature, in which a user first sets the desired biometric security mode for each individual application or action. SMART then determines how to leverage "sensor fusion" to achieve the highest accuracy for that setting, while simultaneously minimizing false rejections. TrulySecure's multimodal adaptive approach also continually learns and improves with usage.
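
Sensory has not published how SMART works internally. As a rough illustration of the general idea of per-application thresholds combined with score-level sensor fusion, the Python sketch below blends a face score and a voice score using weights and an acceptance threshold chosen by security mode. Every name, weight and threshold here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityMode:
    """Hypothetical per-application setting; the values are invented."""
    face_weight: float
    voice_weight: float
    threshold: float


MODES = {
    # Face only, lenient threshold: quick unlock for low-risk actions.
    "convenience": SecurityMode(face_weight=1.0, voice_weight=0.0, threshold=0.60),
    # Face and voice weighted equally, with progressively stricter thresholds.
    "balanced": SecurityMode(face_weight=0.5, voice_weight=0.5, threshold=0.70),
    "high": SecurityMode(face_weight=0.5, voice_weight=0.5, threshold=0.85),
}


def fused_score(face: float, voice: float, mode: SecurityMode) -> float:
    """Weighted average of the two match scores (each assumed to be in 0..1)."""
    total = mode.face_weight + mode.voice_weight
    return (mode.face_weight * face + mode.voice_weight * voice) / total


def authenticate(face: float, voice: float, mode_name: str) -> bool:
    """Accept when the fused score reaches the selected mode's threshold."""
    mode = MODES[mode_name]
    return fused_score(face, voice, mode) >= mode.threshold


# The same face match unlocks a low-risk app but not a stricter mode
# when the voice score is weak.
print(authenticate(face=0.65, voice=0.20, mode_name="convenience"))
print(authenticate(face=0.65, voice=0.20, mode_name="balanced"))
```

A weighted average is only one of many possible fusion rules; it was chosen here because it makes the effect of the security mode easy to see, not because Sensory is known to use it.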

Sensory feels that the combination of face recognition and speaker identification offers a superior alternative to fingerprint recognition-based biometric schemes, because it leverages the audio and imaging subsystems already built into a mobile electronics device rather than adding $5-13 in cost for a fingerprint scanner. Fingerprint scanning is also unreliable, Sensory feels: it produces frequent false rejects, requires additional front panel real estate, and is easy to "break" via gummy-bear and glue fingerprint impressions, photographs, etc. And the alternative (and currently mainstream) passcode-based unlock approach is cumbersome, with easily forgotten passwords that also deliver poor security because they are frequently short and non-random.

Speaking of security, Sensory's Haupt also emphasized during the briefing that TrulySecure's approach of running completely on the mobile device is advantageous versus "cloud"-based schemes. Server-centric authentication, Sensory believes, introduces risk both with respect to data (personal information can be stolen) and access (it's easy to crack online). It's also slow. And of course it requires server access in order to operate. But it also offloads some or all of the processing burden to the server. Unfortunately, Sensory was fairly tight-lipped about the processing and memory requirements of its fully mobile device-resident TrulySecure algorithms, aside from mentioning that the company's data minimization and code compression efforts translate into a small-footprint approach that "runs fast in mobile and wearable environments."

One upside from performance and power consumption standpoints is that, unlike with TrulyHandsFree, TrulySecure does not run constantly; it's activated only when unlocking a device or launching an application. Future implementations of the core face recognition and face recognition-plus-speaker identification technologies, however, may be more demanding in this regard; Sensory specifically mentioned demographic detection and emotion discernment as particularly interesting applications. The currently available SDK has no special hardware requirements, Sensory says, aside from the earlier mentioned microphone and front-facing camera, and enables a cross-platform implementation approach.

Implementation-specific uncertainty aside, TrulySecure represents an intriguing sensor fusion application. And as processors, sensors and the software controlling them become increasingly capable in the future, both it and its competitors will undoubtedly improve in both features and robustness. For more information, including an on-site demonstration and evaluation copy of the SDK, contact the company at
