Recently, Matthew Houliston, our Systems and Data Director, attended a workshop hosted by the Biometrics Institute on 'Face Recognition and How to Mitigate Risk,' presented by the National Institute of Standards and Technology (NIST).
This workshop played a key role in our ongoing research into the biometrics industry and how Serve Legal can assist technology providers and deployers in testing the efficacy of facial recognition systems—both in the lab and in real-world operational settings.
In this short article, we share the key takeaways from the workshop, which underline the complexities of biometric technologies and their deployment.
Performance Metrics Can Be Misleading: Claims of 99%+ accuracy can be deceptive without proper context. Critical factors such as sample size, the use of independent, unseen data, and the conditions under which testing is conducted (e.g. real-world or lab settings) must be clearly understood for accurate interpretation. Deployers must also make sense of error metrics such as False Positive Rate (FPR) and False Negative Rate (FNR), which need careful interpretation to ensure their practical implications and risks are understood, as the short sketch below illustrates.
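As a minimal, hypothetical sketch (the counts below are invented for illustration and are not drawn from the workshop or from NIST data), this is how FPR and FNR are typically derived from raw match outcomes, and why small percentage rates can still translate into many affected individuals at scale:

```python
# Hypothetical verification outcomes, purely illustrative.
true_positives = 9_900    # genuine users correctly matched
false_negatives = 100     # genuine users wrongly rejected
true_negatives = 99_500   # impostors correctly rejected
false_positives = 500     # impostors wrongly accepted

# False Negative Rate: share of genuine attempts that are rejected.
fnr = false_negatives / (false_negatives + true_positives)

# False Positive Rate: share of impostor attempts that are accepted.
fpr = false_positives / (false_positives + true_negatives)

print(f"FNR: {fnr:.2%}")  # 1.00% -> 1 in 100 genuine users turned away
print(f"FPR: {fpr:.2%}")  # 0.50% -> 5 in 1,000 impostors let through

# Even with "99%+ accuracy", the absolute number of errors depends on volume.
genuine_attempts_per_day = 50_000  # assumed daily volume of genuine attempts
print(f"Expected wrongful rejections per day: {genuine_attempts_per_day * fnr:.0f}")
```

The point of the sketch is that a headline accuracy figure says little on its own; the same rates produce very different real-world impacts depending on the volume and mix of genuine and impostor attempts.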
NIST's Role and Third-Party Testing: While NIST provides valuable benchmarking data, it is not an accrediting body. Its tests are based on static datasets, highlighting the need for third-party, real-world testing to fully evaluate these systems. Deployers should insist on rigorous third-party testing across three stages: acceptance, integration, and operational. This ensures a thorough evaluation at every step.
Risk Profiles and Deployment Context: Technologies perform differently depending on the environment and use case. From camera positioning to lighting conditions, understanding deployment factors is essential, and procurers must also understand the operating threshold at which these technologies run (see the sketch below). UK border control, for example, uses this technology in highly controlled environments to ensure optimal performance.
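To illustrate the operating threshold point, here is a small, hypothetical sketch (the similarity scores and threshold values are invented, not taken from any real system) showing how raising or lowering a match threshold trades false accepts against false rejects:

```python
# Invented similarity scores for genuine and impostor comparisons.
genuine_scores = [0.92, 0.88, 0.95, 0.81, 0.76, 0.97, 0.85, 0.90]
impostor_scores = [0.30, 0.55, 0.62, 0.41, 0.72, 0.48, 0.35, 0.58]

for threshold in (0.5, 0.7, 0.8):
    # A comparison counts as a match if its score meets the threshold.
    fnr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    fpr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    print(f"threshold={threshold:.1f}  FNR={fnr:.0%}  FPR={fpr:.0%}")

# Raising the threshold cuts false accepts but turns away more genuine users;
# the right setting depends on the deployment environment and its risk profile.
```

In other words, the "right" threshold is not a property of the technology alone but of the deployment context and the risks the deployer is willing to accept.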
Skin Tone and Demographic Performance: Skin-tone scales (such as the Fitzpatrick or Monk scale) can be useful for measuring a system's ability to handle light reflectiveness; however, these scales are considered too linear to demonstrate demographic performance accurately. A more valuable measure is to look at how facial phenotypes affect model performance.
User Behaviour and Human Oversight: Performance can be affected by user behaviour, such as head pitch, facial expressions or occlusions. While automation is appealing, human oversight remains crucial, as trained staff are needed to validate or override systems when necessary.
Privacy and Data Retention: It is vital to examine technology providers' data retention policies. Important questions to ask include: Can stored data be reverse engineered, rendering raw data back into an image? Can the provider effectively handle a data deletion request? How is the data separated and encrypted? These questions must be answered to ensure privacy.
Thank you to both the Biometrics Institute and NIST for these insights.
At Serve Legal, we are committed to helping businesses test their facial recognition systems, ensuring they perform accurately, fairly and securely. If you’re interested in learning more about how Serve Legal can assist your business in testing and validating biometric technologies, feel free to reach out to us.