1 of 18

Preliminary User Evaluation of Deepfake Detection System in Criminal Justice Facial Evidence Verification

IPSJ 86th Conference Presentation 2024

Ebrima Hydara, Masato Kikuchi, Tadachika Ozono

Nagoya Institute of Technology | 21 February 2024


2 of 18

Introduction: Research Goal

Problem

  • Deepfake: highly realistic fake media
  • Facial evidence: proof by facial identity
  • Current facial evidence verification methods cannot tackle deepfake evidence (Wehrli et al. 2022)

Solution

  • User-centric deepfake detection system for facial evidence veracity check
  • Use human-in-the-loop method for improvement
  • Incorporate forensic techniques to comply with judicial proceedings
  • Safeguard evidence integrity in criminal justice


3 of 18

Related Work: Current Methods

Facial Evidence Verification:

  • Face-matching (Bacci et al., 2021), (Moreton, 2021)
  • Facial Recognition Technology (Hill, D. et al., 2022)
  • Automated Facial Recognition System (Khan et al., 2021)
  • All of these use face matching as their main technique

Deepfake Detection:

  • Threshold Classifier (Reis & Ribeiro, 2023)
  • Likelihood Ratio Framework: (Meuwly, Ramos, & Haraksim, 2017)
  • Statistical Analysis: (Morrison, 2022)
  • Biometric-Based Forensic Technique: (Agarwal et al., 2020)
  • CNN, RCNN, & ViT-based architectures (Silver et al., 2022)


4 of 18

Related Work: Gap Analysis

Current Deepfake Detection:

  • No decision-making explainability for criminal justice
  • No confidence boosting mechanism for forensics
  • No individual frame prediction, timestamp & heatmap
  • No specialized reporting scheme for criminal justice

Proposed System Method:

  • Synergy of deepfake detection & forensic techniques to explain decision-making
  • Confidence boosting for forensics
  • Individual frame prediction, timestamp, heatmap
  • Specialized reporting for criminal justice


5 of 18

Contribution: Originality/Novelty

  • Method: the combination of the following to holistically address the problem:
    • Deepfake Detection to detect veracity of evidence
    • Confidence Threshold to filter only trustworthy predictions, minimizing false positives, ensuring the credibility of evidence, and preserving the judicial proceeding’s integrity
    • Frame Timestamps to track events and enable verification of temporal integrity of evidence
    • Frame Heatmap to visualize regions of manipulations for forensics
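The interplay of per-frame prediction, confidence thresholding, and timestamps can be sketched as follows. This is a minimal illustration only — the class and function names, and the 0–100 score scale, are assumptions for exposition, not the system's actual API:

```python
from dataclasses import dataclass

@dataclass
class FramePrediction:
    index: int          # frame number within the footage
    timestamp: float    # seconds from the start of the footage
    fake_score: float   # model confidence that the frame is fake (0-100)

def trustworthy_fake_frames(preds, threshold=80.0):
    """Keep only frames whose fake score clears the confidence
    threshold, so low-confidence predictions never reach the report."""
    return [p for p in preds if p.fake_score >= threshold]

# Example: three frames, only one clears the 80-point threshold.
preds = [
    FramePrediction(0, 0.00, 55.2),
    FramePrediction(1, 0.04, 91.7),
    FramePrediction(2, 0.08, 79.9),
]
flagged = trustworthy_fake_frames(preds)
for p in flagged:
    print(f"frame {p.index} @ {p.timestamp:.2f}s flagged ({p.fake_score:.1f})")
```

Filtering before reporting is what lets the system minimize false positives while the retained timestamps support verification of the evidence's temporal integrity.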


6 of 18

Methodology: Training & Evaluation

[Figure: training & evaluation workflow — Datasets, Training & Testing, Accuracy > 80%]

7 of 18

System Implementation: Architecture


8 of 18

System Implementation: Prototype


9 of 18

System Implementation: Demo


10 of 18

Experiments: Setup

  • Objective: measure system-user collaborative performance & user confidence in the system’s performance
  • Dataset: 200 videos of both deepfake and authentic footage
  • Participants: 10 non-expert users from varying backgrounds
  • Records: standardized recording of results in Excel
  • Analysis: accuracy, precision, recall, & F1 score measurement


11 of 18

Experiments: Results (System Performance)

Best Performance Confidence Threshold: 80 to 85
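One way such a best-performing band can be identified is by sweeping candidate thresholds and keeping the one that maximizes F1. A minimal sketch, using hypothetical (score, label) pairs rather than the experiment's data:

```python
def f1_at_threshold(scores_labels, threshold):
    """F1 score when frames scoring >= threshold are called fake.
    scores_labels: iterable of (fake_score 0-100, is_fake) pairs."""
    tp = sum(1 for s, y in scores_labels if s >= threshold and y)
    fp = sum(1 for s, y in scores_labels if s >= threshold and not y)
    fn = sum(1 for s, y in scores_labels if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores_labels, candidates=range(50, 100, 5)):
    """Pick the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_at_threshold(scores_labels, t))

# Hypothetical (score, is_fake) pairs:
data = [(90, True), (85, True), (70, False), (40, False)]
t = best_threshold(data)
```

Raising the threshold trades recall for precision, which is why an intermediate band (here, 80 to 85) can outperform both extremes.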

12 of 18

Experiments: Results

  • Collaborative performance evaluation results:
    • User perceptions resonate with system predictions
    • Users were careful not to misrepresent authentic footage as deepfake, reflecting human resilience


13 of 18

Experiments: Results

  • User confidence spectrum in system’s predictive performance results:
    • Most users are confident in the system’s predictions
    • However, user confidence should be strengthened around recall (minimizing false negatives)


14 of 18

Discussion

Strength

  • Accuracy & Precision: a single frame reliably predicted as fake can help acquit innocent suspects of false charges
  • User confidence: promising outlook for the criminal justice domain where confidence & trust are required

Limitations

  • Recall: the system’s recall is lower than that of human judgment, calling for improvements to the system’s resilience
  • No criminal justice deepfake datasets
  • No access to forensic experts for evaluations


15 of 18

Conclusion: Findings & Future Work

Findings

  • Accuracy and Precision scores show high collaborative user-system performance; however, Recall and F1 scores show higher human resilience than the system’s
  • A user-centric approach is needed for a highly performant deepfake detection system for facial evidence verification in criminal justice.

Future Work

  • Better explainability and feedback
  • Better system-user interaction transparency
  • Improve system’s recall for more accountability and trust
  • Engage forensic experts for system evaluation
  • Define new standard operating procedures for policy guidelines


16 of 18

References

  • Bacci, N. et al.: Forensic Facial Comparison: Current Status, Limitations, and Future Directions. Biology, vol. 10, no. 12, 1269, pp. 1–26 (2021).
  • Moreton, R.: Forensic face matching: Procedures and application. In M. Bindemann (Ed.), Forensic face matching: Research and practice, pp. 144–173 (2021).
  • Hill, D. et al.: Police use of facial recognition technology: The potential for engaging the public through co-constructed policy-making. International Journal of Police Science and Management, vol. 24, no. 3, pp. 325–335 (2022).
  • Khan, Z. A. et al.: AI-Based Facial Recognition Technology and Criminal Justice: Issues and Challenges. Turkish Journal of Computer and Mathematics Education, vol. 12, no. 14, pp. 3384–3392 (2021).
  • Dosovitskiy, A. et al.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: International Conference on Learning Representations (2021).
  • Towler, A. et al.: Unfamiliar face matching systems in applied settings. In: pp. 21–40, (2017).
  • Wehrli, S. et al.: Bias, awareness, and ignorance in deep-learning-based face recognition. In: AI Ethics 2, pp. 509–522, (2022).


17 of 18

References

  • Reis, P. et al. (2023). “A forensic evaluation method for DeepFake detection using DCNN-based facial similarity scores.” In: Forensic science international, p. 111747. URL: https://api.semanticscholar.org/CorpusID:259070218.
  • Meuwly, D. et al. (2017). “A guideline for the validation of likelihood ratio methods used for forensic evidence evaluation.” In: Forensic science international 276, pp. 142–153. URL: https://api.semanticscholar.org/CorpusID:4785352.
  • Morrison, G. (2022). “Advancing a paradigm shift in evaluation of forensic evidence: The rise of forensic data science”. In: Forensic Science International: Synergy 5, p. 100270. DOI: 10.1016/j.fsisyn.2022.100270.
  • Agarwal, S. et al. (2020). “Detecting deep-fake videos from appearance and behavior”. In: 2020 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, pp. 1–6.
  • Hu, J. et al. (2021). “Improving the generalization ability of deepfake detection via disentangled representation learning”. In: 2021 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 3577–3581.
  • Le, B. M. et al. (2023). “Why do deepfake detectors fail?” In: arXiv preprint arXiv:2302.13156.
  • Akhtar, Z. (2023). “Deepfakes Generation and Detection: A Short Survey”. In: Journal of Imaging 9.1, p. 18. DOI: 10.3390/jimaging9010018.


18 of 18

Thank You
