1 of 18

Preliminary User Evaluation of Deepfake Detection System in Criminal Justice Facial Evidence Verification

IPSJ 86th Conference Presentation 2024

Ebrima Hydara, Masato Kikuchi, Tadachika Ozono

Nagoya Institute of Technology | 21 February 2024


2 of 18

Introduction: Research Goal

Problem

  • Deepfake: highly realistic fake media
  • Facial evidence: proof by facial identity
  • Current facial evidence verification methods cannot tackle deepfake evidence (Wehrli et al. 2022)

Solution

  • User-centric deepfake detection system for facial evidence veracity check
  • Use human-in-the-loop method for improvement
  • Incorporate forensic techniques to comply with judicial proceedings
  • Safeguard evidence integrity in criminal justice


3 of 18

Related Work: Current Methods

Facial Evidence Verification:

  • Face-matching (Bacci et al., 2021), (Moreton, 2021)
  • Facial Recognition Technology (Hill, D. et al., 2022)
  • Automated Facial Recognition System (Khan et al., 2021)
  • All of these use face matching as their main technique

Deepfake Detection:

  • Threshold Classifier (Reis & Ribeiro, 2023)
  • Likelihood Ratio Framework: (Meuwly, Ramos, & Haraksim, 2017)
  • Statistical Analysis: (Morrison, 2022)
  • Biometric-Based Forensic Technique: (Agarwal et al., 2020)
  • CNN, RCNN, & ViT-based architectures (Silver et al., 2022)


4 of 18

Related Work: Gap Analysis

Current Deepfake Detection:

  • No decision-making explainability for criminal justice
  • No confidence boosting mechanism for forensics
  • No individual frame prediction, timestamp & heatmap
  • No specialized reporting scheme for criminal justice

Proposed System Method:

  • Synergy of deepfake detection & forensic techniques to explain decision-making
  • Confidence boosting for forensics
  • Individual frame prediction, timestamp, heatmap
  • Specialized reporting for criminal justice


5 of 18

Contribution: Originality/Novelty

  • Method: the combination of the following to holistically address the problem:
    • Deepfake Detection to detect veracity of evidence
    • Confidence Threshold to filter only trustworthy predictions, minimizing false positives, ensuring the credibility of evidence, and preserving the judicial proceeding’s integrity
    • Frame Timestamps to track events and enable verification of temporal integrity of evidence
    • Frame Heatmap to visualize regions of manipulations for forensics
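The interplay of per-frame prediction, confidence thresholding, and timestamps can be sketched as follows. This is a minimal illustration only — the class and function names, and the 0–100 score scale, are assumptions for exposition, not the system's actual API:

```python
from dataclasses import dataclass

@dataclass
class FramePrediction:
    index: int          # frame number within the footage
    timestamp: float    # seconds from the start of the footage
    fake_score: float   # model confidence that the frame is fake (0-100)

def trustworthy_fake_frames(preds, threshold=80.0):
    """Keep only frames whose fake score clears the confidence
    threshold, so low-confidence predictions never reach the report."""
    return [p for p in preds if p.fake_score >= threshold]

# Example: three frames, only one clears the 80-point threshold.
preds = [
    FramePrediction(0, 0.00, 55.2),
    FramePrediction(1, 0.04, 91.7),
    FramePrediction(2, 0.08, 79.9),
]
flagged = trustworthy_fake_frames(preds)
for p in flagged:
    print(f"frame {p.index} @ {p.timestamp:.2f}s flagged ({p.fake_score:.1f})")
```

Filtering before reporting is what lets the system minimize false positives while the retained timestamps support verification of the evidence's temporal integrity.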


6 of 18

Methodology: Training & Evaluation

[Figure: training & evaluation workflow — Datasets, Training & Testing, Accuracy > 80%]

7 of 18

System Implementation: Architecture


8 of 18

System Implementation: Prototype


9 of 18

System Implementation: Demo


10 of 18

Experiments: Setup

  • Objective: measure system-user collaborative performance & user confidence in the system’s performance
  • Dataset: 200 videos of both deepfake and authentic footage
  • Participants: 10 non-expert users from varying backgrounds
  • Records: standardized recording of results in Excel
  • Analysis: accuracy, precision, recall, & F1 score measurement


11 of 18

Experiments: Results (System Performance)

Best Performance Confidence Threshold: 80 to 85
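One way such a best-performing band can be identified is by sweeping candidate thresholds and keeping the one that maximizes F1. A minimal sketch, using hypothetical (score, label) pairs rather than the experiment's data:

```python
def f1_at_threshold(scores_labels, threshold):
    """F1 score when frames scoring >= threshold are called fake.
    scores_labels: iterable of (fake_score 0-100, is_fake) pairs."""
    tp = sum(1 for s, y in scores_labels if s >= threshold and y)
    fp = sum(1 for s, y in scores_labels if s >= threshold and not y)
    fn = sum(1 for s, y in scores_labels if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores_labels, candidates=range(50, 100, 5)):
    """Pick the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_at_threshold(scores_labels, t))

# Hypothetical (score, is_fake) pairs:
data = [(90, True), (85, True), (70, False), (40, False)]
t = best_threshold(data)
```

Raising the threshold trades recall for precision, which is why an intermediate band (here, 80 to 85) can outperform both extremes.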

12 of 18

Experiments: Results

  • Collaborative performance evaluation results:
    • User perceptions resonate with system predictions
    • Users were careful not to misrepresent authentic footage as deepfake, reflecting human resilience


13 of 18

Experiments: Results

  • User confidence spectrum in system’s predictive performance results:
    • Most users are confident in the system’s predictions
    • However, user confidence should be strengthened around recall (minimizing false negatives)


14 of 18

Discussion

Strength

  • Accuracy & Precision: a single frame reliably predicted as fake can help acquit innocent suspects of false charges
  • User confidence: promising outlook for the criminal justice domain where confidence & trust are required

Limitations

  • Recall: the system’s recall is lower than that of human judgment, calling for improvements to the system’s resilience
  • No criminal justice deepfake datasets
  • No access to forensic experts for evaluations


15 of 18

Conclusion: Findings & Future Work

Findings

  • Accuracy and Precision scores show high collaborative user-system performance; however, Recall and F1 scores show higher human resilience than the system’s
  • A user-centric approach is needed for a highly performant deepfake detection system for facial evidence verification in criminal justice.

Future Work

  • Better explainability and feedback
  • Better system-user interaction transparency
  • Improve system’s recall for more accountability and trust
  • Engage forensic experts for system evaluation
  • Define new standard operating procedures for policy guidelines


16 of 18

References

  • Bacci, N. et al.: Forensic Facial Comparison: Current Status, Limitations, and Future Directions. Biology, vol. 10, no. 12, 1269, pp. 1–26 (2021).
  • Moreton, R.: Forensic face matching: Procedures and application. In M. Bindemann (Ed.), Forensic face matching: Research and practice, pp. 144–173 (2021).
  • Hill, D. et al.: Police use of facial recognition technology: The potential for engaging the public through co-constructed policy-making. International Journal of Police Science and Management, vol. 24, no. 3, pp. 325–335 (2022).
  • Khan, Z. A. et al.: AI-Based Facial Recognition Technology and Criminal Justice: Issues and Challenges. Turkish Journal of Computer and Mathematics Education, vol. 12, no. 14, pp. 3384–3392 (2021).
  • Dosovitskiy, A. et al.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: International Conference on Learning Representations (2021).
  • Towler, A. et al.: Unfamiliar face matching systems in applied settings. In: pp. 21–40, (2017).
  • Wehrli, S. et al.: Bias, awareness, and ignorance in deep-learning-based face recognition. In: AI Ethics 2, pp. 509–522, (2022).


17 of 18

References

  • Reis, P. et al. (2023). “A forensic evaluation method for DeepFake detection using DCNN-based facial similarity scores.” In: Forensic science international, p. 111747. URL: https://api.semanticscholar.org/CorpusID:259070218.
  • Meuwly, D. et al. (2017). “A guideline for the validation of likelihood ratio methods used for forensic evidence evaluation.” In: Forensic science international 276, pp. 142–153. URL: https://api.semanticscholar.org/CorpusID:4785352.
  • Morrison, G. (2022). “Advancing a paradigm shift in evaluation of forensic evidence: The rise of forensic data science”. In: Forensic Science International: Synergy 5, p. 100270. DOI: 10.1016/j.fsisyn.2022.100270.
  • Agarwal, S. et al. (2020). “Detecting deep-fake videos from appearance and behavior”. In: 2020 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, pp. 1–6.
  • Hu, J. et al. (2021). “Improving the generalization ability of deepfake detection via disentangled representation learning”. In: 2021 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 3577–3581.
  • Le, B. M. et al. (2023). “Why do deepfake detectors fail?” In: arXiv preprint arXiv:2302.13156.
  • Akhtar, Z. (2023). “Deepfakes Generation and Detection: A Short Survey”. In: Journal of Imaging 9.1, p. 18. DOI: 10.3390/jimaging9010018.


18 of 18

Thank You
