1 of 49

A dialog between AI and Cognitive Science…

Thomas Schatz, ETAL, June 16, 2023

4 of 49

A dialog between AI and Cognitive Science…

…about embodiment

focus on perception

selected examples from visual and speech perception

6 of 49

Crevier, D. (1993). AI: the tumultuous history of the search for artificial intelligence

9 of 49

The computational complexity of perception

Lighthill report (1972)

Work in the pattern-recognition field has not yet proved competitive with conventional methods: even the recognition of printed and typewritten characters posed a quite surprising degree of difficulty, while the recognition of handwritten characters appears completely out of reach. Speech recognition has been successful only within the confines of a very limited vocabulary, and large expenditure on schemes to produce machine recognition of ordinary speech has been wholly wasted. Learning techniques, by which a machine's performance at recognising words might improve on receiving identified words from more and more individual speakers, appear feasible only for an exceedingly small vocabulary (such is the power of the combinatorial explosion) like the decimal digits

The beginnings of computational complexity theory

Hartmanis, J., & Stearns, R. (1965). On the computational complexity of algorithms. Transactions of the American Mathematical Society, 117, 285-306.

13 of 49

The computational complexity of perception

Perception solves problems as complex as (or more complex than) those of high-level cognition

Yet it does so much faster

How is human perception so fast and accurate?

Brooke-Wilson, T. (2023). How Is Perception Tractable? Philosophical Review, 132(2), 239-292.

Is computational complexity relevant to understanding human cognition?

Lloyd, S. (2000). Ultimate physical limits to computation. Nature, 406(6799), 1047-1054.

1 kg, 1 L computer:

10^51 logical operations/s
10^30 bytes of memory

(average brain: around 1.5 kg, 1.13 L)
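
To make the scale of these numbers concrete, here is a rough arithmetic sketch of how far brute-force enumeration gets even at the quoted limit of 10^51 operations per second. It assumes, purely for illustration, that checking one candidate costs exactly one operation, and it uses a hypothetical vocabulary of 20,000 words; neither assumption comes from Lloyd (2000) or from the Lighthill report.

```python
def largest_exhaustive_search_bits(ops_per_second, seconds):
    """Largest n such that all 2**n candidate states fit in the operation budget."""
    budget = ops_per_second * seconds
    return budget.bit_length() - 1  # floor(log2(budget)) for a positive integer


def longest_enumerable_sequence(vocabulary_size, budget):
    """Longest sequence length L with vocabulary_size**L <= budget."""
    length, count = 0, 1
    while count * vocabulary_size <= budget:
        count *= vocabulary_size
        length += 1
    return length


LLOYD_OPS_PER_SECOND = 10**51  # figure quoted above from Lloyd (2000)
one_second_budget = LLOYD_OPS_PER_SECOND * 1  # assumption: one operation per candidate

bits = largest_exhaustive_search_bits(LLOYD_OPS_PER_SECOND, 1)
digits_length = longest_enumerable_sequence(10, one_second_budget)      # decimal digits
words_length = longest_enumerable_sequence(20_000, one_second_budget)   # assumed vocabulary size

print(bits, digits_length, words_length)  # prints: 169 51 11
```

Under these assumptions, one second at the physical limit covers every string of 51 decimal digits but only sequences of 11 words, which is one way to read the "combinatorial explosion" remark: raw speed alone does not make exhaustive search over ordinary speech feasible.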

19 of 49

Speech recognition and auditory neuroscience

Historical input features

Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357-366.

Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87(4), 1738-1752.

Explicitly informed by auditory neuroscience

Essentially a transparent model of the auditory periphery

23 of 49

Speech recognition and auditory neuroscience

Nowadays:

Image processing: end-to-end modeling, from the pixels

Speech?

OpenAI Whisper: Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.

log-mel spectrogram

Rahman, M., Willmore, B. D., King, A. J., & Harper, N. S. (2020). Simple transformations capture auditory input to cortex. Proceedings of the National Academy of Sciences, 117(45), 28442-28451.
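
The log-mel spectrogram mentioned above can be sketched for a single audio frame as follows. This is a minimal pure-Python illustration, not Whisper's implementation: the mel formula is one common variant, and the sample rate, frame length, filter count and test tone are toy values assumed here for readability.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """One common mel-scale formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def log_mel_frame(frame, sample_rate, n_filters=10, floor=1e-10):
    """Log of the mel-filtered power spectrum for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    return [math.log10(max(sum(w * p for w, p in zip(weights, spec)), floor))
            for weights in bank]


sample_rate = 8000  # toy value
frame = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(256)]
features = log_mel_frame(frame, sample_rate)
print(features.index(max(features)))  # index of the mel band with the most energy
```

The two steps that give the representation its name are visible in the sketch: frequency bins are pooled on a perceptually motivated (mel) axis, then compressed with a logarithm. A full spectrogram would apply `log_mel_frame` to successive overlapping frames.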

25 of 49

Cognitive Science, Bio-inspiration and Engineering

Geoffrey Hinton

Yann LeCun

26 of 49

Cognitive Science, Bio-inspiration and Engineering

Jürgen Schmidhuber

Yoshua Bengio

28 of 49

Deep learning models of human perception

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS 2012. (134,117 citations on June 16, 2023)

29 of 49

Deep learning models of human perception

Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. Advances in Neural Information Processing Systems, 26.

Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619-8624.

31 of 49

Deep learning models of human perception and their limits

Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR ’15), IEEE.

32 of 49

Deep learning models of human perception and their limits

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR.

33 of 49

Deep learning models of human perception and their limits

Baker, N., Lu, H., Erlikhman, G., & Kellman, P. J. (2018). Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology, 14(12), e1006613.

35 of 49

Deep learning models of human perception and their limits

Firestone, C. (2020). Performance vs. competence in human–machine comparisons. Proceedings of the National Academy of Sciences, 117(43), 26562-26571.

But see: Bowers, J. S., Malhotra, G., Dujmović, M., Montero, M. L., Tsvetkov, C., Biscione, V., ... & Blything, R. (2022). Deep problems with neural network models of human vision. Behavioral and Brain Sciences, 1-74.

36 of 49

Deep learning models of human perception and their limits

Adolfi, F., Bowers, J. S., & Poeppel, D. (2023). Successes and critical failures of neural networks in capturing human-like speech recognition. Neural Networks, 162, 199-211.

42 of 49

Current limits of deep learning methods

Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.

Olivier, R., & Raj, B. (2022). There is more than one kind of robustness: Fooling Whisper with adversarial examples. arXiv preprint arXiv:2210.17316.

43 of 49

What next?

45 of 49

Infant object perception and unsupervised instance segmentation

Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14(1), 29-56.

46 of 49

Infant object perception and unsupervised instance segmentation

Chen, H., Venkatesh, R., Friedman, Y., Wu, J., Tenenbaum, J. B., Yamins, D. L., & Bear, D. M. (2022). Unsupervised segmentation in real-world images via Spelke object inference. ECCV 2022.

49 of 49

Summary on perception

~1960: the surprising computational complexity of perception
~1975-present: auditory neuroscience and state-of-the-art speech recognition systems
~1990-2010: cognitive science, bio-inspiration and engineering
~2010-present: deep learning models of human perception, successes and caveats
now?: the limits of deep learning, and the development of perception in humans as an interesting source of inspiration