A dialog between AI and Cognitive Science…
Thomas Schatz, ETAL, June 16, 2023
Crevier, D. (1993). AI: The tumultuous history of the search for artificial intelligence.
Lighthill report (1972)
Work in the pattern-recognition field has not yet proved competitive with conventional methods: even the recognition of printed and typewritten characters posed a quite surprising degree of difficulty, while the recognition of handwritten characters appears completely out of reach. Speech recognition has been successful only within the confines of a very limited vocabulary, and large expenditure on schemes to produce machine recognition of ordinary speech has been wholly wasted. Learning techniques, by which a machine's performance at recognising words might improve on receiving identified words from more and more individual speakers, appear feasible only for an exceedingly small vocabulary (such is the power of the combinatorial explosion) like the decimal digits
The computational complexity of perception
The beginnings of computational complexity theory
J. Hartmanis and R. Stearns. On the computational complexity of algorithms. Transactions of the American Mathematical Society, 117:285–306, 1965.
Is computational complexity relevant to understanding human cognition?
Lloyd, S. (2000). Ultimate physical limits to computation. Nature, 406(6799), 1047-1054.
Lloyd's "ultimate laptop": a 1 kg, 1 L computer
(average brain: around 1.5 kg, 1.13 L)
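Lloyd (2000) bounds the speed of such a 1 kg computer with the Margolus-Levitin theorem: a system with energy E can perform at most 2E/(πħ) elementary operations per second, and taking all rest-mass energy (E = mc²) gives about 5.4 × 10^50 operations per second. The sketch below reproduces that arithmetic; applying the same bound to a 1.5 kg mass, and the brute-force search over 2^n hypotheses, are illustrative additions, not part of Lloyd's paper.

```python
import math

C = 2.99792458e8         # speed of light (m/s)
HBAR = 1.054571817e-34   # reduced Planck constant (J s)

def max_ops_per_second(mass_kg: float) -> float:
    """Margolus-Levitin bound 2E/(pi*hbar), assuming all rest-mass energy E = m c^2 is usable."""
    energy_joules = mass_kg * C ** 2
    return 2.0 * energy_joules / (math.pi * HBAR)

laptop = max_ops_per_second(1.0)   # Lloyd's 1 kg "ultimate laptop": about 5.4e50 ops/s
brain = max_ops_per_second(1.5)    # the same physical bound for a ~1.5 kg object

# Illustrative combinatorial explosion: exhaustively checking 2**n hypotheses.
n = 300
years = 2 ** n / laptop / (365.25 * 24 * 3600)
print(f"{laptop:.2e} ops/s at 1 kg; {brain:.2e} ops/s at 1.5 kg; 2^{n} steps take {years:.1e} years")
```

Even at this physical limit, exhaustive search over a few hundred binary choices is hopeless, which is why the complexity of a problem, and not only hardware speed, constrains any account of how cognition solves it.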
The computational complexity of perception
Perception solves problems as complex as (or more complex than) those of high-level cognition
Yet it solves them much faster
How is human perception so fast and accurate?
Brooke-Wilson, T. (2023). How Is Perception Tractable?. Philosophical Review, 132(2), 239-292.
Speech recognition and auditory neuroscience
Historical input features
Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357-366.
Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87(4), 1738-1752.
Explicitly informed by auditory neuroscience
Essentially a transparent model of the auditory periphery
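The MFCC pipeline of Davis & Mermelstein (1980) can be sketched in a few NumPy lines: short overlapping frames, a power spectrum, triangular filters spaced on the mel scale (a coarse model of cochlear frequency resolution), log compression (a coarse model of loudness), and a DCT that decorrelates the log energies. The 25 ms / 10 ms framing, FFT size, filter count, and HTK-style mel formula below are common choices assumed for illustration, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # HTK-style mel scale (one of several variants)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale: (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):              # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):             # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs coefficients."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    j = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def mfcc(signal, sr=16000, frame_len=0.025, hop=0.010, n_fft=512, n_filters=26, n_ceps=13):
    """Returns (n_frames, n_ceps) cepstral coefficients; signal must span at least one frame."""
    frame = int(round(frame_len * sr))
    step = int(round(hop * sr))
    n_frames = 1 + (len(signal) - frame) // step
    idx = np.arange(frame)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame)                 # windowed short-time frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(energies + 1e-10)                  # loudness-like compression
    return dct_ii(log_energies, n_ceps)

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)    # 1 s, 440 Hz at 16 kHz
print(mfcc(tone).shape)  # (98, 13)
```

Every stage has an explicit auditory interpretation, which is what makes this front end transparent. PLP (Hermansky, 1990) follows the same spirit with a Bark-scale critical-band analysis, equal-loudness weighting, cube-root intensity compression, and an all-pole model.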
Speech recognition and auditory neuroscience
Nowadays:
Image processing: end-to-end modeling, from the pixels
Speech?
OpenAI Whisper: Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.
Input: a log-mel spectrogram, not the raw waveform
Rahman, M., Willmore, B. D., King, A. J., & Harper, N. S. (2020). Simple transformations capture auditory input to cortex. Proceedings of the National Academy of Sciences, 117(45), 28442-28451.
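The Whisper paper describes its input as an 80-channel log-magnitude mel spectrogram of 16 kHz audio, computed on 25 ms windows with a 10 ms stride over 30 s segments. A minimal sketch of such a front end follows; the clamping and rescaling constants are modeled loosely on OpenAI's open-source whisper preprocessing, and the HTK-style filterbank and symmetric Hann window are simplifying assumptions, so this should not be read as Whisper's exact implementation.

```python
import numpy as np

SAMPLE_RATE = 16000
N_FFT = 400          # 25 ms window at 16 kHz
HOP = 160            # 10 ms stride
N_MELS = 80
CHUNK_SECONDS = 30

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # HTK formula (assumption)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filters():
    """(N_MELS, N_FFT // 2 + 1) triangular filters, evenly spaced on the mel scale."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(SAMPLE_RATE / 2), N_MELS + 2))
    freqs = np.linspace(0.0, SAMPLE_RATE / 2, N_FFT // 2 + 1)
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs[None, :] - lo) / (mid - lo)
    falling = (hi - freqs[None, :]) / (hi - mid)
    return np.maximum(0.0, np.minimum(rising, falling))

def log_mel_spectrogram(audio):
    """Pad or trim to 30 s, then return an (N_MELS, 3000) compressed log-mel array."""
    n_samples = SAMPLE_RATE * CHUNK_SECONDS
    audio = np.asarray(audio, dtype=float)[:n_samples]
    audio = np.pad(audio, (0, n_samples - len(audio)))
    n_frames = n_samples // HOP
    padded = np.pad(audio, N_FFT // 2, mode="reflect")      # center each window on its frame
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(padded[idx] * np.hanning(N_FFT), axis=-1)) ** 2
    mel = mel_filters() @ power.T                            # (80, 3000)
    log_spec = np.log10(np.maximum(mel, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)   # keep at most 80 dB of dynamic range
    return (log_spec + 4.0) / 4.0                            # rescale to a small numeric range
```

Structurally this is the same periphery-inspired pipeline as the historical features (mel warping plus log compression, minus the final DCT): unlike vision models that start from pixels, the model sees a hand-designed auditory representation.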
Cognitive Science, Bio-inspiration and Engineering
Geoffrey Hinton
Yann LeCun
Jürgen Schmidhuber
Yoshua Bengio
Deep learning models of human perception
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS 2012. (134,117 citations on June 16, 2023)
Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. Advances in Neural Information Processing Systems, 26.
Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619-8624.
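One common way to compare a network layer with neural recordings is representational similarity analysis (RSA): build a stimulus-by-stimulus dissimilarity matrix (RDM) for each system and correlate the two. The sketch below is a generic version of that idea, not necessarily the analysis used in the cited papers (which also rely on regression-based neural predictivity); the "neural" and "model" arrays are synthetic toy data.

```python
import numpy as np

def rdm(features):
    """Dissimilarity 1 - Pearson r between stimulus patterns; features: (n_stimuli, n_units)."""
    return 1.0 - np.corrcoef(features)

def rank(x):
    """Ranks of x (assumes no ties, true for continuous data)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def rsa_score(model_feats, neural_feats):
    """Spearman correlation between the upper triangles of the two RDMs."""
    iu = np.triu_indices(model_feats.shape[0], k=1)
    a, b = rdm(model_feats)[iu], rdm(neural_feats)[iu]
    return np.corrcoef(rank(a), rank(b))[0, 1]

rng = np.random.default_rng(0)
latent = rng.standard_normal((50, 20))                 # 50 stimuli, 20 hidden factors (toy)
neural = latent @ rng.standard_normal((20, 100))       # hypothetical "IT" population
good_model = latent @ rng.standard_normal((20, 300))   # hypothetical layer sharing the factors
random_model = rng.standard_normal((50, 300))          # unrelated layer
print(rsa_score(good_model, neural), rsa_score(random_model, neural))
```

RSA only needs both systems to respond to the same stimuli, so it sidesteps any one-to-one mapping between artificial units and neurons.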
Deep learning models of human perception and their limits
Nguyen A, Yosinski J, Clune J. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In Computer Vision and Pattern Recognition (CVPR ’15), IEEE, 2015
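Nguyen et al. obtained their fooling images with evolutionary search that only queries the classifier's confidence. The sketch below is a much simpler (1+1) hill-climbing variant with a direct pixel encoding, run against a hypothetical stand-in classifier (a fixed random linear softmax model, not a trained network); mutation rate, step size, and generation count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PIXELS, N_CLASSES = 28 * 28, 10
# Hypothetical stand-in for a trained network: a fixed random linear softmax classifier.
W = rng.standard_normal((N_CLASSES, N_PIXELS)) * 0.05
b = np.zeros(N_CLASSES)

def predict(image):
    logits = W @ image + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

def evolve_fooling_image(target, generations=2000, mutation_rate=0.05, sigma=0.2):
    """(1+1) evolutionary search: mutate random pixels, keep the child if target confidence does not drop."""
    parent = rng.uniform(0.0, 1.0, N_PIXELS)          # start from random noise
    best = predict(parent)[target]
    for _ in range(generations):
        child = parent.copy()
        mask = rng.random(N_PIXELS) < mutation_rate
        child[mask] = np.clip(child[mask] + sigma * rng.standard_normal(mask.sum()), 0.0, 1.0)
        score = predict(child)[target]
        if score >= best:
            parent, best = child, score
    return parent, best

image, confidence = evolve_fooling_image(target=3)
print(f"confidence for class 3: {confidence:.4f}")
```

The search never looks at gradients or at what the class "should" look like; it only climbs the classifier's confidence, which is why the result can be confidently classified while still looking like noise to a human.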
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR.
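Geirhos et al. quantify texture bias with cue-conflict images that combine the shape of one category with the texture of another: shape bias is the fraction of shape decisions among all decisions that match either cue. A minimal sketch of that metric follows; the category labels are hypothetical examples.

```python
def shape_bias(trials):
    """trials: iterable of (predicted, shape_label, texture_label) for cue-conflict images.
    Returns shape decisions / (shape decisions + texture decisions)."""
    shape_hits = texture_hits = 0
    for predicted, shape_label, texture_label in trials:
        if shape_label == texture_label:
            continue                      # not a cue conflict: excluded
        if predicted == shape_label:
            shape_hits += 1
        elif predicted == texture_label:
            texture_hits += 1             # predictions matching neither cue are ignored
    total = shape_hits + texture_hits
    if total == 0:
        raise ValueError("no decision matched either cue")
    return shape_hits / total

trials = [
    ("cat", "cat", "elephant"),        # shape decision
    ("elephant", "cat", "elephant"),   # texture decision
    ("clock", "bottle", "clock"),      # texture decision
    ("knife", "bear", "boat"),         # neither cue: ignored
]
print(shape_bias(trials))  # 0.3333333333333333
```

A value near 1 means decisions follow shape, as they largely do for human observers; ImageNet-trained CNNs score much lower.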
Baker, N., Lu, H., Erlikhman, G., & Kellman, P. J. (2018). Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology, 14(12), e1006613.
Firestone, C. (2020). Performance vs. competence in human–machine comparisons. Proceedings of the National Academy of Sciences, 117(43), 26562-26571.
But see: Bowers, J. S., Malhotra, G., Dujmović, M., Montero, M. L., Tsvetkov, C., Biscione, V., ... & Blything, R. (2022). Deep problems with neural network models of human vision. Behavioral and Brain Sciences, 1-74.
Adolfi, F., Bowers, J. S., & Poeppel, D. (2023). Successes and critical failures of neural networks in capturing human-like speech recognition. Neural Networks, 162, 199-211.
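Psychophysical comparisons of this kind apply controlled distortions to speech and ask whether model recognition degrades the way human recognition does. One classic manipulation is locally time-reversed speech: the waveform is reversed inside consecutive fixed-length segments while segment order is preserved. The sketch below implements that transformation only; whether the cited study uses this exact manipulation, and the segment lengths one would test, are assumptions here.

```python
import numpy as np

def locally_time_reverse(signal, sample_rate, segment_ms):
    """Reverse the waveform within consecutive segments of segment_ms, keeping segment order."""
    x = np.asarray(signal, dtype=float)
    seg = max(1, int(round(sample_rate * segment_ms / 1000.0)))
    out = np.empty_like(x)
    for start in range(0, len(x), seg):
        out[start:start + seg] = x[start:start + seg][::-1]   # last segment may be shorter
    return out

demo = locally_time_reverse(np.arange(10), sample_rate=1000, segment_ms=4)
print(demo)  # [3. 2. 1. 0. 7. 6. 5. 4. 9. 8.]
```

Sweeping segment_ms and plotting recognition accuracy for humans and models yields comparable curves, which is the kind of comparison that can reveal whether a model exploits the same temporal structure as listeners.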
Current limits of deep learning methods
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356.
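Speech recognizers such as Whisper are usually compared with each other and with human transcribers in word error rate (WER): the word-level edit distance between reference and hypothesis divided by the number of reference words. A minimal sketch follows; it tokenizes on whitespace only, whereas the Whisper paper applies its own text normalizer before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words (Levenshtein over words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))              # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                          # deletion of ref_word
                curr[j - 1] + 1,                      # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word), # substitution, or match at cost 0
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```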
Olivier, R., & Raj, B. (2022). There is more than one kind of robustness: Fooling Whisper with adversarial examples. arXiv preprint arXiv:2210.17316.
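Adversarial examples are small, deliberately chosen input perturbations that change a model's output. The sketch below shows the generic one-step fast gradient sign method (FGSM) on a hypothetical toy linear softmax "recognizer"; it is not the attack used against Whisper in the cited paper, only an illustration of the principle that a perturbation bounded by epsilon per sample can be aimed precisely at the model's loss.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SAMPLES, N_CLASSES = 400, 5
# Hypothetical toy recognizer: a fixed linear softmax model over a 400-sample input.
W = rng.standard_normal((N_CLASSES, N_SAMPLES)) / np.sqrt(N_SAMPLES)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm(x, label, epsilon):
    """Move every input sample by +/- epsilon in the direction that increases the
    cross-entropy loss of `label` (one fast-gradient-sign step)."""
    p = softmax(W @ x)
    grad = W.T @ (p - np.eye(N_CLASSES)[label])   # d(cross-entropy)/dx for a linear model
    return x + epsilon * np.sign(grad)

x = rng.standard_normal(N_SAMPLES)
label = int(np.argmax(W @ x))                      # the model's own prediction as reference
x_adv = fgsm(x, label, epsilon=0.1)
print(softmax(W @ x)[label], softmax(W @ x_adv)[label], int(np.argmax(W @ x_adv)))
```

Because the loss of a linear softmax model is convex in its input, this step is guaranteed to lower the confidence in the original label; with larger epsilon the predicted class typically flips. Attacks on real systems such as Whisper instead use iterated, constrained versions of this idea and measure perturbation size in signal-to-noise terms.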
What next?
Infant object perception and unsupervised instance segmentation
Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14(1), 29-56.
Chen, H., Venkatesh, R., Friedman, Y., Wu, J., Tenenbaum, J. B., Yamins, D. L., & Bear, D. M. (2022). Unsupervised segmentation in real-world images via Spelke object inference. ECCV 2022.
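Spelke's principles (for instance, that objects move as cohesive, bounded wholes) suggest a simple self-supervision signal: pixels that move together probably belong to the same object. Chen et al. build on this idea, roughly treating Spelke objects as things that move together, to learn to segment static images. The sketch below only illustrates the underlying grouping cue on a given optical-flow field; it is a hand-written flood-fill heuristic with hypothetical thresholds, not the method of that paper.

```python
from collections import deque

import numpy as np

def group_by_common_motion(flow, motion_thresh=0.5, similarity_thresh=0.5):
    """Label 4-connected regions of moving pixels with similar flow vectors.
    flow: (H, W, 2) per-pixel displacement. Returns int labels, 0 = static background."""
    h, w, _ = flow.shape
    moving = np.linalg.norm(flow, axis=-1) > motion_thresh
    labels = np.zeros((h, w), dtype=int)
    next_label = 1
    for sy in range(h):
        for sx in range(w):
            if not moving[sy, sx] or labels[sy, sx]:
                continue
            labels[sy, sx] = next_label              # seed a new candidate object
            queue = deque([(sy, sx)])
            while queue:                             # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and moving[ny, nx] and not labels[ny, nx]
                            and np.linalg.norm(flow[ny, nx] - flow[y, x]) < similarity_thresh):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

flow = np.zeros((8, 8, 2))
flow[1:4, 1:4] = (2.0, 0.0)    # object A moving right
flow[5:7, 4:8] = (0.0, -1.5)   # object B moving up
labels = group_by_common_motion(flow)
print(labels)
```

Motion supplies grouping labels "for free"; the harder step, which this sketch does not attempt, is training a network on such groupings so that it segments objects in still images.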
Summary on perception