Row | Date | Presenter | Topic | Papers | Important Dates |
---|---|---|---|---|---|
2 | January 18 | Abhinav | Introduction to the class | ||
4 | January 23 | Abhinav | Theories of vision | ||
6 | January 25 | Abhinav | Theories of vision | ||
8 | January 30 | Abhinav | Introduction to Data | • A. Halevy, P. Norvig, and F. Pereira. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems, 24(2):8–12, 2009. • A. Torralba and A. Efros. Unbiased Look at Dataset Bias. CVPR 2011. | |
10 | February 1 | Abhinav | Introduction to Deep Learning-1 | • A. Krizhevsky, I. Sutskever, and G.E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012 • K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015 | |
12 | February 6 | Abhinav | Introduction to Deep Learning-2 | • A. Mahendran, A. Vedaldi. Understanding Deep Image Representations by Inverting Them. CVPR 2015 • M.D. Zeiler, R. Fergus. Visualizing and Understanding Convolutional Networks. ECCV 2014 | |
14 | February 8 | Abhinav | Introduction to Deep Learning-3 | • He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015). | |
16 | February 13 | Senthil | Introduction to Caffe | • Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014. | ASSIGNMENT 1 RELEASED |
18 | February 15 | Lerrel | Introduction to Torch | ||
20 | February 20 | Senthil+Lerrel | Assignment 1 Q&A | ||
22 | February 22 | Abhinav | Object Detection | Background: • R. Girshick, J. Donahue, T. Darrell, J. Malik. Region-based Convolutional Networks for Accurate Object Detection and Semantic Segmentation. TPAMI 2015. • R. Girshick. Fast R-CNN. ICCV 2015. • S. Ren, K. He, R. Girshick, J. Sun. Faster R-CNN. NIPS 2015. Readings (Presented in next class): • Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. CVPR 2016. • Liu, Wei, et al. SSD: Single Shot MultiBox Detector. arXiv preprint arXiv:1512.02325 (2015). | |
24 | February 27 | Abhinav | Image Segmentation | Background: • J. Long, E. Shelhamer, T. Darrell. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015. • B. Hariharan, P. Arbelaez, R. Girshick, J. Malik. Hypercolumns for Object Segmentation and Fine-grained Localization. CVPR 2015. Reading (Presented in next class): • Li, Ke, Bharath Hariharan, and Jitendra Malik. "Iterative Instance Segmentation." arXiv preprint arXiv:1511.08498 (2015). • Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." ICCV 2015. | |
26 | March 1 | Senthil | Weakly Supervised Object Detection | Background: • Oquab, Maxime, et al. "Is object localization for free? Weakly-supervised learning with convolutional neural networks." CVPR 2015. • Cinbis, Ramazan Gokberk, Jakob Verbeek, and Cordelia Schmid. "Multi-fold MIL training for weakly supervised object localization." CVPR 2014. • Bilen, Hakan, and Andrea Vedaldi. "Weakly Supervised Deep Detection Networks." arXiv preprint arXiv:1511.02853 (2015). Reading (Presented in next class): • Kantorov, Vadim, et al. "ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization." ECCV 2016. • Pathak, Deepak, Philipp Krahenbuhl, and Trevor Darrell. "Constrained convolutional neural networks for weakly supervised segmentation." ICCV 2015. | |
28 | March 6 | Abhinav | 3D Understanding | Background: • D. Eigen, R. Fergus. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. ICCV 2015. • Zoran, D., Isola, P., Krishnan, D., Freeman, W.T. Learning Ordinal Relationships for Mid-Level Vision. ICCV 2015. • Girdhar, Rohit, et al. "Learning a Predictable and Generative Vector Representation for Objects." arXiv preprint arXiv:1603.08637 (2016). Readings (Presented in next class): • Wu, Jiajun, et al. "Single image 3D interpreter network." ECCV 2016. • Zhou, Tinghui, et al. "View synthesis by appearance flow." ECCV 2016. | ASSIGNMENT 1 DUE, ASSIGNMENT 2 RELEASED |
30 | March 8 | ???? | Human Pose Estimation | Background: • Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR 2015. • Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." arXiv preprint arXiv:1603.06937 (2016). • Wei, Shih-En, et al. "Convolutional Pose Machines." arXiv preprint arXiv:1602.00134 (2016). Readings (Presented in next class): • Carreira, Joao, et al. "Human pose estimation with iterative error feedback." arXiv preprint arXiv:1507.06550 (2015). • Cao, Zhe, et al. "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields." arXiv preprint arXiv:1611.08050 (2016). | |
32 | March 13 | Spring Break | Spring Break | Spring Break | |
34 | March 15 | Spring Break | Spring Break | Spring Break | |
36 | March 20 | ??? | Action Recognition | Background: • Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014. • D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. ICCV 2015. • X. Wang, A. Farhadi, A. Gupta. Actions ~ Transformations. arXiv 2015. Reading (Presented in next class): • Wang, Limin, et al. "Temporal segment networks: towards good practices for deep action recognition." ECCV 2016. • Gkioxari, Georgia, Ross Girshick, and Jitendra Malik. "Contextual action recognition with R*CNN." ICCV 2015. • Feichtenhofer, Christoph, Axel Pinz, and Andrew Zisserman. "Convolutional two-stream network fusion for video action recognition." CVPR 2016. • Bilen, Hakan, et al. "Dynamic image networks for action recognition." CVPR 2016. | |
38 | March 22 | ??? | Training with synthetic models | Background: • Su, Hao, et al. "Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views." ICCV 2015. Reading: • A. Bansal, B. Russell, A. Gupta. Marr Revisited: 2D-3D Alignment via Surface Normal Prediction. arXiv 2016. • Varol, Gül, et al. "Learning from Synthetic Humans." arXiv preprint arXiv:1701.01370 (2017). | |
40 | March 27 | Lerrel?? | Introduction to TensorFlow | Abadi, Martín, et al. "TensorFlow: A system for large-scale machine learning." arXiv preprint arXiv:1605.08695 (2016). | ASSIGNMENT 2 DUE |
42 | March 29 | Lerrel/Senthil | Mid-Term Presentations | ||
44 | April 3 | Abhinav Gupta | Self-Supervised Learning | Background: • Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." Proceedings of the IEEE International Conference on Computer Vision. 2015. • Wang, Xiaolong, and Abhinav Gupta. "Unsupervised learning of visual representations using videos." Proceedings of the IEEE International Conference on Computer Vision. 2015. • Pinto, Lerrel, and Abhinav Gupta. "Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours." Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016. Reading: • Jayaraman, Dinesh, and Kristen Grauman. "Learning image representations tied to ego-motion." ICCV 2015. • Agrawal, Pulkit, Joao Carreira, and Jitendra Malik. "Learning to see by moving." ICCV 2015. • Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. • Noroozi, Mehdi, and Paolo Favaro. "Unsupervised learning of visual representations by solving jigsaw puzzles." ECCV 2016. | |
46 | April 5 | Abhinav Gupta / Xiaolong Wang | Webly Supervised | Background: • X. Chen, A. Shrivastava, A. Gupta. NEIL: Extracting Visual Knowledge from Web Data. ICCV 2013. • X. Chen, A. Gupta. Webly Supervised Learning of Convolutional Networks. ICCV 2015. Reading on April 12: • S. Divvala, A. Farhadi, C. Guestrin. Learning Everything about Anything: Webly-Supervised Visual Concept Learning. CVPR 2014. • A. Joulin, L. van der Maaten, A. Jabri, N. Vasilache. Learning Visual Features from Large Weakly Supervised Data. arXiv 2015. | ASSIGNMENT 3 RELEASED |
48 | April 10 | Gunnar | Deep Sequential Models | ||
50 | April 12 | Abhinav | Generative Models | Background: • D. Kingma, M. Welling. Auto-Encoding Variational Bayes. ICLR 2014. • A. Radford, L. Metz, S. Chintala. Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks. arXiv 2015. Reading on April 17: • Wang, Xiaolong, and Abhinav Gupta. "Generative image modeling using style and structure adversarial networks." ECCV 2016. Reading on April 19: • Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint arXiv:1611.07004 (2016). • Zhang, Han, et al. "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks." arXiv preprint arXiv:1612.03242 (2016). | |
52 | April 17 | Lerrel | Introduction to Deep RL | ||
54 | April 19 | ??? | Image Captioning & VQA | Background: • O. Vinyals, A. Toshev, S. Bengio, D. Erhan. Show and Tell: A Neural Image Caption Generator. CVPR 2015. • Antol, Stanislaw, et al. "VQA: Visual question answering." ICCV 2015. • J. Devlin, S. Gupta, R. Girshick, M. Mitchell, C.L. Zitnick. Exploring Nearest Neighbor Approaches for Image Captioning. arXiv 2015. • B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, R. Fergus. Simple Baseline for Visual Question Answering. arXiv 2015. Reading: • Xu, Kelvin, et al. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." ICML 2015. • Donahue, Jeffrey, et al. "Long-term recurrent convolutional networks for visual recognition and description." CVPR 2015. | ASSIGNMENT 3 DUE (APRIL 21st) |
56 | April 24 | ??? | Reasoning: Context | Background: • D. Hoiem, A.A. Efros, M. Hebert. Putting Objects in Perspective. CVPR 2006. • S.K. Divvala, D. Hoiem, J.H. Hays, A.A. Efros, M. Hebert. An Empirical Study of Context in Object Detection. CVPR 2009. Reading (Presented on 04/26): • Shrivastava, Abhinav, and Abhinav Gupta. "Contextual Priming and Feedback for Faster R-CNN." ECCV 2016. • Xiaolong Wang, David F. Fouhey, and Abhinav Gupta. Designing Deep Networks for Surface Normal Estimation. Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. New Topics (Presented on 04/26): • J. Walker, C. Doersch, A. Gupta, and M. Hebert. An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders. ECCV 2016. • Lerer, Adam, Sam Gross, and Rob Fergus. "Learning physical intuition of block towers by example." arXiv preprint arXiv:1603.01312 (2016). | |
58 | April 26 | ??? | Self-supervised Actions | ||
60 | May 1 | Project Presentations | |||
62 | May 3 | Project Presentations | |||