1
Low Level / Classical Techniques in Vision and Image Processing
2
[R|t] =
Explain/draw out a sketch of the pinhole camera model.
3D point -> 2D point (w=1)
2D point -> 3D Ray (w=1, assume undistort is identity)
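The two mappings above can be sketched in NumPy as follows. This is a minimal illustration, not a calibrated pipeline: the intrinsic matrix `K`, the identity rotation, and the zero translation are made-up values, the function names `project` and `backproject` are hypothetical, and lens distortion is ignored (undistort is the identity, as assumed above).

```python
import numpy as np

def project(K, R, t, X_world):
    """3D world point -> 2D pixel, normalizing the homogeneous coordinate to w = 1."""
    X_cam = R @ X_world + t      # world frame -> camera frame using [R|t]
    x = K @ X_cam                # camera frame -> homogeneous pixel coordinates
    return x[:2] / x[2]          # divide by w so that w = 1

def backproject(K, uv):
    """2D pixel -> unit direction of the 3D ray through it (camera frame)."""
    d = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return d / np.linalg.norm(d)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # assumption: camera axes aligned with world axes
t = np.zeros(3)      # assumption: camera centered at the world origin

X = np.array([0.2, -0.1, 2.0])
uv = project(K, R, t, X)
ray = backproject(K, uv)
```

A pixel only determines a ray, not a point: depth is lost in the division by w, so `ray` recovers the direction of `X` but not its distance.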
3
What is (4,2,0)w in turtle coordinates?
T^t_w = Transformation from world -> turtle
Method 1: Find turtle to world transform and then its inverse
Method 2: Directly describe transform to align axes
4
What are 2 types of camera distortion and how can they be corrected?
Left/middle are radial distortion, right is tangential distortion.
Radial
Tangential
5
What is SIFT and what problems does it tackle?
Image Stitching
SfM
Depth Estimation
Correspondences
6
What is ORB-SLAM?
7
What is RANSAC and how can it be used to find correspondences?
Random sample consensus (RANSAC) is an iterative, probabilistic method for estimating the parameters of a mathematical model from observed data that contains outliers. Outliers have no influence on the estimated values; only inliers do.
RANSAC pseudocode:
Input to RANSAC: set of observed data values, way of fitting some kind of parameterized model to the observations, and some hyperparameters
Until convergence:
For example, RANSAC can be applied to linear regression to exclude outliers.
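As a rough sketch of the linear regression case, the loop might look like the following. The function name `ransac_line`, the hyperparameter values (`n_iters`, `inlier_tol`), the fixed iteration count used in place of a convergence test, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def ransac_line(x, y, n_iters=200, inlier_tol=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(n_iters):
        # 1. Sample the minimal set needed to fit the model (2 points for a line).
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical line, cannot be written as y = f(x)
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # 2. Count the points that agree with this candidate model.
        inliers = np.abs(y - (slope * x + intercept)) < inlier_tol
        # 3. Keep the candidate with the largest consensus set.
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best candidate using least squares.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return slope, intercept, best_inliers

# Synthetic data: y = 2x + 1 with small noise and five gross outliers.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + np.random.default_rng(1).normal(0.0, 0.05, size=50)
outlier_idx = np.arange(0, 50, 10)
y[outlier_idx] += 20.0

slope, intercept, inlier_mask = ransac_line(x, y)
```

The final least-squares refit uses only the consensus set, which is how outliers end up with no influence on the estimate.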
RANSAC can be applied to find correspondences between SIFT features in two images. In this case, the model is a homography (aka projective transformation) that attempts to align two images taken from different perspectives together.
At a high level to use RANSAC for SIFT correspondences, you repeatedly:
In the end, you keep the homography with the smallest number of outliers.
Left: planar scene. Right: approximately planar due to distance. Bottom: captured under camera rotation only.
8
At a high level, what is optical flow?
In both cases, the goal is to understand the motion of a scene.
9
What is nearest-neighbors interpolation vs bilinear interpolation?
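One way to see the difference is to sample a tiny image at a non-integer location. The sketch below assumes a single-channel image indexed as `img[row, col]` and query coordinates that lie inside the image; the function names are illustrative.

```python
import numpy as np

def nearest_neighbor(img, x, y):
    """Return the value of the pixel closest to (x, y)."""
    h, w = img.shape
    xi = min(max(int(round(x)), 0), w - 1)
    yi = min(max(int(round(y)), 0), h - 1)
    return img[yi, xi]

def bilinear(img, x, y):
    """Blend the 4 surrounding pixels, weighted by distance to (x, y)."""
    h, w = img.shape
    x0 = min(int(np.floor(x)), w - 2)   # left column of the 2x2 neighborhood
    y0 = min(int(np.floor(y)), h - 2)   # top row of the 2x2 neighborhood
    ax, ay = x - x0, y - y0             # fractional offsets in [0, 1]
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1]
    bottom = (1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bottom

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])

nn_value = nearest_neighbor(img, 0.9, 0.1)   # snaps to pixel (col 1, row 0)
bl_center = bilinear(img, 0.5, 0.5)          # equal blend of all 4 pixels
bl_offset = bilinear(img, 0.25, 0.75)
```

Nearest-neighbor copies one existing value, so it produces blocky results but never invents new intensities; bilinear produces smoother results at the cost of some blurring.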
10
Deep Learning Fundamentals
11
In general, how do neural networks operate?
12
How many neurons and parameters does the neural network below have? Explain the matrix multiplication view, graphical view, algebraic view, and biological view.
13
What is the output depth, padding, stride, and filter size of a conv layer?
14
What makes convnets translationally invariant? What is the “receptive field”?
15
How do you calculate output volume size of a conv layer given input size W, padding P, stride S, filter (kernel) size F, and O filters? How do you maintain the spatial size for filter sizes 3 and 5?
What about a pooling layer of input size W, filter size F and stride S?
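The standard formulas for these two questions can be checked with a few lines of Python. This sketch assumes square inputs and filters and uses floor division for strides that do not divide evenly; the helper names are hypothetical.

```python
def conv_output_size(W, F, P, S):
    """Spatial size of a conv layer output: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

def pool_output_size(W, F, S):
    """Spatial size of a pooling layer output: (W - F) / S + 1 (no padding)."""
    return (W - F) // S + 1

# With stride 1, padding P = (F - 1) / 2 keeps the spatial size unchanged:
same_3 = conv_output_size(W=32, F=3, P=1, S=1)   # filter size 3 -> pad 1
same_5 = conv_output_size(W=32, F=5, P=2, S=1)   # filter size 5 -> pad 2

# A strided example without padding, and a 2x2 pooling layer with stride 2.
strided = conv_output_size(W=227, F=11, P=0, S=4)
pooled = pool_output_size(W=32, F=2, S=2)
```

The full output volume of the conv layer is then size x size x O, since each of the O filters produces one output channel.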
Explain regular, pointwise/1x1, transpose/up/de/fractionally strided, atrous/dilated, and depthwise separable convolutions.
16
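Regular Convolutions
1x1 Convolutions
De/Up/Transpose/fractionally strided Conv.
Atrous/Dilated Conv.
Depthwise Separable Conv. (From MobileNet) For efficiency, this operation decomposes the regular convolution into two stages: a depthwise convolution that filters each input channel separately, followed by a 1x1 (pointwise) convolution that mixes information across channels.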
Normal (top) vs depthwise (bottom) conv
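To make the efficiency argument concrete, the sketch below compares weight counts for a regular convolution and a depthwise separable one (a per-channel depthwise filter followed by a 1x1 pointwise convolution). Biases are ignored, and the layer sizes are arbitrary values chosen for illustration.

```python
def regular_conv_params(k, c_in, c_out):
    """Each of the c_out filters spans k x k x c_in weights."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1x1 conv to mix channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
regular = regular_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / regular
```

For this layer the separable version needs roughly 12% of the weights, which matches the general ratio 1/c_out + 1/k^2.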
17
What are grouped convolutions?
18
What are coord convolutions?
19
How do max and average pooling differ? Which is more popular? Why do some dislike pooling altogether?
20
What is global average pooling?
21
What is the softmax function, and what loss function is it usually associated with?
The softmax is often paired with the cross-entropy loss (aka negative log likelihood), which operates on vectors of class probabilities.
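A minimal NumPy sketch of the pairing, assuming a single example with raw scores (logits) and an integer class label; the max-subtraction is a common numerical stability trick and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Map raw scores to a probability vector that sums to 1."""
    z = logits - np.max(logits)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, target_index):
    """Negative log likelihood of the correct class."""
    return -np.log(probs[target_index])

logits = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes
probs = softmax(logits)
loss = cross_entropy(probs, target_index=0)

uniform = softmax(np.array([0.0, 0.0]))
uniform_loss = cross_entropy(uniform, target_index=0)
```

The loss approaches 0 as the probability assigned to the correct class approaches 1, and equals log(number of classes) when the prediction is uniform.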
22
Draw and describe the following activation functions: sigmoid, ReLU, leaky ReLU, ReLU6, tanh,
Some notes:
Leaky ReLU (usually a is around 0.01)
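For reference, the listed activations can be written in a few lines of NumPy. The leaky slope `a=0.01` follows the note above; the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))         # squashes to (0, 1)

def relu(x):
    return np.maximum(0.0, x)               # zero for negative inputs

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)        # small slope a for negative inputs

def relu6(x):
    return np.minimum(np.maximum(0.0, x), 6.0)   # ReLU clipped at 6

def tanh(x):
    return np.tanh(x)                       # squashes to (-1, 1), zero-centered

x = np.array([-2.0, 0.0, 8.0])
relu_out = relu(x)
leaky_out = leaky_relu(x)
relu6_out = relu6(x)
sigmoid_out = sigmoid(x)
tanh_out = tanh(x)
```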
23
What is gradient descent?
24
At a high level, describe SGD + momentum, Adagrad, RMSprop, and Adam. Also explain (Multi)StepLR and ReduceLROnPlateau.
An additional benefit to note for the last three is that their adaptive nature makes the optimization more robust to the choice of learning rate.
Besides these, PyTorch also has other schedulers, for example:
In some cases, two optimization schemes can complement each other, even if both adjust learning rates (e.g., Adam + StepLR scheduler).
G_ii is the sum of squared gradients of theta_i up to time t.
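The role of G_ii in the Adagrad update can be sketched as follows. This is an illustrative NumPy version, not a library implementation: the quadratic objective, learning rate, and the placement of `eps` outside the square root are assumptions (some formulations put it inside).

```python
import numpy as np

def adagrad_step(theta, grad, G_diag, lr=0.1, eps=1e-8):
    """One Adagrad update; G_diag holds the running sum of squared gradients."""
    G_diag = G_diag + grad ** 2                        # accumulate G_ii per parameter
    theta = theta - lr * grad / (np.sqrt(G_diag) + eps)  # per-parameter step size
    return theta, G_diag

# Toy objective f(theta) = sum(theta^2), whose gradient is 2 * theta.
theta0 = np.array([1.0, -3.0])
G0 = np.zeros_like(theta0)

theta1, G1 = adagrad_step(theta0, 2.0 * theta0, G0)

# Continue for more steps; the effective step size shrinks as G grows.
theta, G = theta1, G1
for _ in range(200):
    theta, G = adagrad_step(theta, 2.0 * theta, G)
```

On the first step both parameters move by the same amount (0.1) despite very different gradient magnitudes, which illustrates the per-parameter scaling. Because G only grows, the step size decays monotonically, which is the weakness RMSprop addresses with a moving average.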
25
What is the intuition behind cosine annealing with warm restarts / warmups?
26
What’s the difference between Adam and AdamW?
27
Describe how weights are usually initialized for neural networks. Can you just set them to 0?
28
Describe some basic regularization techniques (5).
29
What is batchnorm and why does it work? Does it go before/after ReLU? What happens at test time? What are some other considerations?
Implementation Notes:
Several theories on why batchnorm works:
30
Explain/compare layer norm, group norm, and instance norm
31
How does backpropagation work in neural networks?
Overview
Details
Consider the functional representation for a neural network (left). Using the chain rule repeatedly for each layer will yield the partial derivatives w.r.t. each parameter. This is shown below.
Notes:
32
Describe how the following functions/gates propagate the upstream gradient during backpropagation: add, max, multiply
Gate | Behavior | Intuition |
Add | Takes the upstream gradient on its output and passes it, unchanged, to every one of its inputs. This follows from the fact that the local gradient for the add operation (e.g. x+7) is simply +1.0 | Forwards gradients, unchanged. |
Max | Distributes the gradient (unchanged) to exactly one of its inputs (the input that had the highest value during the forward pass). This is because the local gradient for a max gate is 1.0 for the highest value, and 0.0 for all other values. | Routes the gradient to the largest forward pass input value. |
Multiply | Local gradients are the input values (except switched), and this is multiplied by the gradient on its output during the chain rule. This is regular multiplication if by a scalar. | Multiplies gradient by the other input value(s). |
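The three behaviors in the table can be traced by hand on a small expression. The sketch below uses f = (x + y) * max(z, w) with made-up input values; the helper names are illustrative.

```python
def add_backward(upstream):
    """Add gate: local gradient is 1 for each input, so pass upstream through."""
    return upstream, upstream

def max_backward(a, b, upstream):
    """Max gate: route upstream to the input that was larger in the forward pass."""
    return (upstream, 0.0) if a >= b else (0.0, upstream)

def mul_backward(a, b, upstream):
    """Multiply gate: each input receives the other input times upstream."""
    return b * upstream, a * upstream

# Forward pass.
x, y, z, w = 3.0, -4.0, 2.0, -1.0
s = x + y          # add gate
m = max(z, w)      # max gate
f = s * m          # multiply gate

# Backward pass, starting from df/df = 1.
ds, dm = mul_backward(s, m, 1.0)
dx, dy = add_backward(ds)
dz, dw = max_backward(z, w, dm)
```

Here `w` receives zero gradient because it lost the max in the forward pass, so changing it slightly would not change f.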
33
Complete the following computational graph for backpropagation.
34
What is the vanishing/exploding gradients problem?
Residual connections
35
What are some general tips/tricks on tuning these hyperparameters? Learning rate, batch size.
Learning Rate:
Batch Size:
36
At a high level, what are 3 approaches for ML if you only have a small dataset of labeled data and a lot of unlabeled data?
37
What are some sources of randomness/stochasticity in neural networks? Why is this sometimes desired?
38
What’s a dead neuron, how can they be detected, and how can they be prevented?
Leaky ReLU (usually a is around 0.01)
Possibly dead activation map if empty for many data inputs
39
What is the universal approximation theorem, and what are its limitations for practical applications?
40
What is label smoothing?
41
Seminal & Foundational Topics in Deep Learning
42
At a high level, describe the following CNN backbone architectures: AlexNet, VGG, Inception/GoogLeNet, ResNet, DenseNet, EfficientNet.
43
What are the benefits of residual connections in neural networks?
44
What is the triplet loss, and when is it used? How can it be trained effectively?
45
At a high level, explain what “Attention” and “Self-Attention” are. What is soft vs hard attention?
46
How do Non-Local Neural Networks work?
SA-GAN
47
Neural Networks Designed for Sequential Data (RNNs, LSTMs, Transformers)
48
What can RNNs be used for, and how do they work?
49
What are LSTMs used for and how do they work?
50
In NLP, what are Transformers and how do they work? How do they differ from LSTMs/RNNs?
There are several significant differences between vanilla Transformers and RNNs:
Encoder input preparation:
Decoder is similar to the encoder, with a few changes:
Multi-Head Attention:
51
How do sinusoidal positional embeddings work?
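A compact way to answer this is to build the embedding table directly. The sketch below follows the commonly used formulation in which even dimensions hold sines and odd dimensions hold cosines with geometrically increasing wavelengths; it assumes an even `d_model`, and the base of 10000 and the function name are part of that assumed formulation.

```python
import numpy as np

def sinusoidal_positions(num_positions, d_model):
    """Return a (num_positions, d_model) table of fixed positional embeddings."""
    positions = np.arange(num_positions)[:, None]      # shape (N, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even dimension indices 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)    # even dimensions
    pe[:, 1::2] = np.cos(angles)    # odd dimensions
    return pe

pe = sinusoidal_positions(num_positions=50, d_model=16)
```

Each row is added to the token embedding at that position. Because every dimension pair is a sine/cosine at a fixed frequency, the embedding of position p + k is a linear function of the embedding of position p, which is the usual argument for why relative offsets are easy to attend to.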
52
How can Transformers be applied to vision? How do they differ from CNNs?
Accuracy
53
Transfer Learning
54
What is transfer learning and when is it useful? How is it related to semi-supervised learning and few-shot learning?
55
Describe how to formally define transfer learning, and state some common settings.
56
What are some common ways to perform transfer learning, and what are some rules of thumb of when to use each of them?
Labels available in target domain (i.e. “supervised transfer learning”):
Labels unavailable in target domain, only available in source domain (i.e. “target unsupervised transfer learning”):
57
Unsupervised & Self-Supervised Learning
58
At a high level, what is unsupervised learning? What are its advantages and disadvantages? How does it compare to fully supervised, reinforcement, semi-supervised, and weakly supervised learning?
59
Give some examples and applications of unsupervised learning.
60
What is self-supervised learning? Explain some common methods based on reconstruction, “common sense”, and automatic labels.
61
What are some approaches to Unsupervised Domain Adaptation?
62
How can SSL be performed using contrastive learning?
63
How does CLIP work?
Contrastive Training
Test Time Zero-Shot Nearest Neighbor Classifier
64
Semi-Supervised Learning
65
At a high level, what is semi-supervised learning, and what are the common approaches/assumptions?
66
Explain the following approaches to semi-supervised learning: GANs, UDA, Consistency regularization, pseudolabeling/self-training.
Also: Mixup, entropy regularization