
RFP: Representational Collapse & Intrinsic Dimension

  • Representational collapse (RC) forces N-dimensional representations to lie on an n-dimensional manifold (n < N). It is usually seen as a failure mode. Is it really?
  • Recent results show that lower-dimensional representations generalize better. How low-dimensional can the manifold of representations be?
  • An ideal lossless encoding (in SSL) would map the data (images) to representations lying on a manifold of at least n dimensions, where n is the number of representation dimensions that did not collapse, which equals the intrinsic dimension (ID) of the image dataset.
  • Even a lossy encoding should still give n as an upper bound on the ID of the image dataset (n < ID implies information loss).
  • Questions:
    • Can we find the ID of a dataset via RC in an SSL setup?
    • How is it related to generalization?
  • Some references:
    • Understanding Dimensional Collapse in Contrastive Self-Supervised Learning
    • Intrinsic Dimension of Data Representations in Deep Neural Networks
    • The Intrinsic Dimension of Images and Its Impact on Learning

Fig 1 (red): log of magnitude vs. rank index of the ordered singular values of the covariance matrix of the embedding matrix of ResNet-18 (trained using SimCLR) on the CIFAR-10 test set. Axes: x = singular value rank index, y = log of singular values. [See the 1st reference]
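The quantity plotted in Fig 1 can be sketched as follows. The embedding matrix `Z` below is a random placeholder; actually reproducing the figure would require ResNet-18/SimCLR embeddings of the CIFAR-10 test set, which are not computed here.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(10000, 128))        # placeholder for SSL embeddings
C = np.cov(Z, rowvar=False)              # 128 x 128 covariance of embeddings
s = np.linalg.svd(C, compute_uv=False)   # singular values, already ordered
log_spectrum = np.log(s)                 # y-axis of Fig 1
rank_index = np.arange(len(s))           # x-axis of Fig 1
```

Dimensional collapse shows up as a sharp drop in `log_spectrum` at some rank index; for the isotropic placeholder data the curve stays nearly flat.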

Requester: Vaisakh M (Email)