1 of 25

Learning Category-Specific Mesh Reconstruction from Image Collections

10/21/25

🏆, 👩🏼‍🚀: Hao 🧑🏻‍⚖️, 🧑🏽‍💻: Tianyu

2 of 25

🏆: Paper Summary from Champion

You are a champion. Your goal is to convey to the classroom why you like this work. Address the following questions – add a slide for each bullet point. You should be fair and not overlook potential weaknesses. You can (and should) acknowledge cons about a paper and still be in favor of it.

What is this paper about and what problem does it tackle? Why is the problem important?
What is it that you like about the paper?

Is it the motivation (see intro section)
Is it the positioning among prior work (see related work section)
Is it the approach (see method section)

Are the experiments sufficient? (see experiments section)
What are the limitations?

3 of 25

🏆: Motivation

3D annotations are expensive
Class labels and 2D annotations are comparatively easy to obtain

Reconstruction

Recognition

4 of 25

🏆: Overall Objective

From a single image, infer a mesh representation of the 3D shape, together with its texture and the camera; training uses only a collection of 2D-annotated images

5 of 25

🏆: Method

6 of 25

🏆: Method (camera and shape)

Camera pose

Weak-perspective projection, i.e. depth differences across the object are assumed negligible
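
A minimal sketch of what a weak-perspective camera does, assuming the camera is given as a scale, a 3x3 rotation matrix, and a 2D translation. The function name and this parameterization are illustrative, not taken from the paper's code. Every point shares one scale, so depth has no effect on where a point lands in the image.

```python
import numpy as np

def weak_perspective_project(points, scale, rotation, translation):
    """Project (N, 3) points to (N, 2) image coordinates.

    Rotate, drop the depth coordinate, then apply one shared scale
    and a 2D shift. No per-point division by depth takes place.
    """
    rotated = points @ rotation.T
    return scale * rotated[:, :2] + translation

# Two points that differ only in depth (z) project to the same location.
pts = np.array([[1.0, 2.0, 0.0],
                [1.0, 2.0, 5.0]])
proj = weak_perspective_project(pts, scale=2.0, rotation=np.eye(3),
                                translation=np.array([0.5, -0.5]))
```

Under a full perspective camera the second point would appear closer to the image center, which is the effect this model ignores.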

Key points

Keypoint locations induced by the vertices V are obtained as A·V
Matrix A is initialized uniformly; over the course of training it learns to associate each semantic keypoint with the appropriate mesh vertices

In summary

Given an image I of an instance, predict the corresponding camera π and the shape deformation ∆V as (π, ∆V) = f(I)
Learn {V_mean, A}
V = V_mean + ∆V, keypoints = A·V
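
The summary above can be sketched with toy arrays. This is an illustration under stated assumptions: the vertex and keypoint counts are made up, random values stand in for learned and predicted quantities, and the row-softmax parameterization of A is an assumption chosen so that each row is a distribution over vertices.

```python
import numpy as np

rng = np.random.default_rng(0)
num_vertices, num_keypoints = 6, 2   # toy sizes, not the paper's

mean_shape = rng.normal(size=(num_vertices, 3))       # stands in for learned V_mean
delta_v = 0.01 * rng.normal(size=(num_vertices, 3))   # stands in for predicted dV
vertices = mean_shape + delta_v                        # V = V_mean + dV

def row_softmax(x):
    """Turn each row of logits into a probability distribution."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Zero logits give a uniform A: each keypoint starts as the vertex average.
logits = np.zeros((num_keypoints, num_vertices))
A = row_softmax(logits)

keypoints_3d = A @ vertices   # (K, 3) keypoint locations induced by V
```

As training sharpens the rows of A, each keypoint moves from the centroid toward the few vertices it is associated with.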

7 of 25

🏆: Methods (Loss)

Keypoint projection loss
Mesh-rendered mask loss
Camera loss against cameras estimated by structure from motion (SfM)
Smoothness
Deformation regularization
Keypoint association prior (encourages each row of A to be peaked)
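
Four of the terms listed above can be sketched in a few lines of NumPy. These are simplified stand-ins, not the paper's exact formulas: the smoothness term here uses a uniform graph Laplacian, and all function names are hypothetical. The mask and camera losses are left out because they need a differentiable renderer and SfM output.

```python
import numpy as np

def keypoint_loss(pred_2d, target_2d, visible):
    """Mean squared reprojection error over visible keypoints only."""
    sq_err = ((pred_2d - target_2d) ** 2).sum(axis=1)
    return (visible * sq_err).sum() / max(visible.sum(), 1)

def deformation_loss(delta_v):
    """Penalize large per-vertex offsets from the mean shape."""
    return np.linalg.norm(delta_v, axis=1).mean()

def association_entropy(A, eps=1e-12):
    """Average row entropy of A; low when each keypoint picks few vertices."""
    return float(-(A * np.log(A + eps)).sum(axis=1).mean())

def smoothness_loss(vertices, neighbors):
    """Uniform graph Laplacian: offset of each vertex from its neighbor mean."""
    lap = np.stack([vertices[i] - vertices[nbrs].mean(axis=0)
                    for i, nbrs in enumerate(neighbors)])
    return np.linalg.norm(lap, axis=1).mean()

# A peaked association row has lower entropy than a uniform one.
uniform_A = np.full((1, 4), 0.25)
peaked_A = np.array([[0.97, 0.01, 0.01, 0.01]])
entropy_gap = association_entropy(uniform_A) - association_entropy(peaked_A)
```

In a training loop these terms would be combined as a weighted sum; the weights are hyperparameters that the slides do not list.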

8 of 25

🏆: Method (texture)

mean shape is isomorphic to a sphere

texture can be represented as an image
the values of which get mapped onto the surface via a fixed UV mapping (akin to unrolling a globe into a flat map)
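
The globe analogy can be made concrete with spherical coordinates. The sketch below assumes one common equirectangular convention (longitude to u, latitude to v) and nearest-neighbor lookup; the paper's exact UV convention and sampling scheme may differ, and both function names are illustrative.

```python
import numpy as np

def sphere_to_uv(points):
    """Map (N, 3) points on the unit sphere to (u, v) in [0, 1] x [0, 1]."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = 0.5 + np.arctan2(y, x) / (2 * np.pi)          # longitude
    v = 0.5 + np.arcsin(np.clip(z, -1, 1)) / np.pi    # latitude
    return np.stack([u, v], axis=1)

def sample_texture(texture, uv):
    """Nearest-neighbor lookup of an (H, W, 3) texture image at UV coordinates."""
    h, w = texture.shape[:2]
    cols = np.clip((uv[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    rows = np.clip((uv[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    return texture[rows, cols]

sphere_pts = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
uv = sphere_to_uv(sphere_pts)
colors = sample_texture(np.random.default_rng(0).random((16, 32, 3)), uv)
```

Because the mapping is fixed, a texture image predicted for one instance can be applied to any mesh that shares the same vertex layout.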

9 of 25

🏆: Method (texture)

10 of 25

🏆: Experiment

Dataset

CUB-200-2011 dataset
About 6,000 training images and about 6,000 test images covering 200 species of birds
Each image is annotated with the bounding box, visibility indicator and locations of 14 semantic keypoints, and the ground truth foreground mask

Encoder

ImageNet pretrained ResNet-18
Followed by a convolutional layer that downsamples the spatial and the channel dimensions by half
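
A quick shape calculation shows what the encoder output looks like. It relies on two standard ResNet-18 facts (an overall stride of 32 and 512 channels in the last convolutional stage) and on an assumed 256x256 input, which the slides do not state. The helper name is hypothetical.

```python
def encoder_feature_shape(image_size, backbone_stride=32, backbone_channels=512):
    """Return (channels, height, width) after the backbone and the halving conv."""
    side = image_size // backbone_stride      # ResNet-18 reduces resolution by 32x
    # The extra convolution halves both the channel count and the spatial size.
    return backbone_channels // 2, side // 2, side // 2

channels, height, width = encoder_feature_shape(256)  # 256 is an assumed input size
flat_dim = channels * height * width                  # length once flattened
```

Under these assumptions the feature map is 256 x 4 x 4, which flattens to a 4096-dimensional vector.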

11 of 25

🏆: Results

12 of 25

🏆: Results

13 of 25

🏆: Results

14 of 25

🏆: Strengths

Explores 3D reconstruction from a single monocular 2D image using only category-level 2D supervision
The mesh representation allows textures to be swapped between instances

15 of 25

🏆: Limitations

Category-specific
Relies largely on the predicted mean shape

What about objects with more drastic deformations?

Assumes depth differences across the object are negligible (weak-perspective camera)

16 of 25

🧑🏻‍⚖️: Paper Summary from Critic

You are a critic. Your goal is to showcase weaknesses of the paper. Address the following questions – add a slide for each bullet point. You should be fair, even if negative. Not all the parts of the paper need to have weaknesses; e.g. a paper might have a great positioning in related work or great motivation but weaknesses in the method.

What is this paper about and what problem does it tackle? Why is the problem important?
What is your critique of the paper?

Is it the motivation (see intro section)
Is it the positioning among prior work (see related work section)
Is it the approach (see method section)

Are the experiments sufficient? (see experiments section)
What are the limitations?

17 of 25

🧑🏻‍⚖️: Critique of Motivation

Could better emphasize practical, high-impact use cases

discussing real-world scenarios or applications

Not addressing the broader limitations of relying on 2D image annotations that are still required

high-quality annotated 2D data is also resource-intensive to create

18 of 25

🧑🏻‍⚖️: Critique of Prior Work

Contrasted their mesh-based method against voxel or point-cloud-based methods

Mesh models offer memory efficiency and allow for surface-level reasoning
Voxel-based methods are computationally more expensive

Texture prediction

limitations of previous texturing techniques
limitations of texture reconstruction

19 of 25

🧑🏻‍⚖️: Critique of Approach

The reliance on learned category-specific mean shapes limits the generalizability across different object categories

May fail when applied to more diverse categories

The approach seems to struggle with fine-grained details or objects with reflective textures
The weak-perspective camera assumption may limit its ability to handle objects with significant perspective distortions

20 of 25

🧑🏻‍⚖️: Critique of Experiments

The method’s ability to generalize to a broader set of categories is not sufficiently tested

Evaluated mainly on birds, a category with relatively uniform structure
Not tested on objects with significant variability in shape
Not tested on objects with non-rigid parts

The evaluation does not fully capture the quality of the 3D shape predictions
No quantitative texture metrics

21 of 25

👩🏼‍🚀: Paper Summary from Pioneer

You are a pioneer. Your goal is to think how the paper being discussed could be used to accelerate other findings, help in other disciplines (e.g. robotics, science), and be combined with other techniques you have seen to create a novel result worthy of a solid publication.

Think of two or three novel applications of the work and present them
Tell us how you would go about pursuing these ideas to showcase their efficacy

22 of 25

👩🏼‍🚀: With this paper, we can…

Explore other tasks with a similar approach

Object recognition/detection, scene rendering, etc.

Try to fix the present problems and limitations

Other categories
Shiny surfaces

23 of 25

🧑🏽‍💻: Paper Summary from Entrepreneur

You are an entrepreneur. This means you are constantly on the lookout for cool new ideas and to build new products (which will hopefully make profit!). Your goal is to think how the paper being discussed could be used to build a new product – remember the product does not have to be “novel” but it should have high chances of working well and robustly.

Think of one or two products derived from the work
Tell us how you would go about building a demo to showcase each idea – this is the demo you would show for your seed or round A funding.

24 of 25

🧑🏽‍💻: 3D Object Creation for AR E-commerce

The product would use the mesh and texture prediction capabilities from this paper

Allow online retailers or brands to create realistic 3D models of their products from 2D images
These models could then be used in AR applications to showcase the products

Demo

Image Upload
3D Model Generation
AR Integration
Showcase in AR
