1 of 25

Learning Category-Specific Mesh Reconstruction from Image Collections

10/21/25

🏆, 👩🏼‍🚀: Hao 🧑🏻‍⚖️, 🧑🏽‍💻: Tianyu

2 of 25

🏆: Paper Summary from Champion

You are a champion. Your goal is to convey to the classroom why you like this work. Address the following questions – add a slide for each bullet point. You should be fair and not overlook potential weaknesses. You can (and should) acknowledge cons about a paper and still be in favor of it.

  • What is this paper about and what problem does it tackle? Why is the problem important?
  • What is it that you like about the paper?
    • Is it the motivation (see intro section)
    • Is it the positioning among prior work (see related work section)
    • Is it the approach (see method section)
  • Are the experiments sufficient? (see experiments section)
  • What are the limitations?

3 of 25

🏆: Motivation

  • 3D annotations are expensive
  • Class labels and 2D annotations are easy to obtain

Reconstruction

Recognition

4 of 25

🏆: Overall Objective

From a single image, infer a mesh representation of the 3D shape together with its texture; 2D annotations are needed only for training

5 of 25

🏆: Method

6 of 25

🏆: Method (camera and shape)

  • Camera pose
    • Weak-perspective projection, i.e. depth variation across the object is assumed negligible relative to its distance from the camera
  • Keypoints
    • Keypoint locations induced by the vertices V are obtained as A·V
    • The matrix A is initialized uniformly, but over the course of training it learns to better associate semantic keypoints with appropriate mesh vertices
  • In summary
    • Given an image I of an instance, predict the corresponding camera π and shape deformation ∆V as (π, ∆V) = f(I)
    • Learn the category-level parameters {V_mean, A}
    • V = V_mean + ∆V, keypoints = A·V
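
The shape and camera model above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names, the toy sizes, and the (scale, translation, quaternion) camera layout are assumptions made for the example, and random numbers stand in for the network outputs π and ∆V.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def weak_perspective_project(points, scale, trans, quat):
    """Rotate 3D points, drop the depth axis, then scale and shift in the image plane."""
    rotated = points @ quat_to_rotmat(quat).T
    return scale * rotated[:, :2] + trans

# Toy sizes (hypothetical): 6 mesh vertices, 2 semantic keypoints.
rng = np.random.default_rng(0)
num_verts, num_kps = 6, 2
V_mean = rng.normal(size=(num_verts, 3))          # learned mean shape
delta_V = 0.01 * rng.normal(size=(num_verts, 3))  # stand-in for the predicted deformation
A = np.full((num_kps, num_verts), 1.0 / num_verts)  # uniform initialization, rows sum to 1

V = V_mean + delta_V   # instance shape
keypoints_3d = A @ V   # each keypoint is a weighted average of vertices
keypoints_2d = weak_perspective_project(
    keypoints_3d, scale=2.0, trans=np.array([0.5, -0.5]),
    quat=np.array([1.0, 0.0, 0.0, 0.0]),  # identity rotation
)
```

With the uniform A used here, every 3D keypoint starts at the centroid of the mesh; training is what moves each row of A toward the relevant vertices.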

7 of 25

🏆: Method (Loss)

  • Keypoint projection loss
  • Mesh-rendered mask loss
  • Camera loss against SfM-obtained cameras
  • Smoothness
  • Deformation regularization
  • Keypoint association
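
Three of the terms listed above are simple enough to sketch directly. The versions below are simplified stand-ins rather than the paper's exact formulations: the function names, the squared-distance and mean reductions, and the entropy form of the association term are assumptions for illustration. The mask, camera, and smoothness terms are left out because they need a differentiable renderer, SfM cameras, and a mesh Laplacian respectively.

```python
import numpy as np

def keypoint_reprojection_loss(kp_gt, kp_pred, visible):
    """Mean squared distance between annotated and projected keypoints, visible ones only."""
    sq_dist = np.sum((kp_gt - kp_pred) ** 2, axis=1)
    return float(np.sum(visible * sq_dist) / max(np.sum(visible), 1))

def deformation_regularization(delta_V):
    """Mean per-vertex offset length; keeps predicted shapes close to the mean shape."""
    return float(np.mean(np.linalg.norm(delta_V, axis=1)))

def keypoint_association_entropy(A, eps=1e-12):
    """Mean entropy of the rows of A; low values mean each keypoint picks few vertices."""
    return float(np.mean(-np.sum(A * np.log(A + eps), axis=1)))

# A uniform association matrix has maximal entropy; a peaked one scores lower.
A_uniform = np.full((2, 4), 0.25)
A_peaked = np.array([[0.97, 0.01, 0.01, 0.01],
                     [0.01, 0.01, 0.97, 0.01]])
entropy_uniform = keypoint_association_entropy(A_uniform)
entropy_peaked = keypoint_association_entropy(A_peaked)
```

Minimizing the entropy term is one way to push A from its uniform initialization toward a sharp keypoint-to-vertex assignment.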

8 of 25

🏆: Method (texture)

  • The mean shape is isomorphic to a sphere
    • The texture can therefore be represented as an image
    • Its values are mapped onto the surface via a fixed UV mapping (akin to unrolling a globe into a flat map)
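
The globe analogy can be made concrete with a small sketch. It assumes a longitude/latitude parameterization and nearest-neighbour lookup, which are choices made for this example and not necessarily the paper's exact mapping; `sphere_to_uv` and `sample_texture` are hypothetical helper names.

```python
import numpy as np

def sphere_to_uv(points):
    """Map points on the unit sphere to (u, v) in [0, 1]^2 via longitude and polar angle."""
    p = points / np.linalg.norm(points, axis=1, keepdims=True)
    u = (np.arctan2(p[:, 1], p[:, 0]) + np.pi) / (2 * np.pi)  # longitude
    v = np.arccos(np.clip(p[:, 2], -1.0, 1.0)) / np.pi         # angle from the +z pole
    return np.stack([u, v], axis=1)

def sample_texture(texture, uv):
    """Nearest-neighbour lookup of an (H, W, 3) texture image at UV coordinates."""
    h, w = texture.shape[:2]
    cols = np.minimum((uv[:, 0] * w).astype(int), w - 1)
    rows = np.minimum((uv[:, 1] * h).astype(int), h - 1)
    return texture[rows, cols]

# A tiny 4x8 texture with one coloured texel, looked up from a point on the equator.
texture = np.zeros((4, 8, 3))
texture[2, 4] = [1.0, 2.0, 3.0]
uv = sphere_to_uv(np.array([[1.0, 0.0, 0.0]]))
colour = sample_texture(texture, uv)
```

Because the mapping is fixed and shared by every instance, the same texture image can be applied to any predicted shape in the category.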

9 of 25

🏆: Method (texture)

10 of 25

🏆: Experiment

Dataset

  • CUB-200-2011 dataset
  • Has 6000 training and test images of 200 species of birds
  • Each image is annotated with the bounding box, visibility indicator and locations of 14 semantic keypoints, and the ground truth foreground mask

Encoder

  • ImageNet pretrained ResNet-18
  • Followed by a convolutional layer that downsamples the spatial and the channel dimensions by half
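
A quick shape calculation shows what the encoder produces. It assumes a 256×256 input image, which the slide does not state; the 512 output channels and overall stride of 32 are standard for ResNet-18's last convolutional stage. The function name is hypothetical.

```python
def encoder_feature_shape(image_size=256, resnet_stride=32, resnet_channels=512):
    """Return (channels, height, width) after ResNet-18 and after the halving conv layer."""
    spatial = image_size // resnet_stride
    after_resnet = (resnet_channels, spatial, spatial)
    after_conv = (resnet_channels // 2, spatial // 2, spatial // 2)
    return after_resnet, after_conv

after_resnet, after_conv = encoder_feature_shape()
flat_size = after_conv[0] * after_conv[1] * after_conv[2]
print(after_resnet, after_conv, flat_size)  # (512, 8, 8) (256, 4, 4) 4096
```

Under that input-size assumption, flattening the result gives a 4096-dimensional feature vector for the downstream prediction heads.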

11 of 25

🏆: Results

12 of 25

🏆: Results

13 of 25

🏆: Results

14 of 25

🏆: What the Paper Does Well

  • Explores 3D reconstruction from a single monocular 2D image using only category-level 2D supervision
  • The mesh representation allows textures to be swapped between instances

15 of 25

🏆: Limitations

  • Category-specific
  • Relies largely on the predicted mean shape
    • What about objects with more drastic deformations?
  • Assumes depth variation across the object is negligible

16 of 25

🧑🏻‍⚖️: Paper Summary from Critic

You are a critic. Your goal is to showcase weaknesses of the paper. Address the following questions – add a slide for each bullet point. You should be fair, even if negative. Not all the parts of the paper need to have weaknesses; e.g. a paper might have a great positioning in related work or great motivation but weaknesses in the method.

  • What is this paper about and what problem does it tackle? Why is the problem important?
  • What is your critique of the paper?
    • Is it the motivation (see intro section)
    • Is it the positioning among prior work (see related work section)
    • Is it the approach (see method section)
  • Are the experiments sufficient? (see experiments section)
  • What are the limitations?

17 of 25

🧑🏻‍⚖️: Critique of Motivation

  • Could better emphasize practical, high-impact use cases
    • e.g. by discussing real-world scenarios or applications
  • Does not address the broader limitation that 2D image annotations are still required
    • High-quality annotated 2D data is also resource-intensive to create

18 of 25

🧑🏻‍⚖️: Critique of Prior Work

  • Contrasts the mesh-based method with voxel- and point-cloud-based methods
    • Mesh models offer memory efficiency and allow for surface-level reasoning
    • Voxel-based methods are computationally more expensive
  • Texture prediction
    • limitations of previous texturing techniques
    • limitations of texture reconstruction

19 of 25

🧑🏻‍⚖️: Critique of Approach

  • The reliance on learned category-specific mean shapes limits generalizability across object categories
    • May fail when applied to more diverse categories
  • The approach seems to struggle with fine-grained details and with objects that have reflective textures
  • The weak-perspective camera assumption may limit its ability to handle objects with significant perspective distortion

20 of 25

🧑🏻‍⚖️: Critique of Experiments

  • The method's ability to generalize to a broader set of categories is not sufficiently tested
    • A category with relatively uniform structures
    • Objects with significant variability in shape
    • Objects with non-rigid parts
  • The evaluation doesn't fully capture the quality of the 3D shape predictions
  • No quantitative texture metrics

21 of 25

👩🏼‍🚀: Paper Summary from Pioneer

You are a pioneer. Your goal is to think how the paper being discussed could be used to accelerate other findings, help in other disciplines (e.g. robotics, science), and be combined with other techniques you have seen to create a novel result worthy of a solid publication.

  • Think of two or three novel applications of the work and present them
  • Tell us how you would go about pursuing these ideas to showcase their efficacy

22 of 25

👩🏼‍🚀: With this paper, we can…

  • Explore other tasks with a similar approach
    • Object recognition/detection, scene rendering, etc.
  • Try to fix the present problems and limitations
    • Other categories
    • Shiny surfaces

23 of 25

🧑🏽‍💻: Paper Summary from Entrepreneur

You are an entrepreneur. This means you are constantly on the lookout for cool new ideas and to build new products (which will hopefully make a profit!). Your goal is to think how the paper being discussed could be used to build a new product – remember the product does not have to be "novel" but it should have high chances of working well and robustly.

  • Think of one or two products derived from the work
  • Tell us how you would go about building a demo to showcase each idea – this is the demo you would show for your seed or round A funding.

24 of 25

🧑🏽‍💻: 3D Object Creation for AR E-commerce

  • The product would use the mesh and texture prediction capabilities from this paper
    • Allow online retailers or brands to create realistic 3D models of their products from 2D images
    • These models could then be used in AR applications to showcase the products
  • Demo
    • Image Upload
    • 3D Model Generation
    • AR Integration
    • Showcase in AR

25 of 25

Thank you!