1 of 17

Animal Pose Estimation

Presented by:

Devansh Shah, Dinisha Suryawanshi, Fan Li

COMPUTER VISION CS-5243

2 of 17

Introduction

Problem statement:

  • In today's fast-paced lives, many people own pets but often struggle to provide them with a quality life due to busy work schedules.
  • Pets frequently experience serious health or behavioral issues that their owners find difficult to understand.
  • To address this, we propose monitoring and analyzing pet activity so that these issues are easier to identify and understand.

SAMPLE FOOTER TEXT

2

12/1/2024

3 of 17

Introduction

How it works:

  • A CCTV camera is used to monitor and record the pet's activities continuously for 24 hours.
  • The recorded data is analyzed to identify any unusual or abnormal behavior.
  • Based on the activity log, owners can consult a veterinarian or animal behavior specialist for further assistance.


Img ref: https://images.prismic.io/furbo-prismic/ZyhCba8jQArT0JcJ_US_doggettingsmart.png

4 of 17

Introduction

  • Use Cases:
  • This system can help detect mental health concerns, physical injuries, and other health problems in pets.
  • Example 1: If a pet is frequently licking its leg, it may indicate an injury or the presence of ticks and fleas.
  • Example 2: If a dog is sleeping excessively and showing reduced appetite, it could be a sign of underlying health issues.


5 of 17

Dataset

  • Dataset Used: The project utilizes the Stanford Dogs Dataset for animal pose estimation.
  • Dataset Details: It contains images of 120 dog breeds, amounting to a total of 20,580 images.
  • Bounding Box Annotations: Bounding box annotations are available for all images in the dataset.
  • Key Point Annotations: Key point annotations are provided for 12,538 images, covering 20 key points of a dog's pose.
  • Key Points Breakdown:
    • 3 key points for each leg.
    • 2 key points for each ear.
    • 2 key points for the tail.
    • Key points for the nose and jaw.
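
The breakdown above can be tallied to confirm the 20-point total. This is a minimal sketch; the group names are illustrative labels chosen here, not identifiers from the dataset.

```python
# Key-point groups as listed above: (points per part, number of parts).
# Group names are illustrative labels, not field names from the dataset.
KEYPOINT_GROUPS = {
    "legs": (3, 4),  # 3 key points on each of 4 legs
    "ears": (2, 2),  # 2 key points on each of 2 ears
    "tail": (2, 1),  # 2 key points on the tail
    "nose": (1, 1),
    "jaw": (1, 1),
}


def total_keypoints(groups):
    """Sum points-per-part times number-of-parts over all groups."""
    return sum(per_part * parts for per_part, parts in groups.values())


print(total_keypoints(KEYPOINT_GROUPS))  # 12 + 4 + 2 + 1 + 1
```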


6 of 17

Dataset

  • The annotation files list 24 key points per dog (the dataset description above counts 20).
  • Visibility flag per key point: 0 = not visible, 1 = visible.
  • The data is divided into three sets: train (6,773 images), validation (4,062 images), and test (1,703 images).
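
As a quick consistency check, the three split sizes add up to the 12,538 key-point-annotated images, which works out to roughly a 54% / 32% / 14% split. A small sketch (the dictionary keys are just labels chosen here):

```python
# Split sizes as stated on this slide.
SPLITS = {"train": 6773, "validation": 4062, "test": 1703}


def split_fractions(splits):
    """Return each split's share of the total number of images."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_images = sum(SPLITS.values())
fractions = split_fractions(SPLITS)

print(total_images)
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```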


Img ref: https://cdn-ilcabpl.nitrocdn.com/XTpGTaZWYQSxctfMHQPVOQKOsBspWTQi/assets/images/optimized/rev-4cdf608/learnopencv.com/wp-content/uploads/2023/09/animal-pose-estimation-dog-kpts.png

7 of 17

Dataset

  • Annotation:
  • The dataset provides its annotations in JSON format.
  • Each record contains the image path, image width and height, bounding-box coordinates, a flag indicating whether the image contains multiple dogs, and the key point coordinates with their visibility flags.
  • YOLOv8 does not read this format, so the annotations must be converted to the YOLO label format first.
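
The conversion could look roughly like the sketch below. It is not the project's actual script: the field names (`img_width`, `img_height`, `img_bbox`, `joints`) and the pixel-space `[x_min, y_min, width, height]` box layout are assumptions about the JSON records, and the function name is hypothetical. The output row follows the YOLO pose layout of class index, normalized box center and size, then one `x y visibility` triple per key point, with the 0/1 visibility flag written through unchanged.

```python
def to_yolo_pose_row(ann, class_index=0):
    """Convert one annotation record into a single YOLO pose label row.

    Assumed (hypothetical) record layout:
      ann["img_width"], ann["img_height"]: image size in pixels
      ann["img_bbox"]: [x_min, y_min, width, height] in pixels
      ann["joints"]: list of [x, y, visible] in pixels, visible is 0 or 1
    """
    img_w, img_h = ann["img_width"], ann["img_height"]
    x_min, y_min, box_w, box_h = ann["img_bbox"]

    # Box center and size, normalized to the 0..1 range.
    fields = [
        str(class_index),
        f"{(x_min + box_w / 2) / img_w:.6f}",
        f"{(y_min + box_h / 2) / img_h:.6f}",
        f"{box_w / img_w:.6f}",
        f"{box_h / img_h:.6f}",
    ]

    # Key points, normalized the same way, each followed by its flag.
    for x, y, visible in ann["joints"]:
        fields += [f"{x / img_w:.6f}", f"{y / img_h:.6f}", str(int(visible))]

    return " ".join(fields)


# Toy record with two key points (a real record would have one per joint).
example = {
    "img_width": 200,
    "img_height": 100,
    "img_bbox": [50, 20, 100, 60],
    "joints": [[100, 50, 1], [0, 0, 0]],
}
row = to_yolo_pose_row(example)
print(row)
```

Each image's rows would then be written to a .txt file with the same base name as the image.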


8 of 17

Approach


  • Dog video
  • Dog pose frames (from the fine-tuned YOLO, our model 1)
  • Dog pose features (from our model 2)
  • QR code (from a QR code generator)

9 of 17

YOLOv8

  • YOLOv8 (You Only Look Once, version 8): A state-of-the-art model family for object detection, segmentation, and pose estimation.
  • Why choose YOLOv8 for animal pose estimation?
    • Accuracy: Localizes key points well even in intricate animal postures.
    • Performance: Designed for real-time use, which suits continuous video monitoring.
    • Adaptability: Can be trained on custom datasets, so it extends to other species and use cases.
    • Key point detection: Can be fine-tuned to detect the specific joints or key points needed for pose estimation.
    • Training: Uses supervised learning on labeled images annotated with animal joint positions.
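
Fine-tuning on a custom pose dataset is typically driven by a small dataset configuration file. The sketch below generates one as text; it assumes the Ultralytics YOLOv8 tooling, whose pose datasets are described with the `path`, `train`, `val`, `kpt_shape`, and `names` keys. The directory names, the file name `dog-pose.yaml`, the helper name, and the training settings in the trailing comment are illustrative assumptions, not the project's actual configuration.

```python
def pose_dataset_yaml(root, num_keypoints, class_names):
    """Build the text of a pose dataset config (directory names are assumed)."""
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        # Each key point is stored as (x, y, visibility), hence the 3.
        f"kpt_shape: [{num_keypoints}, 3]",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


config_text = pose_dataset_yaml("datasets/dog-pose", 24, ["dog"])
print(config_text)

# With the text saved as dog-pose.yaml, fine-tuning would look roughly like:
#   from ultralytics import YOLO
#   model = YOLO("yolov8n-pose.pt")  # pretrained pose checkpoint
#   model.train(data="dog-pose.yaml", epochs=100, imgsz=640)
```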


10 of 17

YOLOv8 Annotation

  • One text file per image: Each image in the dataset has a corresponding .txt file with the same name as the image.
  • One row per object: Every row in the text file represents a single object instance in the image.
  • Object details per row: Each row contains the following information:
    • Object class index: A numeric identifier for the object class (e.g., 0 for person, 1 for car).
    • Object center coordinates: The x and y coordinates of the object's center, normalized between 0 and 1.
    • Object width and height: Both dimensions are normalized to a range between 0 and 1.
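
To make the normalization concrete, a label row can be mapped back to a pixel-space box by multiplying by the image size. A minimal sketch; the function name and the example row are made up for illustration.

```python
def yolo_row_to_pixel_box(row, img_w, img_h):
    """Parse 'class cx cy w h' (normalized) into a pixel-space corner box."""
    parts = row.split()
    class_index = int(parts[0])
    cx, cy, w, h = (float(value) for value in parts[1:5])

    # Center/size form to corner form, then scale by the image dimensions.
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return class_index, (x_min, y_min, x_max, y_max)


# A box centered in a 200x100 image, half as wide and a quarter as tall.
class_index, box = yolo_row_to_pixel_box("0 0.5 0.5 0.5 0.25", 200, 100)
print(class_index, box)
```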


11 of 17

Koopman-Operator Basics


12 of 17

Koopman-Operator Framework


13 of 17

Encoder Structure


14 of 17

Inner Neural Network Structure


15 of 17

Workflow


16 of 17

References

  • https://learnopencv.com/animal-pose-estimation/
  • S. Zhang, Y. Wang, and A. Li, “Cross-view gait recognition with deep universal linear embeddings,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9095–9104.
  • L. Jiang, C. Lee, D. Teotia, and S. Ostadabbas, “Animal pose estimation: A closer look at the state-of-the-art, existing gaps and opportunities,” Computer Vision and Image Understanding, vol. 222, p. 103483, 2022.
  • B. Biggs, O. Boyne, J. Charles, A. Fitzgibbon, and R. Cipolla, “Who left the dogs out? 3D animal reconstruction with expectation maximization in the loop,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI 16. Springer, 2020, pp. 195–211.
  • A. Khosla, N. Jayadevaprakash, B. Yao, and F.-F. Li, “Novel dataset for fine-grained image categorization: Stanford dogs,” in Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), vol. 2, no. 1, 2011.


17 of 17

THANK YOU

Q&A
