1 of 17

Animal Pose Estimation

Presented by:

Devansh Shah, Dinisha Suryawanshi, Fan Li

COMPUTER VISION CS-5243

2 of 17

Introduction

Problem statement:

In today's fast-paced world, many people own pets but struggle to give them a good quality of life because of busy work schedules.
Pets can develop health or behavioral issues that their owners find difficult to notice or understand.
To address this, we propose monitoring and analyzing pets' activities to help owners identify and understand these issues.

SAMPLE FOOTER TEXT

12/1/2024

3 of 17

Introduction

How it works:

A CCTV camera continuously monitors and records the pet's activities over a 24-hour period.
The recorded footage is analyzed to identify unusual or abnormal behavior.
Based on the resulting activity log, owners can consult a veterinarian or animal behavior specialist for further assistance.
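The slides do not say how the recorded data is analyzed. One minimal sketch, assuming the activity log has already been summarized into a hypothetical list of active hours per day, flags days that deviate from the pet's own average by more than an assumed z-score threshold:

```python
from statistics import mean, stdev


def flag_unusual_days(daily_active_hours: list[float], threshold: float = 2.0) -> list[int]:
    """Return the indices of days whose activity deviates strongly from the average.

    daily_active_hours: hypothetical per-day summary of the activity log.
    threshold: assumed z-score cut-off; larger values flag fewer days.
    """
    if len(daily_active_hours) < 2:
        return []  # stdev needs at least two days of history
    mu = mean(daily_active_hours)
    sigma = stdev(daily_active_hours)
    if sigma == 0:
        return []  # identical activity every day, so nothing stands out
    return [
        day
        for day, hours in enumerate(daily_active_hours)
        if abs(hours - mu) / sigma > threshold
    ]
```

Here `flag_unusual_days([6, 6, 6, 6, 6, 6, 6, 6, 6, 1])` returns `[9]`, the low-activity day. Because every day, including the unusual one, contributes to the baseline, this rule needs a reasonable amount of history; comparing each day only against the preceding days is a common alternative.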

Img ref: https://images.prismic.io/furbo-prismic/ZyhCba8jQArT0JcJ_US_doggettingsmart.png

4 of 17

Introduction

Use Cases:
This system can help detect signs of mental health concerns, physical injuries, and other health problems in pets.
Example 1: If a pet frequently licks its leg, this may indicate an injury or the presence of ticks or fleas.
Example 2: If a dog sleeps excessively and shows reduced appetite, this could be a sign of an underlying health issue.

5 of 17

Dataset

Dataset Used: The project utilizes the Stanford Dogs Dataset for animal pose estimation.
Dataset Details: It contains 20,580 images covering 120 dog breeds.
Bounding Box Annotations: Available for all images in the dataset.
Key Point Annotations: Key point annotations are provided for 12,538 images, covering 20 key points of a dog's pose.
Key Points Breakdown:

3 key points for each leg.
2 key points for each ear.
2 key points for the tail.
1 key point each for the nose and jaw.
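Counting the breakdown gives 4 × 3 + 2 × 2 + 2 + 2 = 20 key points, assuming one point each for the nose and jaw. The sketch below encodes that layout in Python; the names and their order are hypothetical labels chosen for illustration, not the dataset's own identifiers or index order.

```python
# Hypothetical names for the 20 body-part key points; the real dataset
# defines its own order, which should be read from its documentation.
LEGS = ["front_left", "front_right", "back_left", "back_right"]
LEG_POINTS = ["upper", "middle", "paw"]  # 3 per leg (names assumed)
EARS = ["left_ear", "right_ear"]
EAR_POINTS = ["base", "tip"]             # 2 per ear (names assumed)

KEYPOINT_NAMES = (
    [f"{leg}_{p}" for leg in LEGS for p in LEG_POINTS]    # 4 x 3 = 12
    + [f"{ear}_{p}" for ear in EARS for p in EAR_POINTS]  # 2 x 2 = 4
    + ["tail_start", "tail_end"]                          # 2
    + ["nose", "jaw"]                                     # 1 each
)

print(len(KEYPOINT_NAMES))  # 20
```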

6 of 17

Dataset

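The annotation format stores 24 key points per dog (the previous slide lists 20 body-part key points).
Each key point has a visibility flag: 0 = not visible, 1 = visible.
The 12,538 annotated images are split into train, validation, and test sets of 6,773, 4,062, and 1,703 images, respectively.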
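The split sizes can be checked against the key-point annotation count from the Dataset slide; the percentages below are plain arithmetic on the reported counts.

```python
# Split sizes as reported on this slide.
split_sizes = {"train": 6773, "val": 4062, "test": 1703}

total = sum(split_sizes.values())
assert total == 12538  # equals the number of key-point-annotated images

# Share of the annotated images in each split, in percent.
shares = {name: round(100 * n / total, 1) for name, n in split_sizes.items()}
print(shares)  # {'train': 54.0, 'val': 32.4, 'test': 13.6}
```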

Img ref: https://cdn-ilcabpl.nitrocdn.com/XTpGTaZWYQSxctfMHQPVOQKOsBspWTQi/assets/images/optimized/rev-4cdf608/learnopencv.com/wp-content/uploads/2023/09/animal-pose-estimation-dog-kpts.png

7 of 17

Dataset

Annotation:
The dataset's annotations are provided in JSON format.
Each record contains the image path, image width and height, bounding-box coordinates, a flag indicating whether multiple dogs are present, and the key-point coordinates with their visibility flags.
YOLOv8 cannot read this format directly, so the annotations must be converted to its text-label format.
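A minimal conversion sketch in Python follows. It assumes each JSON record exposes the fields described above under the key names `img_width`, `img_height`, `img_bbox` (as `[x_min, y_min, width, height]` in pixels), and `joints` (as `[x, y, visibility]` triples); these names and the box layout are assumptions to check against the actual files. Each output line would go into a `.txt` file named after its image.

```python
from typing import Any


def record_to_yolo_pose_line(record: dict[str, Any], class_id: int = 0) -> str:
    """Convert one JSON annotation record into a YOLOv8 pose label line.

    Assumed (hypothetical) record keys:
      img_width, img_height: image size in pixels
      img_bbox: [x_min, y_min, box_width, box_height] in pixels
      joints: list of [x, y, visibility] per key point
    """
    img_w = float(record["img_width"])
    img_h = float(record["img_height"])
    x_min, y_min, box_w, box_h = record["img_bbox"]

    # YOLO expects the box centre and size, normalized to [0, 1].
    cx = (x_min + box_w / 2) / img_w
    cy = (y_min + box_h / 2) / img_h
    values: list[Any] = [cx, cy, box_w / img_w, box_h / img_h]

    for x, y, vis in record["joints"]:
        if vis == 0:
            # Invisible key points are written as zeros so every row has the same length.
            values.extend([0.0, 0.0, 0])
        else:
            # Coordinates are normalized; the visibility flag is copied through unchanged.
            values.extend([x / img_w, y / img_h, int(vis)])

    fields = [str(class_id)] + [f"{v:.6f}" if isinstance(v, float) else str(v) for v in values]
    return " ".join(fields)
```

For a 200 × 100 image with the box `[50, 20, 100, 60]`, one visible key point at (100, 50), and one invisible key point, the function returns `0 0.500000 0.500000 0.500000 0.600000 0.500000 0.500000 1 0.000000 0.000000 0`.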

8 of 17

Approach

Dog video
→ Dog pose frames (by fine-tuned YOLO, our model 1)
→ Dog pose features (by our model 2)
→ QR code (by QR code generator)
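The slides do not specify what model 2 receives as input. As a hedged sketch, assuming model 1 yields, for each frame, either nothing (no dog detected) or a `(num_keypoints, 3)` array of x, y, and a visibility flag or confidence score, the frames could be stacked into one fixed-size time series before feature extraction. The function name, the score threshold, and the NaN convention for missing points are illustrative choices.

```python
import numpy as np


def stack_pose_sequence(frames: list, num_keypoints: int = 24, min_score: float = 0.0) -> np.ndarray:
    """Stack per-frame key points into a (T, num_keypoints, 2) array of x, y.

    frames: one entry per video frame, either None (no dog detected) or an
            array-like of shape (num_keypoints, 3) holding x, y, and a
            visibility flag or confidence score (assumed layout).
    Missing frames and key points with score <= min_score become NaN, so a
    downstream model can decide how to handle the gaps.
    """
    seq = np.full((len(frames), num_keypoints, 2), np.nan)
    for t, kpts in enumerate(frames):
        if kpts is None:
            continue  # leave the whole frame as NaN
        kpts = np.asarray(kpts, dtype=float)
        keep = kpts[:, 2] > min_score
        seq[t, keep] = kpts[keep, :2]
    return seq
```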

9 of 17

YOLOv8

YOLOv8 (You Only Look Once, version 8): A state-of-the-art real-time model family for object detection, instance segmentation, and pose (key-point) estimation.
Why Choose YOLOv8 for Animal Pose Estimation?
Accuracy: Localizes fine-grained key points, which is needed to capture intricate animal postures.
Performance: Designed for real-time inference, which suits continuous video monitoring and dynamic pose estimation.
Adaptability: Can be trained on custom datasets, so it can be adapted to different species and use cases.
Key Point Detection: The pose variant predicts key points for each detected object and can be fine-tuned to the specific joints needed for pose estimation.
Training: Fine-tuning is supervised, using datasets in which the animal's joints are annotated.
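Fine-tuning YOLOv8 for pose with the Ultralytics package starts from a dataset configuration file. The Python helper below builds a minimal one as text; the `kpt_shape` of `[24, 3]` follows the 24 key points with a visibility flag described on the Dataset slides, while the directory names and the dataset root are hypothetical.

```python
def build_pose_data_yaml(dataset_root: str, num_keypoints: int = 24) -> str:
    """Return the text of a minimal YOLOv8 pose dataset config.

    The images/{train,val,test} layout is a hypothetical choice; YOLO looks for
    labels in the parallel labels/{train,val,test} folders.
    """
    lines = [
        f"path: {dataset_root}",  # dataset root directory (hypothetical)
        "train: images/train",    # 6,773 images
        "val: images/val",        # 4,062 images
        "test: images/test",      # 1,703 images
        f"kpt_shape: [{num_keypoints}, 3]  # x, y, visibility per key point",
        "names:",
        "  0: dog",               # single object class
    ]
    return "\n".join(lines) + "\n"
```

The text could be saved with `pathlib.Path("dog-pose.yaml").write_text(...)` and passed to the Ultralytics trainer, for example `YOLO("yolov8n-pose.pt").train(data="dog-pose.yaml", epochs=100, imgsz=640)`; the model size and epoch count there are placeholders. Ultralytics also accepts a `flip_idx` list that swaps left/right key points during horizontal-flip augmentation; it is omitted here because the key-point order is not given.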

10 of 17

YOLOv8 Annotation Format

One text file per image: Each image in the dataset has a corresponding .txt file with the same name as the image.
One row per object: Every row in the text file represents a single object instance in the image.
Object details per row: Each row contains the following information:

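Object class index: A numeric identifier for the object class (e.g., 0 for person, 1 for car).
Object center coordinates: The x and y coordinates of the object's center, normalized to the range 0 to 1 (divided by the image width and height, respectively).
Object width and height: Both dimensions are normalized to the range 0 to 1.
Key points (pose models only): For each key point, its normalized x and y coordinates, optionally followed by a visibility flag.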
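Reading a label row back into pixel units is a quick way to verify a converted dataset. A minimal parser sketch, assuming the pose row layout of class index, box, and then `x y visibility` for each of a fixed number of key points (the function name is hypothetical):

```python
def parse_yolo_pose_line(line: str, img_w: int, img_h: int, num_keypoints: int = 24) -> dict:
    """Parse one YOLOv8 pose label row and convert it back to pixel units."""
    values = line.split()
    expected = 5 + 3 * num_keypoints  # class + 4 box values + (x, y, vis) per key point
    if len(values) != expected:
        raise ValueError(f"expected {expected} fields, got {len(values)}")

    class_id = int(values[0])
    cx, cy, w, h = (float(v) for v in values[1:5])
    # Convert the normalized centre/size box to pixel [x_min, y_min, width, height].
    box = ((cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h)

    keypoints = []
    for i in range(num_keypoints):
        x, y, vis = values[5 + 3 * i : 8 + 3 * i]
        keypoints.append((float(x) * img_w, float(y) * img_h, int(float(vis))))
    return {"class_id": class_id, "box": box, "keypoints": keypoints}
```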

11 of 17

Koopman-Operator Basics

12 of 17

Koopman-Operator Framework

13 of 17

Encoder Structure

14 of 17

Inner Neural Network Structure

15 of 17

Workflow

16 of 17

References

https://learnopencv.com/animal-pose-estimation/
S. Zhang, Y. Wang, and A. Li, “Cross-view gait recognition with deep universal linear embeddings,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 9095–9104.
L. Jiang, C. Lee, D. Teotia, and S. Ostadabbas, “Animal pose estimation: A closer look at the state-of-the-art, existing gaps and opportunities,” Computer Vision and Image Understanding, vol. 222, p. 103483, 2022.
B. Biggs, O. Boyne, J. Charles, A. Fitzgibbon, and R. Cipolla, “Who left the dogs out? 3D animal reconstruction with expectation maximization in the loop,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI. Springer, 2020, pp. 195–211.
A. Khosla, N. Jayadevaprakash, B. Yao, and F.-F. Li, “Novel dataset for fine-grained image categorization: Stanford Dogs,” in Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), vol. 2, no. 1, 2011.

17 of 17

THANK YOU

Q&A
