Building Footprint Extraction Using Convolutional Neural Networks
ABHINANDHAN VELAGAPUDI
LUDDY SCHOOL OF INFORMATICS AND COMPUTING
Introduction
SECTION 1
Introduction
A building footprint is the outline of a building traced along its exterior walls, forming a polygon that represents the total area the building occupies.
Building footprints are typically generated by manually digitizing high-resolution satellite imagery.
Problem Statement
Manual extraction of building footprints is time-consuming, labor-intensive, and prone to human errors, especially in large-scale urban areas.
Traditional automated methods, such as thresholding and edge detection, often struggle with complex urban landscapes, occlusions, and varying lighting conditions, leading to suboptimal results.
Need for a Deep Learning Solution
Deep learning techniques, particularly convolutional neural networks (CNNs), have shown remarkable success in image segmentation tasks, including building footprint extraction. By leveraging large-scale datasets and learning complex spatial patterns, deep learning models can significantly improve the accuracy and efficiency of building footprint extraction from satellite imagery.
Aim and Objectives
Aim:
The aim of the project is to develop and evaluate a deep learning model for building footprint extraction from aerial imagery, focusing on the Massachusetts Buildings Dataset.
Objectives:
1. Model Development:
- Develop a deep learning model tailored for building footprint extraction, leveraging architectures such as U-Net or its variants.
2. Model Training and Optimization:
- Train the developed model using the training set of aerial images and corresponding building footprints.
- Optimize model hyperparameters, such as learning rate and batch size, to maximize performance (a minimal training-loop sketch follows this list).
3. Evaluation Metrics:
- Evaluate the performance of the trained model using standard metrics like Intersection over Union (IoU), accuracy, precision, recall, and F1-score.
4. Comparison with Other Models:
- Implement and compare the developed model with existing architectures such as U-Net++, DeepLabV3, and Feature Pyramid Networks (FPN).
- Evaluate the performance of each model variant using the same evaluation metrics and dataset splits.
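To make objective 2 concrete, here is a minimal training-loop sketch. It assumes a PyTorch segmentation model producing one logit per pixel and a DataLoader named train_loader (a hypothetical name); the epoch count and learning rate shown are placeholder hyperparameters to be tuned, not the project's final settings.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=50, lr=1e-4, device="cuda"):
    """Train a binary building-segmentation model; lr and batch size are tunable."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # binary task: building vs. background
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            logits = model(images)           # (N, 1, H, W) raw scores
            loss = criterion(logits, masks)  # masks in {0, 1}, same shape as logits
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * images.size(0)
        print(f"epoch {epoch + 1}: loss = {running_loss / len(train_loader.dataset):.4f}")
```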
Dataset Overview
SECTION 2
Data
Number of Images: 151 aerial images of the Boston area
Image Size: Each image is 1500 × 1500 pixels
Coverage: 2.25 square kilometers per image, totaling approximately 340 square kilometers
Data Split:
- Training Set: 137 images
- Test Set: 10 images
- Validation Set: 4 images
The masks are 8-bit images:
0 – no building present
255 – building present
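A short sketch of how the 1500 × 1500 images and their 8-bit masks might be loaded, binarized, and tiled for training. The file paths, the tile size, and the assumption of single-channel masks are illustrative choices, not part of the dataset specification.

```python
import numpy as np
from PIL import Image

def load_pair(image_path, mask_path, tile=256):
    """Load one aerial image and its mask, binarize the mask, and cut both into tiles."""
    image = np.asarray(Image.open(image_path), dtype=np.float32) / 255.0  # (1500, 1500, 3)
    mask = np.asarray(Image.open(mask_path), dtype=np.uint8)  # assumed single-channel
    mask = (mask == 255).astype(np.float32)  # 255 -> building (1), 0 -> background (0)
    tiles = []
    for y in range(0, image.shape[0] - tile + 1, tile):
        for x in range(0, image.shape[1] - tile + 1, tile):
            tiles.append((image[y:y + tile, x:x + tile], mask[y:y + tile, x:x + tile]))
    # 1500 is not divisible by 256, so this sketch drops the border pixels.
    return tiles
```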
Model Architecture
SECTION 3
What is U-Net?
Architecture Overview:
Encoder: A contracting path of repeated convolutions and downsampling that captures increasingly abstract context.
Decoder: An expanding path that upsamples the feature maps to recover spatial resolution and produce a full-size segmentation map.
Skip Connections: Feature maps from each encoder stage are concatenated with the corresponding decoder stage, preserving fine spatial detail lost during downsampling.
Applications: Originally developed for biomedical image segmentation, U-Net is now widely used in remote sensing, including building footprint extraction.
Deployed U-Net Model
The encoder consists of four DownBlocks and the decoder of four UpBlocks; a sketch of a typical block configuration follows.
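The slide does not preserve the internals of each block, so the following PyTorch sketch assumes the standard U-Net recipe of two 3 × 3 convolutions per block with batch normalization; this is an illustrative assumption, not the exact deployed layers.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 conv + BatchNorm + ReLU layers, the usual U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class DownBlock(nn.Module):
    """Downsample by max pooling, then extract features for the next stage."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.conv = double_conv(in_ch, out_ch)
    def forward(self, x):
        return self.conv(self.pool(x))

class UpBlock(nn.Module):
    """Upsample, concatenate the matching encoder features (skip connection), then convolve."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, 2, stride=2)
        self.conv = double_conv(in_ch, out_ch)
    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([skip, x], dim=1))
```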
Why U-Net for Semantic Segmentation?
Semantic segmentation: Assign a class label to every pixel in an image; here it is a two-class problem (building vs. background).
U-Net's encoder-decoder design with skip connections produces sharp per-pixel predictions and trains well on relatively small datasets, which suits this task.
Conclusion
Model Execution
SECTION 4
Dataset Sample Visualization
This is a sample image from the dataset, shown along with its ground-truth segmentation mask and one-hot encoded representation. It provides a comprehensive view of the input data used to train the U-Net model, aiding understanding of the segmentation task and evaluation of the model's performance.
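A sketch of how such a sample could be rendered, assuming NumPy arrays image (H × W × 3) and a binary mask; the two-channel one-hot stacking shown here is one common convention for a two-class task.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_sample(image, mask):
    """Display an aerial image, its binary mask, and the building channel of the one-hot encoding."""
    one_hot = np.stack([1 - mask, mask], axis=-1)  # channel 0: background, channel 1: building
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].imshow(image)
    axes[0].set_title("Aerial image")
    axes[1].imshow(mask, cmap="gray")
    axes[1].set_title("Ground-truth mask")
    axes[2].imshow(one_hot[..., 1], cmap="gray")
    axes[2].set_title("One-hot (building channel)")
    for ax in axes:
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```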
Interpretation of Evaluation Metrics
Dice Loss: 0.1431
IoU Score: 0.8134
Precision: 0.8612
Recall: 0.9344
Accuracy: 0.8966
F1 Score: 0.5826
Interpretation
The IoU of 0.81 indicates strong overlap between predicted and ground-truth footprints, and the recall of 0.93 shows that few building pixels are missed; the precision of 0.86 means some background pixels are occasionally labeled as building.
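For reference, a sketch of how these metrics can be computed from a predicted probability map and a ground-truth binary mask in pure NumPy; the 0.5 threshold is an illustrative assumption.

```python
import numpy as np

def segmentation_metrics(pred, truth, thresh=0.5, eps=1e-7):
    """Compute IoU, Dice loss, precision, recall, accuracy, and F1 for binary masks."""
    p = (pred >= thresh).astype(np.float64).ravel()
    t = (truth >= thresh).astype(np.float64).ravel()
    tp = (p * t).sum()            # true positives
    fp = (p * (1 - t)).sum()      # false positives
    fn = ((1 - p) * t).sum()      # false negatives
    tn = ((1 - p) * (1 - t)).sum()  # true negatives
    iou = tp / (tp + fp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"iou": iou, "dice_loss": 1 - dice, "precision": precision,
            "recall": recall, "accuracy": (tp + tn) / (tp + tn + fp + fn), "f1": f1}
```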
Graphical Interpretation of Evaluation Metrics
Prediction results on Test set
Model – U-Net
A few more predicted segmentation masks generated by U-Net
Predicted segmentation masks generated by DeepLabV3 and U-Net++
Comparison & Results
SECTION 5
Models used for comparison:
U-Net++:
U-Net++ is an extension of the original U-Net architecture, featuring dense skip connections and nested U-Net blocks. It aims to enhance feature propagation and capture more context information, leading to improved segmentation performance.
DeepLabV3:
DeepLabV3 is a deep learning model designed for semantic segmentation tasks. It employs atrous convolution to effectively capture multi-scale information, allowing it to achieve high-resolution segmentation results.
FPN (Feature Pyramid Network):
FPN is a convolutional neural network architecture designed to extract features at multiple scales. It uses a top-down architecture with lateral connections to build a feature pyramid from a single input image, enabling effective object detection and segmentation across different scales.
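A sketch of how the four models could be instantiated for a like-for-like comparison, assuming the segmentation_models_pytorch library; the ResNet-34 encoder shown here is an assumed choice, not necessarily the one used in the project.

```python
import segmentation_models_pytorch as smp

# All four models share the same encoder and output head so the comparison is fair.
common = dict(encoder_name="resnet34", encoder_weights="imagenet",
              in_channels=3, classes=1)

models = {
    "U-Net": smp.Unet(**common),
    "U-Net++": smp.UnetPlusPlus(**common),
    "DeepLabV3": smp.DeepLabV3(**common),
    "FPN": smp.FPN(**common),
}
```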
Model Comparison
Model     | F1 Score | IoU Score | Dice Loss | Precision | Accuracy | Recall |
U-Net     | 0.5826   | 0.8134    | 0.1431    | 0.8612    | 0.8966   | 0.9344 |
U-Net++   | 0.5825   | 0.8233    | 0.1494    | 0.8800    | 0.9045   | 0.9260 |
DeepLabV3 | 0.5830   | 0.7734    | 0.1550    | 0.8331    | 0.8710   | 0.9133 |
FPN       | 0.5826   | 0.6669    | 0.2060    | 0.7762    | 0.7994   | 0.8139 |
The evaluation results on the test data reveal the performance of different segmentation models. Across the models tested, U-Net++ achieved the highest mean IoU score of 0.8233, indicating better pixel-level accuracy in segmentation. However, U-Net had the highest mean recall of 0.9344, suggesting its effectiveness in capturing true positive instances.
Conclusion:
U-Net++ delivered the best overall segmentation quality on this dataset, with the highest IoU, precision, and accuracy, while U-Net recorded the lowest Dice loss and the highest recall; FPN trailed on nearly every metric.
Scope For Further Research
1. Utilization of PAN Images: Incorporate high-resolution panchromatic (PAN) imagery, whose finer spatial detail can sharpen building boundaries.
2. PAN Sharpening Techniques: Fuse PAN bands with lower-resolution multispectral imagery to produce high-resolution color inputs for the network (a minimal sketch follows this list).
3. Hybrid Models: Combine complementary architectures, for example a U-Net decoder with atrous convolutions or a transformer-based encoder, to capture both fine detail and global context.
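As an illustration of point 2, a minimal Brovey-transform pan-sharpening sketch in NumPy; the Brovey transform is one of several standard techniques, and the array shapes and [0, 1] scaling are assumptions.

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-7):
    """Brovey transform: rescale each upsampled multispectral band by the ratio of
    the high-resolution panchromatic band to the mean of the multispectral bands.

    ms  : (H, W, 3) multispectral image already resampled to the PAN grid
    pan : (H, W) panchromatic band at full resolution
    """
    intensity = ms.mean(axis=2, keepdims=True)               # per-pixel MS intensity
    sharpened = ms * (pan[..., np.newaxis] / (intensity + eps))
    return np.clip(sharpened, 0.0, 1.0)  # assumes inputs scaled to [0, 1]
```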
Thank You