Object Detection in Pytorch Using Mask R-CNN
Abstract:- This paper investigates object detection in PyTorch using the Mask Region-based Convolutional Neural Network (Mask R-CNN), one of the most widely known object detection and localization algorithms, which combines image segmentation techniques with a deep learning approach. Mask R-CNN is widely used in many fields, such as industrial and medical applications, due to its ability to accurately identify objects and generate a segmentation mask for each instance. The Mask R-CNN algorithm combines the region proposal generation and object classification stages of Faster R-CNN with an additional branch for pixel-level segmentation.

Keywords:- Convolutional Neural Network, Object Detection, Pre-trained Model, PyTorch, Image Preprocessing, Pandas, NumPy, Mask Region-Based Convolutional Neural Network.

I. INTRODUCTION

The ability to identify and characterize objects in an image or video is one of the primary functions of computer vision. Many applications, including autonomous driving, robotics, image understanding, and surveillance systems, depend on accurate object detection. In the past few years, computer vision has advanced dramatically, particularly in object detection techniques. Object detection algorithms combine the tasks of object localization and image classification to identify and locate objects within an image or video. These algorithms achieve precise and effective object detection by leveraging deep learning techniques. The R-CNN family, which includes Fast R-CNN, Faster R-CNN, and Mask R-CNN, is one popular family of object detection algorithms [6]. These algorithms have gained considerable attention and have been widely used in various fields due to their strong performance and versatility.

A. Mask R-CNN: An Extension of Faster R-CNN
Mask R-CNN is an extension of the Faster R-CNN algorithm, which was a significant breakthrough in the field of object detection. Faster R-CNN [3] builds on the idea of region-based convolutional neural networks for object detection and works in two stages: region proposal generation and object classification. In the first stage, a neural network (the Region Proposal Network) predicts potential object locations as regions of interest. In the second stage, the proposed regions are classified into distinct object categories. Faster R-CNN, however, does not offer pixel-level segmentation; it only provides bounding box information for object localization. To address this restriction, Mask R-CNN was created [7]: an extension of Faster R-CNN that adds a supplementary branch for generating object segmentation masks. Mask R-CNN thus addresses the shortcoming of Faster R-CNN by integrating instance segmentation [1]. Implementing Mask R-CNN therefore makes it possible to obtain pixel-wise segmentation masks for each object in an image [2].

B. Using PyTorch for Object Detection
PyTorch, a popular deep learning library, offers an easy-to-use interface for developing and training object detection models. It supports several pre-trained models and the development of various machine learning algorithms, including Mask R-CNN. Using PyTorch, researchers can quickly install and configure the Mask R-CNN model for object detection. Additionally, the Mask R-CNN implementation in PyTorch enables model customization and fine-tuning on new datasets for particular instance segmentation tasks, so researchers can easily incorporate the algorithm into their object detection pipeline. To apply Mask R-CNN in PyTorch, researchers must take the following steps:

Design the neural network architecture for Mask R-CNN by combining the networks for feature extraction, region proposal, instance detection, and segmentation. PyTorch's modular design allows researchers to easily define and customize the different components of the Mask R-CNN architecture.

Prepare the data by loading the dataset and transforming it into a format compatible with PyTorch's DataLoader.

Implement the necessary data augmentation techniques to increase the diversity and robustness of the training dataset. These can include random cropping, rotation, and flipping of images, which introduce variations in object appearance and improve the model's generalization.

III. DATA

Resizing
Images were resized to a size of 224 pixels; this value was specified in the PyTorch transforms submodule.

Normalizing
Each image was then converted into a tensor and normalized using the mean and standard deviation stipulated for PyTorch models, [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225], respectively.

IV. METHODS

There are multiple crucial steps in the proposed Mask R-CNN object detection method in PyTorch. Feature extraction, region-proposal, instance detection, and segmentation networks are combined to create the first step of the neural network

During each of the thirty training epochs, for every batch of loaded training data the gradients are reset to zero, the loss and its gradients are computed, and the model weights are updated. The training loss for every epoch is also recorded. The model is evaluated on the validation dataset; to do this, we switch off autograd (gradient tracking) and put the model in evaluation mode. The number of correct validation predictions is counted, and the total validation loss is computed.

VI. RESULTS

Accuracy, precision, recall, and F1 score were examined in the analysis of the results. After training the model for thirty epochs, the average validation accuracy was 91.2%, while the average training accuracy was 87.6%.
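The preprocessing of Section III, the per-batch training and validation cycle, and the metrics examined in Section VI can be sketched together as follows. This is only an illustration under stated assumptions, not the paper's code: a tiny hypothetical classifier and random tensors stand in for the real model and dataset so the sketch runs quickly on a CPU, the function names `preprocess`, `classification_metrics`, and `run` are invented here, the metrics treat label 1 as the single positive class, and the default of 2 epochs is for brevity (the paper reports 30 epochs, a batch size of 8, and a learning rate of 0.002).

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# Per-channel mean and standard deviation quoted in Section III.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def preprocess(images, size=224):
    """Resize a float batch (N, 3, H, W) in [0, 1] to size x size, then normalize."""
    resized = F.interpolate(images, size=(size, size), mode="bilinear",
                            align_corners=False)
    return (resized - MEAN) / STD


def classification_metrics(preds, labels, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    correct = sum(p == y for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def run(epochs=2, batch_size=8, lr=0.002):
    torch.manual_seed(0)
    # Synthetic stand-in data: class 1 images are brighter than class 0 images.
    labels = torch.arange(32) % 2
    raw = torch.rand(32, 3, 32, 32) * 0.5 + 0.5 * labels.view(-1, 1, 1, 1).float()
    data = TensorDataset(preprocess(raw), labels)
    train_loader = DataLoader(Subset(data, range(24)), batch_size=batch_size,
                              shuffle=True)
    val_loader = DataLoader(Subset(data, range(24, 32)), batch_size=batch_size)

    # Hypothetical tiny classifier standing in for the real network.
    model = nn.Sequential(nn.Conv2d(3, 4, 3, stride=2, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    history = []
    for _ in range(epochs):
        model.train()
        train_loss = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()            # reset gradients to zero
            loss = criterion(model(x), y)    # compute the batch loss
            loss.backward()                  # compute gradients
            optimizer.step()                 # update the weights
            train_loss += loss.item() * x.size(0)

        model.eval()                         # evaluation mode
        val_loss, preds, targets = 0.0, [], []
        with torch.no_grad():                # switch off gradient tracking
            for x, y in val_loader:
                logits = model(x)
                val_loss += criterion(logits, y).item() * x.size(0)
                preds += logits.argmax(dim=1).tolist()
                targets += y.tolist()
        history.append({
            "train_loss": train_loss / len(train_loader.dataset),
            "val_loss": val_loss / len(val_loader.dataset),
            **classification_metrics(preds, targets),
        })
    return history
```

Calling `run()` returns one dictionary per epoch with the training loss, validation loss, and the four metrics. With torchvision's Mask R-CNN, the training loss would instead be the sum of the loss dictionary the model returns in training mode.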
VII. DISCUSSIONS AND CONCLUSIONS

This project describes an object detection model that uses Mask R-CNN in PyTorch to detect cars in different images. Preprocessing methods such as resizing, horizontal flipping, and normalization were used to improve the model. The model was trained for 30 epochs with the Adam optimizer, a learning rate of 0.002, and a batch size of 8. The proposed model achieved 91.2% accuracy, 90.4% precision, 88.7% recall, and an F1-score of 89.6%.

In conclusion, using PyTorch's Mask R-CNN for object detection and instance segmentation offers a number of benefits, including cutting-edge performance, adaptability, and access to pre-trained models for transfer learning. The prospects for object detection using Mask R-CNN in PyTorch are bright given the ongoing developments in machine learning and computer vision. Researchers can investigate developments in neural network architectures, such as adding new backbone networks or attention mechanisms, to further improve the precision and effectiveness of object detection. Enhancements can also be achieved by fine-tuning the model on datasets specific to a given domain and by optimizing the Mask R-CNN hyper-parameters. Future prospects also depend on the availability of large-scale training datasets for the network. Researchers may use publicly accessible datasets, such as COCO or Pascal VOC, that offer annotated examples for training and assessment. They can also create their own datasets by adding segmentation masks and bounding box annotations to photos. This allows them to customize the training data to meet their particular needs and improve the model's performance on their target objects or scenarios.

REFERENCES
[4]. Huang, G., Liu, Z., Maaten, L. V. D., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2261–2269). July 2017.
[5]. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1-9). June 2015.
[6]. Thomas, E. A., Gerster, S., Jean, H., & Oates, T. (2020, October 26). Computer vision supported pedestrian tracking: A demonstration on trail bridges in rural Rwanda. https://scite.ai/reports/10.1371/journal.pone.0241379
[7]. Su, Peifeng, J. (2022, January 25). New particle formation event detection with Mask R-CNN. https://scite.ai/reports/10.5194/acp-22-1293-2022