Real-Time Instance Segmentation Using Lightweight CNN-Transformer Hybrids

Authors

  • Sajud Hamza Elinjulliparambil, Pace University

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V4I4P117

Keywords:

Instance segmentation, real-time vision, CNN-Transformer hybrid, attention mechanisms, model efficiency, autonomous systems, robotics, computer vision

Abstract

Instance segmentation is a fundamental computer vision problem in which individual object instances must be localized, classified, and delineated at the pixel level simultaneously. The trade-off between computational cost and segmentation quality is a major obstacle to achieving real-time performance, especially on resource-constrained platforms such as edge devices and embedded systems. Lightweight CNN-Transformer hybrid networks have emerged as a promising solution, combining the efficient local feature extraction of convolutional networks with the global context modeling of Transformers. This literature review provides an in-depth examination of real-time instance segmentation methods, focusing on CNN-based pipelines, Transformer models, and their hybrid variants. Lightweight design strategies, such as model compression, efficient attention mechanisms, and backbone optimization, are discussed alongside standard datasets and benchmarking protocols that support consistent evaluation. Finally, we examine real-world deployments in autonomous driving, robotics, and industrial vision, and identify current challenges and future research directions concerning the accuracy, efficiency, and robustness of real-time implementations.
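To make the hybrid idea concrete, the following is a minimal, hypothetical sketch (not the architecture surveyed in this paper): a convolution extracts local features from an image patch, and a single-head scaled dot-product self-attention layer then mixes those features globally. All layer sizes, weights, and function names here are illustrative assumptions.

```python
# Hypothetical CNN-Transformer hybrid block in plain NumPy:
# local 3x3 convolution followed by global self-attention over the
# resulting feature map. Toy sizes; not the model from the paper.
import numpy as np

def conv3x3(x, w):
    """Valid (no-padding) 3x3 convolution over an (H, W) map with a (3, 3) kernel."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def self_attention(tokens, wq, wk, wv):
    """Scaled dot-product self-attention over (N, d) token embeddings."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax rows
    return attn @ v

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))                    # toy single-channel input
feat = conv3x3(img, rng.standard_normal((3, 3)))     # local features: (6, 6)
tokens = feat.reshape(-1, 1) @ np.ones((1, 4))       # flatten to 36 tokens, d=4
d = tokens.shape[1]
out = self_attention(tokens,
                     rng.standard_normal((d, d)),
                     rng.standard_normal((d, d)),
                     rng.standard_normal((d, d)))    # globally mixed: (36, 4)
```

The design point this sketch illustrates is the division of labor discussed in the review: the convolution is cheap and captures local texture, while the attention step, whose cost grows quadratically in the number of tokens, is applied only to the smaller downsampled feature map — the motivation for the efficient-attention and backbone-optimization strategies surveyed above.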



Published

2023-12-30

Section

Articles

How to Cite

Elinjulliparambil SH. Real-Time Instance Segmentation Using Lightweight CNN-Transformer Hybrids. IJETCSIT [Internet]. 2023 Dec. 30 [cited 2026 Jan. 28];4(4):159-67. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/527
