Designing Hybrid ETL Pipelines for Multi-Cloud Integration

Authors

  • Chitiz Tayal, Senior Director, Data and AI (Author)

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V4I4P114

Keywords:

Hybrid ETL pipeline, multi-cloud integration, Cloud data interoperability, Data transformation and warehousing, Python-based simulation, Data performance metrics, Scalability and efficiency, Digital twin optimization, Zero-ETL architecture, Cloud-native data engineering

Abstract

The growing popularity of multi-cloud infrastructures has created a need for scalable, interoperable ways to integrate data distributed across several platforms. In this project, we outline the design and simulation of a multi-cloud integration ETL pipeline in Python. We load generated data representing public clouds such as AWS, Microsoft Azure, and GCP into a single cloud-agnostic data warehouse, using open-source libraries such as pandas, SQLite3, and matplotlib. The ETL system is architected as a set of modules covering extraction, transformation, loading, and performance monitoring. Scalability and resource usage were tested in three runs with graduated data volumes. Processing times ranged from 0.01 to 0.02 seconds, while memory use stayed nearly constant, rising from 90.98 MB at 32 records to 91.41 MB at 864 records. These results support the efficiency and stability of the pipeline and its ability to integrate multi-source data with a low computational burden. The work shows that local simulation of hybrid ETL systems is feasible and provides a scalable, reproducible basis for future deployment in real-world multi-cloud infrastructures. It contributes to ongoing efforts on data interoperability, performance optimization, and cloud-native ETL architecture design.
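
The modular structure described in the abstract (extraction, transformation, loading, and performance monitoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the record schema, the function names, and the data volumes are hypothetical, and the sketch uses only the Python standard library (plain lists in place of pandas, and tracemalloc in place of whatever memory probe the paper used), so its memory figures are not comparable to the process-level values reported above.

```python
import random
import sqlite3
import time
import tracemalloc

# Hypothetical source tags; the abstract names the providers but not a schema.
SOURCES = ("aws", "azure", "gcp")


def extract(source, n_rows, seed=0):
    """Generate synthetic records standing in for one cloud's export."""
    rng = random.Random(f"{source}-{seed}")
    return [
        {"source": source, "record_id": i, "amount": round(rng.uniform(1, 100), 2)}
        for i in range(n_rows)
    ]


def transform(rows):
    """Map records to one schema: uppercase source tag, keep positive amounts."""
    return [
        (r["source"].upper(), r["record_id"], r["amount"])
        for r in rows
        if r["amount"] > 0
    ]


def load(conn, rows):
    """Append transformed rows to a single warehouse table in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse ("
        "source TEXT, record_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(rows_per_source):
    """Run one extract-transform-load pass and return simple metrics."""
    conn = sqlite3.connect(":memory:")
    tracemalloc.start()  # tracks Python allocations only, not process RSS
    started = time.perf_counter()
    for source in SOURCES:
        load(conn, transform(extract(source, rows_per_source)))
    elapsed = time.perf_counter() - started
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    loaded = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
    conn.close()
    return {"records": loaded, "seconds": elapsed, "peak_mb": peak_bytes / 1e6}


if __name__ == "__main__":
    # Three runs with graduated, purely illustrative volumes.
    for size in (10, 100, 300):
        print(run_pipeline(size))
```

Keeping each stage as a separate function means a stage can be swapped (for example, replacing the synthetic extractor with a real cloud export reader) without touching the monitoring wrapper.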

Published

2023-12-30

Section

Articles

How to Cite

1.
Tayal C. Designing Hybrid ETL Pipelines for Multi-Cloud Integration. IJETCSIT [Internet]. 2023 Dec. 30 [cited 2025 Dec. 10];4(4):129-34. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/468
