Incorporating Real-Time Data Pipelines Using Snowflake and dbt

Authors

  • Sarbaree Mishra, Program Manager, Molina Healthcare Inc., USA
  • Jeevan Manda, Project Manager, Metanoia Solutions Inc., USA

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V2I1P108

Keywords:

Real-time data pipelines, Snowflake, dbt, cloud-native architecture, data storage, data transformation, data ingestion, data processing, analytics workflows, data-driven decisions, data quality, real-time analysis, cloud technologies, data collaboration, scalable data workflows, agile reporting

Abstract

In a data-driven world, companies increasingly want to make decisions immediately, and real-time data pipelines are how they bring that capability into their operations. Snowflake, a cloud-based data warehouse, and dbt (data build tool), a transformation tool, are central to this shift because they provide scalable and efficient ways to process and analyze large volumes of data. This article discusses why real-time data pipelines are becoming increasingly important and what role Snowflake and dbt play in modern data systems. By combining Snowflake's flexible, cloud-native architecture with dbt's data transformation capabilities to manage large volumes of data, businesses can make their data processing faster, more efficient, and more accessible. The study examines the benefits and drawbacks of adopting these technologies, such as potential cost savings, support for business growth, and easier access to information. It also discusses potential challenges, such as data latency and the difficulty of integrating these systems. It reviews best practices for building real-time data pipelines, including ensuring high data quality, improving scalability, and reducing processing time. The article shows businesses how to improve their data architecture and emphasizes how these technologies can support business intelligence and decision-making. It also outlines possible future trends in real-time data processing, such as advances in AI-driven analytics and automation. This in-depth study aims to give businesses the knowledge they need to integrate Snowflake and dbt into their real-time data pipelines successfully, helping them stay competitive in an increasingly data-driven world.
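The incremental, quality-checked transformation step the abstract refers to can be illustrated with a small plain-Python stand-in. This is a minimal sketch under stated assumptions, not the authors' implementation and not dbt's or Snowflake's API: the `Event` record, the `order_id` key, the status values, and the functions `incremental_merge` and `quality_failures` are all hypothetical. It mimics two dbt ideas: an incremental model that processes only rows newer than the target table's latest `updated_at` and upserts them on a unique key, and generic `not_null` / `accepted_values` tests run after each load. In a real Snowflake and dbt pipeline this logic would be written as SQL (for example, a `MERGE` statement generated by dbt), which the Python version only approximates.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical set of allowed values for the accepted_values-style check.
ACCEPTED_STATUSES = {"placed", "shipped", "cancelled"}


@dataclass(frozen=True)
class Event:
    """One hypothetical source row; order_id plays the role of dbt's unique_key."""
    order_id: int
    status: Optional[str]
    updated_at: datetime


def incremental_merge(target: Dict[int, Event], batch: List[Event]) -> Dict[int, Event]:
    """Upsert only rows newer than the target's high-water mark, keyed on order_id."""
    # Watermark: the latest updated_at already loaded (analogous to selecting
    # max(updated_at) from the existing table inside an is_incremental() block).
    watermark = max((e.updated_at for e in target.values()), default=datetime.min)
    merged = dict(target)  # leave the caller's table untouched
    for event in batch:
        if event.updated_at <= watermark:
            continue  # not newer than the last load, so skipped
        current = merged.get(event.order_id)
        # Insert new keys; for existing keys keep the most recent version.
        if current is None or event.updated_at > current.updated_at:
            merged[event.order_id] = event
    return merged


def quality_failures(table: Dict[int, Event]) -> List[str]:
    """Run not_null and accepted_values style checks and return failure messages."""
    failures = []
    for key, event in table.items():
        if event.status is None:
            failures.append(f"not_null(status) failed for order_id={key}")
        elif event.status not in ACCEPTED_STATUSES:
            failures.append(
                f"accepted_values(status) failed for order_id={key}: {event.status!r}"
            )
    return failures


if __name__ == "__main__":
    t0 = datetime(2021, 3, 1, 12, 0)
    table = incremental_merge({}, [Event(1, "placed", t0), Event(2, "placed", t0)])
    later_batch = [
        Event(1, "shipped", datetime(2021, 3, 1, 12, 5)),  # newer version of order 1
        Event(2, "cancelled", t0),  # not newer than the watermark, so skipped
        Event(3, None, datetime(2021, 3, 1, 12, 6)),  # new order with a missing status
    ]
    table = incremental_merge(table, later_batch)
    print({k: v.status for k, v in table.items()})  # {1: 'shipped', 2: 'placed', 3: None}
    print(quality_failures(table))  # ['not_null(status) failed for order_id=3']
```

A strict watermark drops late-arriving rows whose `updated_at` is not newer than the last load. A common mitigation is to subtract a small lookback window from the watermark and let the key-based upsert absorb the reprocessed rows, trading a little extra work per run for lower data loss under event-time skew.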

Published

2021-03-30

Section

Articles

How to Cite

Mishra S, Manda J, Manda J. Incorporating Real-Time Data Pipelines Using Snowflake and dbt. IJETCSIT [Internet]. 2021 Mar. 30 [cited 2025 Oct. 14];2(1):63-7. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/316