A Domain Driven Data Architecture for Improving Data Quality in Distributed Datasets
DOI:
https://doi.org/10.63282/3050-9246.IJETCSIT-V2I3P109
Keywords:
Domain-Driven Design, Data Architecture, Data Quality, Distributed Datasets, Data Consistency, Data Validation, Event-Driven Architecture, Data Governance, Data Integrity, Real-Time Data Flow, Data Models, Business Domains, Data Accuracy, Data Reliability, Data Accountability, Data Access, Decision-Making, Data Management, Distributed Systems, Data Discrepancies, Data Integration, Event-Driven Systems
Abstract
Organizations struggle to keep track of large volumes of information spread across multiple systems and divisions. As datasets grow in size and complexity, it becomes harder to keep their quality consistent, especially when they are distributed across platforms with different formats and architectures. A domain-driven data architecture addresses this problem by breaking complex data systems into smaller, more manageable parts, each governed by its own domain. By applying domain-driven design (DDD) principles, businesses can improve data management through clearly defined ownership, data validation and transformation protocols, and synchronization of data across systems. This strategy makes it possible to manage data quality in distributed environments in a more structured and unified way. A key part of the architecture is domain-specific validation and transformation, which ensures that each dataset meets quality standards before it is processed or shared across systems. Event-driven architectures are equally important for keeping distributed datasets in sync: changes in one domain are quickly reflected in all relevant systems, which keeps data accurate and consistent. The domain-centric approach can be combined with modern technologies such as data lakes, data warehouses, and governance platforms to improve data quality management at every stage of the data lifecycle. This paper shows how domain-driven design improves data quality, using real-world examples from a variety of fields, and how it makes data more reliable, accessible, and consistent across enterprises.
This strategy helps businesses deal with the inherent difficulties of managing distributed datasets, so that their data is an asset rather than a liability in decision-making. It provides a structured way to handle diverse datasets, link them to business goals, and encourage a data-driven culture that values quality at all levels.
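As a rough illustration of the two mechanisms the abstract describes (domain-owned validation and transformation, and event-driven synchronization), the following minimal Python sketch may help. It is not taken from the paper: the `CustomerDomain`, `BillingReplica`, `EventBus`, and `RecordChanged` names, the field names, and the validation rules are all hypothetical, and the in-memory bus merely stands in for whatever messaging system a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RecordChanged:
    """Hypothetical event emitted when a domain accepts a record."""
    domain: str
    key: str
    payload: dict


class EventBus:
    """In-memory stand-in for a message broker (assumption, not a real API)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[RecordChanged], None]] = []

    def subscribe(self, handler: Callable[[RecordChanged], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: RecordChanged) -> None:
        # Deliver synchronously to every subscriber, in subscription order.
        for handler in self._subscribers:
            handler(event)


class CustomerDomain:
    """A domain that owns its data, its quality rules, and its events."""
    name = "customer"

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self.records: Dict[str, dict] = {}

    @staticmethod
    def validate(record: dict) -> List[str]:
        # Illustrative domain-specific rules; real rules would be richer.
        errors = []
        if not record.get("customer_id"):
            errors.append("customer_id is required")
        if "@" not in record.get("email", ""):
            errors.append("email must contain '@'")
        return errors

    @staticmethod
    def transform(record: dict) -> dict:
        # Normalize the email so every consumer sees one canonical form.
        return {**record, "email": record["email"].strip().lower()}

    def upsert(self, record: dict) -> List[str]:
        errors = self.validate(record)
        if errors:
            return errors  # rejected: nothing is stored or published
        clean = self.transform(record)
        self.records[clean["customer_id"]] = clean
        self._bus.publish(RecordChanged(self.name, clean["customer_id"], clean))
        return []


class BillingReplica:
    """A downstream system that keeps its own copy in sync via events."""

    def __init__(self, bus: EventBus) -> None:
        self.emails: Dict[str, str] = {}
        bus.subscribe(self.on_event)

    def on_event(self, event: RecordChanged) -> None:
        if event.domain == "customer":
            self.emails[event.key] = event.payload["email"]


bus = EventBus()
customers = CustomerDomain(bus)
billing = BillingReplica(bus)

accepted = customers.upsert({"customer_id": "c1", "email": " Ada@Example.COM "})
rejected = customers.upsert({"customer_id": "c2", "email": "not-an-email"})
```

In this sketch the invalid record never reaches the downstream replica, because the owning domain publishes an event only after validation and transformation succeed. A production system would typically add durable messaging, retries, and schema versioning, none of which are modeled here.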