Automated Data Mapping and Schema Matching For Improving Data Quality in Master Data Management.

Authors

  • Sarbaree Mishra, Program Manager at Molina Healthcare Inc., USA
  • Sairamesh Konidala, Vice President, JP Morgan and Chase, USA

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V4I3P109

Keywords:

Automated Data Mapping, Schema Matching, Data Quality, Master Data Management, Machine Learning, Data Integration, Data Consistency, Data Transformation, Data Governance, Data Synchronization, Data Accuracy, Data Unification, Data Modeling, Data Mapping Algorithms, Data Alignment, Data Enrichment, Data Validation, Metadata Management, Data Standardization, Data Profiling, Data Cleansing, Real-time Data Processing, Data Redundancy Reduction, Data Automation, Data Harmonization, Data Integrity, Data Optimization, Data Migration, Cross-system Data Integration

Abstract

Data quality is the foundation for ensuring that an organization's information is accurate, consistent, and reliable, particularly in master data management (MDM). A major challenge for businesses is integrating data from different sources, each with its own schema and format, which causes inconsistency and confusion when building a unified view of the data. Automated data mapping and schema matching address these problems by improving the compatibility and uniformity of data structures across systems. Algorithms and machine learning models help by identifying relationships between data fields, considerably reducing the manual labor and errors usually involved. As a result, organizations can map and merge data from several sources quickly, completing the integration process with less effort while obtaining more accurate and consistent results. These technologies also speed up data integration and reduce the risk of human error, which is particularly critical when working with massive and complex datasets. In addition, automated data mapping and schema matching improve data quality by helping to ensure that data is organized consistently across all systems, which supports better decision-making and operational efficiency. These techniques make it easier to eliminate duplicate and inaccurate data, so a single, reliable source of truth for core business information can be maintained with less effort. Together, these innovations substantially change the way businesses manage data integration. They help keep data a trusted asset that supports more informed decision-making and business growth.
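
As an illustration of how field relationships might be identified automatically, the following is a minimal sketch of name-based schema matching. It is not the authors' method: the function names (`normalize`, `match_schemas`), the sample field names, and the 0.6 similarity threshold are all assumptions made for this example, and a plain string-similarity score from Python's standard `difflib` module stands in for the machine learning models the abstract mentions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a field name and drop separators, so 'Cust_Name' becomes 'custname'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_schemas(source_fields, target_fields, threshold=0.6):
    """Map each source field to its most similar target field.

    A source field is mapped to None when no target reaches the
    (hypothetical) similarity threshold, so it can be reviewed by hand.
    """
    mapping = {}
    for src in source_fields:
        best_target, best_score = None, 0.0
        for tgt in target_fields:
            # ratio() returns a similarity score between 0.0 and 1.0.
            score = SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()
            if score > best_score:
                best_target, best_score = tgt, score
        mapping[src] = best_target if best_score >= threshold else None
    return mapping


# Hypothetical field names from a source system and a master data model.
crm_fields = ["cust_name", "email_addr", "phone_no"]
mdm_fields = ["CustomerName", "EmailAddress", "PhoneNumber", "PostalCode"]

mapping = match_schemas(crm_fields, mdm_fields)
for source, target in mapping.items():
    print(f"{source} -> {target}")
```

Matching on names alone is the simplest possible signal. A practical matcher would likely also compare data types, value distributions, and sample values, and would route low-confidence pairs to a data steward rather than accept them automatically.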

Published

2023-10-30

How to Cite

Mishra S, Konidala S. Automated Data Mapping and Schema Matching For Improving Data Quality in Master Data Management. IJETCSIT [Internet]. 2023 Oct. 30 [cited 2025 Sep. 12];4(3):80-9. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/320
