Impact of Data Versioning on Longitudinal Analytical Model Performance

Authors

  • Ishwarya, Shri Indramathi College, Trichy, India

DOI:

https://doi.org/10.56472/ICCSAIML25-155

Keywords:

Data Versioning, Longitudinal Analysis, Model Performance Drift, Temporal Data, Reproducibility, Predictive Modeling, Data Management, Time-Series, Machine Learning, Data Lineage

Abstract

In data-driven environments where datasets evolve over time, maintaining consistent, high-performing analytical models becomes increasingly critical. This paper investigates the impact of data versioning on the performance of longitudinal analytical models. We examine how changes in data over time, captured through systematic versioning, influence model accuracy, stability, and generalizability. Using real-world longitudinal datasets and a comparative modeling framework, we assess several data versioning strategies and their effects on predictive performance. Our findings reveal that integrating data versioning not only improves reproducibility but also enables more robust handling of performance drift over time. This research offers practical guidance for data scientists and engineers who need to maintain the fidelity of analytical systems in dynamic environments.
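The abstract does not name the versioning tooling, datasets, or models used, so the following is only a minimal sketch under stated assumptions. It assumes content-addressed snapshots (each version identified by a SHA-256 hash of its canonical contents) and a single-feature least-squares model. It shows the general idea of pinning the exact data version a model was trained on and re-scoring that model against every later version so that performance drift becomes visible and reproducible. `DatasetRegistry`, `drift_report`, the quarterly labels, and the synthetic data are hypothetical illustrations, not the authors' framework.

```python
import hashlib
import json
from dataclasses import dataclass, field


def snapshot_id(rows):
    """Content-addressed version ID: SHA-256 of the rows in canonical JSON form."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class DatasetRegistry:
    """Hypothetical in-memory registry of immutable dataset versions."""
    blobs: dict = field(default_factory=dict)    # version_id -> stored rows
    history: list = field(default_factory=list)  # ordered (label, version_id) pairs

    def commit(self, label, rows):
        vid = snapshot_id(rows)
        # Identical content hashes to the same ID, so it is stored only once.
        self.blobs.setdefault(vid, [dict(r) for r in rows])
        self.history.append((label, vid))
        return vid

    def checkout(self, vid):
        # Return a copy so callers cannot mutate a stored version.
        return [dict(r) for r in self.blobs[vid]]


def fit_ols(rows):
    """Ordinary least squares for y = a * x + b (assumes x is not constant)."""
    n = len(rows)
    mx = sum(r["x"] for r in rows) / n
    my = sum(r["y"] for r in rows) / n
    sxx = sum((r["x"] - mx) ** 2 for r in rows)
    sxy = sum((r["x"] - mx) * (r["y"] - my) for r in rows)
    a = sxy / sxx
    return a, my - a * mx


def mae(model, rows):
    """Mean absolute error of a (slope, intercept) model on the given rows."""
    a, b = model
    return sum(abs(r["y"] - (a * r["x"] + b)) for r in rows) / len(rows)


def drift_report(registry, train_vid):
    """Score a model trained on one pinned version against every committed version."""
    model = fit_ols(registry.checkout(train_vid))
    baseline = mae(model, registry.checkout(train_vid))
    report = []
    for label, vid in registry.history:
        err = mae(model, registry.checkout(vid))
        # (label, short version ID, MAE, change in MAE relative to the training version)
        report.append((label, vid[:8], round(err, 3), round(err - baseline, 3)))
    return report


if __name__ == "__main__":
    reg = DatasetRegistry()
    v1 = reg.commit("2023-Q1", [{"x": x, "y": 2 * x + 1} for x in range(10)])
    reg.commit("2023-Q2", [{"x": x, "y": 2 * x + 1} for x in range(10)])    # unchanged data
    reg.commit("2023-Q3", [{"x": x, "y": 2.5 * x + 1} for x in range(10)])  # relationship drifts
    for row in drift_report(reg, v1):
        print(row)
```

In this synthetic run, the first two quarters share one version ID and show zero error, while the third quarter, whose underlying relationship has shifted, gets a new version ID and an MAE of 2.25. Because every evaluation references an immutable version ID, the same comparison can be rerun later and will give the same numbers, which is the reproducibility benefit the abstract describes.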

Published

2025-05-18

How to Cite

Ishwarya. Impact of Data Versioning on Longitudinal Analytical Model Performance. IJETCSIT [Internet]. 2025 May 18 [cited 2025 Sep. 13];:482-91. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/289