Operational Telemetry and Observability in Ingestion Pipelines

Authors

  • Shreyansh Sharma, Jersey City, NJ (Author)

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V7I1P137

Keywords:

Operational Telemetry, Data Pipeline Observability, Ingestion Pipeline Monitoring, OpenTelemetry

Abstract

As contemporary businesses rely increasingly on data-driven decision-making, data ingestion pipelines have become mission-critical infrastructure. Many of them, however, remain operationally opaque and provide little insight into their internal behavior and health. This paper synthesizes an operational telemetry and observability framework for data ingestion pipelines, developed through a systematic review of 18 studies published between 2018 and 2025. It proposes a four-dimensional observability framework comprising infrastructure observability, pipeline process observability, data content observability, and end-to-end lineage observability, which extends the three pillars of metrics, logs, and traces to the specific needs of data-intensive systems. A secondary data collection methodology was used, comprising descriptive statistical aggregation, thematic coding, and cross-study benchmarking against four levels of observability maturity. The results indicate that telemetry-based observability delivers significant and consistent gains across the industry domains and technology stacks covered by the reviewed studies, and that successful implementation depends on both the technical implementation and cultural change within the engineering team.
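As an illustration only, the four observability dimensions and the three pillars named in the abstract can be sketched as a small tagging scheme for telemetry events. This is not the paper's implementation: the class names, the `coverage` helper, and the sample signal names (`cpu_utilization`, `batch_load_span`, `null_ratio`) are all hypothetical and chosen only to show how signals might be classified and how a missing dimension could be spotted.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four observability dimensions listed in the abstract."""
    INFRASTRUCTURE = "infrastructure"
    PIPELINE_PROCESS = "pipeline_process"
    DATA_CONTENT = "data_content"
    LINEAGE = "end_to_end_lineage"


class Signal(Enum):
    """The three pillars: metrics, logs, and traces."""
    METRIC = "metric"
    LOG = "log"
    TRACE = "trace"


@dataclass(frozen=True)
class TelemetryEvent:
    dimension: Dimension
    signal: Signal
    name: str      # hypothetical signal name, for illustration only
    value: object  # payload: a number, a log line, a trace id, ...


def coverage(events):
    """Count emitted signals per dimension; uncovered dimensions report 0."""
    counts = Counter(event.dimension for event in events)
    return {dim.value: counts.get(dim, 0) for dim in Dimension}


# Assumed sample events; none of these names come from the paper.
events = [
    TelemetryEvent(Dimension.INFRASTRUCTURE, Signal.METRIC, "cpu_utilization", 0.62),
    TelemetryEvent(Dimension.PIPELINE_PROCESS, Signal.TRACE, "batch_load_span", "trace-001"),
    TelemetryEvent(Dimension.DATA_CONTENT, Signal.METRIC, "null_ratio", 0.01),
]

print(coverage(events))
# {'infrastructure': 1, 'pipeline_process': 1, 'data_content': 1, 'end_to_end_lineage': 0}
```

In this sketch a zero count for `end_to_end_lineage` marks a dimension with no instrumentation yet, which is one simple way a team might track which parts of such a framework it has covered.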

Published

2026-03-04

Section

Articles

How to Cite

1.
Sharma S. Operational Telemetry and Observability in Ingestion Pipelines. IJETCSIT [Internet]. 2026 Mar. 4 [cited 2026 Apr. 9];7(1):245-53. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/619
