Serverless Cloud Solutions for Scalable and Efficient AI Model Management

Authors

  • Prudhvi Naayini, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V6I2P102

Keywords:

Serverless Computing, AI Model Management, AWS Lambda, Knative, Scalability, Cloud Architecture, Model Orchestration, Cost Optimization, Kubernetes, Security

Abstract

Managing and deploying AI models at scale presents significant challenges, particularly when balancing scalability, cost-efficiency, and operational simplicity. This paper explores the application of serverless cloud architectures to streamline AI model management and deployment. We leverage key technologies, including AWS Lambda, API Gateway, and Kubernetes-based serverless platforms like AWS EKS with Knative, to propose a fully serverless model lifecycle framework. Our approach introduces innovative strategies such as dynamic resource allocation, intelligent model versioning, and event-driven model orchestration. Architectural diagrams and pseudo-code illustrate the seamless integration of these techniques within a cloud-native environment. Through analytical evaluations and simulations using AWS performance and pricing data, we demonstrate how our serverless solution achieves automatic scaling, reduced operational overhead, and consistent low-latency performance. Furthermore, a comprehensive threat model is incorporated to address security and privacy considerations. Real-world case studies covering domains like real-time analytics, recommendation systems, and anomaly detection highlight the practical effectiveness of our framework. The paper concludes by discussing future research avenues, including serverless training pipelines and advanced orchestration strategies.


References

[1] S. Venkataraman, "AI goes serverless: Are systems ready?" ACM SIGARCH Blog, Aug. 2023. [Online]. Available: https://www.sigarch.org/ai-goes-serverless-are-systems-ready/

[2] J. Gu, Y. Zhu, P. Wang, M. Chadha, and M. Gerndt, "FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference," in Proceedings of the 52nd International Conference on Parallel Processing (ICPP), 2023, pp. 635–644. [Online]. Available: https://arxiv.org/abs/2309.00558

[3] AWS Lambda Developer Guide, "Best practices for working with AWS Lambda functions," AWS, 2023. [Online]. Available: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

[4] M. Yu, Z. Jiang, H. C. Ng, W. Wang, R. Chen, and B. Li, "Gillis: Serving large neural networks in serverless functions with automatic model partitioning," in Proceedings of IEEE ICDCS, 2021, pp. 138–148. [Online]. Available: https://ieeexplore.ieee.org/document/9546452

[5] Kubeflow Authors, "What is KServe?" Kubeflow KServe Documentation, Sep. 2021. [Online]. Available: https://www.kubeflow.org/docs/external-add-ons/kserve/introduction/

[6] K. Kojs, "A survey of serverless machine learning model inference," arXiv preprint arXiv:2311.13587, 2023. [Online]. Available: https://arxiv.org/abs/2311.13587

[7] P. Naayini, P. K. Myakala, and C. Bura, "How AI is reshaping the cybersecurity landscape," Available at SSRN 5138207, 2025. [Online]. Available: https://www.irejournals.com/paper-details/1707153

[8] Y. Fu, L. Xue, Y. Huang, A.-O. Brabete, D. Ustiugov, Y. Patel, and L. Mai, "ServerlessLLM: Low-latency serverless inference for large language models," in 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024, pp. 135–153. [Online]. Available: https://arxiv.org/abs/2401.14351

[9] M. Yu, A. Wang, D. Chen, H. Yu, X. Luo, Z. Li, W. Wang, R. Chen, Nie, and H. Yang, "FaaSwap: SLO-aware, GPU-efficient serverless inference via model swapping," in Proceedings of the 2024 IEEE International Conference on Cloud Engineering (IC2E), 2024, pp. 1–12. [Online]. Available: https://arxiv.org/abs/2306.03622

[10] C. McKinnel, "Massively parallel machine learning inference using AWS Lambda," McKinnel.me Blog, Apr. 2021. [Online]. Available: https://mckinnel.me/massively-parallel-machine-learning-inference-using-aws-lambda.html

[11] Gallego, U. Odyurt, Y. Cheng, Y. Wang, and Z. Zhao, "Machine learning inference on serverless platforms using model decomposition," in Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, 2024, pp. 1–6. [Online]. Available: https://dl.acm.org/doi/10.1145/3603166.3632535

[12] P. Naayini, P. K. Myakala, C. Bura, A. K. Jonnalagadda, and S. Kamatala, "AI-powered assistive technologies for visual impairment," arXiv preprint arXiv:2503.15494, 2025.

[13] Bura, "Enriq: Enterprise neural retrieval and intelligent querying," REDAY – Journal of Artificial Intelligence & Computational Science, 2025.

[14] L. Wang, Y. Jiang, and N. Mi, "Advancing serverless computing for scalable AI model inference: Challenges and opportunities," in Proceedings of the 10th International Workshop on Serverless Computing, 2024, pp. 1–6. [Online]. Available: https://dl.acm.org/doi/10.1145/3702634.3702950

[15] R. Rajkumar, "Designing a serverless recommender in AWS," Medium, Jan. 2021. [Online]. Available: https://d-s-brambila.medium.com/designing-a-serverless-recommender-in-aws-fcf2de9a807e

[16] V. Ishakian, V. Muthusamy, and A. Slominski, "Serving deep learning models in a serverless platform," in 2018 IEEE International Conference on Cloud Engineering (IC2E), 2018, pp. 257–262. [Online]. Available: https://arxiv.org/abs/1710.08460

[17] AWS Whitepaper, "Security overview of AWS Lambda," AWS, Nov. 2022. [Online]. Available: https://docs.aws.amazon.com/whitepapers/latest/security-overview-aws-lambda/

[18] J. Duan, S. Qian, D. Yang, H. Hu, J. Cao, and G. Xue, "MOPAR: A model partitioning framework for deep learning inference services on serverless platforms," in Proceedings of the 2024 IEEE International Conference on Cloud Computing (CLOUD), 2024, pp. 1–10. [Online]. Available: https://arxiv.org/abs/2404.02445

[19] S. Kamatala, A. K. Jonnalagadda, and P. Naayini, "Transformers beyond NLP: Expanding horizons in machine learning," Iconic Research and Engineering Journals, vol. 8, no. 7, 2025.

[20] P. K. Myakala and S. Kamatala, "Scalable decentralized multi-agent federated reinforcement learning: Challenges and advances," International Journal of Electrical, Electronics and Computers, vol. 8, no. 6, 2023.

[21] Bäuerle et al., "FedLess: Secure and scalable federated learning using serverless computing," arXiv preprint arXiv:2111.03396, 2021.

[22] T. Wang et al., "Apodotiko: Enabling efficient serverless federated learning in heterogeneous environments," arXiv preprint arXiv:2404.14033, 2024.

[23] Microsoft, "Model training on serverless compute – Azure Machine Learning," 2023. [Online]. Available: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-serverless-compute

[24] Google Cloud, "Serverless machine learning pipelines on Google Cloud," 2023. [Online]. Available: https://cloud.google.com/blog/products/ai-machine-learning/serverless-machine-learning-pipelines-on-google-cloud

[25] P. Patel et al., "Expanding the cloud-to-edge continuum to the IoT in serverless computing," Future Generation Computer Systems, vol. 145, pp. 223–234, 2024.

[26] "AWS Lambda," Wikipedia, 2024. [Online]. Available: https://en.wikipedia.org/wiki/AWS_Lambda

[27] "Amazon Braket – quantum computing service," AWS, 2024. [Online]. Available: https://aws.amazon.com/braket/

[28] S. Kamatala, P. Naayini, and P. K. Myakala, "Mitigating bias in AI: A framework for ethical and fair machine learning models," Available at SSRN 5138366, 2025. [Online]. Available: https://www.ijrar.org/papers/IJRAR25A2090.pdf

Published

2025-04-10

Section

Articles

How to Cite

Naayini P. Serverless Cloud Solutions for Scalable and Efficient AI Model Management. IJETCSIT [Internet]. 2025 Apr. 10 [cited 2025 May 15];6(2):10-2. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/145
