Serverless Cloud Solutions for Scalable and Efficient AI Model Management

Authors

  • Prudhvi Naayini, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9246.IJETCSIT-V6I2P102

Keywords:

Serverless Computing, AI Model Management, AWS Lambda, Knative, Scalability, Cloud Architecture, Model Orchestration, Cost Optimization, Kubernetes, Security

Abstract

Managing and deploying AI models at scale presents significant challenges, particularly when balancing scalability, cost-efficiency, and operational simplicity. This paper explores the application of serverless cloud architectures to streamline AI model management and deployment. We leverage key technologies, including AWS Lambda, API Gateway, and Kubernetes-based serverless platforms like AWS EKS with Knative, to propose a fully serverless model lifecycle framework. Our approach introduces innovative strategies such as dynamic resource allocation, intelligent model versioning, and event-driven model orchestration. Architectural diagrams and pseudo-code illustrate the seamless integration of these techniques within a cloud-native environment. Through analytical evaluations and simulations using AWS performance and pricing data, we demonstrate how our serverless solution achieves automatic scaling, reduced operational overhead, and consistent low-latency performance. Furthermore, a comprehensive threat model is incorporated to address security and privacy considerations. Real-world case studies covering domains like real-time analytics, recommendation systems, and anomaly detection highlight the practical effectiveness of our framework. The paper concludes by discussing future research avenues, including serverless training pipelines and advanced orchestration strategies.


References

[1] S. Venkataraman, "AI goes serverless: Are systems ready?" ACM SIGARCH Blog, Aug. 2023. [Online]. Available: https://www.sigarch.org/ai-goes-serverless-are-systems-ready/

[2] J. Gu, Y. Zhu, P. Wang, M. Chadha, and M. Gerndt, "FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference," in Proceedings of the 52nd International Conference on Parallel Processing (ICPP), 2023, pp. 635–644. [Online]. Available: https://arxiv.org/abs/2309.00558

[3] AWS Lambda Developer Guide, "Best practices for working with AWS Lambda functions," AWS, 2023. [Online]. Available: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html

[4] M. Yu, Z. Jiang, H. C. Ng, W. Wang, R. Chen, and B. Li, "Gillis: Serving large neural networks in serverless functions with automatic model partitioning," in Proceedings of IEEE ICDCS, 2021, pp. 138–148. [Online]. Available: https://ieeexplore.ieee.org/document/9546452

[5] Kubeflow Authors, "What is KServe?" Kubeflow KServe Documentation, Sep. 2021. [Online]. Available: https://www.kubeflow.org/docs/external-add-ons/kserve/introduction/

[6] K. Kojs, "A survey of serverless machine learning model inference," arXiv preprint arXiv:2311.13587, 2023. [Online]. Available: https://arxiv.org/abs/2311.13587

[7] P. Naayini, P. K. Myakala, and C. Bura, "How AI is reshaping the cybersecurity landscape," Available at SSRN 5138207, 2025. [Online]. Available: https://www.irejournals.com/paper-details/1707153

[8] Y. Fu, L. Xue, Y. Huang, A.-O. Brabete, D. Ustiugov, Y. Patel, and L. Mai, "ServerlessLLM: Low-latency serverless inference for large language models," in 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024, pp. 135–153. [Online]. Available: https://arxiv.org/abs/2401.14351

[9] M. Yu, A. Wang, D. Chen, H. Yu, X. Luo, Z. Li, W. Wang, R. Chen, Nie, and H. Yang, "FaaSwap: SLO-aware, GPU-efficient serverless inference via model swapping," in Proceedings of the 2024 IEEE International Conference on Cloud Engineering (IC2E), 2024, pp. 1–12. [Online]. Available: https://arxiv.org/abs/2306.03622

[10] C. McKinnel, "Massively parallel machine learning inference using AWS Lambda," McKinnel.me Blog, Apr. 2021. [Online]. Available: https://mckinnel.me/massively-parallel-machine-learning-inference-using-aws-lambda.html

[11] Gallego, U. Odyurt, Y. Cheng, Y. Wang, and Z. Zhao, "Machine learning inference on serverless platforms using model decomposition," in Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, 2024, pp. 1–6. [Online]. Available: https://dl.acm.org/doi/10.1145/3603166.3632535

[12] P. Naayini, P. K. Myakala, C. Bura, A. K. Jonnalagadda, and S. Kamatala, "AI-powered assistive technologies for visual impairment," arXiv preprint arXiv:2503.15494, 2025.

[13] Bura, "Enriq: Enterprise neural retrieval and intelligent querying," REDAY – Journal of Artificial Intelligence & Computational Science, 2025.

[14] L. Wang, Y. Jiang, and N. Mi, "Advancing serverless computing for scalable AI model inference: Challenges and opportunities," in Proceedings of the 10th International Workshop on Serverless Computing, 2024, pp. 1–6. [Online]. Available: https://dl.acm.org/doi/10.1145/3702634.3702950

[15] R. Rajkumar, "Designing a serverless recommender in AWS," Medium, Jan. 2021. [Online]. Available: https://d-s-brambila.medium.com/designing-a-serverless-recommender-in-aws-fcf2de9a807e

[16] V. Ishakian, V. Muthusamy, and A. Slominski, "Serving deep learning models in a serverless platform," in 2018 IEEE International Conference on Cloud Engineering (IC2E), 2018, pp. 257–262. [Online]. Available: https://arxiv.org/abs/1710.08460

[17] AWS Whitepaper, "Security overview of AWS Lambda," AWS, Nov. 2022. [Online]. Available: https://docs.aws.amazon.com/whitepapers/latest/security-overview-aws-lambda/

[18] J. Duan, S. Qian, D. Yang, H. Hu, J. Cao, and G. Xue, "MOPAR: A model partitioning framework for deep learning inference services on serverless platforms," in Proceedings of the 2024 IEEE International Conference on Cloud Computing (CLOUD), 2024, pp. 1–10. [Online]. Available: https://arxiv.org/abs/2404.02445

[19] S. Kamatala, A. K. Jonnalagadda, and P. Naayini, "Transformers beyond NLP: Expanding horizons in machine learning," Iconic Research and Engineering Journals, vol. 8, no. 7, 2025.

[20] P. K. Myakala and S. Kamatala, "Scalable decentralized multi-agent federated reinforcement learning: Challenges and advances," International Journal of Electrical, Electronics and Computers, vol. 8, no. 6, 2023.

[21] Bäuerle et al., "FedLess: Secure and scalable federated learning using serverless computing," arXiv preprint arXiv:2111.03396, 2021.

[22] T. Wang et al., "Apodotiko: Enabling efficient serverless federated learning in heterogeneous environments," arXiv preprint arXiv:2404.14033, 2024.

[23] Microsoft, "Model training on serverless compute – Azure Machine Learning," 2023. [Online]. Available: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-serverless-compute

[24] Google Cloud, "Serverless machine learning pipelines on Google Cloud," 2023. [Online]. Available: https://cloud.google.com/blog/products/ai-machine-learning/serverless-machine-learning-pipelines-on-google-cloud

[25] P. Patel et al., "Expanding the cloud-to-edge continuum to the IoT in serverless computing," Future Generation Computer Systems, vol. 145, pp. 223–234, 2024.

[26] "AWS Lambda," Wikipedia, 2024. [Online]. Available: https://en.wikipedia.org/wiki/AWS_Lambda

[27] "Amazon Braket – quantum computing service," AWS, 2024. [Online]. Available: https://aws.amazon.com/braket/

[28] S. Kamatala, P. Naayini, and P. K. Myakala, "Mitigating bias in AI: A framework for ethical and fair machine learning models," Available at SSRN 5138366, 2025. [Online]. Available: https://www.ijrar.org/papers/IJRAR25A2090.pdf

Published

2025-04-10

Section

Articles

How to Cite

Naayini P. Serverless Cloud Solutions for Scalable and Efficient AI Model Management. IJETCSIT [Internet]. 2025 Apr. 10 [cited 2025 May 15];6(2):10-2. Available from: https://ijetcsit.org/index.php/ijetcsit/article/view/145
