Silo, Pool, and Bridge for Multi-Tenant RAG: Measuring Isolation, Noisy-Neighbor Effects, and Cost in SaaS Microservices

Ritesh Kumar

doi:10.63282/3050-9246.IJETCSIT-V7I1P106

Authors

Ritesh Kumar, Independent Researcher, Pennsylvania, USA.

Keywords:

Retrieval-Augmented Generation, Multi-tenancy, Tenant isolation, Enterprise SaaS, Vector databases, Embeddings, Access control, Threat modeling, Microservices, Kubernetes, Noisy neighbor effects

Abstract

Multi-tenant Retrieval-Augmented Generation (RAG) lets enterprise SaaS platforms ground large language model outputs in customer-specific data while sharing infrastructure across tenants. This deployment model requires strict tenant isolation across storage, embedding generation, vector indexing, retrieval orchestration, and response construction, and it must achieve that isolation without unacceptable cost or performance variance under mixed workloads. This paper formalizes three isolation patterns for multi-tenant RAG systems (Silo, Pool, and Bridge) and introduces an isolation taxonomy across four planes: the data plane, vector plane, orchestration plane, and LLM plane. It presents a threat model specific to multi-tenant RAG, covering cross-tenant embedding leakage through similarity search, membership inference, retrieval contamination from incorrect scoping or poisoned content, and metadata inference. It specifies a Kubernetes-native reference architecture that implements tenant-aware controls and explicit policy enforcement points across ingestion and retrieval. Finally, the paper defines an evaluation approach for comparing the isolation patterns: leakage testing under adversarial retrieval scenarios, mixed-tenant latency measurements (P50 and P95) to quantify noisy-neighbor effects, cost-per-query decomposition, and operational overhead.
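As an illustration of the leakage testing mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes that a Pool deployment keeps all tenants' vectors in one shared index and scopes every search with a mandatory tenant filter; the abstract does not spell out that detail. All names here (`Chunk`, `PooledIndex`, `leakage_rate`) and the toy two-dimensional vectors are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One indexed passage, tagged with the tenant that owns it (hypothetical type)."""
    tenant_id: str
    text: str
    vector: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PooledIndex:
    """Assumed Pool-style store: one shared index, tenant filter applied on every search."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self._chunks.append(chunk)

    def search(self, tenant_id: str, query: tuple[float, ...], k: int = 3) -> list[Chunk]:
        # Policy enforcement point: refuse unscoped queries outright.
        if not tenant_id:
            raise ValueError("tenant_id is required for every retrieval")
        # Filter before ranking so other tenants' vectors never enter the candidate set.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        ranked = sorted(candidates, key=lambda c: cosine(c.vector, query), reverse=True)
        return ranked[:k]


def leakage_rate(index: PooledIndex, tenant_id: str,
                 probes: list[tuple[float, ...]], k: int = 3) -> float:
    """Fraction of returned chunks that belong to a tenant other than the caller."""
    returned = 0
    leaked = 0
    for query in probes:
        for chunk in index.search(tenant_id, query, k):
            returned += 1
            if chunk.tenant_id != tenant_id:
                leaked += 1
    return leaked / returned if returned else 0.0


# Toy data: two tenants sharing one index.
index = PooledIndex()
index.add(Chunk("tenant-a", "A: pricing policy", (1.0, 0.0)))
index.add(Chunk("tenant-a", "A: onboarding guide", (0.9, 0.1)))
index.add(Chunk("tenant-b", "B: merger memo", (0.0, 1.0)))
index.add(Chunk("tenant-b", "B: salary bands", (0.1, 0.9)))

# Adversarial probes: tenant A queries with vectors that sit next to tenant B's content.
probes = [(0.0, 1.0), (0.1, 0.9)]
rate = leakage_rate(index, "tenant-a", probes)
print(f"cross-tenant leakage rate: {rate:.2f}")
```

In this sketch the filter runs before similarity ranking, so a probe vector that closely matches another tenant's content still returns only the caller's own chunks. A real evaluation would replace the toy index with the system under test and run the same probe-and-count loop against each isolation pattern.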
