LLMs with Retrieval-Augmented Generation Technology – Good or Bad for Privacy Compliance?

Published on 11th November 2025

Retrieval‑Augmented Generation (RAG) makes LLMs more accurate and up‑to‑date, but it also raises GDPR and privacy‑compliance risks. Drawing on EDPB guidance and the DSK’s RAG Guidelines (Oct 2025), this article summarizes the key privacy risks (data minimization, purpose limitation, security and third‑party transfers) and benefits for privacy compliance (accuracy, transparency and lawfulness) and outlines possible risk mitigation measures, such as source due diligence, stringent access control, pseudonymization, minimizing logging and retrieval filters.

What is RAG Technology?

Retrieval-Augmented Generation Technology, or RAG technology for short, is a technique that enables AI systems with large language models (LLMs) to leverage additional information and documents to optimize output. In contrast to training and fine-tuning, such additional information and documents have not been used to train and improve the LLM itself.

Put simply, documents or information retrieved from an external source are fed into the AI system to enhance the accuracy and reliability of the output generated by the LLM. The underlying LLM itself remains unchanged; it receives, however, additional information and context to inform its output.

A typical example of RAG technology is a chatbot that connects via APIs to external databases to provide document search functionality, or that can process a large number of uploaded documents to generate more tailored output.
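The retrieve-then-generate pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the keyword-overlap retriever and the prompt template are simplifying assumptions standing in for a real vector search and a real LLM call.

```python
# Minimal sketch of the retrieve-then-generate (RAG) pattern.
# The naive keyword-overlap retriever and the prompt template are
# illustrative assumptions, not a specific product's API.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Feed the retrieved context to the LLM alongside the user's
    question; the underlying model itself remains unchanged."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "The travel policy allows remote work from EU countries.",
    "Expense reports are due by the fifth working day of each month.",
]
query = "When are expense reports due?"
prompt = build_prompt(query, retrieve(query, docs))
```

The key point for the privacy analysis below is visible in the sketch: whatever the retriever returns, including any personal data it contains, ends up inside the prompt and is processed by the model.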

RAG technology – privacy risks and benefits

The additional information and documents used by RAG technology can, of course, contain personal data. At first glance, this raises privacy concerns: processing more personal data can negatively affect key privacy principles such as data minimization, transparency and lawfulness, and the use of RAG technology may result in data flows to external parties.

The paper AI Privacy Risks & Mitigations for Large Language Models, published in March 2025 by the Support Pool of Experts of the European Data Protection Board (EDPB), discusses, among other things, privacy risks resulting from the use of RAG technology.

But RAG technology can also have a significant positive impact on privacy compliance, as concluded by the German Data Protection Conference (“DSK”), the committee of the German federal and state data protection supervisory authorities, in its guidelines on data protection law issues specific to generative AI systems using RAG technology (“RAG Guidelines”), published on 17 October 2025.

EDPB’s Support Pool of Experts Paper - what are the privacy risks of RAG technology?

According to the EDPB’s Support Pool of Experts paper, the key privacy risks and concerns when using RAG technology relate to the following aspects:

- The external database may contain personal data, including sensitive data, that is subsequently processed by the AI system to generate output without proper safeguards and privacy compliance measures in place. This risk arises particularly if neither the external databases nor their content have been properly identified and assessed before deployment.

- If the RAG technology uses APIs to connect with external databases, prompts containing personal data may be transmitted to third parties without knowledge of any subsequent retention and processing by those third parties.

- Poorly configured retrieval logic can feed irrelevant, misleading or misunderstood context into the AI system, leading to incorrect or hallucinated output.

- Insufficient security measures for personal data transmitted via APIs to external databases, as well as for data processed and stored in the AI system, create risks of unauthorized access or data leaks.

- Personal data may be inadvertently stored in the log files of the RAG-based AI system.

These privacy risks must be addressed through appropriate mitigation measures. The measures vary on a case-by-case basis, but may include:

- proper due diligence of the external sources, their content (including sensitive data) and the APIs used by the RAG technology, ideally restricting retrieval to internal sources with appropriate access right restrictions;
- due diligence on the data retention concepts of any external databases used by the RAG technology;
- anonymizing or pseudonymizing prompts shared with external databases;
- minimizing logging;
- data transfer agreements with third parties;
- regularly evaluating the output of the RAG-based AI system for accuracy and hallucinations;
- transparency measures;
- retrieval filters and configuration to reduce the risk of processing and leaking sensitive data;
- weighing or flagging retrieved content with source information for appropriate context; and
- a robust data rights management system.
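One of the listed measures, pseudonymizing prompts before they are transmitted to an external database, can be sketched as a masking step that keeps the original values inside the controller's environment. The regex patterns and placeholder tokens below are assumptions for illustration; real deployments typically rely on dedicated PII-detection tooling.

```python
import re

# Illustrative sketch: mask common personal-data patterns in a prompt
# before it is sent to an external retrieval API. The patterns and
# placeholder tokens are assumptions; production systems would use
# dedicated PII-detection tooling and a securely stored mapping.

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d ()/-]{7,}\d"),
}

def pseudonymise(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace matches with numbered placeholders and return the
    mapping, so the original values never leave the organization."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(prompt)):
            token = f"<{label}_{i}>"
            mapping[token] = match
            prompt = prompt.replace(match, token)
    return prompt, mapping

masked, mapping = pseudonymise(
    "Contact jane.doe@example.com about invoice 4711."
)
```

The mapping allows placeholders in the retrieved results to be re-substituted locally, which is what distinguishes pseudonymization from anonymization for GDPR purposes.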

DSK RAG Guidelines - what benefits can RAG technology offer for data privacy compliance?

The overarching benefit of RAG technology compared to generative AI systems without RAG technology is the generation of more accurate, dynamic, updated and real-time output by leveraging information and documents in addition to the knowledge base of the underlying LLM.

LLMs are trained on data up to a specific point in time and may lack awareness of recent developments or domain-specific information. The RAG Guidelines of the DSK examine not only the privacy risks, but also the positive impact of such benefits on privacy compliance, in particular with regard to the principles of accuracy, transparency and lawfulness.

Accuracy

The core benefit of RAG technology directly serves the accuracy principle. By leveraging information and data from sources beyond the knowledge base of the LLM itself, the output should be more up-to-date and context-specific, thereby reducing the risk of inaccurate and hallucinated output. According to the DSK, the degree of risk reduction depends significantly on the quality of the additional information and documents, as well as on various technical aspects such as prioritization and subsequent processing by the AI system. Mixing languages across prompts and the additional information and documents should be avoided to maintain accuracy.

Transparency

RAG technology does not inherently increase the overall transparency of an AI system, and transparency obligations regarding the processing of personal data by the RAG-based AI system must be addressed through typical transparency methods. However, if the RAG technology allows identification of the sources for the generated output, greater clarity and explainability are achieved, which has a positive impact on the balancing of interests test.

Principle of Lawfulness

The processing of any personal data by a RAG-based AI system requires a legal basis, and for this analysis, the processing by the RAG components must be taken into account. More importantly, according to the DSK, RAG technology can support a positive balancing of interests test, particularly through the reduced risk of inaccurate or outdated output, the greater clarity and explainability, and the identification of the sources used for the generated output.

Beyond the privacy benefits of RAG technology, the RAG Guidelines also discuss privacy risks stemming from RAG technology, in particular relating to data minimization, purpose limitation, and integrity and confidentiality.

Data Minimization

Given the increased volume of data processed by an AI system through RAG technology, steps must be considered to minimize the processing of personal data. This could be achieved through restrictions on retrieval of information from external sources, retention policies for retrieved information and logs within the AI system, and removal of personal data from retrieved information (e.g., through data transformation steps) prior to further processing by the AI system.

Purpose limitation

To comply with the purpose limitation principle when retrieving personal data from external sources for further processing by RAG-based AI systems, the initial purposes for which such personal data was collected must be adhered to in the context of the AI system, particularly through appropriate access rights concepts and data separation measures. The potential linkage of retrieved personal data with personal data generated by the LLM as output may infringe upon the purpose limitation principle, which represents a common risk for AI systems.
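The access rights concepts and data separation measures mentioned above can be enforced at retrieval time: a document chunk is only eligible if its recorded collection purpose matches the purpose of the current processing and the requesting user is entitled to see it. The purpose tags and role model below are hypothetical, chosen purely to illustrate the pattern.

```python
from dataclasses import dataclass

# Sketch of a retrieval-time filter enforcing purpose limitation and
# access rights. The purpose tags and the role-based entitlement
# model are hypothetical illustrations, not a prescribed scheme.

@dataclass
class Chunk:
    text: str
    purpose: str            # purpose the data was originally collected for
    allowed_roles: set[str]

def eligible(chunks: list[Chunk], processing_purpose: str,
             user_roles: set[str]) -> list[Chunk]:
    """Keep only chunks whose collection purpose matches the current
    processing purpose and that the user is entitled to access."""
    return [
        c for c in chunks
        if c.purpose == processing_purpose and c.allowed_roles & user_roles
    ]

corpus = [
    Chunk("Salary band data ...", purpose="hr_administration",
          allowed_roles={"hr"}),
    Chunk("Product FAQ ...", purpose="customer_support",
          allowed_roles={"support", "hr"}),
]
hits = eligible(corpus, "customer_support", {"support"})
```

Filtering before retrieval, rather than after generation, means out-of-purpose personal data never reaches the model at all.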

Integrity and confidentiality

To comply with the principle of integrity and confidentiality, appropriate access rights concepts must be applied for RAG technology, particularly regarding the content of external sources. These measures may also enable the lawful processing of sensitive data by the RAG-based AI system, provided such sensitive data is not used for further LLM training and remains stored within the external source.

Data subject rights

The DSK acknowledges that compliance with data subject rights relating to the LLM itself remains an unresolved issue, as already explained in the discussion paper of the Hamburg supervisory authority on LLMs and personal data dated 17 July 2024 and in the EDPB Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models dated 17 December 2024. However, the DSK also states that data subject rights can and must at least be complied with regarding both input and output of an AI system, as well as the retrieved information and documents used by the RAG technology.

Key takeaways - balancing risks and benefits of RAG with case-by-case assessments

On the one hand, RAG technology, like any other technology used in the context of AI systems, bears privacy risks. The EDPB's Support Pool of Experts paper AI Privacy Risks & Mitigations for Large Language Models and the RAG Guidelines provide helpful guidance for identifying these particular privacy risks. Nevertheless, a case-by-case risk assessment for each RAG-based AI system remains a compliance necessity.

On the other hand, as illustrated by the DSK’s RAG Guidelines, RAG technology can also help achieve compliance with certain privacy principles, most notably the principle of lawfulness. In particular, the reduced risk of inaccurate or outdated output, the greater clarity and explainability, and the identification of the sources used for the generated output can help tip the scale towards an overriding legitimate interest. Understanding and configuring how RAG technology is implemented for a specific AI system is key to leveraging such benefits in an organization’s privacy compliance assessments.

* This article is current as of the date of its publication and does not necessarily reflect the present state of the law or relevant regulation.
