Implications of the German GEMA ./. OpenAI Judgment for the Life Sciences and Healthcare Sector
Published on 24th November 2025
The District Court of Munich I has issued a landmark ruling on copyright infringement involving AI (11 November 2025, case no. 42 O 14139/24). Although the dispute concerned song lyrics, its reasoning is likely to resonate across all categories of copyright-protected works. What does this mean for the Life Sciences sector?
The widely watched case brought against OpenAI by GEMA, a German collecting society that administers the exploitation rights in musical works assigned to it by composers, lyricists and music publishers, has, for now, reached a conclusion. The District Court issued a detailed holding that OpenAI infringed copyright through the training and operation of ChatGPT insofar as it reproduced song lyrics. This is the first German decision to assess the use of copyright‑protected works by generative AI models under copyright law and to find in favour of rights holders.
Interference with copyright
The court found that the exclusive right of exploitation was infringed through reproduction and publication (in legal terms: “making available to the public”) of the protected song lyrics. To reach this finding, it distinguished three chronological phases of an AI large language model (LLM):
In the first phase, a training dataset is created and converted into a machine-readable format. In phase two, the dataset is analysed and enriched with metadata, and the AI model is trained on it. The third phase involves using the trained model via prompts and receiving outputs.
According to the court, the relevant reproduction takes place primarily in phase two. The court acknowledged that the automated analysis of training data is not a copyright‑relevant act, even where the data contain protected works. But when the AI model is trained on protected works by not only analysing but also memorising them, this constitutes a reproduction, since the works are then contained in the AI model in a way that makes them reproducible through simple prompts. It is thus the memorisation of protected works by the AI model in phase two that constitutes a reproduction.
However, phase three is also relevant, as further infringements can take place when copyrighted works are reproduced by the AI model in its outputs. Despite minor changes, the song lyrics at issue were still recognisable in the outputs and were therefore, in the court’s view, reproduced and made publicly available within the meaning of § 19a UrhG.
Permissibility based on copyright limitations
The court then examined whether the reproduction of protected works by the AI model was permissible under statutory copyright limitations, such as permissible reproduction as an insignificant accessory under § 57 UrhG. Its main focus, however, was the statutory limitation for text and data mining (TDM) under § 44b UrhG, which it ultimately rejected with respect to the reproduction through training (phase two).
In the court’s view, the TDM limitation only permits reproductions of protected works when necessary for compiling the training dataset (phase one). However, it does not allow any further reproductions, especially not the permanent incorporation and memorisation of the protected works in the AI model in a way that makes them reproducible.
The court further held that OpenAI’s interference with GEMA’s exploitation rights was not justified by any consent from the rightsholders, because training AI models is not a customary and foreseeable form of use that a rightsholder must anticipate. This undermines arguments based on implied licences.
Implications for pharma, research and healthcare
Depending on the facts, the findings of this District Court decision could have implications for the life sciences industry, as many life‑science models are trained on copyright‑protected texts or databases. Under the court’s reasoning, the “memorisation” of such information in a model may be treated as a copyright reproduction that is not covered by the text‑and‑data‑mining exception (§ 44b UrhG). That increases risk where models have been trained on such sources and can reproduce longer passages verbatim.
Copyright‑protected works may, for example, include research publications, clinical trial protocols, expert opinions, visualisations of study findings, packaging layouts of healthcare products, standard operating procedures (SOPs), lab manuals, validation reports, verification reports, technical study reports, statistical analysis plans (SAPs), data management plans, case report form (CRF) designs, eCRF user flows, informed consent forms, patient information leaflets, investigator brochures, regulatory submission narratives, regulatory summaries (e.g., CTD Module 2), risk management files, instructions for use (IFUs), scientific posters, and many others. Even if individual pieces of information (e.g., research data) are not protected, their aggregated form may be protected as a database.
If an AI LLM reproduces protected works in a recognisable way in response to simple prompts, some courts could treat that as evidence from which memorisation may be inferred, though whether such an inference is drawn, and what technical proof is required, remains disputed and fact‑specific. The specific TDM exception for scientific research (§ 60d UrhG) typically helps research institutions with a non‑profit mission; corporate R&D is generally not covered. Even in joint projects with universities, the subsequent commercial use of a memorising model remains questionable and, under the judgment’s logic, is likely not covered.
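By way of illustration only, the following minimal sketch shows the kind of output‑side probe that can make such an inference tangible: the model is prompted with the opening of a protected text, and its continuation is compared with the original. Everything here is an assumption for illustration; in particular, `generate()` is a hypothetical stand‑in for a call to whatever model is being tested, and the similarity threshold is arbitrary rather than a legal standard.

```python
import difflib


def generate(prompt: str) -> str:
    """Hypothetical stand-in for an inference call to the model under test."""
    raise NotImplementedError  # replace with a real model call


def memorisation_probe(work: str, prefix_chars: int = 200,
                       threshold: float = 0.8) -> tuple[float, bool]:
    """Prompt the model with the opening of a protected work and measure
    how closely its continuation matches the original continuation.
    The 0.8 threshold is an arbitrary illustration, not a legal test."""
    prefix, continuation = work[:prefix_chars], work[prefix_chars:]
    output = generate(f"Continue this text verbatim:\n{prefix}")
    # Compare only as much of the output as the original continuation is long.
    ratio = difflib.SequenceMatcher(None, output[:len(continuation)],
                                    continuation).ratio()
    return ratio, ratio >= threshold
```

A high ratio on simple prompts of this kind is precisely the sort of behaviour the judgment treats as problematic; whether it suffices as legal proof of memorisation is, as noted above, disputed.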
Importantly, the court considered the model operator, not the user, responsible for copyright‑infringing outputs. For life‑science chatbots, clinical decision‑support tools or internal research assistants, this means that if the system outputs near‑verbatim passages from protected sources, primary liability most probably sits with the provider.
Apart from memorisation, outputs generated by AI models may also raise copyright issues. Whether any given output infringes copyright will turn on its similarity to protected material, the originality of the output, and the availability of exceptions or defences. Life sciences companies that publish AI‑generated results or incorporate them into products should proceed cautiously; in the event of a dispute, they may face demands for takedown or modification and, in some jurisdictions, injunctive relief, damages or an account of profits. The precise remedies and obligations, including any product‑related measures, are highly jurisdiction‑ and case‑dependent.
Conclusion
The Munich District Court’s ruling signals a restrictive approach to AI training and outputs that elevates copyright risk for life‑science organisations operating in Germany and the EU. Even though it is not final, the decision frames (i) model “memorisation” of protected works during training as an infringing reproduction not covered by the text‑and‑data‑mining exception (§ 44b UrhG), and (ii) near‑verbatim outputs as further infringements for which the model operator, not the user, bears primary responsibility. Corporate R&D is unlikely to benefit from the research TDM exception (§ 60d UrhG). Given that many life‑science workflows rely on copyright‑protected materials (from research publications and clinical protocols to IFUs and regulatory summaries), companies should treat model training on such sources as an activity requiring a licence and assume liability exposure if protected content can be reproduced by simple prompts. However, the practical and legal effects of the judgment are still unclear. It remains to be seen whether the District Court’s assessment will be confirmed on appeal, by other national courts and by the CJEU.
Practical takeaways
- Adopt a conservative posture until appellate or EU‑level clarity: secure training licences that expressly cover fixation in the model and potential reproductions; respect TDM opt‑outs.
- Strengthen data governance and source curation; avoid paywalled or restricted corpora without clear rights; be mindful that curated databases may carry sui generis protections.
- Minimise memorisation and verbatim‑output risks (deduplication, regularisation, output filters, thresholds for near‑verbatim strings; a sketch of such a filter follows this list) and prefer retrieval‑based designs with licensed sources.
- Build contractual protections with AI vendors (training‑data warranties, compliance with TDM, output controls, indemnities) and separate research prototypes from commercial products.
- Implement pre‑publication/product review for AI‑generated content and an incident response plan for takedown/modification demands.
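To make the “thresholds for near‑verbatim strings” point above concrete, the sketch below shows one simple form such an output filter could take: word‑level n‑gram overlap between a candidate output and a corpus of protected sources. The n‑gram size and the blocking threshold are illustrative assumptions, not values endorsed by the court or by any standard.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Word-level n-grams; n = 8 is an arbitrary illustrative window."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def near_verbatim_overlap(output: str, protected_corpus: list[str],
                          n: int = 8) -> float:
    """Fraction of the output's n-grams found verbatim in any protected
    source; a crude proxy for near-verbatim reproduction."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    corpus = set().union(*(ngrams(doc, n) for doc in protected_corpus))
    return len(out & corpus) / len(out)


def release_output(output: str, protected_corpus: list[str],
                   threshold: float = 0.2) -> str:
    """Suppress outputs whose overlap exceeds the (illustrative) threshold
    before they reach the user."""
    if near_verbatim_overlap(output, protected_corpus) >= threshold:
        raise ValueError("Output suppressed: near-verbatim overlap with "
                         "protected sources above threshold.")
    return output
```

A lexical filter of this kind catches verbatim and near‑verbatim strings but not paraphrases, which is why the list above also points to deduplication at training time and to retrieval‑based designs built on licensed sources.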
This insight was created with the contribution of Alexa Gablenz.