The European Data Protection Board (EDPB) recently adopted Opinion 28/2024, addressing data protection concerns in the development and deployment of AI models. The Opinion tackles three key questions: whether AI models trained on personal data can be considered anonymous, how legitimate interests can be relied on as a legal basis for the processing, and what impact unlawful data use during AI model development has on the model's later use.
Anonymity in AI Models: A Case-by-Case Analysis
The EDPB clarifies that AI models trained on personal data cannot be considered anonymous in all cases; whether a given model is anonymous must be assessed case by case. Supervisory authorities should consider three elements to determine whether an AI model is anonymous:
- Assess the likelihood of model extraction and data extraction: This needs to be considered in light of the WP29 Opinion 05/2014 on Anonymisation Techniques. If it is not possible to single out individuals, link records relating to an individual, or infer information about an individual from a dataset, the data may be considered anonymous. AI models will require a thorough evaluation of the risks of identification.
- Take into account all the means reasonably likely to be used by the controller or another person to identify individuals: This determination should be based on objective factors, such as the context in which the AI model is released and/or processed, and the cost and amount of time a person would need to obtain the additional information required to identify an individual.
- Consider whether controllers have assessed the risk of identification by the controller itself and by different types of other persons: This includes unintended third parties who may gain access to the AI model.
The EDPB also provides a list of elements to assess a controller’s claim of anonymity, which is expected to be backed by appropriate documentation such as data protection impact assessments.
We discussed whether training data is retained in an AI model in identifiable form in a podcast on AI compliance with Kat James, Technical Director at Faculty. In it, Kat noted that while training data is not technically stored within an AI model, LLMs and other neural networks are composed of vast, intricate networks of parameters, and in certain circumstances these parameters can be used to reverse-engineer and reconstruct elements of the training data.
Legitimate Interests in AI Development
The Opinion outlines a three-step test for organisations relying on legitimate interests under the GDPR:
- Define the interest: The interest must be lawful, clearly and precisely articulated, and real and present rather than speculative (e.g. improving threat detection systems).
- Assess the necessity: Data processing should be necessary for the purpose of the legitimate interest pursued, and there should be no less intrusive way of pursuing the purpose. The intrusiveness may vary depending on whether the controller has a direct relationship with the data subjects (first-party data) or not (third-party data).
- Balance of interests: The legitimate interests pursued must not be overridden by data subjects’ interests or fundamental rights and freedoms (e.g. financial interests, personal benefits, freedom of expression), taking into account the impact of the processing on data subjects, their reasonable expectations and any mitigating measures.
The EDPB provides examples of mitigating measures for the development and deployment of AI models, such as pseudonymisation and offering an unconditional ‘opt-out’ from the outset. For web scraping specifically, the Opinion lists further measures, such as limiting what is collected (e.g. ensuring that certain categories of data are not collected).
Unlawful Data Use: Ripple Effects on Deployment
The Opinion considers three scenarios in which personal data is unlawfully processed during AI model development:
- If the unlawfully processed personal data is retained in the model and subsequently processed by the same controller, the unlawfulness of the initial processing may affect the lawfulness of the subsequent processing, depending on whether the development and deployment phases serve separate purposes.
- If the unlawfully processed personal data is retained in the model and processed by another controller during deployment, the controller deploying the model should conduct an appropriate assessment, as part of its accountability obligations, to ascertain whether the model was developed by unlawfully processing personal data.
- If the model is anonymised before deployment, so that its further use does not involve processing personal data, the GDPR does not apply to that further use and the unlawfulness of the initial processing should not affect it. Supervisory authorities will still need to assess whether the model is in fact anonymous, in line with the considerations set out above.
How Does the ICO’s GenAI Consultation Response Compare?
As discussed in our previous blog post, the UK Information Commissioner’s Office (ICO) has recently published its response to its consultation series on how data protection law should apply to generative AI models. The EDPB’s guidance, while not focused specifically on generative AI, touches on large language models and their data protection implications, particularly where personal data is processed on the basis of legitimate interests. Both the ICO and the EDPB stress the importance of measures that allow individuals to exercise their rights, and both emphasise transparency. Although the EDPB’s guidance appears more detailed and prescriptive, it remains uncertain whether the recommended measures will be feasible for AI practitioners to implement in practice.
Final Thoughts
The EDPB’s Opinion sets out important considerations for data protection in AI, in particular on model anonymity, reliance on legitimate interests and the consequences of unlawful processing during development. While it provides valuable guidance, its impact will depend on how effectively organisations can translate its principles and measures into actionable practices.