Digital developments in focus

Transparency and generative AI – opening the black box?

As generative AI continues to dominate headlines, one aspect that governments and regulators across the world are trying to grapple with is how best to balance the rights of AI developers and IP rights holders. With real-world disputes starting to come before the courts (see here) and generative AI continuing to develop at an increasingly rapid pace, finding the right balance is crucial.

In seeking to find that balance, the European Parliament’s Internal Market and Civil Liberties committees have proposed adding into the EU’s draft Artificial Intelligence Act (“AI Act”) new transparency requirements on providers of generative AI. If adopted, these will require those providers to document and publish summaries of copyright-protected data used to train their models, potentially opening the AI black box.  

Whilst the question of IP infringement in the context of generative AI extends beyond copyright (e.g. to database rights and trade marks), we focus on these proposals from a copyright perspective below.

TDM and copyright infringement

As readers of this blog will know, training AI systems (including generative AI) often involves a process known as text and data mining (“TDM”). This involves feeding huge amounts of data into a computer system so that the system can analyse the data and spot patterns, trends and correlations. The data and information used in these training processes often include things like text, artworks, photographs and music, which may be protected by copyright. Use of such copyright works for TDM without a licence may amount to copyright infringement, so there has been a lot of focus recently on potential exceptions that AI developers could rely on to avoid liability.

UK and EU look to take different approaches

Last year, the UK Intellectual Property Office indicated that it would adopt a broad new exception, which would allow TDM for any purpose (including commercial purposes) with no ability for rights holders to opt out or contract out. However, following significant backlash from those in the creative industries, those proposals have now been scrapped. Instead, the latest proposal is a new code of practice that will be aimed at supporting AI companies to access copyright-protected materials for use in training data, whilst still considering the rights of IP owners (see further here).

The EU, in contrast, already has an exception which allows TDM for commercial purposes, but this is subject to rights holders having the ability to opt their content out (e.g. by machine-readable means, such as through metadata, or website terms and conditions, for online content). Whilst the position in the EU arguably strikes a better balance than that currently in play in the UK, it is by no means perfect. For example, how should rights holders best express their opt out? And, even if they do it correctly, given the black box nature of many AI systems, how can they be certain that their content has not been used in breach of their opt out?
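What counts as a valid "machine-readable" opt-out is still being worked out in practice. One emerging convention is the W3C TDM Reservation Protocol (TDMRep), under which a web page can signal an opt-out via an HTML meta tag (the protocol also allows HTTP headers and a site-wide `/.well-known/tdmrep.json` file). As a purely illustrative sketch (the `tdm_opted_out` helper below is our own, not part of any standard or library), a TDM crawler might check a page for that tag like this:

```python
from html.parser import HTMLParser


class TDMReservationParser(HTMLParser):
    """Detects <meta name="tdm-reservation" content="1"> in a page's HTML.

    Under the W3C TDMRep Community Group proposal, content="1" signals
    that text-and-data-mining rights are reserved (i.e. opted out).
    """

    def __init__(self):
        super().__init__()
        self.reserved = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") == "tdm-reservation"
                    and attr_map.get("content") == "1"):
                self.reserved = True


def tdm_opted_out(html: str) -> bool:
    """Return True if the page declares a TDMRep opt-out via meta tag."""
    parser = TDMReservationParser()
    parser.feed(html)
    return parser.reserved
```

On this approach, a compliant miner would skip any page for which `tdm_opted_out` returns `True` — though, as noted above, whether such signals are consistently expressed by rights holders and honoured by developers is exactly the uncertainty driving the transparency debate.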

These sorts of questions have led many rights holders to push for greater transparency.

Cue the European Parliament’s new transparency proposals

On Thursday 11 May, the European Parliament’s Internal Market Committee and Civil Liberties Committee adopted a draft negotiating mandate, which contained several proposed amendments to the European Commission’s original draft of the AI Act. Whilst the AI Act is not copyright-focussed, the proposed amendments include a few transparency proposals for generative AI that are worthy of note. In particular, providers of generative AI would need to:

  • register their model in a new public database, together with a description of the data sources used in the development of the model;
  • publish “a sufficiently detailed summary of the use of training data protected under copyright law”;
  • design their models in such a way as to “ensure adequate safeguards against the generation of content in breach of Union law”; and
  • flag where content has been generated by AI.

These obligations aren’t toothless, either - failure to comply could result in administrative fines of up to €10 million or 2% of total worldwide annual turnover, whichever is higher.

Next steps

This draft negotiating mandate will need to be endorsed by the full European Parliament before negotiations with the Council on the final form of the AI Act can begin. The Parliamentary vote is expected to take place between 12 and 15 June, with the trilogue to follow shortly after. Whether these latest proposals will survive that process, and if so in what form, remains to be seen.


In a world in which generative AI is on everyone’s minds and in which IP disputes are starting to come before the courts, it’s good to see that lawmakers are beginning to consider in earnest how best to regulate it.

Whilst the current copyright position under EU law is more favourable to generative AI developers than that in the UK (with the UK’s existing TDM exception only covering TDM for non-commercial research), it is not without fault. The proposed transparency obligations may go some way to alleviating some of the issues. But clarity on the scope of the obligations will need to be provided if they are to have the desired effect. For example, what does “a sufficiently detailed summary” look like? If this requires developers to disclose granular details of copyright-protected data used for training, it is difficult to see how, practically, developers would comply with these obligations in their current form, given that there will likely be millions of data points involved in the training of these tools. If, however, the summary is to be provided at a much higher level of abstraction, it may not achieve its stated aim of providing transparency. (Alternatively, the proposed language could be read as requiring a summary of the “use” made of copyright-protected data, as opposed to a summary of the data itself, but the press announcement suggests this isn’t the intention.)

Questions also arise as to what is meant by “adequate safeguards against the generation of content in breach of Union law” and what developers would be required to do to comply.

It will be fascinating to see how this, and the UK proposals, develop over the next few months. We’ll keep you posted!

“Generative foundation models, like GPT, would have to comply with additional transparency requirements, like disclosing that the content was generated by AI, designing the model to prevent it from generating illegal content and publishing summaries of copyrighted data used for training.”


ai, emerging tech, ip