16 July 2025

Mining the copyright chapter of the GPAI Code

Richard Barker

Senior Knowledge Lawyer

The final version of the EU’s General Purpose AI (GPAI) Code of Practice was published on 10 July, after a two-month delay.

As we discussed in our recent podcast, that delay was caused in part by difficulties in getting both AI developers and those in the creative industries comfortable with the copyright aspects of the Code (see our earlier blog).

Now that we have the final version, the big question (at least for an IP lawyer like me!) is whether any changes have been made to the copyright chapter to address the concerns raised. If so, how far do they go and in whose favour?

We take a closer look below.

What is the GPAI Code?

The GPAI Code is a voluntary code of practice, written by independent experts with stakeholder engagement, that is designed to help GPAI model providers demonstrate compliance with their obligations under Articles 53 and 55 of the EU AI Act.

Article 53(1)(c) is particularly important from an IP perspective, as it requires providers of GPAI models that are placed on the EU market to put in place a policy to comply with EU copyright law, including to identify and comply with rights holders’ opt-outs from the EU’s text and data mining (TDM) exception under Article 4(3) of the Copyright in the Digital Single Market Directive (DSM Directive). See our earlier blog for further detail.

The final version of the Code is made up of three chapters, one of which is on copyright. That chapter sets out a number of measures that signatories to the Code agree to implement in order to demonstrate compliance with Article 53(1)(c).

What does the copyright chapter say?

The final version of the copyright chapter contains five measures – one fewer than the previous version.

Under the first measure, signatories agree to draw up, keep up-to-date and implement a policy to comply with EU copyright law for all GPAI models they place on the EU market. Whilst signatories are encouraged to make a summary of their policy publicly available, that is not mandatory.
The second measure regulates the mining of web-crawled content and aims to ensure that signatories only reproduce and extract lawfully accessible works for training purposes. This includes commitments not to circumvent technological measures (e.g. paywalls and subscription barriers) and to exclude certain piracy-focussed domains from their web-crawling.
The third measure sets out signatories’ commitments for identifying and complying with rights holders’ TDM opt-outs. This includes employing web-crawlers that read and follow instructions expressed in accordance with the Robot Exclusion Protocol; engaging with rights holders to develop machine-readable standards for expressing a rights reservation; and publishing information about the web-crawlers used, their robots.txt features and any other measures adopted to identify and comply with rights reservations.
The fourth measure sets out commitments to mitigate the risk that a downstream AI system generates infringing outputs. That includes implementing technical safeguards to prevent GPAI models from generating outputs that reproduce copyright-protected training material, as well as prohibiting copyright infringing uses in signatories’ acceptable use policies and terms and conditions.
Under the fifth and final measure, signatories commit to designate a point of contact for affected rights holders to communicate with and to put in place a mechanism to allow affected rights holders to lodge complaints about signatories’ non-compliance with the copyright chapter of the Code.
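
To make the robots.txt commitment in the third measure more concrete, here is a minimal sketch of how a crawler might check a rights holder's instructions before fetching a page. It uses Python's standard-library robots.txt parser. The crawler name "ExampleAIBot", the example.com URLs and the robots.txt content are all hypothetical, and the Code itself does not prescribe any particular implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a rights holder might publish.
# "ExampleAIBot" is an invented crawler name used for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text so the sketch needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/x"):
            print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

In this sketch the rights holder has excluded the AI crawler from the whole site while leaving most of it open to other agents. A real crawler would fetch the live robots.txt file for each domain and, under the Code, would also need to honour other rights reservation protocols.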

What has changed since the last draft?

Whilst the general structure of the copyright chapter remains largely intact, there have been a number of material changes to the content. Many of those changes favour rights holders, but not all. We’ve highlighted five key changes below.

Measure I.2.4 in the third draft of the Code – which concerned the mining of protected content not web-crawled by the signatory – has been deleted in its entirety. That measure required signatories to carry out certain due diligence on the content in question, in particular to try to find out whether that content was scraped using crawlers that respect the Robot Exclusion Protocol. With that measure removed, the Code no longer contains any commitments covering such content.
There is much greater focus in the final version on the differences between compliance with the Code and compliance with EU copyright law. These are two separate things. Whilst compliance with the Code will help GPAI model providers to demonstrate their compliance with Article 53(1)(c) of the AI Act, it will not necessarily equate to compliance with EU copyright law. Indeed, the final version makes it clear that the Code has no effect on the application and enforcement of EU copyright law.
The strength of a number of the commitments being given by signatories has been upgraded from “best” or “reasonable” efforts to an absolute commitment. This includes the commitments to exclude certain piracy-focussed domains from web-crawling and to identify and comply with certain rights reservation protocols other than robots.txt.
The scope of the commitment to prohibit copyright infringing uses of a GPAI model has been extended, with signatories now being required to alert users of open source models to the fact that copyright infringing uses are prohibited. Previous drafts had expressly stated that this commitment didn’t apply to such models.
Signatories will give stronger commitments to act on any complaints received from rights holders. The third draft didn’t contain any express commitments to act on complaints but, under the final version, signatories must act “in a diligent, non-arbitrary manner and within a reasonable time, unless a complaint is manifestly unfounded or the Signatory has already responded to an identical complaint by the same rightsholder.”

Comment

As my colleague, Nat, has mentioned, the Code has had a somewhat bumpy journey so far. Whether these changes to the copyright chapter have sufficiently addressed the concerns raised by both model providers and rights holders remains to be seen. But initial indications suggest significant concerns remain and so, given the voluntary nature of the Code, it will be interesting to see how many model providers actually sign up.

Perhaps with that in mind, the Commission’s related FAQs appear to seek to allay concerns about model providers signing up and then being immediately penalised for not fully complying. They state that if such providers “do not fully implement all commitments immediately after signing the Code, the AI Office will not consider them to have broken their commitments under the Code and will not reproach them for violating the AI Act. Instead, in such cases, the AI Office will consider them to act in good faith and will be ready to collaborate to find ways to ensure full compliance”. Given the importance to the Commission of model providers signing up to the Code, and concerns that some of the big providers may not, some commentators have suggested that this assurance is intended as an incentive to bring model providers on board.

However, with a recent European Parliament report questioning whether generative AI training even qualifies as TDM, and arguing that the exception in Article 4(3) of the DSM Directive does not extend to generative AI training, we may be about to enter a whole new stage of this debate.