As we’ve previously reported on this blog, the third draft of the EU’s General-Purpose AI Code of Practice was published on 11 March.
Whilst the majority of the commitments set out in this voluntary Code only apply to providers of general-purpose AI (GPAI) models with systemic risk, there are two commitments that apply to all providers of GPAI models that are placed on the EU market. One of those commitments (Commitment I.2) focuses on copyright and aims to assist GPAI model providers in complying with their copyright-related obligations under Article 53(1)(c) of the EU AI Act. That Article requires such providers to put in place a policy to comply with EU copyright law, including to identify and comply with rightsholders’ opt-outs from the EU’s text and data mining (TDM) exception under Article 4(3) of the EU DSM Directive. See our earlier blog for further detail.
As we await the fourth and final version, which is due to be published by 2 May, we take a closer look at the latest version of the Code’s copyright commitment and how it differs from the previous draft (which we discussed here).
What does the latest draft say about copyright?
In the latest version, the copyright commitment has been streamlined, reducing the number of sub-measures from eleven to six.
- The first measure explains what it means to “put in place” a policy to comply with Union copyright law, highlighting that this means “draw up, keep up-to-date and implement” such a policy. It also clarifies that the policy must address the other five measures mentioned in the copyright commitment. Signatories are encouraged to make a summary of their copyright policy publicly available, although this is no longer mandatory.
- The second and third measures regulate the mining of web-crawled content. They set out what signatories will need to do to identify and comply with rightsholders’ TDM opt-outs (such as respecting the Robots Exclusion Protocol, i.e. robots.txt, and making best efforts to identify and comply with other appropriate expressions of rights reservation). They also oblige signatories not to circumvent technological measures (e.g. paywalls) and to make reasonable efforts to exclude piracy-focused domains from their web-crawling.
- The fourth measure concerns the mining of protected content not web-crawled by the signatory (e.g. where third-party datasets are used).
- The fifth measure sets out commitments to mitigate the risk that a downstream AI system repeatedly generates infringing output. These include making “reasonable efforts to mitigate the risk that a model memorizes copyrighted training content to the extent that it repeatedly produces copyright-infringing outputs”, and prohibiting copyright-infringing uses in signatories’ acceptable use policies and terms and conditions (with a carve-out for GPAI models that are released under a free and open-source licence).
- The final measure requires signatories to designate a point of contact for affected rightsholders to communicate with and to put a mechanism in place to allow affected rightsholders to lodge complaints about signatories’ non-compliance with their copyright commitments under the Code.
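For readers less familiar with how a robots.txt opt-out works in practice, the check a crawler performs before fetching a page can be sketched in a few lines of Python using the standard library's `urllib.robotparser`. This is only an illustration under stated assumptions: the crawler name `ExampleAIBot`, the helper `may_crawl` and the sample robots.txt content are all hypothetical, and passing this check says nothing about other forms of rights reservation or about compliance with the Code or with EU copyright law.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightsholder: it blocks one
# (made-up) AI crawler entirely and keeps /private/ off-limits to all others.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print(may_crawl(ROBOTS_TXT, "ExampleAIBot", page))
    print(may_crawl(ROBOTS_TXT, "OtherBot", page))
```

In a real crawler the robots.txt content would be fetched from the site being crawled rather than hard-coded, and the result would be only one input into a wider opt-out check.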
Key changes made between second and third drafts
The third draft is certainly more streamlined than the second, but does retain the majority of the core elements.
The key performance indicators that previously accompanied each measure have been removed. Some of these have been incorporated into the main body of the measures, whilst others have been lost.
Another obvious change is that the recitals to the copyright section now explicitly state that this part of the Code is without prejudice to the application and enforcement of EU copyright law. Whilst this doesn’t change anything substantively, it does reinforce the point that compliance with the Code will not necessarily result in compliance with EU copyright law. To give just one example, under the third measure of the copyright section, signatories must comply with the Robots Exclusion Protocol and make “best efforts” to comply with “other appropriate machine readable protocols to express rights reservations”. The need to comply with TDM opt-outs in Article 4(3) of the EU DSM Directive is, however, absolute, not “best efforts”. So if, for example, signatories are training their models within the EU, compliance with the Code may not be enough to enable them to rely on the Article 4(3) exception.
A number of other material changes have been made in the latest version, including:
- limiting the commitments relating to rightsholders’ ability to lodge complaints – previously this covered complaints relating to the unauthorised use of rightsholders’ works in training the signatory’s GPAI model; that has been limited to complaints concerning non-compliance with the copyright section of the Code; and
- limiting the scope of the commitments relating to the use of third-party datasets. The second draft stated that signatories would make reasonable efforts to assess the copyright compliance of third-party datasets, including by carrying out copyright due diligence before entering into contracts with third parties for use of datasets in developing GPAI models and, for private non-publicly accessible datasets, making reasonable and proportionate efforts to obtain assurances from each third party about their compliance with EU copyright law. This has been limited to making reasonable efforts to obtain “adequate information” on whether such content was collected using web-crawlers that respect the Robots Exclusion Protocol.
Next steps
Working group participants and observers of the Code of Practice Plenary had until 30 March to submit written feedback on the third draft.
The final version of the Code is due to be published by 2 May, before the AI Act’s rules on GPAI models start to apply on 2 August.