Fill out the consultation here: https://ipoconsultations.citizenspace.com/ipo/consultation-on-copyright-and-ai/consultation/
4. Do you agree that option 3 - a data mining exception which allows right holders to reserve their rights, supported by transparency measures - is most likely to meet the objectives set out above?
No.
Rights holders should not be required to reserve their rights. Opt-out schemes like this don’t give meaningful control to rights holders: they can’t opt out of the use of downstream copies of their works, and models trained on their works often continue to be used long after they have opted out. What’s more, it is hugely unfair to require that rights holders opt out: most will not hear about the scheme or know how to use it, and even those who do will face a huge administrative burden. AI companies should instead be required to get opt-in consent from rights holders.
As an example of the issue of downstream copies, consider a composer whose works are distributed by a music publisher. A choir buys the composer’s sheet music and records it. That recording is broadcast on the radio. Can the composer opt out of the recording being used for AI training? Of course not. From the moment they publish the piece, they have no control whatsoever over who uses it, or for what purpose. And this applies to a million pieces of media, across the creative industries. You can only opt out where you control your work – but, most of the time, you don’t. The best automatic content recognition systems don’t come close to solving this problem.
Even if you could somehow solve the downstream copy problem, requiring rights holders to opt out is hugely unfair on them. In general, fewer than 10% of people eligible to opt out actually do so; this is the figure we have from one of the only generative AI opt-out schemes whose opt-out numbers have been shared publicly (https://musically.com/2023/09/13/stable-diffusion-maker-launches-stable-audio-text-to-music-ai/). Yet polls show that around 90% of creators object to their works being used without a licence (e.g. 95% of artists say training requires consent, and 94% that it requires payment - https://cdn.dacs.org.uk/uploads/documents/News/Artificial-Intelligence-and-Artists-Work-DACS.pdf; similar numbers appear in other countries - https://www.apraamcos.com.au/about/supporting-the-industry/research-papers/aiandmusic; and the figures are roughly the same in every poll of creators on this question). The only plausible explanation for this huge gap is that people don’t realise they can opt out, and/or don’t understand how to.
There is more detail provided on the insurmountable problems of generative AI opt-outs in this essay: https://ed.newtonrex.com/optouts.
5. Which option do you prefer and why?
Option 1: Strengthen copyright requiring licensing in all cases
To be clear, I don’t think the law needs to be updated, as it is already clear: commercial generative AI training requires licensing. The Copyright, Designs and Patents Act 1988 provides a text and data mining (TDM) exception only for non-commercial research, in section 29A (https://www.legislation.gov.uk/ukpga/1988/48/section/29A). This is widely accepted in legal circles; for instance, when discussing exceptions to liability for copyright infringement in generative AI training, Clifford Chance says “UK law currently permits "text and data analysis" only for non-commercial research (s. 29A, CDPA)” (https://www.cliffordchance.com/insights/resources/blogs/talking-tech/en/articles/2023/04/ai-generated-music-and-copyright.html). Even British AI companies don’t seem to argue that unlicensed training is legal: in the only UK copyright lawsuit over generative AI training, Getty v Stability, Stability’s defence to the claim of copying during training is simply that it didn’t copy the images in question, or do the training, in the UK - not that doing so would have been legal (https://drive.google.com/file/d/1HIS9jM6iMRYlTDmaz8LkL30phoF8hXjg/view).
But there is no harm in strengthening copyright law. The requirement to license must certainly not be weakened, because generative AI competes with the work it’s trained on, and with the people behind that work. There is copious evidence of this. Data in the Harvard Business Review shows that the introduction of ChatGPT decreased writing jobs by 30% and coding jobs by 20%, and that AI image generators decreased image creation jobs by 17% (https://hbr.org/2024/11/research-how-gen-ai-is-already-impacting-the-labor-market). Some artists’ income fell by a third after Midjourney was trained on their work (https://podcasts.apple.com/us/podcast/104-generative-ai-is-it-creative-or-just-copying-the/id1225077306?i=1000643488947, 7:30). Image generation jobs have fallen rapidly compared to manual-intensive jobs since the introduction of AI image generators (https://www.ft.com/content/185e2e9d-2642-4b2b-b2e0-99751841b07a). Filmmakers are abandoning human-composed music in favour of AI music (https://techcrunch.com/2024/09/19/indian-filmmaker-ram-gopal-varma-abandons-human-musicians-for-ai-generated-music/). But data is hardly needed - the fact that generative AI competes with its training data is self-evident. Introducing a copyright exception of any sort here would mean British creators’ work could legally be used to build highly scalable competitors to them, without their permission.
What’s more, it would damage the growing market for licensing training data: AI companies would gain so much new, legal access to training data - including many works not opted out simply because their rights holders were unaware of the rights reservation mechanism - that they would have less need to acquire further data through licensing. And the data licensing market has grown rapidly. Twenty-one agreements have been announced between AI companies building generative AI models and major rights holders that are known to cover generative AI training; a further eight have been announced with strong suggestions that they cover generative AI training, even though specifics have not been disclosed; yet more give at least some sign of covering generative AI training, even if specifics are closely guarded; Thomson Reuters is known to have licensed content to an unknown number of AI companies; and AI music company Jen has licensed at least 40 music catalogues for AI training.
In reality, the number of agreements covering AI training is likely far greater than listed above, since it is probable that many agreements exist that have not been announced; according to Reuters, in “the opaque AI data market, [...] companies often don’t disclose agreements” (https://www.reuters.com/technology/inside-big-techs-underground-race-buy-ai-training-data-2024-04-05/).
As well as media companies, individuals are also licensing their works to AI companies for training: Bloomberg reports that several tech companies are licensing unpublished videos from YouTube creators (https://www.bloomberg.com/news/articles/2025-01-10/youtubers-are-selling-their-unused-video-footage-to-ai-companies?embedded-checkout=true).
Beyond known licensing agreements, multiple rights holders are on record saying they are willing to license to AI companies on the right terms, or are known to have entered negotiations. This includes music companies Sony Music Group (https://www.sonymusic.com/sonymusic/declaration-of-ai-training-opt-out/), Warner Music Group (https://www.wmg.com/wp-content/uploads/2024/07/WMG-Statement-Regarding-AI-Technologies.pdf), and Merlin (https://merlinnetwork.org/merlins-position-on-ai/); book publisher Wiley (https://www.publishersweekly.com/pw/by-topic/industry-news/industry-deals/article/96248-wiley-creates-ai-partnership-program.html); image and video hosting website Photobucket (https://www.reuters.com/technology/inside-big-techs-underground-race-buy-ai-training-data-2024-04-05/); sound effects and music company Soundsnap (https://www.soundsnap.com/music-audio-dataset-machine-learning); Spanish-language podcast producer Sonoro (https://www.hollywoodreporter.com/business/business-news/media-ai-startup-avail-corpus-monetization-product-1235945378/); and short-form video network Mad Realities (https://www.hollywoodreporter.com/business/business-news/media-ai-startup-avail-corpus-monetization-product-1235945378/).
And, even where licensing agreements are not public, it is known that negotiations for licensing have at least been entered. Notably, Apple is known to have opened negotiations for multiyear AI training deals worth at least $50M in late 2023 (https://www.nytimes.com/2023/12/22/technology/apple-ai-news-publishers.html).
Finally, polls show willingness among creators to license their work to AI companies for training. In a poll of 1,000 artists in 2023 run by DACS, 84% said they would sign up for a licensing mechanism to be paid when their work is used by AI (https://cdn.dacs.org.uk/uploads/documents/News/Artificial-Intelligence-and-Artists-Work-DACS.pdf).
There is more detail on the AI training data licensing market in this essay: https://ed.newtonrex.com/ai-licensing-market
It is important to note that a number of AI companies in fact do license all of their training data. Some of these can be seen here: https://www.fairlytrained.org/certified-models
And licensing training data isn’t the only way to get it. Licensed data can be combined with work that’s in the public domain - that is, work that copyright doesn’t protect, such as older creative works whose copyright has expired. There’s a dataset of public domain text called Common Corpus that contains 500 billion words, which is roughly the size of the entire dataset that was used to train OpenAI’s GPT-3 model (https://thealliance.ai/blog/pleias-releases-common-corpus-open-multilingual-dataset-for-llm-training).
Overall, a data mining exception would be hugely detrimental to the country’s creators and creative industries.
6. Do you support the introduction of an exception along the lines outlined in section C of the consultation?
Please give us further comments.
No.
More than 40,000 people, including many of the UK’s leading figures in the arts, have signed a statement I organised saying that unlicensed generative AI training is a “major, unjust threat” to people’s livelihoods (https://www.aitrainingstatement.org/). Yet the proposed exception would legalise precisely this: unlicensed generative AI training. It would allow AI companies to build highly scalable competitors to the country’s creators using those creators’ own work, without asking permission, which would be incredibly damaging to the creative industries. The proposed rights reservation is unworkable and extremely unfair on creators and rights holders. If the government values our creative industries, the proposed changes to copyright law should be dropped.
7. If so, what aspects do you consider to be the most important?
Please give us your views.
n/a
8. If not, what other approach do you propose and how would that achieve the intended balance of objectives?
Please give us further comments.
The best approach to achieving the intended balance of objectives is the existing approach: a requirement that training data be licensed. It is perfectly possible for AI companies to license the data they need - a number of AI companies already license all their training data. Rights holders are willing to license to AI companies, as the many licensing deals already struck demonstrate. Additional data can be sourced from the public domain. Licensing is of course slower and more expensive for AI companies than taking work without permission, but it’s still the right approach, and the only approach that is fair to both sides. Training data is a key resource required by AI companies, just like AI talent and GPUs. They spend huge amounts on AI talent and GPUs - why should they get training data for free, particularly when generative AI competes with the work it’s trained on and the people behind that work? This is the key reason training data must be licensed: generative AI competes with its training data, and any exception that allows that competition to be built without licensing is unacceptable.
9. What influence, positive or negative, would the introduction of an exception along these lines have on you or your organisation? Please provide quantitative information where possible.
Please give us your views.
Generative AI competes with me as a creator. However imperfect generative AI models are, they are so quick and cheap to use that it is inevitable they will compete with me; many creators are already feeling the effects of this. Introducing an exception would let AI companies build highly scalable competitors to me, using my works to do so. The rights reservation mechanism would not protect me: I don’t believe it would be possible to effectively opt my works out of training, given that I wouldn’t be able to opt out downstream copies of my works, and given that, even after I opted out, AI companies would likely not retire or retrain their models immediately. Further, it’s likely they would use models trained on my work to create synthetic data, which they would then use to train future models even after I’d opted out. So I want to be clear: I view generative AI as a competitor, and any exception as enabling that competitor to be built using my work against my will.
10. What action should a developer take when a reservation has been applied to a copy of a work?
Please give us your views.
I object to the introduction of the exception and rights reservation. But if it came into law, developers should immediately retire any model that uses my work. They should also retire any synthetic data generated by models that used my work, and any models trained on that synthetic data. Of course, I understand that this would require a huge amount of retraining - this serves to stress how inappropriate a rights reservation scheme is as a solution. If we instead stick with existing law, under which AI companies are required to get opt-in consent before training on people’s works, they will not face issues of having to retrain models when people opt out.
11. What should be the legal consequences if a reservation is ignored?
Please give us your views.
I object to the introduction of the exception and rights reservation. But if it came into law, the consequences should include large fines per work used, and the immediate banning of any offending models, products and works created by those models. Again, this serves to stress how inappropriate a rights reservation scheme is as a solution. If we instead stick with existing law, under which AI companies are required to get opt-in consent before training on people’s works, we will not face the extraordinary difficulty of policing open models that become banned and taking offline works created using models that have since been banned.
12. Do you agree that rights should be reserved in machine-readable formats? Where possible, please indicate what you anticipate the cost of introducing and/or complying with a rights reservation in machine-readable format would be.
Please give us your views.
No.
It would already be hard enough to reserve your rights, and most people would already miss the opportunity to do so; this requirement would make it even harder. Reserving rights in machine-readable formats would in many cases require an understanding of systems that creators - myself included - have no other reason to be expert in. A good example is robots.txt, which is machine-readable: 60% of artists still haven’t heard of it, and of those who have, only a vanishingly small percentage have actually managed to use it (https://arxiv.org/html/2411.15091v1). The cost to me would be the significantly increased likelihood that I wouldn’t manage to opt out at all, as well as the significant time investment of opting out all of my works via a mechanism that would be hard to understand.
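To illustrate the point, this is roughly what a robots.txt opt-out looks like in practice, using two real crawler tokens (OpenAI’s GPTBot and Common Crawl’s CCBot); the full list of tokens a rights holder would need to track is far longer and changes constantly:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

Even this minimal file requires knowing that these crawlers exist, knowing their current token names, and being able to edit files on the server that hosts your work - and it does nothing for copies of your work hosted on sites you don’t control.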
13. Is there a need for greater standardisation of rights reservation protocols?
Please give us your views
I object to any rights reservation being required to stop AI companies training on copyrighted work. There are certainly too many rights reservation schemes, and all of them have huge issues that make them unworkable. But greater standardisation will not solve any problems for creators and rights holders. The fundamental issues with opt-out schemes will remain: they cannot reliably cover downstream copies of works; they will always be so little known that most people eligible to use them will not realise they can; they unfairly shift a huge administrative burden onto rights holders; and they are not immediately observed by AI companies.
14. How can compliance with standards be encouraged?
Please give us your views.
The government should enforce existing law, under which it is illegal to train commercial generative AI models on copyrighted work without a licence.
15. Should the government have a role in ensuring this and, if so, what should that be?
Please give us your views.
The law is clear: commercial generative AI training on copyrighted work without a licence is illegal. The government must ensure this law is enforced.
16. Does current practice relating to the licensing of copyright works for AI training meet the needs of creators and performers?
Please give us your views
If existing law is observed, the answer is largely yes, at least as regards training on copyrighted works directly. UK law currently requires that commercial generative AI training on copyrighted work be licensed, and this is the only reasonable solution for creators and performers. Luckily it is what we already have. Anything less than this means allowing AI companies to train highly scalable models on people’s work without their permission, models that will then compete with them. This is clearly unacceptable.
However, existing law may not cover training on synthetic data that is itself generated using models trained on copyrighted work without a licence. If that’s the case, the law should be strengthened to forbid this, since it will have the same negative effect on creators and rights holders as training directly on their work. If companies train models on synthetic data, any data that was used to train the models that created that synthetic data should have to adhere to the same rules as standard training data.
17. Where possible, please indicate the revenue/cost that you or your organisation receives/pays per year for this licensing under current practice.
Please provide further evidence.
I have not yet licensed my works to generative AI companies. I believe demand for training data has been lower than it would otherwise have been because AI companies have gambled that the government will not enforce existing law, and have therefore made weaker efforts to license data than they otherwise would have.
18. Should measures be introduced to support licensing good practice?
Please give us your views
The most important measure to introduce to support licensing good practice is enforcing existing copyright law. This would be greatly helped by requiring AI companies to be fully transparent about the training data they use.
19. Should the government have a role in encouraging collective licensing and/or data aggregation services?
Yes.
20. If so, what role should it play?
Please provide further comments
I don’t believe collective licensing is the right solution, but if it is adopted then the government should ensure licensing requirements are adhered to. And multiple data aggregation services are springing up that make licensing training data even easier; the government should do everything in its power to support these.
The government should enforce existing copyright law and should introduce requirements for AI companies to be transparent about the training data they use. This will support a range of independent licensing efforts, from direct licensing to collective licensing to data aggregation services. Conversely, any weakening of copyright law will undermine all of these.
21. Are you aware of any individuals or bodies with specific licensing needs that should be taken into account?
Please give us your views.
Any weakening of copyright will disproportionately negatively impact small creators - those not represented by large companies or groups. They will be less likely to realise they can opt out, and less likely to have the resources to opt out.
There is data to support this. When Cloudflare analysed the top 1 million websites by number of visits, it found that the percentage blocking AI crawlers increased with the number of website visits (https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/). Similarly, the Data Provenance Initiative found that while 25% of data from the ‘highest-quality sources’ in some AI training sets had been restricted, only 5% of all data in those training sets had been restricted (https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html). These findings support the logical hypothesis that smaller rights holders, such as individual creators, are less likely to make use of opt-out schemes, and mean that such schemes can highly disadvantage them.
22. Do you agree that AI developers should disclose the sources of their training material?
Please give us your views.
Yes. There is essentially no reason for AI developers to hide the sources of their training material except to avoid copyright lawsuits. Many AI developers seek to use as much training data as possible, which generally means they all end up using training data from essentially the same sources. This means there is no real competitive advantage between AI developers in keeping their data sources secret. Transparency would greatly diminish copyright infringement. But it is important to note that transparency is far less helpful if it’s packaged up with a broad copyright exception, since the copyright exception would make training on most of the country’s creative output without permission legal.
23. If so, what level of granularity is sufficient and necessary for AI firms when providing transparency over the inputs to generative models?
Please provide further comments
AI firms should disclose all their sources of data, in a manner that lets any third party fully understand the training data they used. This means they should disclose: any crawlers and scrapers they used, any automated rules by which these operated, the timeframe they operated for, and a list of URLs accessed with timestamps; datasets compiled by them or by third parties; and the parties involved in any licensing deals. They should also detail any synthetic data they used and, if they created it themselves, the models used to create it and the data used to train those models; or, if a third party created it, where it was accessed and on what date.
The training data that an AI company uses should be public information, rather than simply being shared with some central authority; sharing only with a central authority would not enable rights holders to police the use of their works effectively. It should include synthetic data used for training, since otherwise rights holders’ works might be used to create synthetic data for training as a way to circumvent the government’s requirements.
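To illustrate the level of granularity I have in mind - the field names below are hypothetical, not a proposal for a specific standard - a single entry in a machine-readable public disclosure might look something like this:

source_type: web crawl
crawler: ExampleBot (hypothetical)
crawl_period: 2024-01-01 to 2024-06-30
rules: the automated rules the crawler followed
url_log: link to the full list of URLs accessed, with timestamps
licence: none

A licensed source would instead record the licensor, the catalogue covered and the licence dates; a synthetic source would record the generating model, the data used to train that model, and, if a third party created it, where and when it was accessed.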
24. What transparency should be required in relation to web crawlers?
Please give us your views.
The instructions under which the web crawler operated, the timeframe over which it operated, and a record of the URLs visited with the timestamps of those visits. Or, to put it more simply: enough information to let a third party get an accurate view of whether a given URL was crawled.
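For example - the format here is hypothetical, and any consistent format would do - a single record in such a log might read:

2025-01-15T09:42:07Z https://example.com/works/opus-12 crawler=ExampleBot robots_txt=checked

What matters is not the precise format but that the log is complete enough for a rights holder to establish definitively whether a given URL was accessed.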
25. What is a proportionate approach to ensuring appropriate transparency?
Please give us your views.
Require generative AI companies to make public a list of all of their sources of training data, in a manner that lets any third party fully understand the training data they used.
There really is no reason not to require full transparency regarding training data. The raw data used to train a model is not a company’s secret sauce - everyone is using essentially the same data.
What secret sauce there is in training data relates to how the training data is used: how it’s treated (e.g. is it augmented, how is it filtered, etc.). There is no reason AI companies should have to disclose this. But they should be required to disclose the raw data they use.
26. Where possible, please indicate what you anticipate the costs of introducing transparency measures on AI developers would be.
Please indicate the anticipated costs of transparency measures.
The costs would be absolutely minimal, since AI developers already keep records of the training data they use; if they didn’t, it would be hard to improve their models between versions.
And there would be no real competitive cost to revealing their raw training data.
27. How can compliance with transparency requirements be encouraged, and does this require regulatory underpinning?
Please give us your views
It should be required by law.
28. What are your views on the EU’s approach to transparency?
Please give us your views.
The EU’s Code of Practice for the AI Act is incredibly disappointing on transparency. In the last draft I saw, transparency was owed only to a central authority, meaning rights holders would have no real visibility of training data; disclosure took the form of a mere summary of the data, further reducing rights holders’ understanding of what has been used; and there was no mention of synthetic data at all.
29. What steps can the government take to encourage AI developers to train their models in the UK and in accordance with UK law to ensure that the rights of right holders are respected?
Please give us your views
The government can encourage a healthy data licensing market by enforcing existing law. This will ensure that the AI industry and the creative industries are symbiotic, and will make us the home of responsible AI development. It will serve AI companies well because it will avoid turning the country’s creators and creative industries against them, instead enabling the two sides to forge deep partnerships and work together on new products that will boost the economy. In addition, the government could invest in data centers, and provide AI companies with access to them; invest in AI education and training; give grants and tax breaks to AI companies; and encourage the best researchers to come to the UK through visa schemes.
30. To what extent does the copyright status of AI models trained outside the UK require clarification to ensure fairness for AI developers and right holders?
Please give us your views
The US has fair use, a nuanced copyright exception that is far fairer to rights holders than the UK’s proposal. Critically, fair use decisions take into account the effect of the use on the market for the work that is copied. In the first and so far only US decision on this question, the rights holder won and the fair use defence failed (https://storage.courtlistener.com/recap/gov.uscourts.ded.72109/gov.uscourts.ded.72109.770.0.pdf). What is clear is that the current UK proposals would be far harsher on rights holders than the position in the US.
The EU’s AI Act does not specifically create an exception for generative AI training; rather, it reinforces previous directives, including the 2019 directive relating to text and data mining. Insofar as it is interpreted as creating an exception for generative AI training, this will be considered a huge blow to the creative industries. Ultimately, nothing will be certain until the Act is implemented by member states.
Thankfully, the international Berne Convention is clear: under Article 9(2), countries can permit unlicensed copying only in certain special cases, “provided that such reproduction does not conflict with a normal exploitation of the work and does not unreasonably prejudice the legitimate interests of the author”. Generative AI competes with the work it’s trained on and the people behind that work, so a broad copyright exception for training would pretty clearly fall foul of this. For the record, I believe the UK government’s proposal would contravene the Berne Convention for this reason.
Crucially, though, the UK can only set its own laws. We should lead by example, rather than copying the laws of our most creator-unfriendly neighbour.
31. Does the temporary copies exception require clarification in relation to AI training?
Please give us your views
No. The many instances of copying that occur during generative AI training do not fall under this exception.
32. If so, how could this be done in a way that does not undermine the intended purpose of this exception?
Please provide further comments
n/a
33. Does the existing data mining exception for non-commercial research remain fit for purpose?
Please give us your views
Yes, as long as AI companies don’t use it as a loophole. If non-commercial research organisations gather data or train models, that gathered data and those trained models must not be used by commercial AI developers.
34. Should copyright rules relating to AI consider factors such as the purpose of an AI model, or the size of an AI firm?
Please give us your views
No: copyright law should continue to protect creators’ rights irrespective of these.
—
35. Are you in favour of maintaining current protection for computer-generated works? If yes, please explain whether and how you currently rely on this provision.
Please give us your views
No.
Many computer-generated works are created using a simple text prompt, using AI models that are trained on other people’s work, to compete with those people. Works created simply by prompting a model should not be afforded copyright protection.
36. Do you have views on how the provision should be interpreted?
Please give us your views
The provision says the author is the person “by whom the arrangements necessary for the creation of the work are undertaken”. The providers of the training data are arguably the most critical people in the process; without training data, these models would not work. They should be considered as having undertaken arrangements necessary for the creation.
37. Would CGW legislation benefit from greater legal clarity, for example to clarify the originality requirement? If so, how should it be clarified?
Please give us your views
Yes. Computer-generated works that are the result simply of a text prompt should not be given copyright protection.
38. Should other changes be made to the scope of CGW protection?
Please give us your views
n/a
39. Would reforming the CGW provision have an impact on you or your organisation? If so, how? Please provide quantitative information where possible.
Please give us your views
Minor negative impact.
The proposed reforms go in the wrong direction. If the provision is to be reformed, it should instead be reformed in a way that removes copyright from computer-generated works where user input was minimal, or that assigns some share of the copyright to the creators of the training data. This would remove a key incentive for people to use AI models that may well be trained on my work to create works that compete with mine.
40. Are you in favour of removing copyright protection for computer-generated works without a human author?
Please give us your views
Yes.
It would remove a key incentive for people to use AI models, which may well be trained on my work, to create works that compete with mine. However, it is not very helpful if packaged up with a broad copyright exception for generative AI training, since such an exception would be so economically detrimental to the creative industries.
41. What would be the economic impact of doing this? Please provide quantitative information where possible.
Please provide further comments
As above, removing protection would remove a key incentive for people to use AI models, which may well be trained on my work, to create works that compete with mine; the economic effect on human creators would therefore be positive. However, the benefit would be much reduced if the change were packaged with a broad copyright exception for generative AI training, since such an exception would be so economically detrimental to the creative industries.
42. Would the removal of the current CGW provision affect you or your organisation? Please provide quantitative information where possible
Please give us your views
Minor negative effect.
I would expect removing the provision entirely to confuse matters. It would be better to reform it.
43. Does the current approach to liability in AI-generated outputs allow effective enforcement of copyright?
Please give us your views
No.
I disagree with the statement in the consultation that generative AI “is unlikely to generate a copy of a specific work that it was trained on”. There are many documented instances of precisely this happening, and it is a well-known phenomenon of generative models. For instance:
https://spectrum.ieee.org/midjourney-copyright
https://www.musicbusinessworldwide.com/yes-udios-output-resembles-copyrighted-music-too/
I also disagree with the statement that “it is not clear how effective [steps to avoid outputting protected works] are in practice.” It is in fact clear that they are ineffective: to my knowledge, every major generative AI model has been shown to replicate its training data.
The current approach is generally good, in that users and AI developers may both be responsible for infringing outputs. However, it is important to be clear: if infringing output arises without the user intending it, liability should be squarely on the AI developer. Moreover, if the user does intend it but the AI developer did not license the replicated works in question, or did not put in place adequate safeguards to block infringing outputs, the AI developer should share liability.
44. What steps should AI providers take to avoid copyright infringing outputs?
Please give us your views
The simplest step is not to train on copyrighted works without a licence. On top of this, AI providers should scan users’ prompts, put in place system prompts that deter infringing outputs, and scan the models’ outputs.
45. Do you agree that generative AI outputs should be labelled as AI generated? If so, what is a proportionate approach, and is regulation required?
Please give us your views
Yes.
Some labelling is clearly needed. Since it would be so complex to set some kind of threshold at which labelling is required, it is simplest if all outputs that include material from a generative AI model are labelled. Regulation is required for this since it is not being done voluntarily.
46. How can government support development of emerging tools and standards, reflecting the technical challenges associated with labelling tools?
Please give us your views
Require that all content that includes AI-generated output be labelled. This would get around most technical challenges of labelling.
47. What are your views on the EU's approach to AI output labelling?
Please give us your views
We should set our own standards and lead by example.
48. To what extent would the approach(es) outlined in the first part of this consultation, in relation to transparency and text and data mining, provide individuals with sufficient control over the use of their image and voice in AI outputs?
Please give us your views
They would not. Controlling the use of people’s image and voice in AI outputs may require separate legislation, and it should be legislated separately - there is no reason to conflate the two issues.
49. Could you share your experience or evidence of AI and digital replicas to date?
Please give us your views
Close likenesses to specific creators of all types have been prevalent across many of the major generative AI platforms, including image and voice likenesses. What’s more, this content has regularly surfaced on social media.
50. Is the legal framework that applies to AI products that interact with copyright works at the point of inference clear? If it is not, what could the government do to make it clearer?
Please give us your views
I’m not sure what the existing legal framework says about this. But clearly any use of copyrighted works at inference must also be licensed.
51. What are the implications of the use of synthetic data to train AI models and how could this develop over time, and how should the government respond?
Please give us your views
If AI developers train their models on synthetic data, and that synthetic data comes from models themselves trained on copyrighted work without a licence, the developers are essentially laundering copyrighted work: a first model is trained on unlicensed works, its outputs are used to train a second model, and the second model’s developer claims never to have touched the original works, even though those works made the second model possible. This must be stopped. The way to do this is to require that any data used to train models that create synthetic data also be licensed, and to ensure that transparency requirements cover sources of synthetic data.
52. What other developments are driving emerging questions for the UK’s copyright framework, and how should the government respond to them?
Please give us your views
I would recommend you heed the general public’s views on this topic. The general public does not agree with AI companies’ views on what they can train on. One poll from the AI Policy Institute in April this year asked people about the common policy among AI companies of training on what they call ‘publicly available’ data. This means content that’s openly available online, much of which is of course copyrighted - it includes articles that aren’t paywalled, and it includes pirated content. 60% of people said AI companies shouldn’t be allowed to train on this, versus only 19% who said they should. The same poll went on to ask whether AI companies should compensate data creators for training. 74% said they should, versus only 9% who said they shouldn’t. (The poll is here: https://theaipi.org/poll-biden-ai-executive-order-10-30-7-2-4-2-2-2/.) Time and time again, when the public is asked these questions, it shows support for requiring permission and payment for training on people’s work, and a rejection of the notion that something being publicly available makes it fair game.
I also think it’s important to remember that copyright reform isn’t needed for the UK to be a leader in AI. AI companies like to lump all of AI together, suggesting you need to deregulate all of it if you want any progress. But this simply isn’t true.
The major economic opportunity from AI doesn't come from exploiting the life’s work of the world’s creators without their permission. The AI work that Sir Demis Hassabis won the Nobel Prize in Chemistry for, AlphaFold, wasn't trained on creative work. There has not been a single important scientific discovery that has come from AI trained on creative work. You train on creative work you haven’t licensed if you want to replace the creative industries with AI without paying them, not if you want to cure cancer.
We should invest in data centers, and provide AI companies with access to them. We should invest in AI education and training. We should give grants and tax breaks to AI companies. We should encourage the best researchers to come to the UK through visa schemes.
We can be world leaders in AI for healthcare, defence, logistics, and science. As regards AI in the creative industries, we can be the home of responsible AI development and responsible AI companies. We can do all this without destroying our creative industries, which are also world-leading, by upending copyright law.