The Legislation Game:
Introduction to Legal Issues in Artificial Intelligence and Large Language Models
Paweł Kamocki
ESSAI Summer School 2024, 15-19.07.2024
Introducing CLARIN
https://www.youtube.com/watch?v=ut9wOIYWDfc
https://www.clarin.eu/content/clarin-in-a-nutshell
CLARIN
CLARIN for Open Science
FAIR Principles
Findable
Accessible
Interoperable
Reusable
Key elements
CLARIN’s Macroscope Potential
Source: Rosnay, 1979
CLARIN’s countries and centres
How does CLARIN work?
Virtual Language Observatory
Upload a text to find a matching tool for NLP tasks; the matching service can be accessed directly from the VLO.
CLARIN Resource Families https://www.clarin.eu/resource-families
“Legislation game”
Jeff Koterba/Cagle Cartoons
https://www.duluthnewstribune.com/opinion/columns/national-view-voters-fear-regulation-of-ai-so-far-is-insufficient
Legal reasoning
All men are mortal. [Major premise]
Socrates is a man. [Minor premise]
Therefore, Socrates is mortal. [Conclusion]
Reproduction is a copyright-restricted act. [Major premise]
Training AI models necessitates acts of reproduction. [Minor premise]
Therefore, training AI models is a copyright-restricted act. [Conclusion]
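The syllogistic structure above can also be written out formally. A minimal sketch in Lean, with hypothetical predicates Man and Mortal standing in for the premises:

```lean
-- The classic syllogism as a formal derivation.
-- Person, Man and Mortal are assumed (hypothetical) names.
variable (Person : Type) (Man Mortal : Person → Prop)

example (major : ∀ x, Man x → Mortal x)  -- all men are mortal
    (socrates : Person)
    (minor : Man socrates)               -- Socrates is a man
    : Mortal socrates :=                 -- therefore, Socrates is mortal
  major socrates minor
```

The copyright argument has the same shape: substitute "act of reproduction" for Man and "copyright-restricted act" for Mortal, and the conclusion follows only if both premises hold.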
Course Outline
Copyright primer I
Q1: Why copyright?
Q2: Today, is copyright more important than before? Why?
Copyright primer II
What is protected? [subject matter]
How long does copyright protection last? [term]
Copyright: exclusive rights
Q1: Is internet scraping a copyright-relevant act?
Q2: Can AI be trained without making reproductions?
Q3: Is communication to the public relevant in AI training/use?
Permission (license)
Content licenses (CC)
Software licenses (FOSS)
Copyright in AI training
Copyright exceptions (in general)
US: the fair use doctrine
EU: Text and Data Mining Exceptions in the DSM Directive
‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations (Article 2(2) DSM)
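The statutory definition above is broad; a toy example makes it concrete. The following sketch (hypothetical corpus, assumed function names) shows an automated analytical technique that generates pattern information, here simple word frequencies, from text in digital form:

```python
# A minimal illustration of "text and data mining" in the DSM Directive's
# sense: an automated technique that turns digital text into information
# about patterns. The corpus below is hypothetical illustrative data.
from collections import Counter
import re

corpus = [
    "AI models are trained on large text corpora.",
    "Training AI models requires copies of the text.",
]

def mine_terms(texts):
    """Tokenize the texts and count term frequencies (a basic 'pattern')."""
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z]+", text.lower()))
    return Counter(tokens)

freqs = mine_terms(corpus)
print(freqs.most_common(3))  # the most frequent terms across the corpus
```

Note that even this trivial pipeline first has to copy the texts into memory, which is why the reproduction right is engaged before any mining happens.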
Q: Are exceptions for AI training a good thing? Why and why not?
Copyright and AI models
Q: Considering what you already know about copyright, are AI models (≠ AI systems) protected by copyright? Should they be?
hint: idea vs. expression
nevertheless, the licensing of models is common practice
proprietary models available through APIs with Terms of Use
Copyright in AI outputs: Are AI outputs copyright-protected works?
Q: If AI-generated works were protected by copyright, who would be the rightholder? (user? provider? AI itself?)
Copyright in AI outputs: Position of the US Copyright Office I
“A Recent Entrance to Paradise”
Copyright in AI outputs: Position of the US Copyright Office II
“Théâtre d’opéra spatial”
Copyright in AI outputs: Position of the US Copyright Office III
“Zarya of the Dawn” (graphic novel)
Copyright in AI outputs: grey areas
Copyright in AI outputs: the “ownership gap”
Sui generis right in AI outputs?
Proposed reform in France (2023)
Lawsuits concerning copyright in AI: Getty Images vs. Stability AI (UK)
Lawsuits concerning copyright in AI: New York Times vs. OpenAI (US)
Meanwhile in China (Guangzhou Internet Court, February 8, 2024): a court held an AI provider liable for copyright infringement after its text-to-image tool generated images of Ultraman (a cartoon character) substantially similar to the original
PART II: Data Protection Issues in AI Training
Data protection: a primer
Data protection: terminology
Data protection principles (overview)
Article 5 GDPR
Data protection principles: Lawfulness
Data protection principles: Transparency
Data protection principles: Purpose Limitation, Data Minimisation
Data protection principles: Accuracy, Storage Limitation
Data protection principles: Security (cf. also Articles 32-34 GDPR)
Data protection principles: Accountability
Rights of data subjects (overview)
Freedom from automated individual decision-making (Article 22 GDPR)
GDPR compliance in AI training
Defining the legal status of various stakeholders 1/2
Defining the legal status of various stakeholders 2/2
Defining the legal basis 1/2
Defining the legal basis 2/2
Data Protection Impact Assessment (DPIA)
AI training requires a DPIA if it involves the processing of personal data
Data Protection by Design and by Default
AI Act
AI Act: AI governance
AI Act: classification of AI
AI system
AI model
Prohibited AI systems
High-risk AI systems
High-risk AI systems: obligations of providers
Transparency obligations (Chapter IV)
- AI systems intended to interact directly with natural persons: must be designed in such a way that the natural persons concerned are informed that they are interacting with an AI system, unless this is obvious
- AI systems generating synthetic content: the outputs of the AI system must be marked in a machine-readable format and detectable as artificially generated or manipulated, as far as technically feasible
- emotion recognition and biometric categorisation systems: deployers shall inform the exposed persons about the functioning of the system
- AI systems that generate or manipulate image, audio or video content constituting a deep fake: deployers shall disclose that the content has been artificially generated or manipulated; when the content forms part of an evidently artistic, creative, satirical or fictional work, disclosure may be made in a manner that does not hamper the display or enjoyment of the work
- AI systems that generate or manipulate text published with the purpose of informing the public on matters of public interest: deployers shall disclose that the content has been artificially generated or manipulated, UNLESS the content has undergone a process of human review or editorial control and a natural or legal person holds editorial responsibility for the publication
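The marking obligation for synthetic outputs is technology-neutral: the AI Act does not prescribe a concrete format. As a purely illustrative sketch (the JSON envelope, field names, and function below are all hypothetical, not anything the Act specifies), one machine-readable approach might look like this:

```python
# Hypothetical illustration of marking AI output in a machine-readable,
# detectable way, as the AI Act's transparency obligations require.
# The envelope format and field names are invented for this sketch.
import json
from datetime import datetime, timezone

def mark_as_ai_generated(text: str, model_name: str) -> str:
    """Wrap generated text in a machine-readable provenance envelope."""
    return json.dumps({
        "content": text,
        "ai_generated": True,  # the detectable, machine-readable marker
        "generator": model_name,  # assumed identifier of the model
        "generated_at": datetime.now(timezone.utc).isoformat(),
    })

envelope = mark_as_ai_generated("Sample model output.", "example-model")
print(envelope)
```

In practice, providers are converging on techniques such as watermarking and C2PA-style content credentials; the JSON wrapper above merely stands in for whichever machine-readable convention a provider adopts.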
General-purpose AI models (Chapter V)
GPAI models with systemic risks
GPAI models: obligations of providers
Additional obligations of providers of GPAI models with systemic risks