1 of 68

The Legislation Game:

Introduction to Legal Issues in Artificial Intelligence and Large Language Models

Paweł Kamocki

ESSAI Summer School 2024, 15-19.07.2024

2 of 68

3 of 68

CLARIN

  • Common Language Resources and Technology Infrastructure
  • ESFRI roadmap 2006, ERIC status 2012, ESFRI Landmark 2016
  • Easy and sustainable access for scholars in SSH
  • digital language data (written, spoken, video or multimodal)
  • tools to discover, analyse, combine data wherever they are located
  • single sign-on environment (you all can get an account)
  • Ecosystem for knowledge exchange
  • Some services integrated in EOSC


4 of 68

CLARIN for Open Science


  • Promotion of sharing & re-use of language data through sustainable data registries
  • Enhancement & deployment of interoperability of language data & services
    • common metadata framework
    • distributed network of FAIR certified data repositories for language data

  • Promotion of
    • comparative perspectives
    • multidisciplinary collaboration
    • transnational research
    • responsible data science
  • Support for linguistic diversity
    • data covering many languages
    • tools for many languages
    • language resources in all modalities
    • discipline- & language-agnostic

5 of 68

FAIR Principles


Findable

Accessible

Interoperable

Reusable

Key elements

    • Persistent Identifiers (PIDs)
    • Data Management Plan
    • Metadata
    • Licences
    • Repositories

6 of 68

CLARIN’s Macroscope Potential


Source: Rosnay, 1979

7 of 68

CLARIN’s countries and centres


  • A consortium of the ERIC type (European Research Infrastructure Consortium)
    • 24 members
    • 2 observers
    • 1 linked party
  • A distributed network of 70 centres
    • 21 CoreTrustSeal (CTS) certified data centres
    • Strong focus on FAIRness & interoperability
      • Federated login
      • Central metadata harvesting for easy discovery
      • Chained services
    • 25 Knowledge Centres

8 of 68

How does CLARIN work?


9 of 68

Virtual Language Observatory

https://vlo.clarin.eu

  • Facet search
  • Links to landing pages
  • Download options
  • Details on licences
  • Details on technical features
  • Overview of tools that match the data
  • Citing LRs:


10 of 68

Language Resource Switchboard

https://switchboard.clarin.eu/

Upload a text to find a matching tool for NLP tasks. It can be accessed directly from the VLO.


11 of 68


12 of 68

“Legislation game”

Jeff Koterba/Cagle Cartoons

https://www.duluthnewstribune.com/opinion/columns/national-view-voters-fear-regulation-of-ai-so-far-is-insufficient


13 of 68

Legal reasoning

All men are mortal. [Major premise]

Socrates is a man. [Minor premise]

Therefore, Socrates is mortal. [Conclusion]

Reproduction is a copyright-restricted act

Training AI models necessitates acts of reproduction

Therefore, training AI models is a copyright-restricted act


14 of 68

Course Outline

  1. Copyright issues in AI (training, models, outputs)
  2. Data protection issues (GDPR) in AI training
  3. AI Act (+ European Strategy for Data)


15 of 68

Copyright primer I

  • An intellectual property right that protects creative works against uses (“copying”) not authorised by their authors

  • Sources of copyright:
    • International
      • Berne Convention 1886
    • European
      • Directive on Copyright in Information Society (InfoSoc) 2001
      • Directive on Copyright in Digital Single Market (DSM) 2019
    • National
      • national Copyright Acts

Q1: Why copyright?

Q2: Today, is copyright more important than before? Why?


16 of 68

Copyright primer II

What is protected? [subject matter]

  • Literary (incl. software), artistic and scientific works
    • Works (expressions), not ideas
      • taste of cheese is not protected (CJEU, Levola Hengelo)
    • Incorporeal asset (independent of the physical carrier)
  • Condition: originality
    • UK (historically): “labour, skill and judgement”
    • US: “original = not copied”
    • EU: “author’s own intellectual creation”
      • author’s personality is expressed in free and creative choices
      • expression not dictated by rules and constraints
        • BUT “multiplicity of shapes” (CJEU, Brompton bicycle 2020)
  • Collections (compilations), e.g. datasets
  • Exclusions (in some countries) for “official works”

How long does copyright protection last? [term]

  • EU, US, JP…: 70 years after the death of the author (Life+70)
    • minimum: Life + 50 (e.g. China, Belarus, Iran, ...)
    • maximum: Life + 100 (Mexico)
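The Life+N computation can be made concrete. Under the EU Term Directive, terms are counted from 1 January of the year following the triggering event (the author's death), so a work enters the public domain on 1 January of death year + term + 1. A minimal sketch (the function name is ours, and real cases involve complications such as joint authorship or wartime extensions):

```python
def public_domain_year(death_year: int, term: int = 70) -> int:
    """Year in which a work enters the public domain under a Life+N rule.

    EU-style computation: the term runs from 1 January of the year
    following the author's death, so expiry falls on 1 January of
    death_year + term + 1. Illustrative sketch only.
    """
    return death_year + term + 1

# e.g. an author who died in 1953, Life+70: public domain from 1 Jan 2024
```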


17 of 68

Copyright: exclusive rights

  • Exclusive rights: permission required!
  • Moral rights [Berne Convention, Article 6 bis]
    • Attribution
    • Integrity
  • Exploitation (or: economic) rights
    • EU: rights harmonised in the InfoSoc Directive:
      • Reproduction (copying) [Article 2]
        • direct or indirect
        • temporary or permanent
        • by any means and in any form
        • in whole or in part (CJEU (Infopaq): 11 consecutive words)
      • Communication to the public (sharing) [Article 3]
        • transmission (broadcast) OR
        • making available to the public
      • Distribution
    • Not harmonised at the EU level: Adaptation
      • Derivative work: original elements from a preexisting work + new original elements
      • “without prejudice” to copyright in the preexisting work
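The Infopaq "11 consecutive words" point above can be illustrated mechanically: a reproduction "in part" may already be restricted if it reuses a run of words that is itself original. A toy n-gram check (this is our simplification — the CJEU test also requires the extract to express the author's own intellectual creation, which no word count can establish):

```python
def word_ngrams(text: str, n: int = 11) -> set:
    """All runs of n consecutive (lower-cased) words in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shares_protected_run(original: str, candidate: str, n: int = 11) -> bool:
    """Toy check inspired by Infopaq: does candidate reproduce any run
    of n consecutive words from original? A real infringement analysis
    additionally asks whether the extract is original."""
    return bool(word_ngrams(original, n) & word_ngrams(candidate, n))
```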

Q1: Is internet scraping a copyright-relevant act?

Q2: Can AI be trained without making reproductions?

Q3: Is communication to the public relevant in AI training/use?


18 of 68

Permission (license)

  • License vs. transfer
  • Individually granted licenses vs. public licenses
    • individually granted: from person A (author) to person B (e.g. a registered user, Terms of Service), usually purpose-specific
    • public: from person A (author) to the general public (e.g. CC)
  • Proprietary licenses vs. open licenses
    • Open: free to use by anyone and for any purpose
  • Licenses can be limited as to:
    • purpose
    • granted rights
    • territorial scope
    • duration (max: term of copyright)
    • exclusivity
    • sublicenseability


19 of 68

Content licenses (CC)

  • 4 building blocks:
    • BY: attribution (in every license)
    • SA: share-alike (“viral”)
    • ND: no derivatives
    • NC: non-commercial (not open!)
  • BY, BY-SA (Wikipedia), BY-NC, BY-ND, BY-NC-SA, BY-NC-ND
  • other tools: CC0 (waiver), Public Domain Mark
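The building blocks compose mechanically: only NC and ND can rule a use out, while BY and SA impose conditions on an otherwise permitted use. A toy compatibility check (our own simplification — real licence analysis turns on contested questions such as what counts as "commercial" or as an adaptation, which is exactly what makes NC/ND licences awkward for AI training):

```python
def cc_permits(license_code: str, *, commercial: bool, derivatives: bool) -> bool:
    """Toy check of a CC licence code (e.g. "BY-NC-SA") against a use.

    BY (attribution) and SA (share-alike) are conditions, not bars,
    so in this sketch only NC and ND can make a use impermissible.
    Illustrative only; not legal advice.
    """
    elements = set(license_code.upper().split("-"))
    if commercial and "NC" in elements:
        return False  # non-commercial clause blocks commercial uses
    if derivatives and "ND" in elements:
        return False  # no-derivatives clause blocks adaptations
    return True
```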


20 of 68

Software licenses (FOSS)

    • Free and Open-Source Software (FOSS) licenses
    • all: access to the source code and the right to modify it
    • copyleft (viral):
      • strong (GPL) or
      • weak (LGPL)
    • permissive (non-viral): MIT, BSD, Apache


21 of 68

Copyright in AI training

  • AI training entails reproduction of data
  • no copyright in training data if, e.g.:
    • pure facts (e.g., data from measurements)
    • human creations that are not original (e.g.: too short, banal)
    • works expressly excluded from copyright (e.g. official works in some countries)
    • works in which copyright expired (author died +70 years ago)
    • AI-generated data
  • training data protected by copyright:
    • copyright is held by the company who trains AI
      • no copyright issues (the rights are executed by the rightholder)
      • e.g. data generated by employees
    • copyright is held by third parties
      • data are licensed, e.g. under CC licenses, terms of service
        • note: are NC and ND requirements in CC licenses compatible with AI training?
      • data are not licensed (e.g. scraped data)
        • exception needed


22 of 68

Copyright exceptions (in general)

  • three-step test for exceptions (Article 9(2) of the Berne Convention)
    • Certain special cases
    • Do not conflict with a normal exploitation of the work
    • Do not unreasonably prejudice the legitimate interests of the authors
  • EU harmonisation of exceptions
    • Article 5 of the InfoSoc Directive (2001)
      • limitative list of exceptions, e.g. temporary copy, private copy, quotation, research, for libraries, for people with disabilities…
    • 2019 DSM Directive: exceptions for Text and Data Mining (TDM)
  • US: fair use doctrine (§107 of the US Copyright Act)


23 of 68

US: the fair use doctrine

  • UK (historically): fair abridgement → fair dealing
  • 4 criteria in §107 US Copyright Act (since 1976):
    • Purpose and character of the use
      • commercial vs. non-commercial → derivative vs. transformative
    • Nature of the work
      • unpublished vs. published, fiction vs. non-fiction
    • Amount and substantiality of the part used
      • the less the better, but use of entire work still possible
    • Effect on the (potential) market value of the work
      • risk of substitution
  • can be used as a “shield”, but also as a “sword” (Lenz v. Universal 2015)
  • “technology-friendly” applications:
    • mass digitisation (Google Books 2015, Hathi Trust 2012)
      • transforming text into digital data
      • new search capabilities → new methods of scientific inquiry
    • use of APIs in software development (Google v. Oracle 2021)



25 of 68

EU: Text and Data Mining Exceptions in the DSM Directive

  • definition of TDM:

‘text and data mining’ means any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations (Article 2(2) DSM)

  • Article 4 DSM: TDM exception for the “general public”
    • condition 1: lawful access to the work
      • Recital 14: license/subscription OR free availability online
    • condition 2: use for TDM has not been expressly reserved in “appropriate manner” (e.g. with machine-readable means)
    • reproductions may be retained “for as long as necessary” for TDM purposes ( :( )
  • Article 3 DSM: TDM exception for scientific research
    • beneficiaries: research organisations, cultural heritage institutions
    • condition: lawful access to the work (cf. above)
    • reproductions may be retained “with appropriate level of security” for further research or validation of results
    • rightholders may apply technological protection measures “to protect the security and integrity of [their] networks and databases”
  • both exceptions override contracts
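The Article 4 opt-out must be expressed "in an appropriate manner", e.g. with machine-readable means. One emerging convention is the W3C community TDM Reservation Protocol (TDMRep), which signals a reservation via a `tdm-reservation` value such as an HTTP response header. A sketch of a crawler-side check, assuming that convention (a compliant crawler would also consult robots.txt and site-level policy files):

```python
def tdm_opt_out(headers: dict) -> bool:
    """Check HTTP response headers for a TDMRep-style reservation (sketch).

    Assumes the TDM Reservation Protocol convention of a
    'tdm-reservation' header whose value '1' reserves TDM rights.
    Header lookup is case-insensitive, as HTTP headers are.
    """
    normalised = {k.lower(): v.strip() for k, v in headers.items()}
    return normalised.get("tdm-reservation") == "1"
```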

Q: Are exceptions for AI training a good thing? Why and why not?


26 of 68

Copyright and AI models

Q: Considering what you already know about copyright, are AI models (≠ AI systems) protected by copyright? Should they be?

hint: idea vs. expression

nevertheless: licensing models is common practice

  • dedicated RAIL licenses

proprietary models available through APIs with Terms of Use

  • ToU (binding contracts) affect the use of the API, as well as the outputs of the model
  • e.g. OpenAI ToU: “You may not (...) Use Output to develop models that compete with OpenAI”.


27 of 68

Copyright in AI outputs:

Are AI outputs copyright-protected works?

  • Problem: no human authorship
    • requirement of the Berne Convention?
      • death, nationality, honor, reputation of the author
    • copyright term refers to the death of the author
    • BUT: “work for hire” doctrine in certain countries (US, UK)
      • copyright belongs ab initio to a legal entity
    • EU Software Directive admits that a legal entity can be considered author of software
  • No human authorship = no originality (author’s own intellectual creation)
  • no originality if the expression of the work is dictated by technical considerations (CJEU)
  • skill (e.g. in prompting) is not enough to claim copyright (CJEU)

Q: If AI-generated works were protected by copyright, who would be the rightholder? (user? provider? AI itself?)


28 of 68

Copyright in AI outputs:

Position of the US Copyright Office I

“A recent entrance to paradise”

  • AI-generated
  • US Copyright Office refused registration (2022)
  • confirmed by District Court (2023)
  • “absent any human involvement”


29 of 68

Copyright in AI outputs:

Position of the US Copyright Office II

“Théâtre d’opéra spatial”

  • Award-winning
  • Generated by Midjourney
  • User input “at least 624 prompts” + modifications in Photoshop
  • registration refused (2023)
  • “insufficient authorship”


30 of 68

Copyright in AI outputs:

Position of the US Copyright Office III

“Zarya of the Dawn” (graphic novel)

  • written by Kris Kashtanova
  • illustrated by Midjourney
  • US Copyright Office (2023):
    • book as a whole (plot, dialogues) admitted for registration
    • individual images refused protection


31 of 68

Copyright in AI outputs: grey areas

  • Role of the user
    • Machine-generated vs. machine-assisted?
      • difference: the degree of human involvement
      • nowadays almost every work is machine-assisted
    • The concept of authorship is bound to evolve with technological progress
      • example: photography
    • US Copyright Office Policy Statement (2023)
      • criterion: have the traditional elements of authorship been conceived and executed by a human, or by a machine?
      • mere prompting is not enough to justify authorship
      • copyright protection possible if AI-generated outputs were creatively arranged or modified by a human
  • Relation with the input data (adaptation/derivative work?)
    • regurgitation and “training data extraction attacks”


32 of 68

Copyright in AI outputs: the “ownership gap”

  • ChatGPT: (co-)author of hundreds of books on Amazon
    • how many books are “secretly” AI-generated?
      • copyfraud – false copyright claim in public domain content
      • presumption of authorship (Article 15(1) Berne Convention, Article 5 of the EU Enforcement Directive) for those whose name “appear on the work in the usual manner”
      • BUT OpenAI ToU: “you are prohibited from (...) representing that Output was human-generated when it was not”
  • Article 50(2) AI Act: Providers of AI systems, including general-purpose AI systems, generating synthetic audio, image, video or text content, shall ensure that the outputs of the AI system are marked in a machine-readable format and detectable as artificially generated or manipulated.
    • feasible…?
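Article 50(2) mandates machine-readable marking but does not prescribe a format (provenance standards such as C2PA are candidate implementations). A purely illustrative sketch with a JSON envelope — every field name here is hypothetical, and the ease of simply stripping such metadata is one reason to question feasibility:

```python
import json

def mark_output(content: str, model_name: str) -> str:
    """Wrap generated text in a machine-readable provenance envelope.

    The envelope format and field names are hypothetical, not a
    mandated standard; real deployments would use something like
    C2PA manifests or watermarking.
    """
    return json.dumps({
        "provenance": {"ai_generated": True, "model": model_name},
        "content": content,
    })

def is_marked_ai_generated(payload: str) -> bool:
    """Detect the marker; returns False for unmarked or plain content."""
    try:
        return bool(json.loads(payload)["provenance"]["ai_generated"])
    except (ValueError, KeyError, TypeError):
        return False
```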


33 of 68

Sui generis right in AI outputs?

  • H. Demsetz, Toward a Theory of Property Rights, 1967: technological development → new property rights
    • legal certainty in transactions
    • prevent market failure
  • European Parliament 2020: AI-outputs ‘must’ be protected under Intellectual Property Rights in order to encourage investment and improve legal certainty
  • European Commission 2023: “the issue of AI-generated works does not deserve a specific legislative intervention”
  • UK Copyright, Designs and Patents Act (CDPA) 1988
    • computer-generated works: works generated by computer in circumstances such that there is no human author of the work
    • sui generis IP right (called “copyright” in the CDPA), for 50 years after their creation
    • ‘author’: the person by whom the arrangements necessary for the creation of the work are undertaken
    • almost never used in courts, ‘unclear and contradictory’
    • reform proposal 2022 (failed)


34 of 68

Proposed reform in France (2023)

  • Proposition de loi visant à encadrer l’intelligence artificielle par le droit d’auteur, No. 1630 (12 septembre 2023)
  • Author’s permission necessary to integrate a copyright-protected work in an AI system
    • opt-in instead of opt-out?
    • contradiction of the TDM exceptions?
  • Copyright in AI-generated outputs should belong to the authors of works that enabled its generation (names marked in the output)
    • practical application…?
    • violation of EU law!
  • Collective rights management for AI-generated works
    • a designated organisation collects remuneration and redistributes it among entitled authors
    • levy collected for works used in AI systems whose authorship cannot be determined


35 of 68

Lawsuits concerning copyright in AI:

Getty Images vs. Stability AI (UK)

  • Stability AI: London-based provider of generative AI tools (incl. Stable Diffusion)

    • claim I: unlawful use of scraped data for AI training
      • note: no “commercial TDM” exception in the UK
    • claim II: infringement of copyright in those images by reproduction of substantial parts

  • hearing expected in 2025


36 of 68

Lawsuits concerning copyright in AI:

New York Times vs. OpenAI (US)

  • claim: unlawful use of NYT articles to train the GPT model
  • allegation 1: GPT can regurgitate near-verbatim copies of NYT articles
  • allegation 2: when prompted, GPT can produce longer excerpts of NYT articles than search engines do, allowing paywall circumvention (impact on the market value)
  • OpenAI statement:
    • AI training is fair use (transformative);
    • memorisation (and regurgitation) is a bug (not a feature), and a result of intentional manipulation

Meanwhile in China (February 8, 2024, Guangzhou Internet Court): a court found an AI provider guilty of copyright infringement after its text-to-image tool generated images of Ultraman (cartoon character) substantially similar to the original


37 of 68

PART II: Data Protection Issues in AI Training


38 of 68

Data protection: a primer

  • Main source: General Data Protection Regulation (GDPR)
    • not new: data protection laws in Germany, France since mid-1970s
    • repealed the Data Protection Directive 1995
    • adopted 2016, applicable since 25 May 2018
    • became an international standard
    • applies to data processing if:
      • the controller is established in the EU OR
      • the processing is related to providing goods and services to or monitoring the behaviour of individuals in the EU
  • GDPR does not apply to processing for “prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties” (e.g. video surveillance by the police)
    • Law Enforcement Directive 2016
  • ePrivacy Directive 2002 (amended 2009)
    • unsolicited emails, cookies, traffic data…
    • ePrivacy Regulation proposed in 2017 (stuck)


39 of 68

Data protection: terminology

  • Personal data:
    • any information… (fact/opinion, true/false, any format)
    • …related to… (by content / by purpose / by result)
    • …an identified or identifiable… (possible to single out by any means reasonably likely to be used)
      • reasonably likely, taking into account costs of and the amount of time required for identification (at the time of processing)
    • …natural person (living individual, a.k.a. data subject).
  • Sensitive data (racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, biometric data, genetic data, health, sex life and sexual orientation) — processing forbidden in principle [Article 9 GDPR]
  • Processing: any operation or set of operations on personal data
  • Anonymisation (permanent, irreversible) vs. pseudonymisation (reversible)
  • Controller: person (…) [or] body which, alone or jointly with others, determines the purposes and [essential] means of [processing]
  • Processor: person (…) [or] body which processes personal data on behalf of the controller
    • Data Processing Agreement (controller — processor)
  • Data Protection Officer: liaison between the controller and the supervisory authority
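The anonymisation/pseudonymisation distinction above can be illustrated with keyed hashing: whoever holds the key can re-derive the mapping, so the data remain personal data (pseudonymised); destroying the key moves the data towards anonymisation. A minimal sketch (the truncation length and field choice are ours):

```python
import hashlib
import hmac

def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    This is pseudonymisation, not anonymisation: the controller
    holding the key can consistently re-link records, and under the
    GDPR the data stay personal as long as re-identification remains
    reasonably likely. Illustrative sketch only.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token, e.g. for a dataset column
```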


40 of 68

Data protection principles (overview)

Article 5 GDPR

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimisation
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality (a.k.a. Security)
  • Accountability


41 of 68

Data protection principles:

Lawfulness

  • legal basis needed (list in Article 6 GDPR), e.g.:
      • consent
        • any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her
        • can be withdrawn at any time (not retroactively)
      • legitimate interest
        • balancing test taking into account: the nature of the data, data subject’s reasonable expectations, impact on the data subject
        • right to object
      • performance of a contract, public interest (specific provision)


42 of 68

Data protection principles:

Transparency

  • Right to information
    • data subject has to be informed about e.g.
      • identity of the controller
      • purposes of the processing
      • legitimate interests pursued (if applicable)
      • his or her rights (incl. right to file a complaint)
      • recipients (persons or bodies who will have access to the data)
      • duration of storage or criteria used to determine it
    • derogation: if disproportionate effort required OR if provision of information may impair the objectives of research (the information should then be made publicly available)
  • Right of access


43 of 68

Data protection principles:

Purpose Limitation, Data Minimisation

  • Purpose limitation
    • Data should be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes;
    • exception (purpose extension): further processing for research purposes is not to be considered incompatible with the initial purpose
    • E.g. archiving is a different purpose from research
  • Data minimisation
    • Data should be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed
    • no exceptions


44 of 68

Data protection principles:

Accuracy, Storage Limitation

  • Accuracy
    • Data should be adequate, accurate and, where necessary, kept up to date
    • Right of rectification
  • Storage Limitation
    • Data should be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed
    • exception: research and archiving in public interest (with safeguards and appropriate technical and organisational measures) — storage for longer periods possible (not: indefinite)


45 of 68

Data protection principles:

Security (cf. also Articles 32-34 GDPR)

  • Data should be processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures
    • backup copies, stress-tests
    • Data Breach Policy:
      • documentation of all data breaches
      • notification to the supervisory authority within 72 hours (if any risk for data subjects)
      • communication to the data subject (if high risk for data subjects)


46 of 68

Data protection principles:

Accountability

  • The controller shall be responsible for, and be able to demonstrate compliance with all data protection principles
    • Data protection ‘by design and by default’ [Article 25 GDPR]
    • Record of data processing activities [Article 30 GDPR]
    • Data Protection Impact Assessment (self-assessment) [Article 35 GDPR]
      • mandatory in some cases (new technologies, high risks, large quantities of data), always advisable (especially in research context)
      • documented; in consultation with the Data Protection Officer
      • assessment of necessity and proportionality
      • identification of risks (what if everything goes wrong?) and measures to address them


47 of 68

Rights of data subjects (overview)

  • Right to be provided with information (cf. Transparency principle)
  • Right of access
    • limitations possible for research purposes
  • Right to rectification (cf. Accuracy principle)
  • Right to erasure (‘Right to be forgotten’) or restriction
    • if processing violates GDPR principle(s)
  • Right to data portability
    • to receive the data he/she provided to the controller in a structured, commonly used and machine-readable format and transmit it to another controller (e.g. switching service providers)
  • Right to object (processing based on legitimate/public interest)
    • controller can still demonstrate “compelling legitimate grounds”
  • Freedom from automated individual decision-making, including profiling
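The portability right above requires handing data over in a "structured, commonly used and machine-readable format". A minimal sketch of such an export, producing JSON and CSV side by side (the record fields are illustrative, not prescribed by the GDPR):

```python
import csv
import io
import json

def export_user_data(records: list) -> dict:
    """Export a data subject's records as JSON and CSV (Article 20 sketch).

    `records` is a list of flat dicts sharing the same keys; both
    output formats count as structured, commonly used and
    machine-readable. Field names are illustrative.
    """
    json_dump = json.dumps(records, ensure_ascii=False, indent=2)
    buffer = io.StringIO()
    if records:
        writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    return {"json": json_dump, "csv": buffer.getvalue()}
```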


48 of 68

Freedom from automated individual decision-making (Article 22 GDPR)

  • Principle: General prohibition on fully automated (without any human intervention) individual decision-making (including profiling) that has a legal or similarly significant effect (e.g. financial situation, health, employment, access to education)
  • Exception if:
    • necessary to perform or enter into a contract OR
    • expressly authorised by law OR
    • data subject gave explicit consent
    • + safeguards, including at least:
      • the right to contest the decision and to obtain human intervention
  • Transparency requirements (incl. meaningful information about the logic involved)
  • WP29 Guidelines


49 of 68

GDPR compliance in AI training

(CNIL’s AI how-to sheets)

  1. Define the purpose
  2. Define the legal status of the stakeholders
  3. Define the Legal basis
  4. Data Protection Impact Assessment
  5. Data Protection by Design and by Default


50 of 68

  1. Defining a purpose
  • purpose should always be specified, explicit and legitimate
  • purpose limitation, transparency, data minimisation, storage limitation
  • when future applications can be defined from the development stage (purpose-specific systems): AI training and deployment can have one and the same purpose
    • e.g., monitoring of train traffic
  • future applications cannot be defined at the development stage: general-purpose AI
    • CNIL: correctly defined purpose refers to:
      • types of system developed AND
      • examples of potential applications (esp. high-risk ones)
      • functionalities excluded by design
      • conditions of use/distribution (e.g. Open Source)
    • Ex. (CNIL): an organisation wishes to develop a voice recognition model capable of identifying a speaker and his/her language in order to commercialise it for different operational uses in the production phase (e.g. tools for identifying people by voice assistants or voice translation applications on a mobile device, etc.).
  • AI developed for research purposes: lower degree of precision is acceptable


51 of 68

Defining the legal status of various stakeholders 1/2

  • Case-by-case analysis necessary
  • The controller:
    • decides why (purpose) and generally how (essential means) the system will be trained
    • is at the initiative of development AND
    • created the training dataset OR
    • entrusted this task to a service provider with sufficiently detailed documented instructions OR
    • decided to use a pre-existing dataset for training (developed by another controller) OR
    • decided to use a pre-existing model
    • Ex.: where social media data are used for AI training, the platform provider is not the controller (even though it may lay down conditions for re-use)


52 of 68

Defining the legal status of various stakeholders 2/2

  • Joint controllers
    • e.g. consortium
    • processing for a common purpose or for own purpose
      • Q: who benefits from the processing?
  • Processor
    • Develops an AI system for a client (controller) OR
    • Collects training data according to documented instructions
    • signs a Data Processing Agreement with a controller
    • BUT: if it uses the system/dataset also for its own purposes → separate processing (as controller)


53 of 68

Defining the legal basis 1/2

  • Case-by-case analysis
  • Consent?
    • validity criteria: freely given, specific, informed, unambiguous
    • if the right to withdraw cannot be guaranteed, consent is not an appropriate basis
    • granularity (per purpose)
    • unfeasible with scraped data
  • Legitimate interest?
    • is the pursued interest legitimate? (i.e., not illegal)
    • is the processing of personal data necessary? (can anonymised/synthetic data be used instead?)
    • is there a disproportionate impact on the rights and freedoms of data subjects?
      • Ex. training a system predicting one’s psychological profile on online data related to this person would be excessive
    • implement measures to minimize the impact: pseudonymisation, exclusion of sensitive data, elaborated selection criteria (data minimisation)


54 of 68

Defining the legal basis 2/2

  • Public interest?
    • only if based on a specific legal provision (normative text)
    • may be available for public research institutes
  • Necessary for the performance of a contract?
    • only available in very limited cases (e.g. subscription to a personalised email generation service)
    • Terms of Service of a social network are not an appropriate legal basis for reusing data for AI training, as such reuse is not necessary to perform the contract (CJEU, C-252/21)


55 of 68

Data Protection Impact Assessment (DPIA)

  • necessary if the developed system is likely to create a high risk to the rights and freedoms of natural persons
    • AI development often meets the EDPB criteria (e.g.: large-scale collection of personal data, crossing or combination of data sets, innovative uses or application of new technological or organisational solutions)
  • according to the CNIL, the following require a DPIA if they involve personal data processing:
    • the development of all systems identified as high-risk in the AI Act
    • the development of all foundation models
    • the development of all general-purpose AI systems

  • Risks to consider include:
    • misuse of training data, esp. in case of a data breach;
    • automated discrimination of certain users by the AI system;
    • ‘hallucinations’ concerning real persons;
    • regurgitation of personal data in case of attacks;
    • loss of control over published online data (e.g. one’s tweets)
  • Measures to be taken on the basis of a DPIA include:
    • enhanced security (e.g. encryption of training data);
    • data minimisation (e.g. by replacing some personal data with synthetic data)
    • measures to reinforce the rights of data subjects (e.g., machine unlearning)
    • auditing


56 of 68

Data Protection by Design and by Default

  • stick to the defined purpose!
  • avoid over-collection and excessive annotation
    • are they necessary for the purpose? (if not: delete; data pruning)
    • find volume “sweet spot”
  • choose the least privacy-invasive method
  • CNIL: “the use of deep learning must be justified and should therefore not be systematic”...
  • recommended training protocols to consider:
    • decentralised training (e.g. federated learning) – allows greater control over datasets without combining them (BUT: security concerns)
    • cryptography (secure multi-party computation, homomorphic encryption)
    • keep an eye on the most recent developments!
  • if possible, stay away from “sensitive data”
  • ensure representativeness of the data to avoid bias
  • define data retention periods (and stick to them)
    • automate deletion, if possible
    • some data can be archived (cf. national archiving laws)
  • document data (CNIL’s documentation model)
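"Define data retention periods (and stick to them)" and "automate deletion" can be combined: a scheduled job purges records older than the defined cut-off. A minimal sketch, with an assumed 3-year retention period and invented field names:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical example: the retention period and record schema are
# illustrative, not prescribed by the GDPR or the CNIL.
RETENTION = timedelta(days=3 * 365)  # e.g. a defined 3-year retention period

def purge_expired(records: list[dict]) -> list[dict]:
    """Keep only records still within the retention period."""
    cutoff = datetime.now(timezone.utc) - RETENTION
    return [r for r in records if r["collected_at"] >= cutoff]

data = [
    {"id": 1, "collected_at": datetime(2015, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime.now(timezone.utc)},
]
print([r["id"] for r in purge_expired(data)])  # the 2015 record is dropped
```

In practice such a job would run on a schedule against the live datastore, and records subject to national archiving laws would be moved to an archive rather than deleted.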

56

57 of 68

AI Act

  • 21 April 2021: proposed by the European Commission
  • 13 March 2024: accepted by the European Parliament
  • 21 May 2024: approved by the EU Council
  • BREAKING! Publication 12.07.2024
  • will become applicable 24 months after its entry into force (20 days after publication), BUT:
    • for prohibited AI practices: 6 months after entry into force
    • for certain high-risk AI systems: 36 months after entry into force
    • for general-purpose AI models: 12 months after entry into force
  • large territorial scope (systems developed in the EU, offered in the EU, outputs used in the EU…) – bound to become an international standard

57

58 of 68

AI Act: AI governance

  • AI Office

58

59 of 68

AI Act: classification of AI

  • by degree of risk
    • prohibited systems (Chapter II)
    • high-risk systems (Chapter III)
    • minimal risk (not regulated)

  • by purpose
    • general-purpose models (Chapter V)

  • transparency obligations (Chapter IV)

59

AI system

AI model

60 of 68

Prohibited AI systems

  • deploying subliminal, manipulative, or deceptive techniques to distort behaviour and impair informed decision-making, causing significant harm;
  • exploiting vulnerabilities related to age, disability, or socio-economic circumstances to distort behaviour, causing significant harm.
  • biometric categorisation systems inferring sensitive attributes (race, political opinions, sex life, sexual orientation…)
  • social scoring, i.e., evaluating or classifying individuals or groups based on social behaviour or personal traits, causing detrimental or unfavourable treatment of those people.
  • assessing the risk of an individual committing criminal offenses solely based on profiling or personality traits, except when used to augment human assessments based on objective, verifiable facts directly linked to criminal activity.
  • compiling facial recognition databases by untargeted scraping of facial images from the internet or CCTV footage.
  • inferring emotions in workplaces or educational institutions, except for medical or safety reasons.
  • ‘real-time’ remote biometric identification (RBI) in publicly accessible spaces for law enforcement, except when:
    • searching for missing persons or victims;
    • preventing a substantial and imminent threat to life, or a foreseeable terrorist attack; or
    • identifying suspects in serious crimes

60

61 of 68

High-risk AI systems

  • used in areas regulated by EU law listed in Annex I
    • e.g. toys, recreational watercraft, lifts, radio equipment, cableway installations, medical devices, civil aviation security, motor vehicles, aircraft…
  • corresponding to use cases listed in Annex III
    • identified areas: biometrics, critical infrastructure, education (access, evaluation, detection of prohibited behaviour during tests), access to and enjoyment of public services, law enforcement, migration, justice
    • UNLESS the system only performs a narrow procedural or preparatory task, improves the result of a human activity, or detects decision-making patterns or deviations without being meant to replace human decisions without review
      • burden of proof on the provider (documented assessment)
  • profiling is always high-risk (using personal data to automatically assess aspects of a person’s life)

61

62 of 68

High-risk AI systems: obligations of providers

62

63 of 68

Transparency obligations (Chapter IV)

  • AI systems intended to interact directly with humans:

must be designed in such a way that the natural persons concerned are informed that they are interacting with an AI system

unless it is obvious

  • AI systems, including general-purpose AI systems, generating synthetic audio, image, video or text content,

the outputs of the AI system are marked in a machine-readable format and detectable as artificially generated or manipulated.

…as far as technically feasible…

  • emotion recognition systems or biometric categorisation systems

deployers shall inform the exposed persons about the functioning of the system

  • AI systems that generate or manipulate image, audio or video content constituting a deep fake

deployers shall disclose that the content has been artificially generated or manipulated

when content forms part of an evidently artistic, creative, satirical, fictional work: disclosure in a manner that does not hamper the display or enjoyment of the work.

  • AI systems that generate or manipulate text which is published with the purpose of informing the public on matters of public interest

deployers shall disclose that the content has been artificially generated or manipulated

UNLESS content has undergone a process of human review or editorial control and where a natural or legal person holds editorial responsibility for the publication

  • transparency does not apply to crime detection and investigation
  • adoption of codes of practice encouraged (AI office)

63

64 of 68

General-purpose AI models (Chapter V)

  • ‘general-purpose AI model’ means an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market; (Art. 3 (63))
  • Large generative AI models are a typical example for a general-purpose AI model, given that they allow for flexible generation of content, such as in the form of text, audio, images or video, that can readily accommodate a wide range of distinctive tasks. (Recital 99)

64

65 of 68

GPAI models with systemic risks

  • GPAI model with systemic risks if:
    • it has high impact capabilities (≥ the most advanced GPAIs)
      • presumed if > 10^25 floating point operations used for its training (https://ourworldindata.org/grapher/artificial-intelligence-training-computation)
        • provider can prove the contrary
      • provider has to notify the Commission
    • OR specific decision of the Commission
      • provider may request reassessment
    • list kept by the Commission
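The 10^25 FLOP presumption can be sanity-checked with the common back-of-the-envelope estimate that training a dense transformer costs roughly 6 × parameters × training tokens FLOPs. This heuristic, and the model sizes below, are illustrative assumptions, not part of the AI Act:

```python
# Rough training-compute estimate: FLOPs ≈ 6 * N_params * N_tokens
# (a common heuristic for dense transformers; figures are illustrative).
THRESHOLD = 1e25  # AI Act presumption of systemic risk

def training_flops(params: float, tokens: float) -> float:
    """Estimate cumulative training compute for a dense model."""
    return 6 * params * tokens

for name, n_params, n_tokens in [
    ("hypothetical 7B model, 2T tokens", 7e9, 2e12),
    ("hypothetical 500B model, 20T tokens", 5e11, 2e13),
]:
    flops = training_flops(n_params, n_tokens)
    print(f"{name}: {flops:.1e} FLOPs -> presumption triggered: {flops > THRESHOLD}")
```

On these assumptions the 7B model lands around 8.4e22 FLOPs, well under the threshold, while the 500B model exceeds it; the provider of the latter would have to notify the Commission (or prove the model lacks high-impact capabilities).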

65

66 of 68

GPAI models: obligations of providers

  1. technical documentation of the model
    • information on training, testing, evaluation
    • minimum: Annex XI
    • provided to AI Office and national authorities upon request
  2. information and documentation for system providers
    • enable good understanding of capabilities and limitations of the model
    • enable compliance with the AI Act
    • minimum: Annex XII
      • points a) and b) do not apply to providers of open-source models whose parameters and model descriptions are publicly available, UNLESS the model presents systemic risks
  3. copyright policy
    • esp. complying with opt-outs in the general TDM exception
  4. “sufficiently detailed” summary about the training data
    • to be made publicly available
    • template to be provided by the AI Office
  5. codes of practice, harmonised standards (presumption of compliance)
  6. providers from third countries must appoint an authorised representative in the EU

66

67 of 68

Additional obligations of providers of GPAI models with systemic risks

  • perform model evaluation (incl. adversarial testing) to identify and mitigate systemic risks
  • assess and mitigate systemic risks stemming from the development, placing on the market, and the use of the model
  • monitor and report serious incidents (incl. corrective measures) to the AI Office
  • ensure appropriate level of security for the model and physical infrastructure

  • codes of practice, harmonised standards (presumption of compliance)

67

68 of 68

68