1 of 29

Foundation Models under the EU AI Act

Paul Friedl

Universidad Pontificia Comillas, Madrid

3 April 2024

2 of 29

Outline

  1. A short introduction to Foundation Models
  2. Obligations for Foundation Models under the AI Act
  3. Copyright issues
  4. Data protection issues
  5. Conclusion

3 of 29

  1. Introduction

4 of 29

Introduction

  • Foundation model = Machine learning model that is trained on broad data such that it develops general capabilities which can be applied across a wide range of use cases

Capabilities and example applications:

  • Language processing and generation: chatbots, CV screening, coding, education, legal tech, …
  • Image processing and generation: text-to-image creation, image recognition, face recognition, medical imaging, …
  • Audio processing and generation: voice recognition, voice generation, music generation, …
  • Structured data processing: conventional data analytics, predictive analytics, …

5 of 29

Introduction

The foundation model supply chain

source: https://www.adalovelaceinstitute.org/resource/foundation-models-explainer/

7 of 29

Introduction

Categories of risk

  • Performance & Robustness
  • Bias & Discrimination
  • Privacy & Cybersecurity
  • Transparency & Accountability
  • Misuse & Inappropriate use

14 of 29

2. Obligations for Foundation Models under the AI Act

15 of 29

Obligations for Foundation Models

The central distinction

(Normal) General Purpose AI model

General Purpose AI model with systemic risk

16 of 29

Obligations for Foundation Models

The central distinction

  • A GPAI model shall be classified as a GPAI model with systemic risk …
  • “if it has high impact capabilities evaluated on the basis of appropriate technical tools and methodologies”
    1. presumed to be the case when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs)
    2. the Commission shall adopt delegated acts to amend this threshold and to determine new thresholds in light of evolving technological developments, such as increased hardware efficiency
  • or “based on a decision of the Commission”
      • Criteria (inter alia): number of parameters, training compute, capabilities, impact on the internal market due to reach, number of registered users
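The 10^25 FLOP presumption lends itself to a quick back-of-the-envelope check. A minimal sketch in Python, using the common ≈ 6 × parameters × training-tokens approximation for transformer training compute (a heuristic from the scaling-law literature, not something the AI Act prescribes; the model figures below are hypothetical):

```python
# Rough check against the AI Act's 10^25 FLOP presumption threshold.
# The 6 * N * D estimate is a scaling-law heuristic, not part of the Act.

THRESHOLD_FLOPS = 1e25  # presumption threshold for "high impact capabilities"

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Hypothetical model: 70 billion parameters trained on 15 trillion tokens
flops = training_flops(70e9, 15e12)
print(f"{flops:.1e}")           # 6.3e+24
print(flops > THRESHOLD_FLOPS)  # False: below the presumption threshold
```

By this estimate, such a model would fall below the presumption threshold; the Commission could nonetheless designate it by decision on the basis of the other criteria.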

17 of 29

Obligations for Foundation Models

Obligations for GPAI models with systemic risk

  • Providers of GPAI models with systemic risk shall …
    • perform model evaluation reflecting the state of the art, including conducting adversarial testing, with a view to identifying and mitigating systemic risks;
    • assess and mitigate possible systemic risks at Union level;
    • document and report serious incidents without undue delay;
    • ensure an adequate level of cybersecurity protection.
  • Specification through codes of practice and harmonised standards (⇒ presumption of compliance)

18 of 29

Obligations for Foundation Models

Obligations for (normal) GPAI models

  • Providers of (normal) GPAI models shall …
    • draw up technical documentation and provide it to authorities upon request;
    • make available to downstream providers information on the GPAI model’s capabilities and limitations, enabling them to comply with their own duties;
    • put in place a policy to respect Union copyright law;
    • make publicly available a sufficiently detailed summary of the content used for training the GPAI model, according to a template provided by the AI Office.
  • Specification through codes of practice and harmonised standards (⇒ presumption of compliance)

19 of 29

3. Copyright issues

20 of 29

Copyright issues

source: https://petapixel.com/2023/02/07/getty-images-are-suing-stable-diffusion-for-a-staggering-1-8-trillion/

21 of 29

Copyright issues

Rights and obligations under EU copyright law (EU DSM Copyright Directive)

  • Reference in AI Act clarifies that Art. 4 DSM-CR-D applies to foundation model training
  • Recital clarifies that this applies regardless of where a model was trained

Art. 4(1): Member States shall provide for an exception or limitation to copyright for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.

Art. 4(3): The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works [...] has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.

22 of 29

Copyright issues

Operationalizing opt-outs

  • What should/will machine-readable means to communicate opt-outs look like?
  • Currently, a number of such protocols are being developed (spawning.ai, C2PA, Google initiative, …)
    • All of these operate through website metadata (similar to robots.txt)
    • All of these (currently) rely on voluntary observance
  • Problems (inter alia)
    • What about the actual copies of a work?
    • How to include works whose creators have no control over the hosting website (e.g. works shared on platforms such as YouTube, TikTok or Soundcloud)?
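Several of these metadata-based protocols follow the robots.txt pattern. A minimal, illustrative sketch in Python (GPTBot and Google-Extended are real AI-training user agents; the file content and paths are hypothetical):

```python
# Illustrative robots.txt-style opt-out, parsed with the standard library.
# Observance is voluntary: the directives merely request that AI-training
# crawlers stay away; they do not technically block access.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# AI-training crawlers are asked not to fetch; ordinary crawlers may.
print(rp.can_fetch("GPTBot", "/gallery/artwork.jpg"))         # False
print(rp.can_fetch("SomeSearchBot", "/gallery/artwork.jpg"))  # True
```

This also illustrates the problems listed above: the reservation attaches to the website rather than to the work itself, so copies hosted elsewhere are unaffected, and creators publishing on platforms they do not control cannot set it.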

23 of 29

Copyright issues

Training data transparency

  • Recital: “While taking into due account the need to protect trade secrets and confidential business information, this summary should be generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used.”

Art. 52c AI Act: “Providers of general purpose AI models shall: (d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.”

24 of 29

4. Data Protection issues

25 of 29

Data protection issues

Possible complaints

  • Misrepresentation
  • Memorization and leaking of private data
  • Unconsented appropriation
  • Privacy violations through inference

26 of 29

Data protection issues

The EU’s General Data Protection Regulation

Article 6(1): Lawfulness - Legitimate interest

  • Relevant interests?
  • Outcome?

Article 6(1): Lawfulness - Consent

  • Consent possible?
  • Consent infrastructure?

Article 15: Right to access

  • Obligation to render training data set fully accessible?

Articles 17 & 21: Rights to erasure and objection

  • “Overriding legitimate grounds”?

Article 16: Right to rectification

  • When is data “inaccurate” or “incomplete” in LLM development contexts?

When does LLM training data relate to an “identified or identifiable natural person” (Art. 4(1)), rendering the GDPR applicable?

27 of 29

Data protection issues

GDPR - Processing on the basis of legitimate interests

  • Doubts expressed by Italian, Polish, French DPAs
  • Precedents
    • Clearview (Italian, UK and Greek DPAs)
    • …?
  • What interests are legitimate?

Art. 6(1)(f): Processing shall be lawful if [...] processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject [...]

28 of 29

Data protection issues

GDPR - The right to object

  • Possible implementation: data removal requests
  • But: ultimately, a reliable, pro-active consent/objection infrastructure is needed (“data protection by design”)

Art. 21(1): The data subject shall have the right to object, on grounds relating to his or her particular situation, at any time to processing of personal data concerning him or her which is based on point (e) or (f) of Article 6(1)[...]. The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment [...].

29 of 29

Thank you!

https://paul-friedl.github.io/

Link to presentation