1 of 29

Foundation Models under the EU AI Act

Paul Friedl

Universidad Pontificia Comillas, Madrid

3 April 2024

2 of 29

Outline

  1. A short introduction to Foundation Models
  2. Obligations for Foundation Models under the AI Act
  3. Copyright issues
  4. Data protection issues
  5. Conclusion

3 of 29

  1. Introduction

4 of 29

Introduction

  • Foundation model = Machine learning model that is trained on broad data such that it develops general capabilities which can be applied across a wide range of use cases

Capabilities and example applications:

  • Language processing and generation: chatbots, CV screening, coding, education, legal tech, …
  • Image processing and generation: text-to-image creation, image recognition, face recognition, medical imaging, …
  • Audio processing and generation: voice recognition, voice generation, music generation, …
  • Structured data processing: conventional data analytics, predictive analytics, …

5 of 29

Introduction

The foundation model supply chain

source: https://www.adalovelaceinstitute.org/resource/foundation-models-explainer/

7 of 29

Introduction

Categories of risk

  • Performance & Robustness
  • Bias & Discrimination
  • Privacy & Cybersecurity
  • Transparency & Accountability
  • Misuse & Inappropriate use

14 of 29

2. Obligations for Foundation Models under the AI Act

15 of 29

Obligations for Foundation Models

The central distinction

(Normal) General Purpose AI model

General Purpose AI model with systemic risk

16 of 29

Obligations for Foundation Models

The central distinction

  • A GPAI model shall be classified as a GPAI model with systemic risk …
  • “if it has high impact capabilities evaluated on the basis of appropriate technical tools and methodologies”
    1. presumed to be the case when the cumulative amount of compute used for its training is greater than 10^25 floating point operations (FLOPs)
    2. the Commission shall adopt delegated acts to amend this threshold and to determine new thresholds in light of evolving technological developments, such as increased hardware efficiency
  • or “based on a decision of the Commission”
      • Criteria (inter alia): number of parameters, training compute, capabilities, impact on the internal market due to reach, number of registered users
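The 10^25 FLOP presumption lends itself to a quick back-of-the-envelope check. A minimal sketch in Python, using the common ≈ 6 × parameters × training-tokens approximation for transformer training compute (a heuristic from the scaling-law literature, not something the AI Act prescribes; the model figures below are hypothetical):

```python
# Rough check against the AI Act's 10^25 FLOP presumption threshold.
# The 6 * N * D estimate is a scaling-law heuristic, not part of the Act.

THRESHOLD_FLOPS = 1e25  # presumption threshold for "high impact capabilities"

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Hypothetical model: 70 billion parameters trained on 15 trillion tokens
flops = training_flops(70e9, 15e12)
print(f"{flops:.1e}")           # 6.3e+24
print(flops > THRESHOLD_FLOPS)  # False: below the presumption threshold
```

By this estimate, such a model would fall below the presumption threshold; the Commission could nonetheless designate it by decision on the basis of the other criteria.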

17 of 29

Obligations for Foundation Models

Obligations for GPAI models with systemic risk

  • Providers of GPAI models with systemic risk shall …
    • perform model evaluation reflecting the state of the art, including conducting adversarial testing, with a view to identifying and mitigating systemic risks;
    • assess and mitigate possible systemic risks at Union level;
    • document and report serious incidents without undue delay;
    • ensure an adequate level of cybersecurity protection.
  • Specification through codes of practice and harmonised standards (⇒ presumption of compliance)

18 of 29

Obligations for Foundation Models

Obligations for (normal) GPAI models

  • Providers of (normal) GPAI models shall …
    • draw up technical documentation and provide it to authorities upon request;
    • make available to downstream providers information on the GPAI model’s capabilities and limitations, enabling them to comply with their own duties;
    • put in place a policy to respect Union copyright law;
    • make publicly available a sufficiently detailed summary of the content used for training the GPAI model, according to a template provided by the AI Office.
  • Specification through codes of practice and harmonised standards (⇒ presumption of compliance)

19 of 29

3. Copyright issues

20 of 29

Copyright issues

source: https://petapixel.com/2023/02/07/getty-images-are-suing-stable-diffusion-for-a-staggering-1-8-trillion/

21 of 29

Copyright issues

Rights and obligations under EU copyright law (EU DSM Copyright Directive)

  • Reference in AI Act clarifies that Art. 4 DSM-CR-D applies to foundation model training
  • Recital clarifies that this applies regardless of where a model was trained

Art. 4(1): Member States shall provide for an exception or limitation to copyright for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.

Art. 4(3): The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works [...] has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.

22 of 29

Copyright issues

Operationalizing opt-outs

  • What should/will machine-readable means to communicate opt-outs look like?
  • Currently, a number of such protocols are being developed (spawning.ai, C2PA, Google initiative, …)
    • All of these operate through website metadata (similar to robots.txt)
    • All of these (currently) rely on voluntary observance
  • Problems (inter alia)
    • What about the actual copies of a work?
    • How to include works whose creators have no control over the hosting website (e.g. works shared on platforms such as YouTube, TikTok or Soundcloud)?
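Several of these metadata-based protocols follow the robots.txt pattern. A minimal, illustrative sketch in Python (GPTBot and Google-Extended are real AI-training user agents; the file content and paths are hypothetical):

```python
# Illustrative robots.txt-style opt-out, parsed with the standard library.
# Observance is voluntary: the directives merely request that AI-training
# crawlers stay away; they do not technically block access.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# AI-training crawlers are asked not to fetch; ordinary crawlers may.
print(rp.can_fetch("GPTBot", "/gallery/artwork.jpg"))         # False
print(rp.can_fetch("SomeSearchBot", "/gallery/artwork.jpg"))  # True
```

This also illustrates the problems listed above: the reservation attaches to the website rather than to the work itself, so copies hosted elsewhere are unaffected, and creators publishing on platforms they do not control cannot set it.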

23 of 29

Copyright issues

Training data transparency

  • Recital: “While taking into due account the need to protect trade secrets and confidential business information, this summary should be generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used.”

Art. 52c AI Act: “Providers of general purpose AI models shall: (d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.”

24 of 29

4. Data Protection issues

25 of 29

Data protection issues

Possible complaints

  • Misrepresentation
  • Memorization and leaking of private data
  • Unconsented appropriation
  • Privacy violations through inference

26 of 29

Data protection issues

The EU’s General Data Protection Regulation

Article 6(1): Lawfulness - Legitimate interest

  • Relevant interests?
  • Outcome?

Article 6(1): Lawfulness - Consent

  • Consent possible?
  • Consent infrastructure?

Article 15: Right to access

  • Obligation to render training data set fully accessible?

Articles 17 & 21: Rights to erasure and objection

  • “Overriding legitimate grounds”?

Article 16: Right to rectification

  • When is data “inaccurate” or “incomplete” in LLM development contexts?

When does LLM training data relate to an “identified or identifiable natural person” (Art. 4(1)), rendering the GDPR applicable?

27 of 29

Data protection issues

GDPR - Processing on the basis of legitimate interests

  • Doubts expressed by Italian, Polish, French DPAs
  • Precedents
    • Clearview (Italian, UK and Greek DPAs)
    • …?
  • What interests are legitimate?

Art. 6(1)(f): Processing shall be lawful if [...] processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject [...]

28 of 29

Data protection issues

GDPR - The right to object

  • Possible implementation: data removal requests
  • But: ultimately, a reliable, pro-active consent/objection infrastructure is needed (“data protection by design”)

Art. 21(1): The data subject shall have the right to object, on grounds relating to his or her particular situation, at any time to processing of personal data concerning him or her which is based on point (e) or (f) of Article 6(1)[...]. The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment [...].

29 of 29

Thank you!

https://paul-friedl.github.io/

Link to presentation