LAION-5B and beyond: datasets, models, and…?
Robert Kaczmarczyk, LAION, TUM
Datasets
Models
???
Environment
Education
LAION - a history of successful community collaborations
Datasets
Models
LAION-5B and beyond: datasets, models, and…
I wonder… 😉
Datasets
Models
Tools
Environment
Education
???
LAION-5B and beyond: datasets, models, and… tools!
(with) the dataset!
Tools to…
WHY?
Current LAION tools
Fundamental part of our previous efforts
Opt-out feature in img2dataset
BIAS?
Dataset → Tools → (improved) models
Efficient downloading (img2dataset, …)
Subset creation (CLIP retrieval, …)
Understanding (kNN, modified tSNE, …)
What else?
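A minimal sketch of the idea behind embedding-based subset creation and kNN inspection: rank the items of a dataset by cosine similarity to a query embedding and keep the top k. This is not the code of CLIP retrieval or any other LAION tool; the function names and toy vectors are hypothetical, and the embeddings are assumed to be non-zero.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_neighbors(query, embeddings, k):
    """Return the indices of the k embeddings most similar to the query."""
    scored = [(cosine(query, emb), idx) for idx, emb in enumerate(embeddings)]
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

# Toy 2-D vectors standing in for real image or text embeddings (hypothetical data).
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
query = [1.0, 0.0]

subset_indices = top_k_neighbors(query, embeddings, k=2)
print(subset_indices)
```

At the scale of billions of samples an approximate nearest-neighbour index would replace this exhaustive scan; the ranking principle stays the same.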
Outlook
Tools in development
*HF already provides both to some degree for HF datasets
https://github.com/LAION-AI/GIF
https://github.com/rom1504/cc2imgcap
https://github.com/iejMac/video2dataset
https://github.com/rom1504/img2dataset/issues/82
CC2imgcap "Creating a dataset" pipeline
General Inference Pipeline
→ Output of the dataset including embeddings / additional columns as specified
Any Dataset
(.tar, .parquet, .npy, …)
Any inference model(s)
+
Start general inference pipeline
Files get automatically uploaded to output directory, e.g., aws s3
GPU0
GPU1
GPU2
GPU0
GPU1
GPU2
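The flow above (any dataset, any inference model(s), several GPUs, output with additional columns) could be sketched roughly as follows. This is an illustrative sketch, not the implementation in the GIF repository: the helper names, shard file names, and the toy "model" are all hypothetical, and the upload step to an output directory is left out.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Model = Callable[[Row], object]

def assign_shards(shards: List[str], num_workers: int) -> List[List[str]]:
    """Distribute shard names round-robin over workers (e.g. one worker per GPU)."""
    plan: List[List[str]] = [[] for _ in range(num_workers)]
    for i, shard in enumerate(shards):
        plan[i % num_workers].append(shard)
    return plan

def run_inference(rows: List[Row], models: Dict[str, Model]) -> List[Row]:
    """Return copies of the rows with one additional column per model."""
    enriched_rows = []
    for row in rows:
        enriched = dict(row)  # copy, so the input rows stay unchanged
        for column, model in models.items():
            enriched[column] = model(row)
        enriched_rows.append(enriched)
    return enriched_rows

# Hypothetical shard names for a dataset split into seven files.
shards = [f"shard-{i:05d}.tar" for i in range(7)]
plan = assign_shards(shards, num_workers=3)  # GPU0, GPU1, GPU2

# A trivial stand-in for a real inference model (assumption for illustration).
rows = [{"url": "https://example.com/a.jpg", "caption": "a cat"}]
models = {"caption_length": lambda row: len(row["caption"])}
result = run_inference(rows, models)

print(plan)
print(result)
```

Keying the models by output column name keeps the "additional columns as specified" part explicit: each model contributes exactly one named column to the output.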
Improve understanding of our datasets…
Conclusion
LAION is just the starting…
Join the community!
Large
Artificial
Intelligence
Open
Network