Vaani Exploration & Feedback
Thank you for exploring Project Vaani. Launched in 2022 by IISc/ARTPARK and Google, Project Vaani aims to create an open-source multimodal dataset that truly represents India's linguistic diversity. The goal is to collect over 775,000 images and 150,000 hours of speech and text data from 1 million people across all 773 districts, capturing diversity in languages, dialects, and demographics. The geo-centric approach, rather than a language-centric one, makes it possible to capture dialects and languages spoken in remote areas, though it also makes the project extremely operationally intensive and challenging.
Of this, Phase 1, covering 80 districts, has been open-sourced so far.
Dataset: https://huggingface.co/datasets/ARTPARK-IISc/Vaani
Email *
Where have you used, or where are you planning to use, this dataset? *
Have you used, or are you planning to use, the data for multimodal use cases, such as fine-tuning multimodal LLMs? *
Have you used the data in a district-specific manner? *
This form was created inside of ARTPARK.
