CAMeL Lab Registration Form: "ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus"
ZAEBUC-Spoken is a multilingual, multidialectal Arabic-English speech corpus. It comprises twelve hours of Zoom meetings in which multiple speakers role-play a work situation. The corpus covers two languages, Arabic and English. Arabic is spoken in multiple variants (Modern Standard Arabic, Gulf Arabic, and Egyptian Arabic), and English is spoken with various accents. Speakers also frequently code-switch between these languages and dialects, which adds to the corpus's complexity. The corpus includes manual transcriptions of the recordings. It also includes dialectness-level annotations for the portion that contains code-switching between Arabic variants, and automatic morphological annotations (tokenization, lemmatization, and part-of-speech tagging).

A note on privacy: All of the information you provide will be used solely for internal reporting on resource usage. No information will be shared with third parties, and no email addresses will be added to mailing lists.
Email *
First Name *
Last Name *
Affiliation *
Website (optional)
We provide two download options for the corpus: one with the audio files (3.5 GB) and one without them (3 MB). Please specify which option you will download: *
What do you plan to use this resource for? *
License: please read the following license.
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// License for ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License.
(https://creativecommons.org/licenses/by-nc-sa/4.0/)

///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

By clicking "Yes", you agree to the terms of this license. *
Citation Guide
If you use this corpus, please cite the ZAEBUC-Spoken paper:

@inproceedings{hamed2024zaebuc,
  title={{ZAEBUC-Spoken}: A Multilingual Multidialectal {A}rabic-{E}nglish Speech Corpus},
  author={Hamed, Injy and Eryani, Fadhl and Palfreyman, David and Habash, Nizar},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  pages={17770--17782},
  year={2024}
}
By clicking "Yes", you agree to follow this citation guide. *
Publications
Injy Hamed, Fadhl Eryani, David Palfreyman, and Nizar Habash. 2024. ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17770-17782.
This form was created inside New York University.