A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Status | GH Issue | Location of Processed Dataset | Code Location | Owner | License | Notes | |||||||||||||||||||
2 | summary <-> fulltext | Done | https://huggingface.co/datasets/MyloBishop/reverse_summarization_dataset | Jordine/Andreas/Christoph/Mylo | ||||||||||||||||||||||
3 | - /r/ChangeMyView | Need to run and convert | https://github.com/LAION-AI/Open-Assistant/issues/727 | https://huggingface.co/datasets/kjl3080/OA_CMV_Arguments | https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/data-augmentation/changemyview-builder/data_processor.ipynb | KayJay | Apache | There are only 100 entries on Hugging Face. Someone with a more powerful CUDA GPU should augment more data using the builder in the repo. | ||||||||||||||||||
4 | soda-dialog | Need to convert and upload | https://drive.google.com/file/d/1TOGQfr419n8wpzJpYLLw4nB3tSKD8zXV/view?usp=sharing | Huu | ||||||||||||||||||||||
5 | joke explanation | Need to convert and upload | https://drive.google.com/file/d/1-laFweqp676HxEzse1k63owo5h3BcbYi/view?usp=sharing | Huu/Theblackcat | ||||||||||||||||||||||
6 | kilt background contextual tasks | In progress | Lintang | |||||||||||||||||||||||
7 | OpenBugger | In progress | https://github.com/LAION-AI/Open-Assistant/issues/317 | https://github.com/furlat/OpenBugger | Iriden | 04.02: Big updates in the next few days | ||||||||||||||||||||
8 | - code explainer | Need a new assignee | Graverman | We just need compute to run this. Code done | ||||||||||||||||||||||
9 | math | In progress | hecko | |||||||||||||||||||||||
10 | - sympy | In progress | https://github.com/LAION-AI/Open-Assistant/issues/776 | hecko/King | ||||||||||||||||||||||
11 | debate conversation dialog dataset | In progress | ||||||||||||||||||||||||
12 | - Hippocorpus | Need to run, convert and upload | https://github.com/LAION-AI/Open-Assistant/issues/728 | https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/data-augmentation/hippocorpus/hippocorpus.ipynb | MightyAlex/Taylor | O-UDA v1.0 | ||||||||||||||||||||
13 | medical (note generation) | Need to run, convert and upload | https://github.com/LAION-AI/Open-Assistant/issues/721 | https://github.com/LAION-AI/Open-Assistant/tree/main/openassistant/datasets/mt_note_generation | YP | |||||||||||||||||||||
14 | ChatGPT prompts with our model responses | In progress | Huu & ?? | We just need compute to run this. Code done | ||||||||||||||||||||||
15 | Story Generation From Summary | In progress | Huu/Finitearth/Christoph | We just need compute to run this. Code done | ||||||||||||||||||||||
16 | Essay writer | In progress | Graverman & Finitearth | We just need compute to run this. Code done | ||||||||||||||||||||||
17 | Wikihow Q&A | In progress | Local machine | kenhktsui & b_mc2 | Collection done. Cleaning in progress. Then Augment. Currently local file only | |||||||||||||||||||||
18 | Wikihow | In progress | kenhktsui & b_mc2 | Notebook started, working on augmentation process. https://colab.research.google.com/drive/1oOI1XGqCtKTScrNjUhNsZz3CIkGsDBS7?usp=drive_open | ||||||||||||||||||||||
19 | In progress | Proteusiq/SriPrarabdha | ||||||||||||||||||||||||
20 | In progress | https://github.com/LAION-AI/Open-Assistant/issues/126 | Jmete | archive.org data collection scripts merged to main. Requires filtering to get useful entries. Waiting for 143 model for the filtering. Working on a separate method to scrape Twitter threads for higher quality. Notebook is done and I have already scraped 46,650 threads. I need help running these through another model to create Q&A pairs. | ||||||||||||||||||||||
21 | stackexchange | In progress | https://github.com/LAION-AI/Open-Assistant/issues/191 | https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/data-augmentation/stackexchange-builder/stackexchange-builder.ipynb | bmc2/flying_madman | Initial notebook merged, can ingest different stackexchange data dumps and format to OA format. https://github.com/LAION-AI/Open-Assistant/pull/355 | ||||||||||||||||||||
22 | UnifiedQA->Instructions | Need to run, convert and upload | https://github.com/LAION-AI/Open-Assistant/issues/712 | https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/data-augmentation/unified-qa/unified-qa.ipynb | vale95ntino | Notebook added to GH repository, data can be downloaded from there | ||||||||||||||||||||
23 | Writing Prompts | In progress | https://huggingface.co/datasets/fabraz/writingPromptAug | fabraz | ||||||||||||||||||||||
24 | Github-Unit test | In progress | bitplane/theol-git | |||||||||||||||||||||||
25 | Github Issues | Need a new assignee | https://github.com/LAION-AI/Open-Assistant/issues/319 | https://drive.google.com/file/d/1NOAgf2bhwQYnnjjrpiWWV_5Adl9xwzUJ/view | doroshroman | |||||||||||||||||||||
26 | YouTube Transcript -> Instruction Dialog | In progress | Shtoner/totuta/marianna13 | |||||||||||||||||||||||
27 | Microsoft CodeT Datasets (HumanEval, APPS, CodeContests, MBPP) | In review | https://huggingface.co/datasets/OllieStanley/humaneval-mbpp-codegen-qa https://huggingface.co/datasets/OllieStanley/humaneval-mbpp-testgen-qa | https://github.com/LAION-AI/Open-Assistant/pull/1895 | Ollie | MIT | https://github.com/microsoft/CodeT/tree/main/CodeT | |||||||||||||||||||
28 | Microsoft DIVERSE Datasets | Need a new assignee | https://github.com/LAION-AI/Open-Assistant/issues/746 | https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/diverse/diverse.ipynb | Ollie/Thierryderuyttere | MIT | https://github.com/microsoft/CodeT/tree/main/DIVERSE | |||||||||||||||||||
29 | 2D table reasoning | In progress | hecko | |||||||||||||||||||||||
30 | Wikidata knowledge graph | Need to run, convert and upload | https://github.com/LAION-AI/Open-Assistant/pull/1075 | https://github.com/LAION-AI/Open-Assistant/pull/1075 | sedthh | |||||||||||||||||||||
31 | Project Gutenberg Books | Done | https://github.com/LAION-AI/Open-Assistant/issues/1110 | https://huggingface.co/datasets/sedthh/gutenberg_english and https://huggingface.co/datasets/sedthh/gutenberg_multilang | sedthh | |||||||||||||||||||||
32 | Wikilingua | In progress | https://github.com/LAION-AI/Open-Assistant/issues/892 | YP | ||||||||||||||||||||||
33 | - python add libraries | Need to convert and upload | https://www.kaggle.com/code/christopherdking/notebook0548e4d207/data | King (based on "checked python" project) | Apache | |||||||||||||||||||||
34 | python code dataset - need complete list | In progress | ||||||||||||||||||||||||
35 | p3/xp3 dataset | Done | Lintang/Markcheeky | |||||||||||||||||||||||
36 | rallio's QA data | Need to convert and upload | https://github.com/Rallio67/language-model-agents | Rallio | ||||||||||||||||||||||
37 | Anthropic dataset helpful/harmless (not redteam) | Done | ||||||||||||||||||||||||
38 | - checked python | Done | Huu/Rallio/King | |||||||||||||||||||||||
39 | Character description instruction from video game wikipedia | Need to convert and upload | Local machine | Rallio | ||||||||||||||||||||||
40 | Khan Academy | In progress | https://github.com/LAION-AI/Open-Assistant/issues/184 | |||||||||||||||||||||||
41 | WikiHow Lists | Need to convert | https://huggingface.co/datasets/b-mc2/wikihow_lists | b_mc2 | CC BY-NC-SA 3.0 | Wikihow titles to ordered list summaries, unordered list ingredients, items needed for X | ||||||||||||||||||||
42 | Storygeneration | Need to convert | https://huggingface.co/datasets/qwedsacf/story-generation | Vechtomov#1142 | ||||||||||||||||||||||
43 | Competition Math | Need to convert | https://huggingface.co/datasets/qwedsacf/competition_math | Vechtomov#1142 | ||||||||||||||||||||||
44 | Yahoo QA | Need to convert and upload | https://huggingface.co/datasets/yahoo_answers_qa | https://github.com/LAION-AI/Open-Assistant/pull/1984 | Shadowner | |||||||||||||||||||||
45 | Essay writing | Need to convert | https://huggingface.co/datasets/ChristophSchuhmann/essays-with-instructions | Christoph | ||||||||||||||||||||||
46 | Q&A | Need to convert | https://huggingface.co/datasets/marianna13/random_dataset | marianna13 | ||||||||||||||||||||||
47 | Recipes | Done | https://github.com/LAION-AI/Open-Assistant/issues/1031 | https://huggingface.co/datasets/dctanner/oa_recipes | https://github.com/LAION-AI/Open-Assistant/pull/1836 | dctanner | ||||||||||||||||||||
48 | Radiology QA | In progress | https://github.com/LAION-AI/Open-Assistant/issues/1044 | Luab | ||||||||||||||||||||||
49 | Poetry instructions | Need a review | https://github.com/LAION-AI/Open-Assistant/issues/731 | https://huggingface.co/datasets/isaacrehg/poetry-instructions https://huggingface.co/datasets/isaacrehg/poetry-summary https://huggingface.co/datasets/isaacrehg/poetry-detailed-analysis | https://github.com/LAION-AI/Open-Assistant/pull/1009 | IsaacRe | ||||||||||||||||||||
50 | Cornell Movies Dialogs | Need to convert | https://github.com/LAION-AI/Open-Assistant/issues/1163 | https://huggingface.co/datasets/shahules786/OA-cornell-movies-dialog | https://github.com/LAION-AI/Open-Assistant/pull/1319 | iamikka(gh: shahules786) | ||||||||||||||||||||
51 | xp3 Code instructions | In progress | https://github.com/LAION-AI/Open-Assistant/issues/1266 | epicx(gh: bennman) | ||||||||||||||||||||||
52 | Free ChatGPT prompts generated answers | In progress | https://github.com/LAION-AI/Open-Assistant/issues/1379 | iamikka(gh: shahules786) | are we sure we can use these? | |||||||||||||||||||||
53 | Cocktail recipes | Need to convert | https://github.com/LAION-AI/Open-Assistant/issues/1286 | https://huggingface.co/datasets/brianarbuckle/cocktail_recipes | BrianArbuckle | |||||||||||||||||||||
54 | Korean datasets | In progress | https://github.com/LAION-AI/Open-Assistant/issues/1157 | jeonsworld | ||||||||||||||||||||||
55 | Hungarian literature datasets (?) | In progress | https://github.com/LAION-AI/Open-Assistant/issues/1872 | sedthh | ||||||||||||||||||||||
56 | Factoid q/a pairs with difficulty ratings from Wikipedia articles. | Done | https://github.com/LAION-AI/Open-Assistant/issues/1873 | sedthh | GFDL and CC BY-SA 3.0 | http://www.cs.cmu.edu/~ark/QA-data/ | ||||||||||||||||||||
57 | Multilingual subtitle datasets (?) | PR | https://github.com/LAION-AI/Open-Assistant/issues/1875 | sedthh | need to cite "P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)" | https://opus.nlpl.eu/OpenSubtitles-v2018.php | ||||||||||||||||||||
58 | ForeverDreaming transcripts | In progress | sedthh | |||||||||||||||||||||||
59 | Ubuntu dialogue Corpus | Done | https://github.com/LAION-AI/Open-Assistant/issues/1874 | sedthh | https://www.kaggle.com/datasets/rtatman/ubuntu-dialogue-corpus | |||||||||||||||||||||
60 | Zhihu QA | Done | https://github.com/LAION-AI/Open-Assistant/issues/1459 | wangrui6/MLMonkATGY | ||||||||||||||||||||||
61 | Biomedical Dialog | Need to convert | https://github.com/LAION-AI/Open-Assistant/issues/1029 | https://huggingface.co/datasets/ericyu3/openassistant_inpainted_dialogs_5k_biomedical | https://github.com/LAION-AI/Open-Assistant/pull/1092 | uyhcire | ||||||||||||||||||||