| # | Group | Category | DatasetName | SubsetName | Is SFT | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | Used (B) | Total Tokenized (B) | Remained (B) | Notes | Pre-Train (B) | SFT (B) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 |  |  |  |  |  | 01_20241017_013512 | 02_20241017_013401 | 03_20241020_001556.json | 04_20241021_170901.json | 05_20241022_221453.json | 06_20241024_013137 | 07_20241025_022032 | 08_20241026_151354 | 09_20241027_190948 | 10_20241028_225112 | 11_20241030_124814 | 12_20241101_002827 | 13_20241102_160534 | 14_20241104_000454 | 15_20241105_023029 | 16_20241106_180613 | 17_20241108_004951 | 18_20241113_034017 | 19_20241114_115241 | 20_20241115_234357 | 21_20241117_021115 | 22_20241118_155407 | 23_20241120_033942 | 24_20241121_133110 | 25_20241123_030124 | 26_20241211_015209 | 27_20241213_051741 |  |  |  |  |  |  |
3 | Chinese | Webpage | https://huggingface.co/datasets/opencsg/chinese-fineweb-edu | cci2 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.391 | 0.416 | 0.416 | 0.416 | 0.416 | 0.416 | 0.416 | 0.424 | 0.424 | 0.324 | 4.2824 | 5.48 | 1.1976 | Recommended: use the newer chinese-fineweb-edu-v2.1 | 4.2824 | 0 | |||
4 | IndustryCorpus | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.841 | 1.881 | 1.881 | 1.681 | 19.8508 | 24.88 | 5.0292 | 19.8508 | 0 | |||||||
5 | Skypile | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.971 | 0.825 | 9.6516 | 9.77 | 0.1184 | 9.6516 | 0 | |||||||||
6 | tele | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.751 | 2.801 | 2.751 | 2.751 | 2.751 | 2.851 | 2.851 | 2.851 | 2.851 | 2.851 | 2.851 | 2.9 | 2.9 | 2.5 | 29.9896 | 33.18 | 3.1904 | 29.9896 | 0 | |||||||
7 | map | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.071 | 1.221 | 1.321 | 1.321 | 1.321 | 1.321 | 2.021 | 2.021 | 2.021 | 2.021 | 2.021 | 2.021 | 2.07 | 2.07 | 1.77 | 15.3856 | 31.38 | 15.9944 | 15.3856 | 0 | |||||||
8 | Encyclopedia | https://huggingface.co/datasets/TMZN/baidubaike | (Filtered) | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.675 | 0.05 | 5.44 | 2.72 | -2.72 | Epoch=2 | 5.44 | 0 | ||||||||||||
9 | https://zh.wikipedia.org/wiki/Wikipedia | (Filtered) | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.025 | 0.33 | 0.33 | 0 | Related issue: https://github.com/RUC-GSAI/YuLan-Mini/issues/11 | 0.33 | 0 | |||||||||||||||||||||||
10 | https://huggingface.co/datasets/LLM360/TxT360 | wiki-zh-1 | 0.2 | 0.2 | 0.2 | 0.1 | 0.1 | 0.05 | 0 | 0.34 | 0.34 | 0 | 0.34 | 0 | ||||||||||||||||||||||||||
11 | wiki-zh-2 | 0.075 | 0.1 | 0.2 | 0.2 | 0.25 | 0.15 | 0.39 | 0.39 | 0 | 0.39 | 0 | ||||||||||||||||||||||||||||
12 | QA | https://www.zhihu.com/ | (Filtered) | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.125 | 0.125 | 3.1 | 3.1 | 0 | 3.1 | 0 | |||||
13 | Book | bestsellers | (Filtered) | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.125 | 0.075 | 1.23 | 1.32 | 0.09 | 1.23 | 0 | |||||||
14 | https://github.com/FudanNLPLAB/CBook-150K | (Filtered) | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.425 | 0.45 | 0.45 | 0.45 | 0.45 | 0.45 | 0.5 | 1.14 | 2.25 | 5.686 | 13.39 | 7.704 | 5.686 | 0 | ||||||
15 | textbooks-book-cn/v1 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.075 | 0.075 | 0.075 | 0.075 | 0.075 | 0.075 | 0.2 | 0.1 | 0.68 | 0.77 | 0.09 | 0.68 | 0 | |||||||
16 | exam-book-cn/v1 | 1 | 1.15 | 1.15 | 0.92 | 0.96 | 0.04 | 0 | 0.92 | |||||||||||||||||||||||||||||||
17 | Law | legal_case-law-cn/v2 | (Filtered) | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 9 | 81.44 | 72.44 | 9 | 0 | |||||||
18 | Government | https://huggingface.co/datasets/liwu/MNBVC | gov/20230172/XueXiQiangGuo_cleaned | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.025 | 0.21 | 0.21 | 0 | 0.21 | 0 | |||||||||||||||||||||
19 | gov/20230172/GovReport | 0.01 | 0.004 | -0.004 | 0.004 | 0 | ||||||||||||||||||||||||||||||||||
20 | News | news/20230196/news_peoples_daily_cleaned | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.44 | 0.72 | -0.72 | Epoch=2 | 1.44 | 0 | ||||||||||||||
21 | Knowledge about Renmin University of China | aa_mini/rucweb | 0.01 | 0.004 | 0.004 | 0 | 0.004 | 0 | ||||||||||||||||||||||||||||||||
23 | Code | Code | https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K | 1 | 0.15 | 0.175 | 0.13 | 0.13 | 0 | 0 | 0.13 | |||||||||||||||||||||||||||||
24 | https://huggingface.co/datasets/vikp/textbook-quality-programming | 1 | 0.15 | 0.125 | 0.11 | 0.11 | 0 | 0 | 0.11 | |||||||||||||||||||||||||||||||
25 | https://huggingface.co/datasets/yulan-team/YuLan-Mini-Text-Datasets | code-the_stack_v2_python_cleaned_scored_dedup-score_1 | 2.3 | 0.965 | 1.306 | 7.23 | 5.924 | 1.306 | 0 | |||||||||||||||||||||||||||||||
26 | code-the_stack_v2_python_cleaned_scored_dedup-score_2 | 0.61 | 0.61 | 0.61 | 0.735 | 0.76 | 0.91 | 1.01 | 1.11 | 1.51 | 2.01 | 2.11 | 2.21 | 2.21 | 2.21 | 3.465 | 5.015 | 5.015 | 5.325 | 5.54 | 5.6225 | 5.8275 | 6.3275 | 5.7775 | 5.7775 | 5.8775 | 0.8775 | 0.725 | 31.915 | 32.05 | 0.135 | 31.915 | 0 | |||||||
27 | code-the_stack_v2_python_cleaned_scored_dedup-score_3 | 1.58 | 1.455 | 1.355 | 1.355 | 1.43 | 1.58 | 1.605 | 1.755 | 1.705 | 1.705 | 1.705 | 1.705 | 1.705 | 1.705 | 0.55 | 9.158 | 9.27 | 0.112 | 9.158 | 0 | |||||||||||||||||||
28 | code-the_stack_v2_python_cleaned_scored_dedup-score_4 | 0.3 | 0.4 | 0.5 | 0.5 | 0.4 | 0.3 | 0.2 | 0.2 | 1.12 | 1.12 | 0 | 1.12 | 0 | ||||||||||||||||||||||||||
29 | code-the_stack_v2_python_cleaned_scored_dedup-score_5 | 0.025 | 0.01 | 0 | -0.01 | 0.01 | 0 | |||||||||||||||||||||||||||||||||
30 | code-the-stack-v2-Shell | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.275 | 0.25 | 0.25 | 0.91 | 0.93 | 0.02 | 0.91 | 0 | ||||||||||||||||||||||||||
31 | code-the-stack-v2-SQL | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.6 | 0.54 | 0.54 | 7.872 | 13.72 | 5.848 | 7.872 | 0 | |||||||
32 | code-the-stack-v2-Java | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.5175 | 6.287 | 6.4 | 0.113 | 6.287 | 0 | ||||||||||||||
33 | code-the-stack-v2-JavaScript | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.525 | 0.525 | 0.525 | 0.25 | 2.93 | 2.97 | 0.04 | 2.93 | 0 | |||||||||||||||||||
34 | code-the-stack-v2-TypeScript | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.475 | 0.475 | 0.475 | 0.475 | 0.475 | 0.475 | 0.09 | 3.376 | 3.47 | 0.094 | 3.376 | 0 | ||||||||||||||||
35 | code-the-stack-v2-Go | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.32 | 0.32 | 4.256 | 5.97 | 1.714 | 4.256 | 0 | |||||||
36 | code-the-stack-v2-Rust | 1.25 | 1.25 | 1.25 | 1 | 1 | 0.75 | 0.75 | 0.5 | 0.5 | 0.5 | 0.3 | 3.62 | 3.62 | 0 | 3.62 | 0 | |||||||||||||||||||||||
37 | code-the-stack-v2-R | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.825 | 0.75 | 0.75 | 0.75 | 0.75 | 0.31 | 6.754 | 6.85 | 0.096 | 6.754 | 0 | |||||||||||
38 | code-the-stack-v2-HTML | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.24 | 0.2 | 3.176 | 5.36 | 2.184 | 3.176 | 0 | |||||||
39 | code-the-stack-v2-Coq | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.425 | 0.97 | 1 | 0.03 | 0.97 | 0 | ||||||||||||||||||||||||||||
40 | code-the-stack-v2-Lean | 0.05 | 0.05 | 0.05 | 0.05 | 0.08 | 0.08 | 0 | 0.08 | 0 | ||||||||||||||||||||||||||||||
41 | code-the-stack-v2-Jupyter_Notebook-md_scored_classified-score_1 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 | 1.32 | 1.41 | 1.41 | 1.56 | 1.56 | 1.56 | 2.06 | 2.375 | 2.475 | 2.48 | 3.48 | 1 | 9.376 | 9.48 | 0.104 | 9.376 | 0 | |||||||||||||||||
42 | code-the-stack-v2-Jupyter_Notebook-md_scored_classified-score_2 | 1.985 | 1.585 | 1.685 | 1.785 | 1.88 | 1.975 | 2.085 | 2.185 | 2.66 | 2.66 | 2.21 | 2.21 | 2.26 | 2.26 | 2.26 | 0.09 | 12.71 | 12.71 | 0 | 12.71 | 0 | ||||||||||||||||||
43 | code-the-stack-v2-Jupyter_Notebook-md_scored_classified-score_3 | 2 | 2.3 | 2.1 | 0.12 | 1.72 | 1.5 | 1.3 | 1.1 | 0.7 | 0.2 | 5.216 | 6.05 | 0.834 | 5.216 | 0 | ||||||||||||||||||||||||
44 | code-the-stack-v2-Jupyter_Notebook-md_scored_classified-score_4 | 0.2 | 0.3 | 0.3 | 0.3 | 0.2 | 0.2 | 0.1 | 0.1 | 0.025 | 0.025 | 0.7 | 0.73 | 0.03 | 0.7 | 0 | ||||||||||||||||||||||||
45 | code-the-stack-v2-Jupyter_Notebook-md_scored_classified-score_5 | 0.01 | 0.004 | 0 | -0.004 | 0.004 | 0 | |||||||||||||||||||||||||||||||||
46 | code-the-stack-v2-Jupyter_Notebook-md2_scored_classifier-score_1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.15 | 0.15 | 0.15 | 0.15 | 0.425 | 0.625 | 0.625 | 0.625 | 0.625 | 0.625 | 0.625 | 0.135 | 2.204 | 2.29 | 0.086 | 2.204 | 0 | ||||||||||||||||
47 | code-the-stack-v2-Jupyter_Notebook-md2_scored_classifier-score_2 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.55 | 0.55 | 0.575 | 0.575 | 0.575 | 0.575 | 0.2 | 2.84 | 2.9 | 0.06 | 2.84 | 0 | ||||||||||||||||||
48 | code-the-stack-v2-Jupyter_Notebook-md2_scored_classifier-score_3 | 0.2 | 0.3 | 0.375 | 0.5 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1 | 1.15 | 1.17 | 0.02 | 1.15 | 0 | |||||||||||||||||||||||||
49 | code-the-stack-v2-Jupyter_Notebook-md2_scored_classifier-score_4 | 0.025 | 0.075 | 0.075 | 0.075 | 0.075 | 0.075 | 0.16 | 0.16 | 0 | 0.16 | 0 | ||||||||||||||||||||||||||||
50 | code-the-stack-v2-Jupyter_Notebook-md2_scored_classifier-score_5 | 0.01 | 0.004 | 0 | -0.004 | 0.004 | 0 | |||||||||||||||||||||||||||||||||
51 | code-starcoderdata-c | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 0.4375 | 21.375 | 21.5 | 0.125 | 21.375 | 0 | ||||||||
52 | code-starcoderdata-cpp | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 | 1.7 | 18.6 | 18.64 | 0.04 | 18.6 | 0 | |||||||||||||||||||
53 | code-ioccc | 1 | 0.01 | 0.004 | 0 | -0.004 | 0 | 0.004 | ||||||||||||||||||||||||||||||||
54 | code-starcoder_smollm_dedup-starcoder_dedup | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 0.75 | 19.62 | 19.62 | 0 | 19.62 | 0 | ||||||||||||
55 | code-starcoder_smollm_dedup-smollm_dedup | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 4.4 | 4.46 | 0.06 | 4.4 | 0 | ||||||||||||
56 | code-MNBVC-code-googlecode-filter_copyright-cc | 1 | 1 | 1 | 1 | 1 | 0.15 | 2.06 | 2.07 | 0.01 | 2.06 | 0 | ||||||||||||||||||||||||||||
57 | code-MNBVC-code-googlecode-filter_copyright-cpp | 1.5 | 1 | 1 | 0.5 | 0.275 | 1.71 | 1.73 | 0.02 | 1.71 | 0 | |||||||||||||||||||||||||||||
58 | code-MNBVC-code-googlecode-filter_copyright-c | 0.5 | 0.725 | 1 | 1.525 | 2.5 | 1.4 | 1.4 | 1.0265 | 4.0306 | 4.05 | 0.0194 | 4.0306 | 0 | ||||||||||||||||||||||||||
59 | code-MNBVC-code-googlecode-filter_copyright-h | 0.2 | 0.2 | 0.2 | 0.2 | 0.1 | 0.36 | 4 | 3.64 | 0.36 | 0 | |||||||||||||||||||||||||||||
60 | code-MNBVC-code-googlecode-filter_copyright-java | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.3175 | 0.3175 | 0.3175 | 0.3175 | 0.5175 | 0.4 | 0.0625 | 1.1 | 1.12 | 0.02 | 1.1 | 0 | ||||||||||||||||||||||
61 | code-MNBVC-code-googlecode-filter_copyright-js | 0.275 | 0.525 | 0.525 | 0.525 | 0.525 | 0.04 | 0.966 | 0.97 | 0.004 | 0.966 | 0 | ||||||||||||||||||||||||||||
62 | code-MNBVC-code-googlecode-filter_copyright-lua | 0.0025 | 0.001 | 0.001 | 0 | 0.001 | 0 | |||||||||||||||||||||||||||||||||
63 | code-MNBVC-code-googlecode-filter_copyright-php | 0.025 | 0.01 | 0.01 | 0 | 0.01 | 0 | |||||||||||||||||||||||||||||||||
64 | code-MNBVC-code-googlecode-filter_copyright-rs | 0.1 | 0.2 | 0.2 | 0.2 | 0.1 | 0.32 | 0.32 | 0 | 0.32 | 0 | |||||||||||||||||||||||||||||
65 | code-MNBVC-code-googlecode-filter_copyright-sh | 0.1 | 0.04 | 0.04 | 0 | 0.04 | 0 | |||||||||||||||||||||||||||||||||
66 | code-MNBVC-code-googlecode-filter_copyright-swift | 0.0025 | 0.001 | 0.001 | 0 | 0.001 | 0 | |||||||||||||||||||||||||||||||||
67 | code-MNBVC-code-googlecode-filter_copyright-ts | 0.075 | 0.025 | 0.025 | 0.05 | 0.04 | -0.01 | 0.05 | 0 | |||||||||||||||||||||||||||||||
68 | code-MNBVC-code-googlecode-filter_copyright-go | 0.2 | 0.2 | 0.2 | 0.2 | 0.25 | 0.42 | 0.42 | 0 | 0.42 | 0 | |||||||||||||||||||||||||||||
69 | code-MNBVC-code-googlecode-py-score-1 | 0.2 | 0.15 | 0.14 | 0.15 | 0.01 | 0.14 | 0 | ||||||||||||||||||||||||||||||||
70 | code-MNBVC-code-googlecode-py-score-2 | 0.625 | 0.3 | 0.37 | 0.37 | 0 | 0.37 | 0 | ||||||||||||||||||||||||||||||||
71 | code-MNBVC-code-googlecode-py-score-3 | 0.1 | 0.075 | 0.07 | 0.07 | 0 | 0.07 | 0 | ||||||||||||||||||||||||||||||||
72 | code-MNBVC-code-googlecode-py-score-4 | 0.025 | 0.01 | 0.01 | 0 | 0.01 | 0 | |||||||||||||||||||||||||||||||||
73 | code-MNBVC-code-matlab | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.125 | 0.53 | 0.56 | 0.03 | 0.53 | 0 | |||||||||||||||||||||||||||
74 | code-wjp_scored-code-score_1 | 1 | 0.4 | 0.25 | 0.26 | 0.26 | 0 | Seed data: code-related samples filtered from the-stack-v2 data scored 4-5; generated with llama3/mistral using the OSS method | 0 | 0.26 | ||||||||||||||||||||||||||||||
75 | code-wjp_scored-code-score_2 | 1 | 0.6 | 0.55 | 0.46 | 0.46 | 0 | 0 | 0.46 | |||||||||||||||||||||||||||||||
76 | code-wjp_scored-code-score_3 | 1 | 0.2 | 0.2 | 0.2 | 1.325 | 1 | 1.17 | 1.17 | 0 | 0 | 1.17 | ||||||||||||||||||||||||||||
77 | code-wjp_scored-code-score_4 | 1 | 0.075 | 0.075 | 0.43 | 0.43 | 0.63 | 0.43 | 0.43 | 0.43 | 0.23 | 0.005 | 1.266 | 1.32 | 0.054 | 0 | 1.266 | |||||||||||||||||||||||
78 | code-wjp_scored-leetcode-score_0 | 0.025 | 0.01 | 0.01 | 0 | 0.01 | 0 | |||||||||||||||||||||||||||||||||
79 | code-wjp_scored-leetcode-score_1 | 0.025 | 0.15 | 0.15 | 0.13 | 0.13 | 0 | 0.13 | 0 | |||||||||||||||||||||||||||||||
80 | code-wjp_scored-leetcode-score_2 | 0.075 | 0.15 | 0.25 | 0.05 | 0.21 | 0.21 | 0 | 0.21 | 0 | ||||||||||||||||||||||||||||||
81 | code-wjp_scored-leetcode-score_3 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.3 | 2.2 | 2.2 | 0 | 2.2 | 0 | ||||||||||||||||||||
82 | code-wjp_scored-leetcode-score_4 | 0.065 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 | 0.06 | 1.57 | 1.57 | 0 | 1.57 | 0 | |||||||||||||||||||
83 | code-wjp_scored-leetcode-score_5 | 0.01 | 0.004 | 0 | -0.004 | 0.004 | 0 | |||||||||||||||||||||||||||||||||
84 | https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage1 | (English only) | 1 | 0.5 | 0.5 | 0.635 | 0.635 | 1.55 | 0.41 | 1.692 | 1.73 | 0.038 | 0 | 1.692 | ||||||||||||||||||||||||||
85 | https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2 | (Regenerated using Qwen2.5-7B-Instruct) | 1 | 0.375 | 0.19 | 0.21 | 0.31 | 0.31 | 0 | 0 | 0.31 | |||||||||||||||||||||||||||||
86 | https://huggingface.co/datasets/yulan-team/YuLan-Mini-Text-Datasets | code-code_gen_qwen-coder-score3 | 0.05 | 0.02 | 0.02 | 0 | Seed data: code-related samples filtered from the-stack-v2 data scored 4-5; generated with qwen2.5-inst and qwen2.5-coder-inst using the OSS method | 0.02 | 0 | |||||||||||||||||||||||||||||||
87 | code-code_gen_qwen-coder-score4 | 0.15 | 0.06 | 0.15 | 0.09 | 0.06 | 0 | |||||||||||||||||||||||||||||||||
88 | code-code_gen_qwen-inst-score3 | 0.075 | 0.03 | 0.03 | 0 | 0.03 | 0 | |||||||||||||||||||||||||||||||||
89 | code-code_gen_qwen-inst-score4 | 0.225 | 0.09 | 0.2 | 0.11 | 0.09 | 0 | |||||||||||||||||||||||||||||||||
90 | code-code_gen_qwen-filter-batch1-score3 | 1 | 0.86 | 0.25 | 0.444 | 0.46 | 0.016 | 0 | 0.444 | |||||||||||||||||||||||||||||||
91 | code-code_gen_qwen-filter-batch1-score4 | 1 | 1.7 | 0.95 | 1.06 | 1.06 | 0 | 0 | 1.06 | |||||||||||||||||||||||||||||||
92 | code-code_gen_qwen-filter-batch1-score5 | 1 | 0.425 | 0.2 | 0.25 | 0.25 | 0 | 0 | 0.25 | |||||||||||||||||||||||||||||||
93 | code-code_gen_qwen-filter-batch2-score3 | 1 | 0.4 | 0.1 | 0.2 | 0.21 | 0.01 | 0 | 0.2 | |||||||||||||||||||||||||||||||
94 | code-code_gen_qwen-filter-batch2-score4 | 1 | 1.1125 | 0.3125 | 0.57 | 0.57 | 0 | 0 | 0.57 | |||||||||||||||||||||||||||||||
95 | code-code_gen_qwen-filter-batch2-score5 | 1 | 0.2 | 0.125 | 0.13 | 0.13 | 0 | 0 | 0.13 | |||||||||||||||||||||||||||||||
96 | code-code_gen_qwen-filter-batch3-score3 | 1 | 0.725 | 0.29 | 0.81 | 0.52 | 0 | 0.29 | ||||||||||||||||||||||||||||||||
97 | code-code_gen_qwen-filter-batch3-score4 | 1 | 4.15 | 1.66 | 1.66 | 0 | 0 | 1.66 | ||||||||||||||||||||||||||||||||
98 | code-code_gen_qwen-filter-batch3-score5 | 1 | 0.5 | 0.2 | 0.2 | 0 | 0 | 0.2 | ||||||||||||||||||||||||||||||||
99 | https://huggingface.co/datasets/OpenCoder-LLM/opc-annealing-corpus | synthetic_qa-lang/javascript | 1 | 0.05 | 0.05 | 0.15 | 0.05 | 0.12 | 0.12 | 0 | Split by language | 0 | 0.12 | |||||||||||||||||||||||||||
100 | synthetic_qa-lang/cpp | 1 | 0.05 | 0.05 | 0.15 | 0.15 | 0.16 | 0.16 | 0 | 0 | 0.16 | |||||||||||||||||||||||||||||