Welcome to the Chinese language database!
This spreadsheet consists of many lists that fall into two largely independent parts. The first part contains general information about Chinese characters and words that you can use for research or for your own statistics. For characters, the database lists the frequency and percentage in the corpus (where applicable), General Standard number (if any), HSK level (if any), radical, number of strokes, pronunciation and meaning. For words, it lists the frequency and percentage in the corpus, and the HSK level (if any).
The Database
Characters
Data
Frequency
The list contains the 9933 most frequent Chinese characters based on Jun Da's Modern Chinese Character Frequency List, in order of decreasing frequency. The number of occurrences of each character in the corpus is converted into a percentage. Additional and missing information was gathered from HanziDB and Wiktionary or filled in by hand.
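The conversion from occurrence counts to percentages can be sketched in Python as below. This is a minimal illustration, not the spreadsheet's actual formula: the sample counts are hypothetical, and the function assumes the denominator is the sum of the listed counts unless a full corpus total is passed in.

```python
def to_percentages(counts, corpus_total=None):
    """Convert raw occurrence counts into percentages of the corpus.

    If corpus_total is None, the sum of the listed counts is used as the
    denominator (an assumption; the true corpus size may be larger).
    """
    total = corpus_total if corpus_total is not None else sum(counts.values())
    return {item: 100 * n / total for item, n in counts.items()}


# Hypothetical counts for illustration, not real figures from Jun Da's list.
sample = {"的": 600, "一": 250, "是": 150}
print(to_percentages(sample))  # {'的': 60.0, '一': 25.0, '是': 15.0}
```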
General Standard
The list contains all 8105 General Standard Chinese Characters in their numbered order. The first 3500 characters are considered "frequent", the first 6500 are considered "common", and the remaining 1605 are considered "rare". Additional and missing information was gathered from HanziDB and Wiktionary or filled in by hand.
Merged
The list merges the most frequent Chinese characters with the General Standard Chinese Characters, a total of 11062 characters, sorted by radical as in a dictionary and then by frequency. Additional and missing information was gathered from HanziDB and Wiktionary or filled in by hand.
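The merge is a set union followed by a two-level sort. By inclusion–exclusion, the stated totals imply that 9933 + 8105 − 11062 = 6976 characters appear in both source lists. The Python sketch below is a hypothetical illustration, not the spreadsheet's own method: `radical_of` stands in for the radical column, and placing characters that are absent from the frequency list after the ranked ones is an assumption.

```python
def merge_lists(frequency_list, standard_list, radical_of):
    """Return the union of both lists, sorted by radical, then by frequency rank.

    frequency_list: characters in order of decreasing frequency.
    standard_list: General Standard characters (their order is not used here).
    radical_of: hypothetical mapping from character to Kangxi radical number,
    standing in for the spreadsheet's radical column.
    Characters absent from the frequency list sort after the ranked ones that
    share their radical, in code-point order.
    """
    rank = {ch: i for i, ch in enumerate(frequency_list, start=1)}
    unranked = len(frequency_list) + 1
    merged = set(frequency_list) | set(standard_list)
    return sorted(merged, key=lambda ch: (radical_of[ch], rank.get(ch, unranked), ch))


# Tiny hypothetical inputs; radical numbers are Kangxi radical numbers.
frequency_list = ["的", "一", "是", "人"]
standard_list = ["一", "人", "仁", "是"]
radical_of = {"的": 106, "一": 1, "是": 72, "人": 9, "仁": 9}
print(merge_lists(frequency_list, standard_list, radical_of))
# ['一', '人', '仁', '是', '的']
```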
HSK 2.0
The list contains all the characters included in Hanyu Shuiping Kaoshi 2.0 (pre-2021), a total of 2663 characters, divided into six blocks by level and ordered by decreasing frequency within each block. Additional and missing information was gathered from HanziDB and Wiktionary or filled in by hand.
HSK 3.0
The list contains all the characters included in Hanyu Shuiping Kaoshi 3.0 (post-2021), a total of 3000 characters, divided into nine blocks by band and ordered by decreasing frequency within each block. Additional and missing information was gathered from HanziDB and Wiktionary or filled in by hand.
Statistics
The list presents various statistics of the database. Frequency-independent statistics, such as syllables, tones and stroke counts, are based on the merged data; frequency-based statistics naturally take into account only the data in the frequency list. This includes listings of all syllables in the corpus as well as various statistical figures and graphs.
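The difference between frequency-independent and frequency-based statistics can be illustrated with tones. The Python sketch below assumes that readings are stored as tone-marked pinyin with one reading per character; it is a hypothetical illustration, not the spreadsheet's formulas. `tone_counts` counts characters per tone, while `tone_shares` sums their corpus percentages.

```python
import unicodedata
from collections import Counter

# Combining diacritics used as pinyin tone marks (after NFD decomposition).
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def tone_of(syllable):
    """Return the tone number (1-4, or 5 for neutral) of a tone-marked syllable."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 5  # no tone mark: neutral tone


def tone_counts(readings):
    """Frequency-independent: how many characters carry each tone."""
    return Counter(tone_of(s) for s in readings)


def tone_shares(readings_with_percent):
    """Frequency-based: sum of corpus percentages per tone."""
    shares = Counter()
    for syllable, percent in readings_with_percent:
        shares[tone_of(syllable)] += percent
    return shares


# Hypothetical readings and percentages for illustration.
print(tone_counts(["mā", "má", "mǎ", "mà", "ma", "shì"]))
print(tone_shares([("de", 4.0), ("shì", 1.5), ("bù", 1.0)]))
```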
Words
Frequency
The list contains the 93279 most frequent multi-character words based on the BLCU Chinese Corpus, in order of decreasing frequency. Single-character words, words containing English letters and words with fewer than 2000 occurrences are filtered out. The number of occurrences of each word in the corpus is converted into a percentage.
HSK 2.0
The list contains all the multi-character words included in Hanyu Shuiping Kaoshi 2.0 (pre-2021), a total of 4287 words, divided into six blocks by level and sorted alphabetically within each block.
HSK 3.0
The list contains all the multi-character words included in Hanyu Shuiping Kaoshi 3.0 (post-2021), a total of 9433 words, divided into nine blocks by band and sorted alphabetically within each block.
The second part of the spreadsheet is aimed at tracking my own learning of the language. The first four lists consist of the language items I have already learned and their statistics, while the last seven lists track my progress towards the goals I have set for myself.
My learning
Data
Characters
The list tracks my progress in learning Chinese characters. It looks up the statistics for each learned character in the database and also calculates the cumulative language coverage as a percentage. The list also shows various statistics specific to the characters I have learned, such as frequency, HSK levels and language coverage.
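Cumulative coverage is a running sum of the corpus percentages of the items learned so far. The Python sketch below is an illustration under assumptions, not the spreadsheet's formula: the percentages are hypothetical, each item is assumed to be listed once, and items missing from the frequency list are assumed to contribute nothing. The same function works for words.

```python
from itertools import accumulate


def cumulative_coverage(learned, percent_of):
    """Running corpus coverage (in %) after each item, in learning order.

    learned: characters (or words) in the order they were learned, no repeats.
    percent_of: item -> corpus percentage; items missing from the frequency
    list count as 0.
    """
    shares = [percent_of.get(item, 0.0) for item in learned]
    return list(accumulate(shares))


# Hypothetical percentages for illustration only.
percent_of = {"的": 4.0, "一": 1.5, "是": 1.0}
print(cumulative_coverage(["一", "是", "的"], percent_of))  # [1.5, 2.5, 6.5]
```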
Words
The list tracks my progress in learning Chinese words. It looks up the statistics for each learned word in the database and also calculates the cumulative language coverage as a percentage. The list also shows various statistics specific to the words I have learned, such as frequency, HSK levels and language coverage.
Syllables
The list shows all the syllables I have encountered, both with and without tone. For each case, there is an alphabetical list and a frequency list, and each syllable comes with an example character: the first character with that syllable that I learned.
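The syllable tables can be sketched in Python as follows. This is a hypothetical illustration, not the spreadsheet's method: it assumes tone-marked pinyin for each learned character and takes "frequency" to mean the number of learned characters sharing a syllable. Alphabetical order here is code-point order of the base letters (so ü sorts after u), then tone number, which only approximates dictionary collation.

```python
import unicodedata
from collections import Counter

# Pinyin tone diacritics after NFD decomposition, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def split_tone(syllable):
    """Split a tone-marked syllable into (toneless syllable, tone 1-5)."""
    tone = 5  # neutral tone unless a mark is found
    base = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base.append(ch)  # keeps the diaeresis of ü
    return unicodedata.normalize("NFC", "".join(base)), tone


def syllable_table(learned, with_tone=True):
    """Build alphabetical and frequency lists plus one example per syllable.

    learned: (character, pinyin) pairs in the order they were learned.
    Returns (alphabetical, by_frequency, example).
    """
    counts = Counter()
    example = {}
    for char, pinyin in learned:
        base, _ = split_tone(pinyin)
        syllable = unicodedata.normalize("NFC", pinyin) if with_tone else base
        counts[syllable] += 1
        example.setdefault(syllable, char)  # the first learned character wins
    alphabetical = sorted(counts, key=split_tone)
    by_frequency = sorted(counts, key=lambda s: (-counts[s], split_tone(s)))
    return alphabetical, by_frequency, example


# Hypothetical learning history.
learned = [("妈", "mā"), ("马", "mǎ"), ("吗", "ma"), ("爸", "bà"), ("骂", "mà"), ("八", "bā")]
print(syllable_table(learned, with_tone=False))
```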
Goals
Characters
Most frequent
The list shows a representative table of all the characters that fall under my first goal: "Learn the 3000 most frequent Chinese characters". The learned characters are shown and the completion percentage is calculated.
"Frequent"
The list shows a representative table of all the characters that fall under my second goal: "Learn 3500 frequent characters from Chinese General Standard". The learned characters are shown and the completion percentage is calculated.
HSK 2.0 Levels 1–6
The list shows a representative table of all the characters that fall under my third goal: "Learn all 2663 characters from HSK levels 1–6". The learned characters are shown and the completion percentage is calculated.
HSK 3.0 Bands 1–9
The list shows a representative table of all the characters that fall under my fourth goal: "Learn all 3000 characters from HSK bands 1–9". The learned characters are shown and the completion percentage is calculated.
Words
Most frequent
The list shows a representative table of all the words that fall under my first word goal: "Learn the 5000 most frequent Chinese words". The learned words are shown and the completion percentage is calculated.
HSK 2.0 Levels 1–6
The list shows a representative table of all the words that fall under my second word goal: "Learn all 4287 words from HSK levels 1–6". The learned words are shown and the completion percentage is calculated.
HSK 3.0 Bands 1–9
The list shows a representative table of all the words that fall under my third word goal: "Learn all 9433 words from HSK bands 1–9". The learned words are shown and the completion percentage is calculated.
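Every goal list, for characters and for words alike, reduces to the same calculation: the share of the goal's items that have been learned, plus a grid in which only learned items are visible. The Python sketch below illustrates this under assumptions; the grid width, the placeholder symbol and the sample goal are hypothetical and not taken from the spreadsheet.

```python
def goal_progress(goal_items, learned):
    """Return the learned goal items (in goal order) and the percentage done."""
    learned_set = set(learned)
    done = [item for item in goal_items if item in learned_set]
    return done, 100 * len(done) / len(goal_items)


def goal_grid(goal_items, learned, width=10, blank="·"):
    """Lay the goal out as rows of `width` cells, showing only learned items."""
    learned_set = set(learned)
    cells = [item if item in learned_set else blank for item in goal_items]
    return ["".join(cells[i:i + width]) for i in range(0, len(cells), width)]


# Hypothetical four-character goal.
goal = ["的", "一", "是", "不"]
done, percent = goal_progress(goal, ["一", "不", "我"])
print(done, percent)  # ['一', '不'] 50.0
print("\n".join(goal_grid(goal, ["一", "不"], width=2)))
```

Because items outside the goal (such as 我 above) are ignored, the percentage measures progress towards that specific goal rather than overall vocabulary size.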