General info

This file is part of the Pulau Bahasa Word Lists (PBWL) series, prepared by MsFixer (the primary author).

Make sure the file you are referring to is the latest version by checking the official website at:
https://pulaubahasa.wordpress.com/vocab-builders/pbwl

Do not share the file with others by email, etc. Instead, encourage them to download the latest version directly from the website.

This file is released under the Creative Commons BY-NC-SA 4.0 license: basically, it is free for your personal educational use.
Learn more about the license and other legal matters on the FAQ page of the official PBWL website at:
https://pulaubahasa.wordpress.com/vocab-builders/pbwl/#faqs

The PBWL author may grant additional permission to commercial users on request. Please contact the author at:
https://pulaubahasa.wordpress.com/contact/

Series Info

Pulau Bahasa Word Lists (PBWL) consist of the following Microsoft Excel files and worksheets:

(1) Master
Contains 440,000+ word tokens: the comprehensive raw data set from which the other five files are processed.
Distributed on request only, mainly for application developers and linguistic data scientists (i.e. tech geeks). Contact the author at:
https://pulaubahasa.wordpress.com/contact/

(2) Root
Excludes "garbage" tokens from the (1) Master file.
Picks up 8,400+ meaningful root words (equivalent to 26K lemmas under the KBBI calculation method).
Categorizes root words by CEFR level; each root word lists up to its three most common lemmas.
Marks which root words are taught by Duolingo/Clozemaster.
Enables you to assess your vocabulary size.

(3) Acronym
Appendix of the (2) Root file.

(4) Country
Appendix of the (2) Root file.
Compiles names of countries and their currencies, as well as regions, ethnic groups and languages that span multiple countries.
Excludes cities and local ethnic groups within a single country.

YOU'RE HERE ==> (5) Lemma
A long list of 28K meaningful words (lemmas), not grouped by root word but simply sorted by the LCC ranking data.
Excludes "garbage" tokens from the (1) Master file.
Includes all acronyms and country names (i.e. no need to download the (3) Acronym and (4) Country files separately).
Suitable for learners who hate the "bulk" memorizing method.

(6) Unchecked
List of word tokens from the (1) Master file that the PBWL author has not yet checked to decide whether they are garbage or meaningful lemmas.
See the latest checking progress report at:
https://pulaubahasa.wordpress.com/vocab-builders/pbwl/#progress-summary


Legend of Each Column

Spelling
Y (meaningful word tokens)
  BK: "bentuk baku", the standard spelling defined by KBBI under the initiative of the Ministry of Education
  TB: "bentuk tidak baku", a non-standard spelling
  BG: "bahasa gaul", slang or informal broken language rejected by KBBI but frequently used, especially on social media
  SS: shortened form of a single word (e.g., "yang" --> "yg")
  SC: shortened form compounding several words, i.e. an acronym (e.g., "SD" = "Sekolah Dasar", elementary school)
  SCT: both SC (acronym) and TB (non-standard spelling)
  SM: shortened form with multiple meanings (e.g., "PB" = 1) "perserikatan bangsa-bangsa" (the United Nations), 2) "pajak bumi dan bangunan" (real estate tax), or 3) "peraturan baris-berbaris" (regulation on marching/parades))
N (garbage)
  AG: "agglutinative" form (e.g., "apel-apel", the plural form of "apel" (apple), or "apelku", with a possessive suffix (my apple))
  LF: loan word from a foreign language that has not been converted into the standard Indonesian spelling (e.g., "software")
  LFT: both LF (loan word) and TB (non-standard spelling) (e.g., "vodka" is LF and "wodka" is LFT)
  UK: "unknown", because the word token is not in KBBI, SEAlang, or IndoDic (the top three reliable dictionaries) but seems to be meaningful
  GB: "garbage", such as typos, unique nouns, and words written in other writing scripts
W (work in progress)
  PG: "probably garbage", because it is listed only on the OpenSubtitles (OS) frequency list and not on LCC
  Blank: not checked at all; could be garbage or meaningful

Note: The (1) Master file contains all of the spelling categories, while (2) Root, (3) Acronym, (4) Country, and (5) Lemma exclude "N", "W" and their sub-categories.

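For readers who process the (1) Master file programmatically, the grouping above can be sketched as a small lookup table plus a filter that keeps only the "Y" group, as the note describes. This is only a rough sketch: the dictionary names, the "spelling" and "token" keys, and the sample rows are made up for illustration (including the codes assigned to the sample tokens), and it assumes each cell holds the sub-category code (e.g. "BK") rather than the group letter.

```python
# Codes and groups are taken from the legend; every name here is hypothetical.
SPELLING_GROUPS = {
    "Y": ["BK", "TB", "BG", "SS", "SC", "SCT", "SM"],  # meaningful word tokens
    "N": ["AG", "LF", "LFT", "UK", "GB"],              # garbage
    "W": ["PG", ""],                                   # work in progress ("" = blank cell)
}

# Reverse lookup: sub-category code -> top-level group letter.
GROUP_OF = {
    code: group
    for group, codes in SPELLING_GROUPS.items()
    for code in codes
}


def keep_meaningful(rows):
    """Keep only rows whose spelling code belongs to the "Y" group."""
    return [row for row in rows if GROUP_OF.get(row["spelling"].strip()) == "Y"]


# Made-up sample rows, not real PBWL data.
sample = [
    {"token": "yang", "spelling": "BK"},
    {"token": "yg", "spelling": "SS"},
    {"token": "apelku", "spelling": "AG"},
    {"token": "software", "spelling": "LF"},
    {"token": "xyzzy", "spelling": ""},
]

print([row["token"] for row in keep_meaningful(sample)])  # ['yang', 'yg']
```

Unknown codes map to no group and are dropped by the filter, which errs on the side of excluding unexpected values.
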
Covered by Duolingo/Clozemaster/Pulau Bahasa
Duolingo MT (main tree, or "Indonesian course for English speakers")
  YE: "Yes, Duolingo MT teaches you the exact word token" (e.g., "diduga" (believed; speculated), a passive form of the root word "duga")
  Y: "Yes, Duolingo MT teaches you the lemma" (e.g., the active form "menduga" is not taught itself, but its passive form is taught by Duolingo MT)
  R: "Duolingo MT teaches a related lemma in the same word family" (e.g., "penduga", a noun derived from "menduga", is not taught but is easily guessed from the root word or related words)
  N: "No, Duolingo does not teach you this word or any of its related words"
  Blank: not checked yet, but very likely to be "N"

All words taught by Duolingo MT (YE) are listed at:
https://forum.duome.eu/viewtopic.php?t=7012-indonesian-from-english-word-list-44-units

Clozemaster
This column consists of two elements: a number and a letter.
  Number: the name of the Most Common Words Collection (e.g., 1,000 means "the 1,000 Most Common Words Collection" on Clozemaster)
  C: "taught as a cloze-word that you need to fill in" in the text type-in mode
  S: "not a cloze-word, but yes, you can learn it from a sentence", especially in the full-sentence transcribe mode
  Y: same as Duolingo
  R: same as Duolingo
  N: same as Duolingo
  Blank: same as Duolingo

Suppose that the sentence "He {{chose}} a big apple" is in the 2,000 Most Common Words Collection, where {{chose}} is the cloze-word that is deleted from the sentence and that you need to type in.
"chose" (past form) is marked "2,000C"
"choose" (present base form), "chosen" (past participle) and "chooses" (3rd person singular) are marked "2,000Y"
"he", "a", "big" and "apple" are marked "2,000S"
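
If you read this column from a script, a marker such as "2,000C" can be split into its two elements. The snippet below is a minimal sketch, assuming the cell always has the shape "number, then one letter" with an optional thousands separator; the function name and regular expression are illustrative only, and cells without a number (a plain "N" or a blank) are simply returned as None.

```python
import re

# Assumed cell shape: digits with optional commas, then one of C, S, Y, R.
# This pattern is a guess based on the examples in the legend ("2,000C").
CLOZE_CODE = re.compile(r"^\s*(\d[\d,]*)\s*([CSYR])\s*$")


def parse_clozemaster(cell):
    """Split a marker like "2,000C" into (2000, "C"); return None otherwise."""
    match = CLOZE_CODE.match(cell)
    if match is None:
        return None
    number, letter = match.groups()
    return int(number.replace(",", "")), letter


print(parse_clozemaster("2,000C"))  # (2000, 'C')
print(parse_clozemaster("2,000S"))  # (2000, 'S')
print(parse_clozemaster("N"))       # None
```
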