A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | General info | |||||||||||||||||||||||||
2 | ||||||||||||||||||||||||||
3 | This file is a series of Pulau Bahasa Word Lists (PBWL) prepared by MsFixer (the primary author). | |||||||||||||||||||||||||
4 | ||||||||||||||||||||||||||
5 | Make sure that the file you are now referring to is the latest version by checking the official website at | |||||||||||||||||||||||||
6 | https://pulaubahasa.wordpress.com/vocab-builders/pbwl | |||||||||||||||||||||||||
7 | ||||||||||||||||||||||||||
8 | Do not share the file with others by email, etc. Rather, encourage them to download the latest version directly from the website. | |||||||||||||||||||||||||
9 | ||||||||||||||||||||||||||
10 | This file is released under the Creative Commons BY-NC-SA 4.0 license: basically, it is free for personal educational use. | |||||||||||||||||||||||||
11 | Learn more about the license and other legal matters on the FAQ page of the PBWL official website at: | |||||||||||||||||||||||||
12 | https://pulaubahasa.wordpress.com/vocab-builders/pbwl/#faqs | |||||||||||||||||||||||||
13 | ||||||||||||||||||||||||||
14 | The PBWL author may give additional permission to commercial users on a request basis. Please contact the author at: | |||||||||||||||||||||||||
15 | https://pulaubahasa.wordpress.com/contact/ | |||||||||||||||||||||||||
16 | ||||||||||||||||||||||||||
17 | Series Info | |||||||||||||||||||||||||
18 | ||||||||||||||||||||||||||
19 | Pulau Bahasa Word Lists (PBWL) consist of the following Microsoft Excel files and worksheets: | |||||||||||||||||||||||||
20 | ||||||||||||||||||||||||||
21 | (1) Master | |||||||||||||||||||||||||
22 | Contains 440,000+ word tokens as a comprehensive raw data set from which the other five files are produced. | |||||||||||||||||||||||||
23 | Distributed on request only, mainly for application developers and linguistic data scientists (i.e., tech geeks). Contact the author at: | |||||||||||||||||||||||||
24 | https://pulaubahasa.wordpress.com/contact/ | |||||||||||||||||||||||||
25 | ||||||||||||||||||||||||||
26 | (2) Root | |||||||||||||||||||||||||
27 | Excludes "garbage" tokens from the (1) Master file. | |||||||||||||||||||||||||
28 | Picks up 8,400+ meaningful root words (equivalent to 26K lemmas under the KBBI calculation method). | |||||||||||||||||||||||||
29 | Categorizes root words by CEFR level; each root word lists up to its three most common lemmas. | |||||||||||||||||||||||||
30 | Marks which root words are taught by Duolingo/Clozemaster. | |||||||||||||||||||||||||
31 | Enables you to assess your vocab size. | |||||||||||||||||||||||||
32 | ||||||||||||||||||||||||||
33 | (3) Acronym | |||||||||||||||||||||||||
34 | Appendix of the (2) Root file. | |||||||||||||||||||||||||
35 | ||||||||||||||||||||||||||
36 | (4) Country | |||||||||||||||||||||||||
37 | Appendix of the (2) Root file. | |||||||||||||||||||||||||
38 | Compiles the names of countries and their currencies, as well as of regions, ethnic groups and languages that span multiple countries. | |||||||||||||||||||||||||
39 | Excludes cities and local ethnic groups within a single country. | |||||||||||||||||||||||||
40 | ||||||||||||||||||||||||||
41 | YOU'RE HERE ==> | (5) Lemma | ||||||||||||||||||||||||
42 | Long list of 28K meaningful words (lemmas) not grouped by root word but simply sorted by the LCC ranking data. | |||||||||||||||||||||||||
43 | Excludes "garbage" tokens from the (1) Master file. | |||||||||||||||||||||||||
44 | Includes all acronyms and country names (i.e., no need to additionally download the (3) Acronym and (4) Country files). | |||||||||||||||||||||||||
45 | Suitable for learners who hate the "bulk" memorizing method. | |||||||||||||||||||||||||
46 | ||||||||||||||||||||||||||
47 | (6) Unchecked | |||||||||||||||||||||||||
48 | List of word tokens from the (1) Master file that the PBWL author has not yet checked to determine whether they are garbage or meaningful lemmas. | |||||||||||||||||||||||||
49 | Learn more about the latest checking progress report at: | |||||||||||||||||||||||||
50 | https://pulaubahasa.wordpress.com/vocab-builders/pbwl/#progress-summary | |||||||||||||||||||||||||
51 | ||||||||||||||||||||||||||
52 | ||||||||||||||||||||||||||
53 | Legend of Each Column | |||||||||||||||||||||||||
54 | ||||||||||||||||||||||||||
55 | Spelling | |||||||||||||||||||||||||
56 | Y (meaningful word tokens) | |||||||||||||||||||||||||
57 | BK | Meaning: "bentuk baku" or standard spelling defined by KBBI under the initiative of the Ministry of Education | ||||||||||||||||||||||||
58 | TB | Meaning: "bentuk tidak baku" or non-standard spelling | ||||||||||||||||||||||||
59 | BG | Meaning: "bahasa gaul" or slang/informal broken language rejected by KBBI but frequently used especially on social media | ||||||||||||||||||||||||
60 | SS | Meaning: "shortened form from a single word" (e.g., "yang" --> "yg") | ||||||||||||||||||||||||
61 | SC | Meaning: "shortened form by compounding several words" or "acronym" (e.g., "SD" = "Sekolah Dasar", elementary school) | ||||||||||||||||||||||||
62 | SCT | SC (acronym) and TB (non-standard spelling) | ||||||||||||||||||||||||
63 | SM | Meaning: "shortened form with multiple meanings" (e.g., "PBB" = 1) "Perserikatan Bangsa-Bangsa" (the United Nations), 2) "Pajak Bumi dan Bangunan" (real estate tax), or 3) "Peraturan Baris-Berbaris" (regulation on marching/parades)) | ||||||||||||||||||||||||
64 | N (garbage) | |||||||||||||||||||||||||
65 | AG | Meaning: "agglutinative" form (e.g., "apel-apel" as the plural form of "apel" (apple), "apelku" with a possessive suffix (my apple)) | ||||||||||||||||||||||||
66 | LF | Meaning: "loan word from a foreign language" that is not converted into the standard Indonesian spelling (e.g., "software") | ||||||||||||||||||||||||
67 | LFT | LF (loan word) and TB (non-standard spelling) (e.g., "vodka" is LF and "wodka" is LFT) | ||||||||||||||||||||||||
68 | UK | Meaning: "unknown" because the word token is not in KBBI, SEAlang, or IndoDic (the three most reliable dictionaries) but seems to be meaningful | ||||||||||||||||||||||||
69 | GB | Meaning: "garbage" such as typos, proper nouns and words written in other writing scripts | ||||||||||||||||||||||||
70 | W (work in progress) | |||||||||||||||||||||||||
71 | PG | Meaning: "probably garbage" because it is listed only on the OpenSubtitles (OS) frequency list, not on LCC | ||||||||||||||||||||||||
72 | Blank | Not checked at all -- could be garbage or meaningful | ||||||||||||||||||||||||
73 | ||||||||||||||||||||||||||
74 | Note: (1) Master file contains all of the spelling categories while (2) Root, (3) Acronym, (4) Country, and (5) Lemma exclude "N", "W" and their sub-categories | |||||||||||||||||||||||||
75 | ||||||||||||||||||||||||||
76 | Covered by Duolingo/Clozemaster/Pulau Bahasa | |||||||||||||||||||||||||
77 | Duolingo MT (main tree, or "Indonesian course for English speakers") | |||||||||||||||||||||||||
78 | YE | Meaning: "Yes, Duolingo MT teaches you the exact word token" (e.g., "diduga" (believed; speculated) in a passive form from the root word "duga") | ||||||||||||||||||||||||
79 | Y | Meaning: "Yes, Duolingo MT teaches you the lemma" (e.g., the active form "menduga" is not taught by Duolingo MT, but its passive form is) | ||||||||||||||||||||||||
80 | R | Meaning: "Duolingo MT teaches its related lemma under the same word family" (e.g., "penduga", a gerund or verbal noun of "menduga", is not taught but is easily guessed from the root word or related words) | ||||||||||||||||||||||||
81 | N | Meaning: "No, Duolingo does not teach you this word or any of its related words" | ||||||||||||||||||||||||
82 | Blank | Meaning: Not checked yet but very likely to be "N" | ||||||||||||||||||||||||
83 | ||||||||||||||||||||||||||
84 | All words taught by Duolingo MT (YE) are listed on: | |||||||||||||||||||||||||
85 | https://forum.duome.eu/viewtopic.php?t=7012-indonesian-from-english-word-list-44-units | |||||||||||||||||||||||||
86 | ||||||||||||||||||||||||||
87 | Clozemaster | |||||||||||||||||||||||||
88 | This column consists of two elements: a number and a letter | |||||||||||||||||||||||||
89 | Number | Meaning: the name of the Most Common Words Collection (e.g., 1,000 means "the 1,000 Most Common Words Collection" on Clozemaster) | ||||||||||||||||||||||||
90 | C | Meaning: "taught as a cloze-word that you need to fill in" in the text type-in mode | ||||||||||||||||||||||||
91 | S | Meaning: "not a cloze-word, but yes, you can learn from a sentence" especially in the full-sentence transcribe mode | ||||||||||||||||||||||||
92 | Y | Same as Duolingo | ||||||||||||||||||||||||
93 | R | Same as Duolingo | ||||||||||||||||||||||||
94 | N | Same as Duolingo | ||||||||||||||||||||||||
95 | Blank | Same as Duolingo | ||||||||||||||||||||||||
96 | ||||||||||||||||||||||||||
97 | Suppose that the sentence "He {{chose}} a big apple" is in the 2,000 Most Common Words Collection, where {{chose}} is the cloze-word that is deleted from the sentence and that you need to type in. | |||||||||||||||||||||||||
98 | "chose" (past form) is marked "2,000C" | |||||||||||||||||||||||||
99 | "choose" (present base form), "chosen" (past participle) and "chooses" (3rd person singular) are marked "2,000Y" | |||||||||||||||||||||||||
100 | "he", "a", "big" and "apple" are marked "2,000S" |
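
For readers who process the Clozemaster column programmatically, the number-plus-letter marks in the example above could be split roughly as follows. This is only a minimal sketch: the function name `parse_clozemaster_mark` is hypothetical, and the cell format (a number with comma thousands separators immediately followed by one letter, with a bare letter such as "N" also allowed) is an assumption inferred from the "2,000C" example, not a documented specification of the file.

```python
import re

# Assumed cell format, inferred from the "2,000C" example:
# an optional collection number (with comma separators) followed by one letter.
_MARK_RE = re.compile(r"^(\d[\d,]*)?([CSYRN])$")


def parse_clozemaster_mark(mark):
    """Split a Clozemaster mark into (collection_size, letter).

    Returns None for a blank cell (not checked yet).
    collection_size is None when the cell holds a bare letter such as "N".
    """
    mark = mark.strip()
    if not mark:
        return None  # Blank: not checked yet
    match = _MARK_RE.match(mark)
    if match is None:
        raise ValueError(f"Unrecognized Clozemaster mark: {mark!r}")
    number, letter = match.groups()
    size = int(number.replace(",", "")) if number else None
    return size, letter


if __name__ == "__main__":
    for cell in ["2,000C", "2,000Y", "2,000S", "N", ""]:
        print(repr(cell), "->", parse_clozemaster_mark(cell))
```

Keeping the number and the letter separate makes it straightforward to filter, for instance, all words taught as cloze-words ("C") within a given Most Common Words Collection.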