Slavic Languages and the Cyrillic Alphabet for GeoGuessr
By Harry Bowman (GeoFessr)
ABSTRACT
Learning the Cyrillic alphabet is a great way to expand GeoGuessing capabilities, as it is not very difficult for users of the Roman alphabet and is used in many countries. I argue that the Cyrillic alphabet should be learned first from Serbo-Croatian, a fully bialphabetic language with a one-to-one correspondence between Roman and Cyrillic glyphs. Many signs even adjust the spacing of the two versions so that the corresponding letters line up. In addition to the one-to-one correspondence, Serbian and Macedonian Cyrillic contain the contribution of the early-nineteenth-century linguist Vuk Karadžić to the Cyrillic alphabet, the “Cyrillic J”, which greatly simplifies spelling and eliminates the need for several Cyrillic letters used in other languages. After explaining the Serbo-Croatian forms, I discuss the differences between Serbo-Croatian and other Slavic languages, and the use of Cyrillic in non-Slavic languages.
INTRODUCTION
Slavic languages are divided into three groups: West Slavic, East Slavic, and South Slavic. West Slavic, which contains Polish, Czech, and Slovak, uses the Roman alphabet. East Slavic, which contains Russian, Ukrainian, and Belarusian, uses the Cyrillic alphabet. South Slavic contains Slovene, which uses the Roman alphabet; Serbo-Croatian and Montenegrin, which are bialphabetic; and Macedonian and Bulgarian, which use the Cyrillic alphabet.
The Roman and Cyrillic alphabets for Slavic languages can be further divided into styles. Roman is divided into the Polish style and the Czech style, which is used for Czech, Slovak, and South Slavic. Cyrillic can be divided into the conservative style, used for East Slavic and Bulgarian, and the Karadžić style, which uses the “Cyrillic J” and greatly simplifies things.
Cyrillic is traditionally credited to Cyril and Methodius, two Byzantine missionary brothers of Greek ethnicity from Thessaloniki, in the ninth century AD (strictly speaking, the brothers devised the earlier Glagolitic script, and Cyrillic proper was developed shortly afterward by their disciples). They were concerned with writing an early South Slavic language now called Church Slavonic, based on the dialect spoken by Slavs around Thessaloniki in the historical region of Macedonia. Early Cyrillic scripts contained a complete copy of the Greek alphabet, and so most of the letters are similar to Greek.
South Slavic languages had somewhat simpler phonetics because they had originated from barbarian invasions of the Roman and Byzantine Empires and had been modified by contact with a variety of languages. For this reason, they are a clear candidate to be explained first, and complications found in other Slavic languages can then be explained as modifications. In the west, in particular, there was influence from Latin: Slovene and Serbo-Croatian were written in the Latin alphabet, in a form modified from Czech. In 1818, the linguist Vuk Karadžić developed a standardized form of the Serbo-Croatian language based on the dialect of Herzegovina (the area around Mostar), with a radical new form of Cyrillic containing the “Cyrillic J”, borrowed from Latin. Latin forms corresponding to each of Karadžić’s letters were then developed by Ljudevit Gaj in 1835 for use in Croatia. In modern Serbo-Croatian usage, the Roman form is used in Croatia (except in a few areas with Serbian populations, such as Knin and Vukovar), on the Bosniak-Croat side of Bosnia-Herzegovina, and sometimes in Serbia. The Cyrillic form is used in Serbia, in the Republika Srpska entity of Bosnia-Herzegovina, and in the Serbo-Croatian-speaking area of Kosovo around Mitrovica.
Cyrillic script was developed for many non-Slavic languages by the Soviet Union, and there is a discussion of several of those at the end.
Links in the text are to audio files.
ROMAN SERBO-CROATIAN
In Serbo-Croatian every Cyrillic letter has one Roman equivalent. Here “lj” is one letter.
The Roman form of Serbo-Croatian uses the Roman alphabet excluding the letters Q, W, X, and Y, which are occasionally found in foreign words, and adding the letters Č, Ć, Dž, Đ, Lj, Nj, Š, and Ž, making a total of 30 letters. Note that Dž, Lj, and Nj are considered to be single letters in Serbo-Croatian. This is done so that Cyrillic has exactly one letter for each letter in the Latin form.
The pronunciation of the five vowels is similar to Italian. E is the DRESS vowel, I is the FLEECE vowel, O is the CLOTH vowel, and U is the GOOSE vowel. Note that the E and O are NOT the ones used in Spanish. The I and U are not precisely equivalent to the English vowels, because they are not diphthongized: they are the I and U in Spanish or I and OU in French. Note also that, unlike the English DRESS and CLOTH vowels, E and O do not have to be followed by consonants. These five vowels are basically the same in all Slavic languages.
A is the Germanic A found around the Great Lakes in the USA. Some sources, such as the Wikipedia entry on English phonology, call this the PALM vowel, but most North Americans don’t pronounce “palm” that way. A diagnostic: if you have the same vowel in SPA, FATHER, and LOT, but NOT in CLOTH, you have the “Germanic A”. If, like many English speakers, you don’t have this vowel, SPA or FATHER is probably the closest. The vowel is the same as the A in German “Vater”, or the A in Spanish and Italian.
All syllables are pronounced distinctly, without the strong reduction of unstressed syllables found in English. In addition, there is a tonal accent not indicated in writing. For precise pronunciations, there is sometimes audio in Wikipedia entries for places.
The letter J after a vowel produces a diphthong when it is followed by a consonant or ends the word. These are “aj”, pronounced as the PRICE vowel, “ej”, pronounced as the FACE vowel, “oj”, pronounced as the CHOICE vowel, and “uj”, which is not found in English. It is somewhat like “phooey”, except without a syllable break. Note that “ij” is not one of the diphthongs. In all cases “ij” is followed by a vowel, and the j goes to the next syllable as the sound of the English consonant “Y”, so the place Rijeka (HR), which means “river” in Serbo-Croatian, is pronounced “re-YECK-a”. Also note that Lj and Nj are separate letters, so their pronunciation is discussed below.
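For readers who think in code, the diphthong rule above fits in a short Python function. This is a minimal sketch assuming plain Roman-alphabet Serbo-Croatian input; the function name find_diphthongs is my own, and it checks only the spelling rule, not the pronunciation.

```python
VOWELS = set("aeiou")


def find_diphthongs(word: str) -> list[str]:
    """Return the vowel + J diphthongs (aj, ej, oj, uj) spelled in a Roman
    Serbo-Croatian word, in order of appearance."""
    w = word.lower()
    found = []
    for i in range(1, len(w)):
        # The j must come right after a vowel; this also rules out the
        # single letters lj and nj, where j follows a consonant.
        if w[i] != "j" or w[i - 1] not in VOWELS:
            continue
        nxt = w[i + 1] if i + 1 < len(w) else ""
        # Before a vowel, j is the consonant "y" starting the next syllable
        # (as in Rijeka), so it is not part of a diphthong.
        if nxt in VOWELS:
            continue
        # "ij" is never one of the diphthongs.
        if w[i - 1] == "i":
            continue
        found.append(w[i - 1] + "j")
    return found


print(find_diphthongs("Mojkovac"))  # ['oj']
print(find_diphthongs("Rijeka"))    # []
```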
The consonants found in English are mostly pronounced as in English. The exceptions are C, H, J, and R.
C is pronounced like a sequence of T and S, but always as a single sound without any syllable break between them. The sound exists marginally in English in some foreign words. For instance, it is the Z in “pizza” and “Nazi”, which are not pronounced in exactly the same way as “pete saw” and “not see”. Note that the letter “c” has this value not only in all Slavic languages written in the Roman alphabet but also in the Pinyin romanization of Chinese, for instance in the word 从 cóng “from”.
H is the German “ch” sound in Bach “brook” or Tochter “daughter”. To English speakers it sounds like an H but with more friction, which is produced by raising the back of the tongue toward the roof of the mouth, in the same place as the English K or G sounds. The sound is occasionally used in foreign words in English, like “Loch Ness” or “Hanukkah”. It is fairly natural for an English speaker to produce this sound when confronted with an H in a position that is illegal in English, as in the Turkish name “Mehmet”. English actually had this sound in the Middle Ages, and, to reflect its difference in pronunciation from H, it later came to be spelled “gh”. Then the “gh” sound disappeared or became an “f” sound in different words, resulting in the chaos of English spelling of words with “gh”. In romanizations of some languages not using the Roman alphabet, including East Slavic languages and Bulgarian, this sound is written with “kh”, as in the name Khrushchev.
J at the beginning of a word or between two vowels is not part of a diphthong. It is the consonant value of English Y, as in “yellow”. This follows German, which pronounces J this way in native words, such as Jahr “year”.
R is the “rolled” R, found in Spanish at the beginning of words or spelled as “rr” between vowels. In some Serbo-Croatian words, the R can be found between two consonants making a syllable with no vowel in it, as in “Hrvatska” meaning Croatia, or “trg”, which means “plaza”. It is still rolled in these words, unlike in American English vowelless syllables like “church” or “bird”. Serbo-Croatian does not differentiate between the “tap” and “trill” R, as in the Spanish words “pero” and “perro”, and both are commonly found, with the difference in pronunciation being one of phonetic convenience.
Next we have the letters with a háček (pronounced HA-check): Č, Dž, Š, and Ž. These resemble the “ch” in “church”, the “j” in “joke”, the “sh” in “show”, and the “s” in “measure”. However, they are not the same. Letters with háčky in Serbo-Croatian are what linguists call “retroflex” consonants and are produced with the tip of the tongue curled back, as it is in the American English R sound (which is NOT used in Serbo-Croatian). The difference is difficult to describe in words, and I would recommend that anyone interested look for audio. The word “háček” is Czech (the Slovak term is “mäkčeň”) and is not used in Serbo-Croatian.
The retroflex Č contrasts with what linguists call the “postalveolar” consonant Ć, written with an acute accent instead of a háček. Both sound similar to English “ch”, but neither is the same as it. Ć and the other postalveolars are pronounced with the center of the tongue tight against the alveolar ridge, which is the bump on the roof of the mouth behind the front teeth, and the tip pointing downward toward the lower front teeth.
The only major non-Slavic language I know of that has this distinction between retroflex and postalveolar is standard Chinese, which uses the retroflex in words with ch like 吃 chī “eat” and the postalveolar in words with q like 汽 qì “steam” (its homophone 气 qì is the Daoist mystical energy).
The retroflex Dž contrasts with the postalveolar Đ. Chinese has the retroflex in words with zh like 竹子 zhúzi “bamboo” and the postalveolar in words with j like the place 北京 Běijīng. Note that Pinyin writes this sound with a j instead of a d, because it can sound more like an English J. Similarly, Macedonian uses a different letter, ǵ, in the Roman alphabet for this sound, as mentioned below.
Chinese has the retroflex Š in words with sh like 少 shǎo “few”. Note that this contrasts in Chinese with the postalveolar x sound, which is not found in Serbo-Croatian.
Chinese has the retroflex Ž in words with r like the place 日本 Rìběn “Japan”. It does not contrast with a postalveolar in Chinese. We will again encounter the use of R to represent this sound in Polish below.
This leaves Lj and Nj, which are sounds found in Italian and some of the other Romance languages. To North American English speakers, the more familiar of the two is Nj, which is pronounced the same way as Spanish ñ or Italian or French gn. It resembles the sequence of n with the English consonant y. Lj is the sound of Italian gl. It is used in Spain as the “lleísmo” variant of the “ll” sound in northern Castile, Navarre, and Aragon and in areas of Peru, Bolivia, Paraguay, and Argentina where the same sound is used in Quechua, Aymara, and Guaraní. It resembles the sequence of l with the English consonant y.
However, there is a complication not found with these Romance sounds: Nj and Lj, unlike their Romance counterparts, are not always followed by a vowel. For instance, there is a place called Rovinj on the Croatian coast, on the Istra peninsula. Without a following vowel it is difficult to produce the “y” offglide, and so speakers don’t. The consonants still sound different from n and l without the offglide, because Nj and Lj are postalveolars. That is, instead of being “alveolar” or “dental” like n and l, with the tip of the tongue making contact with the roof of the mouth, they are made with the tongue tight up against the roof of the mouth just behind the alveolar ridge, with the tip down. Again, I recommend that people particularly interested in the difference find audio.
THE CYRILLIC ALPHABET, AS USED IN SERBO-CROATIAN
Вук Стефановић Караџић
Vuk Stefanović Karadžić (1787-1864), standardizer of the Serbo-Croatian language
As mentioned before, the Cyrillic alphabet originally had a complete copy of the Greek alphabet in it. The unnecessary Greek letters have been eliminated in all modern languages using Cyrillic, but it is still useful to mention the Greek letters which the Cyrillic is derived from. As in the Greek and Roman alphabets, there are two forms of each letter, upper and lower case. In addition, there are variations of font in Cyrillic, and in the cases of a few letters it is useful to mention the forms used in what in Russian is called “kursivny” font, which more closely resembles Cyrillic cursive handwriting than the usual forms. Kursivny font is often used in the same places as italics in the Roman alphabet, and so it is common on some maps such as the classic Atlas SSSR produced by the Soviet Union.
Cyril’s original alphabet had quite a few vowel letters, but the rejection of unnecessary letters and the use of “Cyrillic J” reduces these to only five vowel letters, just as in the Roman alphabet. The vowel letters are:
Аа = Roman letter A. This is from Greek alpha. The kursivny form is Аа.
Ее = Roman letter E. This is from Greek epsilon.
Ии = Roman letter I. This is from Greek eta. Note that the Greek iota is not used. The kursivny form is Ии. Note that this kursivny form is an I, not a U.
Оо = Roman letter O. This is from Greek omicron. Note that the Greek omega is not used.
Уу = Roman letter U. This is from Greek upsilon.
Vuk Karadžić introduced the “Cyrillic J”, by analogy with the Roman script, to represent both the J of the diphthongs aj, ej, oj, and uj and the consonant J. The diphthongs are then Ај, Еј, Ој, and Уј. The capital Cyrillic J, like the capital Roman J, has no dot.
The complete set of consonants is as follows, including J, in Roman alphabetical order:
Бб = Roman letter B. This is from Greek beta. Note that the Cyrillic equivalent of V is Вв. The kursivny form is Бб.
Цц = Roman letter C. This is from Hebrew tsade. Note that the Cyrillic equivalent of S is Сс.
Чч = Roman letter Č.
Ћћ = Roman letter Ć. Note that the Cyrillic resembles the equivalent of Đ.
Дд = Roman letter D. This is from Greek delta. The kursivny form is Дд.
Џџ = Roman letter Dž. Note that the Cyrillic resembles the equivalent of C.
Ђђ = Roman letter Đ. Note that the Cyrillic resembles the equivalent of Ć.
Фф = Roman letter F. This is from Greek phi.
Гг = Roman letter G. This is from Greek gamma. The kursivny form is Гг.
Хх = Roman letter H. This is from Greek chi.
Јј = Roman letter J. This is from Roman J.
Кк = Roman letter K. This is from Greek kappa.
Лл = Roman letter L. This is from Greek lambda. Sometimes a pointy Λ is seen in Cyrillic fonts, which is the same as capital Greek lambda.
Љљ = Roman letter Lj. This was invented by Vuk Karadžić as the fusion of Ль, eliminating the need for ь in Serbo-Croatian.
Мм = Roman letter M. This is from Greek mu. Note that the lower case does not look like m: if you see that, it is the kursivny font lowercase T.
Нн = Roman letter N. This is from Greek nu, although the capital looks like a Greek eta.
Њњ = Roman letter Nj. This was invented by Vuk Karadžić as the fusion of Нь, eliminating the need for ь in Serbo-Croatian.
Пп = Roman letter P. This is from Greek pi.
Рр = Roman letter R. This is from Greek rho. Note that it is identical to Roman P.
Сс = Roman letter S. This is from the “lunate” form of Greek sigma (Ϲ) used in Byzantine writing. Note that it looks identical to Roman C.
Шш = Roman letter Š. This is from Hebrew shin.
Тт = Roman letter T. This is from Greek tau. Note that the lower case form is not the same as Roman t. The kursivny form is Тт. Note that the kursivny lower case, although it looks like an m, is equivalent to t and NOT to m.
Вв = Roman letter V. This, like Бб, is also from Greek beta. The kursivny form is Вв.
Зз = Roman letter Z. This is from Greek zeta. Note that it resembles a Roman cursive Z.
Жж = Roman letter Ž.
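Because the correspondence is one-to-one, converting Serbian Cyrillic to Roman Serbo-Croatian is just a table lookup. Below is a minimal Python sketch built from the list above; the names CYR_TO_ROMAN and to_roman are my own. It assumes title-case capitals (Љ becomes Lj); text in all capitals, common on signs, would really need LJ, NJ, and DŽ, which this sketch does not handle.

```python
# One Roman equivalent per Serbian Cyrillic letter (30 letters in all).
CYR_TO_ROMAN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "\u0458": "j",  # Cyrillic ј
    "к": "k", "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć",
    "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž",
    "ш": "š",
}


def to_roman(text: str) -> str:
    """Transliterate Serbian Cyrillic to the Roman alphabet, letter by letter."""
    out = []
    for ch in text:
        low = ch.lower()
        roman = CYR_TO_ROMAN.get(low)
        if roman is None:
            out.append(ch)  # spaces, punctuation, digits pass through unchanged
        elif ch != low:
            # Capital letter: capitalize only the first character, so Љ -> "Lj".
            out.append(roman[0].upper() + roman[1:])
        else:
            out.append(roman)
    return "".join(out)


print(to_roman("Вук Стефановић Караџић"))  # Vuk Stefanović Karadžić
```

The reverse direction is harder: a Roman-to-Cyrillic converter cannot work one character at a time, because it has to decide whether lj, nj, and dž are single letters, and in a few words they are actually two separate letters.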
SOUTH SLAVIC IN THE KARADŽIĆ STYLE
Proclamation of the Presidium of the Anti-Fascist Assembly for the National Liberation of Macedonia establishing the Macedonian alphabet. They used a Bulgarian typewriter and drew in some letters not found in Bulgarian by hand. Смрт на фашизмот, слобода на народот!
The other South Slavic languages are Montenegrin, Slovene, Macedonian, and Bulgarian.
Montenegrin is so close to Serbo-Croatian that before the 2006 independence of Montenegro, it was generally considered to be a dialect of Serbo-Croatian. It differs alphabetically from Serbian by having two uncommon additional Cyrillic letters, С́ and З́, which have no precomposed form in Unicode and have to be made with a combining acute accent, which doesn’t quite look right on most computers. Following Serbo-Croatian, there are standard Roman forms for these letters, Ś and Ź. The Roman forms are also used in Polish and are thus available as precomposed Unicode characters. These letters are the postalveolar partners of the Serbo-Croatian unpaired retroflexes Š and Ž.
Slovene is less close to Serbo-Croatian than Montenegrin is. The main way in which it differs from Serbo-Croatian is that the retroflex/postalveolar pairs do not exist as separate sounds in Slovene. Instead of Č/Ć, Slovene has only Č, which is pronounced identically to English “ch”. Dž/Đ is replaced with a sequence of two letters, D and ž, pronounced identically to English “j”, and Š and Ž are pronounced identically to the sh in English “show” and the s in English “measure”. In addition, nj and lj before a consonant in Slovene are not postalveolars: instead they are pronounced the same way as n and l. As a result of these differences with Serbo-Croatian, every consonant sound in Slovene is found in the neighboring Italian language except for H. Slovene vowels are also different from Serbo-Croatian, but this is not indicated in spelling.
In the eastern part of the South Slavic language area, the non-Slavic influences on the language were different. Greek rather than Latin dominated most of that part of the Roman and Byzantine Empires, and so the principal language encountered by the barbarian invaders was Greek. In addition, the invaders there were different: they included a group of people known as the Bulgars. The Bulgars were not Slavs and instead spoke a Turkic language. A branch of the Bulgars later settled on the Volga, and the closest surviving relative of the Bulgar language is Chuvash, spoken in the Chuvash Republic of Russia, whose capital, Cheboksáry, lies on the Volga several hundred kilometers east of Moscow.
The mixture of languages demolished the entire case structure of the Slavic language here, except for the pronouns, much as happened in medieval English through mixture with French. In the Macedonian “Смрт на фашизмот, слобода на народот!” above, the ending -ot “the” appears on the nouns фашизмот/fašizmot and народот/narodot. There is no equivalent of “the” in Slavic languages other than Macedonian and Bulgarian. The Serbo-Croatian for “Death to fascism, freedom to the people!” is instead “Smrt fašizmu, sloboda narodu!” (Slovene “Smrt fašizmu, svoboda narodu!”), which uses the ending -u, indicating that the nouns are in the dative case. The dative case marks these nouns as indirect objects, eliminating the need for the preposition на/na found in the Macedonian.
In modern times the language split between a western form, Macedonian, which adopted the Karadžić style of Cyrillic with the Cyrillic J, and an eastern form, Bulgarian, which retained the conservative Cyrillic style of East Slavic languages.
So, first off, Macedonian. The vowels are exactly the same as in Serbo-Croatian, and the “Cyrillic J” is used with the same sounds. As in Slovene, the Slavic retroflexes disappear, and the retroflex letters represent the corresponding English consonants: Ч/Č represents the English “ch” sound, Џ/Dž the English “j” sound, Ш/Š the English “sh” sound, and Ж/Ž the “s” in English “measure”. However, two postalveolar consonants exist in Macedonian: Ѓѓ, transliterated in the Roman alphabet as ǵ, and Ќќ, transliterated as ḱ. Ѓ is the same sound as Ђ/Đ in Serbo-Croatian, and Ќ is the same sound as Ћ/Ć. One more consonant, Ѕѕ, exists in Macedonian and looks exactly like the Roman letter S. However, its Roman equivalent is Dz (NOT Dž), a sound similar to Ц/C. The Ц/C sound of Serbo-Croatian is like T plus S with no syllable break between them, and similarly, Ѕ/Dz is like D plus Z with no syllable break.
Which brings us to Bulgarian, which requires a discussion of conservative Cyrillic.
BULGARIAN: SOUTH SLAVIC WITH A CONSERVATIVE CYRILLIC ALPHABET
Bulgarian highway sign. Д and Л are often pointier than in other countries, with Л looking identical to a Greek lambda (Λ).
And now we come to Bulgarian. Bulgarian uses the conservative Cyrillic style, like the East Slavic languages. Its character set is identical to Russian except that it lacks the Russian letters Ё (which in Russian sometimes has its dots left off, making it indistinguishable from Е), ы, and э. However, some of the letters are used differently from Russian.
The conservative Cyrillic style lacks the Cyrillic J invented by Vuk Karadžić. However, the sound made by the Cyrillic J is found in all Slavic languages, and how it is represented instead in conservative Cyrillic is complex.
Above I mentioned that Karadžić, in inventing the symbols for Љ/Lj and Њ/Nj, fused a letter ь to the letters Л and Н. This is one of the two Cyrillic letters known as “yers”.
There are two yers, the “soft yer” ь and the “hard yer” ъ. In Church Slavonic, these represented short vowels, usually romanized as ĭ and ŭ. Bulgarian, unlike all the other South Slavic languages, has a sixth vowel, ă, in addition to the five Serbo-Croatian vowels, and writes it just as in Church Slavonic with the letter Ъъ. Note that the Russian and Belarusian letter ы, called “yeri”, is different from the yers.
The hard yer survives only uncommonly in Russian, in words where it was not liquidated by the Bolsheviks as a class enemy, and it has been entirely exterminated in Belarusian and Ukrainian, leaving behind an apostrophe after a few letters as a memorial. The remnant of the hard yer in Russian is called the “hard sign”. In Russian it is arguably not a vowel at all: it never appears at the beginning of a word and always comes after a consonant and before a vowel. How Russian uses it will be explained in the East Slavic section below. So, if a Slavic language uses a lot of this letter, it is Bulgarian and NOT Russian.
Bulgarian Ъ is pronounced somewhat like “schwa”, the dreaded black hole of English spelling, into which all of the medieval English short vowels have disappeared, leaving no trace of which original vowel it was except in the speech of people in Scotland and Ireland who keep the old pronunciations. Schwa is the a in “about”, e in “taken”, i in “pencil”, o in “memory”, u in “supply”, y in “Sibyl”, and nothing at all in “rhythm”. However, the English schwa is only found in unstressed syllables, while ă is as fully pronounced as the other vowels. It is the sound written “e” in Chinese in words like the place 黄河 Huáng Hé “Yellow River”, and is similar but not identical to the e in French je or the eu in French bleu.
It would make sense to write Bulgarian Ъ as ă, for instance writing the place Слънчев бряг, known in English as “Sunny Beach”, as Slănchev Bryag, but Google Maps doesn’t do this, despite using the exact same letter next door in Romania. Instead, it writes Ъ and А alike as a. Another convention that is sometimes used is to write Ъ as “U” and У as “OU”, which would result in Bulgaria having towns called “Bourgas”, “Rouse”, and “Veliko Turnovo”.
Bulgarian, unlike other South Slavic languages, has a strong distinction between stressed and unstressed syllables, as in East Slavic languages or English. Some Bulgarians, especially toward the Black Sea coast, pronounce A and Ă the same way in unstressed syllables. They also merge unstressed U and O into an O pronounced like the O used in Spanish, which resembles the GOAT vowel of American English without the diphthong, and not like the CLOTH vowel found in stressed Bulgarian syllables and in Serbo-Croatian. For those interested in which syllable is stressed in place names, the stress is sometimes marked on Wikipedia, especially in the Bulgarian versions of the pages, with an acute accent on the Cyrillic. To switch Wikipedia to Bulgarian, look for Български (Bălgarski) in the language list.
OK, so what about the soft yer? In Bulgarian, the soft yer is found only in the combination ьо, which is also found in Ukrainian, in Russian only in a few loanwords such as бульон “broth”, and NOT in Belarusian. Its pronunciation, it turns out, is identical to that of the Cyrillic J; Bulgarian simply didn’t follow Vuk Karadžić in writing it with the Cyrillic J. Furthermore, Bulgarian doesn’t have an official Roman alphabet as Serbo-Croatian does, and in romanization ьо is written “yo” and not “jo”.
If you see soft yers anywhere other than in ьо, the language is East Slavic. Oh, and one other exception: the original Church Slavonic use of ь as a short vowel was exiled to Mongolia. For instance, Мандалговь/Mandalgovĭ and other places with -говь “desert” use it. Mongolian, of course, is not Slavic.
Now what happens if we need the Cyrillic J sound somewhere other than in ьо? Well, remember that Cyril incorporated the whole Greek alphabet into Cyrillic. He ended up not needing the iota, since he wrote the I sound with an eta, as И. Before a vowel, the iota was attached to the vowel, making a second series of Cyrillic vowels. Yes, that’s a bit cumbersome. Bulgarian, however, uses this only before A and U, giving the Cyrillic letters Яя “ya” and Юю “yu”. Yes, the yu stuck the iota to an omicron and not an upsilon, and then yo had to be written ьо. Nobody said this makes sense. Furthermore, some words start with yo, and since ь never begins a word, there is no capital form of ьо.
Which brings us to the THIRD way that the sound represented by “Cyrillic J” is represented in conservative Cyrillic: the “short” И, written as Й. So, New York is called Ню Йорк/Nyu York in Bulgarian. However, mostly the й is found after vowels to write the diphthongs, as Ай/ay, Ей/ey, Ой/oy, and Уй/uy.
So, in summary, the sound of the “Cyrillic J” is written before the vowels a and u as Я and Ю; before o as ьо after a consonant and as Йо/йо elsewhere (at the start of a word or after a vowel); and after a, e, o, and u as Ай, Ей, Ой, and Уй.
Now it remains to explain a few consonant peculiarities of Bulgarian. As in Macedonian, there is no distinction between retroflexes and postalveolars, and the retroflex letters Ч, Ш, and Ж are used for the sounds of English ch, sh, and the S in “measure”. Bulgarian doesn’t need Serbo-Croatian Nj and Lj, because these sounds only occur in Ня, Ньо, and Ню (Nya, Nyo, Nyu) and Ля, Льо, and Лю (Lya, Lyo, Lyu), and not at the ends of words as they do in Serbo-Croatian. Romanization is done differently than in Serbo-Croatian: Ч is Ch and not Č, Х is Kh and not H, Ш is Sh and not Š, Ц is Ts and not C, and Ж is Zh and not Ž. There is an additional letter Щщ, not found in Serbo-Croatian, romanized as “Sht” and pronounced like Sh + T.
And finally, Bulgarian has a phenomenon known as “final obstruent devoicing”, which is also found in Polish, Belarusian, and Russian and in some non-Slavic languages such as German and Korean. In order to regularize certain consonant shifts that take place in the grammar when suffixes are added, some final consonants are “devoiced”. Devoicing means the consonant is pronounced without any vibration of the vocal cords. It affects the voiced stops B, D, and G, which become voiceless P, T, and K at the end of a word, and the voiced fricatives V, Z, and Zh, which become voiceless F, S, and Sh.
In addition, Bulgarian uses more “kursivny”-style lowercase forms even in non-kursivny fonts, and uses the “pointy” Greek-style Л, that is, Λ. Bulgarian Д is also pointier than the Russian version. This graphic shows Russian-style and Bulgarian typefaces for the Bulgarian alphabet, with Russian above and Bulgarian below, and in each, the default typeface above and kursivny below. Especially note the distinctive forms of lowercase Д/D, Ж/Zh, and Ю/Yu.
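The Bulgarian romanization conventions above also reduce to a lookup table, with the wrinkle that several letters become two or three Roman letters. This is a minimal Python sketch, not an official system: it follows the spellings given in this article (for instance Kh for Х, where some sources write H), and the yer parameter, which is my own invention, chooses between writing Ъ as a, as Google Maps does, or as ă. Since ь only occurs in ьо, mapping it to “y” yields the “yo” described above.

```python
# Romanization table following the conventions described in the text
# (Ch, Kh, Sh, Ts, Zh, Sht, Ya, Yu, and Y for Й). Ъ is added per call.
BG_TO_ROMAN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ь": "y", "ю": "yu", "я": "ya",
}


def romanize_bg(text: str, yer: str = "a") -> str:
    """Romanize Bulgarian Cyrillic; `yer` is how Ъ is written ("a" or "ă")."""
    table = {**BG_TO_ROMAN, "ъ": yer}
    out = []
    for ch in text:
        low = ch.lower()
        roman = table.get(low)
        if roman is None:
            out.append(ch)  # spaces and punctuation pass through unchanged
        elif ch != low:
            out.append(roman[0].upper() + roman[1:])  # Ч -> "Ch", Щ -> "Sht"
        else:
            out.append(roman)
    return "".join(out)


print(romanize_bg("Слънчев бряг"))           # Slanchev bryag
print(romanize_bg("Слънчев бряг", yer="ă"))  # Slănchev bryag
```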
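Final devoicing is essentially a rewrite rule at the end of a word. As a toy illustration, here is a Python sketch that applies it to Bulgarian spelling, respelling a word in Cyrillic the way it is pronounced. It is a simplification under my own assumptions: it handles only the six pairs listed above, only at the very end of a word, and ignores exceptions and any changes inside words; the name devoice_final is hypothetical.

```python
# Voiced obstruent -> voiceless partner: б/п, д/т, г/к, в/ф, з/с, ж/ш.
DEVOICE = {"б": "п", "д": "т", "г": "к", "в": "ф", "з": "с", "ж": "ш"}


def devoice_final(word: str) -> str:
    """Return the word with its final run of voiced obstruents (lowercase
    letters) replaced by their voiceless partners."""
    letters = list(word)
    i = len(letters) - 1
    # Walk backward from the end while the letters are voiced obstruents.
    while i >= 0 and letters[i] in DEVOICE:
        letters[i] = DEVOICE[letters[i]]
        i -= 1
    return "".join(letters)


print(devoice_final("Разград"))  # Разграт
print(devoice_final("хляб"))     # хляп ("bread")
```

So the town Разград is pronounced as if it were written Разграт (Razgrat).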
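The diagnostics scattered through the last few sections can be collected into a rough rule of thumb. The following Python sketch is my own heuristic, not a reliable language identifier: it checks, in order, for letters specific to Macedonian (Ѓ, Ќ, Ѕ) and Serbian (Ћ, Ђ), other Karadžić-style letters, the East Slavic letters ы and ё, a soft yer outside ьо, and a hard yer not followed by е, ё, ю, or я (which, in Russian, it always is). Short texts often contain none of these, and, as noted above, Mongolian also uses ь and would be misclassified as East Slavic. Letters are written as Unicode escapes so that lookalikes such as Cyrillic ѕ and Latin s cannot be confused.

```python
MACEDONIAN_ONLY = "\u0453\u045c\u0455"       # ѓ ќ ѕ
SERBIAN_ONLY = "\u045b\u0452"                # ћ ђ
KARADZIC_STYLE = "\u0458\u0459\u045a\u045f"  # ј љ њ џ
EAST_SLAVIC_ONLY = "\u044b\u0451"            # ы ё
AFTER_RUSSIAN_HARD_SIGN = ("\u0435", "\u0451", "\u044e", "\u044f")  # е ё ю я


def guess_cyrillic_language(text: str) -> str:
    """Rough guess at the Slavic language of a Cyrillic text from diagnostic letters."""
    t = text.lower()
    if any(c in t for c in MACEDONIAN_ONLY):
        return "Macedonian"
    if any(c in t for c in SERBIAN_ONLY):
        return "Serbian (or Montenegrin)"
    if any(c in t for c in KARADZIC_STYLE):
        return "Serbian or Macedonian"
    if any(c in t for c in EAST_SLAVIC_ONLY):
        return "East Slavic"
    for i, c in enumerate(t):
        nxt = t[i + 1] if i + 1 < len(t) else ""
        if c == "\u044c" and nxt != "\u043e":  # ь not followed by о
            return "East Slavic"
        if c == "\u044a" and nxt not in AFTER_RUSSIAN_HARD_SIGN:  # ъ
            return "Bulgarian"
    return "undetermined"


print(guess_cyrillic_language("Велико Търново"))  # Bulgarian
print(guess_cyrillic_language("Београд"))         # undetermined
```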
So, now we can move on to the West Slavic languages, Polish, Czech, and Slovak. These are all written in the Roman alphabet.
CZECH AND SLOVAK
Czech highway sign in Czech Silesia. Český Těšín is Czech, first because the letter Ě is specific to Czech, and second because it’s “Český”. The town next to it in Poland with the “same” name is Cieszyn. This could also be worked out using PL and SK. It isn’t the Ukrainian border of those two countries because the sign isn’t Cyrillic and that’s nowhere near Katowice.
So, before I referred to the “Czech style” and the “Polish style” for Roman Slavic alphabets. The Czech style, used for all Slavic languages in the Roman alphabet except Polish, traces back to the religious rebel Jan Hus (1372-1415) who invented the háček and the marking of long vowels in his book De Orthographia Bohemica. The Czech style is marked by a principle that one symbol should represent one sound, and abhors the use of symbols with multiple letters. The Czech style in all languages uses háčky. Polish entirely lacks háčky, replacing many of them with multicharacter symbols including the letter Z, and this results in Polish having a whole lot of Z’s in it.
So, in Czech and Slovak, they have the same five vowels as Serbo-Croatian, A, E, I, O, and U, and their long versions Á, É, Í, Ó, and Ú. Ó is an uncommon letter found mostly in foreign words in both languages, and this is true of É in Slovak except that it is very common as a neuter nominative singular adjective ending, such as -ské in placenames. In Czech short and long I differ in pronunciation as well as duration, with the short I being the KIT vowel and the long I being the FLEECE vowel.
However, there is s sixth vowel letter, Y, which can also be marked as long. It isn’t a different vowel, though: the difference between I and Y tells you if the previous consonant is “hard” or “soft”.
We saw these terms earlier in reference to the hard and soft yers of Church Slavonic. Linguistics analyzes this West and East Slavic consonant distinction as being produced largely by the dropping of yers from previous stages of Slavic languages leaving traces in the consonants. In West Slavic, soft consonants are defined as consonants that appear before I and not before Y. Consonants that can appear before either I or Y are neutral.
In English and Romance languages, there is a similar rule for the pronunciation of the consonants C and G before vowels. Some readers may also be familiar with the similar distinction found in Irish, in which hard consonants are called “broad” and soft consonants are called “slender”. For instance, in the names Seán and Siobhán , the I and E indicate that the S is soft and so the names are pronounced “SHAWN” and “Sho-WAN”.
In the South Slavic section, I mentioned that Serbo-Croatian has a distinction between retroflex and postalveolar consonants. The postalveolars are soft. The postalveolar consonants are written in Slovak with Ď (with lowercase ď) instead of Đ, Ľ instead of Lj, Ň instead of Nj, and Ť (with lowercase ť) instead of Ć. Unless they come before the “I” vowel. However, Đ, Ň, and Ť must be written as the versions without the diacritic, D, N, and T, if they come before I, E, or the combinations IA and IE which have the I pronounced like the English consonant Y. So, Di, De, Dia, and Die are soft and Dy is hard. C, J, and the retroflexes Č, Dž, Š, and Ž are also soft, while the velars K, G, Ch, H, and the letter R are hard as well as D, N, and T. Note that every consonant with a diacritic on it is soft. The neutral consonants are the labials B, P, V, F, and M, and the letters L, S, and Z. In the grammatical rules of both languages, a consonant can change from hard to soft with change of a suffix and in that case, S and Z soften to Š and Ž but Ch softens to S and H and G both soften to Z. An example given in Wikipedia is the addition of the case ending e: Prague is “Praha” but “in Prague” is “v Praze”.
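The Slovak rule for D, T, and N can also be read in reverse, as a guide to pronunciation: a caron always marks a soft consonant, and a plain D, T, or N is soft before i or e (and, I assume here, before their long forms í and é). A minimal Python sketch follows; the function name soft_dtn_positions is my own, and it ignores the exceptions (many loanwords, and a few native words) where plain D, T, or N stays hard.

```python
CARON_DTN = set("ďťň")
SOFTENING_VOWELS = set("iíeé")  # also covers ia and ie, which start with i


def soft_dtn_positions(word: str) -> list[int]:
    """Indices of the D, T, and N letters that the spelling marks as soft."""
    w = word.lower()
    positions = []
    for i, c in enumerate(w):
        if c in CARON_DTN:
            positions.append(i)  # a caron always marks a soft consonant
        elif c in "dtn":
            nxt = w[i + 1] if i + 1 < len(w) else ""
            if nxt in SOFTENING_VOWELS:  # "" (end of word) is not in the set
                positions.append(i)
    return positions


print(soft_dtn_positions("deň"))  # [0, 2]
print(soft_dtn_positions("dym"))  # []
```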
Czech uses the same setup, with a few differences. Czech writes Ě instead of IE as the soft vowel after a consonant. The letter Ř, which is not used in Slovak, is soft because it has a diacritic. And “mě” is pronounced as M plus the soft postalveolar Ň plus E. However, some Czech words still contain IE, where it is pronounced as the sequence I plus English consonant Y plus E.
Now that we’ve gotten hard and soft out of the way, there are a few more things. Among the vowels, Czech and Slovak have the diphthong OU, which is pronounced like the American English GOAT vowel. This is unlike the unusual Ó, found mostly in foreign words, which is a long version of the Czech O, itself like the American CLOTH vowel. OU often appears in place names as the instrumental case ending after “nad”, meaning “on”, as in Čierna nad Tisou (“on the Tisza”) and Jablonec nad Nisou (“on the Neisse”). Note, however, that Ústí nad Labem (“on the Elbe”) has a different ending.
Slovak additionally has Ô, which is pronounced like the sequence W plus O, giving the sound in the English word “walk”. Slovak also has Ä, a short vowel like the A in English “cat”, which many speakers simply pronounce as E.
Czech, but NOT Slovak, replaces most instances of Ú with Ů. They are pronounced identically.
Both Czech and Slovak have the diphthongs AU and EU in a few foreign words like “auto” and “euro”. These are pronounced like a sequence of the first vowel plus English W.
Czech, like Slovene, is non-retroflex and pronounces Č, Dž, Š, and Ž with the sounds of English Ch, J, Sh, and the S in “measure”. Dž and dz, unlike in Slovak, are unusual in Czech and used mostly to write foreign words. Additionally F and G are used in Czech mostly for foreign words but there are so many of them, particularly from German, that this is not very diagnostic.
Both languages, unlike Serbo-Croatian, contrast Ch with H. Ch is the sound made by H in Serbo-Croatian, following German spelling conventions. H is the voiced form of the same sound, which is found in Spanish as the G in the word “agua”.
And then there is the Ř, a sound that is notorious for only being producible by Czechs. It is something like a simultaneous trilled R and Z. Most people that have heard of this have seen it in the context of how to pronounce the name of the composer Antonín Dvořák. August Dvorak, the inventor of the Dvorak keyboard, was American and so that one is pronounced “de-VOR-ack”.
And Slovak has the long consonants Ĺ and Ŕ. They are simply held longer than L and R, like the double letters of Italian or Japanese. Ĺ is easy to confuse with Ľ.
There is one remaining matter that is the same in Czech and Slovak. Previously I mentioned the final obstruent devoicing of Bulgarian. Czech and Slovak also have voicing harmony, in which a string of consecutive consonants that can be either voiced or unvoiced takes the voicing of the last consonant. Listing the unvoiced partner first, there are P and B, T and D (both hard and soft), K and G, C and Dz (both hard and soft), F and V, S and Z (both hard and soft), and Ch and H. So, for example, the Czech word fotbal is pronounced “FOD-bal”, with a D sound.
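The voicing rule is mechanical enough to express as code. Here is a minimal Python sketch, not a description of any standard tool: it works on Czech spelling rather than on sound, treats the letter pairs listed above as the only consonants that take part, keeps ch, dz, and dž together as single letters (which can misread a few prefixed words), ignores Ř and word-final devoicing, and assumes that V can be devoiced but does not voice the consonant before it, which is how Czech V normally behaves. The function names are made up for illustration.

```python
# Voiceless/voiced pairs from the list above, written in Czech spelling.
PAIRS = [("p", "b"), ("t", "d"), ("ť", "ď"), ("k", "g"), ("c", "dz"),
         ("č", "dž"), ("f", "v"), ("s", "z"), ("š", "ž"), ("ch", "h")]
TO_VOICED = dict(PAIRS)                                   # voiceless -> voiced
TO_VOICELESS = {voiced: voiceless for voiceless, voiced in PAIRS}
OBSTRUENTS = set(TO_VOICED) | set(TO_VOICELESS)
DIGRAPHS = ("ch", "dž", "dz")


def segment(word: str) -> list[str]:
    """Split a word into letters, keeping the digraphs ch, dž and dz whole."""
    segments, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            segments.append(word[i:i + 2])
            i += 2
        else:
            segments.append(word[i])
            i += 1
    return segments


def assimilate(word: str) -> str:
    """Give each run of obstruents the voicing of its last member."""
    segs = segment(word.lower())
    # Walk right to left, so a change can spread back through a whole cluster.
    for i in range(len(segs) - 2, -1, -1):
        cur, nxt = segs[i], segs[i + 1]
        if cur not in OBSTRUENTS or nxt not in OBSTRUENTS:
            continue
        if nxt == "v":
            continue  # v can be devoiced, but it does not voice what precedes it
        if nxt in TO_VOICELESS:  # the next obstruent is voiced
            segs[i] = TO_VOICED.get(cur, cur)
        else:                    # the next obstruent is voiceless
            segs[i] = TO_VOICELESS.get(cur, cur)
    return "".join(segs)
```

For example, assimilate("fotbal") returns "fodbal", the FOD-bal pronunciation above, and assimilate("vodka") returns "votka". Working from right to left is the key design choice: it lets the voicing of the last consonant spread back through a cluster of any length.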
POLISH
“W Szczebrzeszynie chrząszcz brzmi w trzcinie” means “In Szczebrzeszyn a beetle chirps in the reeds”. It is claimed to be the most difficult tongue-twister in the world that does not contain a click. The name of the village, near Zamość in the Lubelskie region, contains thirteen letters and three vowels. So, how do you pronounce that? Exactly the way it’s spelled.
Polish differs from the Czech style in one completely obvious way: Polish does not use háčky. Instead, it uses multi-letter spellings for single sounds, or it relies on the hard and soft distinction. As a result, Polish has fewer diacritics than the Czech style and a whole lot of Z’s. It also differs from Czech and Slovak in that Polish has no vowel length. However, Polish has the acute diacritic on the vowel Ó, mostly in the ending -ów, which looks like a long vowel but is pronounced exactly like U. As we saw above, O is rarely long in Czech and Slovak.
Five Polish vowels, A, E, I, O, and U, are pronounced as in Serbo-Croatian. Unlike in Czech and Slovak, I and Y are distinguishable vowel sounds in addition to their hardening and softening effects on the preceding consonant. Polish Y is a “high central vowel”, a sound intermediate between the “high front” KIT vowel and the “high back” FOOT vowel. It sounds somewhat like either of those or schwa, the “mid central vowel”.
Polish, unlike any other Slavic language, has two nasal vowels, Ą and Ę, written with an ogonek. Ą tends toward O, like the nasal vowel of French “bon”, and Ę is a nasalized version of E, somewhat like the French “in” vowel of “mince” as pronounced by speakers without the Parisian merger of “in” and “un”, such as Canadians. Before B and P, Ą denasalizes to “om”, before C, D, and T to “on”, and before K and G to “ong”; Ę likewise denasalizes to “em”, “en”, and “eng”, with the K and G kept, producing the sounds of “plonk” or “Engelbert”. So, the place Elbląg in Warmińsko-Mazurskie is pronounced “EL-blonk”, and the Świętokrzyskie region, whose name means “Holy Cross”, is pronounced “shvyen-to-KSHI-skye”. Note that the nasal vowel is kept before other consonants, as in the places Częstochowa (ches-to-KHO-va) and Wodzisław Śląski (vo-DZI-swaf SHLAW-ski) in Upper Silesia. If a nasal vowel is word-final, it is diphthongized, so są “there are” is pronounced like the São in São Paulo or French “sont” said by a Canadian, and final Ę is more like a nasalized “hey” or the nasal Ê in French “même” said by a Canadian.
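The denasalization rule can be sketched the same way. This Python function is a simplified illustration under a few assumptions: it lowercases its input, looks only at the single letter after Ą or Ę, writes the nasal before K and G as a plain N (said as in English “plonk”), and leaves the nasal vowel untouched before any other letter and at the end of a word. The name denasalize is hypothetical.

```python
# Oral vowel left behind when a nasal vowel splits into vowel + consonant.
ORAL = {"ą": "o", "ę": "e"}

# Nasal consonant that appears before each following letter; any other
# letter (s, z, w, ch, ...) leaves the nasal vowel intact.
NASAL_CONSONANT = {
    "b": "m", "p": "m",            # before labials: "om", "em"
    "c": "n", "d": "n", "t": "n",  # before c, d, t: "on", "en"
    "k": "n", "g": "n",            # before k, g: written n, said as in "plonk"
}


def denasalize(word: str) -> str:
    """Rewrite ą/ę before stops and affricates as vowel + nasal consonant."""
    word = word.lower()
    out = []
    for i, letter in enumerate(word):
        following = word[i + 1] if i + 1 < len(word) else ""
        if letter in ORAL and following in NASAL_CONSONANT:
            out.append(ORAL[letter] + NASAL_CONSONANT[following])
        else:
            out.append(letter)
    return "".join(out)
```

So denasalize("Elbląg") gives "elblong", which final devoicing then turns into the EL-blonk above, while denasalize("Częstochowa") is unchanged because the Ę comes before S.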
Polish, like Serbo-Croatian, has retroflexes, but unlike Serbo-Croatian, it has no háčky to spell them with. Instead, they are written with combinations that include Z. Serbo-Croatian Č, Dž, and Š are spelled Cz, Dż (with a dot, not a háček), and Sz. Since Ž already contains a Z, adding another Z would be confusing, so it is spelled either Ż or Rz. The spellings with “rz” usually occur in words that are pronounced with an R in other Slavic languages and with Ř in Czech, like rzeka “river”, which is rijeka in Serbo-Croatian and řeka in Czech. UNLIKE in Czech or Slovak, the retroflexes are HARD, and so if you see an I after them instead of a Y, the word is NOT Polish. They have soft postalveolar counterparts Ć, Dź, Ś, and Ź, written with the acute diacritic. However, these letters are only used before another consonant or at the end of a word. Before a vowel, they are written Ci, Dzi, Si, and Zi, and if the next vowel is an I, no second I is needed, because the I already makes the consonant soft. It follows that the versions with no diacritic, C, Dz, S, and Z, are hard unless softened by an I. These hard versions are pronounced the same way as in Serbo-Croatian. C, Dz, S, and Z can also be followed by J in Polish, and in this case the consonant is HARD, followed by the J sound.
L and N are hard and contrast with soft postalveolars identical to Serbo-Croatian Lj and Nj. L can only be softened by an I, while soft N is written Ń before another consonant or at the end of a word. Ń plus S after a vowel can shift the N to J (like English Y) and nasalize the vowel, as in the Pomeranian capital city Gdańsk.
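Spelling a soft C, Dz, S, Z, or N is a small decision procedure, sketched here in Python. The function spell_soft is hypothetical: you give it the plain consonant and the letter that follows it (an empty string at the end of a word), and it returns the Polish spelling according to the rules above. Soft L is left out, since it is only ever written with a following I.

```python
# Acute-accented spellings, used before a consonant or at the end of a word.
ACUTE = {"c": "ć", "dz": "dź", "s": "ś", "z": "ź", "n": "ń"}
# Vowels other than i that can follow a soft consonant.
OTHER_VOWELS = set("aąeęoóu")


def spell_soft(consonant: str, following: str = "") -> str:
    """Spell a soft c, dz, s, z or n, given the letter that follows it."""
    if consonant not in ACUTE:
        raise ValueError(f"no soft spelling rule for {consonant!r}")
    if following == "i":
        # The i both softens the consonant and is the vowel itself.
        return consonant + "i"
    if following in OTHER_VOWELS:
        # Another vowel: an i is written only to mark the softness.
        return consonant + "i" + following
    # A consonant follows, or nothing does: use the acute accent.
    return ACUTE[consonant] + following
```

For instance, spell_soft("s", "ę") gives "się", spell_soft("z", "d") gives the "źd" that begins źdźbło, and spell_soft("n", "s") gives the "ńs" of Gdańsk.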
H and Ch are not differentiated as they are in Czech. Both are unvoiced. The velars G, K, Ch, and H can be either hard or soft.
Among the national Slavic languages, only Polish uses the letter W. It writes the “V” sound of English, which is “V” in other Slavic languages. The W sound of English, however, DOES exist in Polish and is written with the letter Łł, which no other national language uses, or, in a few foreign words like “auto” and “euro”, with U. So, Wrocław, the capital of Lower Silesia, is pronounced VROTS-waf, and the city Łódź, which means “boat”, is pronounced WOOCH (with a soft CH).
Polish has BOTH voicing assimilation as in Czech and Slovak AND final obstruent devoicing as in Bulgarian. So, futbol is “FOOD-ball” and Kijów “Kyiv” is “KI-yoof”.
Polish is notorious for odd consonant clusters. Wikipedia gives bezwzględny “absolute”, źdźbło “blade of grass”, wstrząs “shock”, and krnąbrność “disobedience”. Trz and drz are sequences of t or d plus rz and are not the same as cz and dż.
So, there are a lot of processes here. Let’s look at our tongue-twister, “W Szczebrzeszynie chrząszcz brzmi w trzcinie”.
W here is something we haven’t discussed: Polish has the one-letter prepositions W “in” and Z “with”, which are pronounced together with the next word. “In” takes a noun in the locative case, which here adds the ending “-ie” to Szczebrzeszyn.
So, in our first word, W attaches to Szcz. Szcz is the doubled “shtch” of the retroflex cz consonant and is hard, and because it is unvoiced, it devoices the W to F. So, F-sz-cz-e-b is the first syllable. Then rze is retroflex, with rz being a retroflex Z. Remember I pointed out that this retroflex Z appears in Chinese, where it is spelled R. Rz is not rolled and resembles English R, except that it is raspy. -rz-e- is the second syllable. The third syllable is the second from the end, which is the one usually stressed in Polish. So the stressed syllable is SZY, with the retroflex sz requiring the vowel Y and not I. Finally, -n-ie has the I, and so the N is soft, like Serbo-Croatian Nj and Italian gn. So, fshcheb-zhe-SHY-nye.
Then Ch is the “velar fricative” like German ch, rz is retroflex (and unvoiced, because it follows the unvoiced ch), and ą is a nasal. Since the double retroflex szcz starts with S and not C, the ą does NOT denasalize to “on”. So, this nine-letter word is the monosyllable “KHSHAWSHCH”. Except with retroflexes.
Then b-rz-m-i is straightforward. Finally, “w trzcinie” has devoicing of W because of the T in trz, followed by SOFT c, written ci, and SOFT n, written ni, both because of the I.
And so:
W Szczebrzeszynie chrząszcz brzmi w trzcinie
I Szczebrzeszyn z tego słynie.
Wół go pyta: „Panie chrząszczu,
Po cóż pan tak brzęczy w gąszczu?”
Jan Brzechwa (1898-1966)
EAST SLAVIC
This sign, at the border between Russia and Georgia, warns people not to cross the border. The second language on the sign is the Iranian language Ossetian, used in the republics of North Ossetia in Russia and South Ossetia in Georgia. After this sign appeared in 2013, villagers in Georgia alleged that Russian border guards had at the same time put barbed wire around about 25 hectares of Georgian land, barring its owners from access. Ossetian is believed to be the last survivor of a whole group of Iranian languages of Europe, whose speakers in the ancient past were known as “Scythians”, “Sarmatians”, or “Alans”.
The East Slavic languages, Russian, Ukrainian, and Belarusian, have conservative Cyrillic alphabets similar to Bulgarian. As mentioned before, all of the letters in Bulgarian Cyrillic are found in Russian.
So, we remember the five vowels of Serbo-Croatian: A, E, I, O, and U. These five vowels take the “default” Cyrillic vowel letters, А, Е, И, О, and У. But unlike in Bulgarian, there is a sixth vowel Y, and a hard/soft distinction, like in West Slavic. East Slavic languages deal with these issues with a double set of vowel letters, hard and soft.
Ukrainian differs from the other two languages in that the default vowel letters are all hard. So, in romanization of Ukrainian, the default letter И follows West Slavic and is transliterated as Y, instead of the I used in all other romanizations of Cyrillic. Since И is not available for the soft version, І is used instead. Ukrainian pronounces these two letters differently: И has roughly the KIT vowel and І the FLEECE vowel. The soft versions of the vowels, following Polish, are romanized as IA, IE, I, IO, and IU, and these are written as Я, Є, І, ьо, and Ю.
As in Bulgarian, since there is no “Cyrillic J”, the J sound before a vowel at the beginning of a word is written with the soft vowel letter, except that “yo” is written Йо, because the soft sign ь cannot begin a word.
Sometimes the “Cyrillic J” sound is written with the soft yer ь. This happens when the J sound comes between a consonant that has a hard/soft distinction and a vowel. For instance, near Donéts’k there is a small town called Нью-Йорк Nyu-York.
The remaining instances of “Cyrillic J” are then written with Й, romanized as Y, except that Ukrainian can also have the J sound before I or Y. This is written with Cyrillic Ї or Йи, which are NOT found in Russian or Belarusian. Unlike in West or South Slavic, there are diphthongized long versions of the I sound, which are the hard ий and the soft ій. These two are found very commonly as the masculine nominative singular adjective endings, as in the place Кривий Ріг Kryvýi Rih, which means “curved horn”. Note that while ій is romanized as iy, ий is romanized as yi and not as yy.
All East Slavic languages have a strong distinction between stressed and unstressed syllables, and the stress is unpredictable and not usually indicated in writing. It is often indicated in Wikipedia articles for places. The Ukrainian article is listed in the language bar as Українська Ukrayíns’ka.
In Ukrainian unstressed syllables, there are some vowel reductions. Unstressed Е and И tend to merge, both sounding like the FACE vowel, and unstressed O tends toward the American version of the GOAT vowel, except not diphthongized, so that they match Spanish E and O.
Ukrainian is non-retroflex, and so the consonants Ч, Ш, and Ж, romanized as Ch, Sh, and Zh, are the English sounds, with Zh being the S in “measure”. The hard versus soft distinction is then confined to the consonants T, D, Ts, Dz, N, S, Z, L, and R. These are Cyrillic Т, Д, Ц, Дз, Н, С, З, Л, and Р. Sometimes a soft consonant comes before another consonant or at the end of a word, so there is no following vowel letter to show that it is soft. In this situation, a silent soft yer, called a “soft sign”, is used, and it is romanized with an apostrophe. One very common instance of this in place names is the ending -s’k, which is written as -ськ.
P, B, M, F, and V before a soft vowel may be followed by an apostrophe, as in the place Кам'янець-Подільський Kamyanéts’-Podíl’s’kyi in Khmel’nyts’kyi province near Moldova. This means that there is a syllable break and a full “y” sound between the consonant and the vowel. Note that here a Y is used in the romanization and not an I, and that using an apostrophe in romanization is deprecated.
Ukrainian Г is the Czech and Slovak H sound and is romanized as H, unlike in Russian and Belarusian. The English G sound, as in “goal”, is mostly found in Ukrainian in foreign words and since the end of the Soviet Union, it is legal again to write it as Ґ.
Ukrainian and Russian Щ, romanized as shch, is a double of Ш sh. In Ukrainian it is pronounced as a sequence of English Sh plus Ch. Doubling consonants in Ukrainian and Belarusian makes them held longer, as in Italian or romanizations of Japanese. The other letters that can be doubled are Т, Д, Ц, Н, З, С, Л, Ч, and Ж. This is not as diagnostic as one might hope, because Russian has double Т, Д, Н, С, and Л, which does not affect pronunciation. The Ukrainian double letters are often found at the end of neuter nouns before the nominative singular ending -я (-ia), as in the places Запоріжжя Zaporízhzhia, Заболоття Zabolóttia in Volhynia province, and Добропілля Dobropíllia in Donets’k province.
Unlike Russian and Belarusian, Ukrainian does NOT have final obstruent devoicing. So, Kyiv is pronounced KI-yeev, sometimes shortened to KEEV. It does NOT end with an F sound. Ukrainian has voicing assimilation in consonant clusters only when the last consonant is voiced. So, футбол futból is pronounced food-BAWL.
Next we come to Russian and Belarusian. In these languages, unlike in Ukrainian, the default vowels I and E are soft.
In Belarusian and Russian the hard Y is written with the “yery” ы. Note that ы is a single letter and does not contain a soft yer ь. Yery is never at the beginning of a word except in some foreign words such as the place Ытык-Кюёль Ytýk-Kyuyól’ (Yakut Ytyk-Küöl) in the Republic of Sakha. I and Y are pronounced differently and have the sounds they have in Polish.
The hard E in Russian and Belarusian is written with Э, which differs from the Ukrainian Є in being hard and not soft. In Russian this letter is found only at the beginning of a word, except in some foreign words such as the place Улан-Удэ Ulán-Udé, capital of the Republic of Buryatia. Romanizations of Russian sometimes leave the “Y” off soft E, but this is discouraged where the E is pronounced with the “Cyrillic J” sound, as at the start of a word, because it causes ambiguity: compare the place Екатеринбург Yekaterinbúrg with the place Элиста Elistá, capital of the Republic of Kalmykia.
In Russian and Belarusian, the soft A, O, and U are written with Я, Ё, and Ю. In Russian the dots are sometimes left off the Ё, making it indistinguishable from Е, especially in older documents. Unlike in Ukrainian, these are romanized as ya, yo, and yu. Sometimes Ё is romanized as E, as in the name Горбачёв Gorbachyóv, usually written Gorbachev in English. In Russian, the hard O, but NOT the soft O, is pronounced like the American English GOAT vowel except not diphthongized, like the O in Spanish.
Both Russian and Belarusian have vowel reductions in unstressed syllables. Unstressed syllables contain only hard A, U, and Y, written with А, У, and Ы, but in Russian the hard A may also be written with O in an unstressed syllable. In Russian but not in Belarusian, the soft vowels reduce to only two vowels, A and I, with soft A written with Я and soft I written with Е, И, or Ю. The soft O is therefore always stressed and written with Ё (which can have the dots left off in older documents), and in Belarusian, O is always stressed. This means there can be only one Ё per word in both Russian and Belarusian and only one O per word in Belarusian.
In Russian, but NOT in Belarusian and Ukrainian, the “hard yer” before a vowel indicates that the vowel has the “Cyrillic J” sound before it and that the preceding consonant is HARD, as in the place имени XXII Партсъезда Ímyeni Dvadtsát’vtoróvo Parts’’yézda, meaning “named after the 22nd Party Congress”. It is transliterated, clumsily, with TWO apostrophes. Please, for the love of your computer’s text processing software, do not use a quote. In this situation Belarusian, like Ukrainian, uses an apostrophe. So, hard yers are found only in Russian, and in Bulgarian as the original Church Slavonic vowel.
Belarusian contains the W sound, as in Polish. It only appears after vowels except in foreign words and is written Ў, romanized as U.
Both Russian and Belarusian contain retroflexes. Belarusian contains all the Serbo-Croatian retroflexes, as Polish does, with all of them being hard and written Ч, Дж, Ш (doubled as Шч, like Polish szcz), and Ж. These are romanized as Ch, Dzh, Sh (and Shch), and Zh. In Russian, however, Ч and Щ are postalveolar and therefore always soft, like Polish Ć and Ść. In addition, Russian Ц and Belarusian Т, Д, and Р are always hard, while Russian Г, К, and Х have no hard/soft contrast of their own and are pronounced soft before И and Е. All of these letters, that is, Ч, Ш, Щ, Ж, Russian Ц, Г, К, and Х, and Belarusian Т, Д, and Р, are followed by the default vowels А, Е, И, О, and У, except that in Russian a stressed O after Ж, Ч, Ш, and Щ is usually written Ё. However, some foreign words, especially from French, German, and Turkic languages, violate this rule, such as Парашют parashút “parachute”. Russian also writes a silent soft sign after Ш, Ж, Ч, and Щ at the end of some words, such as certain feminine nouns, without changing the pronunciation, as in the place Россошь Róssosh in Voronezh province.
Recordings of all the hard and soft consonants of Russian can be found here.
Belarusian uses the pronunciation of Г from Ukrainian and like in Ukrainian, this is romanized as H.
Russian and Belarusian have voicing assimilation and final obstruent devoicing, as in Polish. This is why names whose romanization ends in V, such as Gorbachev, are pronounced with a final F sound.
And finally, Russian is used extensively in Belarus and Ukraine. In Belarus, a majority of the population reports in the census that Russian is the language they usually speak at home.
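For GeoGuessr purposes, the letter differences in this section amount to a process of elimination, which the Python sketch below encodes. The table summarizes letters discussed above, plus one pair not spelled out in the text: Belarusian writes І where Russian writes И and has no И at all, while Ukrainian uses both. Treat the function, whose name is made up, as a rough filter rather than a language detector; it says nothing when a text has no telltale letters, and it identifies the language of a sign, not the country, since Russian is common in Belarus and Ukraine.

```python
ALL_LANGS = frozenset({"ru", "uk", "be"})  # Russian, Ukrainian, Belarusian

# Letters that only some East Slavic languages use; any other letter is
# shared by all three and tells us nothing.
LETTER_LANGS = {
    "ї": {"uk"}, "є": {"uk"}, "ґ": {"uk"},
    "ў": {"be"},
    "ъ": {"ru"},                          # Ukrainian and Belarusian use an apostrophe
    "ы": {"ru", "be"}, "э": {"ru", "be"}, "ё": {"ru", "be"},
    "щ": {"ru", "uk"},                    # Belarusian writes шч instead
    "и": {"ru", "uk"},                    # addition: Belarusian has no и
    "і": {"uk", "be"},                    # addition: Russian has no і
}


def candidate_languages(text: str) -> set[str]:
    """Return the East Slavic languages consistent with every diagnostic letter."""
    langs = set(ALL_LANGS)
    for letter in text.lower():
        langs &= LETTER_LANGS.get(letter, ALL_LANGS)
    return langs
```

For example, candidate_languages("Кривий Ріг") narrows the choice to Ukrainian, while candidate_languages("Улан-Удэ") leaves both Russian and Belarusian. An empty result means the text mixes letters that no single language uses together, as a bilingual sign read as one string might.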
NON-SLAVIC CYRILLIC
The Soviet Union developed many Cyrillic writing systems for non-Slavic languages spoken in the USSR and for Mongolian. Here, the banner at the 2019 summit of the CSTO is bilingual, with Russian on the left and Kyrgyz on the right.
Cyrillic, like the Roman alphabet, is used in a variety of language families.
In the Indo-European phylum, Romanian historically used Cyrillic as well as Roman and the Soviet Union developed Cyrillic versions of the Iranian languages Ossetian and Persian. Traditional Cyrillic Romanian died out by 1900 but was then revived by the USSR with a different spelling system as “Moldovan”. “Moldovan” now is only found in the Russian-aligned breakaway region of Transnistria. Ossetian is used in the republic of North Ossetia (capital Vladikavkáz, Ossetian Dzæwdžyqæw) in Russia and in the breakaway Russian-aligned South Ossetia (capital Tskhinvali) in Georgia. Cyrillic Persian, known as “Tajik”, is used in Tajikistan.
In the Uralic phylum, Cyrillic is used in the following, all in Russia: Erzya and Moksha languages in Mordovia (capital Saránsk, locally Saranosh), Mari in Mari El (capital Yoshkár-Olá), Udmurt in Udmurtia (capital Izhévsk, in Udmurt Izh), Komi in the Komi Republic (capital Syktyvkár), Permyak in the northern part of Perm’ Krai around Kudymkár (Permyak Kudymkör), Khanty and Mansi in the Khanty-Mansi Autonomous Okrug, also known as Yugra (capital Khánty-Mansíysk with largest city Surgút), and Nenets in the Yamalo-Nenets Autonomous Okrug (capital Salekhárd, Nenets Salya’harad), the Nenets Autonomous Okrug (capital Naryán-Mar, Nenets Nyar’yana-Marq), and northern Krasnoyarsk Krai around Noríl’sk. The USSR had a Cyrillic alphabet for Karelian, a language closely related to Finnish, but this is no longer used. As a result, Karelia (capital Petrozavódsk, Finnish Petroskoi) is the only republic of Russia with a secondary language written in the Roman alphabet. The Kildin Saami language in Murmansk province, on the other hand, is written in Cyrillic.
Cyrillic is used for the major Turkic languages Kazakh and Kyrgyz, and in Soviet times in the languages Azeri, Uzbek, and Turkmen, which now use the Roman alphabet. In addition to these languages of independent countries, Cyrillic is used in Turkic languages of Russia: Bashkir in Bashkortostan (capital Ufá, Bashkir Öfö) and Chelyabinsk province, Tatar in Tatarstan (capital Kazán) and in scattered areas from Chelyabinsk to Novosibirsk, Karachay-Balkar in Karachay-Circassia (capital Cherkéssk) and Kabardino-Balkaria (capital Nál’chik, in Balkar Naltsık), Kumyk in Dagestan (capital Makhachkalá), Altay in the Altay Republic (capital Górno-Altáysk, in Altay Ulalu), Yakut in Sakha (capital Yakútsk, in Yakut Djokuuskay), Dolgan in the Taymyr Dolgano-Nenetskiy Autonomous Okrug, Tuvan in Tuva (capital Kyzýl), Khakass in Khakassia (capital Abakán), and the Chuvash language in Chuvashia (capital Cheboksáry). Cyrillic also has been used to write Uyghur in Kazakhstan.
Cyrillic was historically used for the Crimean Tatar language in Ukraine and the Qaraqalpaq language in the republic of Qoraqalpog'iston in Uzbekistan, but in both it has been replaced by the Roman alphabet.
Cyrillic is also used in Mongolian in both Mongolia and Russia, in the Khalkh and Buryat dialects. Khalkh is the language “Mongolian” of Mongolia, and Buryat is used in Buryatia (capital Ulán-Udé), Irkutsk province around Ust’-Ordýnskiy, and Zabaykal’skiy Krai (capital Chitá) around Agínskoye. Kalmyk of Kalmykia (capital Elistá) is also a form of Mongolian, carried far to the west when the ancestors of the Kalmyks migrated to the Volga in the seventeenth century.
Cyrillic is also used in the Tungusic language Evenk in two patches separated by the Yakuts: northern Krasnoyarsk Krai around Turá and northeastern Sakha, and southern Sakha and neighboring parts of Zabaykal’skiy Krai, Buryatia, and Khabarovsk Krai. It is also used in the “Paleosiberian” Chukotko-Kamchatkan languages Koryak of Kamchatka and Chukchi of the Chukotka Autonomous Okrug (capital Anádyr’, in Chukchi Kagyrgyn), and in Siberian Yupik, a relative of the Inuit languages that is also spoken in Chukotka and is the only language indigenous to both Asia and North America.
There is a Cyrillic form of Chinese, used to write the Dungan Mandarin dialect of Kazakhstan and Kyrgyzstan. The Dungan (东干族 Dōnggānzú) are descendants of Hui people (回族 Huízú) who fled into territory held by the Russian Empire during the reconquest of Xinjiang by 左宗棠 Zuǒ Zōngtáng (known commonly in English as “General Tso”) in the 1870s.
And Cyrillic is used in two phyla of Caucasian languages. In Northwest Caucasian it is used in the Circassian language in three republics: Adygeya (capital Maykóp, in Circassian Mıéquapə), Kabardino-Balkaria (whose capital Nál’chik in Circassian is Nalś’əč), and Karachay-Circassia (whose capital Cherkéssk in Circassian is Şărdjăsqală), and in Krasnodar Krai, and in the Abkhaz language in the breakaway Russian-aligned Abkhazia in Georgia, whose capital is called Sukhumi in Georgian and Aqwa in Abkhaz. In Northeast Caucasian Cyrillic is used in Ingush in Ingushetia (capital Magás near the former capital Nazrán’ which is Näsare in Ingush), in Chechen in Chechnya (capital Gróznyy, in Chechen Sölƶa-Ġala), and in multiple languages in Dagestan with Avar, Dargin, Lezghin, Lak, and Tabasaran being the most prominent.
So, when you encounter some unusual-looking Cyrillic, what is it?
First we will look at Turkic languages. Most Turkic languages have what is called vowel harmony. The vowels are divided into two sets, usually called “back” and “front” vowels, and a word tends to contain only front vowels or only back vowels. In Turkish, the “front” vowels are E, İ, which is marked with a dot, and the vowels with umlauts, Ö and Ü. The corresponding “back” vowels are A, I WITHOUT a dot, O, and U. The rule about front and back vowels applies to any suffixes added to a word, so that the plural suffix can be either -lar or -ler, and the possessive suffix (which is added to the possessed noun, not the possessor) is -ı or -i, or -u or -ü after a rounded vowel, with an S in front (-sı, -si, -su, -sü) after a vowel.
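Two-way harmony of the kind used by the plural suffix fits in a few lines of code. This Python sketch, with a hypothetical function name, picks -lar or -ler from the last vowel of a lowercase Turkish word. It handles only the plural, not the possessive, and ignores the handful of loanwords that break the rule.

```python
FRONT_VOWELS = set("eiöü")  # the i here is the dotted i
BACK_VOWELS = set("aıou")   # the ı here is the dotless i


def plural(word: str) -> str:
    """Add -ler if the last vowel is front, -lar if it is back."""
    for letter in reversed(word):
        if letter in FRONT_VOWELS:
            return word + "ler"
        if letter in BACK_VOWELS:
            return word + "lar"
    raise ValueError(f"no vowel in {word!r}")
```

With it, plural("ev") gives "evler" and plural("kitap") gives "kitaplar".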
So, Cyrillic needs to have all of these vowels. The default vowels А, Е, И, О, and У are used for A, E, İ (WITH a dot), O, and U. The I without a dot, which is pronounced like Japanese う u or Korean 으 eu, resembles the Russian hard Y vowel, so it is written with Ы. Then the umlauted vowels Ö and Ü are written Ө and Ү.
These eight vowels are the vowels of Kyrgyz. Unlike Turkish and other Turkic languages, Kyrgyz has short and long vowels, and the long vowels are written doubled. Kyrgyz also has the consonant Ң, romanized “ng”. That’s it: Kyrgyz uses the Russian alphabet with only three letters added. In romanization, К is romanized “k” before front vowels and “q” before back vowels, Г is romanized “g” before front vowels and “ğ” before back vowels, Й is romanized “y”, Ж is romanized “j” instead of “zh”, and Х is romanized “x” instead of “kh”. Q and ğ are uvular consonants, with q pronounced as it is in romanizations of Arabic and ğ pronounced like the French R. All of these romanizations follow the rules of the Azeri language. So, Кыргыз is romanized “Qırğız”. Note that the name Кыргыз violates the hard-soft conventions of Russian, with a non-default vowel appearing after the neutral consonants К and Г. Kyrgyz retains Russian words without respelling them, so it uses all the letters of Russian, including ones not needed for Kyrgyz words. Native Kyrgyz words do not contain Ё, Ф, В, Ц, or the yers Ь and Ъ.
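The Kyrgyz romanization rules above can be sketched as a small transliterator. This Python version is illustrative only, and its function names are made up. It follows the letter values given above and decides between k and q, or g and ğ, from the nearest vowel after the letter, falling back to the nearest vowel before it at the end of a word; that tie-breaking rule is an assumption. The values ts, ch, and sh for Ц, Ч, and Ш are also assumptions, borrowed from Russian romanization, and letters it does not know, such as Ё, Ю, Я, and the yers, pass through unchanged.

```python
# Letter values from the text; ts, ch and sh are assumed Russian-style
# defaults for letters the text does not spell out.
BASIC = {
    "а": "a", "б": "b", "в": "v", "д": "d", "е": "e", "ж": "j",
    "з": "z", "и": "i", "й": "y", "л": "l", "м": "m", "н": "n",
    "ң": "ng", "о": "o", "ө": "ö", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ү": "ü", "ф": "f", "х": "x", "ц": "ts",
    "ч": "ch", "ш": "sh", "ы": "ı",
}
FRONT = set("еиөү")
BACK = set("аоуы")
# К and Г: (value before a front vowel, value before a back vowel).
VELARS = {"к": ("k", "q"), "г": ("g", "ğ")}


def nearest_vowel_is_front(word: str, i: int) -> bool:
    """Look right of position i for a vowel, then left; stop at non-letters."""
    for step in (1, -1):
        j = i + step
        while 0 <= j < len(word) and word[j].isalpha():
            if word[j] in FRONT:
                return True
            if word[j] in BACK:
                return False
            j += step
    return True  # no vowel at all: arbitrary default (an assumption)


def romanize_kyrgyz(text: str) -> str:
    # Lowercasing keeps the length unchanged for Cyrillic, so indices line up.
    lower = text.lower()
    out = []
    for i, letter in enumerate(lower):
        if letter in VELARS:
            front, back = VELARS[letter]
            latin = front if nearest_vowel_is_front(lower, i) else back
        else:
            latin = BASIC.get(letter, letter)  # unknown letters pass through
        if text[i].isupper():
            latin = latin[:1].upper() + latin[1:]
        out.append(latin)
    return "".join(out)
```

Running romanize_kyrgyz("Кыргыз") reproduces “Qırğız”, and the hyphen in Ысык-Көл keeps the two halves apart, giving “Isıq-Köl”.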
Kazakh has a more complicated vowel system than Kyrgyz. It has Cyrillic І, like Ukrainian, Э, Ә, and Ұ as well as the eight Kyrgyz vowels. These are romanized as ı, e, ä, and ū. The Cyrillic Ы is then romanized as y. After another vowel, У is romanized as w. Kazakh also has Қ and Ғ for q and ğ and romanizes the letters Ш and Ч with the Turkish ş and ç.
In Russia the most commonly encountered Turkic languages are Tatar in Tatarstan and Bashkir in Bashkortostan. Bashkir has the eight Kyrgyz vowels plus Э and Ә, has Ҡ (instead of Қ) and Ғ, and adds Ҙ, romanized as dh, which is the “th” in “either”, Ҫ, romanized as th, which is the “th” in “ether”, and Һ, romanized as h, which is the English H. Tatar has the eight Kyrgyz vowels plus Ә and the Russian-looking diphthong ый, romanized as ıy. Tatar consonants are the same as Kazakh except that Қ and Ғ are written using the hard yer as in Caucasian languages as Къ and Гъ and the soft yer ь appears as the “voiceless glottal stop”, found in English as the “dropped” T in words like “kitten”, or for some British speakers, “bottle”.
Tajik, a Cyrillicized form of Persian, has the five default vowel letters А, Е, И, О, and У and has long vowels Ӣ and Ӯ. It has Қ and Ғ for q and “gh”, Ҷ for j, here having the English pronunciation of J, and the hard yer ъ for the glottal stop.
Mongolian has six of the eight Kyrgyz vowels: А, И, О, Ө, У, and Ү. For the plain E sound it uses Э rather than Е, and Ы appears mainly in a few suffixes. It has vowel harmony like a Turkic language and has vowel length written with double letters like Kyrgyz. Unlike in Turkic languages, the letters Е, Ё, Ю, and Я are used as in Russian. And as mentioned before, the soft yer is used in its original Church Slavonic sense as a short “I”.
In the Caucasus, the “mountain of tongues”, the languages are quite bizarre looking in Cyrillic, like the Ossetian on the bilingual sign at the beginning of the East Slavic section. Ossetian is the only Cyrillic language that uses Ӕ. It has Хъ and Гъ for the q and gh sounds of Persian and uses the hard yer ъ after the consonants П, Т, К, Ц, and Ч to indicate the Caucasian “ejective” consonants. Occasionally English speakers use the ejective K at the end of words.
Finally, Northwest and Northeast Caucasian languages have some of the most complex consonant systems in the world, with the extinct Northwest Caucasian language Ubykh having the most consonants of any language which does not have clicks. They can be distinguished from Ossetian by the ejective consonants being written with a pálochka, which is Russian for “stick”. Pálochka looks exactly like the Ukrainian capital I, but it is normally written after a consonant letter, usually to mark it as ejective, rather than at the beginning of a word.
Tevfik Esenç (1904-1992), the last surviving native speaker of the Ubykh language.
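The distinctive letters of this section can also be collected into a lookup table for reading unfamiliar Cyrillic in the field. The Python sketch below, with a made-up function name, lists only letters that this section ties to particular languages. It is a hint table, not a classifier: other languages of the former USSR share some of these letters, and the pálochka is often typed as a Latin I or the digit 1, which it will not catch.

```python
# Distinctive letters and the languages this section associates them with.
CLUES = {
    "ӕ": "Ossetian",
    "ӏ": "Northwest or Northeast Caucasian",  # pálochka, lowercase form
    "Ӏ": "Northwest or Northeast Caucasian",  # pálochka, capital form
    "ҙ": "Bashkir", "ҫ": "Bashkir", "ҡ": "Bashkir",
    "ӣ": "Tajik", "ӯ": "Tajik", "ҷ": "Tajik",
    "ұ": "Kazakh",
    "ә": "Kazakh, Bashkir, or Tatar",
}


def clues(text: str) -> set[str]:
    """Collect the language hints triggered by distinctive letters in text."""
    found = set()
    for letter in text:
        hint = CLUES.get(letter) or CLUES.get(letter.lower())
        if hint:
            found.add(hint)
    return found
```

So clues("Башҡортостан") points to Bashkir and clues("Тоҷикистон") to Tajik, while ordinary Russian text returns an empty set.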