Spelling conventions in late-period Czech, or, Making Google Translate less cranky

So I'm digging through some Latin/Czech dictionaries from 1579 and 1605, right, and I'm reminded yet again that, like most other languages, Czech in period was (a) not fully consistent in its spelling and (b) generally spelled somewhat differently from modern Czech, including with letters that barely appear, or don't appear at all, in modern Czech. This can present difficulties in translation for those of us who aren't fluent, since Google Translate, bless its algorithmic heart, only knows modern Czech.

I've spent enough time digging around in period Czech texts by this point that I make the relevant spelling substitutions fairly fluently, but I realized it might be useful to have them set down somewhere I could point others to, for use in their own research.

I will note that while most of these are actually fairly consistent, others can be either situational (like y, as noted below) or inconsistent across manuscripts or contexts (like v, as noted below). For researchers who have some familiarity with Czech, this doesn't pose a problem, as it's generally clear what the word is, but for those unfamiliar with the language, it can make things more difficult.

Here's a list of the letters or digraphs (two letters for one sound) you'll find in 15th- and 16th-century Czech texts, and the letters that generally represent the same sounds in modern Czech. Period spelling is on the left, modern equivalent is on the right.

g → j*

ij → í

w → v

v → u (but not in every manuscript; sometimes a v really is just a v)

y (standalone) → i

y (preceding an a) → j

cz → č

ie → ě

ſſ or ſs → š

rz → ř

* Except as the initial letter of a name, where it does often show up as J instead of G.
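If you'd rather not make these substitutions by hand, the table above can be applied mechanically before pasting text into Google Translate. Here's a minimal Python sketch of that, under a few assumptions of my own: the function name `modernize` and the `v_is_u` switch are hypothetical, and the v → u rule is off by default because it only holds in some manuscripts. I've also read "y (standalone)" as the one-letter word y, handled capitals only where they plausibly occur (G, W, Cz, Rz and so on), and treated any long s (ſ) left over after the ſſ/ſs rule as a plain s, which the table itself doesn't cover.

```python
import re

# Period spelling -> modern Czech, following the table above.
# Capital variants are listed only where they plausibly occur;
# long s (ſ) has no capital form, so its rules are lowercase only.
RULES = {
    "ſſ": "š", "ſs": "š",
    "cz": "č", "Cz": "Č", "CZ": "Č",
    "rz": "ř", "Rz": "Ř", "RZ": "Ř",
    "ie": "ě",
    "ij": "í",
    "g": "j", "G": "J",
    "w": "v", "W": "V",
}

# Manuscript-dependent rule (v written for u); off unless requested.
V_AS_U = {"v": "u", "V": "U"}


def modernize(text: str, v_is_u: bool = False) -> str:
    """Rough sketch: respell period Czech so machine translation can read it."""
    rules = {**RULES, **V_AS_U} if v_is_u else RULES
    # Longest keys first so digraphs are tried before single letters.
    # One pass means replacement output is never re-matched, so a w that
    # became v is not turned into u by the optional v -> u rule.
    pattern = re.compile(
        "|".join(sorted(map(re.escape, rules), key=len, reverse=True))
    )
    text = pattern.sub(lambda m: rules[m.group()], text)

    # y before a -> j (e.g. "yako" -> "jako").
    text = re.sub(r"y(?=a)", "j", text)
    text = re.sub(r"Y(?=a)", "J", text)
    # Assumption: "standalone" y means the one-letter word y -> i.
    text = re.sub(r"\by\b", "i", text)
    text = re.sub(r"\bY\b", "I", text)

    # Assumption (not in the table): leftover long s is a plain s.
    return text.replace("ſ", "s")
```

With illustrative inputs, `modernize("wijra")` gives `víra`, `modernize("yako rzeka")` gives `jako řeka`, and `modernize("vmrzel", v_is_u=True)` gives `umřel`. The rules run in a single regex pass on purpose: if you chained `str.replace` calls, w → v followed by v → u would turn every former w into u.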

In later-period texts, those last four start showing up as single letters with diacritics, much as they appear today, rather than as digraphs, in accordance with the reforms proposed in Jan Hus's De orthographia bohemica. However, while modern Czech uses a different diacritic to mark long vowels than it does to mark palatalization, period Czech appears to have used the same mark for both. Most of the time this is fine, especially for a Czech speaker, because there is only one letter that can be either short (e), long (é), or palatalized (ě), and someone familiar with the language will have a clear sense of whether a mark over any given e indicates length or palatalization. With any other letter, there's no ambiguity; a mark over it can only mean one thing.
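Since e is the only letter where the shared mark is ambiguous, one way to help a non-speaker is to generate both readings and check them against a modern word list. Here's a minimal Python sketch of that idea, assuming the shared mark over e has been transcribed as é (normalized to precomposed form); the names `readings` and `resolve`, and the tiny lexicon in the example, are hypothetical, and in practice you'd load a real modern Czech word list.

```python
import unicodedata
from itertools import product

# Assumption: the shared period mark over e was transcribed as é.
MARKED_E = "é"


def readings(word: str) -> list[str]:
    """All modern spellings of word, reading each marked e as é or ě."""
    # NFC so that e + combining acute is treated the same as precomposed é.
    parts = unicodedata.normalize("NFC", word).split(MARKED_E)
    results = []
    for choice in product("éě", repeat=len(parts) - 1):
        out = parts[0]
        for letter, rest in zip(choice, parts[1:]):
            out += letter + rest
        results.append(out)
    return results


def resolve(word: str, lexicon: set[str]) -> str:
    """First reading found in a modern word list, else the word unchanged."""
    for candidate in readings(word):
        if candidate in lexicon:
            return candidate
    return word
```

For instance, `resolve("mésto", {"město"})` returns `město`. If both readings of a word happen to be real words, this simply takes the first match, so a human check is still wise there.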

What I find particularly interesting about these changes is that almost none of these letters or letter combinations persist in modern Czech (though you do still see them in modern Polish); none of the digraphs above are still used, and g and w don't exist at all outside of foreign loanwords. Because the sounds g and w stand in for are extremely common in Czech, both then and now, those two letters turn up constantly in period texts, so you can imagine how weird those words look to modern speakers.

But apart from the spelling, the language is otherwise basically still the same! Doing simple letter substitutions as above before running your text through machine translation is pretty much all that's necessary, and I love that this means that words written by people five hundred years ago are still completely and perfectly comprehensible today.

Or and Vert
