A dictionary describes languages, and in this article we describe OmegaWiki's notion of "language". OmegaWiki bases its notion on the ISO-639-3 standard, with some variation. There is a language table in the database.
OmegaWiki's notion of language
OmegaWiki's list of language codes is generally drawn from the ISO 639-3 standard, which has some 7602 entries for individual languages and macrolanguages. This list is different from the project codes used by MediaWiki. The ISO 639-3 standard is administered by SIL International, and is based on SIL's Ethnologue and the Linguist List.
When no ISO 639-3 code is available for a language that we want to use, we take a code from the Linguist List or create our own code.
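The fallback order described above can be sketched as a simple lookup. This is only an illustration, not OmegaWiki's actual database logic: the function name `resolve_language_code`, the table layout, and the placeholder Linguist List entry are all assumptions made for the example.

```python
from typing import Mapping, Tuple


def resolve_language_code(
    name: str,
    iso_639_3: Mapping[str, str],
    linguist_list: Mapping[str, str],
    custom: Mapping[str, str],
) -> Tuple[str, str]:
    """Return (code, source) for a language name, trying each table in order."""
    for source, table in (
        ("ISO 639-3", iso_639_3),
        ("Linguist List", linguist_list),
        ("custom", custom),
    ):
        if name in table:
            return table[name], source
    raise KeyError(f"no code known for {name!r}")


# Tiny illustrative tables. "eng", "arb" and "srp-Latn" appear in this article;
# the Linguist List entry is a made-up placeholder, not a real code.
iso = {"English": "eng", "Arabic": "arb"}
linguist = {"Placeholder language": "x-placeholder"}
custom = {"Serbian (Latin script)": "srp-Latn"}

print(resolve_language_code("English", iso, linguist, custom))
print(resolve_language_code("Serbian (Latin script)", iso, linguist, custom))
```

Passing the tables as arguments keeps the sketch independent of where the codes are actually stored.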
SIL has this to say about the scope of language identifiers for individual languages:
- "There is no one definition of "language" that is agreed upon by all and appropriate for all purposes. As a result, there can be disagreement, even among speakers or linguistic experts, as to whether two varieties represent dialects of a single language or two distinct languages. For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature, the views of speakers concerning the relationship between language and identity, and other factors."
The list of all languages having an ISO 639-3 code is displayed at Help:Language/Checklist for editable languages, with comments on whether each language is available for editing at OmegaWiki and the reasons why some languages are not considered.
The ISO 639-3 standard also contains codes for macrolanguages, that is:
- "...clusters of closely-related language varieties that, based on the criteria discussed above, can be considered distinct individual languages, yet in certain usage contexts a single language identity for all is needed."
Examples of macrolanguages registered in 639-3 are "zho" (Chinese), which covers 14 Chinese languages, and "ara" (Arabic), which covers 29 Arabic languages.
Generally we avoid using macrolanguages in OmegaWiki. We don't have a language "Chinese", but we have "Mandarin (simplified)", "Mandarin (traditional)", "Minnan (simplified)", "Minnan (traditional)" and "Minnan (POJ)". See below for languages with multiple scripts.
We do have a language "Arabic". It is not the macrolanguage with code "ara", but the individual language with code "arb", corresponding to Standard Arabic.
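As a rough sketch, the preference for individual languages over macrolanguages could be expressed as a check against a set of macrolanguage codes. The set below contains only the two macrolanguage codes named in this article, and the function name `is_individual_language` is hypothetical; the real ISO 639-3 tables list many more macrolanguages.

```python
# Only the two macrolanguage codes mentioned in this article; not a complete list.
MACROLANGUAGES = {"zho", "ara"}


def is_individual_language(code: str) -> bool:
    """True if the code is not one of the known macrolanguage codes."""
    return code not in MACROLANGUAGES


print(is_individual_language("arb"))  # Standard Arabic, an individual language
print(is_individual_language("ara"))  # the Arabic macrolanguage
```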
In general, the distinction between a dialect and a language is fuzzy.
The ISO 639-3 standard explicitly declines to encode dialects, where "...the term dialect is used as in the field of linguistics where it simply identifies any sub-variety of a language such as might be based on geographic region, age, gender, social class, time period, or the like." Thus it has one language code "eng" (English) which covers usage in the UK, USA, and some 104 other countries, including dozens of dialects from Cockney to Black English.
In OmegaWiki, there are two ways to deal with "dialects":
- we define separate languages, such as Moroccan Arabic and Algerian Arabic, while keeping a general language Arabic for those words that are the same across all regions;
- or we use an annotation "area", which specifies in which region a word is used. For example for English, we can currently specify the following areas: USA, UK, Australia, New Zealand, South Africa, New England.
The decision on which system to use is made on a case-by-case basis. The second method is easier to maintain when a lot of areas are involved.
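The "area" annotation approach can be pictured with a small data model. This is a hedged sketch rather than OmegaWiki's schema: the `Expression` class, its field names, the `used_in` helper, and the sample area tags are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Expression:
    spelling: str
    language: str
    area: Optional[str] = None  # None means "not restricted to a region"


def used_in(entries: List[Expression], language: str, area: str) -> List[str]:
    """Spellings of `language` usable in `area`: untagged ones plus those tagged with it."""
    return [
        e.spelling
        for e in entries
        if e.language == language and e.area in (None, area)
    ]


# One language ("English") with optional area tags, as in the second method.
entries = [
    Expression("colour", "English", "UK"),
    Expression("color", "English", "USA"),
    Expression("dictionary", "English"),
]

print(used_in(entries, "English", "UK"))
print(used_in(entries, "English", "USA"))
```

With the first method, the region would instead be part of the `language` value itself (for example "Moroccan Arabic"), and `area` would stay empty.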
Languages with several scripts
There are many languages which are or have been written with several scripts.
Serbian is one such language: it can be written with either the Latin script or the Cyrillic script. In that case, we define separate languages, Serbian (Latin script) and Serbian (Cyrillic script), and we create custom codes for them, "srp-Latn" and "srp-Cyrl" respectively.
Similarly, we have the languages "Hebrew" and "Hebrew (nikkud)".
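The custom Serbian codes suggest a "language-script" pattern. The sketch below assumes that pattern holds (the article only confirms it for the two Serbian codes, and gives no code for "Hebrew (nikkud)"); the function names `split_code` and `group_by_base` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def split_code(code: str) -> Tuple[str, Optional[str]]:
    """Split 'srp-Latn' into ('srp', 'Latn'); a plain code yields (code, None)."""
    base, sep, script = code.partition("-")
    return base, (script if sep else None)


def group_by_base(codes: Iterable[str]) -> Dict[str, List[str]]:
    """Group codes that share the same base language code."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        base, _ = split_code(code)
        groups[base].append(code)
    return dict(groups)


print(split_code("srp-Latn"))
print(group_by_base(["srp-Latn", "srp-Cyrl", "eng"]))
```

Grouping by base code is one way to find all script variants of the same underlying language.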
Some languages, such as American Sign Language ("ase"), are neither spoken nor written. Although OmegaWiki is a dictionary of written expressions, it is a goal to eventually catalog sign languages as well. Thus, some non-written languages are in scope for OmegaWiki.
Sutton SignWriting script provides a way to record sign languages in writing. However, Sutton SignWriting is not yet encoded in Unicode, so it is technically difficult to record it in the database. ISO 639-3 does have language codes for multiple sign languages.
Chemical formulas, Scientific Latin, etc.
For chemical formulas like H2O or Scientific Latin names for animals or plants (e.g. Equus caballus) we use a language called International with the ISO 639-3 code "mul".
Future prospects: ISO 639-6
OmegaWiki intends to transition to ISO 639-6 as its list for linguistic entities. ISO 639-6's goal is to have codes for "comprehensive coverage of language variation". Presumably this would include codes for everything registered in ISO 639-3, and also dialects and scripts which are outside ISO 639-3's scope. The ISO 639-6 codes can be browsed on geolang.com.
List of languages in use on OmegaWiki
- Editable languages is a page with a list of the editable languages (automatically generated).
- Omegawiki.org's statistics shows the number of expressions, definitions etc. in each language.
If you would like to edit in a language that is not yet available in OmegaWiki, you can ask for it in the International Beer Parlour. Editable languages can be added by bureaucrats.
- Language/List - list of languages and link to their portals
- Language/Checklist for editable languages - list of which languages are included / will not be included in OmegaWiki.