CALS vs WALS: Part 2 - Nouns
PART 1: PHONOLOGY
PART 2: MORPHOLOGY, NOMINAL CATEGORIES, NOMINAL SYNTAX
---
So I recently found out about CALS, the conlang world's answer to WALS. And as I noticed that all the categories were the same, and that the numbers of catalogued languages on each side were pretty similar, I started thinking that someone should go through and compare things.
And, by the great tradition of "someone should" => "I should", here we are.
For each of the features in both the WALS and CALS databases, I've converted the counts to percentages, then subtracted the WALS number from the CALS number. Nothing too mathematically profound, but it should still give us some interesting data. In effect, a value of +10% means that 10% of conlangs have that feature and "shouldn't", while -15% means that 15% of conlangs don't have that feature and "should". (Pretty heavy inverted commas there - I'm not trying to be prescriptive - but still a good way of picturing things.)
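Reading the signs the way the post describes them (positive means more common in conlangs than in natlangs), the calculation can be sketched roughly as below. This is only a minimal illustration: the function names and the counts are made up for the example and are not real WALS or CALS figures.

```python
# Hypothetical sketch of the percentage comparison.
# The counts below are invented for illustration, NOT real WALS/CALS data.

def to_percentages(counts):
    """Turn raw language counts per feature value into percentages."""
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}

def cals_minus_wals(cals_counts, wals_counts):
    """Percentage-point difference per feature value (conlangs minus natlangs)."""
    cals_pct = to_percentages(cals_counts)
    wals_pct = to_percentages(wals_counts)
    values = sorted(set(cals_pct) | set(wals_pct))
    return {
        v: round(cals_pct.get(v, 0.0) - wals_pct.get(v, 0.0), 1)
        for v in values
    }

# Made-up counts for a three-valued feature.
wals_example = {"small": 50, "average": 100, "large": 50}
cals_example = {"small": 20, "average": 120, "large": 60}

diffs = cals_minus_wals(cals_example, wals_example)
for value, diff in diffs.items():
    print(f"{value}: {diff:+.1f}")
```

With these invented counts, "average" comes out positive (over-represented in conlangs) and "small" comes out negative (under-represented), which is the reading used in the graphs.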
And because it gives us a *lot* of data, much of it interesting, I've decided to split things up into a few posts so people don't get bogged down in numbers. In general, I've chosen the features with the most extreme positive or negative values, plus a few that I just find interesting. If anyone's curious about any features I've missed off, let me know and I'll throw together another graph.
Just comparing percentages doesn't give you the full picture by any means - an extra 10% on a feature that 75% of natlangs have will show up the same as an extra 10% on a feature that pretty much never happens. But it's not a bad start. Maybe I'll delve into some more complex statistical stuff at a later date.
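To make that caveat concrete, here is a small sketch that puts the plain difference next to a ratio. The percentages are invented purely to mirror the "75% of natlangs" versus "pretty much never happens" contrast, and `compare` is a hypothetical helper, not anything from WALS or CALS.

```python
# Hypothetical illustration: the same +10 point gap can mean very different things.
# All percentages here are invented for the example.

def compare(cals_pct, wals_pct):
    """Return (percentage-point difference, ratio of conlang share to natlang share)."""
    difference = cals_pct - wals_pct
    ratio = cals_pct / wals_pct if wals_pct > 0 else float("inf")
    return difference, ratio

common = compare(85.0, 75.0)  # feature most natlangs already have
rare = compare(12.0, 2.0)     # feature natlangs almost never have

print(f"common feature: {common[0]:+.1f} points, {common[1]:.2f}x the natlang share")
print(f"rare feature:   {rare[0]:+.1f} points, {rare[1]:.2f}x the natlang share")
```

Both cases show the same +10 on a difference graph, but the rare feature is several times more frequent in conlangs than in natlangs, while the common one has barely moved.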
PART 1: PHONOLOGY
Consonant Inventories
It seems that there's a tendency towards average-sized (19-25) consonant inventories here.
People seem to be shying away more from the very small (6-14) than the very large (34+) inventories - maybe small inventories just aren't seen as interesting enough. ("Just one more phoneme...")
Vowel Inventories
But for vowels, unlike consonants, there's a tendency away from the average. Possibly some of the huge interesting Indo-European vowel systems are pulling people away from the mean, towards larger (7+) inventories.
Voicing in Plosives and Fricatives
There's a strong tendency here - 20% more conlangs have a voicing contrast throughout.
Front Rounded Vowels
And again, possibly fueled by the tendency towards larger vowel systems, we see more conlangers going for the most "interesting" options.
Tone
As people generally assume, lots of conlangs don't have tones. But what surprised me here was how well the languages with tone matched the natlang distribution of complexity of tone system - I'd expected the tonal-conlangers to have gone much more for big dramatic contours-and-sandhi systems over simple two-way contrasts. Maybe there are more pitch-accent langs than I thought...
Stress
Fixed stress seems unpopular. But though I'd have expected to see unpredictable stress as popular, I wouldn't have expected "Right-oriented: one of the last three" to have shown up quite so strongly.
Uncommon Consonants
English rears its ugly head again. Non-sibilant dental fricatives are pretty rare in natlangs, being less common than co-articulated /kp/ - but because English has them (and, if I'm honest, because they're quite a nice-sounding sound) they show up in 18.5% more conlangs than natlangs. Though, to be honest, I was expecting a larger number - the tendency away from tone was larger than this one.
COMING SOON: MORPHOSYNTAX
Last edited by PTSnoop on 12 Jul 2013 00:33, edited 2 times in total.
- Creyeditor
Re: CALS vs WALS: A Comparison
Very good idea (though CALS is by no means representative)
Creyeditor
"Thoughts are free."
Produce, Analyze, Manipulate
1 2 3 4 4
Ook & Omlűt & Nautli languages & Sperenjas
Papuan languages, Morphophonology, Lexical Semantics
- Ear of the Sphinx
Re: CALS vs WALS: A Comparison
Do those sites have information on other uncommon consonants, such as the bilabial trill or labiodental flap? I want to see how many other conlangs have them.
Please don't read this.
Re: CALS vs WALS: A Comparison
Valkura wrote: Do those sites have information on other uncommon consonants, such as the bilabial trill or labiodental flap?
They don't.
Re: CALS vs WALS: A Comparison
PART 2: MORPHOLOGY, NOMINAL CATEGORIES, NOMINAL SYNTAX
Morphology was quite a short section, so I've included all the noun stuff as well. And to fit things on the graph, I've increased the y axis from ±30% to ±40%.
Head Or Dependent Marking
General tendency here towards dependent-marking. But interestingly, the trend's away from "Inconsistent or other" rather than "Head marking". Clearly, we need more people to think of crazy inconsistent systems.
Reduplication
This one's the main reason for my change to 40%. There's a *very* strong tendency here away from partial reduplication, scraping my limits at -39.7%.
Number Of Genders
This is one of those places where we're not doing so badly. I'd have expected more of a bias towards no genders (I tend to avoid the things, myself), but if anything, we've got more than we need.
Associative Plurals
Another 30%-breaker: apparently we don't like associative plurals. Or possibly (like me) we'd not really heard of them before...
Definite Articles
Another place where we're not doing too badly. There's a bias towards no articles at all - plausibly to get further away from Standard Indo-European - but not as strong as I'd have thought. Maybe it's time to start reintroducing the things.
Indefinite Pronouns
A strong trend away from interrogative-based indefinite pronouns. (Which is a shame, I like questions like "He ate something?" for "What did he eat?".)
Number of Cases
Vague bell curve here, centered at around three or four cases, and then another big peak for the 10+ case systems. And again, it looks like we need more minimal systems and more inconsistent-borderline systems here.
Ordinal Numerals
The tendency here seems to be towards the regular and consistent "one two three" and "oneth twoth threeth" systems - possibly "first twoth threeth" feels arbitrary and inconsistent. But again, natlangs prove more arbitrary and inconsistent than the average conlang...
Distributive Numerals
This is consistent with what we saw about people not really using reduplication before.
Conjunctions and Quantifiers
Another 30%-breaker. As with indefinite pronouns, we're seeing conlangers more likely to create separate categories instead of just blending in categories we've already got.
Adjectives Without Nouns
Would it be simplistic of me to assume that the "Not without noun" bar slots neatly into the "Without marking" bar, and the "marked by suffix" into the "marked by preceding word" bar? Maybe people who would otherwise have allowed unmarked adjectives-as-nouns decided against them for ambiguity reasons, while preceding-word people decided on suffixes instead? Maybe not, but I can dream.
And and With
And to finish, a nice simple graph, again matching the tendency for conlangers to create multiple categories rather than reusing existing things.
COMING SOON: VERBAL CATEGORIES
Re: CALS vs WALS: Part 2 - Nouns
I'm quite enjoying this. It seems Iriex has a slight tendency away from the norm (it includes reduplication, 'with' is the same as 'and' etc.)
It's strange though, I would have thought that reusing existing categories was a common thing to do since it requires less work.
Sin ar Pàrras agus nì sinne mar a thogras sinn. Choisinn sinn e agus ’s urrainn dhuinn ga loisgeadh.
Re: CALS vs WALS: A Comparison
PTSnoop wrote: A strong trend away from interrogative-based indefinite pronouns. (Which is a shame, I like questions like "He ate something?" for "What did he eat?".)
It's the other way around: the indefinite pronoun is based on the interrogative. So you would say "What did he eat?" but "He ate a what" instead of "He ate something," or "He didn't eat a what" for "He ate nothing."
Based on these two posts, it looks like the two least common conlang features in Himmaswa are a similarity between conjunctions and some quantifiers with interrogatives ("He didn't eat however what" = "He didn't eat anything") and interrogative-based indefinite pronouns ("He ate an instance of what"). The most common conlang features in Himmaswa are one-th two-th three-th ordinals and a lack of distributive numerals.
Re: CALS vs WALS: A Comparison
clawgrip wrote: It's the other way around: the indefinite pronoun is based on the interrogative. So you would say "What did he eat?" but, "He ate a what" instead of "He ate something,"
In most languages, I think it would be "he ate whatthing". In most languages, indefinites are derived from interrogatives (cf English "somewhere" and "somehow"). In only a few languages indefinites and interrogatives are identical.
Re: CALS vs WALS: Part 2 - Nouns
Hmm, the people on the ZBB have pointed out to me that there are, for some inexplicable reason, natlangs recorded on CALS.
This is going to throw off all my numbers - for example, it turns out my big reduplication 40% is now actually closer to 50%. I'll go through and retcon all the earlier graphs to the conlang-only numbers once I have time.
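For anyone wanting to redo a graph on conlang-only numbers, the recalculation can be sketched like this. I don't know what a CALS export actually looks like, so the entry layout, the `is_natlang` flag and the sample rows below are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical sketch: recompute a feature's percentages with natlang entries excluded.
# The entry layout and the 'is_natlang' flag are assumptions, not the real CALS format.

def feature_percentages(entries, feature, conlangs_only=False):
    """Percentage of entries per feature value, optionally skipping natlangs."""
    counts = Counter()
    for entry in entries:
        if conlangs_only and entry["is_natlang"]:
            continue
        value = entry.get(feature)
        if value is not None:  # entries that don't record the feature are ignored
            counts[value] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.items()}

# Invented sample rows.
entries = [
    {"name": "conlang A", "is_natlang": False, "reduplication": "none"},
    {"name": "conlang B", "is_natlang": False, "reduplication": "none"},
    {"name": "conlang C", "is_natlang": False, "reduplication": "partial"},
    {"name": "natlang D", "is_natlang": True, "reduplication": "partial"},
]

everything = feature_percentages(entries, "reduplication")
conlangs = feature_percentages(entries, "reduplication", conlangs_only=True)
print(everything)
print(conlangs)
```

In this toy data the natlang entry inflates the share of partial reduplication, so dropping it pushes the conlang figure further from the natlang one, which is the direction of the change described above.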
Re: CALS vs WALS: A Comparison
clawgrip wrote: It's the other way around: the indefinite pronoun is based on the interrogative. So you would say "What did he eat?" but, "He ate a what" instead of "He ate something,"
Xing wrote: In most languages, I think it would be "he ate whatthing". In most languages, indefinites are derived from interrogatives (cf English "somewhere" and "somehow"). In only a few languages indefinites and interrogatives are identical.
Yeah, I guess that's kind of what I meant, but I may have simplified it a little too much. I was thinking a bit about Japanese (the non-native language I speak best), where it's a bit extreme. Indefinite pronouns are formed by adding the interrogative pronoun to the question particle (which also means "or")