People keep making this claim here and there. Is there any actual research behind it?
I'm asking because, except for Mr Payack's hoax, I cannot see where it comes from.
Well, the author of the Language Log post (Benjamin Zimmer) seems to think the lexicographers of the Oxford English Dictionary have done such research.
If you count just head-words of lexical entries there were 300,000 in the 2nd edition of the OED, but there will be more in the 3rd edition.
And if you count all lexemes (all lexical entries, even those that aren't for head-words), there are over 600,000 of those in the 2nd edition of the OED, and apparently they expect around 1.3 million in the 3rd edition.
.... even if we consider one particular dictionary there is no simple answer to how many "words" it contains. The second edition of the Oxford English Dictionary has about 300,000 headwords, covering 640,000 words and phrases, according to AskOxford. (The Third Edition, now in preparation, will increase that number to 1.3 million or more.) So do we count headwords? All defined words and phrases? Every distinct sense and subsense of those words and phrases? Every spelling variant? Do archaic words make the cut, and if so, what's the chronological cutoff for "English"? In estimating the size of the lexicon, AskOxford remains admirably agnostic in its FAQ (emphasis mine):
How many words are there in the English language?
There is no single sensible answer to this question. It is impossible to count the number of words in a language, because it is so hard to decide what counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (dogs plural noun, dogs present tense of the verb). Is dog-tired a word, or just two other words joined together? Is hot dog really two words, since we might also find hot-dog or even hotdog?
It is also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Youth slang? Computing jargon?
The lexicographers preparing the 3rd edition of the OED are the ones I was referring to. They say that the new OED will have about 1.3 million entries ("lexemes").
I've heard claims of 200,000, 500,000, 600,000, 2,000,000, and I'd guess a few more. How did they arrive at these numbers? And how can they be so different?
The above from AskOxford's FAQ should explain it. What's a word? How do you tell that one word is different from another word, instead of just being a use or a form of it? How do you tell that a word is an English word, instead of a foreign word just being used in an English sentence?
The fuzziness at some of the boundaries means that it's hard to count.
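To make that concrete, here is a rough Python sketch using the FAQ's own "dog" and "hot dog" examples. The tiny lexicon, its part-of-speech labels, and the function names are all made up for illustration; this is not how the OED or any other dictionary actually counts. It only shows that the same handful of forms gives a different total under each definition of "word".

```python
# Toy lexicon: (surface form, headword, part of speech).
# Invented for illustration only, based on the FAQ's examples.
ENTRIES = [
    ("dog", "dog", "noun"),
    ("dogs", "dog", "noun"),
    ("dog", "dog", "verb"),
    ("dogs", "dog", "verb"),
    ("dog-tired", "dog-tired", "adjective"),
    ("hot dog", "hot dog", "noun"),
    ("hot-dog", "hot dog", "noun"),
    ("hotdog", "hot dog", "noun"),
]


def count_headwords(entries):
    """Count distinct headwords, ignoring part of speech and spelling."""
    return len({head for _, head, _ in entries})


def count_lexemes(entries):
    """Count distinct (headword, part of speech) pairs."""
    return len({(head, pos) for _, head, pos in entries})


def count_spellings(entries):
    """Count distinct written forms, whatever they mean."""
    return len({form for form, _, _ in entries})


def count_forms_by_pos(entries):
    """Count distinct (written form, part of speech) pairs."""
    return len({(form, pos) for form, _, pos in entries})


if __name__ == "__main__":
    print("headwords:", count_headwords(ENTRIES))
    print("lexemes:", count_lexemes(ENTRIES))
    print("spellings:", count_spellings(ENTRIES))
    print("forms by part of speech:", count_forms_by_pos(ENTRIES))
```

On this toy data the four definitions give 3, 4, 6 and 8 "words" respectively, and that is before deciding whether "dog-tired" or "hot dog" should be in the list at all.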
The numbers I used to use (namely, around 225,000 and around 625,000) were based on the counts lexicographers gave when talking about the dictionaries they produced. But I'd have to find out which dictionaries those were and who the lexicographers were, and re-read what they wrote, to find out whether they were talking about head-words of entries (or entries of head-words), or something a little less strict.
Anyway, as long as you don't say something like "English's millionth word was coined sometime last summer", it seems reasonable to say "lexicographers are now saying English has over a million words". As mentioned in an article Zimmer links to, there are over a million names of chemicals.