Sketch Engine KB - Why is the number of words in my corpus different or wrong?

If you used your corpus in a different tool such as AntConc, LancsBox or WordSmith Tools, you may notice that Sketch Engine (and possibly each of the other tools) shows a different number of words in the same corpus.

Different tools may tokenize text differently, i.e. they apply different rules for deciding what counts as the smallest unit of a corpus (a token). See: https://www.sketchengine.eu/my_keywords/token/

In addition to tokens, Sketch Engine also gives the total number of words. Each system may define a word differently. Sketch Engine defines a word as a token which starts with an alphabetic character. Tokens starting with digits or punctuation are not words. See: https://www.sketchengine.eu/my_keywords/word/
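The word definition above can be illustrated with a minimal Python sketch. It is not Sketch Engine's actual code: the helper name is_word and the sample token list are made up for illustration, and Python's str.isalpha() is assumed to be a reasonable stand-in for "alphabetic character".

```python
def is_word(token: str) -> bool:
    """Return True if the token starts with an alphabetic character.

    Hypothetical helper: str.isalpha() is used as an approximation of
    "alphabetic"; the real implementation may classify characters differently.
    """
    return bool(token) and token[0].isalpha()


# A hand-tokenized sample sentence (assumed tokenization, for illustration only).
tokens = ["The", "3", "cats", "slept", "for", "12", "hours", "."]

# Keep only the tokens that qualify as words under the definition above.
words = [t for t in tokens if is_word(t)]

print(len(tokens), len(words))  # prints "8 5"
```

Here the numbers 3 and 12 and the full stop are counted as tokens but not as words, which is why the token count is higher than the word count.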

Examples of English strings that one tool may treat as a single token and another may split into two or more tokens:

contractions (isn't, didn't…)
possessives (John's, children's…)
sequences of punctuation such as an ellipsis (…), repeated question marks (???), repeated exclamation marks (!!) or mixed punctuation (!")
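To see how contractions, possessives and punctuation sequences change the totals, the following Python sketch counts the tokens of one sentence under three different rule sets. None of these rules is claimed to be the tokenizer of Sketch Engine, AntConc, LancsBox or WordSmith Tools; the function names and regular expressions are hypothetical and chosen only to show the effect.

```python
import re

text = "John's dog isn't here???"


def whitespace_tokens(s: str) -> list[str]:
    """Rule A (hypothetical): split on whitespace only."""
    return s.split()


def grouped_tokens(s: str) -> list[str]:
    """Rule B (hypothetical): keep contractions and possessives whole,
    and treat a run of punctuation as a single token."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]+", s)


def split_tokens(s: str) -> list[str]:
    """Rule C (hypothetical): every punctuation character, including the
    apostrophe, is a separate token."""
    return re.findall(r"\w+|[^\w\s]", s)


for name, tokenize in [("A", whitespace_tokens),
                       ("B", grouped_tokens),
                       ("C", split_tokens)]:
    toks = tokenize(text)
    print(name, len(toks), toks)

# A 4  ["John's", 'dog', "isn't", 'here???']
# B 5  ["John's", 'dog', "isn't", 'here', '???']
# C 11 ['John', "'", 's', 'dog', 'isn', "'", 't', 'here', '?', '?', '?']
```

The same four-word sentence yields 4, 5 or 11 tokens depending on the rules, so totals for a whole corpus can diverge considerably between tools.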

Similar differences can be found in other languages too.