Different systems may tokenize text differently, i.e. apply different rules for determining what the smallest constituent (token) of a corpus is, so the token count for the same text can vary from one system to another. See: https://www.sketchengine.eu/my_keywords/token/
In addition to the number of tokens, Sketch Engine gives the total number of words. Each system may define a word differently. Sketch Engine defines a word as a token which starts with an alphabetic character. Tokens starting with a digit or a punctuation mark are not counted as words. See: https://www.sketchengine.eu/my_keywords/word/
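The word definition above can be sketched in a few lines of Python. This is only an illustration, not Sketch Engine's actual code: the function name `count_tokens_and_words` and the sample token list are hypothetical, and Python's `str.isalpha()` is assumed to be a reasonable stand-in for "alphabetic character".

```python
def count_tokens_and_words(tokens):
    """Return (token_count, word_count) for an already tokenized text."""
    # Assumption: a token is a word if its first character is alphabetic,
    # approximated here with Python's Unicode-aware str.isalpha().
    words = [t for t in tokens if t and t[0].isalpha()]
    return len(tokens), len(words)


# Hypothetical token list: "3" and "." are tokens but not words.
sample = ["The", "3", "cats", "sat", "."]
print(count_tokens_and_words(sample))  # (5, 3)
```

With this rule, the token count is always at least as large as the word count, because every word is also a token.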
Examples of English text that can be tokenized as one, two, or more tokens:
- contractions (isn't, didn't…)
- possessives (John's, children's…)
- runs of punctuation such as an ellipsis (…), repeated question marks (???), repeated exclamation marks (!!), or mixed sequences of punctuation (!")
Similar differences can be found in other languages too.
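To make the effect on counts concrete, here is a minimal sketch of two deliberately simplistic tokenizers applied to the same sentence. Both functions (`tokenize_whole`, `tokenize_split`) and their regular expressions are hypothetical and do not reproduce the rules of Sketch Engine or any other real system.

```python
import re


def tokenize_whole(text):
    # Scheme A (hypothetical): keep apostrophes inside a word and treat
    # each run of punctuation as a single token.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]+", text)


def tokenize_split(text):
    # Scheme B (hypothetical): split off the part starting with an
    # apostrophe and make every punctuation mark its own token.
    return re.findall(r"\w+|'\w+|[^\w\s]", text)


sentence = "John's dog didn't bark???"
print(tokenize_whole(sentence))
# ["John's", 'dog', "didn't", 'bark', '???']
print(tokenize_split(sentence))
# ['John', "'s", 'dog', 'didn', "'t", 'bark', '?', '?', '?']
```

The same sentence yields 5 tokens under the first scheme and 9 under the second, which is why token counts from different systems are not directly comparable.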