What is the difference between keywords and word sketches?

In general, the Keywords & terms tool and the Word sketch tool differ in two main ways: what they compute and what data they require.

  1. The Keywords & terms tool compares a focus corpus (the corpus you have open) with a reference corpus (the corpus you selected for the comparison). For each word, it compares the relative frequency in the focus corpus with the relative frequency of the same word in the reference corpus. The output is a list of the words with the highest ratio (relative freq. in the focus corpus vs relative freq. in the reference corpus). Put simply: you need a tokenized corpus (divided into tokens) and a reference corpus in the same language. The reference corpus should be as large as possible to give good-quality results. (Usually, our TenTen corpora are pre-selected as the reference corpora.)
  2. The Word sketch tool, on the other hand, generates a one-page summary of the most typical collocations/combinations for a specific lemma/word in a specific corpus. All the collocations shown relate to the word for which the word sketch was generated. This tool always requires a word sketch grammar and tokenization, and usually also part-of-speech tagging and lemmatization to produce more specific collocations (object relations, modifiers, etc.). The quality of word sketch results depends heavily on the size of the selected corpus. Small corpora give poor word sketches with only a few collocations, because there is not enough data.
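
To make the frequency comparison in point 1 more concrete, here is a minimal Python sketch of ranking words by the ratio of their relative frequencies in two corpora. It is only an illustration, not the tool's actual implementation: the function name `keyword_scores`, the per-million normalization, and the `smoothing` constant (added to both sides so that words missing from the reference corpus do not cause division by zero) are all assumptions made for this example.

```python
from collections import Counter


def keyword_scores(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank focus-corpus words by a relative-frequency ratio.

    Hypothetical helper for illustration only. `smoothing` is an assumed
    constant added to both frequencies to avoid division by zero.
    """
    if not focus_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")

    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(reference_tokens)
    focus_total = len(focus_tokens)
    ref_total = len(reference_tokens)

    scores = {}
    for word, count in focus_counts.items():
        # Relative frequency expressed as occurrences per million tokens.
        focus_fpm = count * 1_000_000 / focus_total
        # Counter returns 0 for words absent from the reference corpus.
        ref_fpm = ref_counts[word] * 1_000_000 / ref_total
        scores[word] = (focus_fpm + smoothing) / (ref_fpm + smoothing)

    # Highest ratio first: these are the keyword candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpora, already tokenized (real corpora would be far larger).
focus = "the corpus the lemma corpus corpus".split()
reference = "the the the the cat sat on the mat corpus".split()

ranked = keyword_scores(focus, reference)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

In this toy data, a word that appears only in the focus corpus ranks first, while a common word such as "the" scores below 1 because it is relatively more frequent in the reference corpus. With corpora this small the numbers are not meaningful, which is why a large reference corpus matters.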

related blog posts: