Why does corpus building take (so) long?

To begin with, corpus building can be stopped at any time by clicking the 'Cancel' link below the text 'Compiling'.

Creating a corpus from the web can take a fairly long time. It mostly depends on the size of the downloaded sources. When a corpus is created from uploaded files, the speed depends on your internet connection.

If you build a corpus using the 'Web search' option, the size of the result depends on the number of seed words you included and on their frequency on the internet (specifically on Bing, where the selected words are searched for). Many seed words (ca. 15–20) or very frequent seed words result in a bigger corpus, but the process (both text downloading and corpus building) also takes longer.

The time needed to create a corpus from a website depends on the size of the whole website. Sketch Engine can download up to 2,000 pages from a website. However, the download speed is limited to ca. 6 pages per minute to avoid overloading the website, which could lead its provider to block our tool.
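
To get a feeling for what these limits mean in practice, the arithmetic can be sketched in a few lines of Python. This is only a rough estimate, not part of Sketch Engine: the function name and constants are made up for illustration, the two figures are the approximate ones quoted above, and the time needed to build the corpus after downloading is not included.

```python
# Approximate figures quoted in the text above ("up to 2,000 pages", "ca. 6 pages per minute").
MAX_PAGES_PER_WEBSITE = 2000
PAGES_PER_MINUTE = 6


def estimate_download_minutes(pages: int, pages_per_minute: float = PAGES_PER_MINUTE) -> float:
    """Return a rough download time in minutes for a website with `pages` pages.

    Hypothetical helper: it only divides the (capped) page count by the
    download rate and ignores the corpus compilation that follows.
    """
    if pages < 0:
        raise ValueError("pages must not be negative")
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    # No more than the stated maximum number of pages is downloaded.
    capped_pages = min(pages, MAX_PAGES_PER_WEBSITE)
    return capped_pages / pages_per_minute


if __name__ == "__main__":
    for pages in (60, 500, 2000):
        minutes = estimate_download_minutes(pages)
        print(f"{pages} pages: about {minutes:.0f} min ({minutes / 60:.1f} h)")
```

Under these assumptions, a website that reaches the 2,000-page limit needs roughly 333 minutes, i.e. about five and a half hours, for downloading alone.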
