todoELE | Corpus | esTenTen – Spanish corpus from the web

esTenTen – Spanish corpus from the web

https://www.sketchengine.eu/estenten-spanish-corpus/

The Spanish Web corpus (esTenTen) is a text corpus built from texts collected from the Internet. It belongs to the TenTen corpus family, a set of web corpora that are all processed in the same way and have a target size of 10+ billion words. Sketch Engine currently provides access to TenTen corpora in more than 40 languages.

The corpus contains subcorpora for two language varieties: European Spanish and American Spanish. Texts for each variety were downloaded from web domains on the respective continent.

Detailed information about TenTen corpora is available on a separate page, Common TenTen corpora attributes.

Part-of-speech tagset
Part-of-speech tagging and lemmatisation were performed with the FreeLing analyser using its Spanish configuration; see the Spanish FreeLing tagset.

Overview of Spanish TenTen corpora
A list of Spanish TenTen corpora available in the Sketch Engine database:

Spanish Web corpus 2023 (esTenTen23) – 28.6 billion words (European Spanish Web, American Spanish Web, whole Spanish Wikipedia) with topic classification for the biggest web domains based on a semi-manual check of sample texts
Spanish Web corpus 2018 (esTenTen18) – 16.9 billion words (European Spanish Web, American Spanish Web, whole Spanish Wikipedia) with topic classification for the biggest web domains based on a semi-manual check of sample texts
Spanish Web corpus 2011 (esTenTen11) – 9.5 billion words (European Spanish Web, American Spanish Web, small part of Spanish Wikipedia)
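
To make the sizes in the list above easier to compare, here is a minimal Python sketch. The word counts are copied from the list; the names CORPORA, TARGET_BILLIONS, growth_factor and meets_target are hypothetical helpers introduced only for this illustration and are not part of Sketch Engine.

```python
# Sizes in billions of words, copied from the list of Spanish TenTen corpora.
CORPORA = {
    "esTenTen11": 9.5,
    "esTenTen18": 16.9,
    "esTenTen23": 28.6,
}

# Nominal "10+ billion words" target of the TenTen family (illustrative constant).
TARGET_BILLIONS = 10.0


def growth_factor(older: str, newer: str) -> float:
    """Return how many times larger `newer` is than `older`."""
    return CORPORA[newer] / CORPORA[older]


def meets_target(name: str) -> bool:
    """Return True if the corpus reaches the nominal 10-billion-word target."""
    return CORPORA[name] >= TARGET_BILLIONS


if __name__ == "__main__":
    # Print the corpora from smallest to largest.
    for name, size in sorted(CORPORA.items(), key=lambda item: item[1]):
        status = "meets" if meets_target(name) else "is below"
        print(f"{name}: {size} billion words, {status} the nominal target")
    factor = growth_factor("esTenTen11", "esTenTen23")
    print(f"esTenTen23 is about {factor:.1f} times the size of esTenTen11")
```

The comparison is purely arithmetic on the published word counts; it says nothing about differences in composition, such as how much of Spanish Wikipedia each version includes.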
