OneClick Terms is a simple term extractor interface giving easy access to terminology extraction functionality. It is powered by the Sketch Engine technology.
Supported languages: Chinese (simplified and traditional), Czech, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish.
The Sketch Engine technology behind OneClick Terms guarantees speedy processing. Its linguistic analysis uses part-of-speech tagging and lemmatization to produce exceptionally clean term extraction results requiring hardly any manual cleaning. The extracted terms are ready for import into a CAT (Computer Assisted Translation) tool or a term management system.
OneClick Terms can be used for text analytics and topic modelling to identify the main topic(s) of a large quantity of text through keywords and terms which serve as indicators of the main subject(s).
The term extraction quality is achieved by using language-specific criteria describing the terminology structures allowed in each language. For example, a term in English will most likely take the form of (noun+)noun+noun or adjective+noun, while in Spanish it will most likely be noun+adjective(+adjective) or noun+de+noun. Each language has a more complex set of such rules, which keeps noise out of the results. This approach does not require blacklists or stoplists.
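The pattern idea described above can be illustrated with a minimal sketch. It is not the Sketch Engine implementation: the one-letter tags, the `PATTERNS` table and the `candidate_terms` function are hypothetical names introduced here, and the input is assumed to be already part-of-speech tagged. It covers only the four example structures named in the text, not the full rule set of any language.

```python
import re

# Toy patterns over one-letter tags (an assumption of this sketch):
# N = noun, A = adjective, D = the word "de", X = anything else.
PATTERNS = {
    "en": re.compile(r"N?NN|AN"),   # (noun+)noun+noun or adjective+noun
    "es": re.compile(r"NAA?|NDN"),  # noun+adjective(+adjective) or noun+de+noun
}

def candidate_terms(tagged, lang):
    """Return every word span whose tag sequence fully matches a pattern."""
    tags = "".join(tag for _, tag in tagged)
    pattern = PATTERNS[lang]
    found = []
    for start in range(len(tagged)):
        for end in range(start + 2, len(tagged) + 1):
            if pattern.fullmatch(tags[start:end]):
                found.append(" ".join(word for word, _ in tagged[start:end]))
    return found

english = [("the", "X"), ("neural", "A"), ("network", "N"),
           ("model", "N"), ("works", "X")]
spanish = [("sistema", "N"), ("de", "D"), ("control", "N"),
           ("automático", "A")]

print(candidate_terms(english, "en"))  # ['neural network', 'network model']
print(candidate_terms(spanish, "es"))  # ['sistema de control', 'control automático']
```

Because only spans with an allowed structure are ever proposed, sequences such as "the neural" never become candidates, which is why this kind of approach can work without a stoplist.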
OneClick Terms can extract terminology from a number of common document formats (TMX, XLIFFv2, PDF, DOC, DOCX, HTML, TXT) and export the results into plain text, CSV or TBX formats.
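As a rough illustration of what a CSV export of extracted terms could look like, the following sketch writes a term list with Python's standard `csv` module. The column names `term` and `frequency` and the function `terms_to_csv` are assumptions made for this example; the actual column layout of the OneClick Terms export is not described here.

```python
import csv
import io

def terms_to_csv(terms):
    """Serialize (term, frequency) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "frequency"])  # hypothetical column names
    for term, freq in terms:
        writer.writerow([term, freq])
    return buffer.getvalue()

print(terms_to_csv([("neural network", 12), ("term extraction", 7)]))
```

A file in this shape can be opened in a spreadsheet or mapped onto the import fields of a term management system.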
word form: each form of the word is listed separately, e.g. test, tests and tested are listed as three separate items
base form (lemma): all forms of the same word are listed under the base form, e.g. test, tests and tested are listed as one item 'test'
rare words: only terms which are very rare in general language are included
common words: terms which are relatively common in general language are also included
the number of extracted terms shown initially; you can load more
a term is only included if it occurs at least the set number of times in the corpus
only letters and numbers: terms such as NATO, mp3 and B2B are included, but terms such as FL!P or K-system are not
at least one letter: the term must contain at least one letter; items such as 3!!! or [3-3] are not included
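Several of the options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the (word form, lemma) pairs are supplied by hand instead of coming from a lemmatizer, and the rare/common words setting is left out because it depends on a comparison with general language that is not specified here.

```python
import re
from collections import Counter

# Letters and digits only (Unicode-aware); underscore and punctuation excluded.
ALNUM = re.compile(r"[^\W_]+")

def count_items(tokens, attribute):
    """Count (word_form, lemma) pairs by 'word' or by 'lemma'."""
    index = 0 if attribute == "word" else 1
    return Counter(token[index] for token in tokens)

def at_least(counts, min_freq):
    """Keep items occurring at least min_freq times."""
    return {item: n for item, n in counts.items() if n >= min_freq}

def only_letters_and_numbers(term):
    """True if every word of the term consists of letters and digits only."""
    words = term.split()
    return bool(words) and all(ALNUM.fullmatch(w) for w in words)

def has_letter(term):
    """True if the term contains at least one letter."""
    return any(ch.isalpha() for ch in term)

tokens = [("test", "test"), ("tests", "test"), ("tested", "test")]
print(count_items(tokens, "word"))   # three separate items, one occurrence each
print(count_items(tokens, "lemma"))  # one item 'test' with three occurrences
print(at_least(count_items(tokens, "lemma"), 2))

for term in ["NATO", "mp3", "B2B", "FL!P", "K-system", "3!!!", "[3-3]"]:
    print(term, only_letters_and_numbers(term), has_letter(term))
```

Note that the two character filters are independent: K-system fails the letters-and-numbers check but still contains a letter, while 3!!! fails both.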