DEFFILTERLINK

DIACHRONIC
Sets the structure attribute that contains the date used for computing trends. Values must be in a numeric format, e.g. 2016/01, in which the same delimiters (e.g. /) are always in the same positions.

WSTRANSLATE
Configuration of languages and corpora for the bilingual word sketch opened with the “Translate” button. For example:
WSTRANSLATE ",French,frtenten,German,detenten2_simplews,Polish,pltenten,Spanish,eseutenten11_freeling,Italian,ittenten"
Appropriate dictionaries, whose names contain a "-" separator, are expected in pcdict_path (/corpora/pcdicts or as specified in run.cgi).

NONWORDRE
A regular expression determining which tokens should not be considered words. It defaults to [^[:alpha:]].*, so the default definition of a word is [[:alpha:]].*.

DOCSTRUCTURE
The structure that should be considered a document; defaults to “doc”.

MAINTAINER
Identification of the person responsible for maintaining the corpus.

DEFAULTATTR
Default attribute for CQL query evaluation. It is also used to map the attribute alias “-” in the web API.

NEWVERSION
For old versions of corpora only: the name of the new version of the corpus.

ALIGNDEF
For parallel corpora only: comma-separated list of mapping definition files to aligned corpora.

ALIGNSTRUCT
For parallel corpora only: the name of the mapping structure, i.e. a structure that is present in both parallel corpora and on which the alignment is performed.

ALIGNED
For parallel corpora only: comma-separated list of aligned corpora. All corpora should have a structure defined in ALIGNSTRUCT (“align” in manatee 2.67 and higher).

RIGHTTOLEFT
Indicates whether the language of the corpus uses a right-to-left script (e.g. Arabic).

NOLETTERCASE
For corpora of languages that do not have capital letters, e.g. Arabic, Chinese, Japanese, Korean, Nepali, Telugu, Tamil, … For example, in the case of an Arabic corpus, the registry file should contain the following configuration (it implies that lowercase attributes such as lc or lemma_lc are not required):
NOLETTERCASE "1"
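The rule that date values for trends must be numeric, with the same delimiters always in the same places (as in 2016/01), can be checked before a corpus is compiled. The following is only a minimal sketch of such a check, not part of manatee or Sketch Engine; the function names `delimiter_layout` and `consistent_date_values` are hypothetical and introduced here for illustration.

```python
import re


def delimiter_layout(value):
    """Replace every ASCII digit with '9' so only the delimiter layout remains."""
    return re.sub(r"[0-9]", "9", value)


def consistent_date_values(values):
    """Return True when all values share one digits-plus-delimiters layout.

    Hypothetical helper: it only mirrors the documented rule that the same
    delimiters must always be in the same places.
    """
    layouts = {delimiter_layout(v) for v in values}
    if len(layouts) != 1:
        # Empty input or mixed layouts (e.g. 2016/01 next to 2016-01).
        return False
    layout = layouts.pop()
    # Require at least one digit and reject alphabetic content.
    return "9" in layout and not any(ch.isalpha() for ch in layout)
```

For instance, `consistent_date_values(["2016/01", "2016/02"])` passes, while a list mixing `2016/01` with `2016-02` or `2016/1` does not, because the delimiter positions differ.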
LANGUAGE
Language name. It should be capitalized and be one of the allowed names; otherwise the system cannot automatically detect the right locale, and you may experience errors when sorting or when matching regular expressions against non-ASCII characters.

NAME
Name of the corpus; defaults to the corpus config filename.

ENCODING
Character encoding of the corpus.

DEFAULTLOCALE
Default LOCALE for attributes (see below).
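Two of the behaviours described here, the corpus name falling back to the config filename and the language name needing to be capitalized, can be illustrated with a small script. This is a rough sketch under stated assumptions: it reads only simple top-level lines of the form `KEY "value"` (the form used in the `NOLETTERCASE "1"` example), ignores nested blocks, and does not check the language against the list of allowed names. The function names and the path `/corpora/registry/mycorpus` are hypothetical.

```python
import os
import re

# Simple top-level directive: KEY "value" or KEY value (assumption: one per line).
DIRECTIVE_RE = re.compile(r'^([A-Z]+)\s+"?([^"\n]*?)"?\s*$')


def parse_global_directives(text):
    """Collect top-level KEY/value pairs; indented or nested lines are skipped."""
    directives = {}
    for line in text.splitlines():
        match = DIRECTIVE_RE.match(line)
        if match:
            directives[match.group(1)] = match.group(2)
    return directives


def corpus_name(config_path, directives):
    """NAME if present, otherwise the config filename, as the text describes."""
    return directives.get("NAME") or os.path.basename(config_path)


def language_is_capitalized(directives):
    """Check only the capitalization requirement, not the allowed-name list."""
    language = directives.get("LANGUAGE", "")
    return bool(language) and language[0].isupper()
```

With a config containing `LANGUAGE "Arabic"` and no `NAME` line, `corpus_name("/corpora/registry/mycorpus", ...)` would fall back to `mycorpus`.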