When applying a SpaCy pipeline to the input text, the annotations can be filtered by a custom set of rules.
These rules are based on linguistic features.
In order to be part of the output set, an annotation needs to fulfill all filter rules.
Three different rule types are available:
- length: may exclude annotations that exceed (max) or fall below (min) a certain character length threshold.
- non-stopwords: may only include annotations in which all word tokens are non-stopwords. (deny causes the service to block annotations without stopwords. This is not recommended.)
- linguistics: may only include annotations in which all word tokens are members of the comma-separated list of values for a given linguistic feature. (deny causes the service to block annotations that match the given linguistic features.)
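The interplay of the three rule types can be sketched in Python. This is only an illustration, not the service's implementation. The Tok class is a hypothetical stand-in for a SpaCy token (SpaCy's Token exposes attributes with the same names: text, pos_, is_stop, is_alpha), and the function names length_rule, non_stopwords_rule, linguistics_rule and passes are made up for this example. Part-of-speech tags are assumed as the linguistic feature, and the character length is approximated by joining the token texts with single spaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set


@dataclass
class Tok:
    """Hypothetical stand-in for a SpaCy token (same attribute names)."""
    text: str
    pos_: str = "X"
    is_stop: bool = False
    is_alpha: bool = True


Rule = Callable[[Sequence[Tok]], bool]


def length_rule(min_len: Optional[int] = None, max_len: Optional[int] = None) -> Rule:
    """Reject annotations shorter than min_len or longer than max_len characters."""
    def check(tokens: Sequence[Tok]) -> bool:
        n = len(" ".join(t.text for t in tokens))
        if min_len is not None and n < min_len:
            return False
        if max_len is not None and n > max_len:
            return False
        return True
    return check


def non_stopwords_rule() -> Rule:
    """Accept only annotations whose word tokens are all non-stopwords."""
    return lambda tokens: all(not t.is_stop for t in tokens if t.is_alpha)


def linguistics_rule(allowed: Set[str], deny: bool = False) -> Rule:
    """Accept annotations whose word tokens all carry an allowed POS tag.

    With deny=True the decision is inverted: matching annotations are blocked.
    """
    def check(tokens: Sequence[Tok]) -> bool:
        match = all(t.pos_ in allowed for t in tokens if t.is_alpha)
        return not match if deny else match
    return check


def passes(tokens: Sequence[Tok], rules: Sequence[Rule]) -> bool:
    """An annotation is kept only if it fulfills every rule."""
    return all(rule(tokens) for rule in rules)
```

Under these assumptions, an annotation such as "heart attack" (two nouns, no stopwords, 12 characters) would pass the combination length_rule(min_len=5), non_stopwords_rule() and linguistics_rule({"NOUN"}), while "the heart" would be rejected by the stopword rule.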
In addition to the linguistic ruleset, a lemmatization step can be enabled to lemmatize the text before entity search. The lemmatization can be enabled by adding an additional rule component of type lemmatize to the filter rules.
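The effect of lemmatizing before entity search can be illustrated with a small Python sketch. How the service performs the search internally is not described here, so the single-token lookup below is an assumption. Tok is again a hypothetical stand-in for a SpaCy token (SpaCy's Token provides text and lemma_), and find_entities and the vocabulary are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class Tok:
    """Hypothetical stand-in for a SpaCy token (same attribute names)."""
    text: str
    lemma_: str


def find_entities(tokens: Sequence[Tok], vocabulary: Set[str], lemmatize: bool = True) -> List[str]:
    """Return the surface forms of tokens found in the vocabulary.

    With lemmatize=True the lookup uses the lemma instead of the surface
    form, so inflected variants still match the base-form entry.
    """
    hits = []
    for tok in tokens:
        key = tok.lemma_ if lemmatize else tok.text
        if key.lower() in vocabulary:
            hits.append(tok.text)
    return hits
```

In this sketch the plural "Mice" is found for the vocabulary entry "mouse" only when lemmatization is switched on.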
You can add additional pretrained SpaCy pipelines on the settings page. The specific pipeline must be downloadable through:
python3 -m spacy download PIPELINE_NAME
Look for annotations longer than 4 characters
Look for annotations that consist of at least one NOUN
Look for annotations that do not include a single stopword