LuceneStandardTokenizer Class

public final class LuceneStandardTokenizer
extends LexicalTokenizer

Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.

Constructor Summary

Constructor Description
LuceneStandardTokenizer(String name)

Creates a LuceneStandardTokenizer with the given name.

Method Summary

Modifier and Type Method and Description
Integer getMaxTokenLength()

Get the maxTokenLength property: The maximum token length.

LuceneStandardTokenizer setMaxTokenLength(Integer maxTokenLength)

Set the maxTokenLength property: The maximum token length.

JsonWriter toJson(JsonWriter jsonWriter)

Methods inherited from LexicalTokenizer

Methods inherited from java.lang.Object

Constructor Details

LuceneStandardTokenizer

public LuceneStandardTokenizer(String name)

Creates a LuceneStandardTokenizer with the given name.

Parameters:

name - The name of the tokenizer. It may contain only letters, digits, spaces, dashes, or underscores; it must start and end with an alphanumeric character; and it is limited to 128 characters.
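
The naming rules above can be checked before a tokenizer is constructed. The following is a minimal sketch, not part of the SDK: the class TokenizerNameCheck, the method isValidName, and the regular expression are hypothetical, and the sketch assumes that "letters" means ASCII letters, which the description does not state.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK: mirrors the documented naming rules.
final class TokenizerNameCheck {
    // Assumption: "letters" means ASCII letters. The first and last characters
    // must be alphanumeric; spaces, dashes, and underscores may appear between them.
    private static final Pattern NAME =
        Pattern.compile("^[A-Za-z0-9](?:[A-Za-z0-9 _-]*[A-Za-z0-9])?$");

    // Documented upper bound on the name length.
    private static final int MAX_NAME_LENGTH = 128;

    static boolean isValidName(String name) {
        return name != null
            && name.length() <= MAX_NAME_LENGTH
            && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my-tokenizer"));  // prints true
        System.out.println(isValidName("-leading-dash")); // prints false
    }
}
```

A name that passes this check could then be passed to new LuceneStandardTokenizer(name). Whether an invalid name is rejected by the constructor or only later is not stated here, so a local check like this is only a convenience.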

Method Details

getMaxTokenLength

public Integer getMaxTokenLength()

Get the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split.

Returns:

the maxTokenLength value.
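
To illustrate what "tokens longer than the maximum length are split" can mean in practice, here is a minimal sketch under stated assumptions. It is not the Lucene implementation: the class TokenSplitSketch and the method splitToken are hypothetical, the sketch assumes an over-long token is cut into consecutive pieces of at most maxTokenLength characters, and it counts Java char units rather than Unicode code points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the Lucene implementation.
final class TokenSplitSketch {
    // Assumption: an over-long token is cut into consecutive pieces of at most
    // maxTokenLength chars. Length is measured in UTF-16 char units.
    static List<String> splitToken(String token, int maxTokenLength) {
        if (maxTokenLength <= 0) {
            throw new IllegalArgumentException("maxTokenLength must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < token.length(); start += maxTokenLength) {
            int end = Math.min(token.length(), start + maxTokenLength);
            parts.add(token.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // A 10-character token with a limit of 4 yields pieces of 4, 4, and 2.
        System.out.println(splitToken("abcdefghij", 4)); // prints [abcd, efgh, ij]
    }
}
```

With the default of 255, only unusually long tokens would be affected. A smaller limit can be set with setMaxTokenLength.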

setMaxTokenLength

public LuceneStandardTokenizer setMaxTokenLength(Integer maxTokenLength)

Set the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split.

Parameters:

maxTokenLength - the maxTokenLength value to set.

Returns:

the LuceneStandardTokenizer object itself.

toJson

public JsonWriter toJson(JsonWriter jsonWriter)

Overrides:

LexicalTokenizer.toJson(JsonWriter jsonWriter)

Parameters:

jsonWriter

Throws:
