Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_POSTPUNCTUATION_SYMBOLS: A string containing the default post-punctuation characters. |
static java.lang.String | DEFAULT_PREPUNCTUATION_SYMBOLS: A string containing the default pre-punctuation characters. |
static java.lang.String | DEFAULT_SINGLE_CHAR_SYMBOLS: A string containing the default single characters. |
static java.lang.String | DEFAULT_WHITESPACE_SYMBOLS: A string containing the default whitespace characters. |
static int | EOF: A constant indicating that the end of the stream has been read. |

Constructor and Description |
---|
CharTokenizer(): Constructs a Tokenizer. |
CharTokenizer(java.io.Reader file): Creates a tokenizer that will return tokens from the given file. |
CharTokenizer(java.lang.String string): Creates a tokenizer that will return tokens from the given string. |

Modifier and Type | Method and Description |
---|---|
java.lang.String | getErrorDescription(): If hasErrors returns true, returns a description of the error encountered; otherwise returns null. |
boolean | hasErrors(): Returns true if there were errors while reading tokens. |
boolean | hasNext(): Returns true if there are more tokens, false otherwise. |
boolean | isSentenceSeparator(): Determines if the current token should start a new sentence. |
Token | next(): Returns the next token. |
void | remove() |
void | setInputReader(java.io.Reader reader): Sets the input reader. |
void | setInputText(java.lang.String inputString): Sets the text to tokenize. |
void | setPostpunctuationSymbols(java.lang.String symbols): Sets the postpunctuation symbols of this Tokenizer to the given symbols. |
void | setPrepunctuationSymbols(java.lang.String symbols): Sets the prepunctuation symbols of this Tokenizer to the given symbols. |
void | setSingleCharSymbols(java.lang.String symbols): Sets the single character symbols of this Tokenizer to the given symbols. |
void | setWhitespaceSymbols(java.lang.String symbols): Sets the whitespace symbols of this Tokenizer to the given symbols. |
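
The summaries above name prepunctuation and postpunctuation symbol sets but do not spell out how they are applied. As a rough illustration only, the sketch below shows one plausible reading: leading characters found in the prepunctuation set and trailing characters found in the postpunctuation set are peeled off a whitespace-free chunk, leaving the word in the middle. The class `PunctuationSplitSketch`, its `split` method, and the `PRE` and `POST` values are all hypothetical and are not taken from CharTokenizer; the real DEFAULT_* constants and splitting rules may differ.

```java
/** Hypothetical helper, NOT part of CharTokenizer: peels punctuation off one chunk. */
class PunctuationSplitSketch {
    // Assumed example symbol sets; the real DEFAULT_* constants may differ.
    static final String PRE = "\"'`({[";
    static final String POST = "\"'`.,:;!?(){}[]";

    /** Returns {prepunctuation, word, postpunctuation} for a whitespace-free chunk. */
    static String[] split(String chunk) {
        // Advance past leading characters that belong to the prepunctuation set.
        int start = 0;
        while (start < chunk.length() && PRE.indexOf(chunk.charAt(start)) >= 0) {
            start++;
        }
        // Walk back over trailing characters that belong to the postpunctuation set.
        int end = chunk.length();
        while (end > start && POST.indexOf(chunk.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            chunk.substring(0, start),   // prepunctuation
            chunk.substring(start, end), // word
            chunk.substring(end)         // postpunctuation
        };
    }
}
```

Under these assumed sets, `PunctuationSplitSketch.split("(hello),")` yields the three parts `(`, `hello`, and `),`.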

public static final int EOF
public static final java.lang.String DEFAULT_WHITESPACE_SYMBOLS
public static final java.lang.String DEFAULT_SINGLE_CHAR_SYMBOLS
public static final java.lang.String DEFAULT_PREPUNCTUATION_SYMBOLS
public static final java.lang.String DEFAULT_POSTPUNCTUATION_SYMBOLS

public CharTokenizer()

public CharTokenizer(java.lang.String string)
Parameters: string - the string to tokenize

public CharTokenizer(java.io.Reader file)
Parameters: file - where to read the input from

public void setWhitespaceSymbols(java.lang.String symbols)
Parameters: symbols - the whitespace symbols

public void setSingleCharSymbols(java.lang.String symbols)
Parameters: symbols - the single character symbols

public void setPrepunctuationSymbols(java.lang.String symbols)
Parameters: symbols - the prepunctuation symbols

public void setPostpunctuationSymbols(java.lang.String symbols)
Parameters: symbols - the postpunctuation symbols

public void setInputText(java.lang.String inputString)
Parameters: inputString - the string to tokenize

public void setInputReader(java.io.Reader reader)
Parameters: reader - the input source

public Token next()
Specified by: next in interface java.util.Iterator<Token>
Returns: the next token, or null if no more tokens

public boolean hasNext()
Returns true if there are more tokens, false otherwise.
Specified by: hasNext in interface java.util.Iterator<Token>
Returns: true if there are more tokens, false otherwise

public void remove()
Specified by: remove in interface java.util.Iterator<Token>

public boolean hasErrors()
Returns true if there were errors while reading tokens.
Returns: true if there were errors; false otherwise

public java.lang.String getErrorDescription()
If hasErrors returns true, returns a description of the error encountered; otherwise returns null.

public boolean isSentenceSeparator()
Returns: true if a new sentence should be started
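
The iterator methods and the error accessors documented here suggest a read loop of the form "while hasNext, call next, then check hasErrors". The sketch below illustrates that calling pattern with a stand-in class so that it runs without the library. `SketchTokenizer` and `IterationSketch` are hypothetical names; the stand-in splits on whitespace only, yields plain strings rather than Token objects, and its `remove()` throwing `UnsupportedOperationException` is an assumption, since the documentation above does not describe what `remove()` does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in, NOT the real CharTokenizer: splits on whitespace only. */
class SketchTokenizer implements Iterator<String> {
    private final Reader reader;
    private String pending;          // token read ahead by hasNext()
    private String errorDescription; // stays null until a read fails

    SketchTokenizer(Reader reader) {
        this.reader = reader;
    }

    /** Reads one whitespace-delimited token, or returns null at end of input. */
    private String readToken() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        break; // whitespace after some characters ends the token
                    }
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            errorDescription = e.toString(); // remember the failure for hasErrors()
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    @Override
    public boolean hasNext() {
        if (pending == null && errorDescription == null) {
            pending = readToken();
        }
        return pending != null;
    }

    /** Mirrors the documented contract: returns null when no tokens remain. */
    @Override
    public String next() {
        if (!hasNext()) {
            return null;
        }
        String token = pending;
        pending = null;
        return token;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException(); // assumed behavior
    }

    public boolean hasErrors() {
        return errorDescription != null;
    }

    public String getErrorDescription() {
        return errorDescription;
    }
}

class IterationSketch {
    /** Drains the tokenizer, then reports any read error on standard error. */
    static List<String> collect(Reader input) {
        SketchTokenizer tokenizer = new SketchTokenizer(input);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasNext()) {
            tokens.add(tokenizer.next());
        }
        if (tokenizer.hasErrors()) {
            System.err.println(tokenizer.getErrorDescription());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(collect(new StringReader("hello big world")));
    }
}
```

Checking `hasErrors()` after the loop matters because a read failure and a normal end of input both make `hasNext()` return false, so the loop alone cannot tell them apart.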