com.swabunga.spell.event
Interface WordTokenizer

All Known Implementing Classes:
AbstractWordTokenizer, DocumentWordTokenizer

public interface WordTokenizer

An interface for objects that take a text-based medium as input and iterate through the words of the text stored in that medium. Examples of such media include Strings, Documents, Files, and TextComponents.

After the object is instantiated and before the first call to nextWord(), the following methods should throw a WordNotFoundException:
getCurrentWordEnd(), getCurrentWordPosition(), isNewSentence(), and replaceWord().

A call to nextWord() when hasMoreWords() returns false should throw a WordNotFoundException.
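The following is a minimal sketch, not the Jazzy implementation, of a String-backed tokenizer that honors the contract above. So that it compiles on its own, it declares a local copy of the WordTokenizer methods and a stand-in WordNotFoundException, assumed here to be unchecked. The names SimpleStringTokenizer and SimpleStringTokenizerDemo, the rule that a word is a run of letters, the exclusive end index, and the punctuation-based sentence rule are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local stand-in for com.swabunga.spell.event.WordNotFoundException (assumed unchecked).
class WordNotFoundException extends RuntimeException {
    WordNotFoundException(String message) { super(message); }
}

// Local copy of the WordTokenizer signatures so this file compiles on its own.
interface WordTokenizer {
    String getContext();
    int getCurrentWordCount();
    int getCurrentWordEnd();
    int getCurrentWordPosition();
    boolean hasMoreWords();
    boolean isNewSentence();
    String nextWord();
    void replaceWord(String newWord);
}

// Hypothetical String-backed tokenizer; a "word" is a run of letters (an assumption).
class SimpleStringTokenizer implements WordTokenizer {
    private static final Pattern WORD = Pattern.compile("\\p{L}+");
    private final StringBuilder text;
    private int start = -1;   // -1 until nextWord() has been called
    private int end = -1;     // exclusive end index of the current word
    private int count = 0;    // words returned by nextWord() so far
    private int scanFrom = 0; // index where the search for the next word begins

    SimpleStringTokenizer(String text) { this.text = new StringBuilder(text); }

    private void requireCurrentWord() {
        if (start < 0) throw new WordNotFoundException("current word has not yet been set");
    }

    public String getContext() { return text.toString(); } // includes replacements
    public int getCurrentWordCount() { return count; }
    public int getCurrentWordEnd() { requireCurrentWord(); return end; }
    public int getCurrentWordPosition() { requireCurrentWord(); return start; }

    public boolean hasMoreWords() { return WORD.matcher(text).find(scanFrom); }

    public boolean isNewSentence() {
        requireCurrentWord();
        // Naive rule: first word, or the previous non-whitespace char is '.', '!' or '?'.
        for (int i = start - 1; i >= 0; i--) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) return c == '.' || c == '!' || c == '?';
        }
        return true;
    }

    public String nextWord() {
        Matcher m = WORD.matcher(text);
        if (!m.find(scanFrom)) throw new WordNotFoundException("search string contains no more words");
        start = m.start();
        end = m.end();
        scanFrom = end;
        count++;
        return m.group();
    }

    public void replaceWord(String newWord) {
        requireCurrentWord();
        text.replace(start, end, newWord);
        end = start + newWord.length();
        scanFrom = end; // resume after the replacement so its words are not rechecked
    }
}

class SimpleStringTokenizerDemo {
    public static void main(String[] args) {
        WordTokenizer t = new SimpleStringTokenizer("Teh cat sat. It slept.");
        while (t.hasMoreWords()) {
            String word = t.nextWord();
            if (word.equals("Teh")) t.replaceWord("The");
            System.out.println(word + " @" + t.getCurrentWordPosition()
                    + " newSentence=" + t.isNewSentence());
        }
        System.out.println(t.getContext() + " (" + t.getCurrentWordCount() + " words)");
    }
}
```

The loop in main() shows one way a client might drive a tokenizer: test hasMoreWords(), then call nextWord(). Because hasMoreWords() here only searches from the saved position, it never changes the current word.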

Author:
Jason Height

Method Summary
 java.lang.String getContext()
          Returns the context text that is being tokenized (should include any changes that have been made).
 int getCurrentWordCount()
          Returns the number of word tokens that have been processed thus far.
 int getCurrentWordEnd()
          Returns an index representing the end location of the current word in the text.
 int getCurrentWordPosition()
          Returns an index representing the start location of the current word in the text.
 boolean hasMoreWords()
          Returns true if there are more words left.
 boolean isNewSentence()
          Returns true if the current word is at the start of a sentence.
 java.lang.String nextWord()
          This returns the next word in the iteration.
 void replaceWord(java.lang.String newWord)
          Replaces the current word token.

          When a word is replaced, care should be taken that the WordTokenizer repositions itself so that the words that were added aren't rechecked.

Method Detail

getContext

public java.lang.String getContext()
Returns the context text that is being tokenized (should include any changes that have been made).

Returns:
the text being searched.

getCurrentWordCount

public int getCurrentWordCount()
Returns the number of word tokens that have been processed thus far.

Returns:
the number of words found so far.

getCurrentWordEnd

public int getCurrentWordEnd()
Returns an index representing the end location of the current word in the text.

Returns:
index of the end of the current word in the text.
Throws:
WordNotFoundException - current word has not yet been set.

getCurrentWordPosition

public int getCurrentWordPosition()
Returns an index representing the start location of the current word in the text.

Returns:
index of the start of the current word in the text.
Throws:
WordNotFoundException - current word has not yet been set.

isNewSentence

public boolean isNewSentence()
Returns true if the current word is at the start of a sentence.

Returns:
true if the current word starts a sentence.
Throws:
WordNotFoundException - current word has not yet been set.
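The interface does not say how a sentence start is detected. One possible approach, sketched here under that assumption, uses the JDK's locale-sensitive java.text.BreakIterator sentence instance: a word starts a sentence if nothing but punctuation or whitespace lies between the enclosing sentence boundary and the word. The class SentenceStartDetector and its isNewSentence(int) method are hypothetical names, not part of Jazzy.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper: does the word starting at a given index begin a sentence?
class SentenceStartDetector {
    private final String text;
    private final BreakIterator sentences;

    SentenceStartDetector(String text, Locale locale) {
        this.text = text;
        this.sentences = BreakIterator.getSentenceInstance(locale);
        this.sentences.setText(text);
    }

    // wordStart must lie in [0, text.length()].
    boolean isNewSentence(int wordStart) {
        // Find the start of the sentence containing wordStart (0 is always a boundary).
        int sentenceStart = sentences.isBoundary(wordStart)
                ? wordStart
                : sentences.preceding(wordStart);
        // Allow leading quotes, brackets or spaces, but no earlier letter or digit.
        for (int i = sentenceStart; i < wordStart; i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "Hello world. Bye now.";
        SentenceStartDetector d = new SentenceStartDetector(text, Locale.ENGLISH);
        for (int start : new int[] {0, 6, 13, 17}) {
            System.out.println(start + " -> " + d.isNewSentence(start));
        }
    }
}
```

Scanning back from the boundary, rather than requiring the word to sit exactly on it, lets a word preceded by an opening quote or bracket still count as a sentence start.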

hasMoreWords

public boolean hasMoreWords()
Returns true if there are more words left.

Returns:
true if more words can be found in the text.

nextWord

public java.lang.String nextWord()
This returns the next word in the iteration. Note that an implementation should return the current word and then replace it with the next word found in the input text (if one exists).

Returns:
the next word in the iteration.
Throws:
WordNotFoundException - search string contains no more words.
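One reading of the note above is a one-word lookahead: the tokenizer always holds the word that nextWord() will hand out, returns it, and then advances the held word to the following one, so hasMoreWords() reduces to a null check. The sketch shows only that pattern, using java.util.regex.Matcher; the class LookaheadWordCursor and the letters-only word pattern are hypothetical, and WordNotFoundException is a local unchecked stand-in.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local unchecked stand-in for com.swabunga.spell.event.WordNotFoundException.
class WordNotFoundException extends RuntimeException {
    WordNotFoundException(String message) { super(message); }
}

// Hypothetical cursor that keeps one word of lookahead.
class LookaheadWordCursor {
    private static final Pattern WORD = Pattern.compile("\\p{L}+"); // assumed word rule
    private final Matcher matcher;
    private String held; // word the next nextWord() call returns, or null at the end

    LookaheadWordCursor(CharSequence text) {
        matcher = WORD.matcher(text);
        held = advance();
    }

    // Finds the following word, or returns null once the text is exhausted.
    private String advance() {
        return matcher.find() ? matcher.group() : null;
    }

    boolean hasMoreWords() { return held != null; }

    String nextWord() {
        if (held == null) throw new WordNotFoundException("search string contains no more words");
        String result = held; // return the held word...
        held = advance();     // ...then replace it with the next word found, if any
        return result;
    }

    public static void main(String[] args) {
        LookaheadWordCursor c = new LookaheadWordCursor("one, two; three");
        while (c.hasMoreWords()) System.out.println(c.nextWord());
    }
}
```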

replaceWord

public void replaceWord(java.lang.String newWord)
Replaces the current word token.

When a word is replaced, care should be taken that the WordTokenizer repositions itself so that the words that were added aren't rechecked. This is not mandatory, however; some applications may not need it.

Parameters:
newWord - the string which should replace the current word.
Throws:
WordNotFoundException - current word has not yet been set.
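To illustrate why repositioning matters, the sketch below replaces a hypothetical misspelling "alot" with "a lot" and records which words a checker visits. Resuming the scan at the start of the replaced word rechecks the inserted words "a" and "lot"; resuming at the start plus newWord.length() skips them. The class RepositionDemo, its visitedWords() method, and the letters-only word pattern are illustrative assumptions, not part of Jazzy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepositionDemo {
    private static final Pattern WORD = Pattern.compile("\\p{L}+"); // assumed word rule

    // Visits each word of input, replacing "alot" with "a lot" when it is found.
    // reposition == true resumes after the replacement; false resumes at its start.
    static List<String> visitedWords(String input, boolean reposition) {
        StringBuilder text = new StringBuilder(input);
        List<String> visited = new ArrayList<>();
        int scanFrom = 0;
        while (true) {
            Matcher m = WORD.matcher(text); // fresh matcher over the possibly edited text
            if (!m.find(scanFrom)) break;
            String word = m.group();
            int start = m.start();
            int end = m.end();
            visited.add(word);
            scanFrom = end;
            if (word.equals("alot")) {
                String replacement = "a lot";
                text.replace(start, end, replacement);
                scanFrom = reposition ? start + replacement.length() : start;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedWords("I like alot of tea", true));  // [I, like, alot, of, tea]
        System.out.println(visitedWords("I like alot of tea", false)); // [I, like, alot, a, lot, of, tea]
    }
}
```

If a replacement began with the very word it replaces, a tokenizer that rescans from the old start could keep finding and replacing that word indefinitely, which is one practical reason to reposition.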