WordTokenizer

com.swabunga.spell.event
Interface WordTokenizer

All Known Implementing Classes:: AbstractWordTokenizer, DocumentWordTokenizer

public interface WordTokenizer

An interface for objects that take a text-based medium as input and iterate through the words in the text stored in that medium. Examples of such media include Strings, Documents, Files and TextComponents.

After the object is instantiated, and before the first call to nextWord() is made, the following methods should throw a WordNotFoundException:
getCurrentWordEnd(), getCurrentWordPosition(), isNewSentence() and replaceWord().

A call to nextWord() when hasMoreWords() returns false should throw a WordNotFoundException.
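The contract above can be illustrated with a minimal sketch. The class `ContractSketchTokenizer` below is hypothetical and is not one of the library's implementing classes; it covers only part of the interface, treats a word as a run of letters, and uses a nested stand-in for WordNotFoundException that is assumed to be an unchecked exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical String-backed tokenizer illustrating the contract only.
class ContractSketchTokenizer {

    // Stand-in for the library's WordNotFoundException (assumed unchecked).
    static class WordNotFoundException extends RuntimeException {
        WordNotFoundException(String message) { super(message); }
    }

    private final Matcher matcher;
    private boolean hasNext;        // true while the matcher sits on an unreturned word
    private int currentStart = -1;  // -1 means no current word has been set yet

    ContractSketchTokenizer(String text) {
        // Assumption: a "word" is any run of Unicode letters.
        matcher = Pattern.compile("\\p{L}+").matcher(text);
        hasNext = matcher.find();
    }

    public boolean hasMoreWords() {
        return hasNext;
    }

    public String nextWord() {
        if (!hasNext) {
            throw new WordNotFoundException("No more words found.");
        }
        String word = matcher.group();
        currentStart = matcher.start();
        hasNext = matcher.find();   // look ahead for the following word
        return word;
    }

    public int getCurrentWordPosition() {
        if (currentStart < 0) {
            throw new WordNotFoundException("Current word has not yet been set.");
        }
        return currentStart;
    }

    public static void main(String[] args) {
        ContractSketchTokenizer tokenizer = new ContractSketchTokenizer("Hello wrold");
        while (tokenizer.hasMoreWords()) {
            String word = tokenizer.nextWord();
            System.out.println(word + " starts at " + tokenizer.getCurrentWordPosition());
        }
    }
}
```

A caller that always guards nextWord() with hasMoreWords(), as in the loop in `main`, never triggers either exception.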

Author:: Jason Height

Method Summary
`java.lang.String`	`getContext()` Returns the context text that is being tokenized (should include any changes that have been made).
`int`	`getCurrentWordCount()` Returns the number of word tokens that have been processed thus far.
`int`	`getCurrentWordEnd()` Returns an index representing the end location of the current word in the text.
`int`	`getCurrentWordPosition()` Returns an index representing the start location of the current word in the text.
`boolean`	`hasMoreWords()` Returns true if there are more words left.
`boolean`	`isNewSentence()` Returns true if the current word is at the start of a sentence.
`java.lang.String`	`nextWord()` Returns the next word in the iteration.
`void`	`replaceWord(java.lang.String newWord)` Replaces the current word token. When a word is replaced, care should be taken that the WordTokenizer repositions itself so that the words that were added are not rechecked.

Method Detail

getContext

public java.lang.String getContext()

Returns the context text that is being tokenized (should include any changes that have been made).

Returns:: the text being searched.

getCurrentWordCount

public int getCurrentWordCount()

Returns the number of word tokens that have been processed thus far.

Returns:: the number of words found so far.

getCurrentWordEnd

public int getCurrentWordEnd()

Returns an index representing the end location of the current word in the text.

Returns:: index of the end of the current word in the text.
Throws:: WordNotFoundException - current word has not yet been set.

getCurrentWordPosition

public int getCurrentWordPosition()

Returns an index representing the start location of the current word in the text.

Returns:: index of the start of the current word in the text.
Throws:: WordNotFoundException - current word has not yet been set.

isNewSentence

public boolean isNewSentence()

Returns true if the current word is at the start of a sentence.

Returns:: true if the current word starts a sentence.
Throws:: WordNotFoundException - current word has not yet been set.

hasMoreWords

public boolean hasMoreWords()

Returns true if there are more words left.

Returns:: true if more words can be found in the text.

nextWord

public java.lang.String nextWord()

Returns the next word in the iteration. Note that any implementation should return the current word and then replace the current word with the next word found in the input text (if one exists).

Returns:: the next word in the iteration.
Throws:: WordNotFoundException - search string contains no more words.
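One way to read the note above is as a look-ahead scheme: the tokenizer locates a word in advance, hands it out on nextWord(), and then immediately locates the following one, so that hasMoreWords() is a cheap check. The sketch below shows that reading under stated assumptions. `LookAheadSketch` and its helper `locateFrom` are hypothetical names, a word is assumed to be a run of letters, and the nested exception is a stand-in assumed to be unchecked.

```java
// Hypothetical look-ahead tokenizer over a String; not the library's implementation.
class LookAheadSketch {

    // Stand-in for the library's WordNotFoundException (assumed unchecked).
    static class WordNotFoundException extends RuntimeException {
        WordNotFoundException(String message) { super(message); }
    }

    private final String text;
    private int lookAheadStart;  // start of the pre-located word, or -1 if none
    private int lookAheadEnd;    // exclusive end of the pre-located word
    private int wordCount = 0;

    LookAheadSketch(String text) {
        this.text = text;
        locateFrom(0);
    }

    // Scans forward from 'from' and records the bounds of the next run of letters.
    private void locateFrom(int from) {
        int start = from;
        while (start < text.length() && !Character.isLetter(text.charAt(start))) {
            start++;
        }
        if (start == text.length()) {
            lookAheadStart = -1;
            lookAheadEnd = -1;
            return;
        }
        int end = start;
        while (end < text.length() && Character.isLetter(text.charAt(end))) {
            end++;
        }
        lookAheadStart = start;
        lookAheadEnd = end;
    }

    public boolean hasMoreWords() {
        return lookAheadStart >= 0;
    }

    public String nextWord() {
        if (!hasMoreWords()) {
            throw new WordNotFoundException("Search string contains no more words.");
        }
        String word = text.substring(lookAheadStart, lookAheadEnd);
        wordCount++;
        locateFrom(lookAheadEnd);  // replace the look-ahead with the following word
        return word;
    }

    public int getCurrentWordCount() {
        return wordCount;
    }
}
```

Because the search for the following word happens inside nextWord(), trailing punctuation or whitespace never causes hasMoreWords() to report a word that does not exist.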

replaceWord

public void replaceWord(java.lang.String newWord)

Replaces the current word token.

When a word is replaced, care should be taken that the WordTokenizer repositions itself so that the words that were added are not rechecked. This is not mandatory; some applications may not need this behavior.

Parameters:: newWord - the string which should replace the current word.
Throws:: WordNotFoundException - current word has not yet been set.
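The repositioning described for replaceWord can be sketched as follows. This is an illustration under assumptions rather than the library's code: `ReplaceSketch`, `scanFrom` and `nextLetter` are hypothetical names, a word is assumed to be a run of letters, and the nested exception is a stand-in assumed to be unchecked. The key step is moving the scan position to the end of the replacement text.

```java
// Hypothetical tokenizer that supports replacement over a mutable buffer.
class ReplaceSketch {

    // Stand-in for the library's WordNotFoundException (assumed unchecked).
    static class WordNotFoundException extends RuntimeException {
        WordNotFoundException(String message) { super(message); }
    }

    private final StringBuilder text;
    private int currentStart = -1;  // -1 means no current word has been set yet
    private int currentEnd = -1;    // exclusive end of the current word
    private int scanFrom = 0;       // where the search for the next word resumes

    ReplaceSketch(String text) {
        this.text = new StringBuilder(text);
    }

    // Returns the index of the first letter at or after 'from', or text.length().
    private int nextLetter(int from) {
        int i = from;
        while (i < text.length() && !Character.isLetter(text.charAt(i))) {
            i++;
        }
        return i;
    }

    public boolean hasMoreWords() {
        return nextLetter(scanFrom) < text.length();
    }

    public String nextWord() {
        int start = nextLetter(scanFrom);
        if (start >= text.length()) {
            throw new WordNotFoundException("Search string contains no more words.");
        }
        int end = start;
        while (end < text.length() && Character.isLetter(text.charAt(end))) {
            end++;
        }
        currentStart = start;
        currentEnd = end;
        scanFrom = end;
        return text.substring(start, end);
    }

    public void replaceWord(String newWord) {
        if (currentStart < 0) {
            throw new WordNotFoundException("Current word has not yet been set.");
        }
        text.replace(currentStart, currentEnd, newWord);
        currentEnd = currentStart + newWord.length();
        scanFrom = currentEnd;  // resume after the replacement so it is not rechecked
    }

    public String getContext() {
        return text.toString();  // reflects any replacements made so far
    }
}
```

For example, after `nextWord()` returns `teh` from the text `teh cat` and the caller invokes `replaceWord("the big")`, the next call to `nextWord()` in this sketch returns `cat`, and `getContext()` returns the updated text.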