public interface Word extends WordSequenceContent
Encapsulates a natural-language word and provides information about the hyphenation break possibilities in that word. This interface actually presents two APIs: one gives access to primitive hyphenation data, and the other provides access to the content of the various layout options for the word.
The primitive methods expose a conceptual array of hyphenation points for the encapsulated word.
The count of those points is obtained from qtyHyphenationPoints(HyphenationFilter).
The remaining methods expose some piece of the content of an array element.
The method getOffset(HyphenationFilter, int) indicates the offset to the character in the word at which the break information exists.
The methods getWeight(HyphenationFilter, int), getLiangWeight(HyphenationFilter, int), and getMutatingBreak(HyphenationFilter, int) provide information about the break itself.
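As an illustration of the primitive API, the following minimal sketch walks the conceptual array of hyphenation points and collects the offsets of the usable ones. The `HyphenationFilter` and `PrimitiveWord` types below are hypothetical stand-ins declared only so the example compiles on its own (the real `HyphenationFilter` members are not described here), and `HyphenationPointScanner` is an invented helper name. It assumes that a positive weight marks a point where a break is allowed, as in the weight table for getLiangWeight(HyphenationFilter, int).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in; the real HyphenationFilter is defined elsewhere in the API. */
interface HyphenationFilter { }

/** Hypothetical stand-in that mirrors only the primitive methods of Word used below. */
interface PrimitiveWord {
    int qtyHyphenationPoints(HyphenationFilter filter);
    int getOffset(HyphenationFilter filter, int pointIndex);
    int getWeight(HyphenationFilter filter, int pointIndex);
}

/** Invented helper, not part of the documented API. */
final class HyphenationPointScanner {
    private HyphenationPointScanner() { }

    /** Returns the offsets of all hyphenation points whose weight is positive. */
    static List<Integer> allowableOffsets(PrimitiveWord word, HyphenationFilter filter) {
        List<Integer> offsets = new ArrayList<>();
        int qty = word.qtyHyphenationPoints(filter);
        for (int pointIndex = 0; pointIndex < qty; pointIndex++) {
            // Assumption: weights above zero mean "allowable" or better.
            if (word.getWeight(filter, pointIndex) > 0) {
                offsets.add(word.getOffset(filter, pointIndex));
            }
        }
        return offsets;
    }
}
```

A client would pass its own Word implementation and filter; the same filter should be used for every call so that the point indexes stay consistent.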
The layout-oriented methods expose a conceptual array of layout options, each of which has one or more segments (i.e. parts of the word). The segments in turn expose the chars that they contain.
Computing and exposing the actual layout options gives client applications the ability to loop through the options, segments, and segment content without having to compute sizes, provide for the special case of using the whole word, or make any transformations needed by MutatingHyphenationBreaks.
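The nested loop over options, segments, and chars can be sketched as follows. The `HyphenationFilter` and `LayoutWord` types are hypothetical stand-ins that mirror only the layout-oriented method signatures listed in this interface, and `LayoutOptionReader` is an invented helper name; none of them is part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in; the real HyphenationFilter is defined elsewhere in the API. */
interface HyphenationFilter { }

/** Hypothetical stand-in that mirrors only the layout-oriented methods of Word. */
interface LayoutWord {
    int qtyLayoutOptions(HyphenationFilter filter);
    int qtySegments(HyphenationFilter filter, int layoutOptionIndex);
    int getSegmentLength(HyphenationFilter filter, int layoutOptionIndex, int segmentIndex);
    char getSegmentChar(HyphenationFilter filter, int layoutOptionIndex, int segmentIndex,
            int charIndex);
}

/** Invented helper, not part of the documented API. */
final class LayoutOptionReader {
    private LayoutOptionReader() { }

    /** Builds, for each layout option, the list of its segments as strings. */
    static List<List<String>> readAll(LayoutWord word, HyphenationFilter filter) {
        List<List<String>> options = new ArrayList<>();
        int qtyOptions = word.qtyLayoutOptions(filter);
        for (int option = 0; option < qtyOptions; option++) {
            List<String> segments = new ArrayList<>();
            int qtySegments = word.qtySegments(filter, option);
            for (int segment = 0; segment < qtySegments; segment++) {
                int length = word.getSegmentLength(filter, option, segment);
                StringBuilder chars = new StringBuilder(length);
                for (int charIndex = 0; charIndex < length; charIndex++) {
                    chars.append(word.getSegmentChar(filter, option, segment, charIndex));
                }
                segments.add(chars.toString());
            }
            options.add(segments);
        }
        return options;
    }
}
```

Copying every segment into a String is done here only for readability; a layout engine that cares about memory could read the chars directly from the word instead.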
Design Note: It is tempting to make the hyphenation point more object-oriented, with a class of its own. This might make the data slightly easier to use, but potentially at a substantial cost in memory consumption and performance. Because hyphenation data is naturally static, effort has been made to allow implementations as much flexibility as possible in how they store and use that data.
Modifier and Type | Method and Description |
---|---|
int | getLiangWeight(HyphenationFilter filter, int pointIndex): For a given hyphenation point, returns the Liang-style weight of that hyphenation point. |
MutatingHyphenationBreak | getMutatingBreak(HyphenationFilter filter, int pointIndex): Provides detailed information about a hyphenation break opportunity that, if taken, changes the content of the word. |
java.lang.CharSequence | getNormalizedWord(): Returns the normalized character content of the encapsulated word. |
int | getOffset(HyphenationFilter filter, int pointIndex): For a given hyphenation point, returns the offset into the word characters where that point exists. |
char | getSegmentChar(HyphenationFilter filter, int layoutOptionIndex, int segmentIndex, int charIndex): Returns a given char within a segment. |
int | getSegmentLength(HyphenationFilter filter, int layoutOptionIndex, int segmentIndex): Returns the number of chars in a given segment. |
int | getWeight(HyphenationFilter filter, int pointIndex): For a given hyphenation point, returns the weight of that hyphenation point. |
int | qtyHyphenationPoints(HyphenationFilter filter): Returns the number of hyphenation points in the word, as filtered. |
int | qtyLayoutOptions(HyphenationFilter filter): Returns the number of feasible layout options in the word, assuming that the word can be broken no more than once. |
int | qtySegments(HyphenationFilter filter, int layoutOptionIndex): Returns the number of segments for a given layout option. |
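java.lang.CharSequence getNormalizedWord()
Returns the normalized character content of the encapsulated word.
MutatingHyphenationBreak instances in the word.
The normalized form is the one that is all lowercase.
int qtyHyphenationPoints(HyphenationFilter filter)
Returns the number of hyphenation points in the word, as filtered.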
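Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
int getOffset(HyphenationFilter filter, int pointIndex)
For a given hyphenation point, returns the offset into the word characters where that point exists.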
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
pointIndex - The index into the conceptual array of hyphenation points for this word.
int getWeight(HyphenationFilter filter, int pointIndex)
For a given hyphenation point, returns the weight of that hyphenation point.
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
pointIndex - The index into the conceptual array of hyphenation points for this word.
See Also:
getLiangWeight(HyphenationFilter, int)
int getLiangWeight(HyphenationFilter filter, int pointIndex)
For a given hyphenation point, returns the Liang-style weight of that hyphenation point.
This provides the same information as getWeight(HyphenationFilter, int), except that the values returned use the Liang input scheme.
The Liang input scheme uses values between 0 and 5 (we exclude the 0 values since they are implied).
We have expanded the range to between 0 and 9.
Odd values indicate possible hyphenation points, even values (including 0) indicate points that should not be
hyphenated.
Higher numbers indicate a greater magnitude of "goodness" for odd numbers, and a greater magnitude of "badness"
for even numbers.
This method is included for backward compatibility with systems that may already be dependent on the Liang input
scheme.
(The Liang input scheme was probably optimized for input and point selection efficiency, but is somewhat
counter-intuitive for an application that is evaluating the results).
The following table compares the values returned by the two methods. Note that, except for the difference between "good" and "bad", the various values have no absolute meanings, but only meanings relative to the others. Implementations are free to use some subset of the possible return values. For example, some implementations may return Liang values in the range 0 to 5, and others may return them in the range 0 to 9.
Description | Weight | Liang Input Scheme |
---|---|---|
better than below | 5 | 9 |
better than below | 4 | 7 |
better than below | 3 | 5 |
better than below | 2 | 3 |
allowable | 1 | 1 |
avoid | 0 | 0 |
worse than above | -1 | 2 |
worse than above | -2 | 4 |
worse than above | -3 | 6 |
worse than above | -4 | 8 |
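The table above implies a simple arithmetic relationship between the two scales: positive weights map to odd Liang values, and zero or negative weights map to even ones. The sketch below encodes that relationship. `LiangWeights` is a hypothetical helper, not part of this API, and it assumes that an implementation follows the table exactly over the full ranges of -4 to 5 and 0 to 9.

```java
/** Hypothetical helper that converts between the two weight scales in the table. */
final class LiangWeights {
    private LiangWeights() { }

    /** Converts a weight in the range -4..5 to its Liang-scheme value in 0..9. */
    static int toLiang(int weight) {
        if (weight < -4 || weight > 5) {
            throw new IllegalArgumentException("weight out of range: " + weight);
        }
        // Positive weights become odd values (1, 3, 5, 7, 9);
        // zero and negative weights become even values (0, 2, 4, 6, 8).
        return weight > 0 ? 2 * weight - 1 : -2 * weight;
    }

    /** Converts a Liang-scheme value in the range 0..9 back to a weight in -4..5. */
    static int fromLiang(int liang) {
        if (liang < 0 || liang > 9) {
            throw new IllegalArgumentException("Liang value out of range: " + liang);
        }
        // Odd values are break opportunities; even values are points to avoid.
        return liang % 2 == 1 ? (liang + 1) / 2 : -liang / 2;
    }
}
```

Because the values have only relative meaning, a client comparing points should stay within one scale rather than mixing converted and unconverted values.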
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
pointIndex - The index into the conceptual array of hyphenation points for this word.
See Also:
getWeight(HyphenationFilter, int)
MutatingHyphenationBreak getMutatingBreak(HyphenationFilter filter, int pointIndex)
Provides detailed information about a hyphenation break opportunity that, if taken, changes the content of the word.
In most cases, the offsets (provided by getOffset(HyphenationFilter, int)) and values (provided by getWeight(HyphenationFilter, int) and getLiangWeight(HyphenationFilter, int)) are sufficient to tell client applications everything they need to make hyphenation decisions.
MutatingHyphenationBreak provides extra information for certain "hard" cases.
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
pointIndex - The index into the conceptual array of hyphenation points for this word.
Returns:
The MutatingHyphenationBreak instance for the hyphen break opportunity at pointIndex, or null if there is none.
int qtyLayoutOptions(HyphenationFilter filter)
Returns the number of feasible layout options in the word, assuming that the word can be broken no more than once.
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
int qtySegments(HyphenationFilter filter, int layoutOptionIndex)
Returns the number of segments for a given layout option.
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
layoutOptionIndex - The index to the layout option being queried.
int getSegmentLength(HyphenationFilter filter, int layoutOptionIndex, int segmentIndex)
Returns the number of chars in a given segment.
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
layoutOptionIndex - The index to the layout option being queried.
segmentIndex - The index to the segment (within the layout option) that is being queried.
char getSegmentChar(HyphenationFilter filter, int layoutOptionIndex, int segmentIndex, int charIndex)
Returns a given char within a segment.
Parameters:
filter - The filter to be used to determine which hyphenation points are of interest.
layoutOptionIndex - The index to the layout option being queried.
segmentIndex - The index to the segment (within the layout option) that is being queried.
charIndex - The index to the char (within the segment) that is being queried.
Returns:
The char at charIndex in the segment.
This documentation was created 2017-01-24 at 21:26 GMT by The aXSL Group and may be freely copied. See license for details.