java.lang.Object
  com.globalphasing.startools.StarTokeniser
public class StarTokeniser
Specialised STAR-oriented tokeniser analogous to StringTokenizer.

This class should not be compared to StreamTokenizer, since StarTokeniser knows nothing about numeric types, case (in)sensitivity of data values, etc. In the (mm)CIF world, this information is contained in the (mm)CIF dictionary, so it should be handled by the parser itself, not the tokeniser.
N.B. Handling of very large files by this library has not been characterised or tested in any way, and should not be relied on. Limitations on handling such files in the current implementation arise from the standard Java API itself, and include:

- startMatching(java.io.File, boolean) can memory map and start matching a file longer than Integer.MAX_VALUE characters (about 2 GB in character sets where 1 character is 1 byte). However, the Matcher.start() and Matcher.end() methods used by this library return int, and the behaviour of the Matcher class is unknown when matching continues beyond Integer.MAX_VALUE characters.
- In the startMatching(java.io.File) and startMatching(java.io.File, int) methods, LineNumberReader.getLineNumber() is used to keep track of the line number in the file. This method returns an int, and the behaviour of the LineNumberReader class when handling a file with more than Integer.MAX_VALUE lines is unknown.
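The int limits in the list above come from ordinary java.nio and java.util.regex calls. The following minimal sketch, which is not StarTokeniser's own code, memory maps a file, decodes it into a CharBuffer (a CharSequence that a Matcher can scan) and records match offsets, which Matcher.start() reports as int. The class name MappedMatchDemo, the whitespace-delimited pattern and the US-ASCII decoding are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of com.globalphasing.startools.
class MappedMatchDemo {

    // Memory maps 'file', decodes it as US-ASCII and returns the start
    // offset of every match of 'pattern'.
    static List<Integer> matchStarts(Path file, Pattern pattern) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single FileChannel.map region is limited to Integer.MAX_VALUE bytes,
            // so this sketch only handles files up to that size.
            MappedByteBuffer bytes = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            CharBuffer chars = StandardCharsets.US_ASCII.decode(bytes);
            Matcher m = pattern.matcher(chars);
            List<Integer> starts = new ArrayList<>();
            while (m.find()) {
                // Matcher.start() and Matcher.end() both return int.
                int start = m.start();
                starts.add(start);
            }
            return starts;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".cif");
        tmp.toFile().deleteOnExit();
        Files.writeString(tmp, "data_demo\n_cell.length_a 10.0\n");
        System.out.println(matchStarts(tmp, Pattern.compile("\\S+"))); // prints [0, 10, 25]
    }
}
```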
Field Summary
Type | Field | Description
---|---|---
static int | ALLOW_BAD_CONSTRUCT | If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_BAD_CONSTRUCT.
static int | ALLOW_SQUARE_BRACKET | If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_SQUARE_BRACKET.
Constructor Summary
Constructor | Description
---|---
StarTokeniser() | Constructor for StarTokeniser class.
StarTokeniser(int flags) | Constructor for StarTokeniser class, with OR-ed flags controlling the operation of the tokeniser.
Method Summary
Type | Method | Description
---|---|---
int | getDebugLevel() | Get debug level for this tokeniser.
boolean | hasMoreTokens() | Tests if there are more tokens available from this tokeniser.
StarToken | nextToken() | Get next STAR token.
String | quoteDataValue(CharSequence data, boolean semicolon) | Checks whether the data parameter is a valid CIF value token.
void | setDebugLevel(int level) | Set debug level for this tokeniser.
void | startMatching(CharSequence data) | Prepare StarTokeniser instance to tokenise a string.
void | startMatching(File file) | Prepare StarTokeniser instance to tokenise the contents of a file.
void | startMatching(File file, boolean map) | Prepare StarTokeniser instance to tokenise the contents of a file.
void | startMatching(File file, int chunkSize) | Prepare StarTokeniser instance to tokenise the contents of a file.
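Since StarTokeniser is described as analogous to StringTokenizer, a client would call one of the startMatching methods and then loop on hasMoreTokens() and nextToken(). The library itself is not reproduced here: the self-contained sketch below uses a hypothetical ToyTokeniser with the same method names and a much simplified regular expression (quoted strings or whitespace-delimited words only) standing in for StarRegex.REGEX, so that the calling pattern can be run without the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for StarTokeniser: same calling pattern, toy grammar.
class ToyTokeniser {
    // Single-quoted, double-quoted, or bare whitespace-delimited tokens.
    // The real STAR/CIF grammar is considerably more involved.
    private static final Pattern TOKEN =
            Pattern.compile("'[^'\\n]*'|\"[^\"\\n]*\"|[^\\s'\"]+");

    private Matcher matcher;
    private String next; // look-ahead token, or null when no tokens remain

    void startMatching(CharSequence data) {
        matcher = TOKEN.matcher(data);
        advance();
    }

    private void advance() {
        next = matcher.find() ? matcher.group() : null;
    }

    boolean hasMoreTokens() {
        return next != null;
    }

    String nextToken() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String token = next;
        advance();
        return token;
    }

    public static void main(String[] args) {
        ToyTokeniser t = new ToyTokeniser();
        t.startMatching("_atom_site.label_atom_id 'C 1' \"N 2\" O3");
        List<String> tokens = new ArrayList<>();
        while (t.hasMoreTokens()) {
            tokens.add(t.nextToken());
        }
        System.out.println(tokens); // prints [_atom_site.label_atom_id, 'C 1', "N 2", O3]
    }
}
```

Pre-fetching one token is what lets hasMoreTokens() answer without consuming anything, mirroring the StringTokenizer contract.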
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail

public static final int ALLOW_SQUARE_BRACKET
If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_SQUARE_BRACKET.

public static final int ALLOW_BAD_CONSTRUCT
If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_BAD_CONSTRUCT.
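These flags are combined with bitwise OR and passed to the StarTokeniser(int flags) constructor, for example new StarTokeniser(StarTokeniser.ALLOW_SQUARE_BRACKET | StarTokeniser.ALLOW_BAD_CONSTRUCT). The constants' actual values are not given on this page, so the sketch below uses a hypothetical FlagsDemo class with assumed single-bit values and a simplified TokenType enum in place of StarTokenTypes, to show how the documented behaviour of each flag can be expressed.

```java
// Hypothetical sketch of the documented flag behaviour; not the library's code.
class FlagsDemo {
    // Assumed single-bit values: the real constants' values are not documented here.
    static final int ALLOW_SQUARE_BRACKET = 1;
    static final int ALLOW_BAD_CONSTRUCT = 2;

    // Simplified stand-ins for StarTokenTypes.TOKEN_STRING etc.
    enum TokenType { STRING, SQUARE_BRACKET, BAD_CONSTRUCT }

    // Type that nextToken() would report for a token of type 'raw'
    // when the tokeniser was constructed with 'flags'.
    static TokenType reportedType(TokenType raw, int flags) {
        if (raw == TokenType.SQUARE_BRACKET && (flags & ALLOW_SQUARE_BRACKET) != 0) {
            return TokenType.STRING;
        }
        if (raw == TokenType.BAD_CONSTRUCT && (flags & ALLOW_BAD_CONSTRUCT) != 0) {
            return TokenType.STRING;
        }
        return raw;
    }

    public static void main(String[] args) {
        int flags = ALLOW_SQUARE_BRACKET; // ALLOW_BAD_CONSTRUCT deliberately not set
        System.out.println(reportedType(TokenType.SQUARE_BRACKET, flags)); // prints STRING
        System.out.println(reportedType(TokenType.BAD_CONSTRUCT, flags));  // prints BAD_CONSTRUCT
    }
}
```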
Constructor Detail

public StarTokeniser()
Constructor for StarTokeniser class.

public StarTokeniser(int flags)
Constructor for StarTokeniser class.
Parameters:
flags - OR-ed list of flags to control the operation of the tokeniser. Available flags are listed in the Field Summary.

Method Detail
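public int getDebugLevel()
Get debug level for this tokeniser.

public void setDebugLevel(int level)
Set debug level for this tokeniser.
Parameters:
level - 0 means no debugging.

public void startMatching(CharSequence data)
Prepare StarTokeniser instance to tokenise a string.
Parameters:
data - CharSequence containing STAR/CIF/mmCIF data to be tokenised

public void startMatching(File file, boolean map)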
Prepare StarTokeniser instance to tokenise the contents of a file. The contents of the file are matched against StarRegex.REGEX in a single operation.
Parameters:
file - Instance of File containing STAR data
map - true to use memory mapping/direct buffer; false to read the contents of the file into a buffer and parse that. (Files longer than Integer.MAX_VALUE bytes, about 2 GB, will always be memory mapped.)

public void startMatching(File file)
Prepare StarTokeniser instance to tokenise the contents of a file. Equivalent to calling startMatching(java.io.File, int) with a chunk size of 1. Of all the startMatching methods, this one provides the most accurate line numbers for the StarToken instances that are produced. Consider using one of the others if the I/O involved in reading STAR files becomes limiting.
Parameters:
file - Instance of File containing STAR data

public void startMatching(File file, int chunkSize)
Prepare StarTokeniser instance to tokenise the contents of a file. The contents of the file are matched against StarRegex.REGEX in chunks.
Parameters:
file - Instance of File containing STAR data
chunkSize - Minimum number of lines in a chunk. The actual number of lines may be greater than this, because a chunk never ends in the middle of multi-line text; instead, reading continues to the end of the multi-line text token.
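The chunkSize rule can be sketched with a LineNumberReader: keep reading lines until at least chunkSize have been read and no multi-line text field is open. The sketch assumes only the (mm)CIF convention that a multi-line text field opens and closes with a line whose first character is ';', and looks at nothing else. ChunkDemo and nextChunk are hypothetical names; StarTokeniser's own chunking code is not shown on this page and may differ.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunker illustrating the documented chunkSize rule.
class ChunkDemo {

    // Reads a chunk of at least chunkSize lines (fewer only at end of input),
    // extended so that it never ends inside a ';'-delimited text field.
    // Returns null when the reader is exhausted.
    static String nextChunk(LineNumberReader reader, int chunkSize) throws IOException {
        StringBuilder chunk = new StringBuilder();
        int linesRead = 0;
        boolean inTextField = false;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.append(line).append('\n');
            linesRead++;
            if (line.startsWith(";")) {
                inTextField = !inTextField; // opening or closing delimiter line
            }
            if (linesRead >= chunkSize && !inTextField) {
                break;
            }
        }
        return linesRead == 0 ? null : chunk.toString();
    }

    public static void main(String[] args) throws IOException {
        String cif = "data_x\n_a 1\n_b\n;\nline one\nline two\n;\n_c 3\n";
        LineNumberReader reader = new LineNumberReader(new StringReader(cif));
        List<String> chunks = new ArrayList<>();
        String chunk;
        while ((chunk = nextChunk(reader, 4)) != null) {
            chunks.add(chunk);
            // getLineNumber() is the int line counter mentioned in the large-file notes.
            System.out.println("chunk ends after line " + reader.getLineNumber());
        }
        System.out.println(chunks.size() + " chunks");
    }
}
```

With chunkSize 4, the first chunk grows to seven lines because the text field opened on line 4 only closes on line 7, which is why a chunk can exceed the requested size.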
This method should not be used with a chunkSize of Integer.MAX_VALUE in an attempt to process a large file in a single operation, since a multi-line text token may cause an attempt to read more than this number of lines. Use startMatching(java.io.File, boolean) to process the whole file at once.

public StarToken nextToken()
Get next STAR token.
public boolean hasMoreTokens()
Tests if there are more tokens available from this tokeniser. If this method returns true, a subsequent call to nextToken() will successfully return a token.
Returns:
true if and only if there is at least one token after the current position; false otherwise.

public String quoteDataValue(CharSequence data, boolean semicolon) throws IllegalArgumentException
Checks whether the data parameter is a valid CIF value token. If so, it is returned unchanged; otherwise this method attempts to turn it into one, first by double-quoting, then by single-quoting, then, if semicolon is set, by adding a ';' at the start and "\n;" at the end.
It is the responsibility of the caller to decide whether or not the returned value should be output in the first column of a new line if it starts with a ; character (a ';' only delimits multi-line text when it is the first character of a line).
This method calls _startMatching(CharSequence), so it resets the state of the tokeniser.
This method is provided to support the writing rather than the reading of mmCIF data. It is convenient to implement it using the tokeniser, though, which is why it is here and not in some other class.
Parameters:
data - character sequence to be checked
semicolon - whether or not to try semicolon quoting
Returns:
data, quoted if necessary to form a valid CIF token
Throws:
IllegalArgumentException - if data cannot be turned into a STAR data value token.
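The quoting order of quoteDataValue can be sketched without a tokeniser by checking each candidate form directly. The validity rules below are simplified assumptions based on CIF 1.1 syntax: a bare value must be non-empty, contain no whitespace, not start with one of _ # $ ' " ; [ ] and not be a reserved word; a quoted value may not contain a line break or its own quote character followed by a space or tab; a semicolon-delimited value may not contain a line starting with ';'. StarTokeniser instead validates candidates with its own tokeniser, so edge cases may differ. QuoteDemo and its methods are hypothetical names.

```java
import java.util.Locale;

// Hypothetical sketch of the quoting order described for quoteDataValue.
// Validity rules are simplified CIF 1.1 assumptions, not StarRegex.REGEX.
class QuoteDemo {

    static String quote(CharSequence data, boolean semicolon) {
        String s = data.toString();
        if (isBareValue(s)) {
            return s;
        }
        if (fitsInQuotes(s, '"')) {
            return '"' + s + '"';
        }
        if (fitsInQuotes(s, '\'')) {
            return '\'' + s + '\'';
        }
        // A text field ends at the first line starting with ';'.
        if (semicolon && !s.contains("\n;")) {
            return ";" + s + "\n;";
        }
        throw new IllegalArgumentException("cannot quote: " + s);
    }

    // Non-empty, no whitespace, no reserved first character, not a reserved word.
    static boolean isBareValue(String s) {
        if (s.isEmpty() || "_#$'\";[]".indexOf(s.charAt(0)) >= 0) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        String lower = s.toLowerCase(Locale.ROOT);
        return !(lower.startsWith("data_") || lower.startsWith("save_")
                || lower.equals("loop_") || lower.equals("global_") || lower.equals("stop_"));
    }

    // No line breaks, and the quote character is never followed by a space or
    // tab (which would end the quoted token early).
    static boolean fitsInQuotes(String s, char q) {
        if (s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0) {
            return false;
        }
        for (int i = 0; i < s.length() - 1; i++) {
            char next = s.charAt(i + 1);
            if (s.charAt(i) == q && (next == ' ' || next == '\t')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(quote("C1", false));    // prints C1
        System.out.println(quote("C 1", false));   // prints "C 1"
        System.out.println(quote("a\" b", false)); // prints 'a" b'
        System.out.println(quote("two\nlines", true));
    }
}
```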