com.globalphasing.startools
Class StarTokeniser

java.lang.Object
  extended by com.globalphasing.startools.StarTokeniser

public class StarTokeniser
extends Object

Specialised STAR-oriented tokeniser analogous to StringTokenizer. This class should not be compared to StreamTokenizer, because StarTokeniser knows nothing about numeric types, case (in)sensitivity of data values, etc. In the (mm)CIF world, this information is held in the (mm)CIF dictionary, so it should be handled by the parser itself, not by the tokeniser.
N.B. Handling of very large files by this library has not been characterised or tested in any way, and should not be relied on. Limitations on handling such files in the current implementation arise from the standard Java API itself, and include:

Author:
Peter Keller

Field Summary
static int ALLOW_BAD_CONSTRUCT
          If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_BAD_CONSTRUCT
static int ALLOW_SQUARE_BRACKET
          If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_SQUARE_BRACKET
 
Constructor Summary
StarTokeniser()
          Constructor for StarTokeniser class.
StarTokeniser(int flags)
          Constructor for StarTokeniser class.
 
Method Summary
 int getDebugLevel()
          Get debug level for this tokeniser.
 boolean hasMoreTokens()
          Tests if there are more tokens available from this tokeniser.
 StarToken nextToken()
          Get next STAR token.
 String quoteDataValue(CharSequence data, boolean semicolon)
          Checks whether the data parameter is a valid CIF value token.
 void setDebugLevel(int level)
          Set debug level for this tokeniser.
 void startMatching(CharSequence data)
          Prepare StarTokeniser instance to tokenise a string.
 void startMatching(File file)
          Prepare StarTokeniser instance to tokenise the contents of a file.
 void startMatching(File file, boolean map)
          Prepare StarTokeniser instance to tokenise the contents of a file.
 void startMatching(File file, int chunkSize)
          Prepare StarTokeniser instance to tokenise the contents of a file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ALLOW_SQUARE_BRACKET

public static final int ALLOW_SQUARE_BRACKET
If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_SQUARE_BRACKET

See Also:
Constant Field Values

ALLOW_BAD_CONSTRUCT

public static final int ALLOW_BAD_CONSTRUCT
If set, nextToken() returns a StarToken with a token type of StarTokenTypes.TOKEN_STRING instead of StarTokenTypes.TOKEN_BAD_CONSTRUCT

See Also:
Constant Field Values
Constructor Detail

StarTokeniser

public StarTokeniser()
Constructor for StarTokeniser class.


StarTokeniser

public StarTokeniser(int flags)
Constructor for StarTokeniser class.

Parameters:
flags - bitwise OR of the flags that control the operation of the tokeniser. The available flags are listed in the Field Summary.
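
As a sketch of what combining flags looks like: the flags are joined with the bitwise | operator and passed as a single int. The numeric values below are placeholders assumed for illustration (the real ones are on the Constant Field Values page, and are assumed here to be distinct single bits). FlagSketch and isSet are hypothetical names, not part of this library.

```java
class FlagSketch {
    // Placeholder values, assumed for illustration only; the real constants may differ.
    static final int ALLOW_SQUARE_BRACKET = 1;
    static final int ALLOW_BAD_CONSTRUCT = 2;

    // True if every bit of 'flag' is present in 'flags'.
    static boolean isSet(int flags, int flag) {
        return (flags & flag) == flag;
    }

    public static void main(String[] args) {
        // Combine both flags into the single int a constructor would receive.
        int flags = ALLOW_SQUARE_BRACKET | ALLOW_BAD_CONSTRUCT;
        System.out.println(isSet(flags, ALLOW_BAD_CONSTRUCT));
    }
}
```

With the real class, the corresponding call would be new StarTokeniser(StarTokeniser.ALLOW_SQUARE_BRACKET | StarTokeniser.ALLOW_BAD_CONSTRUCT).
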
Method Detail

getDebugLevel

public int getDebugLevel()
Get debug level for this tokeniser.

Returns:
debug level

setDebugLevel

public void setDebugLevel(int level)
Set debug level for this tokeniser.

Parameters:
level - the debug level; 0 means no debugging.

startMatching

public void startMatching(CharSequence data)
Prepare StarTokeniser instance to tokenise a string. This may be called on a StarTokeniser instance that has already been used to match other data: the previous state is lost and the object is reset with the new data.

Parameters:
data - CharSequence containing STAR/CIF/mmCIF data to be tokenised

startMatching

public void startMatching(File file,
                          boolean map)
Prepare StarTokeniser instance to tokenise the contents of a file. This may be called on a StarTokeniser instance that has already been used to match other data: the previous state is lost and the object is reset with the new data. This method matches the entire file contents against StarRegex.REGEX in a single operation.

Parameters:
file - Instance of File containing STAR data
map - true to use memory mapping/direct buffer; false to read the contents of the file into a buffer and parse that. Files longer than Integer.MAX_VALUE bytes (about 2 GB) are always memory mapped.

startMatching

public void startMatching(File file)
Prepare StarTokeniser instance to tokenise the contents of a file. This may be called on a StarTokeniser instance that has already been used to match other data: the previous state is lost and the object is reset with the new data.
This method invokes startMatching(java.io.File, int) with a chunk size of 1.
Normally, this is the most appropriate method for tokenising the contents of a file because, of all the startMatching methods, it provides the most accurate line numbers for the StarToken instances that are produced. Consider using one of the others if the I/O involved in reading STAR files becomes limiting.

Parameters:
file - Instance of File containing STAR data

startMatching

public void startMatching(File file,
                          int chunkSize)
Prepare StarTokeniser instance to tokenise the contents of a file. This may be called on a StarTokeniser instance that has already been used to match other data: the previous state is lost and the object is reset with the new data. This method reads lines from the file and matches them against StarRegex.REGEX in chunks.

Parameters:
file - Instance of File containing STAR data
chunkSize - minimum number of lines in a chunk. The actual number of lines may be greater than this, because a chunk never ends in the middle of a multi-line text token: reading continues to the end of that token. Do not pass Integer.MAX_VALUE as chunkSize in an attempt to process a large file in a single operation, since a multi-line text token may cause an attempt to read more than this number of lines. Use startMatching(java.io.File, boolean) to process the whole file at once.
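
The chunking rule described for chunkSize can be illustrated with a minimal sketch. It is not this library's implementation: ChunkSketch and chunk are hypothetical names, and the sketch makes the simplifying assumption that any line beginning with ';' opens or closes a multi-line text token.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSketch {
    // Groups lines into chunks of at least chunkSize lines, never closing a
    // chunk while inside a semicolon-delimited multi-line text token.
    static List<List<String>> chunk(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        boolean inText = false;
        for (String line : lines) {
            current.add(line);
            // Simplifying assumption: a leading ';' toggles the text-token state.
            if (line.startsWith(";")) {
                inText = !inText;
            }
            if (current.size() >= chunkSize && !inText) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        // Emit whatever is left over as a final, possibly short, chunk.
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

With a chunkSize of 2 and a four-line stretch that contains a two-line text token, the first chunk grows to four lines rather than being cut after the opening ';'.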

nextToken

public StarToken nextToken()
Get next STAR token.

Returns:
the next token, or null if no more tokens are found

hasMoreTokens

public boolean hasMoreTokens()
Tests if there are more tokens available from this tokeniser. If this method returns true, then a subsequent call to nextToken() will successfully return a token.

Returns:
true if and only if there is at least one token after the current position; false otherwise.
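
The two ways of detecting the end of input (hasMoreTokens() returning false, or nextToken() returning null) can be sketched with a stand-in. TokenSource, whitespaceSource and drain are hypothetical names introduced for illustration, and the stand-in splits on whitespace with java.util.StringTokenizer rather than applying STAR rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenLoopSketch {
    // Hypothetical stand-in mirroring the two documented methods; not a library type.
    interface TokenSource {
        boolean hasMoreTokens();
        String nextToken(); // returns null when exhausted, like the documented nextToken()
    }

    // Whitespace-splitting stand-in; real STAR tokenising is more involved.
    static TokenSource whitespaceSource(String data) {
        final StringTokenizer st = new StringTokenizer(data);
        return new TokenSource() {
            public boolean hasMoreTokens() { return st.hasMoreTokens(); }
            public String nextToken() { return st.hasMoreTokens() ? st.nextToken() : null; }
        };
    }

    // Collects every token by looping until nextToken() returns null.
    static List<String> drain(TokenSource source) {
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = source.nextToken()) != null) {
            tokens.add(token);
        }
        return tokens;
    }
}
```

With the real class, the same loop shape would apply after a call to startMatching: either test hasMoreTokens() before each nextToken(), or loop until nextToken() returns null.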

quoteDataValue

public String quoteDataValue(CharSequence data,
                             boolean semicolon)
                      throws IllegalArgumentException
Checks whether the data parameter is a valid CIF value token. If so, it is returned as is. Otherwise, this method attempts to turn it into a valid token, first by double-quoting, then by single-quoting, and then, if semicolon is set, by adding a ';' at the start and "\n;" at the end. If the returned value starts with a ; character, it is the responsibility of the caller to decide whether or not it should be output in the first column of a new line.

This method calls _startMatching(CharSequence), so it resets the state of the tokeniser.

This method is provided to support the writing rather than the reading of mmCIF data. It is convenient to implement it using the Tokeniser though, which is why it is here and not in some other class.

Parameters:
data - character sequence to be checked
semicolon - whether or not to try semicolon quoting
Returns:
data, quoted if necessary to form a valid CIF token
Throws:
IllegalArgumentException - if data cannot be turned into a STAR data value token.
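
The caller's responsibility for placing a value that starts with ';' might be handled along these lines. This is a sketch under assumptions: ValueWriterSketch and appendValue are hypothetical names, the quoted argument stands for a string already returned by quoteDataValue, and the data name in main is invented for the example.

```java
class ValueWriterSketch {
    // Appends an already-quoted value to the output. A value starting with ';'
    // is moved to the first column of a new line; anything else follows a space.
    static void appendValue(StringBuilder out, String quoted) {
        if (quoted.startsWith(";")) {
            // Start a new line unless the output is empty or already at one.
            if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
                out.append('\n');
            }
            // End the line after the closing ';' so the next item is separated from it.
            out.append(quoted).append('\n');
        } else {
            out.append(' ').append(quoted);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder("_example.text"); // hypothetical data name
        appendValue(out, ";first line\nsecond line\n;");
        System.out.print(out);
    }
}
```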

