com.globalphasing.startools
Class StarTokenTypes

java.lang.Object
  extended by com.globalphasing.startools.StarTokenTypes

public class StarTokenTypes
extends Object

Class containing static final constants required for tokenising STAR data, and static methods for interpreting those constants.

Author:
Peter Keller

Field Summary
static int EOF
          Symbolic constant representing the end of a file or of character data.
static int TOKEN_BAD_CONSTRUCT
          Token: Token that violates STAR rules on privileged constructs.
static int TOKEN_BAD_TOKEN
          Token: Catch-all token for a sequence of non-whitespace characters.
static int TOKEN_COMMENT
          Token: Comment
static int TOKEN_DATA_BLOCK
          Token: Data block header
static int TOKEN_DATA_NAME
          Token: Data name
static int TOKEN_DQUOTE_STRING
          Token: Double-quoted string
static int TOKEN_GLOBAL
          Token: STAR global block (forbidden in CIF's).
static int TOKEN_LOOP
          Token: Loop initiator
static int TOKEN_LOOP_STOP
          Token: Loop terminator (forbidden in CIF's)
static int TOKEN_MULTILINE
          Token: Multiline text.
static int TOKEN_NULL
          Token: CIF null value (unquoted . character)
static int TOKEN_SAVE_FRAME
          Token: STAR save frame header or terminator (forbidden in CIF's)
static int TOKEN_SAVE_FRAME_REF
          Token: STAR save frame reference (forbidden in CIF's)
static int TOKEN_SQUARE_BRACKET
          Token: Token that starts with [ or ].
static int TOKEN_SQUOTE_STRING
          Token: Single-quoted string
static int TOKEN_STRING
          Token: Non-quoted string
static int TOKEN_UNKNOWN
          Token: CIF unknown value.
 
Method Summary
static boolean dataToken(int token_type)
          Returns true if token_type represents one of the tokens that represent a STAR data value (as opposed to a data block header, data name, etc.)
static boolean starErrorToken(int token_type)
          Returns true if token_type represents a token that is syntactically invalid according to the STAR standard.
static boolean starOnlyToken(int token_type)
          Returns true if token_type represents a token that is valid in STAR but not allowed in CIFs or mmCIFs.
static String tokenTypeAsString(int token_type)
          Returns the string representation of a numerical token type
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TOKEN_MULTILINE

public static final int TOKEN_MULTILINE
Token: Multiline text.

See Also:
Constant Field Values

TOKEN_COMMENT

public static final int TOKEN_COMMENT
Token: Comment

See Also:
Constant Field Values

TOKEN_GLOBAL

public static final int TOKEN_GLOBAL
Token: STAR global block (forbidden in CIF's).

See Also:
Constant Field Values

TOKEN_SAVE_FRAME

public static final int TOKEN_SAVE_FRAME
Token: STAR save frame header or terminator (forbidden in CIF's)

See Also:
Constant Field Values

TOKEN_SAVE_FRAME_REF

public static final int TOKEN_SAVE_FRAME_REF
Token: STAR save frame reference (forbidden in CIF's)

See Also:
Constant Field Values

TOKEN_LOOP_STOP

public static final int TOKEN_LOOP_STOP
Token: Loop terminator (forbidden in CIF's)

See Also:
Constant Field Values

TOKEN_DATA_BLOCK

public static final int TOKEN_DATA_BLOCK
Token: Data block header

See Also:
Constant Field Values

TOKEN_LOOP

public static final int TOKEN_LOOP
Token: Loop initiator

See Also:
Constant Field Values

TOKEN_BAD_CONSTRUCT

public static final int TOKEN_BAD_CONSTRUCT
Token: Token that violates STAR rules on privileged constructs.
The STAR standard specifies that all unquoted strings that start with one of the following character sequences are privileged constructs:

data_, loop_, global_, save_, stop_

Of these, data_ must not be followed by whitespace, and loop_, global_ and stop_ must be followed by whitespace. (save_ may or may not be followed by whitespace.) The production of this token indicates that these rules have been violated.

See: Specification of the STAR file, Hall, S.R. and Spadaccini, N., Int. Tab. vol. G § 2.1.3.10 (Springer 2005)

On the other hand, the CIF specification implies that unquoted tokens that start with one of loop_, global_ or stop_ and are followed by at least one more non-whitespace character may be treated as non-quoted string values.

See: Specification of the Crystallographic Information File (CIF), Hall, S.R., Westbrook, J.D. et al., Int. Tab. vol. G § 2.2.7.1.4, paragraphs (9), (10) and (11)

By default, an instance of StarTokeniser returns this token type for such a bad construct. Set StarTokeniser.ALLOW_BAD_CONSTRUCT in the StarTokeniser.StarTokeniser(int) constructor to return TOKEN_STRING instead for non-quoted strings that start with loop_, global_ or stop_.
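
The whitespace rules above can be illustrated with a minimal sketch. PrivilegedConstructSketch and its Verdict enum are hypothetical names that are not part of startools, and the sketch does not reproduce how StarTokeniser is implemented. It assumes the token has already been split on whitespace (so "followed by whitespace" means nothing trails the prefix) and that prefixes are matched case-insensitively.

```java
import java.util.Locale;

/** Hypothetical helper, not part of startools: applies the privileged-construct rules. */
final class PrivilegedConstructSketch {

    /** Possible outcomes for a single whitespace-delimited, unquoted token. */
    enum Verdict { NOT_PRIVILEGED, VALID, BAD_CONSTRUCT }

    /**
     * Classifies one unquoted token. The token is assumed to contain no
     * whitespace, so "followed by whitespace" means the token ends right
     * after the privileged prefix.
     */
    static Verdict check(String token) {
        // Assumption: prefixes are matched case-insensitively.
        String t = token.toLowerCase(Locale.ROOT);
        if (t.startsWith("data_")) {
            // data_ must not be followed by whitespace: more characters are required.
            return t.length() > "data_".length() ? Verdict.VALID : Verdict.BAD_CONSTRUCT;
        }
        for (String keyword : new String[] {"loop_", "global_", "stop_"}) {
            if (t.startsWith(keyword)) {
                // These must be followed by whitespace: nothing may trail the keyword.
                return t.equals(keyword) ? Verdict.VALID : Verdict.BAD_CONSTRUCT;
            }
        }
        if (t.startsWith("save_")) {
            // save_ may or may not be followed by whitespace.
            return Verdict.VALID;
        }
        return Verdict.NOT_PRIVILEGED;
    }
}
```

In this sketch, a token such as loop_x yields BAD_CONSTRUCT; that is the case for which StarTokeniser.ALLOW_BAD_CONSTRUCT would return TOKEN_STRING instead.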

See Also:
Constant Field Values

TOKEN_DATA_NAME

public static final int TOKEN_DATA_NAME
Token: Data name

See Also:
Constant Field Values

TOKEN_SQUOTE_STRING

public static final int TOKEN_SQUOTE_STRING
Token: Single-quoted string

See Also:
Constant Field Values

TOKEN_DQUOTE_STRING

public static final int TOKEN_DQUOTE_STRING
Token: Double-quoted string

See Also:
Constant Field Values

TOKEN_NULL

public static final int TOKEN_NULL
Token: CIF null value (unquoted . character)

See Also:
Constant Field Values

TOKEN_UNKNOWN

public static final int TOKEN_UNKNOWN
Token: CIF unknown value. This production means that the token consists of an unquoted ? character, which the CIF standard defines as representing an unknown value. It does not mean that the contents of the token couldn't be determined.
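
As a rough illustration of the distinction between the null and unknown values, the sketch below classifies a token by its text and by whether it was quoted. CifSpecialValueSketch and Kind are hypothetical names, not part of startools; the sketch assumes that a quoted . or ? is an ordinary string, which follows from the word "unquoted" in the descriptions but is not stated explicitly here.

```java
/** Hypothetical helper, not part of startools. */
final class CifSpecialValueSketch {

    /** The two CIF special values, plus everything else. */
    enum Kind { NULL_VALUE, UNKNOWN_VALUE, ORDINARY }

    /**
     * @param text   the token text, with any quote delimiters already removed
     * @param quoted whether the token was quoted in the source
     */
    static Kind classify(String text, boolean quoted) {
        if (!quoted && text.equals(".")) {
            return Kind.NULL_VALUE;      // corresponds to TOKEN_NULL
        }
        if (!quoted && text.equals("?")) {
            return Kind.UNKNOWN_VALUE;   // corresponds to TOKEN_UNKNOWN
        }
        // Assumption: a quoted "." or "?" is just a string value.
        return Kind.ORDINARY;
    }
}
```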

See Also:
Constant Field Values

TOKEN_SQUARE_BRACKET

public static final int TOKEN_SQUARE_BRACKET
Token: Token that starts with [ or ].

These characters are reserved in the CIF specification for future use in delimiting multi-line text. It is unclear whether or how the STAR specification would change if this notation were brought into use, or whether the mmCIF standard would also change as a result.

By default, an instance of StarTokeniser will return this token type for a token that starts with [ or ].

Set StarTokeniser.ALLOW_SQUARE_BRACKET in the StarTokeniser.StarTokeniser(int) constructor to return TOKEN_STRING instead.

See: Specification of the Crystallographic Information File (CIF), Hall, S.R., Westbrook, J.D. et al., Int. Tab. vol. G § 2.2.7.1.4(19).
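
The default and relaxed behaviours can be sketched as follows. This is an illustration only: SquareBracketSketch is a hypothetical class, the value 1 chosen for its ALLOW_SQUARE_BRACKET constant is a placeholder (the real value of StarTokeniser.ALLOW_SQUARE_BRACKET is not given on this page), and treating the int constructor argument as a bit mask is an assumption.

```java
/** Hypothetical helper, not part of startools. */
final class SquareBracketSketch {

    // Placeholder option bit; the real constant's value is not documented here.
    static final int ALLOW_SQUARE_BRACKET = 1;

    enum Kind { SQUARE_BRACKET, STRING }

    /**
     * Classifies an unquoted token that is otherwise an ordinary string.
     * Assumption: options is a bit mask, tested with a bitwise AND.
     */
    static Kind classifyUnquoted(String token, int options) {
        boolean startsWithBracket = token.startsWith("[") || token.startsWith("]");
        boolean allowed = (options & ALLOW_SQUARE_BRACKET) != 0;
        // Default: report the reserved leading bracket; relaxed: plain string.
        return (startsWithBracket && !allowed) ? Kind.SQUARE_BRACKET : Kind.STRING;
    }
}
```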

See Also:
Constant Field Values

TOKEN_STRING

public static final int TOKEN_STRING
Token: Non-quoted string

See Also:
Constant Field Values

TOKEN_BAD_TOKEN

public static final int TOKEN_BAD_TOKEN
Token: Catch-all token for a sequence of non-whitespace characters.

If this token is returned, it means that none of the other tokens produced a match. This indicates a STAR syntax error, such as a single/double quoted string or multi-line text token that is missing its closing delimiter.

See Also:
Constant Field Values

EOF

public static final int EOF
Symbolic constant representing the end of a file or of character data.

See Also:
Constant Field Values

Method Detail

tokenTypeAsString

public static String tokenTypeAsString(int token_type)
Returns the string representation of a numerical token type

Parameters:
token_type - numerical token type
Returns:
string representation of token type

starOnlyToken

public static boolean starOnlyToken(int token_type)
Returns true if token_type represents a token that is valid in STAR but not allowed in CIFs or mmCIFs.

Parameters:
token_type - numerical token type
Returns:
Whether or not token_type is only valid in a STAR context.

dataToken

public static boolean dataToken(int token_type)
Returns true if token_type represents one of the tokens that represent a STAR data value (as opposed to a data block header, data name, etc.)

Parameters:
token_type - numerical token type
Returns:
Whether or not token_type is a data value
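
One possible use of this predicate is to pick the data values out of a token stream. The sketch below is a hypothetical helper, not part of startools. It takes the predicate as a java.util.function.IntPredicate so that it stays self-contained; in real code one would presumably pass StarTokenTypes::dataToken. The parallel arrays of token types and token texts are an assumed representation, not something StarTokeniser is known to produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical helper, not part of startools. */
final class DataValueFilterSketch {

    /**
     * Returns the texts of all tokens whose type satisfies isData,
     * preserving their original order.
     */
    static List<String> dataValues(int[] types, String[] texts, IntPredicate isData) {
        if (types.length != texts.length) {
            throw new IllegalArgumentException("types and texts must have the same length");
        }
        List<String> values = new ArrayList<>();
        for (int i = 0; i < types.length; i++) {
            if (isData.test(types[i])) {
                values.add(texts[i]);
            }
        }
        return values;
    }
}
```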

starErrorToken

public static boolean starErrorToken(int token_type)
Returns true if token_type represents a token that is syntactically invalid according to the STAR standard.

Parameters:
token_type - numerical token type
Returns:
whether or not token violates STAR syntax
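
A strict CIF reader might combine starErrorToken and starOnlyToken to decide whether a token is acceptable. The following is a minimal sketch under stated assumptions: CifStrictnessSketch and Status are hypothetical names, and the two predicates are injected as java.util.function.IntPredicate values so that the sketch compiles without the library.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper, not part of startools. */
final class CifStrictnessSketch {

    enum Status { OK, STAR_ONLY, SYNTAX_ERROR }

    private final IntPredicate isError;
    private final IntPredicate isStarOnly;

    CifStrictnessSketch(IntPredicate isError, IntPredicate isStarOnly) {
        this.isError = isError;
        this.isStarOnly = isStarOnly;
    }

    /** Reports whether a token type would be acceptable when reading a CIF. */
    Status check(int tokenType) {
        if (isError.test(tokenType)) {
            return Status.SYNTAX_ERROR;  // invalid even as STAR
        }
        if (isStarOnly.test(tokenType)) {
            return Status.STAR_ONLY;     // valid STAR, but not allowed in a CIF
        }
        return Status.OK;
    }
}
```

With the library on the classpath, the checker could be created as new CifStrictnessSketch(StarTokenTypes::starErrorToken, StarTokenTypes::starOnlyToken), since both methods take an int and return a boolean.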


Copyright and Licence