org.jmol.adapter.readers.cifpdb
Class CifReader.RidiculousFileFormatTokenizer

java.lang.Object
  extended by org.jmol.adapter.readers.cifpdb.CifReader.RidiculousFileFormatTokenizer
Enclosing class:
CifReader

 class CifReader.RidiculousFileFormatTokenizer
extends java.lang.Object

A special tokenizer class for dealing with quoted strings in CIF files.

Regarding the treatment of single quotes vs. primes in CIF files, PMR wrote:

* There is a formal grammar for CIF (see http://www.iucr.org/iucr-top/cif/index.html) which confirms this. The textual explanation is

14. Matching single or double quote characters (' or ") may be used to bound a string representing a non-simple data value provided the string does not extend over more than one line.

15. Because data values are invariably separated from other tokens in the file by white space, such a quote-delimited character string may contain instances of the character used to delimit the string provided they are not followed by white space. For example, the data item _example 'a dog's life' is legal; the data value is a dog's life.

[PMR - the terminating character(s) are quote+whitespace. That would mean that _example 'Jones' life' would be an error.]
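The quoting rule above can be illustrated with a minimal sketch. This is not the Jmol source; the class name CifQuoteSketch and the method readQuoted are hypothetical, and the sketch assumes that end of line counts as terminating white space.

```java
// Hypothetical illustration of CIF rules 14 and 15; not taken from Jmol.
final class CifQuoteSketch {

  /**
   * Reads one quote-delimited value that starts at str.charAt(start),
   * which must be ' or ". Returns the enclosed text, or null if no
   * closing quote followed by white space (or end of line) is found.
   */
  static String readQuoted(String str, int start) {
    char quote = str.charAt(start);
    int n = str.length();
    for (int i = start + 1; i < n; i++) {
      if (str.charAt(i) != quote)
        continue;
      // A quote character closes the value only when it is the last
      // character on the line or is followed by white space.
      if (i + 1 == n || Character.isWhitespace(str.charAt(i + 1)))
        return str.substring(start + 1, i);
    }
    return null; // unterminated string
  }

  public static void main(String[] args) {
    System.out.println(readQuoted("'a dog's life'", 0)); // a dog's life
    System.out.println(readQuoted("'Jones' life'", 0));  // Jones
  }
}
```

In the second call the value ends after Jones, which leaves life' behind as a stray token; that is the error PMR describes.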

The CIF format was developed in the late 1980s under the aegis of the International Union of Crystallography (I am a consultant to the COMCIFS committee). It was ratified by the Union and there have been several workshops. mmCIF is an extension of CIF which includes a relational structure. The formal publications are:

Hall, S. R. (1991). "The STAR File: A New Format for Electronic Data Transfer and Archiving", J. Chem. Inf. Comput. Sci., 31, 326-333.

Hall, S. R., Allen, F. H. and Brown, I. D. (1991). "The Crystallographic Information File (CIF): A New Standard Archive File for Crystallography", Acta Cryst., A47, 655-685.

Hall, S. R. and Spadaccini, N. (1994). "The STAR File: Detailed Specifications", J. Chem. Inf. Comput. Sci., 34, 505-508.


Field Summary
(package private)  int cch
           
(package private)  int ich
           
(package private)  int ichPeeked
           
(package private)  java.lang.String str
           
(package private)  java.lang.String strPeeked
           
(package private)  boolean wasUnQuoted
           
 
Constructor Summary
CifReader.RidiculousFileFormatTokenizer()
           
 
Method Summary
(package private)  java.lang.String fullTrim(java.lang.String str)
          used for names that might span multiple lines
(package private)  boolean getData()
          general reader for loop data; fills loopData with fieldCount fields
(package private)  java.lang.String getNextDataToken()
          first checks to see if the next token is an unquoted control code, and if so, returns null
(package private)  java.lang.String getNextToken()
           
(package private)  java.lang.String getTokenPeeked()
           
(package private)  boolean hasMoreTokens()
           
(package private)  java.lang.String nextToken()
          assume that hasMoreTokens() has been called and that ich is pointing at a non-white character.
(package private)  java.lang.String peekToken()
          just look at the next token.
private  void setString(java.lang.String str)
          sets a string to be parsed from the beginning
(package private)  java.lang.String setStringNextLine()
          sets the string for parsing to be from the next line when the token buffer is empty, and if ';' is at the beginning of that line, extends the string to include that full multiline string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

str

java.lang.String str

ich

int ich

cch

int cch

wasUnQuoted

boolean wasUnQuoted

strPeeked

java.lang.String strPeeked

ichPeeked

int ichPeeked
Constructor Detail

CifReader.RidiculousFileFormatTokenizer

CifReader.RidiculousFileFormatTokenizer()
Method Detail

setString

private void setString(java.lang.String str)
sets a string to be parsed from the beginning

Parameters:
str -

setStringNextLine

java.lang.String setStringNextLine()
                             throws java.lang.Exception
sets the string for parsing to be from the next line when the token buffer is empty, and if ';' is at the beginning of that line, extends the string to include that full multiline string. Uses \1 to indicate that this is a special quotation.

Returns:
the next line or null if EOF
Throws:
java.lang.Exception
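
The multiline handling described for setStringNextLine() can be sketched as follows. This is an illustration only, not the Jmol implementation: the class CifLineSketch, the method nextLineOrTextField, and the use of an Iterator of lines as input are all assumptions. The sketch also assumes that the closing ';' line carries nothing else of interest and that the '\1' marker is placed at the start of the returned string.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration of ';' text-field handling; not taken from Jmol.
final class CifLineSketch {

  /**
   * Returns the next line, or null at end of input. If the line starts
   * with ';', returns the whole text field up to the closing ';' line,
   * prefixed with the marker character '\1'.
   */
  static String nextLineOrTextField(Iterator<String> lines) {
    if (!lines.hasNext())
      return null;
    String line = lines.next();
    if (!line.startsWith(";"))
      return line;
    // '\1' (U+0001) flags the result as a specially quoted string.
    StringBuilder sb = new StringBuilder("\1").append(line.substring(1));
    while (lines.hasNext()) {
      line = lines.next();
      if (line.startsWith(";"))
        break; // closing delimiter; the rest of that line is ignored here
      sb.append('\n').append(line);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Iterator<String> it =
        Arrays.asList("_a", ";first", "second", ";", "_b 1").iterator();
    String s;
    while ((s = nextLineOrTextField(it)) != null)
      System.out.println(s.replace('\1', '~').replace('\n', '|'));
  }
}
```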

hasMoreTokens

boolean hasMoreTokens()
Returns:
true if there are more tokens in the line buffer

nextToken

java.lang.String nextToken()
assume that hasMoreTokens() has been called and that ich is pointing at a non-white character. Also sets the boolean wasUnQuoted, which records whether the token was unquoted and must therefore be checked as a possible control keyword: a quoted 'loop_' is a data value, whereas an unquoted loop_ is a keyword.

Returns:
null if there are no more tokens, "\0" if the token is '.' or '?', otherwise the next token
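
A minimal sketch of this contract follows. It reuses the field names str, ich, cch and wasUnQuoted from the field summary, but the class TokenSketch and the method bodies are hypothetical and are not the Jmol source. It assumes that only an unquoted '.' or '?' maps to "\0", and that an unterminated quoted string runs to the end of the line.

```java
// Hypothetical illustration of the nextToken() contract; not taken from Jmol.
final class TokenSketch {
  String str;          // line being parsed
  int ich;             // current character index
  int cch;             // number of characters in str
  boolean wasUnQuoted; // true if the last token had no quotes

  TokenSketch(String line) {
    str = line;
    cch = line.length();
    ich = 0;
  }

  /** Skips white space; true if a token starts at ich. */
  boolean hasMoreTokens() {
    while (ich < cch && Character.isWhitespace(str.charAt(ich)))
      ich++;
    return ich < cch;
  }

  String nextToken() {
    if (ich == cch)
      return null;
    int start = ich;
    char ch = str.charAt(ich);
    if (ch != '\'' && ch != '"') {
      wasUnQuoted = true;
      while (ich < cch && !Character.isWhitespace(str.charAt(ich)))
        ich++;
      String tok = str.substring(start, ich);
      // '.' and '?' are reported as "\0"
      return (tok.equals(".") || tok.equals("?")) ? "\0" : tok;
    }
    wasUnQuoted = false;
    // The closing quote is the same character followed by white space
    // or by the end of the line.
    int i = ich + 1;
    while (i < cch && !(str.charAt(i) == ch
        && (i + 1 == cch || Character.isWhitespace(str.charAt(i + 1)))))
      i++;
    ich = Math.min(i + 1, cch);
    return str.substring(start + 1, i);
  }
}
```

For the line loop_ 'loop_' both tokens come back as the string loop_, but wasUnQuoted is true only after the first, so a caller can treat the first as a keyword and the second as data.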

getData

boolean getData()
          throws java.lang.Exception
general reader for loop data; fills loopData with fieldCount fields

Returns:
false if EOF
Throws:
java.lang.Exception

getNextDataToken

java.lang.String getNextDataToken()
                            throws java.lang.Exception
first checks to see if the next token is an unquoted control code, and if so, returns null

Returns:
next data token or null
Throws:
java.lang.Exception

getNextToken

java.lang.String getNextToken()
                        throws java.lang.Exception
Returns:
the next token of any kind, or null
Throws:
java.lang.Exception

peekToken

java.lang.String peekToken()
                     throws java.lang.Exception
just look at the next token. Saves it for retrieval using getTokenPeeked()

Returns:
next token or null if EOF
Throws:
java.lang.Exception

getTokenPeeked

java.lang.String getTokenPeeked()
Returns:
the token last acquired; may be null
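
The interplay of peekToken() and getTokenPeeked() can be sketched as below. The field names strPeeked and ichPeeked come from the field summary; everything else, including the class PeekSketch, the whitespace-only tokenizing, and the assumption that getTokenPeeked() advances past the peeked token, is hypothetical and not confirmed by the Jmol source.

```java
// Hypothetical illustration of peek-then-consume; not taken from Jmol.
final class PeekSketch {
  private final String str;
  private int ich;          // current read position
  private String strPeeked; // token found by the last peekToken()
  private int ichPeeked;    // read position just after that token

  PeekSketch(String line) {
    str = line;
  }

  /** Consumes and returns the next whitespace-delimited token, or null. */
  String getNextToken() {
    int n = str.length();
    while (ich < n && Character.isWhitespace(str.charAt(ich)))
      ich++;
    if (ich == n)
      return null;
    int start = ich;
    while (ich < n && !Character.isWhitespace(str.charAt(ich)))
      ich++;
    return str.substring(start, ich);
  }

  /** Looks at the next token without consuming it. */
  String peekToken() {
    int saved = ich;
    strPeeked = getNextToken();
    ichPeeked = ich; // remember where the token ended
    ich = saved;     // rewind so the token can be read again
    return strPeeked;
  }

  /** Consumes the token found by the last peekToken(); may return null. */
  String getTokenPeeked() {
    ich = ichPeeked;
    return strPeeked;
  }
}
```

Saving the end position alongside the token means that accepting a peeked token does not require scanning it a second time.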

fullTrim

java.lang.String fullTrim(java.lang.String str)
used for names that might span multiple lines

Parameters:
str -
Returns:
str without any leading/trailing white space, and no '\n'
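
One way to satisfy this contract is sketched below. It is an assumption that interior '\n' characters are deleted rather than replaced with a space; the page only states that the result contains no '\n'. The class name FullTrimSketch is hypothetical.

```java
// Hypothetical illustration of the fullTrim() contract; not taken from Jmol.
final class FullTrimSketch {

  /** Removes every '\n', then strips leading and trailing white space. */
  static String fullTrim(String str) {
    return str.replace("\n", "").trim();
  }

  public static void main(String[] args) {
    System.out.println("[" + fullTrim("  \n_atom_site\n_label \n") + "]");
    // [_atom_site_label]
  }
}
```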