CML/Golem dictionary syntax

Here, we’ll go through one entry from a very small CML/Golem dictionary, which you can find in your Golem distribution in GOLEM/docs/AnnotatedDictionaryEntries.xml.

<?xml version="1.0"?>
  <!--  This example's derived from a dictionary for the CASTEP
        code; you can find it at http://www.castep.org/ -->
<dictionary
  namespace="http://www.castep.org/castep/dictionary"
  dictionaryPrefix="castep"
  title="CASTEP Dictionary"
  xmlns="http://www.xml-cml.org/schema"
  xmlns:castep="http://www.castep.org/cml/dictionary/"
  xmlns:h="http://www.w3.org/1999/xhtml/"
  xmlns:cml="http://www.xml-cml.org/schema"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:golem="http://www.lexical.org.uk/golem"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

You specify the first three attributes above yourself: namespace, a URL for the dictionary (which you can think of as a machine-readable name; pick something unique at a domain you control); dictionaryPrefix, a short name for the dictionary; and title, the title that people using the dictionary will see.

As mentioned earlier, the dictionary generator will add the namespaces for you, but if you’re writing your own dictionary for whatever reason, just copy them across as they are here.
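If you want to check what a dictionary declares about itself, the three attributes can be read back with nothing more than the Python standard library. This is only a sketch: it uses xml.etree.ElementTree rather than the Golem library, the dictionary body is left empty for brevity, and read_header is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the header above; the body is left empty on purpose.
DICTIONARY_XML = """<?xml version="1.0"?>
<dictionary
  namespace="http://www.castep.org/castep/dictionary"
  dictionaryPrefix="castep"
  title="CASTEP Dictionary"
  xmlns="http://www.xml-cml.org/schema">
</dictionary>
"""

def read_header(dictionary_xml):
    """Return the three hand-written attributes of a <dictionary> element."""
    root = ET.fromstring(dictionary_xml)
    names = ("namespace", "dictionaryPrefix", "title")
    return {name: root.get(name) for name in names}

header = read_header(DICTIONARY_XML)
print(header["dictionaryPrefix"], "->", header["namespace"])
```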

<entry id="xcFunctional" term="Exchange-Correlation Functional">

The id must be unique within the dictionary. The term is designed for documentation, so should be as pithy as you can make it.
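A quick way to catch a repeated id before it causes trouble is to count the ids in the dictionary file. The sketch below uses only the standard library; duplicate_entry_ids is a hypothetical helper, and the cutoffEnergy entry is made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"

def duplicate_entry_ids(dictionary_xml):
    """Return the sorted list of entry ids that occur more than once."""
    root = ET.fromstring(dictionary_xml)
    ids = [entry.get("id") for entry in root.iter("{%s}entry" % CML_NS)]
    return sorted(i for i, count in Counter(ids).items() if count > 1)

# Illustrative dictionary with one deliberately repeated id.
SAMPLE = """<dictionary xmlns="http://www.xml-cml.org/schema">
  <entry id="xcFunctional" term="Exchange-Correlation Functional"/>
  <entry id="cutoffEnergy" term="Cutoff Energy"/>
  <entry id="xcFunctional" term="XC Functional (duplicate)"/>
</dictionary>"""

duplicates = duplicate_entry_ids(SAMPLE)
print(duplicates)
```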

<definition> and <description>

Next, we deal with documenting what this dictionary entry means.

<definition>
  The exchange-correlation functional used.
</definition>

The definition is a one-sentence description of the concept this dictionary entry defines.

Next, <description> - note that h: here was defined, above, to be bound to the XHTML namespace:

<description>
    <h:div class="dictDescription">
      The exchange-correlation functional used in a given simulation.
      Available values for this are:
      <h:ul>
        <h:li>
          <h:strong>LDA</h:strong>, the Local Density Approximation
        </h:li>
        <h:li>
          <h:strong>PW91</h:strong>, Perdew and Wang's 1991 formulation
        </h:li>
        <h:li>
          <h:strong>PBE</h:strong>,
          Perdew, Burke and Ernzerhof's original GGA functional
        </h:li>
        <h:li>
          <h:strong>RPBE</h:strong>, Hammer et al.'s revised PBE functional
        </h:li>
      </h:ul>
    </h:div>
</description>

<description> contains a longer description - ideally, documentation for the term. As mentioned above, this takes the form of XHTML.

<metadataList>
  <metadata name="dc:contributor" content="golem-kiln" />
</metadataList>

Optionally, <metadataList> - which contains CML <metadata> - can be used to document the provenance of a dictionary entry (for instance, who wrote it).

<golem:xpath>

Next, we need to tell our programs how to find the data this entry describes; we do that by giving an XPath expression that points to where it can be found:

<!-- Where this concept can be found in CML documents which
use this dialect - usually inferred by the dictionary builder  -->
<golem:xpath>./cml:parameter[@dictRef="castep:xcFunctional"]</golem:xpath>
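To see what an expression like this selects, you can try it against a small document. The sketch below uses the limited XPath support in Python's xml.etree.ElementTree rather than Golem itself; the enclosing parameterList and the cutoffEnergy parameter are assumptions added so that there is something for the expression not to match.

```python
import xml.etree.ElementTree as ET

# Bind the cml: prefix used in the XPath to the CML namespace.
NAMESPACES = {"cml": "http://www.xml-cml.org/schema"}

CML_DOCUMENT = """<parameterList xmlns="http://www.xml-cml.org/schema">
  <parameter dictRef="castep:xcFunctional" name="Exchange-Correlation Functional">
    <scalar dataType="xsd:string">PBE</scalar>
  </parameter>
  <parameter dictRef="castep:cutoffEnergy" name="Cutoff Energy">
    <scalar dataType="xsd:double">500.0</scalar>
  </parameter>
</parameterList>"""

root = ET.fromstring(CML_DOCUMENT)

# The expression is relative (it starts with "."), so it is evaluated
# from the element that contains the parameters.
matches = root.findall('./cml:parameter[@dictRef="castep:xcFunctional"]', NAMESPACES)
values = [match.find("cml:scalar", NAMESPACES).text for match in matches]
print(values)
```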

<golem:template>

Once we have found the data, we need to know how to read it. Here, the data that we’re trying to read looks something like:

<parameter dictRef="castep:xcFunctional" name="Exchange-Correlation Functional">
  <scalar dataType="xsd:string">PBE</scalar>
</parameter>

Here, we’re trying to read a scalar (a number or string). Golem templates use XSLT to convert pieces of CML, like this one, into JSON objects of the form [value, "u:units"], where u is the namespace in which the units are declared.
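To make the target shape concrete, here is a rough Python imitation of what such a template produces. It is not Golem's XSLT: scalar_to_json is a hypothetical helper, the units attribute on the second sample and the use of an empty string for unitless values are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

NAMESPACES = {"cml": "http://www.xml-cml.org/schema"}

def scalar_to_json(parameter_xml):
    """Serialize the <scalar> inside a <parameter> as [value, units]."""
    parameter = ET.fromstring(parameter_xml)
    scalar = parameter.find("cml:scalar", NAMESPACES)
    text = scalar.text.strip()
    data_type = scalar.get("dataType", "xsd:string")
    if data_type in ("xsd:double", "xsd:float"):
        value = float(text)
    elif data_type == "xsd:integer":
        value = int(text)
    else:
        value = text
    # Assumption: a scalar without units is given an empty units string.
    return json.dumps([value, scalar.get("units", "")])

XC_FUNCTIONAL = """<parameter xmlns="http://www.xml-cml.org/schema"
    dictRef="castep:xcFunctional" name="Exchange-Correlation Functional">
  <scalar dataType="xsd:string">PBE</scalar>
</parameter>"""

# Hypothetical parameter carrying units, to show the second slot in use.
CUTOFF_ENERGY = """<parameter xmlns="http://www.xml-cml.org/schema"
    dictRef="castep:cutoffEnergy" name="Cutoff Energy">
  <scalar dataType="xsd:double" units="units:eV">500.0</scalar>
</parameter>"""

print(scalar_to_json(XC_FUNCTIONAL))
print(scalar_to_json(CUTOFF_ENERGY))
```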

The Golem dictionary generation tools “know about” (i.e., have templates for) the following tags, assuming they are used in the same way as FoX uses them:

  • <scalar>
  • <array>
  • <matrix>
  • <cellParameter>
  • <metadata>
  • <lattice>
  • <atomArray>

So if the data you are reading is in one of these tags, then the following will let you read it:

<golem:template call="scalar" role="getvalue" binding="pygolem_serialization" />

Here, the value of call will typically be the name of the tag that holds the actual data, so here it’s “scalar”.

If additional information (say, extra properties on each atom in an atomArray) is added, the read will still succeed, but the extra information will be ignored; either you will need to modify the template for that type, or arrange to read out that extra information in another way (such as using etree methods, as in the examples later in this documentation).

If your data is contained in some other tag, and you wish to read it directly using Golem, then you need to:

  • Write a dictionary entry with the tagname as the id (i.e. <entry id="newEntry" ...>);

  • Write an XSLT stylesheet which produces a JSON document of the form [ newEntry_value, "units:newEntry_units" ] when run over the data. For example, lattice vectors are represented by markup of the form:

    <lattice>
        <latticeVector>a b c</latticeVector>
        <latticeVector>d e f</latticeVector>
        <latticeVector>g h i</latticeVector>
    </lattice>
    
    which we associate with the stylesheet:

      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      xmlns:cml="http://www.xml-cml.org/schema"
                      xmlns:str="http://exslt.org/strings"
                      version="1.0" extension-element-prefixes="str">
        <xsl:output method="text"/>
    
        <xsl:template match="/">
          <xsl:apply-templates/>
        </xsl:template>
    
        <xsl:template match="cml:lattice">
          <xsl:text>[[</xsl:text>
          <xsl:for-each select="cml:latticeVector">
            <xsl:text>[</xsl:text>
            <xsl:for-each select="str:tokenize(string(.), ' ')">
              <xsl:choose>
                <xsl:when test="position() != last()">
                  <xsl:value-of select="."/><xsl:text>,</xsl:text>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="."/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each>
            <xsl:choose>
              <xsl:when test="position() != last()">
                <xsl:text>],</xsl:text>
              </xsl:when>
              <xsl:otherwise>
                <xsl:text>]</xsl:text>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
          <xsl:text>], "A**-1"]</xsl:text>
        </xsl:template>
      </xsl:stylesheet>
    
  • Put this in your dictionary entry. Here’s how you do that:

    <golem:template role="getvalue" binding="pygolem_serialization">
      <xsl:stylesheet>
        <!-- stylesheet goes here -->
      </xsl:stylesheet>
    </golem:template>
    
  • Add <golem:template call="newEntry" role="getvalue" binding="pygolem_serialization" /> to the dictionary entries which’ll use this new template to read their data.
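If XSLT is unfamiliar, it may help to see the lattice conversion from the steps above written out in Python. This is a sketch of the same transformation, not part of Golem: lattice_to_json is a hypothetical name, the numeric lattice vectors are made up, and the units string is simply copied from the stylesheet.

```python
import json
import xml.etree.ElementTree as ET

NAMESPACES = {"cml": "http://www.xml-cml.org/schema"}

# Illustrative numbers standing in for "a b c", "d e f", "g h i".
LATTICE_XML = """<lattice xmlns="http://www.xml-cml.org/schema">
  <latticeVector>1.0 0.0 0.0</latticeVector>
  <latticeVector>0.0 2.0 0.0</latticeVector>
  <latticeVector>0.0 0.0 3.0</latticeVector>
</lattice>"""

def lattice_to_json(lattice_xml, units="A**-1"):
    """Build [[[...], [...], [...]], units] from <latticeVector> children."""
    lattice = ET.fromstring(lattice_xml)
    rows = [
        [float(token) for token in vector.text.split()]
        for vector in lattice.findall("cml:latticeVector", NAMESPACES)
    ]
    return json.dumps([rows, units])

print(lattice_to_json(LATTICE_XML))
```

The nested list comprehension plays the part of the two nested xsl:for-each loops, and json.dumps takes care of the commas and brackets that the stylesheet has to emit by hand.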

The role of templates determines how they are used: all templates used for reading data with Golem should have role="getvalue" and binding="pygolem_serialization". This is the only special case in role; but you can add other templates with different roles, too. These get mapped onto functions if you’re using the Golem library: for instance, if you’ve got a dictionary d with namespace n, then (in a Python interactive shell):

>>> xcFunctional_entry = d["{%s}xcFunctional" % n]
>>> print(str(xcFunctional_entry.arb_to_input("RPBE")))

will print out the value of this template when passed “RPBE” as an argument.

<!--
Arguments are labelled p1, p2... pn in the template; if you're writing
a template to which you're going to pass arguments, you need to set
the @binding attribute to "input", as here, and @input to "external"
(meaning it takes externally-provided arguments, not XML).
-->

<golem:template role="arb_to_input" binding="input" input="external">
  <xsl:stylesheet version='1.0'
                  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                  xmlns:cml='http://www.xml-cml.org/schema'>
    <xsl:strip-space elements="*" />
    <xsl:output method="text" />
    <xsl:param name="p1" />
    <xsl:template match="/">
      <xsl:text>XC_FUNCTIONAL </xsl:text><xsl:value-of select="$p1" />
    </xsl:template>
  </xsl:stylesheet>
</golem:template>
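Read as ordinary code, this stylesheet does very little: it writes a fixed keyword followed by its first argument. A plain-Python equivalent, given purely as an illustration of the template's logic (it does not call Golem), would be:

```python
def arb_to_input(p1):
    """Mirror the stylesheet: the literal keyword, then the argument p1."""
    return "XC_FUNCTIONAL " + str(p1)

line = arb_to_input("RPBE")
print(line)  # prints: XC_FUNCTIONAL RPBE
```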

<golem:synonym> et al

In this section, we describe the terms which can be used to draw relations between concepts in dictionaries. Those relationships can then be used by the Golem library to enable equivalent concepts to be looked up at once, rather than having to check them all separately.

First, <golem:implements>: if entry A implements entry B, then any piece of CML that satisfies the definition of term A also satisfies the definition of term B. In other words, term A is an implementation of term B (analogous to a subclass).

<golem:implements>convertibleToInput</golem:implements>
<golem:implements>value</golem:implements>
<golem:implements namespace="http://www.example.com/example/">absolute</golem:implements>

In the final case here, the term absolute resides in a different dictionary with the given namespace; to be able to make use of this in your code, your program will need to load both dictionaries.

If two concepts are synonymous, then any instance of one concept is equivalent to an instance of the other, although they may be serialized differently. To implement that in a dictionary, add the following to concept1’s dictionary entry:

<golem:synonym>concept2</golem:synonym>

Or if concept2 resides in a different dictionary with a different namespace:

<golem:synonym namespace="http://namespace2/">concept2</golem:synonym>

although, as above, you will not be able to make use of this relationship without explicitly loading the second dictionary. Synonyms are symmetric, unlike implements; stating that a implements b does not imply that b implements a, whereas stating that a is synonymous with b does imply that b is synonymous with a. It is sufficient to specify the synonym on either one of these concepts; it doesn’t need to be given on both.

In both cases, when searching for a concept using the findin method (described in the following section on how to use the Golem libraries), all synonyms and implementations of the current concept are found.
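The difference between the two relationships is easiest to see in a toy model. The sketch below is not how Golem implements findin; ConceptGraph and its methods are hypothetical, and following relationships transitively is an assumption made for the example. Synonyms are stored in both directions, while implements is stored in one direction only.

```python
from collections import defaultdict

class ConceptGraph:
    """Toy store for synonym (symmetric) and implements (one-way) links."""

    def __init__(self):
        self._synonyms = defaultdict(set)
        # Maps a concept to the concepts that implement it.
        self._implementers = defaultdict(set)

    def add_synonym(self, a, b):
        # Stating the synonym on one side is enough; record both directions.
        self._synonyms[a].add(b)
        self._synonyms[b].add(a)

    def add_implements(self, implementer, implemented):
        self._implementers[implemented].add(implementer)

    def matches(self, concept):
        """Return the concept plus its synonyms and implementations."""
        found, pending = set(), [concept]
        while pending:
            current = pending.pop()
            if current in found:
                continue
            found.add(current)
            pending.extend(self._synonyms[current])
            pending.extend(self._implementers[current])
        return found

graph = ConceptGraph()
graph.add_implements("xcFunctional", "value")
graph.add_synonym("concept1", "concept2")

print(sorted(graph.matches("value")))
print(sorted(graph.matches("xcFunctional")))
print(sorted(graph.matches("concept2")))
```

Searching for value also finds xcFunctional, but searching for xcFunctional does not find value, whereas either synonym finds the other.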

<golem:seeAlso> is used to denote any other relationship between concepts; it does not imply any particular relationship, but alerts the user that it may be worth looking at the other entry. This is mostly of use in dictionary-browsing applications and similar tools, where it can be used (for instance) to implement a thesaurus.

We may also want to represent certain aspects of document structure in the dictionary. This is particularly useful when you want to evaluate “everything in a section of the document”; for example, “every input parameter”. We can denote these relationships using <golem:childOf>.

An entry that carries <golem:childOf> is found as a child node (in the XML sense) of the CML representation of another entry in the dictionary:

<golem:childOf>input</golem:childOf>

So, here, xcFunctional is found in the child nodes of the CML representation of the dictionary term input.
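As a sketch of how this relationship supports questions like “every input parameter”, the following standard-library code lists the entries that declare a given parent. It reads the dictionary file directly rather than going through Golem; children_of is a hypothetical helper, and every entry other than xcFunctional is invented for the example.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cml": "http://www.xml-cml.org/schema",
    "golem": "http://www.lexical.org.uk/golem",
}

# Illustrative dictionary: only xcFunctional comes from the text above.
DICTIONARY_XML = """<dictionary xmlns="http://www.xml-cml.org/schema"
            xmlns:golem="http://www.lexical.org.uk/golem">
  <entry id="input" term="Input parameters"/>
  <entry id="xcFunctional" term="Exchange-Correlation Functional">
    <golem:childOf>input</golem:childOf>
  </entry>
  <entry id="cutoffEnergy" term="Cutoff Energy">
    <golem:childOf>input</golem:childOf>
  </entry>
  <entry id="finalEnergy" term="Final Energy"/>
</dictionary>"""

def children_of(dictionary_xml, parent_id):
    """Return ids of entries that declare <golem:childOf>parent_id</...>."""
    root = ET.fromstring(dictionary_xml)
    result = []
    for entry in root.findall("cml:entry", NAMESPACES):
        parents = [
            child.text.strip()
            for child in entry.findall("golem:childOf", NAMESPACES)
        ]
        if parent_id in parents:
            result.append(entry.get("id"))
    return result

input_children = children_of(DICTIONARY_XML, "input")
print(input_children)
```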

<golem:possibleValues>

The range, and type, of data one expects for a given concept can be given with <golem:possibleValues>.

<golem:possibleValues type="string">
  <golem:enumeration> <!--  contains the possible values for this concept -->
    <golem:value>LDA</golem:value> <!--  and one of these for each value -->
    <golem:value>PW91</golem:value>
    <golem:value>PBE</golem:value>
    <golem:value>RPBE</golem:value>
  </golem:enumeration>
</golem:possibleValues>

The type of data may be int, float, string, or matrix. ints (and, analogously, floats) are specified as follows:

<golem:possibleValues type="int">
  <golem:range>
    <golem:minimum>2</golem:minimum>
  </golem:range>
</golem:possibleValues>
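To show how these declarations might be used, here is a small validator for the enumeration and range forms. It is a sketch under assumptions, not Golem's own checking code: is_allowed is a hypothetical helper, and range bounds are treated as inclusive.

```python
import xml.etree.ElementTree as ET

GOLEM = "{http://www.lexical.org.uk/golem}"
CASTS = {"int": int, "float": float, "string": str}

def is_allowed(possible_values_xml, raw_value):
    """Check raw_value against a <golem:possibleValues> declaration."""
    spec = ET.fromstring(possible_values_xml)
    cast = CASTS[spec.get("type")]
    try:
        value = cast(raw_value)
    except ValueError:
        return False  # e.g. "abc" offered where an int is expected

    enumeration = spec.find(GOLEM + "enumeration")
    if enumeration is not None:
        allowed = [cast(v.text.strip()) for v in enumeration.findall(GOLEM + "value")]
        if value not in allowed:
            return False

    value_range = spec.find(GOLEM + "range")
    if value_range is not None:
        minimum = value_range.find(GOLEM + "minimum")
        maximum = value_range.find(GOLEM + "maximum")
        # Assumption: both bounds are inclusive.
        if minimum is not None and value < cast(minimum.text):
            return False
        if maximum is not None and value > cast(maximum.text):
            return False
    return True

ENUM_SPEC = """<golem:possibleValues xmlns:golem="http://www.lexical.org.uk/golem" type="string">
  <golem:enumeration>
    <golem:value>LDA</golem:value>
    <golem:value>PW91</golem:value>
    <golem:value>PBE</golem:value>
    <golem:value>RPBE</golem:value>
  </golem:enumeration>
</golem:possibleValues>"""

RANGE_SPEC = """<golem:possibleValues xmlns:golem="http://www.lexical.org.uk/golem" type="int">
  <golem:range>
    <golem:minimum>2</golem:minimum>
  </golem:range>
</golem:possibleValues>"""

print(is_allowed(ENUM_SPEC, "PBE"), is_allowed(ENUM_SPEC, "B3LYP"))
print(is_allowed(RANGE_SPEC, "2"), is_allowed(RANGE_SPEC, "1"))
```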

Matrices are a little more complex: you can specify both the dimension of the matrix and the type of the data therein. For example:

<golem:possibleValues type="matrix">
    <!--  The data type of the matrix *elements* goes in here -
    so this is a matrix of floats... -->
  <golem:matrix dimensionx="3" dimensiony="3" type="float" symmetric="false"/>
  <!--  The @symmetric attribute on golem:matrix specifies whether the
  matrix is symmetric. If it is, only the upper triangular elements
  should be given; if not, the full matrix is expected. -->

  <!-- If you give a <golem:range> here, it applies to all the
        elements in the matrix: -->
   <golem:range>
     <golem:minimum>0</golem:minimum>
     <golem:maximum>10</golem:maximum>
   </golem:range>

   <!-- constrains the matrix elements to lie in the range
        0 <= element <= 10. -->
</golem:possibleValues>
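A matrix declaration can be checked in the same spirit. The sketch below handles only the non-symmetric case shown above; matrix_is_allowed is a hypothetical helper, and treating dimensionx as the number of columns and dimensiony as the number of rows is an assumption (for a 3 by 3 matrix the choice makes no difference).

```python
import xml.etree.ElementTree as ET

GOLEM = "{http://www.lexical.org.uk/golem}"

MATRIX_SPEC = """<golem:possibleValues xmlns:golem="http://www.lexical.org.uk/golem" type="matrix">
  <golem:matrix dimensionx="3" dimensiony="3" type="float" symmetric="false"/>
  <golem:range>
    <golem:minimum>0</golem:minimum>
    <golem:maximum>10</golem:maximum>
  </golem:range>
</golem:possibleValues>"""

def matrix_is_allowed(spec_xml, matrix):
    """Check the shape of a full matrix and the range of every element."""
    spec = ET.fromstring(spec_xml)
    shape = spec.find(GOLEM + "matrix")
    # Assumption: dimensionx counts columns and dimensiony counts rows.
    n_columns = int(shape.get("dimensionx"))
    n_rows = int(shape.get("dimensiony"))
    low = float(spec.find(GOLEM + "range/" + GOLEM + "minimum").text)
    high = float(spec.find(GOLEM + "range/" + GOLEM + "maximum").text)

    if len(matrix) != n_rows:
        return False
    for row in matrix:
        if len(row) != n_columns:
            return False
        if any(not (low <= float(element) <= high) for element in row):
            return False
    return True

good = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
out_of_range = [[11.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
print(matrix_is_allowed(MATRIX_SPEC, good))
print(matrix_is_allowed(MATRIX_SPEC, out_of_range))
```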