Using the Golem library

In this chapter, we move from considering the Golem markup language to the Golem library. If you aren’t familiar with the Python programming language, it’ll be helpful to have a look at the Python tutorial (http://www.python.org/doc/tut).

Any code samples starting with:

>>>

are example interactive sessions; you can start a Python interactive shell by simply running python.

Importing Golem

If you have installed Golem using setup.py/setuptools/easy_install

Golem will already be on Python’s module search path, so you can simply use import golem in your script.

If you have installed Golem using make (Unix/MacOS X only)

In this case, you will need to add the Golem install directory to your PYTHONPATH. Either export the PYTHONPATH environment variable for your shell session:

$ export PYTHONPATH=/usr/local/share/pygolem:$PYTHONPATH

or pass it on the command line when you invoke python:

$ PYTHONPATH=/usr/local/share/pygolem:$PYTHONPATH python
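To confirm that either approach worked, you can ask Python whether it can locate the package before importing it. The following is a minimal sketch using only the standard library; the helper name is_importable is our own and is not part of Golem.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module on its search path."""
    return importlib.util.find_spec(module_name) is not None


# "golem" is found only if its directory is on the module search path,
# for example through the PYTHONPATH setting described above.
print(is_importable("golem"))
```

If this prints False, check that the directory you added to PYTHONPATH is the one that actually contains the golem module.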

Loading a dictionary and looking up terms

Loading a dictionary

Running Python interactively:

>>> import golem
>>> d = golem.Dictionary("/PATH/TO/DICTIONARY")

d is now an instance of golem.Dictionary, which inherits from Python’s built-in dict type. Thus d.keys(), for instance, lists all the terms in the dictionary.
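Because golem.Dictionary inherits from dict, the usual dict operations should apply to it. The sketch below uses a plain dict as a stand-in so that it runs without Golem or a dictionary file. The values are placeholder strings rather than real golem.Entry objects, the "example" id is invented for illustration, and the keys use the Clark form described under "Looking up terms".

```python
# A plain dict standing in for a loaded golem.Dictionary (assumption: the
# real object supports the same operations, since it inherits from dict).
NS = "http://www.castep.org/cml/dictionary/"
d = {
    "{" + NS + "}cutoff": "placeholder entry for cutoff",
    "{" + NS + "}example": "placeholder entry for example",  # hypothetical id
}

print(len(d))          # number of terms in the dictionary
for key in sorted(d):  # iterate over the term keys in a stable order
    print(key)

# Membership tests and get() handle absent terms without raising.
print("{" + NS + "}cutoff" in d)
print(d.get("{" + NS + "}absent") is None)
```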

Looking up terms

Terms in the dictionary are indexed by keys in Clark form: {namespace}id, where namespace is the dictionary’s namespace URI and id is the ID of the dictionary entry you’re looking up. Thus, an entry cutoff in a dictionary with namespace http://www.castep.org/cml/dictionary/ is looked up as follows:

>>> cutoff = d["{http://www.castep.org/cml/dictionary/}cutoff"]
>>> cutoff
<golem.Entry object at 0x20172d0>
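Writing Clark-form keys by hand is error-prone, so it can help to build and split them with small helpers. The functions below are a sketch of our own (clark and split_clark are not Golem functions); they only manipulate strings and do not require Golem.

```python
def clark(namespace, local_id):
    """Build a Clark-form key of the shape {namespace}id."""
    return "{%s}%s" % (namespace, local_id)


def split_clark(key):
    """Split a Clark-form key back into a (namespace, id) tuple."""
    if not key.startswith("{") or "}" not in key:
        raise ValueError("not in Clark form: %r" % key)
    namespace, local_id = key[1:].split("}", 1)
    return namespace, local_id


CASTEP_NS = "http://www.castep.org/cml/dictionary/"
key = clark(CASTEP_NS, "cutoff")
print(key)
print(split_clark(key))
```

With a loaded dictionary d, d[clark(CASTEP_NS, "cutoff")] uses exactly the same key string as the lookup shown above.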

If there were no cutoff entry in this dictionary, a KeyError exception would be raised. We can then use the entry to find instances of this concept in a CML file:

>>> cutoffs = cutoff.findin("LiH-geomopt1.cml")
>>> cutoffs
[<golem.EntryInstance object at 0x202ced0>]

findin always returns a list. If the concept is present, the list contains all the instances of the concept (or its implementations or synonyms) in document order; if the concept is not found, the list is empty. You can then extract the value of an instance as follows:

>>> val = cutoffs[0].getvalue()
>>> val
330.0

Once you’ve obtained a value, you can check what its units are and which term it is an instance of:

>>> val.entry
<golem.Entry object at 0x20172d0>
>>> val.unit
u'castepunits:eV'
>>> val.entry.term
'Basis set cutoff energy'
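Putting these steps together: because findin may return an empty list, indexing with [0] fails when the concept is absent, so a small guard is useful before reading the value, unit, and term. The following is a sketch only. FakeEntry, FakeValue, and FakeInstance are hypothetical stand-ins that let the example run without Golem or a CML file; they mirror only the attributes shown above (getvalue(), unit, entry.term) and say nothing about how Golem implements them. describe_first is our own helper name.

```python
class FakeEntry(object):
    """Hypothetical stand-in for golem.Entry."""

    def __init__(self, term):
        self.term = term


class FakeValue(float):
    """Hypothetical stand-in: a float that also carries a unit and an entry."""

    def __new__(cls, number, unit, entry):
        obj = float.__new__(cls, number)
        obj.unit = unit
        obj.entry = entry
        return obj


class FakeInstance(object):
    """Hypothetical stand-in for golem.EntryInstance."""

    def __init__(self, value):
        self._value = value

    def getvalue(self):
        return self._value


def describe_first(instances):
    """Summarise the first instance, or return None if the list is empty."""
    if not instances:
        return None
    val = instances[0].getvalue()
    return "%s: %s %s" % (val.entry.term, float(val), val.unit)


entry = FakeEntry("Basis set cutoff energy")
found = [FakeInstance(FakeValue(330.0, "castepunits:eV", entry))]
print(describe_first(found))  # one instance present
print(describe_first([]))     # concept absent: prints None
```

With real Golem objects, the list passed to describe_first would be the result of a findin call such as the one shown earlier.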

In the next section, we show some examples of how to use Golem for real-world data extraction from CML files, taken from the reporting component of MaterialsGrid.