Summon
------
.. highlight:: sh
Summon is a utility to enable researchers to extract data
from one or more CML files and collate the results as a comma-separated list
of data, one line per file.
Installing Summon
^^^^^^^^^^^^^^^^^
Summon is installed as part of Golem. If you have installed Golem using the
``make`` approach outlined earlier, you'll find ``summon`` in
``/usr/local/bin``; however, if you have used ``setup.py`` or
``easy_install``, you may need to add ``summon`` to your path.
For example, on OS X: ::
$ which summon
/opt/local/Library/Frameworks/Python.framework/Versions/2.4/bin/summon
Similarly, on Windows and Unix machines, the ``summon`` script will be
installed in your site-wide Python scripts directory (the same location where
``easy_install`` is found); on Windows this is typically ``C:\Python25\Scripts``.
Note for Windows users
^^^^^^^^^^^^^^^^^^^^^^
Windows and Unix have different approaches to deciding what programs should be
executable; this makes it difficult to install the utilities which ship with
Golem as executables out-of-the-box. So, in the following examples, assuming
you've added your Python install directory to your path, please substitute: ::
c:\mycmldata\> python c:\Python25\Scripts\summon
(again, assuming Python is installed in ``c:\Python25\``; if it's
installed somewhere else, substitute that path in too) for: ::
$ summon
when running the examples.
Using Summon
^^^^^^^^^^^^
Summon comes with a built-in help message: ::
$ summon --help
usage: summon options file1.xml [file2.xml ...]
options:
--version show program's version number and exit
-h, --help show this help message and exit
-t TERM, --term=TERM terms to look up
-d DICTIONARY, --dictionary=DICTIONARY
dictionary to use
-c CONFIG, --config=CONFIG
config file to use
-f, --final take only last value in file?
-o OUTFILE, --outfile=OUTFILE
dump output to csv file
To explain how summon works, we start by taking an example. ::
$ summon -t numbers_of_species -d rmcprofileDict.xml ag3cocn6_300k.xml
Here, we're extracting the number of atoms of each atomic species in a
simulation (``ag3cocn6_300k.xml``), which is represented by the term
``numbers_of_species`` in the CML/Golem dictionary ``rmcprofileDict.xml``.
We cover the development of dictionaries later, but for now it's sufficient
to know that a dictionary contains a list of terms and metadata on how
to locate and manipulate the data the terms refer to.
The result of this call will look something like::
numbers_of_species
"[864, 288, 1728, 1728]"
where the first line gives the term name, and the four numbers are the number
of atoms of each of the four types in the simulation file (from
the name of the file you can guess that these are Ag, Co, C and N). To extract
multiple quantities at once, specify each separately using ``-t TERM``, where
``TERM`` is the ``id`` of the concept in the dictionary or Summon
configuration file; quantities from the same file will be written to the same
line in the output.
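
If you want to post-process this output in Python rather than a spreadsheet, the following is a minimal sketch. It assumes that the first line holds the term names and that array-valued cells look like Python list literals, as in the example above; the function name ``parse_summon_output`` is hypothetical and not part of Summon or Golem.

```python
import ast
import csv
import io


def parse_summon_output(text):
    """Parse Summon-style CSV text into a list of dicts, one per data line.

    Assumption: the first line is a header of term names, and array cells
    are written like Python list literals, e.g. "[864, 288, 1728, 1728]".
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row = {}
        for name, cell in zip(header, record):
            try:
                row[name] = ast.literal_eval(cell)
            except (ValueError, SyntaxError):
                row[name] = cell  # keep anything unparseable as a string
        rows.append(row)
    return rows


sample = 'numbers_of_species\n"[864, 288, 1728, 1728]"\n'
rows = parse_summon_output(sample)
print(rows[0]["numbers_of_species"])       # [864, 288, 1728, 1728]
print(sum(rows[0]["numbers_of_species"]))  # 4608
```

``ast.literal_eval`` is used instead of ``eval`` so that only literal values are accepted from the file.
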
You can pass multiple XML files on the command line, in which case you get one
line of data per file. However, the output itself does not record which line
corresponds to which file; if you need this information, you must extract a
term that identifies each file unambiguously.
Suppose we performed a large number of simulations using the OSSIA code,
where each simulation corresponded
to a different temperature. If we want to extract the value of a quantity
called ``energy`` from each file, to plot it against ``temperature`` (as
defined in ``ossiaDict.xml``), and assuming the CML output by OSSIA is in
the current working directory, we would use the following summon command
containing multiple instances of the ``-t`` option: ::
$ summon -t temperature -t energy -d /PATH/TO/DICTIONARIES/ossiaDict.xml *.xml
As you can see, in practice we usually don't need file names; we only need to
know that temperatures and energies match up. In this case, the temperature
and energy on any given line of output are guaranteed to come from the same
file.
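
If you drive Summon from a Python script, the command line can be assembled programmatically. The sketch below only builds the argument list, using the ``-t``, ``-d``, ``-c`` and ``-o`` options listed in the help message; the helper name ``build_summon_command`` is hypothetical. Note that ``*.xml`` is expanded by the shell, so a script that bypasses the shell has to expand it itself, here with ``glob``.

```python
import glob


def build_summon_command(terms, files, dictionary=None, config=None, outfile=None):
    """Return an argument list for summon (hypothetical helper, not part of Golem)."""
    command = ["summon"]
    for term in terms:
        command += ["-t", term]  # one -t per term to extract
    if dictionary is not None:
        command += ["-d", dictionary]
    if config is not None:
        command += ["-c", config]
    if outfile is not None:
        command += ["-o", outfile]
    return command + list(files)


# The shell normally expands *.xml; without a shell we do it ourselves.
xml_files = sorted(glob.glob("*.xml"))
command = build_summon_command(
    ["temperature", "energy"],
    xml_files,
    dictionary="/PATH/TO/DICTIONARIES/ossiaDict.xml",
)
print(" ".join(command))
```

The resulting list can be passed to ``subprocess.run(command, check=True)``, assuming ``summon`` is on your path.
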
.. highlight:: xml
Any term in a dictionary with a defined mapping from CML to a Golem
object can be extracted in this manner - basically, any entry that directly
contains, or is, a ````, ````, ````, ````,
````, ```` or ````. Concepts where this
mapping is not defined will raise an error. Looking in the dictionary, you
can spot these because they do not contain lines like::
.. highlight:: sh
Routing output
^^^^^^^^^^^^^^
To route output to a file as well as to the console, use the ``-o`` (or
``--outfile=``) option: ::
$ summon -t temperature -t energy -d ossiaDict.xml -o output.csv *.xml
This produces a CSV file, ``output.csv``, ready to import into your favourite
spreadsheet.
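
The same file can also be read in Python. The sketch below assumes that ``output.csv`` starts with a header line of term names and that scalar values are written as plain numbers; neither detail is confirmed here, and the sample values are made up purely for illustration. The function name ``load_pairs`` is hypothetical.

```python
import csv
import io


def load_pairs(handle, x_term="temperature", y_term="energy"):
    """Read (x, y) pairs from Summon-style CSV, sorted by x.

    Assumption: a header row of term names and plain numeric cells.
    """
    reader = csv.DictReader(handle)
    pairs = [(float(row[x_term]), float(row[y_term])) for row in reader]
    return sorted(pairs)


# Made-up values standing in for the contents of output.csv.
sample = io.StringIO("temperature,energy\n300,-1.5\n100,-2.0\n200,-1.75\n")
for temperature, energy in load_pairs(sample):
    print(temperature, energy)
```

For a real file, open it with ``open("output.csv", newline="")`` and pass the handle to ``load_pairs``. Sorting by temperature matters because the order of lines follows the order of the input files, not the temperatures.
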
Summon configuration files
^^^^^^^^^^^^^^^^^^^^^^^^^^
Sometimes the concepts you wish to extract may not be directly
referenced within the CML/Golem dictionary for your code. This could be for a
number of reasons: the concept may be too specific to be incorporated in the
dictionary, such as a particular bond length in a system (which would only
exist in systems of that kind), or markup for the concept may have been
introduced recently into your code and the dictionary hasn't been updated to
match yet. In that case, you may need to write a Summon configuration file in
order to define those concepts. Let's say that the ``numbers_of_species``
concept, used earlier, is missing from ``rmcprofileDict.xml``; we therefore
need to supply a definition.
Writing Summon configuration files
""""""""""""""""""""""""""""""""""
We do that by adding entries to a configuration file (say,
``rmcprofile.cfg``): ::
[numbers_of_species]
type: array
xpath: //cml:parameterList/cml:parameter[@dictRef="rmcprofile:numbers_of_species"]
.. highlight:: xml
where the first line (``[numbers_of_species]``) gives the name of the concept,
``type`` gives the type of the data (here ``array``, taken from the list of
types Golem, and therefore ``summon``, understands, as given earlier), and
``xpath`` gives the XPath expression which points to the bit of CML we want to
extract. (The ``cml`` prefix is always bound to the CML namespace,
``http://www.xml-cml.org/schema``, so you don't need to worry about declaring
it.) The fragment of CML that this refers to is: ::
864 288 1728 1728
.. highlight:: sh
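
To see what an XPath expression like this selects, you can try it outside Summon. The following is a minimal sketch using Python's standard ``xml.etree.ElementTree``, which supports only a subset of XPath (hence the leading ``.//`` instead of ``//``). The CML fragment in the sketch is hypothetical: the element layout and the ``dataType`` and ``size`` attributes are assumptions made for illustration, and this is not how Golem itself evaluates the expression.

```python
import xml.etree.ElementTree as ET

# The CML namespace, bound to the "cml" prefix as in the config file.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical CML fragment; layout and attributes are assumptions.
SAMPLE = """\
<cml xmlns="http://www.xml-cml.org/schema">
  <parameterList>
    <parameter dictRef="rmcprofile:numbers_of_species">
      <array dataType="xsd:integer" size="4">864 288 1728 1728</array>
    </parameter>
  </parameterList>
</cml>
"""


def extract_array(xml_text, dict_ref):
    """Return the integer arrays of every parameter matching dict_ref."""
    root = ET.fromstring(xml_text)
    path = ".//cml:parameterList/cml:parameter[@dictRef='%s']" % dict_ref
    results = []
    for parameter in root.findall(path, CML_NS):
        array = parameter.find("cml:array", CML_NS)
        if array is not None and array.text:
            results.append([int(token) for token in array.text.split()])
    return results


print(extract_array(SAMPLE, "rmcprofile:numbers_of_species"))
# [[864, 288, 1728, 1728]]
```
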
Thus, you can easily define extra terms when you need them. If
you wish to rename or inherit terms from elsewhere - say, a dictionary that
came with Golem - then that is also possible. Again, using
``numbers_of_species`` as an example, and assuming the ``rmcprofile``
dictionary is available, you can define the term as follows: ::
[numbers_of_species]
dictRef: {http://www.esc.cam.ac.uk/rmcprofile}numbers_of_species
In general, ``dictRef`` consists of the namespace of the dictionary you wish
to use and the ``id`` of the term within that dictionary you want.
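
The ``{namespace}id`` form is easy to take apart by hand. The sketch below reads the entry above with Python's standard ``configparser``, on the assumption that the file is close enough to INI syntax for that module to read it; it does not show how Summon itself parses the file. The helper ``split_dict_ref`` is hypothetical.

```python
import configparser

CONFIG_TEXT = """\
[numbers_of_species]
dictRef: {http://www.esc.cam.ac.uk/rmcprofile}numbers_of_species
"""


def split_dict_ref(dict_ref):
    """Split '{namespace}id' into (namespace, id)."""
    if not dict_ref.startswith("{") or "}" not in dict_ref:
        raise ValueError("expected '{namespace}id', got %r" % dict_ref)
    namespace, term_id = dict_ref[1:].split("}", 1)
    return namespace, term_id


parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep 'dictRef' as written instead of lowercasing it
parser.read_string(CONFIG_TEXT)

namespace, term_id = split_dict_ref(parser["numbers_of_species"]["dictRef"])
print(namespace)  # http://www.esc.cam.ac.uk/rmcprofile
print(term_id)    # numbers_of_species
```
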
Alternatively, you can import entire dictionaries into your configuration with
the following declaration: ::
[global]
dictionary: /path/to/dictionary
This makes every term in the dictionary available in this configuration
file; in other words, it is exactly equivalent to entering into your
configuration file ::
[term]
dictRef: {http://dictionary.namespace/}term
for every entry in the dictionary. However, if you explicitly define a
term in the config file, that definition will be used *even if* it has
been loaded from a dictionary already. If a Summon configuration file
contains multiple ``[global]`` sections, and some of those dictionaries
contain the same term, then the first-loaded dictionary wins.
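
The precedence rules above can be summarised in a few lines of Python. This is a sketch of the stated behaviour only, not Summon's actual implementation; ``resolve_terms`` and the sample definitions are hypothetical.

```python
def resolve_terms(explicit_terms, dictionaries):
    """Combine term definitions following the precedence described above.

    explicit_terms: dict of terms defined directly in the config file.
    dictionaries:   list of dicts of terms, in the order they were loaded.
    """
    resolved = {}
    for dictionary in dictionaries:
        for term, definition in dictionary.items():
            # setdefault keeps the existing entry: the first-loaded dictionary wins.
            resolved.setdefault(term, definition)
    # Explicit definitions override anything loaded from a dictionary.
    resolved.update(explicit_terms)
    return resolved


first = {"energy": "from first dictionary", "temperature": "from first dictionary"}
second = {"energy": "from second dictionary", "pressure": "from second dictionary"}
explicit = {"temperature": "from config file"}

terms = resolve_terms(explicit, [first, second])
print(terms["energy"])       # from first dictionary
print(terms["temperature"])  # from config file
print(terms["pressure"])     # from second dictionary
```
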
Using Summon configuration files
""""""""""""""""""""""""""""""""
Summon configuration files are used in the same way as dictionaries, except
that you load them with the ``-c`` command line option, not ``-d``. For
example, using ``rmcprofile.cfg`` from earlier: ::
$ summon -t numbers_of_species -c rmcprofile.cfg ag3cocn6_300k.xml
will generate the same output as the dictionary-using approach given earlier.
In summary, Summon configuration files are a way of constructing new
CML/Golem dictionaries on the fly as you need them, without going to all the
effort of writing the XML by hand.