Summon
------

.. highlight:: sh

Summon is a utility to enable researchers to extract data from one or more CML
files and collate the results as a comma-separated list of data, one line per
file.

Installing Summon
^^^^^^^^^^^^^^^^^

Summon is installed as part of Golem. If you have installed Golem using the
``make`` approach outlined earlier, you'll find ``summon`` in
``/usr/local/bin``; however, if you have used ``setup.py`` or
``easy_install``, you may need to add ``summon`` to your path. For example, on
OS X:

::

    $ which summon
    /opt/local/Library/Frameworks/Python.framework/Versions/2.4/bin/summon

Similarly, on Windows and Unix machines, the ``summon`` script will be
installed in your site-wide Python scripts directory (the same location where
``easy_install`` is found) - typically ``C:\Python25\Scripts``.

Note for Windows users
^^^^^^^^^^^^^^^^^^^^^^

Windows and Unix have different approaches to deciding what programs should be
executable; this makes it difficult to install the utilities which ship with
Golem as executables out-of-the-box. So, in the following examples, assuming
you've added your Python install directory to your path, please substitute:

::

    c:\mycmldata\> python c:\Python25\Scripts\summon

(again, assuming Python is installed in ``c:\Python25\``; if it's installed
somewhere else, substitute that path in too) for:

::

    $ summon

when running the examples.

Using Summon
^^^^^^^^^^^^

Summon comes with a built-in help message:

::

    $ summon --help
    usage: summon options file1.xml [file2.xml ...]

    options:
      --version             show program's version number and exit
      -h, --help            show this help message and exit
      -t TERM, --term=TERM  terms to look up
      -d DICTIONARY, --dictionary=DICTIONARY
                            dictionary to use
      -c CONFIG, --config=CONFIG
                            config file to use
      -f, --final           take only last value in file?
      -o OUTFILE, --outfile=OUTFILE
                            dump output to csv file

To explain how summon works, we start by taking an example.
::

    $ summon -t numbers_of_species -d rmcprofileDict.xml ag3cocn6_300k.xml

Here, we're extracting the numbers of the different atomic species in a
simulation (``ag3cocn6_300k.xml``), which is represented by the term
``numbers_of_species`` in the CML/Golem dictionary ``rmcprofileDict.xml``. We
cover the development of dictionaries later, but for now it's sufficient to
know that a dictionary contains a list of terms and metadata on how to locate
and manipulate the data the terms refer to. The result of this call will look
something like::

    numbers_of_species
    "[864, 288, 1728, 1728]"

where ``numbers_of_species`` is the parameter name, and the four numbers are
the counts of atoms of each of the four types in the simulation file (from the
name of the file you can guess that these are Ag, Co, C and N).

To extract multiple quantities at once, specify each separately using
``-t TERM``, where ``TERM`` is the ``id`` of the concept in the dictionary or
Summon configuration file; quantities from the same file will be written to
the same line in the output.

You can pass multiple XML files on the command line, in which case you get one
line of data per file. However, the output by itself gives no means of telling
which line corresponds to which file, so if you need this information, you
must ensure that you capture data that provides this unambiguous link.

Suppose we performed a large number of simulations using the OSSIA code, where
each simulation corresponded to a different temperature.
If we want to extract the value of a quantity called ``energy`` from each
file, to plot it versus ``temperature`` (as defined in ``ossiaDict.xml``), and
assuming the CML output by OSSIA is in the current working directory, we would
use the following summon command containing multiple instances of the ``-t``
option:

::

    $ summon -t temperature -t energy -d /PATH/TO/DICTIONARIES/ossiaDict.xml *.xml

As you can see, in practice we usually don't need file names; we just need to
know that temperatures and energies match up. In this case, we know that the
temperature and energy for any simulation are given on the same line, and so
come from the same file.

.. highlight:: xml

Any term in a dictionary with a defined mapping from CML to a Golem object can
be extracted in this manner - basically, any entry that directly contains, or
is, a ````, ````, ````, ````, ````, ```` or ````. Concepts where this mapping
is not defined will raise an error. Looking in the dictionary, you can spot
these because they do not contain lines like::

.. highlight:: sh

Routing output
^^^^^^^^^^^^^^

To route output to a file as well as to the console, use the ``-o`` (or
``--outfile=``) option:

::

    $ summon -t temperature -t energy -d ossiaDict.xml -o output.csv *.xml

This produces a CSV file, ``output.csv``, ready to import into your favourite
spreadsheet.

Summon configuration files
^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes the concepts you wish to extract are not directly referenced within
the CML/Golem dictionary for your code. This could be for a number of reasons:
the concept may be too specific to be incorporated in the dictionary, such as
a specific bond length in a system (which would only exist in systems of that
kind), or markup for the concept may have been introduced into your code
recently and the dictionary hasn't been updated to match yet. In that case,
you may need to write a Summon configuration file in order to define those
concepts.
Let's say that the ``numbers_of_species`` concept, used earlier, is missing
from ``rmcprofileDict.xml``; we therefore need to supply a definition.

Writing Summon configuration files
""""""""""""""""""""""""""""""""""

We do that by adding entries to a configuration file (say,
``rmcprofile.cfg``):

::

    [numbers_of_species]
    type: array
    xpath: //cml:parameterList/cml:parameter[@dictRef="rmcprofile:numbers_of_species"]

.. highlight:: xml

where the name of the concept is the first line (``[numbers_of_species]``),
the ``type`` of the data therein is ``array`` (taken from the list of types
Golem, and therefore ``summon``, understands, as given earlier), and the XPath
expression which points to the bit of CML we want to extract is ``xpath``.
(The CML namespace is defined to always be ``http://www.xml-cml.org/schema``,
so you don't need to worry about declaring that). The fragment of CML that
this refers to is:

::

    864 288 1728 1728

.. highlight:: sh

Thus, you can easily define extra terms when you need them.

If you wish to rename or inherit terms from elsewhere - say, a dictionary that
came with Golem - then that is also possible. Again, using
``numbers_of_species`` as an example, and assuming the ``rmcprofile``
dictionary is available, you can define the term as follows.

::

    [numbers_of_species]
    dictRef: {http://www.esc.cam.ac.uk/rmcprofile}numbers_of_species

In general, ``dictRef`` consists of the namespace of the dictionary you wish
to use and the ``id`` of the term within that dictionary you want.

Alternatively, you can import entire dictionaries into your configuration with
the following declaration:

::

    [global]
    dictionary: /path/to/dictionary

This makes every term in the dictionary available in this configuration file;
in other words, it is exactly equivalent to entering into your configuration
file

::

    [term]
    dictRef: {http://dictionary.namespace/}term

for every entry in the dictionary.
However, if you explicitly define a term in the config file, that definition
will be used *even if* the term has already been loaded from a dictionary. If
a Summon configuration file contains multiple ``[global]`` sections, and some
of those dictionaries contain the same term, then the first-loaded dictionary
wins.

Using Summon configuration files
""""""""""""""""""""""""""""""""

Summon configuration files are used in the same way as dictionaries, except
that you load them with the ``-c`` command line option, not ``-d``. For
example, using ``rmcprofile.cfg`` from earlier:

::

    $ summon -t numbers_of_species -c rmcprofile.cfg ag3cocn6_300k.xml

will generate the same output as the dictionary-using approach given earlier.

In summary, Summon configuration files are a way of constructing new CML/Golem
dictionaries on the fly as you need them, without going to all the effort of
writing the XML by hand.
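
As an illustration of how these pieces fit together, the following is a sketch
of a single configuration file that combines the three mechanisms described
above: a dictionary import, a renamed term and an explicit definition. The
dictionary path and the section name ``species_counts`` are hypothetical,
chosen only for this example. The ``#`` comment lines assume that Summon
accepts comments in the style of Python's ``ConfigParser``; delete them if it
does not.

::

    # Import every term from a dictionary (hypothetical path).
    [global]
    dictionary: /PATH/TO/DICTIONARIES/rmcprofileDict.xml

    # Make a dictionary term available under a new, hypothetical name.
    [species_counts]
    dictRef: {http://www.esc.cam.ac.uk/rmcprofile}numbers_of_species

    # An explicit definition; following the precedence rule described
    # earlier, it is used in preference to any numbers_of_species term
    # loaded from the dictionary.
    [numbers_of_species]
    type: array
    xpath: //cml:parameterList/cml:parameter[@dictRef="rmcprofile:numbers_of_species"]

Assuming this file is saved as ``rmcprofile.cfg``, both names could then be
requested in a single call:

::

    $ summon -t species_counts -t numbers_of_species -c rmcprofile.cfg ag3cocn6_300k.xml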