The Golem ontology system

What is Golem?

Golem has been developed as part of the MaterialsGrid project.

Golem is a set of tools, and an ontology language, for processing data written in CML, the Chemical Markup Language. The Golem language is XML, and the tools and libraries are written in Python.

Together, the language and toolkit help scientists use, and write, tools for processing scientific data by reference to the concepts found therein, rather than having to fight with the formats and syntax the data happens to be serialized in.

The toolkit is released under the MIT License, so it is both open-source and Free Software.

Every code or resource that uses CML has a subtly different set of concepts it is trying to represent and, as a result, uses CML syntax slightly differently. These differences are encapsulated in Golem/CML Dictionaries, which specify the concepts and syntax particular to a given domain of CML usage.
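To make the idea concrete, here is a minimal sketch of the problem that dictionaries address. It does not use Golem's API or its dictionary format; it uses only Python's standard library. The two CML fragments, the dialect names "codeA" and "codeB", the dictRef values and the TOY_DICTIONARIES mapping are all invented for illustration. It assumes that each code labels its quantities with a dictRef attribute, and shows how a per-dialect mapping lets a caller ask for a concept ("total_energy") without knowing how each code spells or nests it.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Two invented CML fragments from hypothetical codes "codeA" and "codeB".
# Both report a total energy, but under different dictRef names and
# inside different parent elements.
DOC_A = """<cml xmlns="http://www.xml-cml.org/schema">
  <propertyList>
    <property dictRef="codeA:total_energy">
      <scalar units="units:eV">-1234.5</scalar>
    </property>
  </propertyList>
</cml>"""

DOC_B = """<cml xmlns="http://www.xml-cml.org/schema">
  <module>
    <property dictRef="codeB:etot">
      <scalar units="units:eV">-987.25</scalar>
    </property>
  </module>
</cml>"""

# Toy stand-in for a dictionary (NOT Golem's format): for each dialect,
# map a shared concept name to the dictRef that dialect uses for it.
TOY_DICTIONARIES = {
    "codeA": {"total_energy": "codeA:total_energy"},
    "codeB": {"total_energy": "codeB:etot"},
}


def find_concept(cml_text, dialect, concept):
    """Return (value, units) pairs for a concept in a CML document."""
    dict_ref = TOY_DICTIONARIES[dialect][concept]
    root = ET.fromstring(cml_text)
    results = []
    # Search the whole tree, so the caller need not know where the
    # property is nested in this particular dialect.
    for prop in root.iter("{%s}property" % CML_NS):
        if prop.get("dictRef") == dict_ref:
            scalar = prop.find("{%s}scalar" % CML_NS)
            if scalar is not None:
                results.append((float(scalar.text), scalar.get("units")))
    return results


if __name__ == "__main__":
    print(find_concept(DOC_A, "codeA", "total_energy"))
    print(find_concept(DOC_B, "codeB", "total_energy"))
```

The calling code is identical for both documents; only the mapping differs. A real dictionary would carry considerably more than a name lookup, so treat this purely as an illustration of the motivation.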

We have developed dictionaries for many CML-emitting codes and resources. The codes include CASTEP, DL_POLY, SIESTA, MOPAC and GULP; we also have a dictionary for the CrystalEye crystallographic structure database. Golem also includes tools that make it straightforward to develop dictionaries for new CML dialects.

Where do I get Golem?

The latest release is Golem 0.99b.

If you've got a recent version of Python, you can install it with: (sudo) easy_install golem

Alternatively, you can download binary and source distributions — and an (untested) Windows installer — from Google Code or the Python Package Index.

If you'd like to get the latest source, or contribute to Golem, our tree's available by SVN from our repository on Google Code, alongside our bug-tracker, wiki and mailing lists. If you find any problems, please let us know!

Documentation

We're working on it!

For now, there are some example dictionary entries and a heavily-commented program using Golem here, and the Pydoc is here.

Have a look at Brighten the Corners, Andrew Walkingshaw's weblog, as well, particularly the posts tagged "Golem".