Data Modelling for Computational Chemistry - a Summary


Philip Couch
Daresbury Laboratory, Daresbury, Warrington, Cheshire, UK. WA4 4AD.

Adoption of a markup language

The interoperability of computational codes running in Grid environments is currently hindered by a lack of data standards. The simplest and most straightforward level of standardisation is that of the data syntax, and the eCCP1 project has proposed the adoption of XML to address this. However, XML imposes a hierarchical structure on the data that is not always appropriate. It therefore remains an open question whether a more suitable technology, such as the Resource Description Framework (RDF), could be used to represent computational chemistry concepts, perhaps alongside XML.


The data model design

The data model can be designed according to one of two extreme approaches. The first involves identifying all the concepts of interest and the relationships between them, and then constructing a rigid data model that supports these. Tools can then be developed that are 'hard-wired' to work with the model's implicit semantics. The second approach involves designing components that correspond to individual concepts, together with methods for linking these components. This raises a question: is there a natural granularity for the XML components?
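As a rough illustration of the second, component-based approach, the following Python sketch keeps a molecule, a basis set and a calculation as separate components and links them by identifier. All element and attribute names (`molecule`, `basisSet`, `calculation`, `link`, `role`, `ref`) are hypothetical and are not taken from any eCCP1 schema. Only the standard library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, for illustration only.
DOC = """
<document>
  <molecule id="mol1">
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </molecule>
  <basisSet id="bs1" name="6-31G*"/>
  <calculation id="calc1" method="HF">
    <link role="molecule" ref="mol1"/>
    <link role="basisSet" ref="bs1"/>
  </calculation>
</document>
"""

def index_components(root):
    """Map the 'id' of every element that has one to the element itself."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve_links(calc, index):
    """Return {role: referenced element} for each <link> inside a calculation."""
    return {link.get("role"): index[link.get("ref")] for link in calc.findall("link")}

root = ET.fromstring(DOC)
index = index_components(root)
linked = resolve_links(index["calc1"], index)
print(linked["molecule"].tag, len(linked["molecule"].findall("atom")))  # molecule 3
print(linked["basisSet"].get("name"))  # 6-31G*
```

In this sketch each component stays valid on its own, and the granularity question becomes a choice of which elements receive an `id` and may be the target of a link.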


Existing XML languages with relevance to computational chemistry

Other XML languages relevant to computational chemistry already exist, such as the Chemical Markup Language (CML), and some XML components can be based on these. The computational chemistry component of CML is still under development, and some extensions will be needed to cover all of the required concepts. This raises the question of which XML components can be based on CML and other languages, such as VTK, and which must be newly created. Consideration must also be given to the most appropriate method for designing such representations collaboratively; eCCP1 is currently using UML for this purpose.
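One possible way to reuse CML while extending it is sketched below. The molecule is built with CML element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `x3`/`y3`/`z3`), and a concept assumed to be missing from CML is placed in a separate extension namespace. The CML namespace URI is the one used by recent CML schemas and should be checked against the CML version adopted. The extension namespace and the `wavefunctionRef` element are hypothetical, and the coordinates are approximate values for illustration only.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
# Hypothetical namespace for project-specific extensions; not a real eCCP1 URI.
EXT_NS = "http://example.org/eccp1-extensions"

ET.register_namespace("cml", CML_NS)
ET.register_namespace("ext", EXT_NS)

def cml(tag):
    """Qualify a tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"

def ext(tag):
    """Qualify a tag name with the (hypothetical) extension namespace."""
    return f"{{{EXT_NS}}}{tag}"

def build_molecule(atoms):
    """Build a molecule from (id, element, x, y, z) tuples using CML element names."""
    mol = ET.Element(cml("molecule"), id="m1")
    atom_array = ET.SubElement(mol, cml("atomArray"))
    for atom_id, element, x, y, z in atoms:
        ET.SubElement(atom_array, cml("atom"), id=atom_id, elementType=element,
                      x3=str(x), y3=str(y), z3=str(z))
    return mol

water = build_molecule([
    ("a1", "O", 0.0, 0.0, 0.117),
    ("a2", "H", 0.0, 0.757, -0.467),
    ("a3", "H", 0.0, -0.757, -0.467),
])
# A hypothetical concept, assumed not to be covered by CML, goes in the extension namespace.
ET.SubElement(water, ext("wavefunctionRef"), ref="wfn1")
print(ET.tostring(water, encoding="unicode"))
```

Keeping extensions in their own namespace means that tools which understand only CML can still process the CML parts and skip the rest.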


General data containers

In some cases it is useful to group concepts into classes and to seek a common representation for each class. For example, all scalar properties could be represented using a single XML 'scalar' element type. This approach can reduce the number of schema entries and the burden on application developers. It is not yet clear to what extent such general containers should be used for quantum chemistry data.
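A minimal sketch of a general 'scalar' container follows. One writer and one reader handle every scalar property, whatever its meaning. The attribute names `dictRef` and `units` are assumptions loosely modelled on CML, and the property names and numbers are placeholders, not calculated results.

```python
import xml.etree.ElementTree as ET

def scalar(name, value, units=None):
    """Build one generic <scalar> element for any scalar property.

    The 'dictRef' and 'units' attribute names are assumptions for illustration.
    """
    el = ET.Element("scalar", dictRef=name)
    if units is not None:
        el.set("units", units)
    el.text = repr(float(value))
    return el

def read_scalars(parent):
    """Read every <scalar> child back as {name: (value, units)}."""
    return {
        el.get("dictRef"): (float(el.text), el.get("units"))
        for el in parent.findall("scalar")
    }

# Placeholder values, not results of any calculation.
results = ET.Element("results")
results.append(scalar("totalEnergy", -76.0, "hartree"))
results.append(scalar("dipoleMoment", 1.9, "debye"))
results.append(scalar("nuclearRepulsionEnergy", 9.2, "hartree"))
print(read_scalars(results)["dipoleMoment"])  # (1.9, 'debye')
```

The trade-off is that the meaning of each value now lives in the `dictRef` string rather than in the schema, so a dictionary of permitted names would be needed to keep such containers checkable.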


Relationships between concepts

It is important to represent not only computational chemistry concepts but also the relationships between them. For example, a way of assigning basis sets to atoms is required. There are several important types of relationship that need to be addressed, and multiple methods can be used to express them. The relative merits of each approach need to be carefully considered.
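To make the choice concrete, the sketch below expresses the same atom-to-basis-set assignment in two ways: by containment, where each atom holds its basis set as a child element, and by reference, where basis sets are defined once and atoms point at them by identifier. The element and attribute names (including `basisSetRef`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Method 1: containment. Each atom carries its basis set as a child element.
NESTED = """
<molecule>
  <atom id="a1" elementType="O"><basisSet name="cc-pVDZ"/></atom>
  <atom id="a2" elementType="H"><basisSet name="cc-pVDZ"/></atom>
</molecule>
"""

# Method 2: reference. Basis sets are defined once and atoms point at them by id.
REFERENCED = """
<system>
  <basisSet id="bs1" name="cc-pVDZ"/>
  <molecule>
    <atom id="a1" elementType="O" basisSetRef="bs1"/>
    <atom id="a2" elementType="H" basisSetRef="bs1"/>
  </molecule>
</system>
"""

def assignments_nested(xml_text):
    """Read {atom id: basis set name} from the containment form."""
    root = ET.fromstring(xml_text)
    return {atom.get("id"): atom.find("basisSet").get("name") for atom in root.iter("atom")}

def assignments_referenced(xml_text):
    """Read {atom id: basis set name} from the reference form by resolving ids."""
    root = ET.fromstring(xml_text)
    basis = {bs.get("id"): bs.get("name") for bs in root.iter("basisSet")}
    return {atom.get("id"): basis[atom.get("basisSetRef")] for atom in root.iter("atom")}

# Both encodings express the same relationship.
assert assignments_nested(NESTED) == assignments_referenced(REFERENCED)
print(assignments_referenced(REFERENCED))  # {'a1': 'cc-pVDZ', 'a2': 'cc-pVDZ'}
```

Containment keeps each atom self-contained but repeats a shared basis set for every atom that uses it. Referencing defines it once, at the cost of requiring readers to resolve identifiers and to check that every reference exists.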


Metadata and provenance

In addition to the data itself, there is a requirement to represent metadata, such as the particular code used to calculate the data, details of the computational methods, and the individuals involved. The CLRC has produced a general scientific metadata model (the CLRC Scientific Metadata Model) that may fulfil the metadata and provenance requirements of the eCCP1 project; this is currently under investigation.
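A minimal sketch of keeping provenance alongside the data it describes is shown below. The `Provenance` fields and the `record`/`metadata` element names are hypothetical placeholders chosen for this example. They are not the structure of the CLRC Scientific Metadata Model, and the code name, version and investigator are invented placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Provenance:
    """Hypothetical provenance fields; not the CLRC Scientific Metadata Model."""
    code: str
    code_version: str
    method: str
    investigators: list[str] = field(default_factory=list)

def wrap_with_metadata(data: ET.Element, prov: Provenance) -> ET.Element:
    """Return a <record> holding a <metadata> block alongside the data it describes."""
    record = ET.Element("record")
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "code", version=prov.code_version).text = prov.code
    ET.SubElement(meta, "method").text = prov.method
    for name in prov.investigators:
        ET.SubElement(meta, "investigator").text = name
    record.append(data)
    return record

# Placeholder data and provenance, for illustration only.
energy = ET.Element("scalar", dictRef="totalEnergy", units="hartree")
energy.text = "0.0"
record = wrap_with_metadata(energy, Provenance("ExampleCode", "1.0", "HF", ["A. N. Other"]))
print(ET.tostring(record, encoding="unicode"))
```

If an existing model such as the CLRC one is adopted, only the contents of the `metadata` block would need to change; the data elements themselves stay as they are.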


-- PhilipCouch - 08 Nov 2004

Topic revision: r2 - 09 Nov 2004 - 10:20:00 - PhilipCouch