Developing a Data Model for Computational Chemistry

Outline of discussion topics

On this page you will find a summary of the issues that have arisen so far in our development of a data representation for quantum chemical data. This outline text, and the more detailed discussions accessed via the links in this document, are taken from a paper
"Data modelling for computational chemistry - a methodology" by Philip Couch, Daresbury Laboratory.
As discussed in ECCP1Activities, we are hoping to develop a way to represent
quantum chemical data that will facilitate exchange of data between different
quantum chemistry packages, starting with those used within the CCP1 community
(e.g. GAMESS-UK and Molpro) but in the hope that this exercise can lead to a wider,
de facto standardisation effort in which other package developers will get involved.
To achieve this we welcome input during the development process from any other
groups involved in developing quantum chemistry packages, and also those with experience
of data model and data representation development in the chemistry area. To make
your comments you can use either the mailing list

Adoption of a markup language

The interoperability of computational codes running in Grid environments is currently hindered by a lack of data standards. The simplest and most straightforward level of standardisation is that of the data syntax; the eCCP1 project has proposed the adoption of XML to address this issue. However, XML imposes a hierarchical structure on the data that is not always appropriate, and a question remains as to whether a more suitable technology, such as the Resource Description Framework, can be used (perhaps alongside XML) to represent computational chemistry concepts.

Status: At the moment, we are assuming that an XML-based approach offers the best way forward. See the detailed discussion.

The data model design

The data model can be designed in one of two extreme ways. The first involves the consideration of all the concepts of interest and the relationships between them, followed by the construction of a rigid data model that supports these. Tools can be developed that are 'hard-wired' to work with implicit semantics. The second approach involves the design of components that relate to concepts, together with methods to link these components together. But is there a natural granularity for the XML components?

Status: We are following a component-oriented approach: each document comprises components, each represented as defined by one of a number of individual schemas, with additional information provided to cross-reference the components as required to describe complex, multi-component data. See details.

Existing XML languages with relevance to computational chemistry

Other XML languages exist with relevance to computational chemistry, such as the Chemical Markup Language, and some XML components can be based on these. The computational chemistry component of

General data containers

In some cases it is a good idea to group concepts into classes and attempt to find a common representation for the class.
For example, all scalar properties could be represented using one XML 'scalar' element-type. This approach can reduce the number of schema entries and the burden on application developers. It is not clear to what extent these general containers should be used with quantum chemistry data.

Status: We will follow the generic container model as defined by

Proposed elements of the data representation

As we develop proposals for new components we will add them here.

Relationships between concepts

It is important to represent not only computational chemistry concepts, but also the relationships between them. For example, a way of assigning basis sets to atoms is required. There are several important types of relationship that need to be addressed, along with multiple methods that can be used to express them. The relative merits of each approach need to be carefully considered.

Status: This is an active area of research. We are exploring the W3C XLink specification and RDF as alternatives to explicitly including identifiers and references in the data elements; see details.

Metadata and provenance

In addition to representing data, there is also a requirement to represent metadata, such as information on the particular code that was used to calculate the data, details of the computational methods, and the individuals involved. CCLRC has produced a general scientific metadata model (CCLRC Scientific Metadata Model) that may fulfil the metadata and provenance requirements of the eCCP1 project.

Status: The merits and limitations of the CCLRC Scientific Metadata Model are under investigation; see details.

-- PhilipCouch - 08 Nov 2004
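
Illustrative sketch

The sections "General data containers" and "Relationships between concepts" above describe two ideas that may be easier to judge with a concrete case: a single generic 'scalar' element-type for all scalar properties, and explicit identifiers and references for assigning basis sets to atoms. The Python sketch below is not a proposed schema. Every element and attribute name in it (calculation, basisSet, atom, scalar, id, basisRef, dictRef, units) is an assumption made for illustration, and the numerical values are placeholders rather than calculated results.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all element/attribute names are illustrative
# assumptions, and the scalar values are placeholders, not real results.
DOC = """
<calculation>
  <basisSet id="b1" name="STO-3G"/>
  <basisSet id="b2" name="6-31G"/>
  <molecule>
    <atom id="a1" element="O" basisRef="b2"/>
    <atom id="a2" element="H" basisRef="b1"/>
    <atom id="a3" element="H" basisRef="b1"/>
  </molecule>
  <scalar dictRef="totalEnergy" units="hartree">-1.5</scalar>
  <scalar dictRef="dipoleMoment" units="debye">0.25</scalar>
</calculation>
"""


def read_scalars(root):
    """Return {dictRef: (value, units)} for every generic <scalar> element.

    One reader handles every scalar property, which is the reduction in
    application-developer burden that a general container is meant to give.
    """
    return {
        s.get("dictRef"): (float(s.text), s.get("units"))
        for s in root.iter("scalar")
    }


def basis_for_atoms(root):
    """Resolve each atom's basisRef against the id of a <basisSet> element."""
    basis_by_id = {b.get("id"): b.get("name") for b in root.iter("basisSet")}
    return {
        a.get("id"): basis_by_id[a.get("basisRef")]
        for a in root.iter("atom")
    }


root = ET.fromstring(DOC.strip())
scalars = read_scalars(root)
assignments = basis_for_atoms(root)
```

Here the atom-to-basis-set relationship is carried by identifiers embedded in the data elements themselves. An XLink or RDF approach, as mentioned in the status note, would instead hold that relationship outside the atom and basis set elements.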