The data model design

A way of specifying XML representations for quantum chemistry concepts is required, and this can be achieved using W3C XML Schema. XML schemas are not simple to understand for anyone unfamiliar with the schema language, so the Unified Modeling Language (UML) is preferred for visualising the representation. This graphical notation can be converted to XML schemas for use with XML tools. To specify some of the XML schema design choices explicitly in UML, it is necessary to adopt a convention (a UML profile). Many UML modelling tools exist; the Object Domain R3 tool (http://www.objectdomain.com) is being used within the eCCP1 project, adopting the XML schema generation profile of David Carlson (http://www.xml.com/pub/a/2001/08/22/uml.html). The XML schema can be generated by serialising the UML model as XMI (XML Metadata Interchange) and then importing this into the Eclipse-based Hypermodel tool (http://www.xmlmodeling.com).
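To make the idea of "converting the graphical notation to a schema" concrete, the following is a toy sketch of one plausible mapping from a UML class to XML Schema. It does not reproduce the Object Domain/XMI/Hypermodel pipeline or Carlson's profile. It assumes that UML attributes become `xs:attribute` declarations and that associations with a UML multiplicity become `xs:element` children with `minOccurs`/`maxOccurs`. The class name `MoleculeType`, the type `AtomType`, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialise with the conventional xs: prefix


def multiplicity_to_occurs(mult):
    """Translate a UML multiplicity ('1', '0..1', '1..*', '*') to XSD (minOccurs, maxOccurs)."""
    if mult == "*":                      # a bare '*' means 0..*
        return "0", "unbounded"
    lower, _, upper = mult.partition("..")
    upper = upper or lower               # a single value such as '1' means lower == upper
    return lower, ("unbounded" if upper == "*" else upper)


def class_to_complex_type(name, attributes, associations):
    """Map a simplified (hypothetical) UML class description to an xs:complexType.

    attributes:   {attribute name: XSD simple type}, emitted as xs:attribute
    associations: {role name: (XSD type, UML multiplicity)}, emitted as
                  xs:element children of an xs:sequence
    """
    ct = ET.Element(f"{{{XS}}}complexType", name=name)
    seq = ET.SubElement(ct, f"{{{XS}}}sequence")
    for role, (xsd_type, mult) in associations.items():
        lo, hi = multiplicity_to_occurs(mult)
        ET.SubElement(seq, f"{{{XS}}}element", name=role, type=xsd_type,
                      minOccurs=lo, maxOccurs=hi)
    # XSD requires attribute declarations to follow the content model
    for attr, xsd_type in attributes.items():
        ET.SubElement(ct, f"{{{XS}}}attribute", name=attr, type=xsd_type)
    return ct


schema = ET.Element(f"{{{XS}}}schema")
schema.append(class_to_complex_type(
    "MoleculeType",                      # hypothetical class name
    attributes={"id": "xs:ID", "title": "xs:string"},
    associations={"atom": ("AtomType", "1..*")},
))
print(ET.tostring(schema, encoding="unicode"))
```

A real profile would also need conventions for inheritance, enumerations and namespaces, which is exactly why a stereotype-based profile is adopted rather than an ad hoc mapping like this one.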

There are two extreme approaches to the design of the data model. The first involves considering all of the possible concepts and the relationships between them, and creating a rigid data model to support these. This approach relies heavily on XML nesting. Tools can then be written that work with this data model and rely on its implicit semantics; this is a common approach. The problem is that it is rather inflexible: everything is ‘hardwired’, and any extension requires adapting both the data model and the code. Writing code generators that automatically produce APIs (for parsing the formatted data) from the data model can, however, ease this burden.

The second extreme involves designing XML components that have little or no interdependence. These components correspond to the various quantum chemistry concepts, together with a specification of the links between components. This modular approach makes the data model extensible and, through componentisation, more readily reusable. It results in quite a flat format, since XML nesting is kept to a minimum and used mainly within components. Adopting this approach does, however, introduce some complexity, in that careful consideration must be given to specifying the relationships between components.

It is useful to group the quantum chemistry concepts into two categories: those that can exist on their own (called entities) and those that exist through the association of several concepts (called associations). Examples of entities are molecular structures and atomic basis sets; both have some meaning in isolation. An example of an association is a molecular vibration, which requires information about the dynamics, perhaps specified by a vector, to be associated with a molecular structure. The entities form the primitive types, each corresponding to a single XML component; the associations are complex types formed by linking components.
The adoption of a modular approach makes it simpler to use representations from other XML languages.
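The entity/association split can be illustrated with a small, flat document in which entities are top-level components carrying an `id`, and an association refers to them instead of nesting them. This is a sketch under stated assumptions: the element and attribute names (`qcData`, `ref`, `role`, `href`) and the helpers `index_components` and `resolve_association` are hypothetical, not part of any eCCP1 format, and all numbers are placeholders rather than computed results.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat document: entities (molecule, basisSet, vector) sit at the
# top level; the vibration association only holds references to them.
# All numeric values are placeholders.
DOC = """<qcData>
  <molecule id="m1">
    <atom elementType="O" x="0.0" y="0.0" z="0.0"/>
    <atom elementType="H" x="0.0" y="0.8" z="-0.5"/>
    <atom elementType="H" x="0.0" y="-0.8" z="-0.5"/>
  </molecule>
  <basisSet id="b1" name="placeholder"/>
  <vector id="v1">0.0 0.0 0.1 0.0 0.4 -0.5 0.0 -0.4 -0.5</vector>
  <vibration id="vib1">
    <ref role="structure" href="m1"/>
    <ref role="displacement" href="v1"/>
  </vibration>
</qcData>"""


def index_components(root):
    """Map id -> element for every top-level component that carries an id."""
    return {el.get("id"): el for el in root if el.get("id") is not None}


def resolve_association(assoc, components):
    """Return {role: referenced component}; raise KeyError on a dangling link."""
    resolved = {}
    for ref in assoc.findall("ref"):
        target = components.get(ref.get("href"))
        if target is None:
            raise KeyError(f"{assoc.get('id')}: unresolved reference {ref.get('href')!r}")
        resolved[ref.get("role")] = target
    return resolved


root = ET.fromstring(DOC)
components = index_components(root)
vib = resolve_association(root.find("vibration"), components)
atoms = vib["structure"].findall("atom")
disp = [float(v) for v in vib["displacement"].text.split()]
# A Cartesian displacement vector needs three components per atom.
assert len(disp) == 3 * len(atoms), "vector does not match the linked structure"
print(f"vibration links {len(atoms)} atoms to a {len(disp)}-component vector")
```

Because the molecule is referenced rather than nested, other associations could point at the same `m1` without copying it. The price is the one noted above: some consistency checks, such as matching the vector length to the number of atoms, are no longer implied by the nesting and have to be stated explicitly, here in code.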

Question: Is there a natural granularity for the XML components? As a simple example, should all the data associated with a molecular vibration form a single component, or should a vibration be formed from the association of a vector (specifying dynamics) with a molecular structure?

-- PhilipCouch - 08 Nov 2004

Topic revision: r3 - 09 Nov 2004 - 17:41:46 - PhilipCouch