The relationships between concepts

If a modular approach is adopted for the data model, representing the concepts alone is not enough. It must also be possible to link concepts and to specify what these links mean. As an example, an XML document may provide information about a quantum chemistry calculation, including several atomic basis sets and a molecular structure. For this data to be interpreted correctly, it may be necessary to know which atomic basis set has been assigned to which atom. A way of specifying such ‘mappings’ is required, and this is a surprisingly non-trivial problem. The type of link described here is one that ‘maps’ entities. A second type of link forms an association concept by linking other entity concepts together. An example would be the ‘binding’ of a vector specifying atom dynamics to a molecular structure to form the association concept of a vibration. It is important to distinguish between these two types of link. A third type of link relates an entity to another link. An example could be the linking of a density matrix to a link that maps basis sets to atoms.

It should not be possible to use data from an XML document out of context. Distinguishing between links in this manner helps to avoid this problem by providing information on the semantic ties between data. These ties are important not just for the semantics, but also because they allow some implicit mappings. For example, specifying that a particular vector is bound to a particular molecule does not by itself say how the vector maps to the individual atoms. However, if the vector is always tied to a particular molecule, then document ordering, for example, can be relied on to map components of the vector to atoms of the molecule. Although relying on implicit mappings is not good practice in general, it is useful in this case because it significantly reduces the verbosity of the overall representation.
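
The implicit mapping by document ordering described above can be sketched as follows. This is only an illustration: the element names (vibration, molecule, atom, displacements, vector), the 'molecule' binding attribute and the numerical values are all invented for the example and are not part of any proposed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a displacement vector set bound to one molecule.
# All element names, attribute names and numbers are made up for illustration.
DOC = """
<vibration>
  <molecule id="water">
    <atom id="O1" element="O"/>
    <atom id="H1" element="H"/>
    <atom id="H2" element="H"/>
  </molecule>
  <displacements molecule="water">
    <vector>0.00 0.00 0.07</vector>
    <vector>0.00 -0.56 -0.43</vector>
    <vector>0.00 0.56 -0.43</vector>
  </displacements>
</vibration>
"""


def map_displacements(root):
    """Pair each atom with a displacement vector using document order."""
    molecule = root.find("molecule")
    displacements = root.find("displacements")

    # The explicit 'binding': the vector set names the molecule it belongs to.
    if displacements.get("molecule") != molecule.get("id"):
        raise ValueError("displacements are not bound to this molecule")

    atoms = molecule.findall("atom")
    vectors = displacements.findall("vector")
    if len(atoms) != len(vectors):
        raise ValueError("document ordering cannot map vectors to atoms")

    # The implicit mapping: the n-th vector belongs to the n-th atom.
    return {
        atom.get("id"): tuple(float(x) for x in vector.text.split())
        for atom, vector in zip(atoms, vectors)
    }


mapping = map_displacements(ET.fromstring(DOC))
```

Only the binding is stated explicitly; the atom-by-atom mapping is recovered from ordering, which is why the check on the binding comes first.
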
Additional links are required that relate not to the association of XML components, but to the repetition of data. In some cases, several components share a significant amount of data that it is undesirable to duplicate, and a method of linking sub-components is then useful. An example would be the use of sub-components of a 6-31G atomic basis set to form part of the 6-31G* atomic basis set. There may also be a requirement for a method of grouping components, for example an additional component that acts as a general container for other components. Such a container could be used to group, say, 50 scalar properties into a list, which could then be linked to a molecular structure as a whole rather than linking each scalar to the structure explicitly. Current methods often rely on document ordering for this grouping. The eCCP1 project is currently addressing how these links should be expressed in an XML document. Several standards could be used for this purpose, and each is briefly discussed here.

The simplest way to link components is via ‘id’ and ‘ref’ attributes. Each component is given a unique identifier and can reference the identifiers of other components. This approach is commonly used, but has some drawbacks. The first is that the data model needs to specify all the possible references that a particular component will need to make, which makes the approach somewhat inflexible. The second, and more significant, drawback is that components are often taken from different sources and collated for the purpose of a calculation. For example, a structure may be taken from a structure database and a basis set from a basis set library. These components do not know about each other’s identifiers, so a simple method is required to express how they relate. Further, the id and ref method does not provide any explicit semantics for the link.
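
A minimal sketch of the id and ref approach is given below. The element names and the 'basisRef' attribute are hypothetical, chosen only to make the first drawback visible: the atom element has to be designed in advance to carry a reference to a basis set.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; 'basisRef' is an invented attribute name.
DOC = """
<calculation>
  <basisSet id="carbon-basis" name="6-31G"/>
  <basisSet id="hydrogen-basis" name="6-31G"/>
  <molecule id="methane-fragment">
    <atom id="C1" element="C" basisRef="carbon-basis"/>
    <atom id="H1" element="H" basisRef="hydrogen-basis"/>
    <atom id="H2" element="H" basisRef="hydrogen-basis"/>
  </molecule>
</calculation>
"""


def resolve_basis_refs(root):
    """Return {atom id: basis set id} by following 'basisRef' attributes."""
    # Index every element that carries an 'id' attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

    assignments = {}
    for atom in root.iter("atom"):
        ref = atom.get("basisRef")
        target = by_id.get(ref)
        # A dangling or mistyped reference is only detected at resolution time.
        if target is None or target.tag != "basisSet":
            raise KeyError(f"atom {atom.get('id')} references unknown basis set {ref!r}")
        assignments[atom.get("id")] = target.get("id")
    return assignments


assignments = resolve_basis_refs(ET.fromstring(DOC))
```

Note that nothing in the document says what the reference means; the meaning ‘is assigned to’ lives only in the attribute name and in the code that reads it.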

Of course, XML nesting relates components in a hierarchy, and nesting can usefully group data that form entities. However, heavy dependence on nesting causes problems: data becomes locked away in the branches of the tree structure, and it often becomes necessary to repeat data across nodes. A rather flat document structure is therefore likely to be preferable to a heavily nested one. Repetition could be avoided through the id and ref attribute approach, but this raises the issues discussed above.

A further approach is to include a specification for expressing these links in the data model. This would need to provide support for locating the XML nodes to be linked and for adding semantics to the link. The specification could make use of W3C standards that allow the location of nodes and node sets to be specified (XPath, XPointer). In addition, the XLink specification provides much of what is needed here. It allows the use of XPointer and makes provision for link semantics.
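
The idea of a link that is held separately from the components, locates its end points by path and carries its own semantics can be sketched as below. This is not XLink or XPointer: the 'link', 'from' and 'to' elements and the 'role' and 'select' attributes are invented for the example, and the paths use only the limited XPath subset supported by Python's standard ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the components carry no references to each other;
# separate <link> elements locate them by path and name the relationship.
DOC = """
<calculation>
  <basisSet id="carbon-basis"/>
  <basisSet id="hydrogen-basis"/>
  <molecule id="methane-fragment">
    <atom id="C1" element="C"/>
    <atom id="H1" element="H"/>
    <atom id="H2" element="H"/>
  </molecule>
  <link role="assignedTo">
    <from select=".//basisSet[@id='carbon-basis']"/>
    <to select=".//atom[@element='C']"/>
  </link>
  <link role="assignedTo">
    <from select=".//basisSet[@id='hydrogen-basis']"/>
    <to select=".//atom[@element='H']"/>
  </link>
</calculation>
"""


def resolve_links(root):
    """Expand each link into (source id, role, target id) tuples."""
    resolved = []
    for link in root.findall("link"):
        role = link.get("role")
        sources = root.findall(link.find("from").get("select"))
        targets = root.findall(link.find("to").get("select"))
        # A link may connect node sets, so every source is paired with every target.
        for source in sources:
            for target in targets:
                resolved.append((source.get("id"), role, target.get("id")))
    return resolved


links = resolve_links(ET.fromstring(DOC))
```

Because the atoms and basis sets no longer refer to each other, they could come from different sources, and the link itself states what the relationship means.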

It is clear from the above discussion that the difficulty lies not in creating a representation for quantum chemistry concepts, but in associating the concepts and specifying what the associations mean. Difficulties are encountered because the XML data model is hierarchical and the nature of the mappings can be complex. A natural solution would seem to be to work towards integrating XML with other semantic web technologies that can express complex relationships more naturally. One promising technology is the Resource Description Framework (RDF). This specifies a way of making statements such as ‘The basis set with identifier “carbon” is assigned to all atoms that have identifiers “C1”’. RDF statements can be expressed as triples: a subject, a predicate and an object (for example, in n3 format). These triples can then be serialised as XML and can therefore be included as part of the XML data document. It will also be important to document the types of possible links and to place constraints on the components that must be used by each type of link. XML dictionaries could be used for this purpose.
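
The triple view of such statements can be sketched in plain Python, without any RDF library. The identifiers and the 'assignedTo' predicate are hypothetical, and a real RDF representation would use URIs rather than bare fragment names.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements of the form 'basis set X is assigned to atom Y'.
TRIPLES = [
    Triple("#carbon", "assignedTo", "#C1"),
    Triple("#hydrogen", "assignedTo", "#H1"),
    Triple("#hydrogen", "assignedTo", "#H2"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which atoms has the hydrogen basis set been assigned to?
hydrogen_atoms = [t.obj for t in match(TRIPLES, subject="#hydrogen", predicate="assignedTo")]

# Which basis set has been assigned to atom C1?
c1_basis = [t.subject for t in match(TRIPLES, predicate="assignedTo", obj="#C1")]
```

The same set of statements answers questions in either direction, which the hierarchical XML representation does not do without extra machinery.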

Question: Which method is best for the specification of links between XML components?

-- PhilipCouch - 08 Nov 2004

Topic revision: r2 - 09 Nov 2004 - 10:30:00 - PhilipCouch
 