General data containers

In some cases, it is a good idea to group concepts into classes and find a common representation for each class. For example, many scalar properties are important in computational chemistry and need to be supported in the data model. Each scalar property, such as the electron exchange energy or the kinetic energy, could be given its own representation, but this would lead to a rather large XML schema that would need constant revision as new scalar properties are added. An alternative is to find a common representation for all scalar properties. CML does this by specifying representations for several classes of concepts (scalar, matrix, array). These representations can carry references to XML dictionary entries, and each reference identifies the particular instance of the class that is represented (for example, the electron correlation energy for a scalar). The dictionaries, which provide the individual terms and their definitions, need to be created by specific communities.

The use of general containers for data does require some extra consideration with regard to validation. XML documents should be both well formed and valid. To be well formed, a document must be syntactically correct; to be valid, it must also conform to the restrictions specified in an XML schema. W3C XML schemas cannot restrict a document's structure based on the values of element attributes. It is therefore possible to specify where the general containers (each representing a class of concepts) may occur, but not where specific concepts may occur. Combining W3C XML schemas with the eXtensible Stylesheet Language (XSL) can enhance their expressive power. The CML schemas use the schema appinfo element to hold XSL that expresses further constraints on the validity of a CML document. Many mature XSL processors provide APIs that application developers can use to check conformance to these validity constraints. XSL may also be included in dictionaries for the same purpose.
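As a rough illustration of the kind of constraint a schema alone cannot express, the sketch below checks the dictRef attribute values of general scalar containers against a set of known dictionary terms. It uses plain Python with xml.etree.ElementTree rather than XSL embedded in appinfo, so it is a stand-in for the mechanism described above, not the CML approach itself. The "cc" prefix, the term names, the sample values and the check_scalars helper are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <scalar> and dictRef follow CML naming, but the
# "cc" prefix, the term names and the values are invented for this sketch.
DOCUMENT = """<module>
  <scalar dictRef="cc:kineticEnergy">75.43</scalar>
  <scalar dictRef="cc:exchangeEnergy">-8.91</scalar>
  <scalar dictRef="cc:unknownTerm">1.0</scalar>
  <scalar>2.0</scalar>
</module>"""

# Terms assumed to be defined in a community dictionary.
KNOWN_TERMS = {"cc:kineticEnergy", "cc:exchangeEnergy"}


def check_scalars(xml_text, known_terms):
    """Return a list of problems found in the dictRef attributes of <scalar> elements."""
    # ET.fromstring raises ET.ParseError if the text is not well formed.
    root = ET.fromstring(xml_text)
    problems = []
    for scalar in root.iter("scalar"):
        ref = scalar.get("dictRef")
        if ref is None:
            problems.append("scalar without dictRef")
        elif ref not in known_terms:
            problems.append("unknown dictRef: " + ref)
    return problems


problems = check_scalars(DOCUMENT, KNOWN_TERMS)
for problem in problems:
    print(problem)
```

A schema can state that scalar elements are allowed inside module, but deciding whether a particular dictRef value is acceptable needs a check of this kind, whether written in XSL or in application code.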

It is simpler to make use of components from other XML languages if the schemas in which they are specified are designed in a modular fashion (e.g. by specifying all elements as global elements). In particular, such components can be easily referenced from other schemas. The CML schemas have been designed in this way, so it is a simple task to make use of CML components. Components taken from other languages should use XML namespaces to ensure that name collisions are avoided.

It is important to note that XML does not add any semantics to data; it is used to describe and structure data. The description adds some implicit semantics, but often this is not sufficient. Further semantics can be added through two mechanisms. The first is to add simple annotations to the XML schema, which can be made machine readable, human readable, or both. The second is through XML dictionaries. The format of the XML dictionaries is specified in XML schemas; in CML this specification is part of the Scientific, Technical and Medical Markup Language (STMML).
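The following is a minimal sketch of how namespaces and dictionaries might work together, assuming a host language that embeds a CML-style scalar and a dictionary that supplies its meaning. The three namespace URIs are example.org placeholders rather than the real CML or STMML namespaces, and the dictionary layout (entry elements with id and term attributes and a definition child), the "cc" prefix and both helper functions are simplified assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, invented for this sketch.
HOST_NS = "http://example.org/host"
CML_NS = "http://example.org/cml"
DICT_NS = "http://example.org/dictionary"

# Hypothetical dictionary: one term with a human-readable definition.
DICTIONARY = """<dictionary xmlns="http://example.org/dictionary">
  <entry id="kineticEnergy" term="Kinetic energy">
    <definition>Kinetic energy of the electrons.</definition>
  </entry>
</dictionary>"""

# Hypothetical host document. It has its own <entry> element, which does not
# collide with the dictionary's <entry> because the namespaces differ.
DOCUMENT = """<results xmlns="http://example.org/host"
         xmlns:cml="http://example.org/cml">
  <entry>
    <cml:scalar dictRef="cc:kineticEnergy">75.43</cml:scalar>
  </entry>
</results>"""


def load_definitions(dictionary_xml, prefix):
    """Map 'prefix:id' to a (term, definition) pair for each dictionary entry."""
    root = ET.fromstring(dictionary_xml)
    definitions = {}
    for entry in root.iter("{%s}entry" % DICT_NS):
        definition = entry.find("{%s}definition" % DICT_NS)
        text = ""
        if definition is not None and definition.text:
            text = definition.text.strip()
        definitions[prefix + ":" + entry.get("id")] = (entry.get("term"), text)
    return definitions


def describe_scalars(document_xml, definitions):
    """Return (dictRef, term, value) for each namespaced scalar in the document."""
    root = ET.fromstring(document_xml)
    described = []
    for scalar in root.iter("{%s}scalar" % CML_NS):
        ref = scalar.get("dictRef")
        term, _ = definitions.get(ref, ("<undefined>", ""))
        described.append((ref, term, float(scalar.text)))
    return described


definitions = load_definitions(DICTIONARY, "cc")
described = describe_scalars(DOCUMENT, definitions)
for ref, term, value in described:
    print(ref, term, value)
```

Here the scalar container itself carries only a number; the meaning comes from resolving the dictRef against the dictionary, and the namespace-qualified names keep the host element called entry distinct from the dictionary element of the same name.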

Question: What types of data should be contained within general containers?
Question: How should we standardise dictionaries?

-- PhilipCouch - 08 Nov 2004

Topic revision: r2 - 09 Nov 2004 - 10:26:00 - PhilipCouch