AgentX is a library that allows users and applications to extract information from any type of digital data source, independently of the way that data is stored or organised. This simple interface gives applications easy access to disparate information without extensive coding. It also insulates the application from changes to the underlying data formats. AgentX is easy to extend to new data formats and sources through plug-ins.
It is becoming increasingly common for complex processes to involve the use of software developed by different communities. These processes often require information to be passed between the software components, and this exchange can be difficult to achieve. Often the software has been developed in isolation, and the way that information can be input and output will differ. There is little standardisation in the way that the information is handled, and this makes it difficult to exchange between software packages.
One option for the architect of a complex system is to limit the choice of software to those that can already work together. This is less than ideal because it may not be possible to find software that meets specific functional requirements.
An alternative is for "mediators" to be developed. These mediators could be additional software developed specifically to enable information to be transferred, for example by converting a file produced by one piece of software into a file that can be read by another. This approach also has problems: the developer of the mediator must have a good understanding of every type of file involved in the conversion process. If there are many different types, these mediators are expensive to produce. An additional issue is that such mediators are often brittle: they may fail when the files change slightly between different versions of a particular piece of software. In other words, this approach is often associated with a significant maintenance cost.
One approach used to address these problems is to try to get the communities developing the software to agree on standards for the way that information is to be exchanged. A good example would be the efforts of BASDA to develop eBIS-XML, a standard for business documents. Business applications are developed to read and write eBIS-XML documents, and in this way business information can be exchanged between these applications. The problems with this approach are largely socio-political. Different communities have different requirements, and some communities already have well-defined standards for their data and want to promote these. It is therefore difficult to reach agreement, and even in the best case reaching agreement is a long process.
AgentX explores a different approach to simplifying this exchange of information, one that assumes the standardisation process is incomplete. It is a library that allows users to extract information from sources of information independently of the way that information is stored or organised. As an example, AgentX can be used to extract information from a file in a way that does not depend on the file format.
It is a tool that is designed to be used as a component of a system that needs to work with a range of sources of information, particularly where these sources are heterogeneous and/or subject to change. It is designed to allow users to access a source of information without the need to understand the technical details of that source.
This approach makes it the responsibility of an information provider to describe their source in a way that makes it accessible to others. This description is used by AgentX to access the information from that source. AgentX can be considered as a layer of abstraction: software that needs to access a source no longer deals with it directly; instead, the software makes use of AgentX to access the information. This makes it considerably easier to develop software that needs to access many different sources of information. The developer only needs to understand how to communicate with AgentX and not how to access information from all of the individual sources.
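The idea of a provider-supplied description sitting behind a single access layer can be illustrated with a small sketch. The Python below is not AgentX code: the function names, the two toy source formats and the `total_energy` concept are all assumptions invented for illustration. It only shows that client code can ask for a concept while the per-format knowledge lives in the descriptions.

```python
from typing import Callable, Dict

# Hypothetical names throughout: none of this is the AgentX API.
# A "description" maps a concept name to a function that knows how to
# pull that concept out of one particular source format.
Description = Dict[str, Callable[[str], str]]


def from_key_value(text: str, key: str) -> str:
    """Read 'key = value' lines (an invented plain-text format)."""
    for line in text.splitlines():
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    raise KeyError(key)


def from_csv_row(text: str, column: str) -> str:
    """Read a two-line CSV: a header row, then a single data row."""
    header, row = text.strip().splitlines()[:2]
    return dict(zip(header.split(","), row.split(",")))[column]


# Each provider describes its own source once.
DESCRIPTIONS: Dict[str, Description] = {
    "keyvalue": {"total_energy": lambda t: from_key_value(t, "E_tot")},
    "csv": {"total_energy": lambda t: from_csv_row(t, "energy")},
}


def extract(concept: str, source_format: str, text: str) -> str:
    """The only call a client needs: ask for a concept, not a file layout."""
    return DESCRIPTIONS[source_format][concept](text)
```

With this arrangement, supporting a new format means adding one entry to the descriptions, and the client call `extract("total_energy", ...)` does not change.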
AgentX is standards based, being built upon technology developed as part of the Semantic Web effort (including SPARQL, RDF and OWL). It is written in C/C++, can be used on a range of platforms (Windows, Linux, Mac OS) and can be accessed from software written in a range of different languages. AgentX has no dependencies on other software. It is currently only able to extract information from ASCII (plain text) documents, including XML. However, it is designed to be easy to extend to other sources of information (for example SAGE Line outputs) through the development of plug-ins.
For information about the e-CCP project which developed AgentX, see http://www.grids.ac.uk/eccp.
The AgentX library calls, made from RMCS as a result of the user-specified metadata expressions, query documents for data with a specific context. For example, AgentX could be used to find the total energy of a system calculated during a simulation. In this case the user-specified expression might have the form:
AgentX = FinalEnergy, output.xml:PropertyList [title='rolling averages'].Property [dictRef='dl_poly:eng_tot'].value
The term providing the name of the metadata item is 'FinalEnergy' and the document to be queried is output.xml. The string following output.xml: is parsed by RMCS and converted to a series of AgentX library calls. In this example, AgentX is asked to locate all the data sets in output.xml that relate to the concept "property" and which have the reference dl_poly:eng_tot (the average energy of a system in the context of the DL_POLY simulation code). The value of this property is extracted and associated with the term FinalEnergy. The RCommand shell tools are then used to store this name-value pair in the metadata database.
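The structure of such an expression can be made concrete with a short sketch. The Python below is not the RMCS parser; the `Step` type, the `parse_expression` function and the grammar it accepts (a name, a document, and dot-separated steps with an optional `[attr='value']` qualifier) are assumptions inferred from the single example above.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    concept: str                        # e.g. "Property"
    refine: Optional[Tuple[str, str]]   # e.g. ("dictRef", "dl_poly:eng_tot")


# One path step: a name, optionally followed by [attr='value'].
# Whitespace around the brackets is tolerated.
STEP_RE = re.compile(r"\s*(\w+)\s*(?:\[\s*(\w+)\s*=\s*'([^']*)'\s*\])?\s*")


def parse_expression(expr: str):
    """Split 'AgentX = Name, doc:step.step...' into (name, document, steps)."""
    _, _, rest = expr.partition("=")        # drop the leading "AgentX ="
    name, _, query = rest.partition(",")    # metadata name, then the query
    document, _, path = query.strip().partition(":")
    steps: List[Step] = []
    pos = 0
    while pos < len(path):
        match = STEP_RE.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse step at offset {pos}: {path!r}")
        refine = (match.group(2), match.group(3)) if match.group(2) else None
        steps.append(Step(match.group(1), refine))
        pos = match.end()
        if pos < len(path):
            if path[pos] != ".":
                raise ValueError(f"expected '.' at offset {pos}: {path!r}")
            pos += 1                        # skip the step separator
    return name.strip(), document.strip(), steps
```

Applied to the expression above, this yields the name `FinalEnergy`, the document `output.xml`, and three steps: `PropertyList` qualified by `title`, `Property` qualified by `dictRef`, and a final unqualified `value`.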
AgentX works with a specification of ways to locate data in documents (such as a CML document) that have a well-defined content model. There are two components to the AgentX framework, ontologies and mappings, which are held as OWL and RDF documents respectively.
AgentX is able to retrieve information from arbitrary XML documents, as long as mappings are provided. Such mappings exist for a number of codes (e.g. VASP, SIESTA and DL_POLY), and the number of concepts involved is being increased. In addition, mappings are under development for other codes.
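The role of a mapping can be sketched in a few lines. The real mappings are RDF documents and the concepts come from an ontology; the Python dictionaries, the two toy XML layouts and all function names below are assumptions made purely for illustration. The point is that one concept-level query works against differently structured documents once each has a mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings: for each document layout, each concept is tied to
# an element tag, to the real attribute behind each concept-level attribute,
# and to the child element (if any) that holds the value.
MAPPINGS = {
    "cml_like": {
        "PropertyList": {"tag": "propertyList", "attrs": {"title": "title"}},
        "Property": {"tag": "property", "attrs": {"dictRef": "dictRef"},
                     "value": "scalar"},
    },
    "other": {
        "PropertyList": {"tag": "block", "attrs": {"title": "name"}},
        "Property": {"tag": "item", "attrs": {"dictRef": "id"},
                     "value": None},
    },
}


def find(parent, mapping, concept, attr, wanted):
    """Locate the first element for a concept whose attribute matches."""
    rule = mapping[concept]
    real_attr = rule["attrs"][attr]
    for elem in parent.iter(rule["tag"]):
        if elem.get(real_attr) == wanted:
            return elem
    raise LookupError(f"no {concept} with {attr}={wanted!r}")


def value_of(elem, mapping, concept):
    """Read the value from the element itself or from its mapped child."""
    child_tag = mapping[concept].get("value")
    node = elem if child_tag is None else elem.find(child_tag)
    return node.text.strip()


def final_energy(xml_text, mapping):
    """The same concept-level query, whatever the document layout."""
    root = ET.fromstring(xml_text)
    plist = find(root, mapping, "PropertyList", "title", "rolling averages")
    prop = find(plist, mapping, "Property", "dictRef", "dl_poly:eng_tot")
    return value_of(prop, mapping, "Property")
```

Supporting another code in this sketch means writing one more mapping entry; `final_energy` itself is unchanged.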
Because the content of the data, for AgentX's purposes, resides entirely in the mapping between the concepts and the document structure (rather than solely in the structure itself), we have been able to design a more efficient representation for large datasets (such as DL_POLY configurations containing more than 10^6 atoms). Compared with standard XML, this comes at the expense of losing the contextual information carried by the document itself.
AgentX depends on having appropriate RDF and OWL documents available for its mappings and ontologies. The file structure for these, illustrated for computational chemistry applications and given relative to the document root of a Web server, is as follows.
<docroot>/mappings/bridges.rdf
         /ontology/compchemont.owl
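As a small illustration of this layout, the sketch below builds the two document locations from a base URI. The function name `document_uris` and the example host are assumptions; only the relative paths come from the layout above.

```python
from urllib.parse import urljoin


def document_uris(base_uri: str) -> dict:
    """Resolve the mapping and ontology documents against a base URI."""
    # urljoin replaces the last path segment unless the base ends in "/".
    base = base_uri if base_uri.endswith("/") else base_uri + "/"
    return {
        "mappings": urljoin(base, "mappings/bridges.rdf"),
        "ontology": urljoin(base, "ontology/compchemont.owl"),
    }
```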
sub axParserStart() takes no arguments.
sub axCache() takes argument axCacheNo.
sub axGetUri() takes argument OWLDoc or RDFDoc.
sub axBaseUri() takes argument axBaseURI.
sub axDataGetUri() takes argument file.
sub axParserFinish() has no arguments.
sub axSelect() takes a string argument.
sub axRefine() takes a string argument which is a name=value pair.
sub axDeselect() has no arguments.
sub axValue() has no arguments.
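To show how these subroutines might fit together, the sketch below records a plausible call sequence for the FinalEnergy query. It does not call AgentX: `CallLog` is a stand-in that only logs calls, and both the order of the calls and the exact argument strings are assumptions, since the text above gives only the subroutine names and their arguments. The calls axCache(), axGetUri() and axBaseUri() are left out because their argument values are not given here.

```python
class CallLog:
    """Stand-in that records calls instead of invoking the real library."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only names that look like AgentX subroutines are recorded.
        if not name.startswith("ax"):
            raise AttributeError(name)

        def record(*args):
            self.calls.append((name, args))

        return record


def energy_query(ax, data_file):
    """Assumed sequence: open, select and refine twice, read, unwind, close."""
    ax.axParserStart()
    ax.axDataGetUri(data_file)
    ax.axSelect("PropertyList")
    ax.axRefine("title=rolling averages")    # name=value pair
    ax.axSelect("Property")
    ax.axRefine("dictRef=dl_poly:eng_tot")   # name=value pair
    ax.axValue()
    ax.axDeselect()                          # leave Property
    ax.axDeselect()                          # leave PropertyList
    ax.axParserFinish()
```

Running `energy_query(CallLog(), "output.xml")` produces a log of ten calls that can be inspected or compared with the sequence actually issued by RMCS.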
Rob Allan 2009-11-10