Introduction to the Daresbury Science and Innovation Campus Grid

Rob Allan, Dave Cable and Tim Franks

Computational Science and Engineering Department,

Daresbury Laboratory, Daresbury, Warrington WA4 4AD

John Kewley

e-Science Centre, Daresbury Laboratory, Daresbury, Warrington WA4 4AD

Contact e-Mail: robert.allan@stfc.ac.uk

Version 2.1

Abstract:

This document is intended to stimulate discussion about, and involvement in, the Daresbury Science and Innovation Campus Grid, which comprises services for research, information management and innovation and supports both commercial and academic users.

We could define the term ``Campus Cloud'' to embrace the use of ICT on the Campus, including computational resources, e-research portals, virtual meetings and information management. Networking, e-mail, video conferencing and calendaring are considered separately by the Campus ICT group, who have also set up a ``connections'' portal. The term Cloud is more commonly used in the context of utility computing; a Cloud in that sense nevertheless shares some characteristics of a Campus Grid in providing common user interfaces and catalogues for the selection of resources and applications (services).

This document gives an overview and introduction to the DSIC project and is provided as input to the Campus Grid SIG.

© STFC 2008-10. Neither the STFC nor its collaborators accept any responsibility for loss or damage arising from the use of information contained in any of their reports or in any communication about their tests or investigations.



Introduction

Daresbury Laboratory has a strong history of computer use for research [6], covering accelerator design and control, data analysis and, more recently, modelling and simulation. Development of expertise in all these areas is exemplified by the advanced proposal for the Hartree Centre [24], a gateway technology centre for computational science and engineering focussing on research grand challenges and knowledge exchange. Such developments are underpinned by expertise in networking, which is particularly strong at the Laboratory. Other related development work was carried out in the e-Science Programme and included deployment of Access Grid, Computational Grid (NW-GRID) and portal technologies. A recent book [2] explains the philosophy of using all these technologies in what is now known as a Virtual e-Research Environment.

There have been a number of discussions about extending ICT services for research, information management and innovation across the whole of the Daresbury Science and Innovation Campus. Plans are under way to deploy an ``intelligent'' network fabric with the capability to provide roaming access across site for both commercial and academic users. The network will route them via the appropriate providers. A separate document describes this and explores ways in which it might be used [16].

Here we describe and explore the growing set of requirements to exploit this advanced fabric for diverse computational and collaborative research purposes.

NW-GRID and the DSIC Campus Grid

The North West Grid (NW-GRID) service comprises clusters of high performance computers together with a group of experts in operational support and computational science. The team offers customer support and can work with a wide variety of simulation, modelling and data analysis applications, enabling you to access unrivalled resources for your projects and research.

NW-GRID is a collaboration between Daresbury Laboratory and the Universities of Lancaster, Liverpool and Manchester, plus the Proudman Oceanographic Laboratory (POL) and the University of Central Lancashire (UCLAN). With the support of NWDA funding for the core sites, the partners established a computational Grid comprising high performance computing systems coupled by a high speed private fibre network. The original infrastructure was deployed in 2006, with additions in 2007 and in the spring of 2008. Sun systems with dual core and quad core AMD Opteron processors, integrated by Streamline Computing, provide the core of the infrastructure. NW-GRID offers world class services founded on the deployment and exploitation of Grid middleware technologies, enabling the capabilities of the Grid to be realised in leading edge research applications, primarily in computational science and engineering. Over the past few years, services have been offered to many research projects in the region and have resulted in publications in high profile journals such as Nature. With the confidence gained from these successes, NW-GRID is now being made available to non-academic users in the region [1].

NW-GRID is an important infrastructure for the North West science strategy and resonates strongly with the key elements of the NWDA's regional strategy: working with targeted emerging sectors in the environment, bio-technology and pharmaceutical and complex materials areas; establishing the North West as a global player in Grid technologies and e-research; and embedding e-competencies across the region's business, academic and industrial base.

Figure 1: NW-GRID Cluster at Daresbury

The high performance compute clusters at the core sites are complemented by a high speed private network which can be enhanced and configured to meet all the requirements for secure access and data transfer between clusters and storage systems. All systems are supported by appropriate local disk storage and data backup.

Applications of NW-GRID

Computational power in itself is of no direct benefit. Where NW-GRID creates real value for projects is the combined access to hardware, open source and commercial applications and expert knowledge from the partner sites. NW-GRID offers a pay-as-you-go service for commercial access to computational resources, application licenses and expertise with a number of pricing models to meet customers' growing requirements.

Currently NW-GRID has completed simulation and modelling projects in the following sectors. Separate technical case studies are available for each and some additional information is available from the Web site.

By subscribing to NW-GRID you can use its high performance computer systems for your projects. For applications and further information, please visit the NW-GRID Web site at http://www.nw-grid.ac.uk. Information about some of the work carried out on NW-GRID is described in two publications [19,20] and a report [1].

A separate document is available which lists regional initiatives using high performance computational resources [3].

Computer Resources at DSIC

The following computing systems are available to NW-GRID users as part of the multi-institutional Grid, or to local academic users (and to commercial users, subject to discussion) on DSIC. For further information about commercial access and the services available, contact John Bancroft on 01925 603148 or Michael Gleaves on 01925 603710, or visit the DaComS Web site http://www.dacoms.ac.uk.

nw-grid phase 0
8 node IBM cluster plus head node, used as a test cluster and part of DSIC Condor pool;
dl1.nw-grid.ac.uk
96 node Streamline/ Sun cluster with 2.4GHz twin dual core Opteron processors making a total of 380 cores. Head node hosts Globus, Sun Grid Engine and Condor services;
mano.dl.ac.uk
IBM BlueGene-L, 2048 processor cores used for development purposes. See http://request.dl.ac.uk/hosts/mano.live;
legion.dl.ac.uk
IBM BlueGene-P, 4096 processor cores;
cseem64t.dl.ac.uk
Streamline 32 node cluster with 2x ClearSpeed CATS (24 cards) hosted by four of the nodes. Each node has 2x dual core 3GHz Intel Woodcrest processors. See http://www.cse.scitech.ac.uk/disco/cseem64t/cseem64t.shtml;
cseht.dl.ac.uk
Streamline 32 node cluster with twin quad core Intel Harpertown processors, making 256 cores in total. In addition it now has 8 Nehalem nodes, each with 4 attached nVidia Tesla 1070 GPU cards. See http://www.cse.scitech.ac.uk/disco/cseht/cseht.shtml;
hpcx.dl.ac.uk
former IBM system with 2560 1.9GHz POWER5 processors. See http://www.hpcx.ac.uk. HPCx required grant approval from EPSRC; we note that the HPCx service finished on 31/1/2010;
power7.dl.ac.uk
IBM POWER-7 system, 4x 32-way nodes with 256GB memory each, currently for testing and application development;
condor-main.dl.ac.uk
DL Condor pool, see http://tardis.dl.ac.uk/Condor/cgi-bin/CondorStatus.cgi. This manager node is for the Daresbury Laboratory;
ci-condor.dl.ac.uk
CI Condor pool, see http://tardis.dl.ac.uk/Condor/cgi-bin/CIStatus.cgi. This manager node is for the Cockcroft Institute;
orion-galaxy
Astec compute cluster [info tba]
blade1
12 node IBM BladeCenter, dual Xeon, hosting Web based services and portals;
blade2
14 node IBM BladeCenter, dual Xeon, hosting Web based services;
blade3
IBM JS22 Power blade cluster with QS22 Cell processor blades, see http://www.cse.scitech.ac.uk/disco/novel_arch/cell.shtml;
rmcs.dl.ac.uk
RMCS server hosting Condor, Condor-G and Globus services with an RMCS and RCommands Web services interface.

For more information about the Condor Pools and related resources see http://www.grids.ac.uk/twiki/bin/view/GridAndHPC.
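By way of illustration, the short sketch below shows one way a job could be handed to one of these Condor pools from a user's workstation. It is a minimal sketch only, assuming a standard Condor client installation with condor_submit on the user's PATH; the executable and file names are hypothetical.

import subprocess
import textwrap

# A minimal submit description for the vanilla universe; the executable
# and data files named here are hypothetical.
submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = analyse.sh
    arguments  = input.dat
    output     = analyse.out
    error      = analyse.err
    log        = analyse.log
    queue
""")

with open("analyse.sub", "w") as f:
    f.write(submit_description)

# Hand the description to the local scheduler, which matches the job
# to an idle machine in the pool.
subprocess.run(["condor_submit", "analyse.sub"], check=True)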

Figure 2: DSIC Campus Grid Architecture

The overall architecture of the DSIC Campus Grid is shown in Figure 2. This depicts the main cluster resources and also the pools of Linux and Windows workstations accessible via the Condor master nodes. We currently take a ``federated'' approach to Campus Grid deployment, with departments responsible for their own pools. Flocking between them is however permitted and encouraged. This architecture is closely based on work done in the NERC-funded e-Minerals e-science project in collaboration with the University of Cambridge.
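As an indication of what the federated approach involves in practice, the sketch below writes a Condor configuration fragment enabling flocking from the DL pool to the CI pool and back. This is only a sketch: the configuration file location is an assumption, additional host-based security settings are normally needed, and local policy would of course have to be agreed first.

import subprocess

# Hypothetical configuration fragment for a machine in the DL pool,
# allowing idle jobs to flock to the CI pool and accepting jobs in return.
FLOCKING_FRAGMENT = """\
# Jobs that cannot be matched locally may flock to the CI pool.
FLOCK_TO   = ci-condor.dl.ac.uk
# Accept flocked jobs arriving from the CI pool.
FLOCK_FROM = ci-condor.dl.ac.uk
"""

# The config.d location is an assumption; some installations use a single
# condor_config.local file instead.
with open("/etc/condor/config.d/60-dsic-flocking.conf", "w") as f:
    f.write(FLOCKING_FRAGMENT)

# Ask the running daemons to pick up the new configuration.
subprocess.run(["condor_reconfig"], check=True)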

Data Storage

Data storage is not currently a high priority on DSIC except for the HPCx service which had its own disc sub-system, tape robot and off site backup archive. Discussions are however under way to host similar services for POL and VEC and potentially for commercial users. There are huge gains to be made via economy of scale and we see ourselves ideally placed, working with appropriate vendors, to offer future data storage services over the network infrastructure already in place.

Sakai Portal Framework

Sakai is our framework of choice for delivering portal services. It is the second most widely used open source portal framework and has been designed to support both learning and research involving up to tens of thousands of users. It is thus an ``enterprise class'' service [10,18,11,2]. Sakai complements our other Campus Grid activities by acting as a single sign-on container for a range of Web 2.0 style shared tools such as: resource folders, e-mail archive, blog, wiki, calendar, online chat, RSS news reader, search, and interfaces to project specific applications. Most importantly Sakai can be used to support ``virtual organisations'' through its worksites and role based access control [12].

Sakai on the Daresbury Science and Innovation Campus

Figure 3: Sakai Portal for the Hartree Centre
Image hartree_portal

A number of Sakai instances hosted on blade2 are currently in use, as follows. These consortia mostly use Sakai as an information management and collaboration system for collaborating user groups and their projects.

http://rhine.dl.ac.uk:8080/portal
NW-GRID Portal - some 400 users with around 20 worksites, see http://www.nw-grid.ac.uk;
http://portal.ncess.ac.uk
National Centre for e-Social Science - some 500 users with around 50 worksites. Some of the kernel has been changed and bespoke tools added for various research purposes. The most active current project is NeISS, the National e-Infrastructure for Social Simulation, see http://www.neiss.org.uk;
http://yukon.dl.ac.uk:8080/portal
Hartree Centre Portal - development of the centre of excellence in computational science to be based at Daresbury Laboratory;
http://avon.dl.ac.uk:8080/portal
Detector Systems Centre Portal - development of the centre of excellence in detectors;
http://cselnx9.dl.ac.uk:8080/portal
Portal for the EU Psi-k Network of Excellence in complex materials research, over 1,000 users, see http://www.psi-k.org. Bespoke tools have been written for this group to produce electronic newsletters and to handle workshop and conference administration;
http://yangtze.dl.ac.uk:8080/portal
Prototype portal for the Diamond synchrotron - used for demonstration purposes only, for instance to demonstrate the DataPortal interface;
http://cselnx8.dl.ac.uk:8080/portal
Daresbury Laboratory Computational Science and Engineering Department - support for existing research groups and their projects.

A number of other servers are running Sakai for development and demonstration purposes, such as bonny.dl.ac.uk (NeISS development); clyde.dl.ac.uk (Sakai build and test for Steve Swinsburg); dee.dl.ac.uk (test site for the Daresbury drawing office and programmes group); and congo.dl.ac.uk (gateway centres oversight group).

Similar services have been offered to other gateway centres and to the North West Virtual Engineering Centre.

Access Grid

All core NW-GRID sites and many UK and overseas universities are equipped with Access Grid rooms for virtual meetings. The A1 AG room is used for the fortnightly NW-GRID Technical Board and Operations Board meetings, and for meetings with the NGS, HPCx, HECToR and other project partners, for instance the National Centre for e-Social Science.

The T22 combined Access Grid and video conferencing suite in the Tower at Daresbury is a state-of-the-art facility. It has an AG-enabled training room next door (T23) and a conference/training suite nearby which can hold 60 people (or two groups of 30 if partitioned). AG and video conferences can be broadcast into this space. The facility was part-funded by NWDA and is available for use on the Daresbury Science and Innovation Campus.

Information Management and Database Support

It is important to include information management in this discussion. This includes the library services, many of which are now on-line with subscription and delivery managed as appropriate using resolvers. Publications of STFC staff, collaborators and facilities users are available through the ePubs repository which is a valuable science research knowledge base at http://epubs.cclrc.ac.uk. Discovery and access tools need to be provided alongside research and commerce tools, for instance ePubs must be accessible from portals and other information systems. We explored these issues in consultancy carried out for JISC during 2007 [5]. Institutional Repositories are discussed in a book by Cathy Jones from the e-Science Centre [15].

During the period when e-Science was a strong research topic on the Daresbury site we supported an Oracle DB service. The systems to do this are still in place, but we no longer have DBA staff, so critical services have been moved to other sites; for instance, the National Grid Service runs an Oracle server at the University of Manchester. We believe that DSIC should again have a campus-wide Oracle service to support longer term service developments, particularly in information management.

Grid vs. Cloud

Either the name Grid or Cloud could be chosen. They are similar in meaning but subject to differences of interpretation. Cloud has been associated with the provision of services ``on demand'' as offered by Google, Amazon, Microsoft, etc., who host massive server farms for this purpose. Our meaning is clearly different, but we could use the same term to capture all the services offered over the campus ICT infrastructure as described here.

Cloud computing gained attention in 2007 as it became a popular solution to the problem of horizontal scalability [21]. Our use of Cloud computing naturally evolves from our experience of NW-GRID and the Campus Condor pools and portals described above.

Matching technology to applications

The purpose of a Cloud interface for DSIC is to allow systems such as those described above to be introduced and removed dynamically and made accessible independently of where they are physically located; they may, for example, be at vendor sites or university partner sites such as NW-GRID, or located on the Campus, for instance in the Tower, the main Computer Room or the Cockcroft Institute. The interface should allow access to a rich variety of computing and storage systems.

The term Cloud Computing derives from the common practice in technology architecture diagrams of depicting the Internet, or IP availability, as a cloud. The computing resources being accessed are typically owned and operated by a third party provider on a consolidated basis in data centre locations, in our case typically somewhere on the Campus. Target consumers are not concerned with the underlying technologies used to achieve the increase in server capability; the Cloud simply provides services on demand. In our case, however, consumers will be concerned with the architecture and will target their applications to the most appropriate system available at the time, usually to get the best performance. Grid computing is a technology approach to managing a Cloud, and one with which we have a lot of experience, building on NW-GRID and projects such as e-Minerals [20]. In effect, all Clouds are managed by a Grid, but not all Grids manage a Cloud. More specifically, a Compute Grid and a Cloud are synonymous, while a Data Grid and a Cloud can be different. We also use the term Campus Grid, through which we extend the Cloud to cover pools of desktop systems, possibly using novel scheduling approaches such as harvesting spare cycles and back-filling. We could also refer to this as Integrated Computing.

Critical to the notion of Cloud computing is the automation of many management tasks. If the system requires human intervention to allocate tasks to resources, it is not a Cloud.

A compute cluster can offer cost effective services for specific applications, but may be limited to a single type of computing node with all nodes running a common operating system. The canonical definition of a Grid, by contrast, is a system that allows any type of processing engine to enter or leave dynamically. This is analogous to an electrical power grid, on which any given generating plant might be active or inactive at any given time. It can be achieved by physically connecting or removing distributed servers, or by virtualisation. Since we support many ``heritage'' applications of the traditional MPI parallel type, we will keep the notion of clusters and currently support physical rather than virtual resource dynamics. This can however include dual booting of certain servers.

Potential advantages

Potential advantages of any Cloud or Grid computing approach include:

Architecture

Figure 4: General Cloud Computing Architecture

The architecture behind Cloud computing (see Figure 4) is a massive network of ``cloud servers'' interconnected as in a Grid. Virtualisation could be used to maximise the utilisation of the computing power available per server, e.g. to better match the overall workload.

A front end interface such as a portal allows a user to select a service from a catalogue. The request is passed to the system management layer, which finds the correct resources and then calls the provisioning service, which allocates resources in the Cloud. The provisioning service may also deploy the requested software stack or application, e.g. via on-demand licensing.
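The sketch below makes this chain concrete in a purely illustrative way: none of the names correspond to real DSIC services and the matching rule is invented for the example; it is intended only to show the catalogue, management and provisioning roles as separate steps.

from dataclasses import dataclass

@dataclass
class Service:
    """An entry in the catalogue presented by the front-end portal."""
    name: str
    cores: int
    application: str

# Hypothetical catalogue of services a user could select in the portal.
CATALOGUE = [
    Service("small-md-run", cores=8, application="DL_POLY"),
    Service("large-md-run", cores=256, application="DL_POLY"),
]

def find_resource(service: Service) -> str:
    """System management: match the request to a suitable cluster.

    The threshold used here is invented; a real resource broker would
    consult load, queues and access rights.
    """
    return "dl1.nw-grid.ac.uk" if service.cores <= 64 else "cseht.dl.ac.uk"

def provision(service: Service, host: str) -> None:
    """Provisioning service: allocate nodes and deploy the software stack."""
    print(f"Allocating {service.cores} cores on {host} for {service.application}")

request = CATALOGUE[0]                      # the user picks a service in the portal
provision(request, find_resource(request))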

We have considered the use of MOAB from Cluster Resources for some of the above tasks [8].

Cloud storage

Cloud storage is a model of networked data storage where data is stored on multiple virtual servers, generally hosted by third parties, rather than being hosted on dedicated servers. Hosting companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them and use it for their storage needs. The data center operators, in the background, virtualise the resources according to the requirements of the customer and expose them as virtual servers, which the customers can themselves manage.

We have achieved this in the past using SRB, the Storage Resource Broker from SDSC [7], which provides a virtual file system interface to distributed storage ``vaults''. Physically, the resource may thus span multiple servers. In our case storage services are provided for users of DSIC compute resources and other local initiatives such as POL and NW-VEC, e.g. via NW-GRID. We note that SRB will in the future become iRODS and that other solutions, such as AFS, are available.
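As a concrete illustration, the sketch below drives the standard SRB Scommands from a short Python script to stage a result file into a vault and confirm it has been catalogued. It assumes the Scommand client tools are installed and that the user's ~/.srb environment already points at the local MCAT server; the collection and file names are hypothetical.

import subprocess

def srb(*args: str) -> None:
    """Run one of the SRB Scommands and fail loudly if it returns an error."""
    subprocess.run(list(args), check=True)

srb("Sinit")                                    # authenticate to the MCAT catalogue
srb("Sput", "results.dat", "/dsic/home/rja")    # copy a local file into a collection
srb("Sls", "/dsic/home/rja")                    # confirm the file is catalogued
srb("Sget", "/dsic/home/rja/results.dat", ".")  # retrieve it again elsewhere
srb("Sexit")                                    # close the session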

Grid Software

The middleware infrastructure used on the DSIC Campus Grid is a combination of Globus [14], Condor [17] and SRB [7]. We have made a large investment in developing a ``lightweight Grid infrastructure'' building on this middleware, allowing users to do data management and submit computational Grid jobs from their desktop workstations (which might also be resources in the Condor pool). The software which integrates this infrastructure is now referred to as the G-R-Toolkit [23].
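As a simple illustration of what this involves at the middleware level, the sketch below uses the Globus pre-WS GRAM client tools from a short Python script to run a trivial job on dl1 from the desktop. It assumes a Globus client installation and a valid user certificate in ~/.globus; the job itself is only an example, and in practice the G-R-Toolkit wraps such steps on behalf of the user.

import subprocess

# Create a short-lived proxy credential from the user's X.509 certificate
# (the passphrase is prompted for interactively).
subprocess.run(["grid-proxy-init", "-valid", "12:00"], check=True)

# Run a trivial job on the dl1 gatekeeper through pre-WS GRAM and show
# where it actually executed.
result = subprocess.run(
    ["globus-job-run", "dl1.nw-grid.ac.uk", "/bin/hostname"],
    check=True, capture_output=True, text=True)
print(result.stdout.strip())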

The G-R-Toolkit combines the best software developed at STFC during its e-Science Programme from 2001 to 2007. It allows users of many applications in computational research to manage their high performance computing and data and information management tasks directly from their desktop systems. Components of G-R-T, some of which are available separately, include:

GROWL Scripts - facilitate management of digital certificates and access to datasets on remote Grid resources.
SRB Client and RCommands - desktop tools to manage stored datasets and metadata.
RMCS - uses Condor DAGMan to create and enact workflows to integrate data management and remote computation.
R, Perl and Python framework - scripting interfaces suitable for many research domains from bio-informatics to chemistry.
AgentX - a sophisticated semantic toolset using domain specific ontologies to link applications with ASCII, XML and DB formats.
G-R-T C library - Web service clients appropriate for application programming.

G-R-T uses Grid middleware to perform its tasks on behalf of the user. Well known technology is re-deployed on a dedicated intermediate server (rmcs.dl.ac.uk), including Web Services, Condor, Globus, SRB and MyProxy.

G-R-T will work alongside and extend existing toolkits. It has a ``plug-in'' capability allowing Grid client functionality to be imported into applications such as Matlab, Stata, Materials Studio and others. G-R-T is written in a modular style using Web services to achieve a Service Oriented Architecture, a widely adopted pattern in software engineering. This enables its client side to be re-factored or extended to suit most research requirements.

Components of the G-R-Toolkit were developed by Rob Allan, Adam Braimah, Phil Couch, Dan Grose, John Kewley and Rik Tyer in STFC's Grid Technology Group and their partners, see http://www.grids.ac.uk/twiki/bin/view/GridAndHPC/GRToolkit.

GROWL is the Grid Resources On Workstation Library [22], development of which was funded through a JISC VRE-1 project.

GrowlScripts is a set of useful command line scripts which were developed by John Kewley during and after the GROWL VRE-1 project.

RMCS is the Remote My Condor Submit developed in the NERC funded e-Minerals project [20].

AgentX was developed by Phil Couch in the e-CCP project funded by STFC.

RCommands were developed by Rik Tyer to enhance RMCS by facilitating logging of metadata records associated with computational jobs.

MultiR and SabreR were developed by Dan Grose at University of Lancaster based on GROWL but written in the R language. They have been applied to longitudinal statistical analysis [9], bio-informatics and geography.

RMCS and RCommands have also been deployed by Jonathan Churchill on the National Grid Service, see http://wiki.ngs.ac.uk/index.php?title=Category:Community_Software.

Deployment

The software is currently deployed by hand as outlined on the Wiki pages at http://www.grids.ac.uk/twiki/bin/view/GridAndHPC.

Acknowledgements

We thank Jonny Smith, formerly at the Cockcroft Institute and now with Tech-X, who set up most of the CI Condor infrastructure and contributed to previous versions of this document.

We thank Mark Calleja and Martin Dove at University of Cambridge for continuing inspiration and encouragement.

Bibliography

1
C. Addison, R.J. Allan, J.M. Brooke and T. van Ark NW-GRID: North West Grid - Final Report (NW-GRID, 2008) http://www.nw-grid.ac.uk

2
R.J. Allan Virtual Research Environments: from Portals to Science Gateways (Chandos Publishing, Oxford, Aug'2009) 234pp ISBN 978 1 84334 562 6 http://www.woodheadpublishing.com/en/book.aspx?bookID=1892&ChandosTitle=1

3
R.J. Allan and J. Bancroft Business Development Opportunities for NW-GRID

4
R.J. Allan, X. Yang, A.L. Braimah, J. Kewley, R. Crouchley, A. Fish, M. Gonzalez, D. Hughes, C.A. Addison, A. Sykes, J.M. Brooke and B. Kattamuri. NW-GRID Developing a Virtual Research Environment for NW Researchers

5
R.J. Allan, R. Crouchley and C. Ingram JISC Information Environment Portal Activity Consultation Reports to JISC (Dec'2007)

6
R.J. Allan, T.J. Franks, D. Cable and P.S. Kummer A History of Computing at Daresbury Laboratory http://tardis.dl.ac.uk/computng_history

7
C. Baru, R. Moore, A. Rajasekar and M. Wan The SDSC Storage Resource Broker Proc. CASCON (Toronto, 30/11-3/12/1998) http://www.npaci.edu/DICE/Pubs/srb.pdf

8
D. Cable Using MOAB Cluster Suite for Cluster Management (STFC, 2009) http://www.cse.scitech.ac.uk/disco/publications/moab_final_report.pdf

9
R. Crouchley and R.J. Allan Longitudinal Statistical Modelling on the Grid in ``Handbook of Online Research Methods'', ed. N. Fielding (Sage, June 2008) 467-88

10
R. Crouchley, A. Fish, R.J. Allan and D. Chohan Sakai Evaluation Exercise Consultation report to JISC (Dec'2004)

11
R. Crouchley, R.J. Allan, M. Fraser, M. Baker and T. van Ark Sakai VRE Project Demonstrator: Final Report Report to JISC (Dec'2007) 57pp

12
Rob Allan and Xiaobo Yang Using Role Based Access Control in the Sakai Collaborative Framework (STFC, Mar'2008)

13
I. Foster What is the Grid? A Three Point Checklist (Argonne National Laboratory and University of Chicago, July, 2002)

14
I. Foster and C. Kesselman Globus: a Meta-computing Infrastructure Toolkit Int. J. Supercomputer Applications 11 (1997) 115-28 http://www.globus.org

15
C. Jones Institutional Repositories: Content and Culture in an Open Access Environment (Chandos Publishing, Oxford, 2007) ISBN 978 1 84334 307 3

16
P.S. Kummer ICT and the Campus ITSOC paper ITSOC/P19/08 (5/5/2008)

17
M. Litzkow, M. Livny and M. Mutka Condor - a Hunter of Idle Workstations Proc. 8th Int. Conf. on Distributed Computing Systems (1988) 104-11 http://www.cs.wisc.edu/condor

18
C. Severance, J. Hardin, G. Golden, R. Crouchley, A. Fish, T. Finholt, B. Kirshner, J. Eng and R.J. Allan Using the Sakai Collaborative Toolkit in e-Research Applications Concurrency: Practice and Experience 19 (12) 1643-52 (2007) DOI:10.1002/cpe.1115

19
J.M.H. Thomas, J. Kewley, R.J. Allan, J.M. Rintelman, P. Sherwood, C.L. Bailey, S. Mukhopadhyay, A. Wander, B.G. Searle, N.M. Harrison, A. Trewin, G.R. Darling and A.I. Cooper Experiences with Different Middleware Solutions on the NW-GRID http://epubs.cclrc.ac.uk/bitstream/1677/CCP1GUI_GROWL.pdf

20
J.M.H. Thomas, R.P. Tyer, R.J. Allan, J.M. Rintelman, P. Sherwood, M.T. Dove, K.F. Austen, A.M. Walker, R.P. Bruin and L. Pettit. Science carried out as part of the NW-GRID Project using the e-Minerals Infrastructure http://epubs.cclrc.ac.uk/bitstream/1678/rmcs_and_science.pdf

21
The Cloud Computing Portal is a public database of cloud computing providers, news and resources. http://cloudcomputing.qrimp.com/portal.aspx

22
Growl and Growl Scripts Web site http://www.growl.org.uk

23
G-R-Toolkit Web site http://www.grids.ac.uk/twiki/bin/view/GridAndHPC/GRToolkit

24
The Hartree Centre http://www.cse.scitech.ac.uk/events/Hartree_Summary/
