Overview




The UK National Grid Service

Rob Allan and Rob Crouchley

The National Grid Service (NGS) was established in late 2003 and came into operation in early 2004. It followed work done by members of the Grid Engineering Task Force, in which the Regional e-Science Centres evaluated middleware such as Globus and SRB to link their existing computational and data resources. Many issues of large-scale deployment were identified and addressed during this work, enabling a successful service to be implemented quickly.

By Easter 2003 prototype software had been developed and tested for: Virtual Organisation management, usage accounting, portal access and functional integration testing.

The NGS differentiated itself from the ETF's previous "Level 2" Grid, upon which a number of applications had been tested, by including some new dedicated resources. These were funded by JISC's JCSR - the Joint Committee for the Support of Research. (JISC is the UK's Joint Information Systems Committee which provides resources and services for HE and FE institutions.) The four new systems purchased will be described in the next few pages. They complemented the supercomputing facilities of CSAR (at Manchester Computing) and HPCx (at Daresbury Laboratory) to form the core NGS. Other sites were asked to bid to provide additional services.

The NGS is only accessible using Grid technology - this is an incentive for researchers to learn to use the Grid. Its functionality will increase over time and as usage becomes more widespread. The NGS will be linked up to other international Grid resources - several experiments to demonstrate this have already taken place.

Further information is provided at URLs:

NGS: National Grid Service - http://www.ngs.ac.uk
GOSC: Grid Operations Support Centre - http://www.grid-support.ac.uk
ETF: Grid Engineering Task Force - http://www.grids.ac.uk/ETF




XML Content. Presentation created by R.J. Allan
Hand Knitted Software 2005

Computational Resources on the NGS




The two new computational resources of the NGS were installed at University of Oxford and the White Rose Grid. The latter comprises a consortium of the Universities of Leeds, Sheffield and York and the new computer is installed at Leeds.

These resources are multi-processor "Beowulf" style computers, also referred to as "commodity clusters". They allow for execution of an application on a single dual-processor 3GHz Xeon node or execution of a parallel application across up to the full 64 nodes (128 processors). Batch processing is the norm. Technical information about Beowulf computers is given in a tutorial on the ReDReSS Web site. These resources are intended for medium-scale compute-intensive jobs and for development of applications which might run on the supercomputer services at CSAR and HPCx.

After a tendering exercise via the European Journal, these compute systems and the data systems for the NGS were purchased from ClusterVision in Amsterdam. They were assembled by that firm with the specified software and middleware to form a Grid.

The full specification of the compute nodes includes:

The software and middleware installed on these systems includes:





Data Resources on the NGS




In addition to the computational resources described above, two data clusters were installed, one at University of Manchester alongside the CSAR service (see below) and one at CCLRC's Rutherford Appleton Laboratory.

These clusters each comprise 20 dual processor nodes identical to those of the compute clusters plus 18TB RAID disc space. They are intended for data-intensive applications, such as post-processing or statistical analysis of large datasets.

The full specification includes:

The software and middleware installed on these systems includes:





Supercomputing Services on the NGS




The non-core services are part of the NGS in the sense that they belong to one cohesive Grid: users who have access to these resources can interoperate with the NGS.

The CSAR Services

The CSAR supercomputing service hosted at Manchester Computing has a number of large-scale compute servers. Of these, the 512-processor SGI Altix 3700 system called Newton is available via the NGS. It has a total of 384GB of memory. The system is, however, normally split into four partitions, each running Linux as a 64-bit operating system. To gain access to this system, users must be allocated grant funding; it is not a free service.

For further information about CSAR, see http://www.csar.cfs.ac.uk.

The HPCx Service

The HPCx supercomputing service hosted at Daresbury Laboratory is a 1600-processor IBM Regatta system which is available via the NGS. The total memory capacity of the system is 1.6TB. HPCx runs AIX5, which is a 64-bit operating system. To gain access to this system, however, users must be allocated grant funding; it is not a free service.

For further information about HPCx, see http://www.hpcx.ac.uk.





Who can use the NGS?




The NGS is available to anyone for scientific and academic research purposes. You will be asked to provide a short summary of the work you are doing, what NGS resources you require and how you intend to use those resources.

Applications for access to the National Grid Service can be made by visiting the Web site and selecting the link "Apply for Access". To register you must agree to the "Acceptable Use Policy" and have a digital X.509 certificate. A certificate can be obtained from the Grid Support Centre by following the process indicated on their Web site. During the NGS registration process you will be asked to provide the distinguished name (DN) from your certificate.
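
To illustrate what a DN looks like, here is a minimal Python sketch. It assumes the one-line, slash-separated form that tools such as OpenSSL print; the example DN and the function name parse_dn are hypothetical and are not taken from NGS documentation. A real certificate should be read with a proper X.509 library.

```python
def parse_dn(dn: str) -> list:
    """Split a slash-separated distinguished name into (attribute, value) pairs.

    Assumes the form "/C=.../O=.../CN=..." and that no value contains "/".
    """
    pairs = []
    for part in dn.strip().split("/"):
        if not part:
            continue  # skip the empty string before the leading "/"
        attribute, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed DN component: {part!r}")
        pairs.append((attribute.strip(), value.strip()))
    return pairs


# Hypothetical DN, for illustration only.
example_dn = "/C=UK/O=eScience/OU=Example/L=Site/CN=jane researcher"
components = parse_dn(example_dn)
common_name = dict(components)["CN"]
print(common_name)
```

The whole DN string, not just the common name, is what identifies the certificate holder, so it would normally be copied in full.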

On approval, your registration will give you access to the core nodes of the NGS service. At present these are the Manchester, Oxford, WRG Leeds and CCLRC-RAL nodes. However, application does not automatically give access to the CSAR and HPCx services as these (non-core) services operate their own access mechanisms which require funding from a Research Council grant or other acceptable source. You will therefore need to apply for access to these resources directly.





So what could the NGS do for Social Scientists?




There are four areas in which the NGS can make a contribution to e-Social Science: applications, data management, methodology and computational culture. These are discussed separately below and will be covered in much more depth in other tutorials hosted by both NCeSS and ReDReSS.

Applications

The complexity of socio-economic processes makes the use of the NGS particularly relevant for social scientists. This is the case for managed systems, such as the integration of real-time data to assist with the optimal utilisation of transport resources. Work on distributed simulation would benefit from these facilities, which will enable the high-speed simulation of large systems; similarly, it will be possible to improve the efficiency of labour markets with better matching of workers to jobs. This will involve the joint modelling of worker and employer search with employment service data. In the study of natural systems, social scientists are developing evidence-based social and economic theory, for example to explain why some individuals are more at risk of unemployment and social exclusion, or of recidivism.

This research is motivated by a desire to determine causality, and involves:

  1. identifying the various factors which influence the behaviour or outcome of interest and quantifying their effects;
  2. controlling for all the different confounding factors which would otherwise result in spurious relationships and misleading results.

Unfortunately, randomised experiments are not feasible. We cannot, for example, randomly allocate individuals to different levels of training/support in order to evaluate drug rehabilitation programmes, or to different levels of educational attainment in order to determine the effect of education on subsequent labour market behaviour. The NGS provides the equipment that will allow scientists to build comprehensive models of stochastically complex processes, provide a sensitivity analysis of the results and use simulation to explore the consequences of changes in policy on subsequent behaviour and the environment.

Data

The complexity of social survey data makes the issues of data curation and metadata management particularly difficult. The complexity of social surveys arises from: non-ignorable missing items (some individuals refuse to answer important questions), non-ignorable missing individuals (some important individuals refuse to participate, e.g. the socially excluded), measurement error (the answers to some questions are systematically distorted, e.g. household income) and the impact of the survey design (e.g. not all multistage cluster designs are orthogonal to the behaviours of interest while other designs create correlations in individual behaviour).

Existing databases are under-utilised, with few researchers having sufficient resources to fully understand the complexities within the data. Social scientists therefore need procedures to manage the metadata for these complex surveys so that they can quickly decide on the most appropriate data sets for their research purposes. Scientists also want to integrate data from multiple sources, e.g. the creation and linkage of micro-scale, spatially and temporally interpolated labour market data in a study of the factors determining the duration of individual unemployment spells, or the real-time continuous data from a large network of sensors.

The replication of findings is a key component of good science. It currently involves the visual comparison of results from the relevant analyses. This process can be formalised in e-Science by developing database management procedures that allow the same or similar analysis (meta-analysis or comparative analysis) to be performed by the NGS on all the relevant data sets simultaneously.
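
The idea of running one analysis over several data sets at once can be sketched in Python. This is only a sketch under stated assumptions: the three data sets are tiny made-up lists, the "analysis" is just a count, mean and standard deviation, and a local thread pool stands in for Grid job submission. None of the names below come from NGS software.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev

# Made-up unemployment-spell durations (months) from three hypothetical surveys.
DATASETS = {
    "survey_a": [3, 5, 8, 12, 4],
    "survey_b": [6, 7, 9, 10, 8],
    "survey_c": [2, 2, 15, 20, 1],
}


def analyse(item):
    """Apply the same summary analysis to one (name, values) pair."""
    name, values = item
    return name, {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def replicate(datasets):
    """Run the identical analysis on every data set concurrently."""
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(analyse, datasets.items()))


results = replicate(DATASETS)
for name in sorted(results):
    summary = results[name]
    print(f"{name}: n={summary['n']} mean={summary['mean']:.2f} sd={summary['sd']:.2f}")
```

Because every data set passes through the same function, the summaries can be compared programmatically rather than by eye.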

Methodologies

Much of the quantitative technology presently used in the social sciences is old, dating back to the 1960s and 1970s. Many of the assumptions in this technology were made in order to minimise computation. These limitations need no longer apply, as the NGS enables the use of computationally intensive statistical and modelling procedures that allow us to disentangle the effects of state dependence (everything depends on everything else), heterogeneity (there are always unobservable effects present) and non-stationarity (the nature of the systematic relationships changes over time) from the observable controls. Other developments will occur in: non-parametric and semi-parametric methods, the bootstrap, fuzzy logic, data mining and knowledge discovery. Qualitative researchers can use the NGS to provide visualisation and virtual reality to assist with the interpretation of data sets and the reconstruction of physical and social objects, structures or phenomena. The NGS will allow scientists to develop new methodologies that will enable them to address increasingly complex substantive research questions.
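
As an example of a computationally intensive procedure of the kind mentioned above, the following is a minimal Python sketch of a percentile bootstrap confidence interval. The sample values, the function name bootstrap_ci and its default parameters are assumptions chosen for illustration; a real study would use far larger data and many more resamples, which is where a resource like the NGS becomes useful.

```python
import random
from statistics import mean


def bootstrap_ci(sample, statistic=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    estimates = sorted(
        statistic(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


# Made-up sample, for illustration only.
durations = [3, 5, 8, 12, 4, 6, 7, 9, 10, 8, 2, 15]
low, high = bootstrap_ci(durations)
point_estimate = mean(durations)
print(f"mean={point_estimate:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

Each resample is independent of the others, so the work divides naturally across the processors of a cluster.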

Computational culture

Social scientists currently lack extensive programming skills. The uptake of the NGS will help to increase the computational skill base of key social scientists so that they can tackle research problems not addressed by existing software.

The NGS will play a key role in developing middleware appropriate to e-Social Science, building on the Grid Starter Kit and related network activities. It is very difficult for many scientists to attain the computing skills needed to deal with the complexities of contemporary Grid platforms such as Globus. OGSA is a step in the right direction with its emphasis on Grid services, but the complexity of dealing with distribution is still there. This argument is particularly pertinent for social scientists given the skills gap between such disciplines and computer science.



