Processing and Archiving Large Data Sets
The IT infrastructure provides the means for processing and archiving data acquired by the institute's imaging facilities (in particular the PET and MRI scanners). A single PET study (HRRT) generates approx. 30 gigabytes of raw data, which is reconstructed into a quantified image volume (still about 50 MB in size). This work is carried out by a dedicated compute cluster (see right figure: ElvesCluster). Databases of MRI image data required for meta-studies take up several hundred gigabytes; genomic data is of a similar size.
For data processing and archiving, a powerful infrastructure is required:
- Computing Resources
The reconstruction of HRRT data is a compute-intensive application, as is the statistical analysis of fMRI data (FSL, SPM). Apart from the compute clusters, the institute employs multi-CPU systems (e.g. Sun V40z, Sun X2200) in a traditional client-server setup for jobs that are not suitable to run locally on modern PC-class systems.
Figure right (computing resources):
Important central services such as email and the institute’s firewall are deployed on Sun V220 systems. Our most powerful compute servers are the Sun V40z (4 x Opteron 250, 16 GB RAM) and the Sun X2200. In contrast, the Sun V490 server is optimized for I/O performance and is part of a central file system (SAM-FS).
- Efficient Storage with Failover
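We use an HSM (hierarchical storage management) system with a tape robot for archiving. It uses LTO-3 tapes, which have a native capacity of 400 gigabytes. Assuming a typical compression rate of 2:1, each tape can hold about 800 gigabytes, which is almost a terabyte (1 terabyte = 1024 gigabytes = 1024 x 1024 megabytes).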
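The tape arithmetic can be sketched in a few lines of Python. This is an illustrative estimate only: the 400 GB native capacity is the LTO-3 specification value, the 2:1 ratio and the 30 GB study size come from the text, and the function names are hypothetical. It also assumes that raw HRRT data really compresses at 2:1, which may not hold in practice.

```python
# Rough tape-capacity estimate. All names are hypothetical; the numbers are
# taken from the text or from the LTO-3 specification, as noted.
LTO3_NATIVE_GB = 400       # native LTO-3 cartridge capacity (specification value)
COMPRESSION_RATIO = 2.0    # "typical" 2:1 compression rate quoted in the text
RAW_STUDY_GB = 30          # approx. raw size of one HRRT study (from the text)


def effective_capacity_gb(native_gb: float, ratio: float) -> float:
    """Amount of uncompressed data that fits on one tape at the given ratio."""
    return native_gb * ratio


def studies_per_tape(capacity_gb: float, study_gb: float) -> int:
    """Number of whole studies that fit into the effective capacity."""
    return int(capacity_gb // study_gb)


capacity = effective_capacity_gb(LTO3_NATIVE_GB, COMPRESSION_RATIO)
print(f"Effective capacity per tape: {capacity:.0f} GB")
print(f"Raw HRRT studies per tape:   {studies_per_tape(capacity, RAW_STUDY_GB)}")
```

If the data turns out to be less compressible, lowering `COMPRESSION_RATIO` towards 1.0 gives the pessimistic bound.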
- Network Infrastructure
Processing and archiving large amounts of data requires a suitable network infrastructure. The institute’s three buildings are linked via parallel Gigabit Ethernet connections (“EtherChannel”) using Cisco 65xx and 45xx series switches. All labs and offices have optical connections to the three server halls. Most workstations and other network devices are connected to mini-switches (fiber-to-the-office). Our heterogeneous network (MS Windows XP, Linux, MacOS X and Solaris) has been segmented into a number of subnets.
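To give a feeling for why link speed matters, the following sketch estimates an ideal lower bound for moving one raw HRRT study (approx. 30 GB) across the network. It assumes decimal units (1 GB = 8 Gbit), ignores protocol overhead and disk speed, and treats the number of usable parallel links as a free parameter, since the text does not state it. The function name is hypothetical.

```python
def transfer_seconds(size_gb: float, link_gbit_per_s: float, n_links: int = 1) -> float:
    """Ideal transfer time in seconds, ignoring all overhead.

    size_gb          data volume in gigabytes (decimal, 1 GB = 8 Gbit)
    link_gbit_per_s  nominal speed of one link in Gbit/s
    n_links          number of links assumed to be usable in parallel
    """
    if link_gbit_per_s <= 0 or n_links < 1:
        raise ValueError("link speed must be positive and n_links at least 1")
    return size_gb * 8 / (link_gbit_per_s * n_links)


# One raw study over a single Gigabit Ethernet link, then over two links.
for links in (1, 2):
    seconds = transfer_seconds(30, 1.0, links)
    print(f"{links} link(s): about {seconds / 60:.0f} min")
```

Real transfers will be slower than these figures, so the values are best read as a floor rather than a prediction.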
For the analysis of tomographic data and for numerical mathematics and computer algebra, a number of commercial software packages are available institute-wide: Matlab, IDL and Mathematica. In addition, we develop software based on Trolltech’s Qt (see also: VINCI).
Software development is a team effort and Subversion (source code repository) is used for coordination and quality assurance.
The institute features a well-equipped DMZ (“demilitarized zone”) with the usual services for data exchange over the Internet.
Compute cluster: a group of eight powerful systems that typically work on one task simultaneously. This requires a suitable “parallelization” of the programs so that they run efficiently on a cluster. Depicted is the newer cluster (8 nodes, each with 2 Xeon PIV CPUs, 2 GB RAM and dual Gigabit Ethernet), which is mostly used for reconstruction of HRRT data.
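The idea of parallelization can be illustrated with a minimal sketch: the work is split into independent chunks, each chunk is handled by one worker, and the partial results are combined. This is not the institute's reconstruction software. Threads on a single machine stand in for the eight cluster nodes, and `process_chunk` is a hypothetical placeholder for the real per-node computation. On an actual cluster this role is usually played by a message-passing library or a batch system.

```python
from concurrent.futures import ThreadPoolExecutor

N_NODES = 8  # the cluster described above has eight nodes


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous parts of nearly equal size."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_chunk(chunk):
    """Hypothetical placeholder for per-node work (here: a sum of squares)."""
    return sum(x * x for x in chunk)


def run_parallel(items, n_workers=N_NODES):
    """Process all chunks concurrently and combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


print(run_parallel(list(range(1000))))
```

The scheme only pays off when the chunks are independent of each other. Work that needs frequent exchange between nodes is limited by the network rather than by the CPUs.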
Compute and File Servers:
Part of the infrastructure for scientific computing, central file and archiving services, Internet services.
Scientific analysis of our experimental studies requires fast and secure access to large amounts of data. We now provide a capacity of approx. 100 TB in a storage area network (SAN) consisting of StorageTek 6140 arrays (multiple RAID configurations). Each slot contains one disk system; newer disks have a capacity of roughly 1 TB.