Sophisticated semantic algorithms yield much better results in many IR applications than simple text indexing. Applying these methods to large corpora requires substantial computing capacity. In fact, experimental handling of data on a Terabyte scale requires a supercomputing infrastructure.
The hardware infrastructure of the Institute of Software Technology and Interactive Systems is one of the most powerful systems worldwide for semantic processing of text and audio. It comprises the following elements:
Large Data Collider (LDC)
To fully exploit the potential of the LDC, parallel C, C++ and FORTRAN code is recommended. The Itanium processors are designed for optimal usage of the large shared memory in parallel computing, but perform much worse for serial or Java applications.
Supercluster - GPU server
Provides a total of 1792 NVIDIA GPU cores, delivering 4.12 TeraFLOPS of single-precision processing power (2.06 TeraFLOPS double precision). Paired with a total of 24 Gigabytes of high-speed GDDR5 RAM running at 1.5 GHz, this computer allows for fast computation of massively parallelisable processes. Support in Matlab and CUDA (an extension of C/C++) makes it suitable for many resource-intensive applications in data mining, machine learning and information retrieval.
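The aggregate figures for the GPU server can be broken down with a little arithmetic. The following minimal sketch uses only the numbers quoted in this section (1792 cores, 4.12 and 2.06 TeraFLOPS, 24 Gigabytes). The names `gpu_summary` and the constants are illustrative, and a Gigabyte is assumed to mean 10^9 bytes.

```python
# Aggregate figures quoted in the text for the GPU server.
GPU_CORES = 1792
SINGLE_PRECISION_TFLOPS = 4.12
DOUBLE_PRECISION_TFLOPS = 2.06
GPU_MEMORY_GB = 24  # assumption: decimal Gigabytes


def gpu_summary():
    """Derive per-core figures from the aggregate numbers (illustrative helper)."""
    return {
        # 1 TeraFLOPS = 1000 GigaFLOPS
        "sp_gflops_per_core": SINGLE_PRECISION_TFLOPS * 1000 / GPU_CORES,
        # Fraction of single-precision throughput available in double precision
        "dp_to_sp_ratio": DOUBLE_PRECISION_TFLOPS / SINGLE_PRECISION_TFLOPS,
        # 1 Gigabyte = 1000 Megabytes under the decimal assumption
        "memory_mb_per_core": GPU_MEMORY_GB * 1000 / GPU_CORES,
    }


if __name__ == "__main__":
    for name, value in gpu_summary().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions each core contributes roughly 2.3 GigaFLOPS in single precision, double precision runs at half that rate, and about 13 Megabytes of GPU memory are available per core. Workloads therefore benefit most when each core performs many arithmetic operations per byte loaded.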
Detailed specification:
More details on the NVIDIA Tesla GPUs can be found at http://www.nvidia.com/object/personal-supercomputing.html
Supercluster - Auxiliary Server
For more traditional computations that cannot be massively parallelised on the GPU server, or that need a lot of main memory, the auxiliary server provides a 32-core computing environment with 256 Gigabytes of RAM.
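The division of labour between the GPU server and the auxiliary server can be sketched as a simple rule of thumb. The helper `choose_server` below is hypothetical and not an official scheduling policy. It treats the 24 Gigabytes of GPU memory as a single pool, which is a simplification, and it uses only the memory figures quoted in this section.

```python
GPU_MEMORY_GB = 24    # total GPU memory quoted for the GPU server
AUX_MEMORY_GB = 256   # main memory quoted for the auxiliary server


def choose_server(working_set_gb, massively_parallel):
    """Suggest a server for a job (hypothetical heuristic, for illustration only).

    working_set_gb: memory the job needs at once, in Gigabytes.
    massively_parallel: True if the computation maps well onto GPU cores.
    """
    if working_set_gb < 0:
        raise ValueError("working_set_gb must be non-negative")
    if working_set_gb > AUX_MEMORY_GB:
        # Does not fit in memory on either machine; the data must be
        # processed in chunks read from the storage server.
        return "neither"
    if massively_parallel and working_set_gb <= GPU_MEMORY_GB:
        return "gpu"
    # Serial jobs, or parallel jobs too large for GPU memory.
    return "auxiliary"


if __name__ == "__main__":
    print(choose_server(10, massively_parallel=True))
    print(choose_server(100, massively_parallel=True))
    print(choose_server(300, massively_parallel=False))
```

In practice a job that is massively parallel but exceeds GPU memory could also be streamed through the GPUs in batches; the sketch deliberately ignores that option to keep the rule readable.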
Detailed specification:
A storage server is provided for the GPU and auxiliary servers. It currently provides a total of 36 Terabytes and can be connected to the other servers via Fibre Channel as a SAN, or via a 10 Gigabit link as NAS. The storage server can be extended to 48 Terabytes of storage; with additional shelves, the capacity can be increased to 650 Terabytes.
This server provides bulk storage capabilities for large-scale data analysis. The current storage capacity consists of 94 TB on Fibre Channel and SATA discs.
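To give a feel for these data volumes, the following minimal sketch estimates how long it takes to move a given amount of data over the 10 Gigabit NAS link mentioned for the storage server. The function `transfer_hours` and its `efficiency` parameter are illustrative assumptions. Decimal units are assumed (1 Terabyte = 10^12 bytes), and the default of full line rate gives a theoretical lower bound rather than a measured value.

```python
def transfer_hours(terabytes, link_gigabits_per_s=10.0, efficiency=1.0):
    """Estimate the hours needed to move `terabytes` over a network link.

    efficiency is the fraction of the nominal line rate actually achieved
    (1.0 = ideal; real transfers are slower because of protocol overhead).
    """
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8                      # decimal Terabytes to bits
    bits_per_second = link_gigabits_per_s * 1e9 * efficiency
    return bits / bits_per_second / 3600             # seconds to hours


if __name__ == "__main__":
    for capacity in (36, 48, 650):                   # capacities quoted in the text
        print(f"{capacity} TB: {transfer_hours(capacity):.1f} h at line rate")
```

At the ideal line rate, reading the full 36 Terabytes once takes about 8 hours, so experiments on the complete collection are better planned as batch jobs than as interactive runs.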