Sophisticated semantic algorithms yield much better results in many IR applications than simple text indexing. Applying these methods to large corpora, however, requires substantial computing capacity. In fact, experimental handling of data on a terabyte scale requires a supercomputing infrastructure.

The hardware infrastructure of the Institute of Software Technology and Interactive Systems is one of the most powerful systems worldwide for semantic processing of text and audio. It comprises the following elements:

Large Data Collider (LDC)
  • 320 Gbytes of main memory
  • 80 Itanium CPUs running at 1.4 GHz
  • 1 SGI Infinite Storage
  • 12 mptfc fibre channel controllers
  • 2 Broadcom BCM5704 Gigabit Ethernet interfaces

To fully exploit the potential of the LDC, parallel C, C++ or FORTRAN code is recommended. The Itanium processors are designed to make optimal use of the large shared memory in parallel computing, but perform considerably worse for serial or Java applications.

Supercluster - GPU server

The GPU server provides a total of 1792 NVIDIA GPU cores, delivering 4.12 TeraFLOP of single-precision processing power (2.06 TeraFLOP double precision). Paired with a total of 24 Gigabytes of high-speed GDDR5 memory running at 1.5 GHz, this computer allows fast computation of massively parallelisable processes. Support for Matlab and CUDA (NVIDIA's C/C++-based extension for GPU programming) enables many resource-intensive applications in data mining, machine learning and information retrieval.

Detailed specification:
  • Host System:
    • 12 cores (2 hex-core Intel Xeon X5680 @ 3.33GHz)
    • 96 Gigabytes of Main Memory (1333 MHz)
  • GPU Supercomputer
    • 4 Tesla 2070 GPUs, each providing
      • 448 CUDA cores
      • 6 GB memory (1.5 GHz, bandwidth 144 GB/sec)
      • 1.03 TeraFLOP single precision, 515 GigaFLOP double precision
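
The aggregate figures quoted for the GPU server follow directly from the per-GPU values in the specification. The following is a minimal sketch of that arithmetic; the constant and function names are chosen for illustration only and are not part of any tool provided on the system.

```python
# Per-GPU figures as listed in the detailed specification.
GPU_COUNT = 4
CUDA_CORES_PER_GPU = 448
MEMORY_GB_PER_GPU = 6
SINGLE_TFLOPS_PER_GPU = 1.03
DOUBLE_GFLOPS_PER_GPU = 515


def gpu_totals(count=GPU_COUNT):
    """Return the aggregate resources of `count` identical GPUs (illustrative helper)."""
    return {
        "cuda_cores": count * CUDA_CORES_PER_GPU,
        "memory_gb": count * MEMORY_GB_PER_GPU,
        "single_tflops": round(count * SINGLE_TFLOPS_PER_GPU, 2),
        # Convert GigaFLOP to TeraFLOP for the double-precision total.
        "double_tflops": round(count * DOUBLE_GFLOPS_PER_GPU / 1000, 2),
    }


totals = gpu_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

These are peak values summed over all four GPUs; an application only approaches them if its work can be spread evenly across the devices.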

More details on the NVIDIA Tesla GPUs can be found at

Supercluster - Auxiliary Server

For more traditional computations that cannot be massively parallelised on the GPU supercomputer, or that need a lot of main memory, the auxiliary server provides a 32-core computing environment with 256 Gigabytes of RAM.

Detailed specification:
  • 32 cores (4 octo-core Intel Xeon X7560 @ 2.27GHz)
  • 256 Gigabytes of Main Memory (1333 MHz)
  • 750 Gigabytes of fast 15,000 rpm discs for system and data storage
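
The division of labour between the GPU server and the auxiliary server can be expressed as a simple rule of thumb based on the memory sizes given in the specifications. The sketch below is a hypothetical illustration only: the function `suggest_server` and its decision rule are assumptions made for this example, not an official scheduling policy of the institute.

```python
# Main memory sizes in Gigabytes, taken from the specifications in this document.
GPU_HOST_MEMORY_GB = 96   # host system of the GPU server
AUX_MEMORY_GB = 256       # auxiliary server


def suggest_server(working_set_gb, massively_parallel):
    """Suggest a Supercluster server for a job (hypothetical rule of thumb).

    Returns None if the working set exceeds the auxiliary server's main memory.
    """
    if massively_parallel and working_set_gb <= GPU_HOST_MEMORY_GB:
        return "GPU server"
    if working_set_gb <= AUX_MEMORY_GB:
        return "auxiliary server"
    return None


print(suggest_server(4, massively_parallel=True))
print(suggest_server(200, massively_parallel=True))
```

Note that each GPU has only 6 GB of its own memory, so a job placed on the GPU server may still need to move its data to the devices in smaller pieces.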

Supercluster - Storage Server

A storage server is provided for the GPU and auxiliary servers. It currently provides a total of 36 Terabytes and can be connected to the other servers via Fibre Channel as a SAN, or via a 10 Gigabit link as NAS. The storage server can be extended to 48 Terabytes of storage; with additional shelves, the capacity can be increased to 650 Terabytes.

SGI IS4500 Storage Server

This server provides bulk storage capabilities for large-scale data analysis. The current storage capacity is 94 TB on Fibre Channel and SATA discs.