Computational biology is an important discipline in modern scientific research. It seeks to reveal the essence of life through techniques such as mathematical modeling, grounded in data analysis and the relevant theoretical methods. The volume and complexity of biological data keep increasing, and the data generated in gene research doubles every 14 months. Observation and experiment alone can no longer cope with data at this scale, so it must be handled with large-scale computational modeling. Computational demands grow more complex along with the data. For example, DNA sequencing workloads require petabyte-scale (PB) storage, and sequence assembly applications require a single node with terabyte-scale (TB) memory capacity.
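As a rough illustration of what a 14-month doubling period implies for storage planning, the growth can be worked out with a few lines of Python. This is only a sketch: the function names and the sample planning horizons are assumptions made for illustration, and the only figure taken from the text is the 14-month doubling period.

```python
# Assumption taken from the text: gene research data doubles every 14 months.
DOUBLING_PERIOD_MONTHS = 14.0


def growth_factor(months: float,
                  doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Return how many times larger the data volume is after `months`."""
    return 2.0 ** (months / doubling_period)


def projected_volume_tb(initial_tb: float, years: float) -> float:
    """Project a starting volume (in TB) forward by `years` (hypothetical helper)."""
    return initial_tb * growth_factor(years * 12.0)


if __name__ == "__main__":
    # Sample horizons chosen only for illustration.
    for years in (1, 3, 5):
        print(f"after {years} year(s): x{growth_factor(years * 12.0):.1f}")
```

Under this assumption, growth compounds quickly: two doubling periods (28 months) already mean four times the original volume, which is why storage capacity has to be planned well ahead of current usage.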
Sequence alignment: BWA, Bowtie, NEWBLER, BFAST, SOAP, MAQ
Sequence assembly: Velvet, SOAPdenovo, Abyss
Molecular docking: DOCK, AUTODOCK, ZDOCK, 3DDOCK
Electron microscopy 3D reconstruction: EMAN, PROTOMO, SPIDER, I3
Application Performance Features
Because they use different software and different algorithms, computational biology applications place different demands on computational resources. This section analyzes those demands and the corresponding application characteristics, taking the frequently used sequence assembly application Velvet as an example.
When running a medium-size workload, Velvet can reach a disk read/write speed of 1 GB/s on a single node and a memory usage of 120 GB. It therefore places heavy demands on disk I/O, and a single node with large memory capacity and high I/O throughput is the optimal configuration for running Velvet efficiently as sequence splicing scales out.
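The sizing reasoning above can be sketched as a simple check of a candidate node against the observed demand. The 120 GB and 1 GB/s figures come from the text. Everything else here is a hypothetical illustration: the class names, the `shortfalls` helper, and the 20% headroom factor are assumptions, not part of Velvet or any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadDemand:
    memory_gb: float        # peak memory usage on a single node
    disk_gb_per_s: float    # sustained disk read/write throughput


@dataclass(frozen=True)
class NodeSpec:
    memory_gb: float
    disk_gb_per_s: float


# Figures quoted in the text for a medium-size Velvet workload.
VELVET_MEDIUM = WorkloadDemand(memory_gb=120.0, disk_gb_per_s=1.0)


def shortfalls(node: NodeSpec, demand: WorkloadDemand,
               headroom: float = 1.2) -> dict:
    """Return the resources where the node falls short, with the missing amount.

    `headroom` is an assumed safety margin (20% by default), not a measured value.
    """
    needed = {
        "memory_gb": demand.memory_gb * headroom,
        "disk_gb_per_s": demand.disk_gb_per_s * headroom,
    }
    available = {
        "memory_gb": node.memory_gb,
        "disk_gb_per_s": node.disk_gb_per_s,
    }
    return {name: needed[name] - available[name]
            for name in needed if available[name] < needed[name]}


if __name__ == "__main__":
    # Hypothetical node: 64 GB of memory, 2 GB/s of disk throughput.
    print(shortfalls(NodeSpec(memory_gb=64.0, disk_gb_per_s=2.0), VELVET_MEDIUM))
```

An empty result means the node covers both the memory and the disk throughput demand with the assumed margin. Any key that appears points to the resource that would limit the run.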
In the chart below, a higher bar indicates a higher demand.