By Jason Stowe
High-performance computing (HPC) has come a long way for life sciences. Twenty years ago, expensive parallel supercomputers were required to render proteins in three dimensions (3-D) and run software that helped researchers understand their shape. Now 3-D rendering can be done on graphics cards in workstations, laptops, and even phones.
Compute clusters using many commodity servers have replaced expensive parallel supercomputers, but the data and problems being solved have grown to demand increased compute capacity. This leaves companies with large capital investments of fixed-size clusters that have all the traditional challenges of maximizing utilization, minimizing operational costs, and shortening time-to-result for users.
Cloud HPC Clusters-As-A-Service
Cloud computing promises to help solve these issues. It makes provisioning servers easier and cost-efficient. The cloud delivers virtualized servers and storage via the Internet, at large scale, where you’re billed only for what you use. However, getting started working in the cloud is not easy. For example, Amazon EC2 (Amazon Elastic Compute Cloud is a Web service that provides resizable compute capacity in the cloud) requires programming to provision nodes and administrators and security staff to manage the servers through its APIs.
This provisioning challenge has led to cloud HPC clusters built upon infrastructure providers like Amazon EC2. Instead of building out a datacenter, procuring servers, networking equipment, and hiring IT personnel, companies can tap into these compute clusters as a service, which are provisioned automatically.
Cloud HPC cluster users can start up clusters without having to worry about putting in place various applications, operating systems, security, encryption, and other software. Scientists can create clusters that automatically add servers when work is added and turn the servers off when the work is completed. This enables life science researchers to run calculations only when they need compute power.
Increasing Data, Computation, Time-To-Results
Modern scientific instruments, like mass spectrometers for proteomics or next-generation genomic sequencers, require this compute power. The data generated when sequencing a human genome is more than a terabyte in size. This scale of data puts the project in a unique place: It is large enough to be unwieldy to analyze for the labs that generate it, but small enough that with some data scheduling, it can be moved over the Internet. As this data comes off an instrument, it needs to be processed using differing numbers of computers and return results as quickly as possible.
This poses a problem for traditional, fixed clusters that cannot grow or shrink to efficiently run the calculations. It also increases costs. A traditional, in-house compute cluster with 30% utilization costs three times the amount of money to run per calculation consumed as a fully utilized cluster. However, fully utilized clusters are up to 60 times slower to complete the calculations because the cluster is not large enough to run the calculations as fast as possible. For drug discovery processes, clinical trial design, or bioinformatics, this 60-times-slower time-to-result translates to slower time-to-market, which costs money.
HPC Before Cloud Computing
To understand these costs, let us look at HPC clusters before cloud computing. When buying a cluster, researchers would size it to complete their largest set of calculations efficiently in a time they specified. With storage, the same planning and sizing would occur to ensure that enough space existed to hold the working data and final results of calculations.
Purchasing the cluster required large up-front capital expenditures for the machines and filers required to do calculations, as well as lengthy procurement and provisioning processes. So when a cluster is operational for the first time, it isn’t full to capacity. After the cluster is in production, researchers have a fixed-sized cluster to do their calculations. If they have a 4,000 compute-hour calculation to run on a 40-core cluster, it will always take at least 100 hours to get the result.
Faster Development Times At Less Cost
For compute clusters as a service, the math is different: Having 40 processors work for 100 hours costs the same as having 1,000 processors run for 4 hours. Yet with 1,000 processors, the result comes back the same day, rather than four days later. This kind of disruptive decrease in time-to-result can lead to shorter times to develop products, discover drugs, or isolate important genes in a genome. The results also come back tens of times quicker at no additional cost.
This key shift in high-performance calculations also applies to storage. A hard disk capable of storing a terabyte can be bought for $150 at a local office store. However, filers with redundancy, deduplication, and hundreds of terabytes of storage can cost $2,000 or more per terabyte. Traditional filers cost 10 times more per terabyte for large capacities and reliability than the cost of hard drives bought off the shelf. In the cloud, all storage is redundant and highly available. The cost per terabyte goes down at large scales.
Improving Time To Market
These advantages create great incentives that improve time-to-market and reduce costs. As an example, Varian Inc. is a leading producer of scientific instruments and ion traps. Researchers at the company run calculation-intensive Monte Carlo simulations to help develop better future products. In one instance, a simulation for a mass spectrometer was scheduled to take several thousand compute hours and nearly six calendar weeks on an internal pool of processors. With product design and conference deadlines looming, it needed to get results faster.
Rather than purchasing a traditional cluster, Varian Inc. was able to run this calculation using a cloud HPC cluster service on Amazon EC2 that helps companies run calculations easily and securely. The elastic cluster added nodes to run its calculations and stopped the servers when there was no more work left to compute. Utilizing a service to automate provisioning, security, encryption, administration, and support made using the cluster cost-effective and easy to use. With the cloud HPC cluster, this six-week calculation ran in less than one day.
Life Sciences Applications
For researchers in life sciences, these clusters can support all the applications that users expect on internal clusters. Both open-source and proprietary software applications can be run in the cloud. These applications must generally be tuned to work optimally in the cloud, which has a more flexible architecture than fixed internal clusters.
As an example, Schrödinger, a leading supplier of molecular-simulation and computational-chemistry software to the pharmaceutical industry, made its Glide docking program available on the cloud. Glide is used for virtual screening, a process that determines potential drug candidates from a large database of compounds based upon their fit with a given target site.
Shortening Product Pipelines
Molecular modeling and simulation are central to drug discovery, although it is often rate-limiting. Local computational resources are often insufficient to perform the massive burst-mode computations needed to bring a drug project forward in a timely fashion. Recently, Schrödinger decided to show how on-demand availability of large, secure, and trouble-free cloud computational resources can fill this gap. As test data, it screened 1.8 million candidate compounds against a target site to find potential matches. Using a 600-processor cloud HPC cluster, 18 months of screening was completed in 36 hours.
Cloud HPC clusters enable scientists to only consume calculations when they need them and pay for what they use. The scalability of the cloud allows them to size HPC clusters to their jobs to minimize the time to result. Storage in the cloud can get cheaper with economies of scale, not more expensive like traditional filers. Yet there are more benefits to HPC in the cloud.
Cloud HPC clusters are virtualized, meaning it is possible to provision repeatable clusters that have standardized images for qualification purposes and reliable application environments every time they are provisioned. Disaster recovery scenarios are easier to manage because the entire cluster environment is repeatable through virtualization.
Cloud HPC clusters are the same every time they are provisioned from their virtual machine images. Security can be handled in a consistent way, with guaranteed encryption and encryption-key management for sensitive data and applications at rest on disk or over the network between cluster servers. As an example, hard disks containing user data can be encrypted using the advanced encryption standard (AES) 128-bit, 192-bit, or 256-bit encryption. Data communicated to the cluster via Web services use the same SSL encryption that protects credit card information for holiday purchases.
HPC’s future is clear: It will be in the clouds. Calculations play an important role in helping researchers efficiently design better drugs, run more efficient clinical trials, and develop better crops. Increasing data from scientific instruments is requiring analysis for the large, transferrable data being generated. There is increased pressure to lower costs and speed up product development timelines for crops, clinical trials, and cures for diseases. These factors have led to the creation of cloud HPC clusters that have helped pharmaceutical companies perform calculations that led to better scientific results.
About The Author
Jason Stowe is the founder and CEO of Cycle Computing, a provider of high-performance computing (HPC) and open source technology in the cloud. A seasoned entrepreneur and experienced technologist, Jason attended Carnegie Mellon and Cornell Universities.