The High-Performance Computing (HPC) world is evolving rapidly. New workloads, such as pattern recognition, video and text processing, speech and facial recognition, deep learning, machine learning, and genomic sequencing, are being executed on HPC systems. The motivations behind this evolution are both economic and technical: as HPC systems become more powerful, more agile, and less costly, they can be used for applications that previously never had access to high-scale, low-cost infrastructure.
The cloud has accelerated this evolution because it is scalable and elastic, allowing self-service provisioning of anywhere from one to thousands of processors in minutes. As a result, HPC users with new and expanding application requirements are turning to the cloud and are seeing reduced time to results, faster deployment, greater architectural flexibility, and lower costs. Cloud computing also keeps HPC moving at the pace of computing innovation, as users benefit from advances in microprocessors, GPUs, networking, and storage.
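As a concrete illustration of this self-service model, the sketch below uses the AWS SDK for Python (boto3) to request a batch of compute instances on demand. The region, AMI ID, instance type, and counts are placeholder assumptions chosen for illustration, not values drawn from this document.

```python
# Minimal sketch: self-service provisioning of HPC compute capacity
# with the AWS SDK for Python (boto3). The region, AMI ID, instance
# type, and counts below are placeholder assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical HPC-ready image
    InstanceType="c5.18xlarge",        # assumed compute-optimized type
    MinCount=1,
    MaxCount=100,                      # scale from one node to many in a single call
)

instance_ids = [i["InstanceId"] for i in response["Instances"]]
print(f"Requested {len(instance_ids)} instances: {instance_ids}")
```

The same request can be repeated or torn down on demand, which is what makes provisioning a thousand processors in minutes, and releasing them when the job completes, practical.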
The cloud and the evolving HPC world
The HPC world's need for ever more processing capability is what drives HPC system development. The current dominant HPC architecture, the cluster, was built from commodity hardware and a common operating system, offering price-performance benefits far beyond proprietary systems. Clusters of commodity processors were soon doing production work for a number of companies and labs, which led to the explosion of clusters in HPC. Clusters have come a long way and have greatly increased access to HPC resources at an affordable price, for both embarrassingly parallel applications and tightly coupled applications that require fast inter-node communication.
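To make that distinction concrete, the sketch below shows a hypothetical embarrassingly parallel workload: a parameter sweep in which every task is independent, so it scales across nodes with no inter-process communication. A tightly coupled application, by contrast, would exchange data between processes at every step (for example, via MPI).

```python
# Minimal sketch of an embarrassingly parallel workload: a parameter
# sweep in which every task is independent of the others. The
# simulation function and parameter grid are hypothetical placeholders.
from multiprocessing import Pool

def run_simulation(params):
    """Stand-in for an independent simulation or analysis task."""
    x, y = params
    return x ** 2 + y ** 2  # placeholder computation

if __name__ == "__main__":
    parameter_grid = [(x, y) for x in range(100) for y in range(100)]

    # Each task runs with no communication between workers, so the same
    # pattern scales out across cluster nodes or cloud instances.
    with Pool(processes=8) as pool:
        results = pool.map(run_simulation, parameter_grid)

    print(f"Completed {len(results)} independent tasks")
```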
Issues with traditional fixed HPC architectures
The HPC cluster is a relatively fixed architecture: a set of servers (nodes), each with a small amount of internal storage (if any at all), connected by a dedicated network and managed by software tools that handle user requests for resources. It is rare for any changes to be made to the system, such as adding nodes, upgrading processors, adding node storage, or changing the network topology or technology. Once put in place, the vast majority of dedicated cluster systems never change architecture.
The rise of the Hadoop architecture, which addresses a large class of HPC problems, makes this inflexibility an even greater challenge. The Hadoop architecture (also known as the MapReduce architecture) calls for nodes with a large amount of local storage and uses only TCP/IP networking. The typical on-premises HPC system, by contrast, uses the smallest, least expensive (yet reliable) drive in each node, so customers often procure a separate system designed specifically for Hadoop workloads. Employing this strategy creates two HPC architectures with conflicting configurations. It is unnecessary when cloud computing is the platform, however, because both rely on commodity systems, dynamically created clusters, and software stacks purpose-built for the needs of a particular problem.
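To illustrate the MapReduce pattern the Hadoop architecture is built around, here is a minimal, single-process sketch of its map, shuffle, and reduce phases (a word count over in-memory documents, purely illustrative; a real Hadoop job would distribute these phases across nodes that hold the data locally).

```python
# Minimal single-process sketch of the MapReduce pattern behind Hadoop:
# map each input record to key/value pairs, shuffle (group) by key,
# then reduce each group. The input documents are illustrative only.
from collections import defaultdict

documents = [
    "hpc in the cloud",
    "hadoop in the cloud",
    "hpc and hadoop",
]

# Map phase: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine the values for each key.
word_counts = {word: sum(counts) for word, counts in groups.items()}

print(word_counts)
```

Because the map and reduce phases run where the data lives, the architecture rewards nodes with abundant local storage rather than the minimal drives typical of traditional HPC clusters, which is exactly the configuration conflict described above.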
As HPC and big data become increasingly ubiquitous and connected, we expect this kind of synergy to deliver results that neither discipline could produce alone, and we expect the boundaries between them to continue to blur as we strive toward better products, services, and outcomes.