“Are HPC and Big Data mutually exclusive?”
In the nineties and early 2000s, getting a bunch of computers together to process a single compute job was called HPC. Today, getting a bunch of computers together to process a single compute job is called Big Data.
However, there are some differences:
- Tools. HPC tools tend to center on platforms for distributing processing while using commodity distributed storage. They use frameworks such as PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). Big Data tools generally center on Hadoop and its ecosystem.
- Paradigm. HPC tends to be processing-centric: the focus is on dividing, distributing, and reassembling workloads. Big Data tends to be storage-centric: the focus is on distributing the data across the cluster first, then exploiting locality by running the processing on the nodes that already hold the data.
- Durability. HPC projects tend not to focus on the durability of data. Big Data via Hadoop has 3x replication built in (the default HDFS replication factor is 3), which puts the emphasis on data durability.
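
The paradigm and durability points can be made concrete with a toy sketch in plain Python. It is not MPI or Hadoop code, and every name in it (`hpc_style_sum`, `place_blocks`, `schedule_local`, `survives`) is hypothetical. The round-robin placement is an assumption made for brevity; real HDFS placement is rack-aware and more involved.

```python
def hpc_style_sum(data, n_workers):
    """Processing-centric: divide the workload, distribute it, reassemble."""
    # Ceiling division, with a floor of 1 so empty input does not break range().
    chunk = max(1, -(-len(data) // n_workers))
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # In a real cluster each chunk would be shipped to a separate worker.
    partials = [sum(c) for c in chunks]
    return sum(partials)  # reassemble the partial results


def place_blocks(n_blocks, nodes, replication=3):
    """Storage-centric: put each block on `replication` distinct nodes.

    Toy round-robin placement (an assumption, not the HDFS policy).
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


def schedule_local(placement):
    """Run each block's task on the least-loaded node that already holds it."""
    load = {}
    assignment = {}
    for block, holders in placement.items():
        node = min(holders, key=lambda n: load.get(n, 0))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment


def survives(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed_nodes for node in holders)
        for holders in placement.values()
    )


if __name__ == "__main__":
    print(hpc_style_sum(list(range(100)), 4))
    placement = place_blocks(4, ["n1", "n2", "n3", "n4"])
    print(schedule_local(placement))
    print(survives(placement, {"n1", "n2"}))
```

In the first function the data travels to the workers. In the other three, the data is placed first, the work follows it, and with three replicas per block the loss of any two nodes still leaves every block readable.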