Last edited by laytonjb, May 7 2008, 9:23 PM EDT
Estimating Cluster Performance
How you can estimate the Top500 performance of your cluster
Quick Introduction to Performance
I've been asked a few times about how to estimate the Top500 performance of a cluster. So I thought I would discuss this a little bit and also give you a spreadsheet to help you estimate the performance.
I won't discuss the Top500 benchmark in this blog. It is what it is. It is very useful and not useful at all, both at the same time. Nevertheless, people are very interested in the Top500 performance of their cluster (one reason is that it can help justify the cost of the cluster). So it's quite common to compute the theoretical performance of the cluster (in either GFLOPS or TFLOPS) and also to estimate the performance of the cluster when running the Top500 benchmark (the actual name is the HPL benchmark, short for High-Performance Linpack). The good news is that it's fairly easy to estimate performance.
The current processor families from both AMD and Intel are capable of 4 floating-point operations per clock cycle per core (also called 4 ops per clock or 4 ops/clock). The dual-core AMD Opteron and the older Intel P4 Xeon chips are capable of 2 floating-point operations per clock cycle per core (what people call 2 ops per clock). So to determine performance, you just multiply the clock speed of the chip by the number of operations per clock and then multiply by the number of cores. This produces the theoretical (peak) performance. If the clock speed is in GHz, then the theoretical performance will be in GFLOPS (billions of floating-point operations per second). If the clock speed is in MHz, then the theoretical performance will be in MFLOPS (millions of floating-point operations per second).
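Here is a minimal Python sketch of the per-chip arithmetic just described. The function name `chip_peak` and the example clock speeds are hypothetical. The inputs are only the ones named above (clock speed, ops per clock, cores), and the result carries whatever unit prefix the clock speed does.

```python
def chip_peak(clock: float, ops_per_clock: int, cores: int) -> float:
    """Theoretical peak of one chip: clock * ops/clock * cores.

    The unit follows the clock: GHz in gives GFLOPS out,
    MHz in gives MFLOPS out.
    """
    return clock * ops_per_clock * cores


if __name__ == "__main__":
    # Hypothetical 2.5 GHz quad-core chip with 4 ops/clock cores.
    print(chip_peak(2.5, 4, 4))      # 40.0 (GFLOPS)
    # The same chip with its clock given in MHz instead.
    print(chip_peak(2500.0, 4, 4))   # 40000.0 (MFLOPS)
    # Hypothetical 3.0 GHz dual-core chip with 2 ops/clock cores.
    print(chip_peak(3.0, 2, 2))      # 12.0 (GFLOPS)
```

The only difference between the older and newer chips is the `ops_per_clock` argument. That is why the same chip at the same clock can differ by a factor of two in theoretical performance.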
Here's a quick formula:
Theoretical Performance (GFLOPS) = (4 ops/clock) * (clock speed in GHz) * (number of cores per socket) * (number of sockets per node) * (number of nodes)
This formula assumes that the cores are of the newer variety (4 ops/clock). For the older 2 ops/clock chips, replace the 4 with a 2.
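The cluster formula can be written as a small Python function. This is a sketch: the function name and the example cluster (64 nodes, two quad-core 2.5 GHz sockets per node) are hypothetical. The `ops_per_clock` parameter defaults to 4 as in the formula and can be set to 2 for older cores.

```python
def theoretical_gflops(clock_ghz: float, cores_per_socket: int,
                       sockets_per_node: int, nodes: int,
                       ops_per_clock: int = 4) -> float:
    """Theoretical cluster performance in GFLOPS.

    Mirrors the formula above term by term; ops_per_clock defaults
    to 4 for the newer cores and can be set to 2 for the older ones.
    """
    return ops_per_clock * clock_ghz * cores_per_socket * sockets_per_node * nodes


if __name__ == "__main__":
    # Hypothetical cluster: 64 nodes, 2 sockets per node, quad-core 2.5 GHz chips.
    peak = theoretical_gflops(2.5, 4, 2, 64)
    print(peak)            # 5120.0 GFLOPS
    print(peak / 1000.0)   # 5.12 TFLOPS
```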
HPL Performance
Now that we know the theoretical performance of the cluster, it's pretty darn easy to estimate the performance of the cluster running HPL. I've done a fairly extensive analysis of the performance of the clusters in the Top500 as a function of their interconnect (network). This analysis is the subject of another blog by itself, but as a rule of thumb, clusters connected with Gigabit Ethernet (GigE) achieve about 50% of their theoretical performance, and InfiniBand (IB) clusters achieve about 75%. Here efficiency means the measured HPL performance divided by the theoretical performance (Rmax/Rpeak in Top500 terms). Both of these efficiencies are only rules of thumb. On the Top500 list I can find GigE clusters with efficiencies as high as almost 65% and as low as 24%. For InfiniBand I can find efficiencies as high as 85% and as low as 29% (that number is really low). However, the two rules of thumb, 50% for GigE and 75% for IB, are pretty good estimates.
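The same rules of thumb can be expressed as a small Python sketch. The efficiency figures are the ones quoted above. The function names and the 5120 GFLOPS peak in the usage are hypothetical. Treat the low and high numbers as the spread seen on the list, not as guarantees for any particular cluster.

```python
from typing import Tuple

# Rule-of-thumb HPL efficiencies (HPL result / theoretical peak) quoted above.
RULE_OF_THUMB = {"GigE": 0.50, "IB": 0.75}

# Lowest and highest efficiencies quoted above from the Top500 list.
OBSERVED_RANGE = {"GigE": (0.24, 0.65), "IB": (0.29, 0.85)}


def estimate_hpl(peak_gflops: float, interconnect: str) -> float:
    """Rule-of-thumb HPL estimate in GFLOPS for 'GigE' or 'IB'."""
    return peak_gflops * RULE_OF_THUMB[interconnect]


def hpl_bounds(peak_gflops: float, interconnect: str) -> Tuple[float, float]:
    """Span of HPL results implied by the observed efficiency range."""
    low, high = OBSERVED_RANGE[interconnect]
    return peak_gflops * low, peak_gflops * high


if __name__ == "__main__":
    peak = 5120.0  # hypothetical theoretical peak in GFLOPS
    for net in ("GigE", "IB"):
        low, high = hpl_bounds(peak, net)
        print(f"{net}: ~{estimate_hpl(peak, net):.0f} GFLOPS "
              f"(observed range {low:.0f} to {high:.0f})")
```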
If you click on the comments link for this blog, it will take you to the blog where you can download a simple spreadsheet. The spreadsheet assumes 4 ops/clock and lets you input the clock speed of the core, the number of cores per socket, and the number of sockets per node. It tells you the theoretical performance in GFLOPS and TFLOPS (trillions of floating-point operations per second) and the estimated performance for GigE and IB. It's really simple, so feel free to change things around.
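If you would rather script the calculation than use the spreadsheet, here is a rough Python equivalent of what it is described as computing. It is not the spreadsheet itself. The name `spreadsheet_row`, the extra `nodes` input (the description lists only per-node inputs), and the example values are assumptions. The 4 ops/clock and the 50%/75% efficiencies come from the text.

```python
from dataclasses import dataclass

OPS_PER_CLOCK = 4          # the spreadsheet assumes 4 ops/clock
EFFICIENCY_GIGE = 0.50     # rule of thumb for Gigabit Ethernet
EFFICIENCY_IB = 0.75       # rule of thumb for InfiniBand


@dataclass
class Estimate:
    peak_gflops: float
    peak_tflops: float
    gige_gflops: float
    ib_gflops: float


def spreadsheet_row(clock_ghz: float, cores_per_socket: int,
                    sockets_per_node: int, nodes: int = 1) -> Estimate:
    """Hypothetical stand-in for the spreadsheet's calculation.

    `nodes` is an assumed extra input; leave it at 1 for per-node numbers.
    """
    peak = OPS_PER_CLOCK * clock_ghz * cores_per_socket * sockets_per_node * nodes
    return Estimate(
        peak_gflops=peak,
        peak_tflops=peak / 1000.0,   # 1 TFLOPS = 1000 GFLOPS
        gige_gflops=peak * EFFICIENCY_GIGE,
        ib_gflops=peak * EFFICIENCY_IB,
    )


if __name__ == "__main__":
    # Hypothetical cluster: 32 nodes, each with two quad-core 3.0 GHz sockets.
    print(spreadsheet_row(3.0, 4, 2, nodes=32))
```

Changing `OPS_PER_CLOCK` to 2, or adjusting the efficiency constants, is the scripted version of changing things around in the spreadsheet.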
But remember that Top500 performance does not equal performance on your particular code.

