High-performance computing (HPC) workloads have historically been the domain of some of the world’s fastest supercomputers. These are often large-scale, data-intensive workloads split into millions of operations that are run in parallel and use terabytes of data. These complex workloads are dedicated to solving some of humankind’s most challenging problems — weather and climate simulations; seismic modeling; chemical, physics and biological analysis; and more.
As computer architectures have advanced, these workloads have increasingly been hosted in very large “scale-out” clusters of high-performance servers. These clusters require the latest and greatest compute, fabric, memory and storage infrastructure to meet the scalability, low-latency and performance needs of such critical workloads. While server CPUs have improved in performance and throughput, the past several years have seen the bandwidth provided by DDR4 memory become a bottleneck. There is just not enough memory bandwidth to supply the growing number of high-performance cores.
Micron DDR5 memory and the new AMD Zen 4 server architecture in 4th Gen AMD EPYC processors change that. Now, server CPUs and memory can be in much better balance to unlock performance and efficiency for the most demanding workloads. DDR5 memory helps organizations reach those insights faster, whether on premises or in the cloud. Consider some of the following proof points generated while testing Micron DDR5 with the latest AMD Zen 4 96-core CPU with industry-standard HPC workload benchmarks. All of our test results showed a twofold performance improvement.
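To make the idea of CPU and memory "balance" concrete, here is a rough back-of-the-envelope sketch of theoretical peak bandwidth per core for the two test systems described in footnotes 2 and 3. The channel counts (8 for the EPYC 7763, 12 for the EPYC 9654) and the 8-byte data path per channel are assumptions that do not appear in this article, and the results are theoretical ceilings, not measured numbers.

```python
# Theoretical peak memory bandwidth per socket and per core.
# Assumptions (not from the article): 8 channels on EPYC 7763,
# 12 channels on EPYC 9654, 8 bytes moved per transfer per channel.

def peak_bandwidth_gbs(transfers_mt_s, channels, bytes_per_transfer=8):
    """Peak bandwidth in GB/s for a given transfer rate (MT/s) and channel count."""
    return transfers_mt_s * bytes_per_transfer * channels / 1000


SYSTEMS = {
    "EPYC 7763 + DDR4-3200": {"mt_s": 3200, "channels": 8, "cores": 64},
    "EPYC 9654 + DDR5-4800": {"mt_s": 4800, "channels": 12, "cores": 96},
}


def summarize(systems):
    """Return {name: (peak GB/s per socket, peak GB/s per core)}."""
    summary = {}
    for name, spec in systems.items():
        total = peak_bandwidth_gbs(spec["mt_s"], spec["channels"])
        summary[name] = (total, total / spec["cores"])
    return summary


if __name__ == "__main__":
    for name, (total, per_core) in summarize(SYSTEMS).items():
        print(f"{name}: {total:.1f} GB/s peak, {per_core:.2f} GB/s per core")
```

Under these assumptions, the DDR5 system has more theoretical bandwidth per core even though it has 50% more cores. Measured bandwidth, as reported by a benchmark such as STREAM, will be lower than these ceilings.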
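Two times the memory bandwidth with Micron DDR5 + 4th Gen AMD EPYC processors using STREAM
STREAM¹ is a simple, well-known benchmark used to measure memory bandwidth in HPC computers. It captures the peak memory bandwidth for HPC systems.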
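As a rough illustration of what STREAM measures, the sketch below implements its "triad" kernel and the bandwidth arithmetic in plain Python. This is not the benchmark used for the results in this article: the real STREAM is compiled C or Fortran, typically multithreaded, and a Python loop over lists will report figures far below hardware limits. The function names and the default array size are hypothetical choices made for this example.

```python
import time

BYTES_PER_DOUBLE = 8


def triad(a, b, c, scalar):
    """STREAM 'triad' kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_gbs(n, seconds):
    """Bandwidth in GB/s, counting two reads and one write of 8 bytes per element."""
    bytes_moved = 3 * BYTES_PER_DOUBLE * n
    return bytes_moved / seconds / 1e9


def run_triad(n=200_000, scalar=3.0, trials=3):
    """Time the kernel several times and report bandwidth from the best trial."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)
    return a, triad_bandwidth_gbs(n, best)


if __name__ == "__main__":
    _, gbs = run_triad()
    print(f"Triad (pure Python, illustrative only): {gbs:.3f} GB/s")
```

The point of the sketch is the accounting: each element of the triad touches three 8-byte values, so bandwidth is the bytes moved divided by the fastest observed time.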
Software stack used for this workload
Test results
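Weather Research and Forecasting (WRF)⁴ runs two times faster with Micron DDR5
This HPC workload code is used by the weather and climate community, and the model is widely applied in meteorology. WRF typically performs well on traditional HPC architectures that support high floating-point processing, high memory bandwidth and low-latency networks. For this effort, the continental United States (CONUS) at 2.5-km lateral resolution was selected.
Software stack used for this workload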
Test setup
Test results
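OpenFOAM⁵ runs two times faster with Micron DDR5
OpenFOAM is an open-source HPC workload for computational fluid dynamics (CFD), widely used across industries to reduce development time and cost. It simulates physical interactions in applications ranging from consumer product design to aerospace design. One of the simulations included in the data set features a motorbike turbulence simulation. For this model, OpenFOAM calculates the steady airflow around a motorbike and rider. OpenFOAM load balances calculations according to the number of processes specified by the user, then decomposes the mesh into parts to be solved. Once solving is complete, the mesh and solution are recombined into a single domain.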
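The decompose, solve and recombine flow described above can be sketched with a toy one-dimensional example. This is only an illustration of the pattern, not OpenFOAM's actual decomposition or solver: the function names are hypothetical, the "mesh" is a flat list of cell values, and the per-chunk work is a placeholder.

```python
def decompose(cells, n_procs):
    """Split cells into n_procs contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(cells), n_procs)
    chunks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        chunks.append(cells[start:start + size])
        start += size
    return chunks


def solve_chunk(chunk):
    """Placeholder for per-subdomain work (hypothetical, not a CFD solver)."""
    return [2.0 * value for value in chunk]


def reconstruct(solved_chunks):
    """Concatenate the per-process results back into a single domain."""
    return [value for chunk in solved_chunks for value in chunk]


if __name__ == "__main__":
    cells = [float(i) for i in range(10)]
    parts = decompose(cells, n_procs=4)
    print([len(p) for p in parts])
    print(reconstruct([solve_chunk(p) for p in parts]))
```

In a real run each chunk would be handled by a separate process, and neighboring chunks would exchange boundary data while solving, which is one reason such workloads are sensitive to memory bandwidth and interconnect latency.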
Software stack used for this workload
Test setup
Test results
Our tests demonstrated a 2.4x relative gain for OpenFOAM, which is regarded as one of the top five HPC software platforms and has a large open-source community. Widely used in universities and R&D centers, the software is highly parallel and takes advantage of both memory (increased bandwidth) and CPU features like denser cores.
Molecular dynamics⁶ runs two times faster with Micron DDR5
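The relative gain for OpenFOAM can be checked against the runtimes listed in footnote 5. The short sketch below divides the DDR4 runtime by the DDR5 runtime for each of the three variants. The variant labels are copied as they appear in the footnote, and the function name is a hypothetical one chosen for this example.

```python
# Runtimes in seconds from footnote 5: (DDR4 system, DDR5 system).
OPENFOAM_RUNTIMES = {
    "1004040": (1144, 478),
    "1084646": (1633, 698),
    "1305252": (2522, 1091),
}


def relative_gains(runtimes):
    """Return {variant: DDR4 runtime / DDR5 runtime}."""
    return {name: ddr4 / ddr5 for name, (ddr4, ddr5) in runtimes.items()}


if __name__ == "__main__":
    for name, gain in relative_gains(OPENFOAM_RUNTIMES).items():
        print(f"{name}: {gain:.2f}x")
```

All three variants come out between roughly 2.3x and 2.4x, consistent with the figure quoted above.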
CP2K is an open-source quantum chemistry tool that can be used for a number of applications, including simulations of solid-state and biological systems. CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane wave approaches GPW and GAPW. The example that we looked at was linear-scaling density functional theory (DFT) of water (H2O) consisting of 6144 atoms in a 39-cubic-angstrom box (2048 water molecules in total).
Software stack used for this workload
Test setup
Test results
Our tests demonstrated a 2.1x relative gain for molecular dynamics, which scales well with more cores and more memory bandwidth.
Summary
The results above are just a start, and they cover only a few examples of HPC workloads. The ability to better match high-performance, high-bandwidth memory with the incredible performance offered by new server processors such as the 4th Gen AMD EPYC processors will be a watershed for HPC customers. We can expect to see many more such proof points that demonstrate how enterprise data center and cloud operators can use Micron DDR5 on these new platforms to unlock new levels of performance and efficiency. We look forward to sharing these with you in the coming months. To learn more about Micron DDR5 and its benefits for data center workloads, visit Micron.com/ddr5.
1 Our STREAM benchmark was set up with a vector size of 250 million (STREAM Benchmark - AMD) and run on a 1-CPU system
2 AMD DDR4 system is an AMD EPYC 7763 64 core with DDR4-3200 MHz fully populated with 64GB RDIMMs
3 AMD DDR5 system is an AMD EPYC 9654 96 core with DDR5-4800 MHz fully populated with 64GB RDIMMs
4 WRF with a 12.5-km CONUS ran for 929 seconds on the DDR4 system and 287 seconds on the DDR5 system while counting storage I/O as well. The example above is from WRF 2.5-km CONUS that ran 2.8533 time steps per second and 1.3567 time steps per second.
5 For OpenFOAM, we ran three variants:
5a 1004040 runtimes = 1,144 seconds on DDR4 system and 478 seconds on the DDR5 system
5b 1084646 runtimes = 1,633 seconds on DDR4 system and 698 seconds on the DDR5 system
5c 1305252 runtimes = 2,522 seconds on DDR4 system and 1,091 seconds on the DDR5 system
6 Molecular dynamics workload ran for 2,519 seconds on the DDR4 system and 242 seconds on the DDR5 system