Memory

Boosting HPC workloads with Micron DDR5 and 4th Gen AMD EPYC Processors

By Krishna Yalamanchi, Sudharshan Vazhkudai - 2022-11-10
The goal of the AMD and Micron collaboration is to deliver best-in-class user experiences across client and data center platforms. To that end, the two companies run a joint server lab in Austin, where we work to reduce the time needed to validate server memory and perform joint workload testing throughout validation and launch. In this blog, we look at some common HPC workload benchmark results that use Micron DDR5 data center memory with 4th Gen AMD EPYC™ processors, both of which are now shipping.

High-performance computing (HPC) workloads have historically been the domain of some of the world’s fastest supercomputers. These are often large-scale, data-intensive workloads split into millions of operations that are run in parallel and use terabytes of data. These complex workloads are dedicated to solving some of humankind’s most challenging problems — weather and climate simulations; seismic modeling; chemical, physics and biological analysis; and more.

As computer architectures have advanced, these workloads have increasingly been hosted in very large “scale-out” clusters of high-performance servers. These clusters need the latest and greatest compute, fabric, memory and storage infrastructure to meet the scalability, low-latency and performance demands of such critical workloads. While server CPUs have improved in performance and throughput, the past several years have seen the bandwidth provided by DDR4 memory become a bottleneck. There is just not enough memory bandwidth to supply the growing number of high-performance cores.

Micron DDR5 memory and the new AMD Zen 4 server architecture in 4th Gen AMD EPYC processors change this. Now, server CPUs and memory can be in much better balance to unlock performance and efficiency for the most demanding workloads. DDR5 memory helps organizations reach those insights faster, whether on premises or in the cloud. Consider some of the following proof points generated while testing Micron DDR5 with the latest AMD Zen 4 96-core CPU on industry-standard HPC workload benchmarks. All of our test results showed a twofold performance improvement.

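Two times the memory bandwidth with Micron DDR5 + 4th Gen AMD EPYC processors using STREAM

STREAM¹ is a simple, well-known benchmark for measuring memory bandwidth in HPC computers. It captures peak memory bandwidth for HPC systems.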
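To make the measurement concrete, here is a minimal Python sketch of the byte accounting behind a STREAM-style triad kernel, a[i] = b[i] + q * c[i]. It is not the STREAM.f code used in our tests. The function names are hypothetical, and pure Python mostly measures interpreter overhead, so the number it prints is illustrative only and not comparable to real STREAM results.

```python
import time
from array import array


def triad_bytes_moved(n_elements: int, bytes_per_element: int = 8) -> int:
    """Bytes counted for one triad pass: read b, read c, write a."""
    return 3 * bytes_per_element * n_elements


def bandwidth_gb_per_s(bytes_moved: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time to GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def run_triad(n_elements: int, scalar: float = 3.0) -> float:
    """Time one triad pass over double-precision arrays and return GB/s."""
    b = array("d", [1.0]) * n_elements
    c = array("d", [2.0]) * n_elements
    a = array("d", [0.0]) * n_elements
    start = time.perf_counter()
    for i in range(n_elements):
        a[i] = b[i] + scalar * c[i]
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return bandwidth_gb_per_s(triad_bytes_moved(n_elements), elapsed)


if __name__ == "__main__":
    print(f"Illustrative triad rate: {run_triad(100_000):.3f} GB/s")
```

The point of the sketch is the accounting: the reported bandwidth is the number of bytes the kernel is assumed to move divided by the time it took, which is why the arrays must be far larger than the CPU caches for the result to reflect memory rather than cache speed.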

Software stack for this workload
  • Alma 9 Linux kernel 5.14
  • STREAM.f 11-29-2021 release

Test setup
  • DDR4 system: 3rd Gen AMD EPYC processor, 64 cores and 3.7 GHz; DDR4 3200 MHz system² fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor, 96 cores and 3.7 GHz; DDR5 4800 MHz system³ fully populated with 64GB RDIMMs

Test results
  • The single-socket DDR5 system delivered 378 GB/s, two times the memory bandwidth of the DDR4 system
  • This result means that customers can run larger artificial intelligence/machine learning (AI/ML) projects or do more HPC computations with increased memory bandwidth from DDR5.
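For context, the measured figure can be set against a theoretical peak. The sketch below is a back-of-the-envelope calculation, not part of our test harness. It assumes 8 memory channels per socket on the DDR4 system and 12 on the DDR5 system, 8 bytes per transfer per channel, and that the 3200 and 4800 speed grades are transfer rates in millions of transfers per second. None of these figures is stated in this post, so treat them as assumptions.

```python
def peak_bandwidth_gb_per_s(mega_transfers_per_s: int, channels: int,
                            bytes_per_transfer: int = 8) -> float:
    """Theoretical peak in GB/s: transfer rate x bytes per transfer x channels."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


# Assumed channel counts per socket (not stated in this post).
DDR4_CHANNELS = 8
DDR5_CHANNELS = 12

ddr4_peak = peak_bandwidth_gb_per_s(3200, DDR4_CHANNELS)
ddr5_peak = peak_bandwidth_gb_per_s(4800, DDR5_CHANNELS)

measured_ddr5 = 378.0  # GB/s, the STREAM result reported above
efficiency = measured_ddr5 / ddr5_peak

print(f"DDR4 theoretical peak: {ddr4_peak:.1f} GB/s")
print(f"DDR5 theoretical peak: {ddr5_peak:.1f} GB/s")
print(f"Measured DDR5 result as a share of peak: {efficiency:.0%}")
```

Under these assumptions, the 378 GB/s result is roughly four-fifths of the theoretical DDR5 peak, and the ratio of the two theoretical peaks is 2.25, which is consistent with the twofold gain measured by STREAM.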

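    Weather Research and Forecasting (WRF)⁴ runs two times faster with Micron DDR5

    This HPC workload code is used by the weather and climate community, and the model is widely applied in meteorology. WRF typically performs well on traditional HPC architectures that support high floating-point processing, high memory bandwidth and low-latency networks. For this effort, the continental United States (CONUS) at a 2.5-km horizontal resolution was chosen.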

    Software stack for this workload
  • Alma 9 Linux kernel 5.14
  • WRF 2.3.5 & 4.3.3
  • Open MPI v4.1.1

    Test setup
  • DDR4 system: 3rd Gen AMD EPYC processor, 64 cores and 3.7 GHz; DDR4 3200 MHz system² fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor, 96 cores and 3.7 GHz; DDR5 4800 MHz system³ fully populated with 64GB RDIMMs

    Test results
  • We were able to execute 1.3567 time steps per second using Micron DDR5 and 4th Gen AMD EPYC processors compared with 2.8533 time steps per second.
  • Faster execution time means weather forecasters can either choose bigger datasets or run more models. Both efforts improve forecasts.
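The relative gain behind the WRF headline can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted in this post and its footnotes. Reading the 2.5-km figures as time per step (lower is better) is an assumption made here so that the ratio lines up with the "two times faster" claim.

```python
def relative_gain(ddr4_value: float, ddr5_value: float) -> float:
    """Ratio of the DDR4 figure to the DDR5 figure (lower-is-better metrics)."""
    if ddr5_value <= 0:
        raise ValueError("ddr5_value must be positive")
    return ddr4_value / ddr5_value


# 2.5-km CONUS figures quoted in the test results.
gain_2_5_km = relative_gain(2.8533, 1.3567)

# 12.5-km CONUS runtimes in seconds, from the footnotes (includes storage I/O).
gain_12_5_km = relative_gain(929, 287)

print(f"2.5-km CONUS ratio:  {gain_2_5_km:.2f}x")
print(f"12.5-km CONUS ratio: {gain_12_5_km:.2f}x")
```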

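    OpenFOAM⁵ runs two times faster with Micron DDR5

    OpenFOAM is an open-source HPC workload for computational fluid dynamics (CFD), widely used across industries to reduce development time and cost. It simulates physical interactions in applications ranging from consumer product design to aerospace design. One of the cases included in the data set is a motorbike turbulence simulation. For this model, OpenFOAM calculates the steady airflow around a motorcycle and rider. OpenFOAM load balances calculations according to the number of processes specified by the user, then decomposes the mesh into parts to be solved. After solving finishes, the mesh and solution are recombined into a single domain.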
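The decompose, solve and recombine flow described above can be illustrated with a toy Python sketch. It is not how OpenFOAM partitions a mesh, since its real decomposition methods account for geometry and communication cost. The function names, the stand-in solver and the choice of 96 processes (one per core of the DDR5 system) are all assumptions made for illustration.

```python
def decompose(n_cells: int, n_procs: int) -> list:
    """Split cell indices into contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(n_cells, n_procs)
    parts = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def solve_part(cells) -> list:
    """Hypothetical stand-in for the per-process solver."""
    return [0.5 * cell for cell in cells]


def reconstruct(partial_solutions) -> list:
    """Concatenate per-process results back into a single domain."""
    merged = []
    for part in partial_solutions:
        merged.extend(part)
    return merged


if __name__ == "__main__":
    # Cell count for a 600 x 240 x 240 mesh split across an assumed 96 processes.
    parts = decompose(600 * 240 * 240, 96)
    print(f"{len(parts)} parts, {len(parts[0])} cells in the first part")
```

Each process works on its own slice of the mesh, which is why the workload benefits from both more cores and more memory bandwidth: every core streams through its share of the cells at the same time.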

    Software stack for this workload
  • OpenFOAM CFD software (v8) with a motorbike mesh size of 600 x 240 x 240
  • Alma 9 Linux kernel 5.14
  • Open MPI v4.1.1

    Test setup
  • DDR4 system: 3rd Gen AMD EPYC processor, 64 cores and 3.7 GHz; DDR4 3200 MHz system² fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor, 96 cores and 3.7 GHz; DDR5 4800 MHz system³ fully populated with 64GB RDIMMs

    Test results
    Our tests demonstrated a 2.4x relative gain for OpenFOAM, which is regarded as among the top five HPC software platforms and has a large open-source community. Widely used in academia and R&D centers, the software is highly parallel and takes advantage of both memory (increased bandwidth) and CPU features like denser cores.
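The per-variant gains can be recomputed from the runtimes listed in the OpenFOAM footnote. The sketch below uses only those published runtimes. The variant labels are copied as they appear in the footnote, and the dictionary layout is simply a convenient assumption.

```python
# Runtimes in seconds from the OpenFOAM footnote: (DDR4 system, DDR5 system).
RUNTIMES_SECONDS = {
    "1004040": (1144, 478),
    "1084646": (1633, 698),
    "1305252": (2522, 1091),
}


def speedups(runtimes: dict) -> dict:
    """Return the DDR4 runtime divided by the DDR5 runtime for each variant."""
    return {name: ddr4 / ddr5 for name, (ddr4, ddr5) in runtimes.items()}


results = speedups(RUNTIMES_SECONDS)
for name, gain in results.items():
    print(f"{name}: {gain:.2f}x")
```

All three variants land between 2.3x and 2.4x, with the smallest case showing the largest gain.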

    Molecular dynamics⁶ runs two times faster with Micron DDR5

    CP2K is an open-source quantum chemistry tool that can be used for a number of applications, including simulations of solid-state and biological systems. CP2K provides a general framework for different modeling methods such as density functional theory (DFT) using the mixed Gaussian and plane wave approaches GPW and GAPW. The example that we looked at was a linear-scaling DFT calculation of water (H2O) consisting of 6,144 atoms in a 39-cubic-angstrom box (2,048 water molecules in total).

    Software stack for this workload
  • H2O-DFT-LS.NREP4 & H2O-DFT-LS
  • Alma 9 Linux kernel 5.14

    Test setup
  • DDR4 system: 3rd Gen AMD EPYC processor, 64 cores and 3.7 GHz; DDR4 3200 MHz system² fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC processor, 96 cores and 3.7 GHz; DDR5 4800 MHz system³ fully populated with 64GB RDIMMs

    Test results
    Our tests demonstrated a 2.1x relative gain for molecular dynamics, which scales well with more cores and more memory bandwidth.

    Summary

    The results above are just a start, and only a few examples of HPC workloads. The ability to better match high-performance, high-bandwidth memory with the incredible performance offered by new server processors such as the 4th Gen AMD EPYC processors will be a watershed for HPC customers. We can expect to see many more such proof points that demonstrate how enterprise data center and cloud operators can use Micron DDR5 on these new platforms to unlock new levels of performance and efficiency. We look forward to sharing these with you in the coming months. To learn more about Micron DDR5 and its benefits for data center workloads, visit Micron.com/ddr5.

    1 Our STREAM benchmark was set to a vector size of 250 million; STREAM Benchmark - AMD, run on a 1-CPU system
    2 AMD DDR4 system is an AMD EPYC 7763 64 core with DDR4-3200 MHz fully populated with 64GB RDIMMs
    3 AMD DDR5 system is an AMD EPYC 9654 96 core with DDR5-4800 MHz fully populated with 64GB RDIMMs
    4 WRF with a 12.5-km CONUS ran for 929 seconds on the DDR4 system and 287 seconds on the DDR5 system while counting storage I/O as well. The example above is from WRF 2.5-km CONUS that ran 2.8533 time steps per second and 1.3567 time steps per second.
    5 For OpenFOAM, we ran three variants:
    5a 1004040 runtimes = 1,144 seconds on DDR4 system and 478 seconds on the DDR5 system
    5b 1084646 runtimes = 1,633 seconds on DDR4 system and 698 seconds on the DDR5 system
    5c 1305252 runtimes = 2,522 seconds on DDR4 system and 1,091 seconds on the DDR5 system
    6 Molecular dynamics workload ran for 2,519 seconds on the DDR4 system, 242 seconds on the DDR5 system
    Krishna Yalamanchi

    Krishna is a Senior Manager in the Compute and Networking Business Unit at Micron and is responsible for launching products into the market. He launched DDR5 for the data center in Oct '22 and announced our HBM and CXL products to the market earlier this year. Previously at Intel, Krishna launched 3rd and 4th generation Intel Xeon for SAP workloads via their partner ecosystem for Global System Integrators, OEMs and cloud service providers.

    Sudharshan Vazhkudai

    Dr. Sudharshan S. Vazhkudai is Director of System Architecture/Workload Analysis at Micron. He leads a team focused on understanding the composability of the memory/storage (DDR, CXL, HBM and NVMe) product hierarchy and optimizing system architectures for data center workloads. Before joining Micron, he spent 20 years at the U.S. Department of Energy national lab complex as a Director and Distinguished Scientist (mostly at Oak Ridge National Lab and also at Argonne National Lab), building some of the world's fastest supercomputers and storage systems, and systems software solutions. Sudharshan holds a Ph.D. in Computer Science, has published over 100 peer-reviewed papers and has served as a faculty member at the University of Tennessee.