CP2K_Benchmark Performance Benchmark Dataset

Date

12 days ago

Organization

ETH Zurich
University of Basel
University of Zurich

Publish URL

github.com

Paper URL

doi.org

License

GPL

The CP2K Benchmark dataset is a set of performance testing and validation inputs specifically designed for high-performance computing (HPC) environments. This dataset, derived from the open-source first-principles simulation software CP2K, is used to evaluate the performance of quantum chemistry and molecular dynamics calculations under different hardware platforms, parallelization strategies (MPI/OpenMP), and compilation optimization settings.

Unlike scientific experimental data, this dataset is used mainly for HPC performance evaluation (run time, parallel efficiency, scalability, and so on). It is maintained by the official CP2K team. The data and input files are located in the benchmarks/ directory of the source code.

The associated paper is "CP2K: atomistic simulations of condensed matter systems", published in 2013 by researchers at ETH Zurich, the University of Zurich, the University of Basel, and other institutions.

Dataset structure

The dataset is located in the benchmarks/ directory of the CP2K source code repository, which contains several subdirectories and input files:

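benchmarks/
 ├── Fayalite-FIST/        # FIST module benchmark, fayalite system
 ├── QS/                   # Quickstep module benchmarks (DFT/MD)
 ├── QS_DM_LS/             # Linear-scaling DFT benchmark
 ├── QS_HFX/               # Hybrid-functional HFX benchmark
 ├── QS_diag/              # Diagonalization algorithm benchmark
 ├── QS_mp2_rpa/           # MP2 / RPA high-accuracy benchmark
 ├── QS_ot_ls/             # Orbital-transformation linear-scaling DFT benchmark
 ├── QS_pao_ml_tio2/       # PAO / ML potential TiO₂ benchmark
 ├── QS_rubri/             # Rubredoxin protein benchmark
 ├── QS_single_node/       # Single-node Quickstep benchmarks
 ├── QS_stmv/              # STMV virus system benchmark
 └── README.md             # Documentation for the benchmarks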

Each subdirectory contains:

  • Structure files (such as .xyz, .psf, .pdb)
  • CP2K input scripts (.inp)
  • Reference energy results and output templates
  • Scripts for running the benchmark and recording performance
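To get an overview of which input scripts each benchmark provides, the directory layout above can be walked programmatically. The following is a minimal sketch, not part of CP2K: the function name `list_benchmark_inputs` is hypothetical, and it assumes only that input scripts carry the .inp extension and sit somewhere below a top-level benchmark subdirectory.

```python
from collections import defaultdict
from pathlib import Path


def list_benchmark_inputs(benchmarks_dir):
    """Map each top-level benchmark subdirectory to its .inp files.

    Hypothetical helper for illustration; paths in the result are given
    relative to benchmarks_dir, using forward slashes.
    """
    root = Path(benchmarks_dir)
    inputs = defaultdict(list)
    for inp in sorted(root.rglob("*.inp")):
        relative = inp.relative_to(root)
        if len(relative.parts) < 2:
            continue  # ignore .inp files lying directly in benchmarks/
        inputs[relative.parts[0]].append(relative.as_posix())
    return dict(inputs)
```

Calling `list_benchmark_inputs("cp2k/benchmarks")` on a local checkout (the path is an assumption about where the repository was cloned) would return one entry per benchmark directory that contains at least one input script.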

Dataset Contents

The dataset contains multiple standardized benchmark problems, each of which represents a typical scientific computing scenario, covering different calculation types from classical molecular dynamics to ab initio electronic structure.

  1. H₂O-64
  • Type: DFT + Molecular Dynamics
  • System: 64 water molecules (192 atoms/512 electrons)
  • Purpose: To test the performance and scalability of standard quantum chemistry simulations on medium-sized systems.
  2. Fayalite-FIST
  • Type: Classical Molecular Dynamics (FIST module)
  • System: Fayalite crystal (Fe₂SiO₄ supercell, approximately 28,000 atoms)
  • Purpose: To evaluate the performance of classical force field calculations and long-range electrostatic summation algorithms in large systems.
  3. LiH-HFX
  • Type: Hybrid DFT (HFX single-point calculation)
  • System: LiH crystal (216 atoms/432 electrons)
  • Purpose: To test the load and communication efficiency of hybrid exchange functionals under multi-threaded parallelism.
  4. H₂O-DFT-LS
  • Type: Linear Scaling DFT
  • System: 2048 water molecules (6144 atoms)
  • Purpose: To evaluate the scalability and memory requirements of linear scaling algorithms under massive parallelism.
  5. H₂O-64-RI-MP2
  • Type: MP2 method + RI approximation
  • System: Same as H₂O-64
  • Purpose: To test the performance and computational overhead of high-level ab initio methods on HPC systems.

Running Benchmarks

Before running some benchmarks, you may need to complete a preprocessing step to generate the required input files (such as wave function files).
These preparations are detailed in the README.md file in the corresponding benchmark subdirectory. It is recommended to read the relevant instructions before starting the test.

The benchmarks in this dataset are typically run in hybrid parallel mode, using MPI and OpenMP together. The following example assigns two OpenMP threads to each MPI process:

export OMP_NUM_THREADS=2
parallel_launcher launcher_options path_to_cp2k.psmp -i inputfile.inp -o logfile.log

where:

  • parallel_launcher is mpirun, mpiexec, or a variant such as aprun on Cray systems, or srun when using Slurm job scheduling.
  • launcher_options specifies the parallel layout: the number of nodes, MPI processes, tasks per node, and threads per task (the last should equal the value of OMP_NUM_THREADS). If the job environment sets this configuration automatically, you do not need to specify it manually.
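The relationship between nodes, MPI processes, and threads per process can be made concrete with a little arithmetic. The sketch below is illustrative only: `build_launch_command` and the `cores_per_node` parameter are hypothetical, and it assumes an `mpirun` that accepts `-np` for the total number of MPI processes. Other launchers such as srun or aprun take different options, so check the documentation of the one on your system.

```python
import shlex


def build_launch_command(nodes, cores_per_node, omp_threads,
                         cp2k_binary="path_to_cp2k.psmp",
                         input_file="inputfile.inp",
                         log_file="logfile.log",
                         launcher="mpirun"):
    """Return (environment, argv) for a hybrid MPI/OpenMP run.

    Hypothetical helper: fills every core with exactly one OpenMP thread,
    so MPI ranks per node = cores_per_node // omp_threads.
    """
    if omp_threads < 1 or cores_per_node % omp_threads != 0:
        raise ValueError("omp_threads must evenly divide cores_per_node")
    ranks_per_node = cores_per_node // omp_threads
    total_ranks = nodes * ranks_per_node
    # Threads per MPI process must match OMP_NUM_THREADS.
    env = {"OMP_NUM_THREADS": str(omp_threads)}
    # '-np' is an assumption about the launcher; adapt for srun, aprun, etc.
    argv = [launcher, "-np", str(total_ranks),
            cp2k_binary, "-i", input_file, "-o", log_file]
    return env, argv


env, argv = build_launch_command(nodes=2, cores_per_node=8, omp_threads=2)
print(env, shlex.join(argv))
```

With two nodes of eight cores each and two threads per process, this yields four MPI processes per node and eight in total.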

Get benchmark results

After running a benchmark, you can read the actual run time (walltime) of CP2K from the internal timing information written to the log file.
All benchmarks in the dataset record this timing information in their log files, which makes it easy to compare performance across hardware or parallel configurations.
For example:

grep "CP2K     "  *.log
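When many log files need to be compared, the same lookup can be scripted. The sketch below is an assumption-laden illustration rather than a CP2K tool: the function name `extract_walltime` is hypothetical, and it assumes that the line matched by the grep command starts with the token `CP2K` and ends with the maximum total time in seconds, as in the TIMING table. Verify this against your own log files before relying on it.

```python
def extract_walltime(log_text):
    """Return the walltime in seconds from CP2K log text, or None.

    Assumed layout: a summary line whose first whitespace-separated field
    is exactly 'CP2K' and whose last field is the total time. Lines such
    as 'CP2K| ...' have a different first field and are skipped.
    """
    walltime = None
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "CP2K":
            try:
                walltime = float(fields[-1])  # keep the last match
            except ValueError:
                continue  # last field was not a number; not the timing line
    return walltime
```

A typical use would be `extract_walltime(open("logfile.log").read())`, with `logfile.log` being the file passed to `-o` in the launch command.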

Additionally, several performance statistics are included at the end of the log file:

  • DBCSR STATISTICS: Displays the computational and communication performance statistics of the DBCSR module. The first few rows give the number of floating-point operations (FLOPs) for different small matrix block sizes, and how these computations are distributed across BLAS, SMM (small matrix multiplication), and GPU (ACC).
  • DBCSR MESSAGE PASSING PERFORMANCE: Displays MPI call performance statistics in the DBCSR module.
  • MESSAGE PASSING PERFORMANCE: Displays MPI communication performance statistics for other parts of CP2K.
  • TIMING: Lists the time spent in each function and the number of times it was called.

Plotting

The project also provides Python scripts that generate scaling plots from benchmark results. They can be found in the following directory:

cp2k/tools/benchmark_plots/
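A scaling plot usually shows speedup and parallel efficiency against the amount of hardware used. The sketch below shows how these two quantities can be derived from measured walltimes. It is not the project's plotting script: the function name `scaling_table` and the input format are assumptions made for illustration, and it uses the standard strong-scaling definitions with the smallest run as the baseline.

```python
def scaling_table(walltimes):
    """Compute strong-scaling metrics from measured walltimes.

    walltimes: dict mapping a resource count (for example, number of
    nodes) to the walltime in seconds measured at that count.
    Returns a list of (count, walltime, speedup, efficiency) tuples,
    sorted by count, relative to the smallest count in the input.
    """
    if not walltimes:
        raise ValueError("at least one measurement is required")
    base_count = min(walltimes)
    base_time = walltimes[base_count]
    rows = []
    for count in sorted(walltimes):
        speedup = base_time / walltimes[count]
        # Ideal speedup equals the ratio of resources to the baseline.
        efficiency = speedup / (count / base_count)
        rows.append((count, walltimes[count], speedup, efficiency))
    return rows
```

An efficiency close to 1 means the benchmark scales almost ideally over the tested range, while a falling efficiency indicates growing communication or load-imbalance overhead.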
