Numerical Methods, Solvers & HPC
This section addresses: how to discretize physical equations into computable forms, how to design solvers for stable and efficient convergence, and how to leverage multi-core, multi-node parallelism for large-scale cases.
1. Discretization Methods
FVM, FEM, and DG suit different classes of problems. The choice depends on the physics of the problem and on how the resulting discrete system affects solver complexity and parallel scalability.
- 1.1 Finite Volume Method (FVM): integral form over control volumes, conservative by construction; well suited to convection-dominated CFD problems. Building blocks: reconstruction (linear, WENO), flux computation (Roe, AUSM+, HLLC), limiters.
- 1.2 Finite Element Method (FEM): based on a variational (weak) formulation; suited to elliptic and parabolic problems and widely used in structural mechanics. In CFD it is more common for incompressible flow and heat transfer.
- 1.3 Discontinuous Galerkin (DG): combines the local conservation of FVM with the high-order accuracy of FEM; suited to high-accuracy requirements and multiphysics coupling. The cost per degree of freedom is higher, which pays off mainly when high order is actually needed.
- 1.4 Grid Types & Adaptivity: structured and unstructured grids, polyhedral meshes, adaptive refinement; how the grid type affects discretization and solution.
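The FVM ingredients listed in 1.1 (reconstruction, flux, limiter) can be illustrated on the simplest possible case. The sketch below is an assumption-laden toy, not a production scheme: 1D linear advection with positive speed, a periodic domain, minmod-limited linear reconstruction, an upwind flux, and forward Euler in time. The names `minmod` and `fvm_advect_step` are hypothetical.

```python
def minmod(a, b):
    """Return the smaller-magnitude slope if the signs agree, else zero."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def fvm_advect_step(u, speed, dx, dt):
    """Advance cell averages u by one forward-Euler step of u_t + speed*u_x = 0.

    Assumes speed > 0 and a periodic domain.
    """
    n = len(u)
    # Limited slope in every cell (piecewise-linear reconstruction).
    slope = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i]) for i in range(n)]
    # Upwind flux at the right face of cell i: speed > 0, so use the left state.
    flux = [speed * (u[i] + 0.5 * slope[i]) for i in range(n)]
    # Conservative update: each face flux leaves one cell and enters its neighbor.
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


# Usage: advect a square pulse at CFL number 0.4.
n, speed = 50, 1.0
dx = 1.0 / n
dt = 0.4 * dx / speed
u0 = [1.0 if 10 <= i < 20 else 0.0 for i in range(n)]
u = list(u0)
for _ in range(100):
    u = fvm_advect_step(u, speed, dx, dt)
print("mass before:", sum(u0) * dx, "mass after:", sum(u) * dx)
```

Because every face flux is subtracted from one cell and added to the next, the total of the cell averages is preserved up to rounding. The limiter keeps the solution within its initial bounds at this CFL number.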
2. Time Integration
- 2.1 Explicit Methods: multi-stage Runge-Kutta schemes, timestep constraints (CFL condition), stability analysis. Suited to convection-dominated problems with short timescales.
- 2.2 Implicit Methods: backward Euler, Crank-Nicolson, implicit Runge-Kutta. Larger timesteps are possible, but each step requires solving a linear or nonlinear system.
- 2.3 Dual Time Stepping: each implicit physical timestep is converged by marching in pseudo-time, often with an explicit scheme; balances stability and efficiency. Widely used in compressible flow solvers.
- 2.4 Timestep Control: adaptive timesteps, local timesteps, multi-timescale problems (e.g., fast and slow reactions in combustion).
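The explicit/implicit trade-off above can be seen on the scalar test equation y' = -lam * y. The sketch below is a minimal illustration under that assumption; `forward_euler`, `backward_euler`, and `cfl_timestep` are hypothetical helper names, and the wave speed in the usage lines is an arbitrary example value.

```python
def forward_euler(lam, dt, steps, y0=1.0):
    """Explicit update for y' = -lam*y; stable only if lam*dt < 2."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def backward_euler(lam, dt, steps, y0=1.0):
    """Implicit update: solve y_new = y + dt*(-lam*y_new) for y_new."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y


def cfl_timestep(cfl, dx, max_wave_speed):
    """Largest timestep allowed by a given CFL number on cell size dx."""
    return cfl * dx / max_wave_speed


# Usage: a stiff decay rate with a timestep beyond the explicit limit.
lam, dt, steps = 1000.0, 0.01, 20
print("forward Euler :", forward_euler(lam, dt, steps))   # grows in magnitude
print("backward Euler:", backward_euler(lam, dt, steps))  # decays toward zero
print("CFL-limited dt:", cfl_timestep(0.5, 0.01, 340.0))
```

For this linear problem the implicit solve is a single division. In a real solver it becomes the sparse linear or nonlinear system that the following sections deal with.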
3. Linear Solvers
At the core of implicit methods is the solution of sparse linear systems. This step is usually the performance bottleneck and the main target for parallelization.
- 3.1 Krylov Subspace Methods
  - CG (Conjugate Gradient): symmetric positive definite systems, common in structural mechanics.
  - GMRES / BiCGStab: non-symmetric systems, typical for CFD Jacobian matrices.
  - Restart strategies, convergence criteria, residual monitoring.
- 3.2 Preconditioning
  - Classical: Jacobi, Gauss-Seidel, ILU.
  - Multigrid: V-cycle, W-cycle; highly effective in CFD.
  - Sparse approximate inverses (SPAI).
  - Parallelization: how to parallelize a preconditioner without sacrificing its effectiveness.
- 3.3 Direct vs. Iterative: when a direct factorization (LU, Cholesky) is appropriate and when iterative methods become necessary; trade-offs for large sparse systems.
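As an illustration of a Krylov method combined with a preconditioner, the following sketch implements preconditioned CG in pure Python and applies it to the 1D Poisson matrix tridiag(-1, 2, -1). The matrix, the Jacobi preconditioner, and the names `apply_poisson_1d`, `jacobi`, and `pcg` are all assumptions chosen for brevity. A real solver would use a sparse-matrix library and a stronger preconditioner such as ILU or multigrid.

```python
def apply_poisson_1d(x):
    """Matrix-free product with the SPD matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def jacobi(r):
    """Jacobi preconditioner: divide by the diagonal, which is 2 here."""
    return [ri / 2.0 for ri in r]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(apply_a, b, apply_minv, tol=1e-10, max_iter=1000):
    """Preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A*x for the initial guess x = 0
    z = apply_minv(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(max_iter):
        # Convergence criterion: relative residual norm.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = apply_minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: manufacture a right-hand side from a known solution.
x_true = [float(i + 1) for i in range(20)]
b = apply_poisson_1d(x_true)
x, iters = pcg(apply_poisson_1d, b, jacobi)
print("iterations:", iters)
print("max error :", max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

With a constant diagonal, Jacobi only rescales the system, so it does not improve the iteration count here. It is included only to show where the preconditioner enters the algorithm.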
4. Nonlinear Solvers
- 4.1 Newton-Raphson: linearization of nonlinear systems, Jacobian construction (analytical vs. numerical), convergence criteria.
- 4.2 Quasi-Newton: BFGS, Broyden. These avoid explicit Jacobian construction and are suitable when computing the Jacobian is expensive.
- 4.3 Nonlinear Preconditioning & Damping: line search and trust-region methods for improving the robustness of Newton's method.
- 4.4 Fully Coupled vs. Segregated: pressure-velocity coupling (SIMPLE, PISO, Coupled), multiphysics coupling strategies.
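The effect of damping on Newton's method can be shown on a scalar problem. The sketch below uses a simple backtracking line search that halves the step until the residual magnitude decreases. The function name `damped_newton` and the test function arctan(x) are assumptions for illustration. For arctan, the undamped iteration diverges when the starting point has magnitude above roughly 1.39.

```python
import math


def damped_newton(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson with backtracking; returns (root estimate, iterations)."""
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            return x, k
        step = -fx / dfdx(x)  # full Newton step from the linearization
        t = 1.0
        # Backtracking: halve the step until |f| actually decreases.
        while abs(f(x + t * step)) >= abs(fx) and t > 1e-10:
            t *= 0.5
        x = x + t * step
    return x, max_iter


# Usage: start outside the basin where the full Newton step converges.
root, iters = damped_newton(math.atan, lambda x: 1.0 / (1.0 + x * x), 3.0)
print("root:", root, "iterations:", iters)
```

For systems of equations the same structure applies, with the division replaced by a linear solve against the Jacobian and the absolute value replaced by a residual norm.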
5. HPC & Parallel Computing
From single-core to multi-core, single-node to clusters: how to make algorithms fully utilize hardware resources.
- 5.1 Parallel Models
  - OpenMP: shared memory, suitable for single-node multi-core.
  - MPI: distributed memory, suitable for multi-node clusters.
  - Hybrid: MPI + OpenMP, common on supercomputers.
  - GPU: CUDA / OpenCL, suitable for data-parallel operators.
- 5.2 Domain Decomposition & Load Balancing
  - METIS, ParMETIS for partitioning.
  - Static vs dynamic load balancing for non-uniform computational loads.
  - Minimizing inter-process communication, improving parallel efficiency.
- 5.3 Performance Optimization
  - Cache-friendly: data locality, memory access pattern optimization.
  - Vectorization: SIMD instructions, compiler optimization.
  - Profiling: gprof, perf, Intel VTune.
  - Bottleneck identification: computation vs communication vs I/O.
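The load-balancing and efficiency ideas above can be made concrete with a few lines of bookkeeping. The sketch below is a toy stand-in: it uses a contiguous 1D block partition rather than a graph partitioner such as METIS, and the names `block_partition`, `load_imbalance`, and `parallel_efficiency` are hypothetical. The timings in the usage lines are made-up example inputs, not measurements.

```python
def block_partition(n_cells, n_ranks):
    """Split n_cells into contiguous [start, end) ranges, sizes differing by at most 1."""
    base, extra = divmod(n_cells, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more
        ranges.append((start, start + size))
        start += size
    return ranges


def load_imbalance(loads):
    """Ratio of the heaviest rank's load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


def parallel_efficiency(t_serial, t_parallel, n_ranks):
    """Strong-scaling efficiency: speedup divided by the number of ranks."""
    return t_serial / (n_ranks * t_parallel)


# Usage with illustrative numbers.
parts = block_partition(10, 3)
loads = [end - start for start, end in parts]
print("ranges    :", parts)
print("imbalance :", load_imbalance(loads))
print("efficiency:", parallel_efficiency(100.0, 30.0, 4))
```

In a distributed solver the slowest rank sets the step time, so the imbalance ratio bounds the achievable efficiency before communication costs are even considered.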
6. My Practice
Currently focused on:
- Implicit scheme design and implementation in compressible CFD solvers.
- Krylov solvers and multigrid preconditioners for sparse linear systems.
- Load balancing and communication optimization in MPI parallel contexts.
- Matrix operator acceleration and performance tuning on custom chips.
Specific algorithm implementations, performance analysis, and optimization experience will be detailed in projects and technical notes.