Numerical Methods, Solvers & HPC

This section addresses: how to discretize physical equations into computable forms, how to design solvers for stable and efficient convergence, and how to leverage multi-core, multi-node parallelism for large-scale cases.

1. Discretization Methods

The main frameworks (FVM / FEM / DG) suit different classes of problems. The choice must weigh the characteristics of the physical problem against the complexity and parallel-friendliness of the solver that follows from it.

1.1 Finite Volume Method (FVM)
Integral form over control volumes, conservative by construction; well suited to convection-dominated CFD problems. Building blocks: reconstruction (linear, WENO), numerical flux computation (Roe, AUSM+, HLLC), and limiters.
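The three building blocks named above (reconstruction, flux, limiter) can be sketched for the simplest case, 1D linear advection with positive speed on a periodic grid. This is a minimal illustration, not a production scheme: the function names, the square-pulse initial condition, and the choice of the minmod limiter are assumptions made for the example.

```python
def minmod(a, b):
    """Minmod limiter: zero at local extrema, else the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def fvm_advect_step(u, courant):
    """One conservative update of u_t + a*u_x = 0 (a > 0) on a periodic 1D grid.

    courant = a*dt/dx, assumed to lie in (0, 1].
    """
    n = len(u)
    # Reconstruction: limited cell-wise difference (slope times dx).
    slope = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i]) for i in range(n)]
    # Upwind flux (divided by a) through the right face of cell i.
    face = [u[i] + 0.5 * (1.0 - courant) * slope[i] for i in range(n)]
    # Flux difference: what leaves cell i-1 through a face enters cell i.
    return [u[i] - courant * (face[i] - face[i - 1]) for i in range(n)]


def run_advection(n_cells=50, courant=0.5, n_steps=40):
    """Advect a square pulse (hypothetical test case) and return the final state."""
    u = [1.0 if 10 <= i < 20 else 0.0 for i in range(n_cells)]
    for _ in range(n_steps):
        u = fvm_advect_step(u, courant)
    return u
```

Because every face flux is subtracted from one cell and added to its neighbor, the sum of the cell values is preserved up to round-off, which is the discrete form of conservation.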
1.2 Finite Element Method (FEM)
Based on a variational (weak) formulation; well suited to elliptic and parabolic problems and widely used in structural mechanics. In CFD, more common for incompressible flow and heat transfer.
1.3 Discontinuous Galerkin (DG)
Combines the local conservation of FVM with the high-order accuracy of FEM; suited to high-accuracy needs and multiphysics coupling. Computational cost is higher, but the method pays off where high-order accuracy on unstructured meshes is required.
1.4 Grid Types & Adaptivity
Structured and unstructured grids, polyhedral meshes, adaptive refinement; how the grid type affects discretization and solving.

2. Time Integration

2.1 Explicit Methods
Multi-stage Runge-Kutta schemes, timestep constraints (CFL condition), stability analysis. Suited to convection-dominated problems with short timescales.
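The CFL constraint can be made concrete with first-order upwind advection advanced by forward Euler (the one-stage member of the Runge-Kutta family). The sketch below is illustrative only; the function names, grid size, and pulse shape are assumptions.

```python
def cfl_timestep(dx, max_wave_speed, cfl=0.8):
    """Largest timestep allowed by dt <= cfl * dx / max|wave speed|."""
    return cfl * dx / max_wave_speed


def upwind_max_amplitude(courant, n_cells=50, n_steps=200):
    """Advect a square pulse with first-order upwind and forward Euler.

    courant = a*dt/dx with a > 0 on a periodic grid; returns max |u| at the end.
    """
    u = [1.0 if 10 <= i < 21 else 0.0 for i in range(n_cells)]
    for _ in range(n_steps):
        # New list is built entirely from the old values of u.
        u = [u[i] - courant * (u[i] - u[i - 1]) for i in range(n_cells)]
    return max(abs(v) for v in u)
```

With a Courant number of at most 1 each update is a convex combination of neighboring values, so the amplitude stays bounded by the initial maximum; above 1 every non-constant Fourier mode is amplified and the solution grows without bound.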
2.2 Implicit Methods
Backward Euler, Crank-Nicolson, implicit Runge-Kutta. Larger timesteps are possible, but each step requires solving a linear or nonlinear system.
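A stiff scalar test equation, y' = -lam*y, shows the difference in allowable timestep. This is a sketch under simple assumptions: for a linear scalar problem the "system solve" of an implicit step reduces to a single division, and the values of lam and dt are chosen only to make the contrast visible.

```python
def integrate_decay(method, lam=1000.0, dt=0.01, n_steps=10, y0=1.0):
    """Integrate y' = -lam*y with a fixed timestep and return y after n_steps."""
    z = lam * dt
    if method == "forward_euler":
        factor = 1.0 - z                            # explicit: stable only for z <= 2
    elif method == "backward_euler":
        factor = 1.0 / (1.0 + z)                    # implicit: solves (1 + z)*y_new = y_old
    elif method == "crank_nicolson":
        factor = (1.0 - 0.5 * z) / (1.0 + 0.5 * z)  # implicit, second-order
    else:
        raise ValueError(f"unknown method: {method}")
    y = y0
    for _ in range(n_steps):
        y *= factor
    return y
```

With lam*dt = 10 the per-step amplification factors are -9 (forward Euler, diverging), 1/11 (backward Euler), and -2/3 (Crank-Nicolson, decaying with sign oscillation).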
2.3 Dual Time Stepping
Each implicit physical timestep is converged by marching an inner pseudo-time problem, typically with an explicit method; balances stability and efficiency. Widely used in compressible flow solvers.
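The idea can be sketched on a scalar ODE: one backward-Euler physical step whose implicit equation is solved by explicit marching in pseudo-time. The function name, the forward-Euler inner scheme, and the fixed pseudo-timestep dtau are assumptions for illustration; real solvers typically use multi-stage inner schemes with local pseudo-timesteps.

```python
def dual_time_step(y_old, dt, f, dtau, tol=1e-10, max_inner=10_000):
    """Advance y' = f(y) by one backward-Euler step of size dt.

    The implicit equation R(y) = (y - y_old)/dt - f(y) = 0 is driven to zero
    by explicit (forward Euler) marching in pseudo-time: y <- y - dtau * R(y).
    Returns the converged state and the number of inner iterations used.
    """
    y = y_old
    for inner in range(max_inner):
        residual = (y - y_old) / dt - f(y)
        if abs(residual) < tol:
            return y, inner
        y -= dtau * residual
    return y, max_inner
```

For the linear case f(y) = -lam*y the inner iteration contracts when dtau*(1/dt + lam) < 2, so the pseudo-timestep is limited by the stiffness of the inner problem while the physical timestep dt is not.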
2.4 Timestep Control
Adaptive timesteps, local timesteps, and multi-timescale problems (e.g., fast and slow reactions in combustion).
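One common form of adaptive timestep control compares two solutions of different order and rescales the step from the estimated local error. The sketch below uses Heun's method with an embedded Euler estimate; the safety factor 0.9, the growth limits 0.2 and 2, and the function name are assumptions, not a prescription.

```python
import math


def integrate_adaptive(f, y0, t_end, tol=1e-6, dt=0.1):
    """Integrate y' = f(t, y) from t = 0 to t_end with Heun's method.

    The difference between the Heun (2nd-order) and Euler (1st-order) results
    serves as the local error estimate that drives the step-size controller.
    Returns the final value and the counts of accepted and rejected steps.
    """
    t, y = 0.0, y0
    accepted, rejected = 0, 0
    while t_end - t > 1e-12:
        dt = min(dt, t_end - t)                 # do not step past t_end
        k1 = f(t, y)
        k2 = f(t + dt, y + dt * k1)
        y_heun = y + 0.5 * dt * (k1 + k2)
        err = 0.5 * dt * abs(k2 - k1)           # |y_heun - y_euler|
        if err <= tol:
            t, y = t + dt, y_heun
            accepted += 1
        else:
            rejected += 1
        # Rescale dt from the error estimate, limited to a factor in [0.2, 2].
        scale = 2.0 if err == 0.0 else 0.9 * math.sqrt(tol / err)
        dt *= min(2.0, max(0.2, scale))
    return y, accepted, rejected
```

A call such as `integrate_adaptive(lambda t, y: -y, 1.0, 5.0)` rejects the initial guess dt = 0.1, shrinks the step, and then lets it grow again as the solution decays.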

3. Linear Solvers

The core of an implicit method is solving sparse linear systems. This is usually the performance bottleneck and the key to parallelization.

3.1 Krylov Subspace Methods
- CG (Conjugate Gradient): symmetric positive definite systems, common in structural mechanics.
- GMRES / BiCGStab: non-symmetric systems, typical for CFD Jacobian matrices.
- Restart strategies, convergence criteria, residual monitoring.
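As a small illustration of a Krylov method with a relative-residual convergence criterion and residual monitoring, here is a matrix-free conjugate gradient sketch. The model operator tridiag(-1, 2, -1) and all function names are assumptions chosen for the example; CG is used because the model matrix is symmetric positive definite, whereas non-symmetric CFD Jacobians would call for GMRES or BiCGStab.

```python
def apply_laplacian(x):
    """Matrix-free product with the SPD tridiagonal matrix tridiag(-1, 2, -1)."""
    n = len(x)
    return [
        2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(apply_a, b, rel_tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A; returns (x, iterations, residual history)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    p = list(r)                      # first search direction
    rs_old = dot(r, r)
    b_norm = dot(b, b) ** 0.5 or 1.0
    history = [rs_old ** 0.5 / b_norm]
    for iteration in range(1, max_iter + 1):
        if history[-1] <= rel_tol:   # convergence test on the relative residual
            return x, iteration - 1, history
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        history.append(rs_new ** 0.5 / b_norm)
    return x, max_iter, history
```

The returned history is the quantity one would plot or log when monitoring convergence; only the operator action `apply_a` is needed, never the assembled matrix.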
3.2 Preconditioning
- Classical: Jacobi, Gauss-Seidel, ILU.
- Multigrid: V-cycle, W-cycle, highly effective in CFD.
- Sparse approximate inverses (SPAI).
- Parallelization: how to parallelize the preconditioner without degrading its effect on convergence.
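The two simplest classical choices, Jacobi and Gauss-Seidel, also show the parallelization trade-off in miniature. The sketch below runs them as stand-alone sweeps on the model system tridiag(-1, 2, -1) x = b; the function names and problem size are assumptions. Used as a preconditioner or multigrid smoother, one such sweep started from a zero guess plays the role of applying the inverse of the preconditioning matrix.

```python
def jacobi_sweep(x, b):
    """One Jacobi sweep for tridiag(-1, 2, -1) x = b: every update uses old values."""
    n = len(x)
    return [
        0.5 * (b[i] + (x[i - 1] if i > 0 else 0.0) + (x[i + 1] if i < n - 1 else 0.0))
        for i in range(n)
    ]


def gauss_seidel_sweep(x, b):
    """One Gauss-Seidel sweep: updates are used as soon as they are available."""
    x = list(x)
    n = len(x)
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = 0.5 * (b[i] + left + right)
    return x


def error_after(sweep, n=10, n_sweeps=200):
    """Max-norm error after n_sweeps, for a right-hand side whose solution is all ones."""
    b = [1.0] + [0.0] * (n - 2) + [1.0]   # tridiag(-1, 2, -1) applied to all ones
    x = [0.0] * n
    for _ in range(n_sweeps):
        x = sweep(x, b)
    return max(abs(xi - 1.0) for xi in x)
```

Jacobi updates are independent of one another and therefore trivially parallel, while Gauss-Seidel converges faster on this model problem but carries a sequential dependency from one unknown to the next.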
3.3 Direct vs Iterative
When to use direct solvers (LU, Cholesky) and when iterative solvers become necessary; trade-offs for large sparse systems.
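For banded systems a direct solve can be very cheap. As a hedged example, the Thomas algorithm below is LU factorization without pivoting specialized to a tridiagonal matrix, costing O(n) operations; it assumes a diagonally dominant (or otherwise well-behaved) matrix, and the argument layout is a convention chosen for this sketch.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored. No pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n                    # modified super-diagonal
    d = [0.0] * n                    # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

For general sparse matrices in two and three dimensions, fill-in during factorization makes direct methods grow much faster in memory and time, which is where the iterative methods above take over.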

4. Nonlinear Solvers

4.1 Newton-Raphson
Linearization of nonlinear systems, Jacobian construction (analytical vs numerical), convergence criteria.
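The analytical and numerical Jacobian options can be compared on a tiny system. Everything in this sketch is illustrative: the 2x2 system, the forward-difference step eps, and the Cramer's-rule solve are assumptions that stand in for a real residual and a real linear solver.

```python
def residual(v):
    """Hypothetical 2x2 nonlinear system: x^2 + y^2 = 4 and x*y = 1."""
    x, y = v
    return [x * x + y * y - 4.0, x * y - 1.0]


def analytic_jacobian(v):
    x, y = v
    return [[2.0 * x, 2.0 * y], [y, x]]


def fd_jacobian(func, v, eps=1e-7):
    """Forward-difference Jacobian: column j is (F(v + eps*e_j) - F(v)) / eps."""
    f0 = func(v)
    jac = [[0.0] * len(v) for _ in f0]
    for j in range(len(v)):
        shifted = list(v)
        shifted[j] += eps
        fj = func(shifted)
        for i in range(len(f0)):
            jac[i][j] = (fj[i] - f0[i]) / eps
    return jac


def solve_2x2(a, rhs):
    """Cramer's rule; adequate only for this tiny illustrative system."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
        (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det,
    ]


def newton(func, jacobian, v0, tol=1e-10, max_iter=50):
    """Newton-Raphson: solve J(v) dv = -F(v), then update v <- v + dv."""
    v = list(v0)
    for iteration in range(max_iter):
        f = func(v)
        if max(abs(fi) for fi in f) < tol:   # convergence test on the residual
            return v, iteration
        dv = solve_2x2(jacobian(v), [-fi for fi in f])
        v = [vi + di for vi, di in zip(v, dv)]
    return v, max_iter
```

Calling `newton(residual, analytic_jacobian, [2.0, 0.5])` and `newton(residual, lambda v: fd_jacobian(residual, v), [2.0, 0.5])` should reach the same root; the numerical Jacobian costs one extra residual evaluation per unknown, which is the trade-off against deriving and maintaining the analytical form.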
4.2 Quasi-Newton
BFGS and Broyden updates avoid explicit Jacobian construction; suitable when computing the Jacobian is expensive.
4.3 Nonlinear Preconditioning & Damping
Line search and trust-region methods, used to improve the robustness of Newton iterations.
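A standard textbook case for damping is f(x) = atan(x): from a starting point far enough from the root, the full Newton step overshoots and the iteration diverges, while a backtracking line search recovers convergence. The sketch below is a minimal version; the sufficient-decrease constant 1e-4, the halving strategy, and the function name are assumptions.

```python
import math


def newton_arctan(x0, damped, tol=1e-12, max_iter=50):
    """Newton iteration for f(x) = atan(x) = 0, optionally with backtracking."""
    x = x0
    for _ in range(max_iter):
        fx = math.atan(x)
        if abs(fx) < tol:
            break
        step = -fx * (1.0 + x * x)        # Newton step: -f(x) / f'(x)
        lam = 1.0
        if damped:
            # Backtracking line search: halve lam until |f| decreases sufficiently.
            while (
                abs(math.atan(x + lam * step)) > (1.0 - 1e-4 * lam) * abs(fx)
                and lam > 1e-8
            ):
                lam *= 0.5
        x += lam * step
    return x
```

Starting from x0 = 3, the undamped iteration moves further from the root with every step, whereas the damped version shortens the first step and then takes full Newton steps once it is close to the root.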
4.4 Fully Coupled vs Segregated
Pressure-velocity coupling (SIMPLE, PISO, coupled solvers), multiphysics coupling strategies.

5. HPC & Parallel Computing

From single-core to multi-core, single-node to clusters: how to make algorithms fully utilize hardware resources.

5.1 Parallel Models
- OpenMP: shared memory, suitable for single-node multi-core.
- MPI: distributed memory, suitable for multi-node clusters.
- Hybrid: MPI + OpenMP, common on supercomputers.
- GPU: CUDA / OpenCL, suitable for data-parallel operators.
5.2 Domain Decomposition & Load Balancing
- METIS, ParMETIS for partitioning.
- Static vs dynamic load balancing for non-uniform computational loads.
- Minimizing inter-process communication, improving parallel efficiency.
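The effect of non-uniform load on a static partition can be sketched with a 1D chain of cells carrying different costs. This is a toy stand-in and does not call METIS or ParMETIS; those partition general graphs and also minimize the edge cut, which a 1D contiguous split does not need to consider. Function names and the example weights are assumptions.

```python
def split_by_count(n_cells, n_parts):
    """Assign equal numbers of contiguous cells to each rank, ignoring cost."""
    return [i * n_parts // n_cells for i in range(n_cells)]


def split_by_weight(weights, n_parts):
    """Assign contiguous cells so that cumulative cost is split evenly."""
    total = sum(weights)
    owner, part, running = [], 0, 0.0
    for w in weights:
        owner.append(part)
        running += w
        # Move on to the next rank once this one has reached its share.
        if part < n_parts - 1 and running >= (part + 1) * total / n_parts:
            part += 1
    return owner


def imbalance(weights, owner, n_parts):
    """Load imbalance factor: max rank load / mean rank load (1.0 is ideal)."""
    loads = [0.0] * n_parts
    for w, p in zip(weights, owner):
        loads[p] += w
    return max(loads) / (sum(loads) / n_parts)
```

For 80 cells of cost 1 followed by 20 cells of cost 5 on 4 ranks, the count-based split gives an imbalance of 105/45 (about 2.33), because the last rank owns all the expensive cells, while the weight-based split reaches 1.0. Since every rank waits for the slowest one at each synchronization point, the imbalance factor bounds the achievable parallel efficiency.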
5.3 Performance Optimization
- Cache-friendly: data locality, memory access pattern optimization.
- Vectorization: SIMD instructions, compiler optimization.
- Profiling: gprof, perf, Intel VTune.
- Bottleneck identification: computation vs communication vs I/O.

6. My Practice

Currently focused on:

- Implicit scheme design and implementation in compressible CFD solvers.
- Krylov solvers and multigrid preconditioners for sparse linear systems.
- Load balancing and communication optimization in MPI parallel contexts.
- Matrix operator acceleration and performance tuning on custom chips.

Specific algorithm implementations, performance analysis, and optimization experience will be detailed in projects and technical notes.