Designing a Compressible CFD Solver for Custom Scientific Computing Chips

Draft · Language: English · Author: Xu Bin

This note summarizes how I think about designing a compressible CFD solver that runs efficiently on a custom scientific computing chip, while still behaving like a serious engineering tool: verifiable, robust, and maintainable.

1. Context and Goals

In many CFD projects, the solver is written with a generic CPU cluster in mind. When we introduce a custom scientific computing chip into the picture, several things change at once:

In my experience, a reasonable goal is to:

2. Equations and Discretization: What Must Not Change

On the mathematical side, I try not to “simplify away” the problem just to fit the chip. For compressible flow, the core choices are:

These choices are largely hardware-agnostic and driven by physics and engineering requirements. The chip enters the picture when we look at:

3. Data Layout and Memory Bandwidth

On a bandwidth-limited accelerator, data layout is part of the numerical method. For a cell-centered FVM solver, typical choices include:

The goal is not just to “make it faster”, but to make memory access patterns predictable enough that the chip’s prefetchers, local memories, and DMA engines can actually be used.
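To make the layout question concrete, here is a minimal sketch of two common ways to store cell-centered conserved variables in one flat buffer: array-of-structures (AoS) and structure-of-arrays (SoA). The names `NVAR`, `aos_index`, `soa_index`, `pack`, and `stride_between_cells` are hypothetical, and the five-variable state (density, three momentum components, total energy) is an assumption for a 3D case, not something fixed by this note. It is plain Python for readability, not a performance implementation.

```python
from array import array

NVAR = 5  # assumed 3D conserved state: rho, rho*u, rho*v, rho*w, rho*E


def aos_index(cell, var, ncells):
    """Array-of-structures: all variables of one cell are contiguous."""
    return cell * NVAR + var


def soa_index(cell, var, ncells):
    """Structure-of-arrays: one variable for all cells is contiguous."""
    return var * ncells + cell


def pack(states, index_fn):
    """Copy per-cell state tuples into a flat float64 buffer."""
    ncells = len(states)
    buf = array("d", [0.0] * (ncells * NVAR))
    for cell, state in enumerate(states):
        for var, value in enumerate(state):
            buf[index_fn(cell, var, ncells)] = value
    return buf


def stride_between_cells(index_fn, var, ncells):
    """Element distance between the same variable in cells 0 and 1."""
    return index_fn(1, var, ncells) - index_fn(0, var, ncells)
```

A kernel that sweeps one variable over all cells sees a stride of `NVAR` elements in the AoS buffer and a stride of 1 in the SoA buffer. Which of the two (or a blocked hybrid) suits a given chip depends on its memory system, so this is something to measure rather than assume.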

4. Sparse Linear Solvers on the Chip

In an implicit compressible solver, a large share of the runtime is typically spent in sparse linear solvers: usually Krylov methods such as BiCGStab or GMRES (CG applies only when the system is symmetric positive definite), plus a preconditioner.
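As a reference point for what such a solver does, the following is a minimal sketch of BiCGStab on a matrix stored in compressed sparse row (CSR) form. The Jacobi (diagonal) preconditioner is an assumption chosen for brevity, the function names are hypothetical, and the pure-Python loops are only meant to expose the building blocks (sparse matrix-vector product, dot products, vector updates), not to be fast.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for A in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def csr_diagonal(indptr, indices, data):
    """Extract the main diagonal of a CSR matrix."""
    diag = [0.0] * (len(indptr) - 1)
    for row in range(len(diag)):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    return diag


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def bicgstab(indptr, indices, data, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned BiCGStab with a zero initial guess.

    Returns (x, iterations). Assumes a nonzero diagonal.
    """
    n = len(b)
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    inv_diag = [1.0 / d for d in csr_diagonal(indptr, indices, data)]
    r = list(b)       # residual b - A x for x = 0
    r_hat = list(r)   # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = [di * pi for di, pi in zip(inv_diag, p)]   # apply preconditioner
        v = csr_matvec(indptr, indices, data, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = [di * si for di, si in zip(inv_diag, s)]   # apply preconditioner
        t = csr_matvec(indptr, indices, data, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho = rho_new
    raise RuntimeError("BiCGStab did not converge within max_iter")
```

Each iteration costs two sparse matrix-vector products, two preconditioner applications, and a handful of dot products and vector updates. The sketch has no safeguards against breakdown (for example a vanishing `rho_new`), which a production solver would need.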

On a custom chip, I see three layers:

4.1 Algorithmic layer

At this level we decide:

4.2 Kernel layer

Here we look at kernels such as:

These kernels must be designed with:

4.3 Mapping layer

Finally, we decide how to map the mesh and linear algebra objects onto:

5. Verification, Regression, and “Not Lying to Yourself”

When chasing performance, it is easy to accidentally change the math. To avoid this, I try to build a verification and regression stack that includes:

The idea is that every time we change data layout, kernel implementation, or chip mapping, we can re-run a selected set of cases and confirm that the engineering answers remain within acceptable tolerances.
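One small piece of such a stack can be sketched as a tolerance check between a stored baseline and a new run. The function name `compare_to_baseline`, the dictionary-of-scalars format, and the default tolerances are all assumptions for illustration. In practice the tolerances would be set per quantity from engineering requirements.

```python
import math


def compare_to_baseline(baseline, candidate, rel_tol=1e-6, abs_tol=1e-12,
                        rel_tol_overrides=None):
    """Compare scalar outputs of a new run against a stored baseline.

    baseline, candidate: dicts mapping quantity name -> float.
    rel_tol_overrides: optional dict of per-quantity relative tolerances.
    Returns a list of (name, baseline_value, candidate_value) for every
    quantity that is missing or outside tolerance. NaN always fails.
    """
    overrides = rel_tol_overrides or {}
    failures = []
    for name, ref in baseline.items():
        if name not in candidate:
            failures.append((name, ref, None))
            continue
        new = candidate[name]
        tol = overrides.get(name, rel_tol)
        if not math.isclose(new, ref, rel_tol=tol, abs_tol=abs_tol):
            failures.append((name, ref, new))
    return failures
```

An empty return value means every baseline quantity was reproduced within tolerance. A non-empty one lists exactly which quantities moved, which is usually the first thing to look at after a layout or kernel change.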

6. Towards a Full Engine Simulation Workflow

Ultimately, a solver is only useful if it fits into an end-to-end workflow:

On a custom chip, this usually implies:

7. Closing Remarks

This note is intentionally high-level. In future updates, I plan to fill in more details on:

If you are working on similar problems and would like to compare notes, feel free to reach out at xubinlab@gmail.com.