Implementing High-Resolution Cloth and Origami Simulations on GPU using XPBD

Clint Wang · May 1, 2026

Introduction and Theory

I recently took a graduate course on physical simulations, taught by the incredibly talented Etienne Vouga. I wanted to apply what I learned, so I built a real-time physical simulation framework for rigid and cloth objects using extended position-based dynamics [XPBD, Macklin et al.].

To evaluate the strengths and limitations of XPBD, I extended a baseline shape-matching PBD cloth simulation with self-collision handling, friction constraints, and collisions against rigid bodies. After discovering the limitations of shape-matching and triangular meshes, I implemented a particle cloth that uses distance constraints. I then created a real-time origami simulation by adding stiff hinge constraints to the cloth model.

The framework is written in Rust that is compiled to WebAssembly (Wasm), and rendered using WebGPU. It can be run directly in the browser or as a headless binary for advanced debugging. A lightweight browser interface exposes key simulation parameters for interactive tuning. The source code is available on GitHub at https://github.com/txst54/wasm-cloth-sim, and a live browser demo is hosted at https://cloth.clintwang.com .

Position-based Dynamics

The baseline cloth simulation uses position-based dynamics (PBD), which can be thought of as a Verlet-like position prediction step, followed by an implicit-style constraint projection. In most implementations, the solver computes a target position, then interpolates between the Verlet predicted position and target position. Let kjk_j be a stiffness parameter for constraint jj. The position update is as follows:

xixi1+vΔt\textbf{x}_i \gets \textbf{x}_{i-1} + \textbf{v}\Delta t x^i(1kj)xi+kjxtarget\hat{\textbf{x}}_i \gets (1-k_j)\textbf{x}_i + k_j\textbf{x}_{\text{target}}

More formally, we can say that given a constraint scoring function C(x)C(x), the constraint solver computes a per-constraint position delta:

Δx=kjsjM1Cj(xi)\Delta x = k_js_j\textbf{M}^{-1}\nabla C_j(\textbf{x}_i) x^ixi+Δx\hat{\textbf{x}}_i \gets \textbf{x}_i + \Delta x

The subscript ii denotes the iteration index, jj denotes the constraint index, kj[0,1]k_j \in [0, 1] is the constraint stiffness, and the scaling factor sjs_j is derived:

sj=Cj(xi)CjM1CjTs_j = \frac{-C_j(\textbf{x}_i)}{\nabla C_j\textbf{M}^{-1}\nabla C_j^T}

Therefore, the issue with PBD is that stiffness depends on the number of constraint projections performed.

In a cloth simulation, the cloth typically has stretching and bending constraints. We refer to the rest position of a cloth as the initial position a cloth started in.

  1. The stretching constraint defines how individual edges in the cloth interact with respect to the rest position.
  2. The bending constraint defines how multiple edges in the cloth interact with respect to the rest position.
Triangle mesh cloth simulation: PBD shape-matching cloth at resolution=64.

For the PBD cloth simulation stretching and bending constraints, I use shape matching [Meshless Deformations, Muller et al.]. Each triangle in the cloth is constrained to match the corresponding triangle in the rest position via a rotation and a translation. Solving shape-matching is equivalent to solving the orthogonal Procrustes problem, which states that given matrices AA and BB, how can we map AA to BB using an orthogonal matrix.

The solver computes a target motion directly, which makes it non-trivial to express the constraint as a scoring function C(x). Rather than deriving motion from the gradient of a constraint energy, the method produces the target configuration explicitly. This distinction does not affect the PBD implementation, but becomes problematic in the XPBD formulation, which relies on an explicit constraint function. Additionally, solving the orthogonal Procrustes problem requires an SVD, which is computationally expensive.

Extended Position-based Dynamics

Earlier I mentioned that when using PBD, stiffness depends on the number of iterations. XPBD solves this by introducing compliance α\alpha (block diagonal compliance matrix corresponding to inverse-stiffness) and a Lagrange multiplier per constraint:

Δx=M1C(xi)TΔλ\Delta \textbf{x} = \textbf{M}^{-1}\nabla \textbf{C}(\textbf{x}_i)^T \Delta \lambda

XPBD solves Δλ\Delta \lambda by approximating each constraint as a local linear system and performing a Gauss-Seidel update:

Δλ=Cj(xi)α~jλijCjM1CjT+α~j\Delta \lambda = \frac{-C_j(\textbf{x}_i)-\tilde\alpha_j\lambda_{ij}}{\nabla C_j \textbf{M}^{-1}\nabla C_j^T + \tilde\alpha_j}

XPBD folds the time step into compliance, defining α~=αΔt2\tilde\alpha = \frac{\alpha}{\Delta t^2}.

Deriving XPBD

Macklin et al. provide a beautiful derivation of XPBD from Newton's equations of motion, specifically the 2nd law of motion:

F=maF = ma

If we subject the equation to forces derived from an energy potential U(x)U(x), we get equivalently:

UT(x)=Mx¨-\nabla U^T(x) = M\ddot{x}

We want to discretize x¨\ddot{x}. Recall the central difference from high school calculus:

x˙(t)x(t+Δt)x(tΔt)Δt2\dot{x}(t) \approx \frac{x(t+\Delta t) - x(t-\Delta t)}{\Delta t^2}

If we differentiate again, we get:

x¨(t)x(t+Δt)2x(t)+x(tΔt)Δt2\ddot{x}(t) \approx \frac{x(t+\Delta t) - 2x(t) + x(t-\Delta t)}{\Delta t^2}

Where superscript nn refers to the time step index. Plugging into the equation of motion, we get:

UT(xn+1)=M(xn+12xn+xn1Δt2)-\nabla U^T(x^{n+1}) = M\left(\frac{x^{n+1} - 2x^n + x^{n-1}}{\Delta t^2}\right)

Recall from high school physics that elastic potential energy is represented by U=12kx2U = \frac{1}{2}kx^2 where kk is stiffness. Because the constraints C(x)C(x) represent displacement from natural position, we can reformulate UU as follows:

U(x)=12C(x)Tα1C(x)U(x) = \frac{1}{2} C(x)^T \alpha^{-1} C(x)

Macklin et al. introduce α\alpha, which represents a block diagonal compliance matrix corresponding to inverse stiffness. Therefore, α1=k\alpha^{-1} = k. We get:

felastic=xUT=CTα1Cf_{elastic} = -\nabla_x U^T = -\nabla C^T \alpha^{-1} C

Plugging this back into the equations of motion, we get:

C(xn+1)Tα1C(xn+1)=M(xn+12xn+xn1Δt2)-\nabla C(x^{n+1})^T \alpha^{-1} C(x^{n+1}) = M\left(\frac{x^{n+1} - 2x^n + x^{n-1}}{\Delta t^2}\right)

We can define α~=αΔt2\tilde{\alpha} = \frac{\alpha}{\Delta t^2} by folding in Δt2\Delta t^2 from the right side of the equation. We can also redefine force elastic using the semantics of a Lagrange multiplier: λ=α~1C(x)\lambda = -\tilde{\alpha}^{-1} C(x). Finally, we can define x~=2xnxn1=xn+Δtvn\tilde{x} = 2x^n - x^{n-1} = x^n + \Delta t v^n, which is the predicted position computed by Verlet integration. Now we get the following systems of equations:

M(xn+1x~)C(xn+1)Tλn+1=0M(x^{n+1} - \tilde{x}) - \nabla C(x^{n+1})^T \lambda^{n+1} = 0 C(xn+1)+α~λn+1=0C(x^{n+1}) + \tilde{\alpha}\lambda^{n+1} = 0

To solve this coupled nonlinear system, we treat both xn1x^{n_1} and λn+1\lambda^{n+1} as unknowns and define:

g(x,λ)=M(xx~)C(x)Tλg(x,\lambda)=M(x-\tilde{x})-\nabla C(x)^T \lambda h(x,λ)=C(x)+α~λh(x,\lambda)=C(x)+\tilde{\alpha}\lambda

so that the system becomes:

g(x,λ)=0,h(x,λ)=0g(x,\lambda)=0, \quad h(x,\lambda)=0

However, both equations are nonlinear in xx due to C(x)C(x) and C(x)\nabla C(x), so we solve them using Newton’s method:

[gxgλhxhλ][ΔxΔλ]=[g(xi,λi)h(xi,λi)]\begin{bmatrix} \frac{\partial g}{\partial x} & \frac{\partial g}{\partial \lambda} \\ \frac{\partial h}{\partial x} & \frac{\partial h}{\partial \lambda} \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = - \begin{bmatrix} g(x_i,\lambda_i) \\ h(x_i,\lambda_i) \end{bmatrix}

The resulting linear system is:

[KC(xi)TC(xi)α~][ΔxΔλ]=[g(xi,λi)h(xi,λi)]\begin{bmatrix} K & -\nabla C(x_i)^T \\ \nabla C(x_i) & \tilde{\alpha} \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = - \begin{bmatrix} g(x_i,\lambda_i) \\ h(x_i,\lambda_i) \end{bmatrix}

We can then solve Δx\Delta x and Δλ\Delta \lambda and perform the following update:

λi+1=λi+Δλ\lambda_{i+1} = \lambda_i + \Delta \lambda xi+1=xi+Δxx_{i+1}=x_i+\Delta x

Macklin et al. claim that we can set KMK\approx M with an error on the order of O(Δt2)O(\Delta t^2). They also claim that we can assume g(xi,λi)=0g(x_i,\lambda_i)=0 since this is true given the initial values x0=x~x_0=\tilde x and λ0=0\lambda_0 = 0. We get:

[MC(xi)TC(xi)α~][ΔxΔλ]=[0h(xi,λi)]\begin{bmatrix} M & -\nabla C(x_i)^T \\ \nabla C(x_i) & \tilde{\alpha} \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = - \begin{bmatrix} 0 \\ h(x_i,\lambda_i) \end{bmatrix}

By doing a bit of algebra, we can solve for Δx\Delta x:

Δx=M1C(xi)TΔλ\Delta x =M^{-1}\nabla C(x_i)^T\Delta \lambda

Then we can substitute Δx\Delta x to get the λ\lambda update:

[C(xi)M1C(xi)T+α~]Δλ=C(xi)α~λi[\nabla C(x_i)M^-1\nabla C(x_i)^T + \tilde \alpha]\Delta \lambda = -C(x_i)-\tilde \alpha \lambda_i Δλj=Cj(xi)α~jλijCjM1CjT+α~j\Delta \lambda_j = \frac{-C_j(x_i)-\tilde\alpha_j\lambda_{ij}}{\nabla C_j M^{-1}\nabla C_j^T + \tilde\alpha_j}

By implementing a Gauss-Seidel solution for the systems of linear equations, we can calculate Δλj\Delta \lambda_j for a single constraint, then update positions.

XPBD Implementation for Cloth

Because shape-matching does not easily provide a scalar CC, I chose to keep shape-matching in PBD while creating a new distance constraint option in XPBD:

C(x)=xixjL0C(\textbf{x}) = |\textbf{x}_i - \textbf{x}_j| - L_0

Users can toggle between the two.

image

image

Particle cloth simulation (XPBD distance constraint): High-resolution cloth at 30 FPS, resolution=150, substeps = 1

Particle cloth simulation (XPBD distance constraint): High-resolution cloth at 20 FPS, resolution=150, substeps = 10

From the above figures, we see that using more substeps shows less stretching and higher material stiffness.

Self-Collision and Rigid Bodies

For both the triangle and particle cloth collision-detection, I used a spatial hash. The triangle cloth hashes triangles and runs vertex-triangle continuous collision detection (CCD). The particle cloth hashes particles and runs vertex-vertex and sub-stepping.

My implementation of continuous-collision detection predicts collisions along a path by solving for the earliest exact tt in x^=x+dxΔt\hat x = x+dx\Delta t where x^\hat x intersects any object. I discovered that CCD with triangles is computationally expensive and assumes linear approximation of motion resulting in tunneling.

To solve this, I implemented a particle cloth simulation where collisions are tested betIen vertex particles with a set radius. Since XPBD makes stiffness independent of iteration, I can set the number of iterations at a low number. Macklin et al. state that instead of running 1 timestep and nn iterations per frame, they obtained a significant improvement by running nn sub-steps and 1 iteration per frame [Small Steps, Macklin et al.]. By decreasing Δt\Delta t through sub-stepping, I find that tunneling is virtually eliminated without implementing CCD.

I implemented a stable cloth-cloth friction that blends tangential velocities of contacting points:

xixi+kfriction(vavgvi)Δt\textbf{x}_i \gets \textbf{x}_i + k_{\text{friction}}(\textbf{v}_{\text{avg}} - \textbf{v}_i)\Delta t

The total average velocity remains conserved and the relative velocity always shrinks.

For rigid bodies, I copied the numerical integrators from the rigid body assignment for the base implementation. I implemented a BVH for complex mesh collision and a signed distance field for simple rigid body collisions. I implemented a cloth-rigid body frictional constraint satisfying Coulomb's law:

Cf(x)=(InnT)(ΔxiΔxj)C_f(\textbf{x}) = \|(I-\textbf{n}\textbf{n}^T)(\Delta \textbf{x}_i - \Delta \textbf{x}_j) \| Δλ=min(μΔλn,Δλf)\Delta \lambda = \min(\mu \Delta \lambda_n, \Delta \lambda_f)

Where μ\mu is the frictional coefficient and λn\lambda_n is the normal Lagrange multiplier.

GPU

To simulate a high-resolution cloth, I found the CPU to be insufficient. Logically, XPBD fits the GPU SIMD model: compute velocity, constraints, and position updates in parallel. However, distance constraints for stretching and bending are co-dependent. Gauss-Seidel updates the positions themselves:

repeat xixi+Δx\text{repeat }\textbf{x}_i \gets \textbf{x}_i + \Delta \textbf{x}

Each update influences neighboring constraints. An alternative is a Jacobi solve that performs the following:

di0\textbf{d}_i \gets 0 repeat didi+Δx\text{repeat }\textbf{d}_i \gets \textbf{d}_i + \Delta \textbf{x} xixi+sdi\textbf{x}_i \gets \textbf{x}_i + s\textbf{d}_i

The Jacobi solve updates the initial position produced by Verlet integration. Consequently, the solve takes longer to converge, overshoots, and violates momentum conservation. Therefore, I adopted a hybrid approach. I used greedy graph coloring to iteratively select and concurrently process a subset of independent distance constraints using Gauss-Seidel. Because graph-coloring would yield a large number of colors (small sets) for collision constraints, I handle collision constraints using a parallel Jacobi solve.

Origami

The goal of the origami simulation is to map a mesh of creases on an unfolded sheet of paper to the folded version. Typically, the crease mesh is called a crease pattern and will contain 3 types of edges:

  1. Boundary edges represent the edge of the paper and typically form a square.

  2. Mountain fold edges represent creases that fold away from you.

  3. Valley fold edges represent creases that fold towards you.

I base our implementation on the numerical integrator developed by Ghassei et al. [Origami Simulator, Ghassei et al.]. Ghassei et al. model a force Fcrease=kcrease(θθtarget)θp\textbf{F}_{crease}=-k_{crease}(\theta - \theta_{target})\frac{\partial\theta}{\partial \textbf{p}} based on a dihedral crease constraint, which naturally maps to an XPBD constraint. I evaluate the constraint on hinge diamonds, which are diamonds that overlay creases. In origami, the dihedral angle is more usefully defined as the angle between the two plane normals, rather than betIen the planes themselves. A flat unfolded hinge diamond will have a dihedral angle of 0 according to our convention. To simulate fold progress, I set θtarget=πcprogresst\theta_{\text{target}} = \pi c_{\text{progress}}\textbf{t}, where cprogress[0,1]c_\text{progress} \in [0,1] and t\textbf{t} is a vector the size of the number of hinges, containing 11 if hinge ii is a valley fold, or 1-1 if hinge ii is a mountain fold.

Dihedral crease constraint from Ghassei et al.

The constraint is rather simple: C=θθgoalC = \theta - \theta_{goal} I also want to know Ci\nabla C_i, the gradient of CC at point xi\textbf{x}_i. I get:

C=[θx1,θx2,θx3,θx4]\nabla C = [\frac{\partial \theta}{\partial \textbf{x}_1},\frac{\partial \theta}{\partial \textbf{x}_2},\frac{\partial \theta}{\partial \textbf{x}_3},\frac{\partial \theta}{\partial \textbf{x}_4}]

Following the orientation given by Figure 1, the derivatives for flap vertices x1,x2\textbf{x}_1, \textbf{x}_2 are proportional to the normals of their respective triangles: θx1=n1h1,θx2=n2h2\frac{\partial \theta}{\partial \textbf{x}_1} = \frac{\textbf{n}_1}{h_1}, \frac{\partial \theta}{\partial \textbf{x}_2} = \frac{\textbf{n}_2}{h_2} The derivatives for edge vertices x3,x4\textbf{x}_3, \textbf{x}_4 are the weighted sum of the flap vertex derivatives.

θx3=cotα4,31cotα3,14+cotα4,31n1h1+cotα4,23cotα3,42+cotα4,23n2h2\frac{\partial \theta}{\partial \textbf{x}_3} = \frac{-\cot\alpha_{4, 31}}{\cot \alpha_{3,14}+\cot \alpha_{4,31}}\frac{\textbf{n}_1}{h_1} + \frac{-\cot\alpha_{4, 23}}{\cot \alpha_{3,42}+\cot \alpha_{4,23}}\frac{\textbf{n}_2}{h_2} θx4=cotα3,14cotα4,31+cotα3,14n1h1+cotα3,42cotα4,23+cotα3,42n2h2\frac{\partial \theta}{\partial \textbf{x}_4} = \frac{-\cot\alpha_{3, 14}}{\cot \alpha_{4,31}+\cot \alpha_{3,14}}\frac{\textbf{n}_1}{h_1} + \frac{-\cot\alpha_{3, 42}}{\cot \alpha_{4,23}+\cot \alpha_{3,42}}\frac{\textbf{n}_2}{h_2}

Origami in PBD versus XPBD

Triangle mesh paper simulation (XPBD shape-matching dihedral angle constraint): Waterbomb tessellation at resolution=1.

Particle paper simulation (XPBD distance + dihedral angle constraint): Waterbomb base in a high resolution paper, resolution = 64.

One limitation of the particle XPBD paper simulation is that in high resolutions, the distance constraints become slightly unstable. Notice the small bumps in the paper in Figure 4. When I ran simulations in the triangle mesh PBD shape-matching paper simulation, the paper was smooth. This is because shape-matching provides high-fidelity projections onto the rest shapes.

Implementation Details

Architecture

The Rust core of the simulation is located in /rust. All architecture-specific details (Wasm versus native binary) can be found in /rust/src/arch. GPU-support is only enabled in the Wasm package, and the native binary is several commits behind the Wasm package, so outputs may not look the same.

The Wasm package can be built using cd rust && wasm-pack build --target web --features gpu. Since Wasm packages are compiled as a library, the "main" function of the Wasm package can be found in /rust/src/arch/wasm/harness.rs.

Since the native binary is compiled as a binary, the binary entry point can be found at /rust/bin/native.rs and wraps /rust/src/arch/native/harness.rs.

If the GPU compute shaders in particle cloth or particle paper cause issues, the simulation can be reverted to the serial Gauss-Seidel CPU implementation by omitting the gpu feature flag. All code relating to the user interface wrapping the Wasm package is located in /src.

Available Simulations

URLSimulation
/Triangle Mesh Cloth Simulation + Rotating BVH Rigid Body
/paperTriangle Mesh Paper Simulation
/particle_clothGPU Particle Cloth Simulation + Static SDF Rigid Body
/particle_paperGPU Particle Paper Simulation

I offer 4 simulations located at different sub-URLs. The triangle mesh simulations run shape-matching PBD constraints by default while the particle cloth simulations only run distance XPBD constraints for cloth structure. I have a UI that allows the user to adjust several parameters relating to the simulation. Since scales for parameters ki[0,1]k_i \in [0, 1] may be very small, or on the exponential magnitude, I recommend manually entering numbers into inputs if the sliders do not prove sufficient. Damping and timestep are examples of this.

Available Crease Patterns

Crease PatternDescription
line.cpA single vertical crease
lines.cpTwo vertical creases
waterbomb.cpA traditional origami base
miura_ori.cpA classic and easy to fold origami tessellation
waterbomb_tess.cpA classic and harder to fold origami tessellation
hex_torso.cpModern origami "hex-pleat" model. Designed by Clint.

Table: Available crease patterns for paper simulation