前端工程

WebGPU Compute Shaders: The Real Killer Feature

Most people think of WebGPU as "the WebGL replacement." That misses the point entirely — Compute Shader is WebGPU's true killer feature.

Compute Shader gives browsers native GPGPU capability for the first time. This isn't an incremental improvement — it's a paradigm shift.

In the WebGL era, GPGPU meant stuffing data into textures, using fragment shaders to simulate computation, then reading results back to the CPU. WebGPU Compute Shader breaks this limitation entirely.


WGSL Core Syntax

WebGPU introduces WGSL (WebGPU Shading Language), replacing GLSL. Here are the essentials:

@group(0) @binding(0) var<storage, read> inputData: array<f32>;
@group(0) @binding(1) var<storage, read_write> outputData: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let idx = global_id.x;
    if (idx >= arrayLength(&inputData)) { return; }
    outputData[idx] = inputData[idx] * 2.0;
}

Workgroup Size Recommendations

Hardware Recommended Size Reason
NVIDIA 128-256 Warp size is 32
AMD 64-128 Wavefront size is 64
Intel 32-64 Smaller subgroup size
Safe default 64 Compatible with all

Compute Shader vs WebGL Transform Feedback

Dimension WebGL TF WebGPU Compute
Programming model Disguised rendering Native compute
Data transfer Texture/VBO Storage Buffer
Thread control Vertex count hack Precise dispatch
Data readback Slow readPixels Fast mapAsync
Random write Not supported Atomic + random write
Shared memory None Workgroup shared

Hands-On: Gaussian Blur with Compute Shader

JavaScript processes a 1080p Gaussian blur in ~450ms. Compute Shader does it in ~5ms — roughly 50x faster.

@group(0) @binding(0) var srcTexture: texture_2d<f32>;
@group(0) @binding(1) var dstTexture: texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(2) var<uniform> params: Params;

struct Params {
    width: u32,
    height: u32,
    radius: f32,
}

@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let pixel_coords = vec2<u32>(global_id.x, global_id.y);
    if (pixel_coords.x >= params.width || pixel_coords.y >= params.height) { return; }

    let sigma = params.radius / 3.0;
    var color = vec4<f32>(0.0);
    var total_weight = 0.0;
    let radius_i = i32(params.radius);

    for (var dx: i32 = -radius_i; dx <= radius_i; dx++) {
        for (var dy: i32 = -radius_i; dy <= radius_i; dy++) {
            let weight = exp(-(f32(dx)*f32(dx) + f32(dy)*f32(dy)) / (2.0*sigma*sigma));
            let sx = clamp(i32(pixel_coords.x) + dx, 0, i32(params.width) - 1);
            let sy = clamp(i32(pixel_coords.y) + dy, 0, i32(params.height) - 1);
            color += textureLoad(srcTexture, vec2<u32>(sx, sy), 0) * weight;
            total_weight += weight;
        }
    }
    textureStore(dstTexture, pixel_coords, color / total_weight);
}

Performance Benchmarks

Approach 1080p (r=10) 4K (r=10)
JavaScript ~450ms ~1800ms
Web Worker (4x) ~130ms ~520ms
WebGL Fragment ~12ms ~45ms
WebGPU Compute ~5ms ~18ms

Hands-On: 100K Particle Simulation at 60fps

Compute Shader's classic use case — updating 100K particles per frame is impossible on CPU.

struct Particle {
    pos: vec2<f32>,
    vel: vec2<f32>,
    color: vec4<f32>,
    life: f32,
}

@group(0) @binding(0) var<storage, read_write> particles: array<Particle>;
@group(0) @binding(1) var<uniform> time: f32;

@compute @workgroup_size(256)
fn updateParticles(@builtin(global_invocation_id) global_id: vec3<u32>) {
    let idx = global_id.x;
    if (idx >= arrayLength(&particles)) { return; }
    var p = particles[idx];
    p.vel.y += -9.8 * 0.016;
    p.vel *= 0.999;
    p.pos += p.vel * 0.016;
    if (p.pos.y < 0.0) { p.pos.y = 0.0; p.vel.y *= -0.7; }
    p.life -= 0.016;
    if (p.life <= 0.0) {
        p.pos = vec2<f32>(0.5, 0.8);
        p.life = 3.0;
    }
    particles[idx] = p;
}

Key trick: the same buffer is used as STORAGE | VERTEX, enabling zero-copy compute-to-render pipeline.


Browser Compatibility

Browser Support Version
Chrome ✅ Full 113+
Edge ✅ Full 113+
Firefox ⚠️ Experimental Nightly
Safari ⚠️ Partial 17.4+

WebGPU + Web Worker Collaboration

For scenarios requiring both GPU compute and CPU logic:

Main Thread (WebGPU Compute) → mapAsync → SharedArrayBuffer → Web Worker (logic)

This pattern excels when GPU handles heavy computation while CPU makes decisions — like physics simulations where GPU computes collisions and CPU runs game logic.


Conclusion

WebGPU Compute Shader brings true GPGPU to the web. With Chrome 113+ stable support and a maturing ecosystem, 2026 is the year to adopt it. Watch for upcoming WebGPU Subgroups and Ray Tracing proposals at W3C.

Try these browser-local tools — no sign-up required →

#WebGPU#Compute Shader#GPU计算#WGSL#并行计算