WebGL GPU Compute in Practice: Browser-Side Parallel Image Processing Architecture
Why Use WebGL for Image Processing?
Browser-side image processing has three paths with dramatically different performance:
| Approach | Compute Unit | Parallelism | Typical Throughput | Best For |
|---|---|---|---|---|
| Canvas 2D + ImageData | CPU single thread | 1 | ~50 MP/s | Simple transforms, compression |
| OffscreenCanvas + Worker | CPU multi-thread | 4-16 | ~200 MP/s | Batch compression |
| WebGL Fragment Shader | GPU | Thousands of cores | ~2000 MP/s | Filters, convolution, real-time |
GPU is naturally suited for image processing—each pixel's computation is fully independent, and Fragment Shader processes thousands of pixels simultaneously, delivering 10-100x throughput over CPU.
Core Architecture: WebGL Image Processing Pipeline
Source Image → texImage2D Upload Texture → Fragment Shader Parallel Compute → readPixels Readback → Result ImageData
Initialize WebGL Context
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl2') ?? canvas.getContext('webgl')!;
if (!gl) throw new Error('WebGL not available');
Compile and Link Shaders
function createShaderProgram(gl: WebGLRenderingContext, vertSrc: string, fragSrc: string) {
const vert = gl.createShader(gl.VERTEX_SHADER)!;
gl.shaderSource(vert, vertSrc);
gl.compileShader(vert);
const frag = gl.createShader(gl.FRAGMENT_SHADER)!;
gl.shaderSource(frag, fragSrc);
gl.compileShader(frag);
const program = gl.createProgram()!;
gl.attachShader(program, vert);
gl.attachShader(program, frag);
gl.linkProgram(program);
return program;
}
Texture Upload: Image to GPU
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
// Upload from ImageBitmap (zero-copy, faster than HTMLImageElement)
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, imageBitmap);
Performance tip: Use
createImageBitmap()to get an ImageBitmap before uploading—2-3x faster than using an Image element directly, since it skips HTML decoding.
Fragment Shader: GPU Parallel Pixel Computation
Gaussian Blur (5×5 Convolution Kernel)
precision mediump float;
uniform sampler2D u_image;
uniform vec2 u_textureSize;
varying vec2 v_texCoord;
void main() {
vec2 onePixel = vec2(1.0) / u_textureSize;
vec4 sum = vec4(0.0);
for (int x = -2; x <= 2; x++) {
for (int y = -2; y <= 2; y++) {
vec2 offset = vec2(float(x), float(y)) * onePixel;
sum += texture2D(u_image, v_texCoord + offset);
}
}
gl_FragColor = sum / 25.0;
}
Color Space Transform (Grayscale + Contrast Enhancement)
precision mediump float;
uniform sampler2D u_image;
uniform float u_contrast;
varying vec2 v_texCoord;
void main() {
vec4 color = texture2D(u_image, v_texCoord);
float gray = dot(color.rgb, vec3(0.299, 0.587, 0.114));
gray = (gray - 0.5) * u_contrast + 0.5;
gl_FragColor = vec4(vec3(gray), color.a);
}
Sobel Edge Detection
precision mediump float;
uniform sampler2D u_image;
uniform vec2 u_textureSize;
varying vec2 v_texCoord;
void main() {
vec2 px = vec2(1.0) / u_textureSize;
float tl = dot(texture2D(u_image, v_texCoord + vec2(-px.x, -px.y)).rgb, vec3(0.33));
float tc = dot(texture2D(u_image, v_texCoord + vec2(0.0, -px.y)).rgb, vec3(0.33));
float tr = dot(texture2D(u_image, v_texCoord + vec2(px.x, -px.y)).rgb, vec3(0.33));
float bl = dot(texture2D(u_image, v_texCoord + vec2(-px.x, px.y)).rgb, vec3(0.33));
float br = dot(texture2D(u_image, v_texCoord + vec2(px.x, px.y)).rgb, vec3(0.33));
float gx = -tl - 2.0 * tc - tr + bl + 2.0 * br + br;
float gy = -tl - 2.0 * bl - br + tr + 2.0 * tr + tr;
float edge = sqrt(gx * gx + gy * gy);
gl_FragColor = vec4(vec3(edge), 1.0);
}
Readback: GPU to CPU
const pixels = new Uint8Array(width * height * 4);
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
const imageData = new ImageData(new Uint8ClampedArray(pixels.buffer), width, height);
Note:
readPixelstriggers a synchronous GPU→CPU readback—the slowest step in the pipeline. Only call it when you need the result; avoid frequent readbacks.
Performance Comparison: WebGL vs Canvas 2D
Benchmark: 5×5 Gaussian blur on a 4096×4096 image:
| Metric | Canvas 2D (ImageData) | WebGL Fragment Shader |
|---|---|---|
| Processing time | ~320ms | ~8ms |
| Throughput | ~52 MP/s | ~2100 MP/s |
| Main thread blocking | 320ms | 8ms (+ readPixels 15ms) |
| Memory usage | 64MB (ImageData) | 64MB (texture + readback) |
WebGL delivers a 40x performance advantage for convolution operations.
Multi-Pass Rendering: Chained Filters
Complex effects require multiple passes (e.g., blur → edge detection → composite):
function renderPass(gl: WebGLRenderingContext, program: WebGLProgram, inputTexture: WebGLTexture) {
gl.useProgram(program);
gl.bindTexture(gl.TEXTURE_2D, inputTexture);
gl.drawArrays(gl.TRIANGLES, 0, 6);
}
// Pass 1: Gaussian blur → write to FBO texture
gl.bindFramebuffer(gl.FRAMEBUFFER, blurFBO);
renderPass(gl, blurProgram, sourceTexture);
// Pass 2: Edge detection → write to Canvas
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
renderPass(gl, edgeProgram, blurTexture);
Use Framebuffer Objects (FBO) to store intermediate results in textures, chaining passes together.
Practical Use Cases
These ToolsKu tools can benefit from WebGL acceleration:
- Image Compress: Color space transform + downsampling via GPU parallelism
- Image Resize: Bilinear/bicubic interpolation via Fragment Shader
- Image Transform: Rotation, flip via vertex shader matrix transforms
Common Questions
WebGL or WebGPU?
WebGPU is WebGL's successor, offering Compute Shaders and lower CPU overhead. However, WebGL has better compatibility (95%+ vs 70%+), making it the safer GPU acceleration choice today.
readPixels is too slow—what to do?
If the result only needs to be displayed on the page, render directly to Canvas—no readPixels needed. Only read back when exporting ImageData/Blob.
How to handle large images?
GPU textures have size limits (typically 4096×4096 or 8192×8192). For oversized images, use tiled processing: split into multiple textures, process separately, then stitch.
Summary
WebGL Fragment Shader transforms image processing from CPU serial to GPU massively parallel, achieving 10-100x speedups for convolution and color transform operations. Mastering the full pipeline—texture upload → shader computation → readPixels readback—and FBO multi-pass rendering is essential for building high-performance browser-side image processing tools.
Try these browser-local tools — no sign-up required →