WebGL GPU Compute in Practice: Browser-Side Parallel Image Processing Architecture

Why Use WebGL for Image Processing?

Browser-side image processing has three paths with dramatically different performance:

Approach	Compute Unit	Parallelism	Typical Throughput	Best For
Canvas 2D + ImageData	CPU single thread	1	~50 MP/s	Simple transforms, compression
OffscreenCanvas + Worker	CPU multi-thread	4-16	~200 MP/s	Batch compression
WebGL Fragment Shader	GPU	Thousands of cores	~2000 MP/s	Filters, convolution, real-time

GPUs are naturally suited to image processing: each output pixel is computed independently of the others, so a fragment shader can process thousands of pixels simultaneously, delivering roughly 10-100x the throughput of a CPU implementation.

Core Architecture: WebGL Image Processing Pipeline

Source Image → texImage2D Upload Texture → Fragment Shader Parallel Compute → readPixels Readback → Result ImageData

Initialize WebGL Context

const canvas = document.createElement('canvas');
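// Prefer WebGL 2 and fall back to WebGL 1. No non-null assertion here, so the check below stays meaningful.
const gl = canvas.getContext('webgl2') ?? canvas.getContext('webgl');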

if (!gl) throw new Error('WebGL not available');
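
The fallback logic can also be pulled into a small helper that reports which context type was obtained. The following is a minimal sketch; pickContext and its injected getContext callback are hypothetical names chosen so the selection logic can run without a browser.

```typescript
// Hypothetical helper: `getContext` is injected so the selection logic can be
// exercised without a DOM. In a page, the callback would wrap canvas.getContext.
type ContextId = 'webgl2' | 'webgl';

interface PickedContext<T> {
  id: ContextId;
  context: T;
}

function pickContext<T>(
  getContext: (id: ContextId) => T | null,
  preferred: readonly ContextId[] = ['webgl2', 'webgl'],
): PickedContext<T> {
  // Try each context type in order of preference and keep the first that works
  for (const id of preferred) {
    const context = getContext(id);
    if (context !== null) return { id, context };
  }
  throw new Error('WebGL not available');
}
```

Knowing the returned id is useful when a later code path wants WebGL 2 features while still running on WebGL 1.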

Compile and Link Shaders

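function compileShader(gl: WebGLRenderingContext, type: number, src: string) {
  const shader = gl.createShader(type)!;
  gl.shaderSource(shader, src);
  gl.compileShader(shader);
  // Compilation fails silently unless the status is checked explicitly
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(`Shader compile failed: ${gl.getShaderInfoLog(shader)}`);
  }
  return shader;
}

function createShaderProgram(gl: WebGLRenderingContext, vertSrc: string, fragSrc: string) {
  const program = gl.createProgram()!;
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, vertSrc));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, fragSrc));
  gl.linkProgram(program);
  // Linking can also fail, for example when varyings do not match between stages
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error(`Program link failed: ${gl.getProgramInfoLog(program)}`);
  }
  return program;
}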
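
The fragment shaders in this article read a v_texCoord varying, so the vertex shader has to supply it. A minimal sketch of a pass-through vertex shader and a full-screen quad might look like this; the attribute name a_position and both constant names are assumptions, not something the article defines.

```typescript
// Assumed names: a_position is hypothetical; v_texCoord matches the varying
// read by the fragment shaders in this article.
const QUAD_VERTEX_SHADER = `
attribute vec2 a_position;
varying vec2 v_texCoord;

void main() {
  // Map clip space [-1, 1] to texture space [0, 1]
  v_texCoord = a_position * 0.5 + 0.5;
  gl_Position = vec4(a_position, 0.0, 1.0);
}
`;

// Two triangles covering the whole clip-space square: 6 vertices, 2 floats each
const QUAD_POSITIONS = new Float32Array([
  -1, -1,   1, -1,   -1, 1,
  -1,  1,   1, -1,    1, 1,
]);
```

The positions would typically be uploaded once with gl.bufferData(gl.ARRAY_BUFFER, QUAD_POSITIONS, gl.STATIC_DRAW) and bound to a_position with gl.vertexAttribPointer using 2 components of type gl.FLOAT.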

Texture Upload: Image to GPU

const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);

// Upload from an ImageBitmap (already decoded, so typically faster than HTMLImageElement)
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, imageBitmap);

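Performance tip: Call createImageBitmap() and upload the resulting ImageBitmap. This is often 2-3x faster than uploading an Image element directly, because the image is already decoded by the time texImage2D runs, and the decode can happen off the main thread.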

Fragment Shader: GPU Parallel Pixel Computation

Gaussian Blur (5×5 Convolution Kernel)

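precision mediump float;
uniform sampler2D u_image;
uniform vec2 u_textureSize;
varying vec2 v_texCoord;

// One row of the 5-tap binomial kernel (1 4 6 4 1), a standard Gaussian approximation
float weight(int i) {
  if (i == 0) return 6.0;
  if (i == 1 || i == -1) return 4.0;
  return 1.0;
}

void main() {
  vec2 onePixel = vec2(1.0) / u_textureSize;
  vec4 sum = vec4(0.0);

  for (int x = -2; x <= 2; x++) {
    for (int y = -2; y <= 2; y++) {
      vec2 offset = vec2(float(x), float(y)) * onePixel;
      // Uniform weights would give a box blur; the 2D weight is the product of the row weights
      sum += texture2D(u_image, v_texCoord + offset) * weight(x) * weight(y);
    }
  }

  // (1 + 4 + 6 + 4 + 1)^2 = 256 normalizes the kernel
  gl_FragColor = sum / 256.0;
}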
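
A Gaussian kernel weights each sample by its distance from the center, and the weights must sum to 1 so the image keeps its brightness. As a rough sketch, the weights for an arbitrary radius could be computed on the CPU like this; gaussianKernel2D and its sigma parameter are assumptions, since the article does not specify a sigma.

```typescript
// Builds a normalized (2 * radius + 1)^2 Gaussian kernel in row-major order.
// `sigma` is an assumed parameter; the article does not specify one.
function gaussianKernel2D(radius: number, sigma: number): Float32Array {
  const size = 2 * radius + 1;
  const row = new Array<number>(size);
  let rowSum = 0;
  for (let i = 0; i < size; i++) {
    const d = i - radius;
    row[i] = Math.exp(-(d * d) / (2 * sigma * sigma));
    rowSum += row[i];
  }
  // The 2D kernel is separable: each entry is the product of two normalized row weights
  const kernel = new Float32Array(size * size);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      kernel[y * size + x] = (row[x] / rowSum) * (row[y] / rowSum);
    }
  }
  return kernel;
}
```

The resulting 25 values for radius 2 could be handed to a shader through gl.uniform1fv, assuming the shader declares a matching float array uniform.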

Color Space Transform (Grayscale + Contrast Enhancement)

precision mediump float;
uniform sampler2D u_image;
uniform float u_contrast;
varying vec2 v_texCoord;

void main() {
  vec4 color = texture2D(u_image, v_texCoord);
  float gray = dot(color.rgb, vec3(0.299, 0.587, 0.114));
  gray = (gray - 0.5) * u_contrast + 0.5;
  gl_FragColor = vec4(vec3(gray), color.a);
}
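
When debugging a shader like this, it can help to have a CPU reference for the same per-pixel math. The following is a minimal sketch under the assumption that channels are normalized to [0, 1]; grayContrast is a hypothetical name.

```typescript
// CPU reference for the shader's per-pixel math. Channels are in [0, 1].
function grayContrast(r: number, g: number, b: number, contrast: number): number {
  // Same luma weights as the shader (the BT.601 coefficients)
  const gray = 0.299 * r + 0.587 * g + 0.114 * b;
  // Scale the distance from mid-gray, then shift back
  const adjusted = (gray - 0.5) * contrast + 0.5;
  // The GPU clamps the output to [0, 1] when writing to an 8-bit target
  return Math.min(1, Math.max(0, adjusted));
}
```

A contrast of 1 leaves the grayscale value unchanged, values above 1 push pixels away from mid-gray, and values below 1 pull them toward it.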

Sobel Edge Detection

precision mediump float;
uniform sampler2D u_image;
uniform vec2 u_textureSize;
varying vec2 v_texCoord;

// Average of the RGB channels at a pixel offset from the current fragment
float lum(vec2 px, float dx, float dy) {
  return dot(texture2D(u_image, v_texCoord + vec2(dx, dy) * px).rgb, vec3(1.0 / 3.0));
}

void main() {
  vec2 px = vec2(1.0) / u_textureSize;
  // All eight neighbors; the center pixel has zero weight in both kernels
  float tl = lum(px, -1.0, -1.0);
  float tc = lum(px,  0.0, -1.0);
  float tr = lum(px,  1.0, -1.0);
  float ml = lum(px, -1.0,  0.0);
  float mr = lum(px,  1.0,  0.0);
  float bl = lum(px, -1.0,  1.0);
  float bc = lum(px,  0.0,  1.0);
  float br = lum(px,  1.0,  1.0);

  // Horizontal gradient: right column minus left column
  float gx = (tr + 2.0 * mr + br) - (tl + 2.0 * ml + bl);
  // Vertical gradient: bottom row minus top row
  float gy = (bl + 2.0 * bc + br) - (tl + 2.0 * tc + tr);
  float edge = sqrt(gx * gx + gy * gy);

  gl_FragColor = vec4(vec3(edge), 1.0);
}
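
The Sobel operator combines a horizontal and a vertical 3×3 gradient kernel and reports the gradient magnitude. A CPU reference, sketched below under the assumption of a row-major grayscale buffer in [0, 1], can be used to check shader output on small test images; sobelMagnitude is a hypothetical name.

```typescript
// CPU reference Sobel on a grayscale image stored row-major in [0, 1].
// Out-of-range coordinates are clamped, mirroring CLAMP_TO_EDGE sampling.
function sobelMagnitude(gray: Float32Array, width: number, height: number): Float32Array {
  const out = new Float32Array(width * height);
  const at = (x: number, y: number): number => {
    const cx = Math.min(width - 1, Math.max(0, x));
    const cy = Math.min(height - 1, Math.max(0, y));
    return gray[cy * width + cx];
  };
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Right column minus left column, weighted 1-2-1
      const gx =
        at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1) -
        (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1));
      // Bottom row minus top row, weighted 1-2-1
      const gy =
        at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1) -
        (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1));
      out[y * width + x] = Math.sqrt(gx * gx + gy * gy);
    }
  }
  return out;
}
```

On an image whose left half is black and right half is white, only the pixels next to the boundary produce a nonzero response.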

Readback: GPU to CPU

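// Reads from the currently bound framebuffer, so bind the one holding the result first
const pixels = new Uint8Array(width * height * 4);
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);

// Wraps the same buffer without copying. Note that readPixels returns rows bottom-up,
// whereas ImageData expects them top-down.
const imageData = new ImageData(new Uint8ClampedArray(pixels.buffer), width, height);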

Note: readPixels is a synchronous GPU→CPU readback that stalls until all pending GPU work has finished, which usually makes it the slowest step in the pipeline. Call it only when you need the result on the CPU, and avoid reading back every frame.
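
One detail to watch: readPixels returns rows starting from the bottom of the framebuffer, while ImageData stores rows top-down. Depending on how the texture was uploaded and how the quad's texture coordinates are set up, the result may come back vertically flipped. A minimal sketch of a row flip, with flipRows as a hypothetical helper name:

```typescript
// Returns a copy of an RGBA pixel buffer with its rows in reverse order.
function flipRows(pixels: Uint8Array, width: number, height: number): Uint8ClampedArray {
  const rowBytes = width * 4; // 4 bytes per RGBA pixel
  const flipped = new Uint8ClampedArray(pixels.length);
  for (let y = 0; y < height; y++) {
    const src = y * rowBytes;
    const dst = (height - 1 - y) * rowBytes;
    // Copy row y of the input into row (height - 1 - y) of the output
    flipped.set(pixels.subarray(src, src + rowBytes), dst);
  }
  return flipped;
}
```

The returned array can be passed to the ImageData constructor in place of the unflipped buffer. Whether the flip is needed at all is worth checking against a test image first.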

Performance Comparison: WebGL vs Canvas 2D

Benchmark: 5×5 Gaussian blur on a 4096×4096 image:

Metric	Canvas 2D (ImageData)	WebGL Fragment Shader
Processing time	~320ms	~8ms
Throughput	~52 MP/s	~2100 MP/s
Main thread blocking	320ms	8ms (+ readPixels 15ms)
Memory usage	64MB (ImageData)	64MB (texture) + 64MB (readback buffer)

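In this benchmark, WebGL is about 40x faster for the convolution itself (320ms vs 8ms). Once the 15ms readPixels readback is included, the end-to-end advantage is closer to 14x.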
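
The throughput and speedup figures follow directly from the timings in the table. As a quick check of the arithmetic (the function names are hypothetical, and the timings are the ones quoted above):

```typescript
// Megapixels processed per second for an image of the given size
function throughputMPs(width: number, height: number, ms: number): number {
  const megapixels = (width * height) / 1e6;
  return megapixels / (ms / 1000);
}

// How many times faster the accelerated path is than the baseline
function speedup(baselineMs: number, acceleratedMs: number): number {
  return baselineMs / acceleratedMs;
}

const canvas2dMPs = throughputMPs(4096, 4096, 320); // about 52 MP/s
const webglMPs = throughputMPs(4096, 4096, 8);      // about 2100 MP/s
const shaderOnly = speedup(320, 8);                 // 40
const withReadback = speedup(320, 8 + 15);          // about 13.9
```

The gap between the last two numbers is the reason to avoid readback when the result only needs to be displayed.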

Multi-Pass Rendering: Chained Filters

Complex effects require multiple passes (e.g., blur → edge detection → composite):

function renderPass(gl: WebGLRenderingContext, program: WebGLProgram, inputTexture: WebGLTexture) {
  gl.useProgram(program);
  // Assumes texture unit 0 is active and the quad's vertex attributes are already set up
  gl.bindTexture(gl.TEXTURE_2D, inputTexture);
  // 6 vertices = two triangles covering the full viewport
  gl.drawArrays(gl.TRIANGLES, 0, 6);
}

// Pass 1: Gaussian blur → write to FBO texture
gl.bindFramebuffer(gl.FRAMEBUFFER, blurFBO);
renderPass(gl, blurProgram, sourceTexture);

// Pass 2: Edge detection reads blurTexture (the color attachment of blurFBO) → write to Canvas
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
renderPass(gl, edgeProgram, blurTexture);

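Use Framebuffer Objects (FBOs) to render intermediate results into textures; each pass then samples the texture written by the previous one. A texture cannot be read and written in the same pass, so chains longer than two passes need at least two intermediate textures.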
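
For longer chains, a common arrangement (often called ping-pong rendering, and not something this article prescribes) alternates between two intermediate targets. The sketch below only plans which target each pass reads and writes; planPasses and its types are hypothetical, and the actual binding would still go through gl.bindFramebuffer and gl.bindTexture.

```typescript
// Index 0/1 identifies an FBO and its attached texture; 'source' is the
// uploaded image; 'canvas' is the default framebuffer (bound by passing null).
type PassInput = 'source' | 0 | 1;
type PassOutput = 'canvas' | 0 | 1;

interface PassPlan {
  read: PassInput;
  write: PassOutput;
}

function planPasses(passCount: number): PassPlan[] {
  const plan: PassPlan[] = [];
  for (let i = 0; i < passCount; i++) {
    // The first pass reads the source; later passes read what the previous pass wrote
    const read: PassInput = i === 0 ? 'source' : (((i - 1) % 2) as 0 | 1);
    // The last pass writes to the canvas; earlier passes alternate between targets
    const write: PassOutput = i === passCount - 1 ? 'canvas' : ((i % 2) as 0 | 1);
    plan.push({ read, write });
  }
  return plan;
}
```

With two passes, the plan reduces to the blur-then-edge example above: the source is rendered into target 0, and target 0 is rendered to the canvas.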

Practical Use Cases

These ToolsKu tools can benefit from WebGL acceleration:

Image Compress: Color space transform + downsampling via GPU parallelism
Image Resize: Bilinear/bicubic interpolation via Fragment Shader
Image Transform: Rotation, flip via vertex shader matrix transforms
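
For the rotation and flip case, the vertex shader would multiply each quad position by a small matrix passed as a uniform. A minimal sketch of building such matrices follows; the function names are hypothetical, and the column-major layout is the one gl.uniformMatrix3fv expects.

```typescript
// Column-major 3x3 rotation matrix (counter-clockwise for positive angles)
function rotationMatrix(radians: number): Float32Array {
  const c = Math.cos(radians);
  const s = Math.sin(radians);
  return new Float32Array([c, s, 0, -s, c, 0, 0, 0, 1]);
}

// Column-major 3x3 matrix that mirrors the X and/or Y axis
function flipMatrix(horizontal: boolean, vertical: boolean): Float32Array {
  return new Float32Array([horizontal ? -1 : 1, 0, 0, 0, vertical ? -1 : 1, 0, 0, 0, 1]);
}

// Applies a column-major 3x3 matrix to a 2D point, as the vertex shader would
function transformPoint(m: Float32Array, x: number, y: number): [number, number] {
  return [m[0] * x + m[3] * y + m[6], m[1] * x + m[4] * y + m[7]];
}
```

Rotating in clip space like this assumes a square viewport; for non-square images, an aspect-ratio correction would be needed as well.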

Common Questions

WebGL or WebGPU?

WebGPU is WebGL's successor, offering compute shaders and lower CPU overhead. However, WebGL is still more widely supported (roughly 95%+ of browsers in use versus 70%+ for WebGPU), making it the safer choice for GPU acceleration today.

readPixels is too slow—what to do?

If the result only needs to be displayed on the page, render directly to the canvas and skip readPixels entirely. Read back only when you need to export ImageData or a Blob.

How to handle large images?

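GPU textures have size limits (typically 4096×4096 or 8192×8192; query the actual limit with gl.getParameter(gl.MAX_TEXTURE_SIZE)). For oversized images, use tiled processing: split the image into multiple textures, process each separately, then stitch the results. For convolution filters, tiles need to overlap by the kernel radius so that pixels near a tile edge still see their real neighbors.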
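
The tile layout can be planned up front on the CPU. The following is a rough sketch, assuming a fixed overlap margin on every side; planTiles and the Tile shape are hypothetical, and maxSize would come from the texture size limit of the device.

```typescript
interface Tile {
  // Region to upload (includes the overlap margin), in source-image pixels
  srcX: number; srcY: number; srcW: number; srcH: number;
  // Region of the output this tile is responsible for
  outX: number; outY: number; outW: number; outH: number;
}

function planTiles(width: number, height: number, maxSize: number, overlap: number): Tile[] {
  // Usable pixels per tile once the margin is reserved on both sides
  const core = maxSize - 2 * overlap;
  if (core <= 0) throw new Error('overlap too large for maxSize');
  const tiles: Tile[] = [];
  for (let outY = 0; outY < height; outY += core) {
    for (let outX = 0; outX < width; outX += core) {
      const outW = Math.min(core, width - outX);
      const outH = Math.min(core, height - outY);
      // Extend the uploaded region by the overlap, clamped to the image bounds
      const srcX = Math.max(0, outX - overlap);
      const srcY = Math.max(0, outY - overlap);
      const srcRight = Math.min(width, outX + outW + overlap);
      const srcBottom = Math.min(height, outY + outH + overlap);
      tiles.push({ srcX, srcY, srcW: srcRight - srcX, srcH: srcBottom - srcY, outX, outY, outW, outH });
    }
  }
  return tiles;
}
```

For a 5×5 kernel the overlap would be 2 pixels. After processing, only the output region of each tile is copied into the final image, and the margin is discarded.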

Summary

WebGL fragment shaders move image processing from serial CPU code to massively parallel GPU execution, achieving 10-100x speedups for convolution and color transform operations. Mastering the full pipeline (texture upload → shader computation → readPixels readback) along with FBO multi-pass rendering is essential for building high-performance browser-side image processing tools.