WebCodecs API in Practice: Low-Latency Audio/Video Encoding and Decoding in the Browser

Technical Architecture (Updated Jun 14, 2026)

The Evolution of Browser Audio/Video Processing

Approach | Latency | Frame-Level Control | Codec Access | Best For
<video> + MSE | 2-5s | | Container format | Playback, streaming
FFmpeg.wasm | 1-3s | | Full | Transcoding (but 30MB+ size)
WebCodecs | <100ms | | Native | Real-time processing, low-latency transcoding

WebCodecs provides direct access to the browser's built-in codecs, so no WASM codec bundle is needed and, going by the figures above, latency is roughly 10-50x lower.


Core Object Overview

VideoFrame  →  VideoEncoder  →  EncodedVideoChunk
                                    ↓ Network / Storage
AudioData   →  AudioEncoder  →  EncodedAudioChunk
                                    ↓
VideoFrame  ←  VideoDecoder  ←  EncodedVideoChunk
AudioData   ←  AudioDecoder  ←  EncodedAudioChunk

VideoFrame: Zero-Copy Frame Operations

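// Create a VideoFrame from a canvas; timestamp is in microseconds.
// The browser can often avoid a pixel copy here, but zero-copy is not guaranteed.
const frame = new VideoFrame(canvas, { timestamp: performance.now() * 1000 });

// Create from an ImageBitmap
const bitmap = await createImageBitmap(videoElement);
const frame2 = new VideoFrame(bitmap, { timestamp: 0 });

// Access frame metadata. VideoFrame has no width/height properties; use
// displayWidth/displayHeight (or codedWidth/codedHeight). duration is null unless set.
console.log(frame.displayWidth, frame.displayHeight, frame.duration, frame.timestamp);

// MUST close every frame when done (otherwise its memory stays allocated)
frame.close();
frame2.close();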

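Critical: a VideoFrame holds a reference to its pixel data, which is often a GPU texture. You MUST call close() as soon as you are done with it. Garbage collection does not release frames promptly, so unclosed frames leak memory and can stall a decoder that is waiting for frames to be returned.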
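
One way to make the close() rule hard to forget is to wrap each use in try/finally. The sketch below is a minimal illustration, not part of WebCodecs: `withClosable` and the `Closable` interface are hypothetical names, and the helper only suits synchronous work, because it closes the resource as soon as the callback returns.

```typescript
// Anything with a close() method qualifies, e.g. VideoFrame or AudioData.
interface Closable {
  close(): void;
}

// Hypothetical helper: runs `use` and always closes the resource afterwards,
// even when `use` throws. Intended for synchronous callbacks only.
function withClosable<T extends Closable, R>(resource: T, use: (r: T) => R): R {
  try {
    return use(resource);
  } finally {
    resource.close();
  }
}

// Browser usage (assumes `canvas` and a 2D context `ctx` already exist):
//   withClosable(new VideoFrame(canvas, { timestamp: 0 }), (f) => ctx.drawImage(f, 0, 0));
```

The same pattern applies to AudioData, which also has to be closed explicitly.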


VideoEncoder: Video Encoding

const encoder = new VideoEncoder({
  output: (chunk, metadata) => {
    const data = new Uint8Array(chunk.byteLength);
    chunk.copyTo(data);
    handleEncodedChunk(chunk.type, data, chunk.timestamp, metadata);
  },
  error: (e) => console.error('Encode error:', e),
});

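encoder.configure({
  // H.264 High Profile (0x64), Level 4.0 (0x28). The 'avc1.64001E' string is
  // Level 3.0, which does not cover 1920x1080.
  codec: 'avc1.640028',
  width: 1920,
  height: 1080,
  bitrate: 5_000_000, // 5 Mbps
  framerate: 30,
});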

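// Encode a frame. keyFrame: true forces a key frame, so request it for the
// first frame and at periodic intervals rather than on every call.
encoder.encode(frame, { keyFrame: true });
frame.close();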
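
Codec support varies by browser and hardware, so it is worth probing a configuration before calling configure(). The following is a minimal sketch under stated assumptions: `pickSupportedConfig`, `EncoderConfigLike`, and `SupportCheck` are hypothetical names introduced here, and the support check is passed in as a function so the logic can run outside a browser. In a browser the check would be `VideoEncoder.isConfigSupported`, which resolves to an object with a `supported` flag.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
interface EncoderConfigLike {
  codec: string;
  width: number;
  height: number;
  bitrate?: number;
  framerate?: number;
}
type SupportCheck = (config: EncoderConfigLike) => Promise<{ supported?: boolean }>;

// Returns the first candidate the check reports as supported, or null.
async function pickSupportedConfig(
  candidates: EncoderConfigLike[],
  check: SupportCheck,
): Promise<EncoderConfigLike | null> {
  for (const candidate of candidates) {
    try {
      const { supported } = await check(candidate);
      if (supported) return candidate;
    } catch {
      // A malformed config makes the check reject; try the next candidate.
    }
  }
  return null;
}

// Browser usage:
//   const cfg = await pickSupportedConfig(candidates, (c) => VideoEncoder.isConfigSupported(c));
//   if (cfg) encoder.configure(cfg);
```

Ordering the candidates from most to least preferred gives a simple fallback chain.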

Supported Codecs

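Codec | Codec String | Characteristics
H.264 | avc1.64001E | Widest compatibility (this string is High Profile, Level 3.0)
H.265 | hev1.1.6.L93.B0 | Roughly 40% better compression than H.264, limited browser support
VP8 | vp8 | WebM format
VP9 | vp09.00.10.08 | YouTube default (the bare 'vp9' is not a fully qualified WebCodecs codec string)
AV1 | av01.0.01M.08 | Next-gen, highest compression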
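
The H.264 codec string packs the profile and level into hex digits: `avc1.PPCCLL`, where PP is profile_idc, CC holds the constraint flags, and LL is level_idc (ten times the level number). The sketch below decodes that layout. `parseAvcCodecString` is a hypothetical helper written for illustration, and it ignores edge cases such as Level 1b.

```typescript
interface AvcCodecInfo {
  profile: string;
  level: string;
}

// Common profile_idc values; others are reported by number.
const AVC_PROFILES: Record<number, string | undefined> = {
  66: "Baseline",
  77: "Main",
  88: "Extended",
  100: "High",
};

// Decodes 'avc1.PPCCLL' (or 'avc3.PPCCLL'); returns null for anything else.
function parseAvcCodecString(codec: string): AvcCodecInfo | null {
  const match = /^avc[13]\.([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$/.exec(codec);
  if (!match) return null;
  const profileIdc = parseInt(match[1], 16);
  const levelIdc = parseInt(match[3], 16);
  return {
    profile: AVC_PROFILES[profileIdc] ?? `profile_idc ${profileIdc}`,
    level: (levelIdc / 10).toFixed(1),
  };
}
```

For example, 'avc1.64001E' decodes to High Profile, Level 3.0, while 1080p at 30fps needs at least Level 4.0, i.e. 'avc1.640028'.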

VideoDecoder: Video Decoding

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0);
    frame.close(); // Must close
  },
  error: (e) => console.error('Decode error:', e),
});

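decoder.configure({
  codec: 'avc1.640028', // must match the stream; High Profile, Level 4.0 covers 1080p
  codedWidth: 1920,
  codedHeight: 1080,
  // For H.264 demuxed from MP4 (AVCC format), also pass `description`
  // containing the avcC configuration record bytes.
});

const chunk = new EncodedVideoChunk({
  type: isKeyFrame ? 'key' : 'delta',
  timestamp: timestamp, // microseconds
  data: encodedData,
});
decoder.decode(chunk);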
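
A decoder that has just been configured (or flushed) needs a key frame first; passing a delta chunk at that point makes decode() throw. When joining a stream mid-way, a small filter like the one below can discard leading delta chunks. It is a sketch: `skipToFirstKeyFrame` and `ChunkLike` are hypothetical names, and the structural type stands in for EncodedVideoChunk so the code runs anywhere.

```typescript
// Structural stand-in for EncodedVideoChunk (only the fields used here).
interface ChunkLike {
  type: "key" | "delta";
  timestamp: number; // microseconds
}

// Drops everything before the first key frame; returns [] if there is none.
function skipToFirstKeyFrame<T extends ChunkLike>(chunks: T[]): T[] {
  const firstKey = chunks.findIndex((c) => c.type === "key");
  return firstKey === -1 ? [] : chunks.slice(firstKey);
}

// Browser usage:
//   for (const c of skipToFirstKeyFrame(chunks)) decoder.decode(c);
```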

AudioEncoder / AudioDecoder

const audioEncoder = new AudioEncoder({
  output: (chunk) => {
    const data = new Uint8Array(chunk.byteLength);
    chunk.copyTo(data);
    handleAudioChunk(data, chunk.timestamp);
  },
  error: (e) => console.error(e),
});

audioEncoder.configure({
  codec: 'mp4a.40.2', // AAC-LC; the bare string 'aac' is not a valid codec string
  sampleRate: 48000,
  numberOfChannels: 2,
  bitrate: 128000, // 128 kbps
});

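const audioData = new AudioData({
  format: 'f32-planar', // each channel's samples are stored contiguously
  sampleRate: 48000,
  numberOfFrames: 1024,
  numberOfChannels: 2,
  timestamp: 0, // microseconds
  data: float32Array, // 1024 frames x 2 channels = 2048 samples
});
audioEncoder.encode(audioData);
audioData.close();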
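
The 'f32-planar' format expects all samples of channel 0 first, then all samples of channel 1, and so on. Audio from other sources is often interleaved (L0, R0, L1, R1, ...), so a conversion step may be needed before building the AudioData. A minimal sketch, with `interleavedToPlanar` as a hypothetical helper name:

```typescript
// Converts interleaved samples (L0, R0, L1, R1, ...) into planar layout
// (L0, L1, ..., R0, R1, ...), as expected by the 'f32-planar' format.
function interleavedToPlanar(interleaved: Float32Array, numberOfChannels: number): Float32Array {
  if (numberOfChannels <= 0 || interleaved.length % numberOfChannels !== 0) {
    throw new RangeError("sample count must be a multiple of the channel count");
  }
  const numberOfFrames = interleaved.length / numberOfChannels;
  const planar = new Float32Array(interleaved.length);
  for (let frame = 0; frame < numberOfFrames; frame++) {
    for (let ch = 0; ch < numberOfChannels; ch++) {
      planar[ch * numberOfFrames + frame] = interleaved[frame * numberOfChannels + ch];
    }
  }
  return planar;
}
```

If the source is already interleaved, using format 'f32' instead avoids the conversion entirely.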

Practice: Video Format Transcoding Pipeline

The core transcoding flow for ToolsKu's Video Convert:

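async function transcodeVideo(inputChunks: EncodedVideoChunk[], config: TranscodeConfig) {
  // inputConfig (a VideoDecoderConfig) and muxer are assumed to exist in the enclosing scope.
  // Throwing inside a codec callback does not reach the caller, so record the error instead.
  let failure: unknown = null;

  // 1. Set up the encoder for the target format
  const encoder = new VideoEncoder({
    output: (chunk) => muxer.addChunk(chunk),
    error: (e) => { failure = e; },
  });
  encoder.configure(config.outputCodec);

  // 2. Decode the source video and hand each frame straight to the encoder.
  //    Buffering every decoded frame first would exhaust memory and can stall the decoder.
  const decoder = new VideoDecoder({
    output: (frame) => {
      if (encoder.state === 'configured') encoder.encode(frame);
      frame.close();
    },
    error: (e) => { failure = e; },
  });
  decoder.configure(inputConfig);

  for (const chunk of inputChunks) {
    decoder.decode(chunk);
  }

  // 3. Drain both codecs: the decoder first, so every frame has reached the encoder
  await decoder.flush();
  await encoder.flush();
  if (failure) throw failure;

  return muxer.finalize();
}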
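
For long inputs, feeding work faster than the encoder can process it grows its internal queue. WebCodecs exposes `encodeQueueSize` and a `dequeue` event for this, and the sketch below waits until the queue drops below a limit. `waitForQueueBelow` and `QueueLike` are hypothetical names, the structural interface stands in for VideoEncoder, and any threshold you pick is a tuning assumption rather than a recommended value.

```typescript
// Structural stand-in for VideoEncoder (only the members used here).
interface QueueLike {
  readonly encodeQueueSize: number;
  addEventListener(type: "dequeue", listener: () => void, options?: { once?: boolean }): void;
}

// Resolves once encodeQueueSize is below `limit`, re-checking on each
// 'dequeue' event instead of polling on a timer.
function waitForQueueBelow(encoder: QueueLike, limit: number): Promise<void> {
  return new Promise((resolve) => {
    const check = () => {
      if (encoder.encodeQueueSize < limit) {
        resolve();
      } else {
        encoder.addEventListener("dequeue", check, { once: true });
      }
    };
    check();
  });
}

// Usage inside the decode loop:
//   await waitForQueueBelow(encoder, 8);
//   decoder.decode(chunk);
```

The decoder has an equivalent `decodeQueueSize`, so the same idea can throttle either side of the pipeline.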

Comparison with MSE: Low-Latency Scenarios

Metric | MSE (MediaSource) | WebCodecs
First-frame latency | 2-5s (buffering strategy) | <100ms
Frame-level control | ❌ Only appendBuffer | ✅ Per-frame encode/decode
Real-time processing | ❌ Needs extra muxing | ✅ Native support
Memory usage | High (buffer queue) | Low (per-frame processing)
Best for | Video playback | Real-time communication, transcoding, analysis

Performance Benchmark: Video Compression

H.264→H.265 transcoding of 1080p 30fps 60s video:

Approach | Transcode Time | Output Size | Latency
FFmpeg.wasm | 180s | 12MB | 3s (WASM init)
WebCodecs | 45s | 12MB | <100ms

WebCodecs leverages native browser codecs: in this test it was 4x faster (45s vs 180s), with no 30MB+ WASM download needed.
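
The benchmark figures can be turned into throughput numbers with a little arithmetic. The sketch below uses only the values from the table (60s clip, 30fps, 12MB output); the function and constant names are illustrative, and 1MB is taken as 10^6 bytes.

```typescript
const CLIP_SECONDS = 60;
const CLIP_FPS = 30;
const OUTPUT_MEGABYTES = 12;

// Frames processed per second of wall-clock time.
function throughputFps(transcodeSeconds: number): number {
  return (CLIP_SECONDS * CLIP_FPS) / transcodeSeconds;
}

// Values above 1 mean faster than real time.
function realtimeFactor(transcodeSeconds: number): number {
  return CLIP_SECONDS / transcodeSeconds;
}

const ffmpegFps = throughputFps(180);     // 1800 frames / 180s = 10 fps
const webcodecsFps = throughputFps(45);   // 1800 frames / 45s = 40 fps
const speedup = webcodecsFps / ffmpegFps; // 4
const webcodecsRealtime = realtimeFactor(45); // about 1.33x real time
const averageOutputMbps = (OUTPUT_MEGABYTES * 8) / CLIP_SECONDS; // 1.6 Mbps
```

So in this test WebCodecs transcodes faster than real time, while FFmpeg.wasm runs at about a third of real time.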


Common Questions

What container formats does WebCodecs support?

WebCodecs only handles raw encoded chunks, not container muxing. You need libraries like mp4-muxer or webm-muxer to package into MP4/WebM.

How to get encoded video data from a file?

Use an MP4 demuxer (e.g., mp4box.js) to demux the MP4 file and extract EncodedVideoChunks to feed into the Decoder.

Can VideoFrame be transferred across Workers?

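Yes. VideoFrame is a transferable object: pass it in the transfer list, e.g. worker.postMessage(frame, [frame]), to move it to a Worker without copying the pixel data. After the transfer, the sender's frame is detached and can no longer be used, and the receiving side becomes responsible for calling close().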


Summary

The WebCodecs API gives browsers direct access to audio/video codecs, enabling zero-copy frame processing and <100ms latency encoding/decoding. VideoEncoder/Decoder handle video, AudioEncoder/Decoder handle audio, and VideoFrame/AudioData provide frame-level operations. Combined with demuxer/muxer libraries, you can build complete low-latency video transcoding pipelines entirely in the browser.

