pdf-lib Architecture: Pure JavaScript PDF Create, Edit, and Merge

Why pdf-lib?

The table below compares browser-capable PDF libraries on bundle size, core capabilities, font support, and maintenance:

Library	Size	Create	Edit	Merge	Fonts	Maintenance
pdf-lib	~350KB	✅	✅	✅	✅	Active
pdfjs-dist	~2MB	❌	❌	❌	❌	Mozilla
jsPDF	~300KB	✅	❌	❌	Partial	Active
PDFKit	~1MB	✅	❌	❌	✅	Node-first

Of the libraries compared here, pdf-lib is the only one that can both create new PDFs and modify existing ones in the browser.

PDF File Format Basics

Internal structure

%PDF-1.7                          ← version header
1 0 obj                           ← object 1
  << /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj                           ← object 2
  << /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj                           ← object 3
  << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>
endobj
xref                               ← cross-reference table
0 4
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
trailer
  << /Size 4 /Root 1 0 R >>
startxref
190
%%EOF

Core concepts

Concept	Description
Indirect object	`1 0 obj ... endobj`, referenced by number
Dictionary	`<< /Key /Value >>`, like a JSON object
Stream	`stream ... endstream`, binary data (often FlateDecode compressed)
Cross-reference (xref)	Byte offset per object for O(1) random access
Page tree	Nested `/Pages` nodes forming a tree
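
To make the cross-reference idea concrete, here is a minimal sketch that reads a classic xref table like the one in the listing above. It assumes a single subsection and a non-compressed (non-stream) xref. The names `XrefEntry` and `parseXrefTable` are hypothetical and not part of pdf-lib.

```typescript
// Hypothetical helper, not part of pdf-lib: parses a classic xref table
// with a single subsection out of a PDF given as a string.
interface XrefEntry {
  objectNumber: number;
  offset: number;
  generation: number;
  inUse: boolean;
}

function parseXrefTable(pdf: string): XrefEntry[] {
  const lines = pdf.split(/\r?\n/).map((line) => line.trim());
  // Exact match, so the later "startxref" line is not picked up
  const start = lines.indexOf('xref');
  if (start === -1) throw new Error('no classic xref table found');

  // Subsection header: "<first object number> <entry count>"
  const [first, count] = lines[start + 1].split(/\s+/).map(Number);

  const entries: XrefEntry[] = [];
  for (let i = 0; i < count; i++) {
    // Entry: 10-digit offset, 5-digit generation, then "n" (in use) or "f" (free)
    const match = /^(\d{10}) (\d{5}) ([nf])$/.exec(lines[start + 2 + i]);
    if (!match) throw new Error(`malformed xref entry at index ${i}`);
    entries.push({
      objectNumber: first + i,
      offset: Number(match[1]),
      generation: Number(match[2]),
      inUse: match[3] === 'n',
    });
  }
  return entries;
}

const sample = [
  '%PDF-1.7',
  'xref',
  '0 4',
  '0000000000 65535 f',
  '0000000009 00000 n',
  '0000000058 00000 n',
  '0000000115 00000 n',
  'trailer',
  '  << /Size 4 /Root 1 0 R >>',
  'startxref',
  '190',
  '%%EOF',
].join('\n');

const xrefEntries = parseXrefTable(sample);
console.log(xrefEntries[1].offset); // 9: object 1 starts right after the header line
```

With the table parsed, a reader can jump straight to any object's byte offset without scanning the file, which is where the O(1) access comes from.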

pdf-lib Architecture

Module layers

PDFDocument (top-level API)
  ├── PDFPage (page operations)
  ├── PDFFont (fonts)
  ├── PDFImage (images)
  └── PDFCatalog (document structure)
       └── PDFContext (low-level object model)
            ├── PDFObject
            ├── PDFDict
            ├── PDFStream
            ├── PDFRef
            └── PDFCrossRefSection

Object model

pdf-lib centers on PDFContext, which holds the document object graph:

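// Simplified sketch of PDFContext (not the full class)
declare class PDFContext {
  // Registry of all indirect objects, keyed by reference
  readonly indirectObjects: Map<PDFRef, PDFObject>;

  // Allocate the next free object number
  nextRef(): PDFRef;

  // Store an object under an existing reference
  assign(ref: PDFRef, object: PDFObject): void;

  // Allocate a new reference and store the object under it
  register(object: PDFObject): PDFRef;

  // Lookup (undefined if the reference is not registered)
  lookup(ref: PDFRef): PDFObject | undefined;

  // Delete; returns whether an object was removed
  delete(ref: PDFRef): boolean;
}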

How PDF merge works

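// Simplified from PDFDocument.copyPages (argument checks and flushing omitted)
async copyPages(srcDoc: PDFDocument, indices: number[]): Promise<PDFPage[]> {
  // One copier per call: it tracks which source objects were already copied
  const copier = PDFObjectCopier.for(srcDoc.context, this.context);
  const srcPages = srcDoc.getPages();
  const pages: PDFPage[] = [];

  for (const index of indices) {
    // 1. Get page object from source
    const srcPage = srcPages[index];

    // 2. Deep-copy the page dict and everything it references
    //    (content streams, resources, fonts, etc.), remapping every
    //    PDFRef to a new object number in the target doc
    const copiedPage = copier.copy(srcPage.node);

    // 3. Register the copied page dict in the target doc
    const ref = this.context.register(copiedPage);

    pages.push(PDFPage.of(copiedPage, ref, this));
  }

  return pages;
}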

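Hard part: the deep copy must remap every indirect reference. If object A references B via 2 0 R, B will usually get a different object number in the target doc, so the copy needs a mapping table from source object numbers to target object numbers. The same table also prevents shared objects from being copied twice. Note that copied pages only become part of the target document once they are passed to `addPage()`.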
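
The remapping step can be sketched on a toy object model. This is an illustration under simplifying assumptions, not pdf-lib's copier: `Ref`, `Value`, `Doc`, and `copyObject` are hypothetical names, and values are limited to numbers, strings, references, arrays, and dictionaries.

```typescript
// Toy model of indirect references (hypothetical, far simpler than pdf-lib's)
class Ref {
  readonly objectNumber: number;
  constructor(objectNumber: number) {
    this.objectNumber = objectNumber;
  }
}

type Value = number | string | Ref | Value[] | { [key: string]: Value };

interface Doc {
  objects: Map<number, Value>; // object number -> object
  nextNumber: number; // next free object number
}

function copyObject(src: Doc, dest: Doc, root: Ref): Ref {
  // Mapping table: source object number -> object number in the target doc
  const mapping = new Map<number, number>();

  const copyRef = (ref: Ref): Ref => {
    const known = mapping.get(ref.objectNumber);
    if (known !== undefined) return new Ref(known); // already copied

    const newNumber = dest.nextNumber++;
    // Record the mapping before recursing so reference cycles terminate
    mapping.set(ref.objectNumber, newNumber);
    const target = src.objects.get(ref.objectNumber);
    if (target === undefined) {
      throw new Error(`dangling reference ${ref.objectNumber}`);
    }
    dest.objects.set(newNumber, copyValue(target));
    return new Ref(newNumber);
  };

  const copyValue = (value: Value): Value => {
    if (value instanceof Ref) return copyRef(value);
    if (Array.isArray(value)) return value.map(copyValue);
    if (typeof value === 'object') {
      const out: { [key: string]: Value } = {};
      for (const [key, child] of Object.entries(value)) {
        out[key] = copyValue(child);
      }
      return out;
    }
    return value; // numbers and strings are copied as-is
  };

  return copyRef(root);
}

const src: Doc = {
  nextNumber: 8,
  objects: new Map<number, Value>([
    [3, { Type: 'Page', Contents: new Ref(5), Font: new Ref(7) }],
    [5, 'BT /F1 12 Tf (Hi) Tj ET'],
    [7, { Type: 'Font', UsedBy: [new Ref(3)] }], // back-reference forms a cycle
  ]),
};
const dest: Doc = { nextNumber: 1, objects: new Map<number, Value>() };

const copiedPage = copyObject(src, dest, new Ref(3));
console.log(copiedPage.objectNumber); // 1: source object 3 became target object 1
```

Recording the mapping before recursing is the design choice that matters here: it lets the back-reference from the font to the page resolve to the already assigned number instead of looping forever.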

Stream Compression

PDF content streams often use FlateDecode (zlib/deflate):

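import pako from 'pako';
import { PDFDict, PDFRawStream } from 'pdf-lib';

// Simplified view of the abstract PDFStream base class
declare abstract class PDFStream {
  readonly dict: PDFDict;

  getContents(): Uint8Array;
  getContentsString(): string;
  getContentsSize(): number;
}

// Compressed write: deflate the raw bytes, then wrap them in a stream
// whose dictionary declares the FlateDecode filter.
// `context` (a PDFContext) and `rawBytes` are assumed to be in scope.
const compressed = pako.deflate(rawBytes);
const dict = context.obj({ Filter: 'FlateDecode' });
const stream = PDFRawStream.of(dict, compressed);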

pdf-lib uses pako (pure JS zlib) for compress/decompress.
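
For orientation, this is roughly what a compressed stream object looks like once it is written to the file. The sketch below assumes the bytes are already deflate-compressed (for example by `pako.deflate`) and only shows how `/Length` and `/Filter` frame them. `serializeFlateStream` is a hypothetical helper, not a pdf-lib function.

```typescript
// Hypothetical helper: serializes an indirect stream object whose payload
// is already deflate-compressed.
function serializeFlateStream(
  objectNumber: number,
  compressed: Uint8Array,
): Uint8Array {
  // /Length counts only the payload bytes between "stream\n" and the
  // line break that precedes "endstream"
  const header =
    `${objectNumber} 0 obj\n` +
    `<< /Length ${compressed.length} /Filter /FlateDecode >>\n` +
    'stream\n';
  const footer = '\nendstream\nendobj\n';

  const out = new Uint8Array(header.length + compressed.length + footer.length);
  let pos = 0;
  // Header and footer are plain ASCII, so one char maps to one byte
  for (let i = 0; i < header.length; i++) out[pos++] = header.charCodeAt(i);
  out.set(compressed, pos);
  pos += compressed.length;
  for (let i = 0; i < footer.length; i++) out[pos++] = footer.charCodeAt(i);
  return out;
}

// Stand-in bytes for a compressed payload
const payload = new Uint8Array([0x78, 0x9c, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01]);
const serialized = serializeFlateStream(4, payload);
console.log(serialized.length);
```

A reader uses `/Length` to skip over the binary payload without interpreting it, and `/Filter` to know which decoder to apply afterwards.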

Font Embedding

Standard vs custom fonts

PDF defines 14 standard fonts (Helvetica, Times-Roman, etc.) that render without embedding. CJK fonts must be embedded.

ToolsKu font strategy

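import fontkit from '@pdf-lib/fontkit';

// Custom fonts require fontkit to be registered first
pdfDoc.registerFontkit(fontkit);

// Fetch a bundled font file as raw bytes
const loadFont = async (url: string) => (await fetch(url)).arrayBuffer();

// Pre-bundled CJK fonts, embedded with subsetting enabled
const fonts = {
  sourceHanSans: await pdfDoc.embedFont(
    await loadFont('/fonts/CN/SourceHanSansCN-Regular.otf'),
    { subset: true },
  ),
  sourceHanSansBold: await pdfDoc.embedFont(
    await loadFont('/fonts/CN/SourceHanSansCN-Bold.otf'),
    { subset: true },
  ),
};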

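Font subsetting: when `embedFont()` is called with `{ subset: true }`, pdf-lib embeds only the glyphs used in the document, which makes files dramatically smaller. Subsetting is off by default, so the full font is embedded unless the option is set.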

Case	Full font	Subset	Reduction
10 CJK chars	~7MB	~15KB	99.8%
100 CJK chars	~7MB	~80KB	98.9%
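
The table follows from the fact that a subset grows with the number of distinct characters used, not with the size of the original font. A minimal sketch of that first step, collecting the distinct code points a document needs, is shown here. `usedCodePoints` is a hypothetical helper; the actual glyph selection happens inside the font library.

```typescript
// Hypothetical helper: lists the distinct Unicode code points used by a
// document's text. A subsetter only needs glyphs for these.
function usedCodePoints(texts: string[]): number[] {
  const seen = new Set<number>();
  for (const text of texts) {
    // Array.from splits by code point, so characters outside the BMP stay whole
    for (const ch of Array.from(text)) {
      seen.add(ch.codePointAt(0)!);
    }
  }
  return Array.from(seen).sort((a, b) => a - b);
}

const used = usedCodePoints(['你好，世界', '你好，PDF']);
console.log(used.length); // 8: repeated characters are counted once
```

Repeating the same characters across many pages therefore adds nothing to the embedded font.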

ToolsKu PDF Toolchain

20+ tools mapped to APIs

Tool	pdf-lib API	Extra libs
Merge	`copyPages()` + `addPage()`	-
Split	New doc + `copyPages()`	-
Rotate	`page.setRotation()`	-
Watermark	`page.drawText()` with opacity	-
Page numbers	`page.drawText()` loop	-
Encrypt	-	@pdfsmaller/pdf-encrypt-lite
Extract text	-	pdfjs-dist
PDF to image	-	pdfjs-dist + canvas
Compress	Strip metadata + optimize streams	-
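
As an illustration of the Split row, the page indices for each output file can be computed up front and then handed to `copyPages()` one group at a time. This is a sketch under the assumption of fixed-size chunks; `splitIndices` is a hypothetical helper, not ToolsKu's actual code.

```typescript
// Hypothetical helper: groups zero-based page indices into chunks of
// `pagesPerFile`, one chunk per output document.
function splitIndices(pageCount: number, pagesPerFile: number): number[][] {
  if (!Number.isInteger(pagesPerFile) || pagesPerFile < 1) {
    throw new RangeError('pagesPerFile must be a positive integer');
  }
  const chunks: number[][] = [];
  for (let start = 0; start < pageCount; start += pagesPerFile) {
    const end = Math.min(start + pagesPerFile, pageCount);
    const chunk: number[] = [];
    for (let i = start; i < end; i++) chunk.push(i);
    chunks.push(chunk);
  }
  return chunks;
}

const groups = splitIndices(7, 3);
console.log(groups); // [[0, 1, 2], [3, 4, 5], [6]]
```

Each group would become one new document: create it, copy the group's pages from the source with `copyPages()`, and add each copied page with `addPage()`.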

Encryption: beyond pdf-lib

pdf-lib does not support PDF encryption. ToolsKu uses @pdfsmaller/pdf-encrypt-lite:

import { encrypt } from '@pdfsmaller/pdf-encrypt-lite';

const encryptedPdf = await encrypt(pdfBytes, {
  userPassword: 'user123',
  ownerPassword: 'owner456',
  permissions: {
    printing: true,
    copying: false,
    modifying: false,
  },
});

Performance Practices

Large files

import { PDFDocument } from 'pdf-lib';

// For 100+ page PDFs, work page-by-page and report progress.
// The whole document is still held in memory after load().
async function processLargePdf(file: File) {
  const pdfDoc = await PDFDocument.load(await file.arrayBuffer());
  const totalPages = pdfDoc.getPageCount();

  for (let i = 0; i < totalPages; i++) {
    const page = pdfDoc.getPage(i);
    // per-page work...
    updateProgress((i + 1) / totalPages); // reaches 1 on the last page
  }
}

Streaming load

pdf-lib does not stream—the entire PDF must load into memory. Files 100MB+ can stress memory.

Mitigation: Show progress for large files and recommend desktop tools for very large PDFs.
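
The mitigation can be expressed as a simple size gate that runs before loading. This is a sketch, not ToolsKu's actual logic: `adviseForSize` is a hypothetical helper, the 100MB limit comes from the figure above, and the 10MB progress threshold is an assumption chosen for illustration.

```typescript
type SizeAdvice = 'process' | 'process-with-progress' | 'recommend-desktop';

// Assumed thresholds for illustration only
const PROGRESS_THRESHOLD_BYTES = 10 * 1024 * 1024;
const DESKTOP_THRESHOLD_BYTES = 100 * 1024 * 1024;

// Hypothetical helper: decides how to handle a file from its size alone,
// before any bytes are read into memory.
function adviseForSize(sizeBytes: number): SizeAdvice {
  if (sizeBytes >= DESKTOP_THRESHOLD_BYTES) return 'recommend-desktop';
  if (sizeBytes >= PROGRESS_THRESHOLD_BYTES) return 'process-with-progress';
  return 'process';
}

console.log(adviseForSize(50 * 1024 * 1024)); // "process-with-progress"
```

In the browser, the input would be `file.size`, which is available without reading the file.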

Summary

At ~350KB, pdf-lib delivers browser-side PDF create and edit—a remarkable engineering feat. Its PDFContext object graph, indirect reference remapping, and stream compression reflect deep PDF spec knowledge.

ToolsKu builds 20+ PDF tools on top of it, including merge, split, rotate, watermark, and page numbers, all running in the browser. Combined with pdfjs-dist (rendering and text extraction) and @pdfsmaller/pdf-encrypt-lite (encryption), this forms a complete PDF toolchain.