pdf-lib Architecture: Pure JavaScript PDF Create, Edit, and Merge
Why pdf-lib?
Browser PDF libraries differ mainly in bundle size and in which operations they support:
| Library | Size | Create | Edit | Merge | Fonts | Maintenance |
|---|---|---|---|---|---|---|
| pdf-lib | ~350KB | ✅ | ✅ | ✅ | ✅ | Active |
| pdfjs-dist | ~2MB | ❌ | ❌ | ❌ | ❌ | Mozilla |
| jsPDF | ~300KB | ✅ | ❌ | ❌ | Partial | Active |
| PDFKit | ~1MB | ✅ | ❌ | ❌ | ✅ | Node-first |
Of the libraries above, pdf-lib is the only pure JavaScript library that can both create new PDFs and modify existing ones in the browser.
PDF File Format Basics
Internal structure
```
%PDF-1.7 ← version header
1 0 obj ← object 1
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj ← object 2
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj ← object 3
<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>
endobj
xref ← cross-reference table
0 4
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
trailer
<< /Size 4 /Root 1 0 R >>
startxref
186
%%EOF
```
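The xref offsets are plain byte counts from the start of the file. The sketch below rebuilds the example document (without the ← annotations) and computes them, assuming LF line endings and ASCII-only content so that string length equals byte length; buildMinimalPdf is a hypothetical helper, not part of any library:

```typescript
// Rebuilds the three-object example and computes its cross-reference table.
// Assumes LF line endings and ASCII-only content (string index === byte offset).
function buildMinimalPdf(): { pdf: string; offsets: number[]; startxref: number } {
  const bodies = [
    '<< /Type /Catalog /Pages 2 0 R >>',
    '<< /Type /Pages /Kids [3 0 R] /Count 1 >>',
    '<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>',
  ];
  let pdf = '%PDF-1.7\n';
  const offsets: number[] = [];
  bodies.forEach((body, i) => {
    offsets.push(pdf.length); // byte position where "n 0 obj" starts
    pdf += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });

  const startxref = pdf.length; // byte position of the "xref" keyword
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  pdf += `xref\n0 ${bodies.length + 1}\n`;
  // Every entry is exactly 20 bytes: offset(10) SP generation(5) SP type SP LF
  pdf += '0000000000 65535 f \n';
  for (const offset of offsets) pdf += `${pad(offset, 10)} 00000 n \n`;
  pdf += `trailer\n<< /Size ${bodies.length + 1} /Root 1 0 R >>\n`;
  pdf += `startxref\n${startxref}\n%%EOF\n`;
  return { pdf, offsets, startxref };
}
```

It yields offsets 9, 58, and 115 for objects 1, 2, and 3, matching the xref entries above, and a startxref value of 186. The trailing space before each newline keeps every xref entry at the fixed 20 bytes the format requires.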
Core concepts
| Concept | Description |
|---|---|
| Indirect object | 1 0 obj ... endobj, referenced elsewhere as 1 0 R (object number, generation number) |
| Dictionary | << /Key /Value >>, key-value pairs similar to a JSON object |
| Stream | A dictionary followed by stream ... endstream binary data (often FlateDecode-compressed) |
| Cross-reference (xref) | Byte offset of every object, so a reader can jump to any object in O(1) |
| Page tree | /Pages nodes linked through /Kids arrays; the leaves are /Page objects |
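Readers walk the page tree depth-first to list pages in order, and some attributes (such as /MediaBox and /Resources) can be inherited from ancestor /Pages nodes. A sketch over a hypothetical in-memory object table in which references are reduced to plain object numbers; collectPages and getInherited are illustrative names, and there is no cycle detection, which a real parser needs:

```typescript
// Hypothetical object table: object number -> dictionary.
// References ("n 0 R") are simplified to plain object numbers.
type PdfDict = Record<string, unknown>;

// Depth-first walk: leaf /Page nodes come out in document order.
function collectPages(objects: Map<number, PdfDict>, nodeNum: number): number[] {
  const node = objects.get(nodeNum);
  if (!node) throw new Error(`missing object ${nodeNum}`);
  if (node.Type === 'Page') return [nodeNum];
  if (node.Type !== 'Pages') throw new Error(`object ${nodeNum} is not a page tree node`);
  const pages: number[] = [];
  for (const kid of node.Kids as number[]) pages.push(...collectPages(objects, kid));
  return pages;
}

// Inheritable attributes are looked up on the page first, then on each
// ancestor reached through /Parent.
function getInherited(objects: Map<number, PdfDict>, pageNum: number, key: string): unknown {
  let num: number | undefined = pageNum;
  while (num !== undefined) {
    const node = objects.get(num);
    if (!node) throw new Error(`missing object ${num}`);
    if (key in node) return node[key];
    num = node.Parent as number | undefined;
  }
  return undefined;
}
```

This inheritance is why a page cannot simply be lifted out of its tree: a page that relies on its parent's /MediaBox loses it once detached.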
pdf-lib Architecture
Module layers
```
PDFDocument (top-level API)
├── PDFPage (page operations)
├── PDFFont (fonts)
├── PDFImage (images)
├── PDFCatalog (document structure)
└── PDFContext (low-level object model)
    ├── PDFObject
    ├── PDFDict
    ├── PDFStream
    ├── PDFRef
    └── PDFCrossRefSection
```
Object model
pdf-lib centers on PDFContext, which holds the document object graph:
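```typescript
import type { PDFObject, PDFRef } from 'pdf-lib';

// Simplified outline of PDFContext (signatures only)
declare class PDFContext {
  // Registry of all indirect objects, keyed by reference
  private readonly indirectObjects: Map<PDFRef, PDFObject>;
  // Store an object under a given reference
  assign(ref: PDFRef, object: PDFObject): void;
  // Allocate a reference with the next unused object number
  nextRef(): PDFRef;
  // nextRef() + assign() in one step
  register(object: PDFObject): PDFRef;
  // Resolve a reference to its object
  lookup(ref: PDFRef): PDFObject | undefined;
  // Remove an object
  delete(ref: PDFRef): boolean;
}
```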
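A Map keyed by PDFRef only works if equal references are the same object. pdf-lib achieves this by interning: PDFRef.of() returns a cached instance for each object/generation pair. A minimal registry built on the same idea; PdfRef and ObjectContext are hypothetical names loosely modeled on the PDFContext outline above, not pdf-lib's code:

```typescript
// Interned reference: the same "n g R" always yields the same instance,
// so identity-based Map lookups behave like value lookups.
class PdfRef {
  private static readonly pool = new Map<string, PdfRef>();
  private constructor(readonly objectNumber: number, readonly generationNumber: number) {}

  static of(objectNumber: number, generationNumber = 0): PdfRef {
    const tag = `${objectNumber} ${generationNumber} R`;
    let ref = PdfRef.pool.get(tag);
    if (!ref) {
      ref = new PdfRef(objectNumber, generationNumber);
      PdfRef.pool.set(tag, ref);
    }
    return ref;
  }
}

// Object registry; T stands in for the PDF object types.
class ObjectContext<T> {
  private readonly objects = new Map<PdfRef, T>();
  private largestObjectNumber = 0;

  // Allocate the next unused object number
  nextRef(): PdfRef {
    return PdfRef.of(++this.largestObjectNumber);
  }

  // Store an object under an existing reference (e.g. one parsed from a file)
  assign(ref: PdfRef, object: T): void {
    this.objects.set(ref, object);
    this.largestObjectNumber = Math.max(this.largestObjectNumber, ref.objectNumber);
  }

  register(object: T): PdfRef {
    const ref = this.nextRef();
    this.assign(ref, object);
    return ref;
  }

  lookup(ref: PdfRef): T | undefined {
    return this.objects.get(ref);
  }

  delete(ref: PdfRef): boolean {
    return this.objects.delete(ref);
  }
}
```

Tracking the largest number in assign() matters when editing an existing file: parsed objects keep their numbers, and newly registered objects continue after the highest one instead of colliding with it.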
How PDF merge works
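```typescript
// Core logic of PDFDocument.copyPages (simplified excerpt of a class method)
async copyPages(srcDoc: PDFDocument, indices: number[]): Promise<PDFPage[]> {
  // 1. Write pending changes (e.g. embedded fonts) into the source's objects
  await srcDoc.flush();
  // 2. One copier per call: it remembers every source object it has copied,
  //    so resources shared between pages (fonts, images) are copied once
  const copier = PDFObjectCopier.for(srcDoc.context, this.context);
  const srcPages = srcDoc.getPages();
  const copiedPages: PDFPage[] = [];
  for (const index of indices) {
    // 3. Deep-copy the page dict and everything it references; every PDFRef
    //    is remapped to an object number in the target document
    const copiedPage = copier.copy(srcPages[index].node);
    // 4. Register the copied page dict itself and wrap it
    const ref = this.context.register(copiedPage);
    copiedPages.push(PDFPage.of(copiedPage, ref, this));
  }
  // Pages come back detached: callers still call addPage() for each one
  return copiedPages;
}
```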
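The hard part: a deep copy must remap every indirect reference. If object A references B as 2 0 R, B will usually get a different number in the target document, so the copier keeps a mapping table from source objects to their copies and rewrites references as it goes. pdf-lib also copies inherited attributes (such as /Resources and /MediaBox) onto the page itself and drops the page's /Parent entry; following /Parent would otherwise pull the source document's entire page tree into the copy.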
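The remapping can be shown on a toy object model. This is not pdf-lib's PDFObjectCopier: Ref, Doc, and Copier are hypothetical, and inherited page attributes are not handled. It illustrates two ideas: a mapping table from source object numbers to new references, and reserving the new number before copying an object's contents, so that cyclic references (such as /Parent pointing back up the tree) terminate.

```typescript
// Toy object model (hypothetical; not pdf-lib's classes)
class Ref {
  constructor(readonly num: number) {}
  toString(): string { return `${this.num} 0 R`; }
}
type Dict = { [key: string]: Value };
type Value = number | string | boolean | null | Ref | Value[] | Dict;

class Doc {
  readonly objects = new Map<number, Value>();
  private nextNum = 1;
  reserve(): Ref { return new Ref(this.nextNum++); }
  assign(ref: Ref, value: Value): void { this.objects.set(ref.num, value); }
  register(value: Value): Ref {
    const ref = this.reserve();
    this.assign(ref, value);
    return ref;
  }
  lookup(ref: Ref): Value {
    if (!this.objects.has(ref.num)) throw new Error(`missing object ${ref}`);
    return this.objects.get(ref.num) as Value;
  }
}

// Deep-copies objects from src into dst, renumbering every reference.
class Copier {
  // Mapping table: source object number -> reference in dst
  private readonly mapping = new Map<number, Ref>();
  constructor(private readonly src: Doc, private readonly dst: Doc) {}

  copyRef(ref: Ref): Ref {
    const existing = this.mapping.get(ref.num);
    if (existing) return existing; // shared or cyclic: reuse the copy
    const newRef = this.dst.reserve(); // reserve first so cycles terminate
    this.mapping.set(ref.num, newRef);
    this.dst.assign(newRef, this.copyValue(this.src.lookup(ref)));
    return newRef;
  }

  copyValue(value: Value): Value {
    if (value instanceof Ref) return this.copyRef(value);
    if (Array.isArray(value)) return value.map((v) => this.copyValue(v));
    if (value !== null && typeof value === 'object') {
      const out: Dict = {};
      for (const [key, v] of Object.entries(value)) out[key] = this.copyValue(v);
      return out;
    }
    return value; // numbers, strings, booleans, null are copied as-is
  }

  // Copies one page without its /Parent link, so the source tree stays behind
  copyPage(pageRef: Ref): Ref {
    const existing = this.mapping.get(pageRef.num);
    if (existing) return existing;
    const newRef = this.dst.reserve();
    this.mapping.set(pageRef.num, newRef);
    const page: Dict = { ...(this.src.lookup(pageRef) as Dict) };
    delete page.Parent;
    this.dst.assign(newRef, this.copyValue(page));
    return newRef;
  }
}
```

Copying two pages that share a font through one Copier yields a single font object in the target, and because /Parent is dropped, the source /Pages node is never copied.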
Stream Compression
PDF content streams often use FlateDecode (zlib/deflate):
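```typescript
import pako from 'pako';
import type { PDFDict } from 'pdf-lib';

// Simplified outline of pdf-lib's stream class (signatures only)
declare abstract class PDFStream {
  readonly dict: PDFDict;        // stream dictionary: /Filter, /Length, ...
  getContents(): Uint8Array;     // stored bytes (compressed if a filter is set)
  getContentsString(): string;
  getContentsSize(): number;     // written to /Length when the stream is serialized
}

// Compressed write: deflate the data and declare the filter in the stream dict
const compressed = pako.deflate(rawBytes);
const stream = pdfDoc.context.stream(compressed, { Filter: 'FlateDecode' });
const streamRef = pdfDoc.context.register(stream);
```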
pdf-lib uses pako (a pure JS port of zlib) to deflate the streams it writes.
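To see what a Flate-compressed stream object looks like on disk, here is a sketch that writes one by hand. It swaps in Node's built-in zlib for pako (both produce the zlib-wrapped format, RFC 1950, that FlateDecode expects), and flateStreamObject and decodeFlate are hypothetical helper names:

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

// Serializes an indirect stream object whose data is Flate-compressed.
// 'latin1' maps each char to one byte, so ASCII content round-trips exactly.
function flateStreamObject(objNum: number, content: string): Buffer {
  const data = deflateSync(Buffer.from(content, 'latin1'));
  const head = Buffer.from(
    `${objNum} 0 obj\n<< /Length ${data.length} /Filter /FlateDecode >>\nstream\n`,
    'latin1',
  );
  const tail = Buffer.from('\nendstream\nendobj\n', 'latin1');
  return Buffer.concat([head, data, tail]);
}

// Reverses the filter: what a reader does before interpreting the content.
function decodeFlate(data: Uint8Array): string {
  return inflateSync(data).toString('latin1');
}
```

/Length counts only the compressed bytes between the end-of-line after stream and the end-of-line before endstream, which is why it can only be written after compression.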
Font Embedding
Standard vs custom fonts
PDF defines 14 standard fonts (Helvetica, Times-Roman, etc.) that viewers can render without embedding. They cover only Western character sets, so CJK text requires an embedded font.
ToolsKu font strategy
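```typescript
import fontkit from '@pdf-lib/fontkit';

// Custom (non-standard) fonts need fontkit registered on the document first
pdfDoc.registerFontkit(fontkit);
// embedFont expects font bytes, not a fetch Response
const fontBytes = async (url: string) => (await fetch(url)).arrayBuffer();
// Pre-bundled CJK fonts; subset: true embeds only the glyphs actually drawn
const fonts = {
  sourceHanSans: await pdfDoc.embedFont(
    await fontBytes('/fonts/CN/SourceHanSansCN-Regular.otf'),
    { subset: true },
  ),
  sourceHanSansBold: await pdfDoc.embedFont(
    await fontBytes('/fonts/CN/SourceHanSansCN-Bold.otf'),
    { subset: true },
  ),
};
```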
Font subsetting: when embedFont is called with { subset: true }, pdf-lib embeds only the glyphs the document uses, which makes files dramatically smaller. Subsetting is off by default.
| Case | Full font | Subset | Reduction |
|---|---|---|---|
| 10 CJK chars | ~7MB | ~15KB | 99.8% |
| 100 CJK chars | ~7MB | ~80KB | 98.9% |
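The table follows from how subsetting works: the subset's size depends on how many distinct glyphs the text uses, not on how long the text is. A subsetter therefore starts by collecting the distinct characters drawn. A minimal sketch of that step; distinctCodePoints is a hypothetical helper, and mapping characters to glyph IDs through the font's cmap and rebuilding the font tables is not shown:

```typescript
// Collects the distinct Unicode code points across all strings drawn.
// for...of iterates by code point, so surrogate pairs count as one character.
function distinctCodePoints(texts: string[]): Set<number> {
  const used = new Set<number>();
  for (const text of texts) {
    for (const ch of text) used.add(ch.codePointAt(0)!);
  }
  return used;
}
```

A document that repeats the same few characters on every page needs the same small subset as a single page containing them.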
ToolsKu PDF Toolchain
20+ tools mapped to APIs
| Tool | pdf-lib API | Extra libs |
|---|---|---|
| Merge | copyPages() + addPage() | - |
| Split | New doc + copyPages() | - |
| Rotate | page.setRotation() | - |
| Watermark | page.drawText() with opacity | - |
| Page numbers | page.drawText() loop | - |
| Encrypt | - | @pdfsmaller/pdf-encrypt-lite |
| Extract text | - | pdfjs-dist |
| PDF to image | - | pdfjs-dist + canvas |
| Compress | Strip metadata + optimize streams | - |
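For the Split row, the only input copyPages() needs besides the source document is an array of 0-based page indices. A sketch of turning user input such as "1-3, 5" into that array; parsePageRanges is a hypothetical helper, not part of pdf-lib or ToolsKu's published code:

```typescript
// Turns 1-based, inclusive ranges like "1-3, 5" into 0-based page indices.
// Throws on malformed parts or pages outside 1..pageCount.
function parsePageRanges(input: string, pageCount: number): number[] {
  const indices: number[] = [];
  const parts = input.split(',').map((p) => p.trim()).filter((p) => p.length > 0);
  for (const part of parts) {
    const m = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (!m) throw new Error(`invalid range: "${part}"`);
    const start = Number(m[1]);
    const end = m[2] === undefined ? start : Number(m[2]);
    if (start < 1 || end > pageCount || start > end) {
      throw new Error(`range out of bounds: "${part}" (document has ${pageCount} pages)`);
    }
    for (let page = start; page <= end; page++) indices.push(page - 1);
  }
  return indices;
}
```

The page count can come from srcDoc.getPageCount(), and the result passes directly to newDoc.copyPages(srcDoc, indices).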
Encryption: beyond pdf-lib
pdf-lib does not support PDF encryption. ToolsKu uses @pdfsmaller/pdf-encrypt-lite:
```typescript
import { encrypt } from '@pdfsmaller/pdf-encrypt-lite';

const encryptedPdf = await encrypt(pdfBytes, {
  userPassword: 'user123',
  ownerPassword: 'owner456',
  permissions: {
    printing: true,
    copying: false,
    modifying: false,
  },
});
```
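In the PDF standard security handler, permissions end up in the /P entry of the encryption dictionary: a signed 32-bit integer whose bits are defined in ISO 32000-1 (Table 22, user access permissions). A sketch of that mapping for the three flags used above, leaving every other permission granted; the constants and permissionsToP are hypothetical, and this is not a description of how @pdfsmaller/pdf-encrypt-lite implements it:

```typescript
// Bit n (1-based, as numbered in the spec) has value 1 << (n - 1).
const BIT_PRINT = 1 << 2;  // bit 3: print the document
const BIT_MODIFY = 1 << 3; // bit 4: modify contents
const BIT_COPY = 1 << 4;   // bit 5: copy or extract text and graphics

interface Permissions {
  printing: boolean;
  copying: boolean;
  modifying: boolean;
}

function permissionsToP(perms: Permissions): number {
  // Start from "everything allowed": all 32 bits set except bits 1-2,
  // which must be 0. JS bitwise ops work on signed 32-bit integers.
  let p = ~0b11; // -4
  if (!perms.printing) p &= ~BIT_PRINT;
  if (!perms.modifying) p &= ~BIT_MODIFY;
  if (!perms.copying) p &= ~BIT_COPY;
  return p; // written into the encryption dictionary as /P
}
```

For the example options (printing allowed, copying and modifying denied) this gives /P -28. The flags only restrict behavior in viewers that honor them; they are not cryptographic protection once the document is opened.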
Performance Practices
Large files
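```typescript
import { PDFDocument } from 'pdf-lib';

// For 100+ page PDFs: work page by page, report progress, and yield to the UI.
// Note that PDFDocument.load still parses the whole file into memory.
async function processLargePdf(file: File, updateProgress: (fraction: number) => void) {
  const pdfDoc = await PDFDocument.load(await file.arrayBuffer());
  const totalPages = pdfDoc.getPageCount();
  for (let i = 0; i < totalPages; i++) {
    const page = pdfDoc.getPage(i);
    // per-page work...
    updateProgress((i + 1) / totalPages);
    // Let the browser repaint (e.g. a progress bar) before the next page
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}
```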
Streaming load
pdf-lib does not stream: PDFDocument.load parses the entire file into memory, so files of 100MB or more can strain the browser's memory.
Mitigation: Show progress for large files and recommend desktop tools for very large PDFs.
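The mitigation can start before anything is parsed, because File.size is known up front. A sketch of such a pre-load check; classifyPdfSize and its thresholds are assumptions (100MB mirrors the figure above, 50MB is an arbitrary warning level), not ToolsKu's actual values:

```typescript
type SizeVerdict = 'ok' | 'warn' | 'desktop';

const MB = 1024 * 1024;

// Classifies a file by size before any bytes are read into memory.
function classifyPdfSize(bytes: number, warnAt = 50 * MB, desktopAt = 100 * MB): SizeVerdict {
  if (bytes >= desktopAt) return 'desktop'; // suggest a desktop tool
  if (bytes >= warnAt) return 'warn';       // proceed, with a progress indicator
  return 'ok';
}
```

Calling classifyPdfSize(file.size) before file.arrayBuffer() avoids loading an oversized file at all.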
Summary
At ~350KB, pdf-lib brings PDF creation and editing to the browser, a remarkable engineering feat. Its PDFContext object graph, reference remapping during page copies, and stream compression reflect deep knowledge of the PDF specification.
On top of it, ToolsKu builds merge, split, rotate, watermark, page numbering, and more: 20+ tools that run entirely in the browser. Together with pdfjs-dist (rendering and text extraction) and pdf-encrypt-lite (encryption), that makes a complete PDF toolchain.