Learn Regex From Scratch: Master Text Processing in 30 Minutes


Why Learn Regex?

A single regex can often replace 20 lines of string manipulation code.

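Need to extract all IP addresses from 1000 log lines? Without regex you split strings, iterate, and validate the format by hand, which takes at least 20 lines. With regex: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}. One line. (This pattern checks the shape only, so it also accepts out-of-range values such as 999.999.999.999.)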

This tutorial assumes zero prior knowledge. In about 30 minutes, you should be able to handle most everyday text matching needs.
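To make the IP example concrete, here is a minimal Python sketch using the standard re module. The log lines are invented for illustration, and the pattern is the shape-only one from above, so it does not check that each number is at most 255.

```python
import re

# Hypothetical log lines, made up for this demo.
log_lines = [
    "2024-01-01 10:00:01 GET /index.html from 192.168.1.10",
    "2024-01-01 10:00:02 connection refused",
    "2024-01-01 10:00:03 POST /login from 10.0.0.5 via 172.16.0.1",
]

IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# findall returns every non-overlapping match found in a line.
ips = [ip for line in log_lines for ip in IP_PATTERN.findall(line)]
print(ips)  # ['192.168.1.10', '10.0.0.5', '172.16.0.1']
```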


Basic Syntax

Literals and Metacharacters

Character   Meaning                            Example
abc         Exact match "abc"                  hello → matches "hello"
.           Any single char (except newline)   h.t → hat, hot, hit
\d          Digit [0-9]                        \d{3} → 123, 456
\w          Word char [a-zA-Z0-9_]             \w+ → hello, test123
\s          Whitespace (space, tab, newline)   a\sb → "a b"
\D          Non-digit                          \D+ → hello
\W          Non-word char                      \W → @, #, !
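The shorthand classes above can be tried out with a short Python sketch. The sample strings are invented. In Python 3, \d and \w also match non-ASCII digits and letters on str patterns unless the re.ASCII flag is passed.

```python
import re

text = "Order 66 shipped to room_12 at 9 am"

digits = re.findall(r"\d+", text)           # runs of digits
words = re.findall(r"\w+", text)            # runs of word characters
symbols = re.findall(r"\W", "a@b#c!")       # single non-word characters
dot_hits = re.findall(r"h.t", "hat hot hit ht")  # "." needs exactly one char
fields = re.split(r"\s+", "a b\tc\nd")      # split on any whitespace run

print(digits)    # ['66', '12', '9']
print(words)     # ['Order', '66', 'shipped', 'to', 'room_12', 'at', '9', 'am']
print(symbols)   # ['@', '#', '!']
print(dot_hits)  # ['hat', 'hot', 'hit']
print(fields)    # ['a', 'b', 'c', 'd']
```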

Quantifiers

Quantifier   Meaning       Example
*            0 or more     ab*c → ac, abc, abbc
+            1 or more     ab+c → abc, abbc (not ac)
?            0 or 1        colou?r → color, colour
{n}          Exactly n     \d{4} → 4-digit number
{n,}         At least n    \d{4,} → 4+ digits
{n,m}        n to m        \d{3,5} → 3-5 digits
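A quick way to check the quantifier table is to test whole strings against each pattern. The sketch below uses a small helper, matches, which is a hypothetical name introduced only for this example. It relies on re.fullmatch so that a candidate counts only when the entire string fits the pattern.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(matches(r"ab*c", ["ac", "abc", "abbc", "abd"]))          # ['ac', 'abc', 'abbc']
print(matches(r"ab+c", ["ac", "abc", "abbc"]))                 # ['abc', 'abbc']
print(matches(r"colou?r", ["color", "colour", "colouur"]))     # ['color', 'colour']
print(matches(r"\d{4}", ["123", "1234", "12345"]))             # ['1234']
print(matches(r"\d{3,5}", ["12", "123", "12345", "123456"]))   # ['123', '12345']
```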

Character Classes

[a-z]     Lowercase letters
[A-Z]     Uppercase letters
[0-9]     Digits
[a-zA-Z]  All letters
[^0-9]    Not a digit (^ inside [] = negation)
[abc]     Match a or b or c
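The classes above behave as follows in a small Python sketch. The sample text is invented for the demo.

```python
import re

text = "Room 42B, floor 7"

uppercase = re.findall(r"[A-Z]", text)        # single uppercase letters
numbers = re.findall(r"[0-9]+", text)         # runs of digits
not_digits = re.findall(r"[^0-9 ]+", text)    # runs of anything except digits and spaces
vowels = re.findall(r"[aeiou]", "regex")      # any one of the listed characters

print(uppercase)   # ['R', 'B']
print(numbers)     # ['42', '7']
print(not_digits)  # ['Room', 'B,', 'floor']
print(vowels)      # ['e', 'e']
```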

Anchors

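Anchor   Meaning                                                                     Example
^        Start of line (in most engines, start of the string unless multiline mode)  ^Hello → lines starting with Hello
$        End of line (in most engines, end of the string unless multiline mode)      world$ → lines ending with world
\b       Word boundary                                                               \bcat\b → standalone cat, not catch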
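How anchors behave depends on the engine's mode. In Python, for instance, ^ and $ anchor to the whole string by default and to each line only when re.MULTILINE is set. The sketch below uses invented sample text to show the difference.

```python
import re

text = "Hello world\nsay Hello\nHello again, world"

# Default mode: ^ matches only at the very start of the string.
start_default = re.findall(r"^Hello", text)
# Multiline mode: ^ and $ match at the start and end of every line.
start_lines = re.findall(r"^Hello", text, re.MULTILINE)
end_lines = re.findall(r"world$", text, re.MULTILINE)

# \b matches between a word character and a non-word character.
standalone = re.findall(r"\bcat\b", "cat catch concat cat.")

print(start_default)  # ['Hello']
print(start_lines)    # ['Hello', 'Hello']
print(end_lines)      # ['world', 'world']
print(standalone)     # ['cat', 'cat']
```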

Groups and Capturing

(\d{3})-(\d{4})-(\d{4})  Match a phone number written as 3-4-4 digits: 555-0123-4567
$1 = 555, $2 = 0123, $3 = 4567

In ToolsKu Regex Tester, use the replacement $1-****-$3 to mask the middle digits (result: 555-****-4567).
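The same capture-and-replace idea can be sketched in Python. Replacement syntax differs between tools: Python's re.sub writes group references as \1 (or \g<1>) rather than $1. The input string is invented for the demo.

```python
import re

PHONE = re.compile(r"(\d{3})-(\d{4})-(\d{4})")
text = "Contact: 555-0123-4567"

# groups() returns the text captured by each pair of parentheses, in order.
parts = PHONE.search(text).groups()

# \1 and \3 re-insert the first and third captured groups.
masked = PHONE.sub(r"\1-****-\3", text)

print(parts)   # ('555', '0123', '4567')
print(masked)  # Contact: 555-****-4567
```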

Non-Capturing Groups

(?:\d{3})  Groups without capturing (no $1/$2)
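One place where the difference shows up is Python's re.findall, which returns the captured groups rather than the whole match when a pattern contains capturing groups. A minimal sketch with invented sample text:

```python
import re

text = "2024-06-03 and 2025-12-31"

# Two capturing groups: findall returns a tuple per match.
both = re.findall(r"(\d{4})-(\d{2})", text)

# The year is grouped but not captured, so only the month comes back.
months = re.findall(r"(?:\d{4})-(\d{2})", text)

# Non-capturing groups are also handy for repeating a multi-character unit.
repeats = re.findall(r"(?:ab)+", "ababab ab")

print(both)     # [('2024', '06'), ('2025', '12')]
print(months)   # ['06', '12']
print(repeats)  # ['ababab', 'ab']
```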

Lookahead & Lookbehind (Advanced)

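Syntax     Meaning              Example
(?=...)    Followed by...       \d+(?=USD) → matches 100 in "100USD"
(?!...)    Not followed by...   \d+(?!USD) → matches 200 in "200EUR"
(?<=...)   Preceded by...       (?<=\$)\d+ → matches 100 in "$100"
(?<!...)   Not preceded by...   (?<!\$)\d+ → matches 200 in "200"

Note: a negative lookaround does not skip the whole number. Because \d+ can backtrack, \d+(?!USD) still matches 10 in "100USD", and (?<!\$)\d+ still matches 00 in "$100". To reject the number entirely, exclude digits as well: \d+(?!\d|USD) and (?<![$\d])\d+.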
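The lookaround examples can be checked with a short Python sketch. It also shows how a plain negative lookaround can match part of a number through backtracking, and one possible way to tighten it by excluding digits too. The sample strings are invented.

```python
import re

prices = "100USD 200EUR"
amounts = "$100 and 200"

followed = re.findall(r"\d+(?=USD)", prices)     # digits followed by USD
preceded = re.findall(r"(?<=\$)\d+", amounts)    # digits preceded by $

# Negative lookahead: \d+ backtracks from "100" to "10", which is not
# directly followed by "USD", so a partial match slips through.
naive_not_followed = re.findall(r"\d+(?!USD)", prices)
strict_not_followed = re.findall(r"\d+(?!\d|USD)", prices)

# Negative lookbehind: the match can start at the second digit of "$100".
naive_not_preceded = re.findall(r"(?<!\$)\d+", amounts)
strict_not_preceded = re.findall(r"(?<![$\d])\d+", amounts)

print(followed)             # ['100']
print(preceded)             # ['100']
print(naive_not_followed)   # ['10', '200']
print(strict_not_followed)  # ['200']
print(naive_not_preceded)   # ['00', '200']
print(strict_not_preceded)  # ['200']
```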

Practical Regex Recipes

Email:      [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Phone:      \d{3}-\d{3}-\d{4}
URL:        https?://[^\s/$.?#].[^\s]*
Date:       \d{4}-\d{2}-\d{2}
IP address: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Hex color:  #[0-9a-fA-F]{6}
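These recipes check the shape of the text, not whether the value is valid. The sketch below wraps two of them in helper functions (is_email and is_ipv4 are hypothetical names introduced here) and adds a range check on the IP octets in plain Python, since \d{1,3} alone accepts values up to 999.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
IPV4 = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")

def is_email(s):
    """True if the whole string has the shape of an email address."""
    return EMAIL.fullmatch(s) is not None

def is_ipv4(s):
    """True if the string is four dot-separated numbers, each 0-255."""
    m = IPV4.fullmatch(s)
    return m is not None and all(int(octet) <= 255 for octet in m.groups())

print(is_email("user@example.com"))  # True
print(is_email("user@example"))      # False
print(is_ipv4("192.168.1.10"))       # True
print(is_ipv4("999.1.1.1"))          # False
print(HEX_COLOR.findall("color: #FF8800; background: #fff"))  # ['#FF8800']
```

Using fullmatch rather than search means the recipe must cover the entire input, which is usually what validation needs.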

Common Pitfalls

1. Greedy vs Lazy

Greedy: <.*>   Matches "<div>hello</div>" → entire string
Lazy:   <.*?>  Matches "<div>hello</div>" → only <div> and </div>

Add ? after a quantifier to make it lazy.
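The difference is easy to see in a short Python sketch. The third pattern is an alternative not mentioned above: a negated character class, which often gives the same result as the lazy version without relying on laziness.

```python
import re

html = "<div>hello</div>"

greedy = re.findall(r"<.*>", html)      # .* grabs as much as possible
lazy = re.findall(r"<.*?>", html)       # .*? stops at the first ">"
negated = re.findall(r"<[^>]*>", html)  # anything except ">" between the brackets

print(greedy)   # ['<div>hello</div>']
print(lazy)     # ['<div>', '</div>']
print(negated)  # ['<div>', '</div>']
```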

2. Catastrophic Backtracking

# Dangerous
(a+)+b  Matching "aaaaaaaaaaaaaaaaaaaa" (no b) → the engine tries every way of splitting the a's before giving up: performance explosion

# Safe
a+b     Simple, no nested quantifiers
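The effect can be observed with a rough timing sketch in Python, whose re module is a backtracking engine. The helper timed_search is a hypothetical name introduced for this demo. Input lengths are deliberately kept small, because each extra "a" roughly doubles the work for the nested pattern. Actual timings depend on the machine, so none are shown.

```python
import re
import time

def timed_search(pattern, text):
    """Run re.search and return (match, elapsed_seconds)."""
    start = time.perf_counter()
    match = re.search(pattern, text)
    return match, time.perf_counter() - start

# The input contains no "b", so both patterns must fail; only the cost differs.
for n in (12, 15, 18):
    text = "a" * n
    nested_match, nested_time = timed_search(r"(a+)+b", text)
    simple_match, simple_time = timed_search(r"a+b", text)
    print(f"n={n}: nested {nested_time:.4f}s, simple {simple_time:.6f}s")
```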

3. The Dot (.) Doesn't Match Newlines

# Match multiline content
[\s\S]*  Instead of .* (matches everything including newlines)

Many engines also provide a dotall (s) flag that makes . match newlines.
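In Python, for example, the same pitfall and two possible workarounds look like this. The re.DOTALL flag is Python's name for the mode in which . also matches newlines. The sample text is invented.

```python
import re

text = "<p>line one\nline two</p>"

# "." stops at the newline, so the pattern cannot reach the closing tag.
plain = re.search(r"<p>.*</p>", text)

# [\s\S] matches any character, including newlines.
workaround = re.search(r"<p>[\s\S]*</p>", text)

# re.DOTALL makes "." match newlines as well.
dotall = re.search(r"<p>.*</p>", text, re.DOTALL)

print(plain)                           # None
print(workaround.group() == text)      # True
print(dotall.group() == text)          # True
```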

Practice Exercises

Use ToolsKu Regex Tester for these:

  1. Extract all phone numbers from: "Call 555-0123 or 555-4567 for support"
  2. Replace "2026-06-03" with "06/03/2026"
  3. Find all duplicate words in a text (hint: inside a pattern, a backreference such as \1 matches the same text that group 1 captured)
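For comparison after trying the exercises yourself, here is one possible set of answers sketched in Python. These are not the only solutions. The sample sentence for exercise 3 is invented, and Python writes replacement groups as \1 rather than $1.

```python
import re

# Exercise 1: phone numbers in the form 3 digits, dash, 4 digits.
phones = re.findall(r"\d{3}-\d{4}", "Call 555-0123 or 555-4567 for support")

# Exercise 2: capture year, month, day, then reorder them in the replacement.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2026-06-03")

# Exercise 3: \1 inside the pattern must repeat whatever group 1 matched.
duplicates = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")

print(phones)      # ['555-0123', '555-4567']
print(reordered)   # 06/03/2026
print(duplicates)  # ['is', 'test']
```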

Complete these three exercises and you will have practiced the core of regex: matching, capturing, and replacing. Pair with Text Replace and Text Diff for a complete text processing workflow.
