Learn Regex From Scratch: Master Text Processing in 30 Minutes
Why Learn Regex?
A single regex can often replace around 20 lines of string manipulation code.
Need to extract all IP addresses from 1000 log lines? Without regex, you split strings, iterate, and check the format by hand, which takes at least 20 lines. With regex: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}. One line. (This pattern checks the shape only; it does not verify that each number is 255 or less.)
This tutorial assumes zero prior knowledge. In about 30 minutes, you should be able to handle most everyday text matching tasks.
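To make the IP example concrete, here is a minimal Python sketch using the standard `re` module. The log lines and the `extract_ips` helper are made up for illustration; the pattern is the one quoted above.

```python
import re

# Pattern from the text: four groups of 1-3 digits separated by literal dots.
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

def extract_ips(lines):
    """Return every IP-shaped substring found in the given lines (hypothetical helper)."""
    found = []
    for line in lines:
        found.extend(IP_PATTERN.findall(line))
    return found

# Hypothetical log lines, invented for this example.
sample_log = [
    "192.168.1.10 - GET /index.html 200",
    "10.0.0.7 - POST /login 302 (forwarded for 172.16.0.3)",
    "no address on this line",
]

print(extract_ips(sample_log))  # ['192.168.1.10', '10.0.0.7', '172.16.0.3']
```

The pattern only checks the shape, so a string like 999.999.999.999 would also be returned.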
Basic Syntax
Literals and Metacharacters
| Character | Meaning | Example |
|---|---|---|
| abc | Exact match "abc" | hello → matches "hello" |
| . | Any single char (except newline) | h.t → hat, hot, hit |
| \d | Digit [0-9] | \d{3} → 123, 456 |
| \w | Word char [a-zA-Z0-9_] | \w+ → hello, test123 |
| \s | Whitespace (space, tab, newline) | a\sb → "a b" |
| \D | Non-digit | \D+ → hello |
| \W | Non-word char | \W → @, #, ! |
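The metacharacters in the table can be tried out with a short Python sketch. The sample strings are invented for illustration. Note that in Python 3, \d and \w also match non-ASCII digits and letters unless the `re.ASCII` flag is passed.

```python
import re

text = "Order 42 shipped to hat_box #7 on day 123"

digits = re.findall(r"\d+", text)             # runs of digits
words = re.findall(r"\w+", text)              # runs of letters, digits, underscore
h_t = re.findall(r"h.t", "hat hot hit h\nt")  # . matches any char except the newline
non_word = re.findall(r"\W", "a@b#c!")        # single non-word characters
spaced = re.findall(r"a\sb", "a b a\tb ab")   # a, one whitespace char, b

print(digits)    # ['42', '7', '123']
print(h_t)       # ['hat', 'hot', 'hit']
print(non_word)  # ['@', '#', '!']
print(spaced)    # ['a b', 'a\tb']
```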
Quantifiers
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | ab*c → ac, abc, abbc |
| + | 1 or more | ab+c → abc, abbc (not ac) |
| ? | 0 or 1 | colou?r → color, colour |
| {n} | Exactly n | \d{4} → 4-digit number |
| {n,} | At least n | \d{4,} → 4+ digits |
| {n,m} | n to m | \d{3,5} → 3-5 digits |
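A quick way to check the quantifier rows is to test whole strings with `re.fullmatch`. This is a small sketch; `full_matches` is a hypothetical helper and the candidate lists are invented.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

star = full_matches(r"ab*c", ["ac", "abc", "abbc", "adc"])
plus = full_matches(r"ab+c", ["ac", "abc", "abbc"])
optional = full_matches(r"colou?r", ["color", "colour", "colouur"])
exact = full_matches(r"\d{4}", ["123", "1234", "12345"])
at_least = full_matches(r"\d{4,}", ["123", "1234", "12345"])
ranged = full_matches(r"\d{3,5}", ["12", "123", "12345", "123456"])

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(ranged)    # ['123', '12345']
```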
Character Classes
[a-z] Lowercase letters
[A-Z] Uppercase letters
[0-9] Digits
[a-zA-Z] All letters
[^0-9] Not a digit (^ inside [] = negation)
[abc] Match a or b or c
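The classes above can be compared side by side on one invented sample string. This is a sketch for experimenting, not a complete reference.

```python
import re

sample = "Cab 42 back"

lower = re.findall(r"[a-z]+", sample)       # runs of lowercase letters
upper = re.findall(r"[A-Z]", sample)        # single uppercase letters
letters = re.findall(r"[a-zA-Z]+", sample)  # runs of any letters
not_digits = re.findall(r"[^0-9]+", sample) # ^ inside [] negates the class
a_b_or_c = re.findall(r"[abc]", sample)     # any one of a, b, c

print(lower)       # ['ab', 'back']
print(upper)       # ['C']
print(not_digits)  # ['Cab ', ' back']
print(a_b_or_c)    # ['a', 'b', 'b', 'a', 'c']
```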
Anchors
| Anchor | Meaning | Example |
|---|---|---|
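| ^ | Start of line (start of the whole string unless multiline mode is on) | ^Hello → lines starting with Hello |
| $ | End of line (end of the whole string unless multiline mode is on) | world$ → lines ending with world |
| \b | Word boundary | \bcat\b → standalone cat, not catch |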
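Anchors behave differently depending on whether multiline mode is on. The sketch below uses Python's `re.MULTILINE` flag for that; other engines expose the same idea as an `m` flag. The sample text is invented.

```python
import re

text = "Hello world\nsay Hello\nbig world"

# By default, ^ and $ anchor only at the start and end of the whole string.
end_default = re.findall(r"world$", text)

# With re.MULTILINE, they anchor at the start and end of every line.
ends_with_world = re.findall(r"^.*world$", text, re.MULTILINE)
starts_with_hello = re.findall(r"^Hello.*$", text, re.MULTILINE)

# \b matches between a word character and a non-word character.
standalone_cat = re.findall(r"\bcat\b", "cat catch concat cat.")

print(end_default)        # ['world']
print(ends_with_world)    # ['Hello world', 'big world']
print(starts_with_hello)  # ['Hello world']
print(standalone_cat)     # ['cat', 'cat']
```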
Groups and Capturing
(\d{3})-(\d{4})-(\d{4}) Matches a phone number written in 3-4-4 form, e.g. 555-0123-4567
$1 = 555, $2 = 0123, $3 = 4567
In ToolsKu Regex Tester, use replacement: $1-****-$3 to mask the middle digits.
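The same capture-and-mask idea can be sketched in Python. One difference to keep in mind: Python's `re.sub` writes group references as \1 and \3 (or \g<1>) rather than $1 and $3. The input string is invented.

```python
import re

PHONE = re.compile(r"(\d{3})-(\d{4})-(\d{4})")

match = PHONE.search("Contact: 555-0123-4567")
groups = match.groups()  # the three captured parts, in order

# Keep groups 1 and 3, replace the middle group with asterisks.
masked = PHONE.sub(r"\1-****-\3", "Contact: 555-0123-4567")

print(groups)  # ('555', '0123', '4567')
print(masked)  # Contact: 555-****-4567
```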
Non-Capturing Groups
(?:\d{3}) Groups without capturing (no $1/$2)
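A short Python sketch shows what changes when a group is non-capturing: the group still groups, but it no longer takes a number. The date string is an invented example.

```python
import re

text = "2026-06-03"

capturing = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
non_capturing = re.search(r"(?:\d{4})-(\d{2})-(\d{2})", text)

# With (?:...), the year is matched but not numbered, so month becomes group 1.
print(capturing.groups())      # ('2026', '06', '03')
print(non_capturing.groups())  # ('06', '03')

# Non-capturing groups are handy for applying a quantifier to a sequence.
repeated = re.fullmatch(r"(?:ab)+", "ababab")
print(repeated is not None)    # True
```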
Lookahead & Lookbehind (Advanced)
| Syntax | Meaning | Example |
|---|---|---|
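| (?=...) | Followed by... | \d+(?=USD) → matches 100 in "100USD" |
| (?!...) | Not followed by... | \d+(?!\d)(?!USD) → matches 200 in "200EUR", nothing in "100USD" (plain \d+(?!USD) would backtrack and match 10) |
| (?<=...) | Preceded by... | (?<=\$)\d+ → matches 100 in "$100" |
| (?<!...) | Not preceded by... | (?<![$\d])\d+ → matches 200 in "200", nothing in "$100" (plain (?<!\$)\d+ would match 00) |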
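Negative lookarounds are easy to get subtly wrong, because the engine will shorten or shift the match until the assertion passes. The Python sketch below, with invented inputs, contrasts the naive patterns with versions that also rule out a neighboring digit.

```python
import re

# Positive lookahead and lookbehind: the asserted text is not part of the match.
followed = re.findall(r"\d+(?=USD)", "100USD 200EUR")
preceded = re.findall(r"(?<=\$)\d+", "$100 200")

# Naive negatives: the engine backtracks or shifts until the assertion holds.
naive_ahead = re.findall(r"\d+(?!USD)", "100USD")
naive_behind = re.findall(r"(?<!\$)\d+", "$100")

# Stricter negatives: also forbid a digit next to the match.
strict_ahead = re.findall(r"\d+(?!\d)(?!USD)", "100USD 200EUR")
strict_behind = re.findall(r"(?<![$\d])\d+", "$100 200")

print(followed, preceded)           # ['100'] ['100']
print(naive_ahead, naive_behind)    # ['10'] ['00']
print(strict_ahead, strict_behind)  # ['200'] ['200']
```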
Practical Regex Recipes
Email: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Phone: \d{3}-\d{3}-\d{4}
URL: https?://[^\s/$.?#].[^\s]*
Date: \d{4}-\d{2}-\d{2}
IP address: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Hex color: #[0-9a-fA-F]{6}
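These recipes check the shape of a string, not whether the value makes sense. The following Python sketch wraps them in a hypothetical `looks_like` helper and adds a numeric range check for IP addresses as one example of follow-up validation. All names and sample values are invented for illustration.

```python
import re

# The recipes from the list above, keyed by illustrative names.
RECIPES = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "phone": r"\d{3}-\d{3}-\d{4}",
    "url": r"https?://[^\s/$.?#].[^\s]*",
    "date": r"\d{4}-\d{2}-\d{2}",
    "ip": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "hex_color": r"#[0-9a-fA-F]{6}",
}

def looks_like(kind, value):
    """True if the whole value has the shape described by the recipe."""
    return re.fullmatch(RECIPES[kind], value) is not None

def is_valid_ipv4(value):
    """Shape check via the recipe, then a range check on each number."""
    if not looks_like("ip", value):
        return False
    return all(int(part) <= 255 for part in value.split("."))

print(looks_like("email", "user@example.com"))  # True
print(looks_like("date", "2026-13-45"))         # True (shape only, not a real date)
print(looks_like("hex_color", "#fff"))          # False (recipe expects 6 hex digits)
print(is_valid_ipv4("999.999.999.999"))         # False
```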
Common Pitfalls
1. Greedy vs Lazy
Greedy: <.*> Matches "<div>hello</div>" → entire string
Lazy: <.*?> Matches "<div>hello</div>" → only <div> and </div>
Add ? after quantifier to make it lazy.
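The difference is easy to see by running both patterns on the same string. A minimal Python sketch:

```python
import re

html = "<div>hello</div>"

# Greedy: .* grabs as much as possible, then gives back only what it must.
greedy = re.findall(r"<.*>", html)

# Lazy: .*? grabs as little as possible, stopping at the first >.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<div>hello</div>']
print(lazy)    # ['<div>', '</div>']
```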
2. Catastrophic Backtracking
# Dangerous
(a+)+b On "aaaaaaaaaaaaaaaaaaaa" (no trailing b) → exponential backtracking
# Safe
a+b Simple, no nested quantifiers
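The two patterns accept exactly the same strings, which is why the rewrite is safe. The Python sketch below checks that on a few short, invented inputs. It deliberately keeps the inputs short, because in a backtracking engine such as Python's `re` the nested version can take exponentially longer as the run of a's without a b grows.

```python
import re

nested = re.compile(r"(a+)+b")  # nested quantifiers: many ways to split the a's
flat = re.compile(r"a+b")       # same matches, only one way to split

# Short samples only; long runs of "a" with no "b" are the slow case for `nested`.
samples = ["ab", "aaab", "b", "aaaa", "xaab", "a" * 12]

results = [(s, bool(nested.search(s)), bool(flat.search(s))) for s in samples]
agree = all(n == f for _, n, f in results)

print(agree)  # True
```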
3. . Doesn't Match Newlines
# Match multiline content
[\s\S]* Instead of .* (matches everything including newlines)
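In Python, the same effect is also available through the `re.DOTALL` flag, which makes . match newlines too. The sketch below compares the three options on an invented two-line string.

```python
import re

text = "<p>line one\nline two</p>"

plain = re.search(r"<p>(.*)</p>", text)                  # . stops at the newline
any_char = re.search(r"<p>([\s\S]*)</p>", text)          # [\s\S] matches any character
dot_all = re.search(r"<p>(.*)</p>", text, re.DOTALL)     # flag lets . match newlines

print(plain)                         # None
print(repr(any_char.group(1)))       # 'line one\nline two'
print(repr(dot_all.group(1)))        # 'line one\nline two'
```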
Practice Exercises
Use ToolsKu Regex Tester for these:
- Extract all phone numbers from: "Call 555-0123 or 555-4567 for support"
- Replace "2026-06-03" with "06/03/2026"
- Find all duplicate words in a text
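If you get stuck, here is one possible set of solutions sketched in Python. Two assumptions: the third exercise is read as "find words that are immediately repeated", and the sample sentence for it is invented. The \1 inside the third pattern is a backreference: it matches the same text that group 1 captured.

```python
import re

# Exercise 1: phone numbers in 3-4 digit form.
phones = re.findall(r"\d{3}-\d{4}", "Call 555-0123 or 555-4567 for support")

# Exercise 2: capture year, month, day, then reorder them in the replacement.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2026-06-03")

# Exercise 3: a word, whitespace, then the same word again.
sentence = "this is is a test test of the the regex"  # invented sample
repeats = re.findall(r"\b(\w+)\s+\1\b", sentence, flags=re.IGNORECASE)

print(phones)     # ['555-0123', '555-4567']
print(reordered)  # 06/03/2026
print(repeats)    # ['is', 'test', 'the']
```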
Complete these three exercises and you will have practiced the regex basics: matching, capturing, and replacing. Pair with Text Replace and Text Diff for a complete text processing workflow.