The first time you write a "find all emails" regex, you reach for something like:
\w+@\w+\.\w+
It works on [email protected]. It fails on [email protected]. The character classes are too restrictive in two places, the TLD assumption is wrong, and you have not thought about quoted display names yet. Here is the version I actually use.
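To make the failure concrete, here is a short Python 3 sketch that runs the naive pattern over a made-up address with a plus tag and a multi-level hostname. The address is a hypothetical example.

```python
import re

# The naive first attempt from above.
NAIVE_RE = re.compile(r"\w+@\w+\.\w+")

# A plus-tagged alias on a multi-level hostname (made-up address).
print(NAIVE_RE.findall("jo.smith+news@mail.dept.example.com"))
# ['news@mail.dept']: the local part loses "jo.smith+" and the host is cut short
```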
The pattern
[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}
Pieces:
- [\w.+-]+ - local part. Word characters plus dots, plus signs, and hyphens. This catches Gmail-style aliases (name+tag@) and most corporate addresses.
- @ - a literal at-sign.
- [\w-]+ - first piece of the hostname.
- (?:\.[\w-]+)* - zero or more additional dot-separated subdomains. This is what makes mail.dept.example.com work.
- \.[a-zA-Z]{2,} - dot, then a two-or-more-letter TLD.
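Here is the same pattern as a minimal Python 3 sketch, written with re.VERBOSE so each piece sits beside its description. EMAIL_RE, find_emails and the sample text are hypothetical names for illustration. One assumption worth flagging: in Python 3, \w is Unicode-aware by default, so this version is slightly more permissive than an ASCII-only engine.

```python
import re

# The pattern above, spelled out with re.VERBOSE. Whitespace and comments
# outside the character classes are ignored, so it matches exactly like
# the one-liner.
EMAIL_RE = re.compile(
    r"""
    [\w.+-]+            # local part: word chars, dots, plus signs, hyphens
    @                   # literal at-sign
    [\w-]+              # first piece of the hostname
    (?:\.[\w-]+)*       # zero or more additional dot-separated subdomains
    \.[a-zA-Z]{2,}      # dot, then a two-or-more-letter TLD
    """,
    re.VERBOSE,
)

def find_emails(text: str) -> list[str]:
    """Return every match in order of appearance, duplicates included."""
    return EMAIL_RE.findall(text)

sample = "Write to jo.smith+news@mail.dept.example.com or ops@example.org."
print(find_emails(sample))
# ['jo.smith+news@mail.dept.example.com', 'ops@example.org']
```

Because the only group is non-capturing, findall returns whole matches rather than group contents.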
It is not RFC-strict. It catches almost every email you encounter in real data. For RFC-strict matching you would need a regex three lines long that nobody can read; the trade-off is not worth it.
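As a quick illustration of what "not RFC-strict" gives up, this Python 3 sketch runs the pattern over two forms that the mail RFCs (5321/5322) allow but that rarely show up in real data. Both addresses are made up.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}")

# Syntactically valid under the mail RFCs, but rare in everyday data.
rfc_valid_but_rare = [
    '"john doe"@example.com',  # quoted local part containing a space
    "user@[192.0.2.1]",        # IP address literal instead of a hostname
]

for addr in rfc_valid_but_rare:
    # Neither produces a match: the quote before "@" and the "[" after it
    # fall outside the pattern's character classes.
    print(addr, "->", EMAIL_RE.findall(addr))
```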
The three pitfalls
1. Punctuation at the end
Free-form text often puts an email at the end of a sentence: "Contact [email protected]." The trailing dot is not part of the address, but a loose regex will grab it. Worse, in "[email protected]," the match can pick up the comma.
The pattern above is safe because the trailing [a-zA-Z]{2,} only accepts letters, so a trailing dot or comma stops the match cleanly. If you end the pattern with a class that also accepts punctuation, such as the \S+ in \S+@\S+, you re-introduce the problem.
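To see the difference concretely, here is a small Python 3 comparison between the pattern above and a deliberately loose \S+@\S+. The loose pattern and the sample sentence are illustrations, not recommendations.

```python
import re

STRICT = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}")
# A deliberately loose pattern: \S+ accepts any non-space character,
# including sentence punctuation.
LOOSE = re.compile(r"\S+@\S+")

text = "Contact alice@example.com. Or bob@example.org, if she is out."
print(STRICT.findall(text))  # ['alice@example.com', 'bob@example.org']
print(LOOSE.findall(text))   # ['alice@example.com.', 'bob@example.org,']
```

The strict version backtracks out of the final dot because nothing after it satisfies [a-zA-Z]{2,}; the loose version has no such constraint.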
2. Display names wrapped around addresses
Email headers often look like:
"Smith, Alice" <[email protected]>
The pattern above will find [email protected] inside the angle brackets, which is usually what you want. The display name and angle brackets are stripped automatically because they are outside the match.
If you specifically need the display name too, that is a different parsing problem and regex is the wrong tool. Use a real email-parser library.
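A Python 3 sketch of both halves: the regex pulls out just the address, and when the display name matters, a header parser does the work. Using the standard library's email.utils.parseaddr here is an assumption for illustration; any real email-parser library fills the same role. The header value is made up.

```python
import re
from email.utils import parseaddr

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}")

header = '"Smith, Alice" <alice.smith@example.com>'

# The regex matches only the address; the quoted display name and the
# angle brackets fall outside the match.
print(EMAIL_RE.findall(header))  # ['alice.smith@example.com']

# parseaddr understands header syntax, including the quoted comma,
# and returns a (display_name, address) tuple.
print(parseaddr(header))  # ('Smith, Alice', 'alice.smith@example.com')
```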
3. Pseudo-emails in code or URLs
Sometimes a string like name@version or [email protected] appears in code samples or URLs, and you get false positives. The TLD-letters requirement helps here - name@version usually does not match because version includes no dot. [email protected] does match, because example is two-plus letters.
For the last case, the only fix is post-filtering: after matching, throw out any address whose TLD is on a deny-list (example, local, internal, etc.).
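A minimal Python 3 sketch of the post-filter. DENY_TLDS and real_looking_emails are hypothetical names, and the deny-list holds only the three TLDs mentioned above; extend it to suit your data. The sample text also shows name@version failing to match at all.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}")

# Hypothetical deny-list of TLDs that mark placeholder or internal addresses.
DENY_TLDS = {"example", "local", "internal"}

def real_looking_emails(text: str) -> list[str]:
    """Match addresses, then drop any whose final label is deny-listed."""
    kept = []
    for addr in EMAIL_RE.findall(text):
        tld = addr.rsplit(".", 1)[-1].lower()  # last dot-separated label
        if tld not in DENY_TLDS:
            kept.append(addr)
    return kept

text = ("Install name@version; docs use demo@host.example; "
        "mail team@corp.com; ci@runner.local")
print(real_looking_emails(text))  # ['team@corp.com']
```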
The on-phone workflow
Three steps, end to end, on iOS:
- Get the wall of text onto your clipboard or into a regex/text app.
- Run the pattern with the global flag on.
- Copy the matches to a new list.
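For comparison, the same three steps off the phone, as a Python 3 sketch. The function name and sample text are hypothetical; findall plays the role of the global flag, and joining with newlines mirrors putting one match per line.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}")

def matches_one_per_line(source: str) -> str:
    # re.search would stop at the first hit; findall is the "global" mode
    # and returns every non-overlapping match, in order.
    return "\n".join(EMAIL_RE.findall(source))

wall_of_text = "Ping ann@example.com and raj@example.org about the invoice."
print(matches_one_per_line(wall_of_text))
```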
In Regex Tool or TextLab on iPhone:
- Paste the source text.
- Paste the pattern.
- Open the Matches panel.
- "Copy all matches" puts every result on the clipboard, one per line.
- Paste into Notes, Mail, or wherever.
If the source is too big for the clipboard (anything over ~50 KB on older iPhones), share the text file into the app directly via the Share Sheet instead.
Dedup and clean
Real lists are messy. The same email might appear 30 times in the source. Two more passes are worth running:
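- Lowercase the list. [email protected] and [email protected] are the same address in practice: the domain is case-insensitive by the standard, and although the standard technically lets the receiving server treat the local part as case-sensitive, mail providers almost never do. Most tools have a "lowercase" action.
- Deduplicate the list. TextLab's "Unique lines" action sorts and removes duplicates. Same idea on the command line: sort -u.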
In one case, this pair turned a list of 850 raw matches into 312 unique addresses.
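The two passes are small enough to sketch in a few lines of Python 3. normalise is a hypothetical helper; sorted(set(...)) mirrors what sort -u does, although sort -u's ordering can differ under non-C locales.

```python
def normalise(matches: list[str]) -> list[str]:
    # Pass 1: lowercase every address.
    lowered = (m.lower() for m in matches)
    # Pass 2: drop duplicates and sort, like `sort -u`.
    return sorted(set(lowered))

raw = ["Bob@Example.com", "alice@example.com", "bob@example.com", "ALICE@example.com"]
print(normalise(raw))  # ['alice@example.com', 'bob@example.com']
```

If you would rather keep the order in which addresses first appeared, list(dict.fromkeys(m.lower() for m in matches)) deduplicates without sorting.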
One subtle thing about Unicode
Real email addresses can contain non-ASCII characters in the local part and internationalised domain names (IDNs) in the host. alice@京.example is a valid address in modern standards.
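The pattern above may miss these. In engines where \w is ASCII-only, the local-part and hostname classes reject non-ASCII letters, and the [a-zA-Z]{2,} TLD piece rejects a non-ASCII TLD regardless. Most of the time that is fine - they are rare in English-language data sources. If you specifically need IDN support, swap the TLD piece for something like [\p{L}]{2,} if your regex engine supports Unicode property classes. iOS' NSRegularExpression, which is built on ICU, supports them without any extra option.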
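A Python 3 sketch of the TLD swap. Python's built-in re module has no \p{L} (the third-party regex package adds it), so this version uses [^\W\d_], word characters minus digits and underscores, as a close stand-in for "any letter". Note also that \w in Python 3 already matches non-ASCII letters, which is why the ASCII-TLD pattern still finds the IDN host below. The addresses are made up.

```python
import re

ASCII_TLD_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[a-zA-Z]{2,}")
# Same pattern, but the TLD piece accepts any run of two or more letters.
# [^\W\d_] approximates \p{L}; it also admits a few non-decimal numeric
# characters, which is rarely a problem in practice.
IDN_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[^\W\d_]{2,}")

text = "Write to alice@京.example or bob@example.中国"
print(ASCII_TLD_RE.findall(text))  # ['alice@京.example']
print(IDN_EMAIL_RE.findall(text))  # ['alice@京.example', 'bob@example.中国']
```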
Why not just use an "email finder" tool
You can, for one-off jobs. The reason I do it with regex on a phone is that the data sometimes sits on the device and I do not want to upload it. Logs, customer lists, exported CSVs - anything that came from work is something you should not be pasting into a random web tool. Once the regex is in your head, you can run it locally in seconds.