聊一聊 各种验证邮件地址的正则表达式
用正则表达式验证邮件地址似乎是一件简单的事情,但是如果要完美的验证一个合规的邮件地址,其实也许很复杂。邮件地址的规范来自于 RFC 5322 。http://www.linuxprobe.com/wp-content/uploads/2017/02/155648gil3ndk3885gn331-2.jpg
聊一聊 各种验证邮件地址的正则表达式聊一聊 各种验证邮件地址的正则表达式邮件地址的规范来自于 RFC 5322 。有一个网站 emailregex.com 专门列出各种编程语言下的验证邮件地址的正则表达式,其中很多正则表达式都是我听说过而从未见过的复杂——我想说,做这个网站的程序员是被邮件验证这件事伤害了多深啊!
其实,在产品环境中,一般来说并不需要这么复杂的正则表达式来做到99.99%正确。一般来说,从执行效率和测试覆盖率来说,只需要一个简单的版本即可:
/^+@+/.{2,4}$/i
那么下面我们来看看这些更严谨、更复杂的正则表达式吧:
验证邮件地址的通用正则表达式(符合 RFC 5322 标准)
(?:+(?:/.+)*|"(?:|//)*")@(?:(?:(?:*)?/.)+(?:*)?|/[(?:(?:25|2|??)/.){3}(?:25|2|??|*:(?:|//)+)/])
由于各种语言对正则表达式的支持不同、语法差异和覆盖率不同,所以,不同语言里面的正则表达式也不同:
Python
这个是个简单的版本:
r"(^+@+/.+$)"
Javascript
这个有点复杂了:
/^[-a-z0-9~!$%^&*_=+}{/'?]+(/.[-a-z0-9~!$%^&*_=+}{/'?]+)*@([-a-z0-9_]*(/.[-a-z0-9_]+)*/.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|)|({1,3}/.{1,3}/.{1,3}/.{1,3}))(:{1,5})?$/i
Swift
+@+//.{2,6}
PHP
PHP 的这个版本就更复杂了,覆盖率就更大一些:
/^(?!(?:(?:/x22?/x5C/x22?)|(?:/x22?[^/x5C/x22]/x22?)){255,})(?!(?:(?:/x22?/x5C/x22?)|(?:/x22?[^/x5C/x22]/x22?)){65,}@)(?:(?:+)|(?:/x22(?:|(?:/x5C))*/x22))(?:/.(?:(?:+)|(?:/x22(?:|(?:/x5C))*/x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?+(?:-+)*/.){1,126}){1,}(?:(?:*)|(?:(?:xn--)+))(?:-+)*)|(?:/[(?:(?:IPv6:(?:(?:{1,4}(?::{1,4}){7})|(?:(?!(?:.*[:/]]){7,})(?:{1,4}(?::{1,4}){0,5})?::(?:{1,4}(?::{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:{1,4}(?::{1,4}){5}:)|(?:(?!(?:.*:){5,})(?:{1,4}(?::{1,4}){0,3})?::(?:{1,4}(?::{1,4}){0,3}:)?)))?(?:(?:25)|(?:2)|(?:1{2})|(?:?))(?:/.(?:(?:25)|(?:2)|(?:1{2})|(?:?))){3}))/]))$/iD
Perl / Ruby
对与 PHP 的版本,Perl 和 Ruby 表示不服,可以更严谨:
(?:(?:/r/n)?[ /t])*(?:(?:(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?))*"(?:(?:/r/n)?[ /t])*))*@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*|(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)*/<(?:(?:/r/n)?[ /t])*(?:@(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?)*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*(?:,@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?)+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*)*:(?:(?:/r/n)?[ /t])*)?(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*))*@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*/>(?:(?:/r/n)?[ /t])*)|(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)*:(?:(?:/r/n)?[ /t])*(?:(?:(?:[^()<>@,;://"./[/]/000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*))*@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*|(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)*/<(?:(?:/r/n)?[ /t])*(?:@(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*(?:,@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/]/000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*)*:(?:(?:/r/n)?[ /t])*)?(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*))*@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*/>(?:(?:/r/n)?[ /t])*)(?:,/s*(?:(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*))*@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*|(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)*/<(?:(?:/r/n)?[ /t])*(?:@(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*(?:,@(?:(?:/r/n)?)*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*)*:(?:(?:/r/n)?[ /t])*)?(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|"(?:[^/"/r//]|//.|(?:(?:/r/n)?[ /t]))*"(?:(?:/r/n)?[ /t])*))*@(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*)(?:/.(?:(?:/r/n)?[ /t])*(?:[^()<>@,;://"./[/] /000-/031]+(?:(?:(?:/r/n)?[ /t])+|/Z|(?=[/["()<>@,;://"./[/]]))|/[([^/[/]/r//]|//.)*/](?:(?:/r/n)?[ /t])*))*/>(?:(?:/r/n)?[ /t])*))*)?;/s
Perl 5.10 及以后版本
上面的版本,嗯,我可以说是天书吗?反正我是没有解读的想法了。当然,新版本的 Perl 语言还有一个更易读的版本(你是说真的么?)
/(?(DEFINE)
(?<address> (?&mailbox) | (?&group))
(?<mailbox> (?&name_addr) | (?&addr_spec))
(?<name_addr> (?&display_name)? (?&angle_addr))
(?<angle_addr> (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
(?<group> (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ;
(?&CFWS)?)
(?<display_name> (?&phrase))
(?<mailbox_list> (?&mailbox) (?: , (?&mailbox))*)
(?<addr_spec> (?&local_part) /@ (?&domain))
(?<local_part> (?&dot_atom) | (?"ed_string))
(?<domain> (?&dot_atom) | (?&domain_literal))
(?<domain_literal> (?&CFWS)? /[ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
/] (?&CFWS)?)
(?<dcontent> (?&dtext) | (?"ed_pair))
(?<dtext> (?&NO_WS_CTL) | )
(?<atext> (?&ALPHA) | (?&DIGIT) | [!#/$%&'*+-/=?^_`{|}~])
(?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?)
(?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
(?<dot_atom_text> (?&atext)+ (?: /. (?&atext)+)*)
(?<text> )
(?<quoted_pair> // (?&text))
(?<qtext> (?&NO_WS_CTL) | )
(?<qcontent> (?&qtext) | (?"ed_pair))
(?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
(?&FWS)? (?&DQUOTE) (?&CFWS)?)
(?<word> (?&atom) | (?"ed_string))
(?<phrase> (?&word)+)
# Folding white space
(?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
(?<ctext> (?&NO_WS_CTL) | )
(?<ccontent> (?&ctext) | (?"ed_pair) | (?&comment))
(?<comment> /( (?: (?&FWS)? (?&ccontent))* (?&FWS)? /) )
(?<CFWS> (?: (?&FWS)? (?&comment))*
(?: (?:(?&FWS)? (?&comment)) | (?&FWS)))
# No whitespace control
(?<NO_WS_CTL> )
(?<ALPHA> )
(?<DIGIT> )
(?<CRLF> /x0d /x0a)
(?<DQUOTE> ")
(?<WSP> )
)
(?&address)/x
Ruby (简单版)
Ruby 表示,其实人家还有个简单版本:
//A(.?)+@+(/.+)*/.+/z/i
.NET
这样的版本谁没有啊——.NET 说:
^/w+([-+.']/w+)*@/w+([-.]/w+)*/./w+([-.]/w+)*$
grep 命令
用 grep 命令在文件中查找邮件地址,我想你不会写个若干行的正则表达式吧,意思一下就行了:
$ grep -E -o "/b+@+/.{2,6}/b" filename.txt
SQL Server
在 SQL Server 中也是可以用正则表达式的,不过这个代码片段应该是来自某个产品环境中的,所以,还体贴的照顾了那些把邮件地址写错的人:
select email
from table_name where
patindex ('%[ &'',":;!+=//()<>]%', email) > 0 -- Invalid characters
or patindex ('[@.-_]%', email) > 0 -- Valid but cannot be starting character
or patindex ('%[@.-_]', email) > 0 -- Valid but cannot be ending character
or email not like '%@%.%' -- Must contain at least one @ and one .
or email like '%..%' -- Cannot have two periods in a row
or email like '%@%@%' -- Cannot have two @ anywhere
or email like '%.@%' or email like '%@.%' -- Cannot have @ and . next to each other
or email like '%.cm' or email like '%.co' -- Camaroon or Colombia? Typos.
or email like '%.or' or email like '%.ne' -- Missing last letter
Oracle PL/SQL
这个是不是有点偷懒?尤其是在那些“复杂”的正则表达式之后:
SELECT email
FROM table_name
WHERE REGEXP_LIKE (email, '+@+/.{2,4}');
MySQL
好吧,看来最后也一样懒:
SELECT * FROM `users` WHERE `email` NOT REGEXP '^+@+/.{2,4}$';
那么,你有没有关于验证邮件地址的正则表达式分享给大家?
原文来:http://www.linuxprobe.com/regular-expression-emailregex.html
页:
[1]