
Basic Perl Syntax - Regular Expressions

Author: 蔡逸竹


Posted: 2006-11-30 23:06


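Regular Expressions

A regular expression is typically used to search a string for a particular pattern; this is called pattern matching. Its operators are =~ and !~, which you can read as "match" and "does not match": =~ is true when the pattern is found in the string, and !~ is true when it is not.

Syntax: $string =~ /regular expression/modifiers
Example: $sentence =~ /Hello/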
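A minimal sketch of the two operators; the variable names and sample strings are made up for illustration. It also shows that a pattern written without =~ is matched against the default variable $_.

```perl
use strict;
use warnings;

my $sentence = "Hello, world";

# =~ is true when the pattern occurs somewhere in the string
my $has_hello = ($sentence =~ /Hello/) ? 1 : 0;

# !~ is simply the negation of =~
my $no_goodbye = ($sentence !~ /Goodbye/) ? 1 : 0;

print "found Hello\n"      if $has_hello;
print "no Goodbye in it\n" if $no_goodbye;

# Without =~ the pattern is matched against $_ (here set by the loop)
for ("Hello there", "Hi there") {
    print "$_ starts with a Hello\n" if /^Hello/;
}
```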
(a) Modifiers: Modifiers are optional; they change how the whole match is carried out.

g	Match globally, i.e. find all occurrences.
i	Makes the search case-insensitive.
m	Treats the string as multiple lines: ^ and $ then match at the start and end of every line inside the string, instead of only at the start and end of the whole string.
o	Only compiles the pattern once, even if it interpolates a variable whose value changes later.
s	Treats the string as a single line: . then also matches a newline character (normally . matches any character except a newline).
x	Ignores unescaped white space in the pattern and allows # comments, so a long pattern can be laid out and annotated.
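The following sketch tries /i, /g, /m, /s and /x on a two-line sample string (the text is invented for the demo). /o is left out because it only matters when a pattern interpolates a variable.

```perl
use strict;
use warnings;

my $text = "Perl is fun\nperl is powerful\n";

# /i ignores case; /g in list context returns every match
my @all     = ($text =~ /perl/gi);    # ("Perl", "perl")
my $count_i = scalar @all;            # 2

# /m: ^ matches at the start of every line, not only the start of the string
my @line_starts = ($text =~ /^(\w+)/mg);    # ("Perl", "perl")

# /s: lets . match the newline between "fun" and "perl"
my $crosses = ($text =~ /fun.perl/s) ? 1 : 0;    # 1
my $no_s    = ($text =~ /fun.perl/)  ? 1 : 0;    # 0

# /x: white space and # comments inside the pattern are ignored
my ($first_word) = $text =~ /
    ^       # start of the string
    (\w+)   # capture the first word
/x;

print "$count_i @line_starts $crosses $no_s $first_word\n";
```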

(b) Metacharacters: Each of the following characters has a special meaning, which lets you build more complex search patterns.

\	Makes the following metacharacter an ordinary character, removing its special meaning (e.g. \. matches a literal dot).
^	Matches the beginning of the string; with /m, also the beginning of each line.
.	Matches any character except a newline; with /s, a newline as well.
$	Matches the end of the string (or just before a final newline); with /m, also the end of each line.
|	Alternation: matches either the expression on its left or the one on its right, e.g. cat|dog.
( )	Groups expressions for alternation and quantifiers, and captures the matched text for back-references (\1 inside the pattern, $1 after the match).
[ ]	Character class: matches any one character from the set, e.g. [abc].
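A small sketch exercising each metacharacter from the table; the sample strings are arbitrary. [0-9] is used instead of \d so the example only relies on what has been introduced so far.

```perl
use strict;
use warnings;

# ^ and $ anchor the pattern to the start and end of the string
my $anchored = ("abc"   =~ /^abc$/) ? 1 : 0;    # 1
my $not_anch = ("xabcx" =~ /^abc$/) ? 1 : 0;    # 0

# | picks one alternative; ( ) groups them and captures the match in $1
my ($pet) = "I have a dog" =~ /(cat|dog)/;      # "dog"

# [ ] matches any one character from the set
my @vowels = ("regular" =~ /[aeiou]/g);         # ("e", "u", "a")

# \ turns a metacharacter into an ordinary one: \. is a literal dot
my $literal_dot = ("3.14" =~ /^[0-9]\.[0-9]+$/) ? 1 : 0;    # 1
my $dot_any     = ("3x14" =~ /^[0-9].[0-9]+$/)  ? 1 : 0;    # 1: bare . matches "x"
my $escaped_no  = ("3x14" =~ /^[0-9]\.[0-9]+$/) ? 1 : 0;    # 0

print "$anchored $not_anch $pet @vowels $literal_dot $dot_any $escaped_no\n";
```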

(c) Pattern Quantifiers: These say how many times the preceding item must occur.

*	Matches 0 or more times.
+	Matches 1 or more times.
?	Matches 0 or 1 times.
{n}	Matches exactly n times.
{n,}	Matches at least n times.
{n,m}	Matches at least n times but no more than m times.
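A sketch checking each quantifier against a few sample strings. Every pattern is anchored with ^...$ so the quantifier has to account for all the b's in the string; the last line shows that a quantifier applies only to the single item before it unless you group with ( ).

```perl
use strict;
use warnings;

my $star_zero  = ("ac"      =~ /^ab*c$/)     ? 1 : 0;   # 1: * allows zero b's
my $plus_zero  = ("ac"      =~ /^ab+c$/)     ? 1 : 0;   # 0: + needs at least one b
my $quest_one  = ("abc"     =~ /^ab?c$/)     ? 1 : 0;   # 1: ? allows 0 or 1 b
my $exactly_2  = ("abbc"    =~ /^ab{2}c$/)   ? 1 : 0;   # 1: exactly two b's
my $at_least_2 = ("abbbbbc" =~ /^ab{2,}c$/)  ? 1 : 0;   # 1: five b's is "two or more"
my $two_to_4   = ("abbbbbc" =~ /^ab{2,4}c$/) ? 1 : 0;   # 0: five b's exceeds the upper bound

# ( ) makes the quantifier repeat the whole group "ab"
my $group_rep  = ("ababab"  =~ /^(ab){3}$/)  ? 1 : 0;   # 1

print "$star_zero $plus_zero $quest_one $exactly_2 $at_least_2 $two_to_4 $group_rep\n";
```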

(d) Character Patterns: The following sequences match particular kinds of characters:

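\r	Carriage return (CR), ASCII 13 (decimal).
\n	Newline (LF), ASCII 10 (decimal). DOS/Windows text files end each line with CR + LF (ASCII 13 + ASCII 10); Perl's text-mode I/O on Windows turns that pair into a single \n when reading, but data read in binary mode, or a DOS file read on UNIX, still contains the \r.
\t	Tab, ASCII 9 (decimal).
\w	Matches a word character: a letter, a digit, or _. For ASCII text this is [A-Za-z0-9_].
\W	Matches a non-word character, i.e. [^A-Za-z0-9_].
\s	Matches a white-space character: space, tab, form feed, CR or LF, i.e. [ \t\f\r\n].
\S	Matches a non-white-space character, i.e. [^ \t\f\r\n].
\d	Matches a digit, i.e. [0-9].
\D	Matches a non-digit character, i.e. [^0-9].
\b	Matches a word boundary: the position between a \w character and a \W character (or the start or end of the string next to a \w).
\B	Matches any position that is not a word boundary.
\033	The character with octal code 033 (ESC).
\x1B	The character with hexadecimal code 1B (the same ESC character).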
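A quick sketch of the character patterns on invented sample text, including the difference between \b and \B and the fact that \033 and \x1B name the same character.

```perl
use strict;
use warnings;

my $line = "id_42\tcount 7";

my @digits = ($line =~ /\d/g);       # ("4", "2", "7")
my @words  = ($line =~ /\w+/g);      # ("id_42", "count", "7"): _ is a word character
my @fields = split /\s+/, $line;     # tab and space both count as white space
my $has_nondigit = ($line =~ /\D/) ? 1 : 0;    # 1

# \b sits between a \w and a \W character (or the string edge)
my $whole_word = ("a cat sat"   =~ /\bcat\b/) ? 1 : 0;    # 1
my $inside     = ("concatenate" =~ /\bcat\b/) ? 1 : 0;    # 0: "cat" is inside a word
my $non_bound  = ("concatenate" =~ /\Bcat\B/) ? 1 : 0;    # 1

# Octal 033 and hex 1B are the same ESC character
my $esc_same = ("\033" =~ /^\x1B$/) ? 1 : 0;    # 1

print "@digits | @words | $whole_word $inside $non_bound $esc_same\n";
```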

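(e) Examples: Regular expressions are extremely powerful and important, but for beginners they can be a nightmare. When I first ran into them I was completely lost, and even now I wouldn't claim to understand them fully :p Once you grasp the basic techniques, though, you'll love them and keep marveling at what they can do. You probably found the tables above hard to follow; this kind of thing is easier to learn from examples, so here are some basic ones to help you get the hang of it.

/abc/	matches a string that contains abc
/^abc/	matches a string that starts with abc
/abc$/	matches a string that ends with abc
/a|b/	matches a string that contains a or b; the alternatives can also be whole words, e.g. /cat|dog/
/ab{2,4}c/	matches a followed by 2 to 4 b's and then c; /ab{2,}c/ allows two or more b's
/ab*c/	matches a followed by zero or more b's and then c, same as /ab{0,}c/
/ab+c/	matches a followed by one or more b's and then c, same as /ab{1,}c/
/a.c/	matches a, then any one character except a newline (\n), then c
/[abc]/	matches a string that contains any one of these three characters
/\d/	matches a string that contains a digit, same as /[0-9]/
/\w/	matches a string that contains a word character (letter, digit or _), same as /[a-zA-Z0-9_]/
/\s/	matches a string that contains white space, same as /[ \t\r\n\f]/
/[^abc]/	matches a string that contains at least one character other than a, b and c (to require that none of them appears, use $string !~ /[abc]/)
/\*/	matches a string that contains the character *. Perl treats the character after a backslash "\" as an ordinary character. If you are not sure whether a punctuation symbol is special, escaping it is always safe; don't do this with letters or digits, though, because \d, \b and so on have their own meanings.
/abc/i	matches abc regardless of case
/(\d+)\.(\d+)\.(\d+)\.(\d+)/	matches an IP-like string and stores its four numbers in the special variables $1, $2, $3 and $4 for later use. For example, the following checks whether the first two numbers are 140.121 (this pattern captures them together in $1):

if ($x =~ /(\d+\.\d+)\.\d+\.\d+/) {
    print "National Taiwan Ocean University" if ($1 eq "140.121");
}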
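A runnable sketch of two points from the list: what [^abc] really matches, and how the captures end up in $1 to $4. The address 140.121.80.1 is a made-up sample. Note that \d+ also accepts numbers such as 999, so this pattern finds IP-like text but does not validate an address.

```perl
use strict;
use warnings;

# [^abc] matches ONE character that is not a, b or c,
# so the match succeeds as soon as the string contains any other character
my $mixed   = ("abcx" =~ /[^abc]/) ? 1 : 0;    # 1: "x" is outside the set
my $only_ab = ("abba" =~ /[^abc]/) ? 1 : 0;    # 0: every character is a, b or c

# "Contains none of a, b, c" is the negated match
my $none_of = ("xyz" !~ /[abc]/) ? 1 : 0;      # 1

# Four groups put the four numbers into $1..$4
my $ip = "140.121.80.1";    # hypothetical sample address
my @octets;
if ($ip =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
    @octets = ($1, $2, $3, $4);
}

# One group around the first two numbers puts "140.121" into $1
my $network = ($ip =~ /^(\d+\.\d+)\.\d+\.\d+$/) ? $1 : "";

print "@octets / $network\n";
```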

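m//gimosx	The m operator lets you choose your own delimiter for the pattern; gimosx are its modifiers (see (a) Modifiers). For example:

$url = "http://my.machine.tw:8080/cgi-bin/test.pl";
($host, $port, $file) = ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);

This regular expression is fairly complex. Its purpose is to take the given URL apart into the host name, the port number and the file path. Let me explain it piece by piece:

$url=~m||
The character right after m is the delimiter; the pattern goes between the two |.

([^/:]+)
Matches a run of characters that contains no / and no :. The matched text is stored in $1.

:{0,1}(\d*)
Matches zero or one :, followed by a run of digits or nothing at all. The digits are stored in $2; if there are none, $2 is the empty string.

(\S*)$
Matches a run of non-white-space characters that must reach the end of the string. The matched text is stored in $3.

()=()
A match in list context returns its captured groups, so this is the same as
($host, $port, $file) = ($1, $2, $3)
i.e. $host = "my.machine.tw"
     $port = 8080
     $file = "/cgi-bin/test.pl"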
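To try the URL pattern on more than one input, here is a sketch that wraps it in a small helper; `parse_url` and the extra URLs are hypothetical names and samples, not part of the original post. When the match fails, a match in list context returns an empty list, so every variable on the left stays undefined.

```perl
use strict;
use warnings;

# Hypothetical helper around the pattern from the text
sub parse_url {
    my ($url) = @_;
    # list context: returns ($1, $2, $3) on success, () on failure
    return ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);
}

my ($host, $port, $file) = parse_url("http://my.machine.tw:8080/cgi-bin/test.pl");
print "host=$host port=$port file=$file\n";

# No port in the URL: (\d*) matches nothing, so $2 is ""
my ($h2, $p2, $f2) = parse_url("http://my.machine.tw/index.html");

# Not an http:// URL: the match fails and the list is empty
my @none = parse_url("ftp://example.org/x");
```

Two details: :{0,1} can also be written :?, and because | is used as the delimiter here, a | inside the pattern would clash with it, so a delimiter such as m{...} is handier when the pattern needs alternation.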
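s/PATTERN/REPLACEMENT/egimosx	Yes, this is the substitution operator. It searches for text matching PATTERN and replaces it with the REPLACEMENT string. Compared with m// it has one extra modifier, e; the others are the same as above. Here they are in a table: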

e	Evaluate the right side as an expression.
g	Replace globally, i.e. all occurrences.
i	Do case-insensitive pattern matching.
m	Treat string as multiple lines.
o	Only compile pattern once.
s	Treat string as single line.
x	Use extended regular expressions.
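A sketch of the substitution modifiers on made-up strings. The `(my $copy = $original) =~ s///` idiom copies the string first and then edits the copy, leaving the original untouched.

```perl
use strict;
use warnings;

my $s = "Cat cat CAT";

# Without /g only the first match is replaced; /i ignores case
(my $first = $s) =~ s/cat/dog/i;     # "dog cat CAT"

# /g replaces every match
(my $all = $s) =~ s/cat/dog/gi;      # "dog dog dog"

# /e evaluates the replacement as Perl code: double every number
my $prices = "apple 3, pear 5";
$prices =~ s/(\d+)/$1 * 2/ge;        # "apple 6, pear 10"

# s/// returns the number of substitutions it made
my $text = "a-b-c";
my $n = ($text =~ s/-/+/g);          # 2; $text is now "a+b+c"

print "$first | $all | $prices | $n $text\n";
```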

Examples:
$x =~ s/\s+//g
removes all white space.
$x =~ s/([^:]*):([^:]*)/$2:$1/
swaps two fields separated by ":".
$path =~ s|/usr/bin|/usr/local/bin|
s/// also lets you choose your own delimiter.
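A runnable sketch of these substitutions on sample strings chosen for the demo. The field swap uses [^:] so that each group stops at the colon; the last line shows what goes wrong with [^ ] instead: it also matches ":", so the first group swallows the whole string and $2 ends up empty.

```perl
use strict;
use warnings;

my $x = "  hello   big\tworld \n";
$x =~ s/\s+//g;                           # "hellobigworld"

my $pair = "key:value";
$pair =~ s/([^:]*):([^:]*)/$2:$1/;        # "value:key"

my $path = "/usr/bin/perl";
$path =~ s|/usr/bin|/usr/local/bin|;      # "/usr/local/bin/perl"

# Pitfall: [^ ] matches ":" too, so $1 takes "key:value" and $2 is empty
(my $broken = "key:value") =~ s/([^ ]*):*([^ ]*)/$2:$1/;   # ":key:value"

print "$x | $pair | $path | $broken\n";
```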
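tr/SEARCHLIST/REPLACEMENTLIST/cds	This is also a replacement operator. Unlike s///, SEARCHLIST and REPLACEMENTLIST are plain lists of characters (ranges such as a-z are allowed), not regular expressions: each character found from SEARCHLIST is replaced by the character in the same position in REPLACEMENTLIST. That makes it faster, and it also has fewer modifiers: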

c	Complement the SEARCHLIST.
d	Delete found but unreplaced characters.
s	Squash duplicate replaced characters.
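A sketch of the three tr modifiers on invented strings. When REPLACEMENTLIST is empty and /d is not given, tr maps SEARCHLIST onto itself, which is why tr/a-z//s only squashes repeats.

```perl
use strict;
use warnings;

# /d: characters in SEARCHLIST with no partner in REPLACEMENTLIST are deleted
(my $no_vowels = "regular expression") =~ tr/aeiou//d;    # "rglr xprssn"

# /s: a run of the same translated character is squashed into one
(my $squashed = "bookkeeper") =~ tr/a-z//s;               # "bokeper"

# /c: complement SEARCHLIST, i.e. act on every character NOT listed;
# combined with /d this deletes everything except letters
(my $only_letters = "a1-b2_c3") =~ tr/a-zA-Z//cd;         # "abc"

print "$no_vowels | $squashed | $only_letters\n";
```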

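Examples:
$x =~ tr/this/that/
replaces character by character: t with t, h with h, i with a, and s with t. It does not replace the word "this"; use s/this/that/g for that.
$x =~ tr/a-z/A-Z/
converts all lowercase letters to uppercase.
$count = $x =~ tr/*/*/
counts how many "*" characters $x contains (tr returns the number of characters it matched, and with identical lists nothing is changed).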
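A runnable sketch of these three examples on sample strings chosen for the demo; the first one makes the character-by-character behavior of tr visible. Inside tr, "*" is an ordinary character, since tr does not use regular expressions.

```perl
use strict;
use warnings;

# t->t, h->h, i->a, s->t, applied to every character
(my $x = "this is his") =~ tr/this/that/;       # "that at hat"

(my $upper = "Hello, World") =~ tr/a-z/A-Z/;    # "HELLO, WORLD"

# Identical lists: nothing changes, but tr still returns the count
my $stars = "a*b**c";
my $count = ($stars =~ tr/*/*/);                # 3

print "$x | $upper | $count\n";
```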
