
Basic Perl Syntax - Regular Expressions


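Regular Expressions
A regular expression is typically used to search for a particular string pattern, which is what "pattern matching" means. The binding operators are =~ and !~, which you can read as "match" and "does not match".
Syntax: $string =~ /regular expression/modifiers
Example: $sentence =~ /Hello/ (true if $sentence contains Hello)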
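A minimal sketch of both operators, assuming a made-up sample string. Each match is turned into 1 or 0 so the results are easy to print:

```perl
use strict;
use warnings;

my $sentence = "Hello, world";    # hypothetical sample string

# =~ is true when the pattern matches somewhere in the string
my $has_hello = ($sentence =~ /Hello/) ? 1 : 0;

# !~ is the opposite: true when the pattern does NOT match
my $no_goodbye = ($sentence !~ /Goodbye/) ? 1 : 0;

# With no =~ at all, the match is applied to the default variable $_
$_ = "say Hello";
my $default_match = /Hello/ ? 1 : 0;

print "$has_hello $no_goodbye $default_match\n";    # prints "1 1 1"
```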
(a) Modifiers: modifiers are optional. They change how the whole pattern is applied.
g Match globally, i.e. find all occurrences.
i Makes the search case-insensitive.
m Treat the string as multiple lines. By default ^ and $ match only at the beginning and end of the whole string; with /m they also match at the beginning and end of each line inside it.
o Compile the pattern only once, even if it interpolates variables.
s Treat the string as a single line. Normally . matches any character except a newline; with /s it matches a newline as well.
x Ignore white space in the pattern and allow # comments, so a complex pattern can be laid out readably.
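The modifiers are easiest to understand by trying them on the same text. This is a small sketch with a hypothetical three-line string:

```perl
use strict;
use warnings;

my $text = "Apple pie\napple tart\nAPPLE juice";    # hypothetical sample text

# /i ignores case; /g in list context returns every match
my @all = ($text =~ /apple/gi);                     # three matches

# Without /m, ^ anchors only at the start of the whole string
my @starts_plain = ($text =~ /^(\w+)/g);            # ("Apple")
# With /m, ^ also anchors right after each embedded newline
my @starts_multi = ($text =~ /^(\w+)/mg);           # ("Apple", "apple", "APPLE")

# Without /s, . cannot match the newline; with /s it can
my $no_s   = ($text =~ /pie.apple/)  ? 1 : 0;       # 0
my $with_s = ($text =~ /pie.apple/s) ? 1 : 0;       # 1

# /x: the spaces inside the pattern are ignored
my $with_x = ("2024-01-31" =~ / \d{4} - \d{2} - \d{2} /x) ? 1 : 0;   # 1

print "@starts_multi\n";                            # prints "Apple apple APPLE"
```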

(b) Metacharacters: the following characters have special meanings and let you build more complex search patterns.
\ Treat the following character as an ordinary character; this removes the special meaning of any metacharacter.
^ Match the beginning of the string (with /m, the beginning of any line).
. Match any character except a newline (with /s, any character at all).
$ Match the end of the string or just before a final newline (with /m, the end of any line).
| Alternation: match either the expression on its left or the one on its right.
( ) Group expressions, for alternation and back-referencing; the text each group matches is captured.
[ ] Character class: match any one character from the set.
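A short sketch exercising each metacharacter once, assuming made-up sample strings:

```perl
use strict;
use warnings;

# | alternation: either "cat" or "dog"
my $pet = ("I have a dog" =~ /cat|dog/) ? 1 : 0;           # 1

# ( ) groups the alternation and captures what matched
my ($animal) = ("hotdog stand" =~ /hot(cat|dog)/);         # "dog"

# [ ] matches any single character from the set
my $vowel = ("rhythm" =~ /[aeiou]/) ? 1 : 0;               # 0, no vowels

# \ escapes a metacharacter: \. is a literal dot, . is any character
my $literal_dot = ("a-b" =~ /a\.b/) ? 1 : 0;               # 0
my $any_char    = ("a-b" =~ /a.b/)  ? 1 : 0;               # 1

# ^ and $ tie the pattern to the start and end of the string
my $whole = ("abc" =~ /^abc$/) ? 1 : 0;                    # 1

print "$animal\n";                                         # prints "dog"
```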

(c) Pattern Quantifiers: these specify how many times the preceding item must occur.
* Matches 0 or more times.
+ Matches 1 or more times.
? Matches 0 or 1 times.
{n} Matches exactly n times.
{n,} Matches at least n times.
{n,m} Matches at least n times but no more than m times.
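To compare the quantifiers side by side, this sketch filters one hypothetical word list with each of them. The ^ and $ anchors make each pattern describe the whole word:

```perl
use strict;
use warnings;

my @words = ("ac", "abc", "abbc", "abbbbbc");   # hypothetical sample data

my @star  = grep { /^ab*c$/ }     @words;   # zero or more b: all four
my @plus  = grep { /^ab+c$/ }     @words;   # one or more b
my @quest = grep { /^ab?c$/ }     @words;   # zero or one b
my @exact = grep { /^ab{2}c$/ }   @words;   # exactly two b
my @least = grep { /^ab{2,}c$/ }  @words;   # at least two b
my @range = grep { /^ab{1,2}c$/ } @words;   # one or two b

print "@range\n";                           # prints "abc abbc"
```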

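(d) Character Patterns: the following sequences match particular kinds of characters:
\r Carriage return (CR), ASCII 13 (decimal)
\n Newline (LF), ASCII 10 (decimal). DOS/Windows text files end each line with CR LF (ASCII 13 followed by ASCII 10); on those systems Perl translates that pair to \n when it reads a file in text mode.
\t Tab, ASCII 9 (decimal)
\w Matches an alphanumeric character. Alphanumeric also includes _, i.e. [A-Za-z0-9_] for ASCII text.
\W Matches a non-alphanumeric character, i.e. [^A-Za-z0-9_].
\s Matches a white space character: space, tab, form feed and CR/LF, i.e. [ \t\f\r\n].
\S Matches a non-white space character, i.e. [^ \t\f\r\n].
\d Matches a digit, i.e. [0-9].
\D Matches a non-digit character, i.e. [^0-9].
\b Matches a word boundary: a position between a \w character and a \W character, or between a \w character and either end of the string.
\B Matches any position that is not a word boundary.
\033 A character given by its octal code (\033 is ESC, ASCII 27).
\x1B A character given by its hexadecimal code (\x1B is also ESC).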
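A sketch of the character patterns on a made-up line of text. \b and \B are shown with "concatenate", where "cat" appears only inside a word:

```perl
use strict;
use warnings;

my $line = "id_42\tcost: 3.50 USD";           # hypothetical sample line

my @digits = ($line =~ /\d/g);                # each single digit: 4 2 3 5 0
my @words  = ($line =~ /\w+/g);               # runs of [A-Za-z0-9_]
my @fields = split /\s+/, $line;              # split on runs of white space

# \b: "cat" as a whole word; \B: "cat" surrounded by other word characters
my $whole_word = ("concatenate" =~ /\bcat\b/) ? 1 : 0;   # 0
my $inside     = ("concatenate" =~ /\Bcat\B/) ? 1 : 0;   # 1

# \x1B and \033 are the same character: ESC, ASCII 27
my $esc_same = ("\x1B" =~ /^\033$/) ? 1 : 0;             # 1

print join("|", @fields), "\n";               # prints "id_42|cost:|3.50|USD"
```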

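(e) Examples: regular expressions are extremely powerful and important, but for beginners they are a nightmare. I remember being completely lost when I first ran into them, and even now I wouldn't claim to understand them fully :p Once you learn the basic techniques, though, you'll love them and keep marveling at what they can do. The tables above probably made little sense on a first read; this kind of thing is easier to learn from examples, so here are some basic ones to help you get started.

/abc/ Matches strings that contain abc.
/^abc/ Matches strings that begin with abc.
/abc$/ Matches strings that end with abc.
/a|b/ Matches strings that contain a or b. Alternation also works with whole words, e.g. /cat|dog/.
/ab{2,4}c/ Matches an a followed by 2 to 4 b's and then a c. /ab{2,}c/ would accept two or more b's.
/ab*c/ Matches an a followed by zero or more b's and then a c; same as /ab{0,}c/.
/ab+c/ Matches an a followed by one or more b's and then a c; same as /ab{1,}c/.
/a.c/ Matches an a, then any one character except the newline (\n), then a c.
/[abc]/ Matches strings that contain any one of these three characters.
/\d/ Matches strings that contain a digit; same as /[0-9]/.
/\w/ Matches strings that contain a word character (letter, digit or underscore); same as /[a-zA-Z0-9_]/.
/\s/ Matches strings that contain white space; same as /[ \t\r\n\f]/.
/[^abc]/ Matches strings that contain at least one character other than a, b or c. It does not mean "contains none of a, b, c"; for that, use $string !~ /[abc]/.
/\*/ Matches strings that contain the character *. Perl treats a punctuation character after a backslash as an ordinary character, so if you are not sure whether a symbol is special, escaping it is always safe. Do not escape letters or digits, though: sequences such as \d or \1 have their own meanings.
/abc/i Matches abc regardless of case.
/(\d+)\.(\d+)\.(\d+)\.(\d+)/ Matches an IP-address-like string and stores the four numbers in the special variables $1, $2, $3 and $4 for later use. In the example below the first two numbers are captured together, so $1 holds a string such as "140.121":

    if ($x =~ /(\d+\.\d+)\.\d+\.\d+/) {
        print "National Taiwan Ocean University\n" if ($1 eq "140.121");
    }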
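A runnable version of the IP example, assuming a hypothetical log line. It also shows that a match in list context hands back all captured groups at once, and that \d+ only checks the shape of an address, not whether each number is in range:

```perl
use strict;
use warnings;

my $log = "connection from 140.121.1.2 accepted";   # hypothetical log line

# In list context the match returns the four captured groups directly
my @octets = ($log =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/);
print "@octets\n";                                   # prints "140 121 1 2"

# The text's example: capture the first two numbers together, read them from $1
my $network = '';
if ($log =~ /(\d+\.\d+)\.\d+\.\d+/) {
    $network = $1;                                   # "140.121"
}

# \d+ accepts any digits, so this invalid address still matches
my $loose = ("999.1.1.1" =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) ? 1 : 0;   # 1
```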

m//gimosx The m operator lets you choose your own pattern delimiter, and gimosx are its modifiers (see (a) Modifiers). For example:

    $url = "http://my.machine.tw:8080/cgi-bin/test.pl";
    ($host, $port, $file) = ($url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|);
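This regular expression is fairly complex. Its purpose is to parse the given URL and extract the host name, the port number and the file path. Let me explain it piece by piece:

$url=~m||
    The character right after m is the delimiter; the pattern goes between the two | characters. Because | is the delimiter, the slashes in http:// need no backslashes.
([^/:]+)
    Matches a run of characters containing no / and no :. The matched text is stored in $1.
:{0,1}(\d*)
    Matches zero or one : (same as :?), followed by a run of digits or nothing. The digits are stored in $2; if there are none, $2 is an empty string.
(\S*)$
    Matches a run of non-white space characters that extends to the end of the string. The matched text is stored in $3.
()=()
    A match in list context returns the captured groups, so this is the same as
    ($host, $port, $file) = ($1, $2, $3)
    i.e. $host = "my.machine.tw"
         $port = 8080
         $file = "/cgi-bin/test.pl"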
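To see how the pattern behaves with and without a port, this sketch wraps it in a hypothetical helper, parse_url, and runs it on a few full URLs. The pattern requires the literal prefix http://, so a URL with another scheme does not match at all:

```perl
use strict;
use warnings;

# Hypothetical helper around the pattern from the text.
# In list context it returns ($host, $port, $file), or an empty list on failure.
sub parse_url {
    my ($url) = @_;
    return $url =~ m|http://([^/:]+):{0,1}(\d*)(\S*)$|;
}

my @with_port    = parse_url("http://my.machine.tw:8080/cgi-bin/test.pl");
my @without_port = parse_url("http://my.machine.tw/index.html");
my @no_match     = parse_url("ftp://my.machine.tw/file");

print join("|", @with_port), "\n";     # prints "my.machine.tw|8080|/cgi-bin/test.pl"
print join("|", @without_port), "\n";  # prints "my.machine.tw||/index.html"
print scalar(@no_match), "\n";         # prints "0"
```

Returning the match expression directly matters here: copying the captures into three variables first would turn a failed match into three undef values instead of an empty list.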
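s/PATTERN/REPLACEMENT/egimox That's right, this is the substitution operator. It searches for text matching PATTERN and replaces it with the REPLACEMENT string. It adds one modifier, e; the others are the same as above: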
e Evaluate the right side as an expression.
g Replace globally, i.e. all occurrences.
i Do case-insensitive pattern matching.
m Treat string as multiple lines.
o Only compile pattern once.
s Treat string as single line.
x Use extended regular expressions.
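Examples:
$x =~ s/\s+//g
    Removes all white space.
$x =~ s/([^:]*):([^:]*)/$2:$1/
    Swaps two fields separated by ":". To put two captures directly next to each other, write ${2}${1}; $21 would mean the 21st capture group.
$path =~ s|/usr/bin|/usr/local/bin|
    s/// also lets you choose your own delimiter.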
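A sketch of these substitutions on hypothetical sample strings, plus the e modifier, which evaluates the replacement as Perl code. The last line relies on s/// returning the number of substitutions it made:

```perl
use strict;
use warnings;

# Remove all white space
my $x = " a b\tc \n";
$x =~ s/\s+//g;                                  # "abc"

# Swap two fields separated by ":"
my $pair = "key:value";
$pair =~ s/([^:]*):([^:]*)/$2:$1/;               # "value:key"

# Custom delimiter: the slashes in the paths need no escaping
my $path = "/usr/bin/perl";
$path =~ s|/usr/bin|/usr/local/bin|;             # "/usr/local/bin/perl"

# /e: the replacement is run as code, so every number is doubled
my $prices = "3 apples, 10 pears";
$prices =~ s/(\d+)/$1 * 2/ge;                    # "6 apples, 20 pears"

# s///g returns how many replacements it made
my $count = (my $copy = "a.b.c") =~ s/\./-/g;    # 2, and $copy is "a-b-c"
```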
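tr/SEARCHLIST/REPLACEMENTLIST/cds This is another replacement operator. Unlike s///, SEARCHLIST and REPLACEMENTLIST are plain lists of characters (ranges such as a-z are allowed), not regular expressions: tr works character by character, replacing each character found in SEARCHLIST with the character at the same position in REPLACEMENTLIST. Because no pattern matching is involved, it is faster. It also has fewer modifiers: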
c Complement the SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.
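Examples:
$x =~ tr/this/that/
    Replaces every t with t, h with h, i with a and s with t, so "this is" becomes "that at". It does not replace the word "this" with "that"; for that, use $x =~ s/this/that/g.
$x =~ tr/a-z/A-Z/
    Converts all lower-case letters to upper case.
$count = $x =~ tr/*/*/
    Counts how many "*" characters $x contains (tr returns the number of characters it matched).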
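A sketch of tr and its three modifiers on hypothetical sample strings. Note that an empty REPLACEMENTLIST without /d means "same as SEARCHLIST", which is why tr/*// counts without changing anything:

```perl
use strict;
use warnings;

# Character by character: t->t, h->h, i->a, s->t
my $x = "this is his";
$x =~ tr/this/that/;                     # "that at hat"

# Upper-case every lower-case letter
my $name = "perl 5";
$name =~ tr/a-z/A-Z/;                    # "PERL 5"

# Counting: tr returns how many characters it matched
my $stars = "a*b**c";
my $count = ($stars =~ tr/*//);          # 3, $stars unchanged

# /d deletes characters that have no counterpart in REPLACEMENTLIST
my $no_digits = "a1b2c3";
$no_digits =~ tr/0-9//d;                 # "abc"

# /s squeezes runs of the same translated character into one
my $spaces = "a   b    c";
$spaces =~ tr/ //s;                      # "a b c"

# /c complements SEARCHLIST: every non-digit becomes "_"
my $code = "555-0100";
$code =~ tr/0-9/_/c;                     # "555_0100"
```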
