Friday, August 21, 2009

Perl regular expressions (RE) - Part 1

Many people use Perl just because of its regular expressions. They give Perl very powerful matching and substitution operations on text.
Regular expressions use special characters to describe the text to match, which makes them very powerful for text processing.

The Matching Operator : m/PATTERN/cgismxo
Substitution Operator : s/PATTERN/REPLACEMENT/egismxo
Transliteration Operator : tr/SEARCHLIST/REPLACEMENTLIST/cds

All of the above operators work on $_ by default.
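As a small illustration of that default, here is a minimal sketch that uses all three operators without a binding operator. The array @lines and its contents are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data for this sketch.
my @lines = ("hello world");
my ($matched, $replaced, $vowels);

for (@lines) {                       # each element is aliased to $_
    $matched  = m/world/ ? 1 : 0;    # match against $_
    $replaced = s/world/Perl/;       # substitute in $_; returns the number of substitutions
    $vowels   = tr/aeiou//;          # count vowels in $_ without changing it
}

print "$lines[0]\n";                 # hello Perl
```

Because the loop variable is an alias, the substitution changes the array element itself.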
The following modifiers affect how PATTERN is interpreted. Modifiers that alter the way Perl uses a regular expression are detailed in perlop, under "Regexp Quote-Like Operators".
Modifiers for matching:
/i : Ignore alphabetical case (case-insensitive matching).
/m : Let ^ and $ match next to an embedded \n.
/s : Let . match a newline and ignore the deprecated $*.
/x : Ignore whitespace and permit comments in the pattern.
/o : Interpolate variables and compile the pattern only once.
/g : Find all matches globally, that is, every match within the string rather than only the first.
/gc : Match globally and keep the current position after a failed match. Normally a failed match resets the position (as does modifying the target string); /c prevents the reset after a failed match. For further explanation, see "Using regular expressions in Perl" in perlretut.
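The following is a rough sketch of how some of these modifiers behave in practice. The sample strings and variable names are invented for illustration, and /o is left out.

```perl
use strict;
use warnings;

# Hypothetical two-line sample text.
my $text = "Perl is fun\nperl is powerful";

# /i: case-insensitive match
my $has_perl = $text =~ /PERL/i ? 1 : 0;

# /m: ^ matches at the start of every line; /g collects all of them
my @line_starts = $text =~ /^(\w+)/mg;

# /s: . also matches a newline, so the match can span both lines
my $spans = $text =~ /fun.perl/s ? 1 : 0;
my $no_s  = $text =~ /fun.perl/  ? 1 : 0;

# /x: whitespace and comments inside the pattern are ignored
my ($first, $second) = $text =~ /
    (\w+)    # first word
    \s+      # separator
    (\w+)    # second word
/x;

# /g in list context returns every match
my @all_is = $text =~ /\bis\b/g;

# /gc: a failed match leaves pos() where it was
my $str   = "aaa bbb";
my $moved = $str =~ /a+/g;     # succeeds; pos($str) is now 3
my $fail  = $str =~ /zzz/gc;   # fails, but /c keeps pos($str) at 3
my $kept  = pos($str);

print "@line_starts\n";        # Perl perl
print "$first $second\n";      # Perl is
print "$kept\n";               # 3
```

Without /c on the failing match, pos($str) would have been reset to undef.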

The substitution operator does not take /c; it takes /e instead.
/e : Evaluate the right side as an expression.
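As a quick sketch of /e, the replacement below is treated as Perl code rather than as a string. The variable names and sample text are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $prices = "apple 3, pear 5";

# With /e the replacement is evaluated as an expression;
# combined with /g, every number in the copy is doubled.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/eg;

print "$doubled\n";   # apple 6, pear 10
```

Copying into $doubled first leaves the original string in $prices unchanged.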

Transliteration modifiers:
/c : Complement the SEARCHLIST (the character set in SEARCHLIST is complemented, so the effective list contains the characters not present in SEARCHLIST).
/d : Delete found but unreplaced characters (any character specified in SEARCHLIST but not given a replacement in REPLACEMENTLIST is deleted).
/s : Squash duplicate replaced characters (sequences of characters converted to the same character are squashed down to a single instance of that character).
$word =~ tr/a-zA-Z//s;       # bookkeeper -> bokeper
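To make the three transliteration modifiers a little more concrete, here is a minimal sketch with one line per modifier. All strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# /c: complement the SEARCHLIST, so this counts everything that is NOT a letter
my $text = "Hello, World!";
my $non_letters = ($text =~ tr/a-zA-Z//c);              # comma, space and "!"

# /d: l is replaced by 0, while o has no replacement and is deleted
(my $deleted = "hello world") =~ tr/lo/0/d;

# /c and /d together: delete every character that is not a digit
(my $digits_only = "tel: 555-1234") =~ tr/0-9//cd;

# /s: runs of the same replaced character collapse into one
(my $squashed = "a   b    c") =~ tr/ //s;

print "$non_letters\n";    # 3
print "$deleted\n";        # he00 wr0d
print "$digits_only\n";    # 5551234
print "$squashed\n";       # a b c
```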
Uses of transliteration:
Converting from uppercase to lowercase:
$tag = "This is My BLOG";
$tag_c = $tag;
$tag_c =~ tr/A-Z/a-z/; # $tag_c is now "this is my blog"

$count = ($tag =~ tr/A-Z//); # counts the capital letters in $tag.
$count = ($tag =~ tr/A-Za-z//); # counts all the letters in $tag (spaces are not counted).
$count = ($tag =~ tr/aeiouAEIOU//); # counts all the vowels in $tag.
=~ and !~ are called binding operators.
=~ is true if the pattern matches, and
!~ is true if the pattern doesn't match.

These are used when you want m//, s/// or tr/// to operate on something other than $_: the string to be searched is bound to the operator with the binding operator.
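A short sketch of binding in action, using a made-up variable $title instead of $_:

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $title = "Perl regular expressions";

my $is_perl    = ($title =~ /Perl/)   ? 1 : 0;   # true: the pattern matches
my $not_python = ($title !~ /Python/) ? 1 : 0;   # true: the pattern does not match

# s/// and tr/// bind the same way; here they work on copies of $title
(my $short = $title) =~ s/regular expressions/RE/;
(my $upper = $title) =~ tr/a-z/A-Z/;

print "$short\n";   # Perl RE
print "$upper\n";   # PERL REGULAR EXPRESSIONS
```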
Some useful links for regular expressions:
RegExpression Full Tutorial in Perldoc
Regular expressions quick tutorial in Perldoc
Regular Expressions (a tutorial)

