[thelist] Reg Ex - everything except a phrase
Sam
sam at sam-i-am.com
Wed Sep 10 10:27:05 CDT 2003
> $d =~ s/^(.*)(?:fred)(.*)$/$1$2/ while $d =~ /fred/; print $d, "\n";
As long as the string $d contains fred:
in $d, match everything up to the last fred (the leading .* is greedy),
followed by whatever follows;
truncate the string back to and including that fred;
then print out $d.
I've not tried this, but it looks to me like you'll get stuck in a loop,
since the matches $1 and $2 are the characters up to fred and fred itself,
respectively. And since fred ($2) is put back into $d, $d =~ /fred/ is
always going to be true. Maybe you meant s/^(.*)(?:fred)(.*)$/$1$3/ ?
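Since the loop concern is flagged as untested, it may help to look at what the capture groups actually hold. A minimal sketch, assuming a made-up one-line sample string: it applies the same anchored pattern once with the non-capturing (?:fred) group and once with a capturing (fred) group, and prints the groups each one returns.

```perl
use strict;
use warnings;

# Hypothetical one-line sample string, for illustration only.
my $d = "alfred met fred today";

# (?:fred) groups without capturing, so only the two (.*) groups are
# numbered: $1 is everything before "fred", $2 everything after it.
# The leading (.*) is greedy, so the split happens at the LAST "fred".
my @non_capturing = $d =~ /^(.*)(?:fred)(.*)$/;

# With a capturing (fred) group instead, "fred" itself becomes $2 and
# the text after it moves to $3.
my @capturing = $d =~ /^(.*)(fred)(.*)$/;

print scalar(@non_capturing), " groups: ", join('|', @non_capturing), "\n";
print scalar(@capturing), " groups: ", join('|', @capturing), "\n";
```

In Perl, (?:...) groups without capturing, so with the first pattern $2 is the text after fred rather than fred itself; a $3 only exists in the second, capturing form.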
Translating the perlisms, you get:
while ($d matches /fred/) {
    replace $d with the 1st and 2nd submatches of /^(.*)(?:fred)(.*)$/
}
print "$d\n"
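The translated loop can be turned back into runnable Perl as a sketch. It assumes a made-up input in which deleting one fred creates another ("frefredd" becomes "fred"), to show that the while condition is re-tested after every substitution. The /s modifier is an addition of this sketch, not part of the original one-liner.

```perl
use strict;
use warnings;

# Hypothetical input: removing the inner "fred" of "frefredd" leaves
# "fred" behind, which the loop then removes as well.
my $d = "frefredd and fred";

# While $d still contains "fred", rebuild it from the text before and
# after the last "fred". Without /s, "." does not match a newline, so the
# anchored pattern only matches single-line strings; on multi-line input
# the s/// would fail while /fred/ kept succeeding, and the loop would
# never end.
while ($d =~ /fred/) {
    $d =~ s/^(.*)(?:fred)(.*)$/$1$2/s;
}
print "$d\n";
```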
I still think the loop might be wrong here. Would this do it?
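# start perl to remove fred
use strict;
use warnings;

my $str = "My name is fred.\nFred for short.";
# The s/// below modifies $str, which resets pos($str), so a /g match here
# would not resume where it left off; a plain /i test just re-checks the
# whole string, and the loop ends once no "fred" (in any case) is left.
while ($str =~ /fred/i) {
    $str =~ s/fred//i;    # remove the first remaining "fred"
}
print "$str\n";
# end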
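For comparison, a shorter way to get the same result, assuming the goal is simply to delete every fred regardless of case: a global substitution repeated with a statement modifier until a pass finds nothing more to remove. This is an alternative idiom, not what was proposed in the thread.

```perl
use strict;
use warnings;

my $str = "My name is fred.\nFred for short.";

# s///gi returns the number of substitutions made; the loop stops once a
# pass removes nothing, which also catches a "fred" formed by a removal.
1 while $str =~ s/fred//gi;

print "$str\n";
```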
This problem reminds me of matching and stripping block
comments (e.g. /* comment here
over 2 lines */),
for which Jeffrey Friedl recommends:
# strip C style comments
undef $/;
$_ = join('', <>);
s{
# first we'll list things we want to match, but not throw away
(
[^"'/]+ # other stuff
| # or
" (\\.|[^"\\])* " # double quoted string
| # or
' (\\.|[^'\\])* ' # single quoted string
)
| # OR...
# we'll match a comment. Since it's not in the $1 parentheses above,
# the comments will disappear when we use $1 as the replacement text
/\* .*? \*/ # Traditional C comments
| # or
//[^\n]* # C++ // style comments
}{$1}gsx;
print;
From Mastering Regular Expressions (1st edition), p. 293.
This handles cases where /* and */ sequences appear inside quoted
strings and should be left alone. (I rekeyed this from the book, but I
tested it and it seems to work.)
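To see the behaviour without piping a file through <>, the same substitution can be run on an in-memory string. A minimal sketch, assuming a made-up C snippet; it differs from the book version in three small ways that are choices of this sketch: the inner groups are non-capturing (?:...), the input is a heredoc instead of <>, and the replacement uses /e with a defined test so that use warnings does not complain about an undefined $1 whenever a comment is matched.

```perl
use strict;
use warnings;

# Hypothetical C snippet: the "/*" inside the string literal must survive,
# the real comments must go.
my $src = <<'END_C';
char *s = "not /* a comment */"; /* real
comment */ int x; // trailing comment
END_C

(my $out = $src) =~ s{
    # keep: runs of ordinary text and quoted strings, captured in group 1
    (
        [^"'/]+
      | " (?:\\.|[^"\\])* "
      | ' (?:\\.|[^'\\])* '
    )
  |
    # drop: comments, which fall outside group 1
    /\* .*? \*/
  |
    //[^\n]*
}{defined $1 ? $1 : ''}gsxe;

print $out;
```

Text the pattern cannot match at all (such as a lone / used for division) is simply skipped over by the engine and left in place, because s///g only replaces what it matches.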
Sam