[thelist] regexp question
Bob Forgey
rforgey at alumni.caltech.edu
Thu Mar 29 13:09:15 CST 2001
>>>>> "Sam-I-Am" == Sam-I-Am <sam at sam-i-am.com> writes:
Sam-I-Am> I'm trying to match repeated characters. And capture how many times they
Sam-I-Am> are repeated.
Sam-I-Am> e.g., given the string '--------_____' I want to know that there are 8 '-'
Sam-I-Am> characters, followed by 5 '_' characters.
Sam-I-Am> I even seem to recall this being one of the first examples in one of
Sam-I-Am> O'Reilly's Perl books... but can't find it now.
Sam-I-Am> I've been trying variations on this theme
Sam-I-Am> /(.)+/ # which *I* read as any single character repeated 1+ times (but
Sam-I-Am> the regexp engine reads as any number of any characters except newline)
Sam-I-Am> /(.)+\1/ # again, I read this as any single character repeated 1+ times,
Sam-I-Am> followed by any character except the one I first matched...
Sam-I-Am> anyway, so none of this works as expected (surprise! isn't that the joy
Sam-I-Am> of regexp :)
Sam-I-Am> Anyone have a clue??
Close! Try this:
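#!/usr/bin/perl -w
use strict;

my $a = "-----____\naaaaabbb ccc";

LOOP:
{
    # A run: one character followed by one or more copies of itself.
    $a =~ m/\G(.)\1+/gsc and do {
        print "Got <$&>\n";
        redo LOOP;
    };
    # Otherwise consume one character; the search resumes at pos($a).
    $a =~ m/./gsc and do {
        print "Got Single: <$&>\n";
        redo LOOP;
    };
    print "Done\n";
}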
Should get you:
Got <----->
Got <____>
Got Single: <
>
Got <aaaaa>
Got <bbb>
Got Single: < >
Got <ccc>
Done
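The original question also asked how many times each character repeats. Here is a minimal sketch of one way to get the counts, assuming a hypothetical helper named run_lengths (it is not part of the script above). It uses \1* rather than \1+, so a single character simply counts as a run of length 1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [character, count] pairs,
# one pair per run of identical characters in $str.
sub run_lengths {
    my ($str) = @_;
    my @runs;
    # (.)\1* matches one character plus any immediate repeats of it;
    # /s lets . match a newline, /g walks the string run by run.
    while ($str =~ /(.)\1*/gs) {
        push @runs, [ $1, length($&) ];
    }
    return @runs;
}

for my $run (run_lengths('--------_____')) {
    printf "%d x '%s'\n", $run->[1], $run->[0];
}
```

For the string in the question this should report 8 x '-' followed by 5 x '_'.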
---------------
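The trickery here is in the /gsc flags to the match operator and the
\G zero-width assertion: /g remembers where the last match ended, /c
keeps that position when a match fails, /s lets . match a newline, and
\G anchors the next match at that position. See the perlop man page
(on UNIX, not sure about Windows) for more information and examples.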
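To watch the match position move, here is a small sketch using pos(). The string and the @seen array are made up for illustration only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "aab";
my @seen;    # match positions recorded after each attempt

# /g in scalar context records where the match ended; pos() reports it.
$text =~ /\Ga+/gc or die "expected a+ to match";
push @seen, pos($text);

# A failed /g match normally resets pos(); /c keeps it where it was.
my $matched = $text =~ /\Gz/gc;
die "z should not match" if $matched;
push @seen, pos($text);

# \G anchors this attempt at pos(), so it matches the 'b' at offset 2.
$text =~ /\Gb/gc or die "expected b to match";
push @seen, pos($text);

print "positions: @seen\n";
```

This should print "positions: 2 2 3": the failed match in the middle leaves the position untouched, which is what lets the loop fall through to the next pattern.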
Bob