[thelist] regexp question

Bob Forgey rforgey at alumni.caltech.edu
Thu Mar 29 13:09:15 CST 2001


>>>>> "Sam-I-Am" == Sam-I-Am  <sam at sam-i-am.com> writes:

    Sam-I-Am> I'm trying to match repeated characters and capture how many times they
    Sam-I-Am> are repeated.
    Sam-I-Am> E.g., given the string '--------_____' I want to know that there are 8 '-'
    Sam-I-Am> characters, followed by 5 '_' characters.
    Sam-I-Am> I even seem to recall this being one of the first examples in one of
    Sam-I-Am> O'Reilly's Perl books... but can't find it now.

    Sam-I-Am> I've been trying variations on this theme:
    Sam-I-Am> /(.)+/	# which *I* read as any single character repeated 1+ times (but
    Sam-I-Am> the regexp engine reads as any number of any characters except newline)

    Sam-I-Am> /(.)+\1/ # again, I read this as any single character repeated 1+ times,
    Sam-I-Am> followed by any character except the one I first matched...

    Sam-I-Am> anyway, so none of this works as expected (surprise! isn't that the joy
    Sam-I-Am> of regexp :)
    Sam-I-Am> Anyone have a clue??

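Close! In /(.)+/ the + repeats the whole group, so each repetition is free
to match a different character. To insist on the *same* character, capture
it once and repeat a backreference to it: /(.)\1+/. Try this: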

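#!/usr/bin/perl -w
use strict;

# Named $str rather than $a, since $a and $b are reserved for sort().
my $str = "-----____\naaaaabbb ccc";

LOOP:
{
  # A run of two or more identical characters, anchored by \G at the
  # position where the previous match ended.
  $str =~ m/\G(.)\1+/gsc and do {
        print "Got <$&>\n";
        redo LOOP;
        };
  # Otherwise consume a single character. Because of /c, the failed run
  # match above did not reset pos(), so this resumes at the same place.
  $str =~ m/./gsc and do {
        print "Got Single: <$&>\n";
        redo LOOP;
        };
  # Both matches failed: we have reached the end of the string.
  print "Done\n";
}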

Should get you:

Got <----->
Got <____>
Got Single: <
>
Got <aaaaa>
Got <bbb>
Got Single: < >
Got <ccc>
Done
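
The original question also asked how many times each character repeats.
Here is a minimal sketch of a counting variant. The sub name count_runs is
made up for illustration, and it uses a plain while loop over a /g match
instead of the redo block, so treat it as one possible approach rather than
the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_runs is a hypothetical helper name, not part of any module.
# It returns a list of [character, run length] pairs, in string order.
sub count_runs {
    my ($str) = @_;
    my @runs;
    # $2 is a single character (/s lets . match a newline too); \2* then
    # extends the match over every immediate repeat, and $1 is the whole run.
    while ($str =~ /((.)\2*)/gs) {
        push @runs, [ $2, length $1 ];
    }
    return @runs;
}

for my $run (count_runs('--------_____')) {
    printf "%d x '%s'\n", $run->[1], $run->[0];
}
```

For the string from the question this prints "8 x '-'" and then "5 x '_'".
Runs of length one are reported as well, since \2* also allows zero repeats.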


---------------

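The trickery here is in the /gsc flags to the match operator and the
\G zero-width assertion. In scalar context, /g makes each match start
where the previous one ended (the position that pos() reports), /c keeps
that position when a match fails instead of resetting it, and /s lets .
match a newline. \G anchors the match at that current position. See the
perlop man page for more information and examples (I'm on UNIX and not
sure about Windows, though "perldoc perlop" should work wherever perldoc
is installed).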
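
To see what /g and /c do to the match position, a small sketch can record
pos() after each attempt. The test string and variable names here are only
for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "aab";
my $ok;

# Scalar-context /g match: succeeds on "aa" and leaves pos($str) at 2.
$ok = $str =~ /\Ga+/g;
my $after_first = pos($str);

# This match fails, but /c keeps pos($str) where it was.
$ok = $str =~ /\Gz/gc;
my $with_c = pos($str);

# The same failure without /c resets pos($str) to undef.
$ok = $str =~ /\Gz/g;
my $without_c = pos($str);

printf "after first match:       %d\n", $after_first;
printf "failed match with /c:    %d\n", $with_c;
printf "failed match without /c: %s\n",
    defined $without_c ? $without_c : 'undef';
```

Without /c, the first failed alternative in the loop would throw away the
position and the next match would start over from the beginning of the
string.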


Bob



