[thelist] Re: Regex remove repeats

Chris Nicholls chris at axe.dircon.co.uk
Wed Jan 26 08:59:50 CST 2005


 >>I'm stuck on writing a regex that is going to look ahead to see if the
 >>next item in the comma separated list is the same as the last and, if 
 >>it is, to remove it

This isn't a regex, but one solution that is simpler and more scalable,
depending on your scripting language, would be:

1. Split the string into a list.
2. Create an empty "holding" array. Also create an empty hash/struct.
3. Loop through the list and check whether each value exists as a key in
    your new struct. If not, add the value to the holding array and set
    a flag in the struct saying the value has been "seen".
4. Write the holding array, which now contains only unique values, back
    out to a new list.

In Perl:

my @new_list;   # will hold unique values
my %hash;       # will hold flags marking "seen" values

# your input list
my $input='ac/dc,ac/dc,ac/dc,ac/dc,david , david bowie,dixie hicks,dixie';

# split on comma and optional spaces before or after comma
for my $key (split /\s*,\s*/ , $input){
    if (!$hash{$key}){
        # add not-yet-encountered values to new list array
        push @new_list,$key;
        # set flag in hash to say we've now seen this value
        $hash{$key}=1;
    }
}

# turn array into new list
my $unique_list=join ',', @new_list;
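
The same "seen" hash idea is often written more compactly with grep. What follows is only a minimal sketch: the helper name unique_list and the %seen hash are made up for illustration, and it assumes the same split pattern as above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the comma-separated list with every
# repeated value removed, keeping the first occurrence of each.
sub unique_list {
    my ($input) = @_;
    my %seen;
    # $seen{$_}++ yields 0 (false) the first time a value appears, so
    # the negation lets it through; later occurrences are filtered out.
    my @unique = grep { !$seen{$_}++ } split /\s*,\s*/, $input;
    return join ',', @unique;
}

print unique_list('ac/dc,ac/dc,ac/dc,ac/dc,david , david bowie,dixie hicks,dixie'), "\n";
# prints: ac/dc,david,david bowie,dixie hicks,dixie
```

Like the loop above, this drops every repeat of a value, not only repeats that sit next to each other, and it keeps the values in their original order.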


-Chris



