[thelist] Regex

Burhan Khalid burhankhalid at members.evolt.org
Thu Feb 28 16:23:01 CST 2002


Okay, got another Perl question (probably what I should have asked the
first time around). Here is what I'm trying to do :

1. Parse an HTML file
2. Print out the tags (and nothing else, hence the regex question from earlier)
3. Indent them with tabs to show the nesting
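
Step 2 comes down to pulling every tag out of a line, rather than only testing whether the line contains one. A minimal sketch, assuming tags never span a line break and never contain a '>' inside an attribute value; the sub name extract_tags is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return every <...> tag found in a string, in order.
sub extract_tags {
    my ($text) = @_;
    # In list context, a /g match returns all captures of the group.
    my @tags = $text =~ /(<[^>]*>)/g;
    return @tags;
}

# Prints <P>, <B>, </B> and </P>, one per line.
my @tags = extract_tags('<P>Some <B>bold</B> text</P>');
print join("\n", @tags), "\n";
```

A regex is a rough tool for this; a real HTML parser would cope better with comments, scripts and quoted attributes.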

What I've got so far (not a Perl user, so please, be kind) :

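$file = <STDIN>;
chomp($file);    # strip the trailing newline, or open() looks for "name\n"

open (INPUTFILE, $file) || die "Cannot open $file: $!";

$line1 = "(<[^>]*>)";    # one tag: "<", any run of non-">" characters, ">"
$level = 0;              # set once, outside the loop, so it is not reset per line

while ($data = <INPUTFILE>){
chomp($data);
if ($data =~ /$line1/) {
         print "$data\n";
}
else {
         print "NO MATCH\n";
         $level = $level + 1;
}
}

close (INPUTFILE);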

What I need to come out (more or less) :

<HTML>
         <HEAD>
                 <TITLE>
                 </TITLE>

and so on and so forth.

         </HEAD>
...
..
.
</HTML>
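
Output of that shape could be produced by tracking a nesting depth while walking the tags. What follows is only a sketch under stated assumptions: every opening tag has a matching closing tag (so unclosed tags such as <BR> or <IMG>, comments and a DOCTYPE line would throw the depth off), and no tag contains a '>' inside an attribute value. The sub name indent_tags is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tags in $html, one per line,
# each indented by one tab per level of nesting.
sub indent_tags {
    my ($html) = @_;
    my $level  = 0;
    my $out    = '';
    while ($html =~ /(<[^>]*>)/g) {
        my $tag = $1;
        if ($tag =~ m{^</}) {
            # Closing tag: step back out before emitting it.
            $level-- if $level > 0;
            $out .= ("\t" x $level) . "$tag\n";
        }
        else {
            # Opening tag: emit at the current depth, then step in.
            $out .= ("\t" x $level) . "$tag\n";
            $level++;
        }
    }
    return $out;
}

print indent_tags('<HTML><HEAD><TITLE>Hi</TITLE></HEAD></HTML>');
```

To run it over a whole file, the contents would first need to be read into a single string, which also takes care of tags that are split across lines.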

I think I could deal with the tabbing, but apparently Perl isn't doing
what it's supposed to. Any help? If this were PHP, it would have been a snap.

:(

Burhan Khalid

At 04:11 PM 2/28/02 -0600, you wrote:
>Hey Burhan -
>
>This is from a sed script I have, but it should work for you.
>
><[^>]*>
><[^<]*>
>
>'[^>]' specifies a non-'>' character, and the '*' after it completes
>the expression to mean zero or more non-'>' characters.
>
>The same goes for '<' in the second pattern.
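
The two patterns behave differently once a stray '>' appears in the text between tags. A small sketch in Perl to show it; first_match is a made-up helper name and the sample string is invented for the demonstration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first match of $pattern in $text,
# or undef if there is none.
sub first_match {
    my ($pattern, $text) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

my $text = '<B>a > b</B>';
print first_match('<[^>]*>', $text), "\n";   # prints: <B>
print first_match('<[^<]*>', $text), "\n";   # prints: <B>a >
```

The first pattern stops at the nearest '>', which is usually what is wanted for picking out tags. The second runs on to the last '>' that comes before the next '<'.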



