[thelist] JS RegEx Problems

jeff jeff at members.evolt.org
Tue May 1 12:51:12 CDT 2001


greetings,

i'm trying to use regular expressions to parse the contents of a string,
pick out the values of various attributes, and store those values in an
array.  i seem to have run into a small problem: apparently, once one call
to the match() method returns a match longer than 42 characters, every
subsequent call to match() fails.  i'm hoping someone here can confirm or
deny this behavior, or tell me what i'm doing wrong.  here's the code i'm
using:


var vIn = {};  // plain object used as a name -> value map
var htmlText = '<a'
             + ' href="http://www.evolt.org/article/about_us/9741/9955/"'
             + ' title="About Us"'
             + '>About Us</a>';

var aHREF = htmlText.match(/<a.*href=['"]*([^"' ]+)['"]*/i);
var aTarget = htmlText.match(/<a.*target=['"]*([^"' ]+)['"]*/i);
var aTitle = htmlText.match(/<a.*title=['"]([^"']+)['"]/i);
if(aTitle == null)
{
  // unquoted title: stop at whitespace or the end of the tag
  aTitle = htmlText.match(/<a.*title=([^\s>]+)/i);
}

vIn['href'] = (aHREF != null ? aHREF[1] : '');
// [^:]+: alone only captures 'http:'; \/* also keeps the '//'
var aProtocol = vIn['href'].match(/^([^:]+:\/*)/);
vIn['protocol'] = (aProtocol != null ? aProtocol[1] : '');
vIn['target'] = (aTarget != null ? aTarget[1] : '');
vIn['title'] = (aTitle != null ? aTitle[1] : '');

in theory, based on the string above, i should end up with these values:

vIn['href'] = 'http://www.evolt.org/article/about_us/9741/9955/';
vIn['protocol'] = 'http://';
vIn['target'] = '';
vIn['title'] = 'About Us';
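to check the 42-character theory on its own, here's a minimal test sketch: one match() call captures the 48-character href, then two more match() calls run afterwards.  it assumes a modern ECMAScript engine (node or a current browser) and has not been run under ie5.0's jscript, so at best it shows whether the behavior is engine-specific.  checkMatchAfterLongCapture is just a hypothetical name for the test.

```javascript
// hypothetical test for the "42 characters" theory.
// assumes a modern ECMAScript engine; not run under ie5.0's jscript.
var longHref = 'http://www.evolt.org/article/about_us/9741/9955/'; // 48 chars

function checkMatchAfterLongCapture() {
  var html = '<a href="' + longHref + '" title="About Us">About Us</a>';
  // first call: the capture is 48 characters, over the suspected limit
  var first = html.match(/<a.*href=['"]*([^"' ]+)['"]*/i);
  // later calls, on the same string and on the captured value
  var second = html.match(/<a.*title=['"]([^"']+)['"]/i);
  var third = first[1].match(/^([^:]+:\/*)/);
  return {
    hrefLength: first[1].length,
    title: second != null ? second[1] : null,
    protocol: third != null ? third[1] : null
  };
}

var result = checkMatchAfterLongCapture();
console.log(result.hrefLength, result.title, result.protocol);
// expected in a modern engine: 48 About Us http://
```

if the same test fails under ie5.0 while it passes elsewhere, that would point at the engine rather than at the regexes themselves.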


i really don't want to have to attack this string with a brute force string
parsing method, but i will if i have to.
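one option short of brute-force parsing is to read every attribute of the opening tag with a single global regex in an exec() loop, instead of one match() per attribute.  the sketch below handles double-quoted, single-quoted and unquoted values.  parseAttributes is a hypothetical helper name, and the code assumes unmatched groups come back as undefined, which is how modern engines behave; older jscript versions may return empty strings instead, so it needs testing on ie5.0.

```javascript
// hypothetical helper: collect every attribute of the first <a ...> tag
// into a name -> value object. assumes unmatched groups are undefined
// (modern engines); older jscript may return '' instead.
function parseAttributes(html) {
  var attrs = {};
  // isolate the opening tag: "<a" up to the first ">"
  var tag = html.match(/<a\b[^>]*>/i);
  if (tag == null) {
    return attrs;
  }
  // name="value" | name='value' | name=value  -> groups 2, 3, 4
  var attrRe = /([a-z][\w:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;
  var m;
  while ((m = attrRe.exec(tag[0])) != null) {
    var value = m[2] !== undefined ? m[2] : (m[3] !== undefined ? m[3] : m[4]);
    attrs[m[1].toLowerCase()] = value;
  }
  return attrs;
}

var attrs = parseAttributes('<a href="http://www.evolt.org/article/about_us/9741/9955/" title="About Us">About Us</a>');
// attrs.href  -> 'http://www.evolt.org/article/about_us/9741/9955/'
// attrs.title -> 'About Us'
var target = attrs.target || '';  // no target attribute here, so ''
```

because exec() with the g flag advances lastIndex past each quoted value, text such as title="a=b" is consumed as one value rather than being mistaken for another attribute.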

in addition, i'd like to grab the content between the <a> and </a> tags.
the difficult part is that it might be plain text or it might contain one
or more <img> tags.  any thoughts on how to extract that would be
appreciated as well.
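for the content between the tags, one approach is a lazy match from the end of the opening <a ...> tag to the first </a>, followed by separate passes for the plain text and for any <img> src values.  getLinkContent, getLinkText and getImageSources are hypothetical helper names.  the lazy quantifier *? is an ECMAScript 3rd edition feature, so it should be checked in ie5.0's jscript before relying on it, and regexes like these won't cope with malformed markup or a ">" inside an attribute value.

```javascript
// hypothetical helpers; written for a modern engine, untested on ie5.0.
function getLinkContent(html) {
  // [\s\S] matches any character including newlines; *? stops at the
  // first closing </a> rather than the last one
  var m = html.match(/<a\b[^>]*>([\s\S]*?)<\/a>/i);
  return m != null ? m[1] : '';
}

function getLinkText(content) {
  // drop any tags (e.g. <img>) and keep only the text around them
  return content.replace(/<[^>]*>/g, '');
}

function getImageSources(content) {
  var sources = [];
  // src="..." | src='...' | src=...  -> groups 1, 2, 3
  var imgRe = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;
  var m;
  while ((m = imgRe.exec(content)) != null) {
    sources.push(m[1] !== undefined ? m[1] : (m[2] !== undefined ? m[2] : m[3]));
  }
  return sources;
}

var linkHtml = '<a href="/home/"><img src="logo.gif" alt="evolt"> Home</a>';
var content = getLinkContent(linkHtml);  // '<img src="logo.gif" alt="evolt"> Home'
var text = getLinkText(content);          // ' Home'
var images = getImageSources(content);    // ['logo.gif']
```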

fwiw, this code does not have to be cross-browser -- it's being used in a
controlled environment that's accessed only by win/ie5.0.

thanks,

.jeff

name://jeff.howden
game://web.development
http://www.evolt.org/
mailto:jeff at members.evolt.org
