[thelist] Regex Riddle

Frank Marion lists at frankmarion.com
Sun Aug 29 00:05:14 CDT 2010

On 2010-08-29, at 12:12 AM, Bill Moseley wrote:
>> Essentially, what I want to do is to replace ampersands (& and &amp;)
>> and equal signs (=) with a forward slash (/). So essentially, I'm going
>> from index.cfm?foo=bar&poo=bear to index.cfm?foo/bar/poo/bear
> I guess I'd take a different approach.  I'd use code that knows about
> URLs and then pull out the parts.   Not 100% clear what you are after,
> though -- what is a "search engine safe" url?

Search engine safe url: remember the time when search engines would
choke on query strings? People figured out how to make the query look
like a folder path. Now I just want to do it to make the urls short,
memorable, and easy to type.

So instead of

http://www.example.com/index.cfm?foo=bar&poo=bear I can reduce it to

My approach: because users can add their own content, which might
include an internal link, I'm filtering the final generated HTML and
doing a search and replace on the whole thing.
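
As a rough illustration of that search and replace (a Python sketch, not
the actual CFML), the filter below rewrites only the query part of href
values. It assumes that internal links are hrefs pointing at index.cfm
and that attribute values are double-quoted; the function and pattern
names are made up for the example.

```python
import re

# Assumption: an internal link is a double-quoted href that points at
# index.cfm and carries a query string. A link to index.cfm on another
# host would also match, so a real filter might need a stricter pattern.
HREF_RE = re.compile(r'(href=")([^"?]*index\.cfm\?)([^"#]*)', re.IGNORECASE)

# "&amp;" is listed before "&" so the escaped form is consumed whole.
SEPARATOR_RE = re.compile(r'&amp;|&|=')


def rewrite_internal_links(html_text):
    """Replace &, &amp; and = with / inside index.cfm query strings."""
    def fix(match):
        prefix, page, query = match.groups()
        return prefix + page + SEPARATOR_RE.sub('/', query)

    return HREF_RE.sub(fix, html_text)
```

Text outside the matched hrefs is left alone, so ampersands and equal
signs in ordinary content are not touched.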

> And likewise you could pull out the query keys and values and join them
> with a slash.

Oh! That might be a good lead to follow up on.
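
For a single URL, that suggestion might look something like the sketch
below. It is in Python purely for illustration (the site itself is
ColdFusion), and to_slash_url is a hypothetical name. It expects a URL
whose ampersands are already unescaped.

```python
from urllib.parse import parse_qsl, quote, urlsplit


def to_slash_url(url):
    """Turn index.cfm?foo=bar&poo=bear into index.cfm?foo/bar/poo/bear."""
    parts = urlsplit(url)
    if not parts.query:
        return url  # no query string, nothing to rewrite

    segments = []
    # parse_qsl percent-decodes keys and values, so each piece is
    # re-quoted to keep a literal "/" in a value from adding a segment.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))

    return parts._replace(query="/".join(segments)).geturl()
```

With this, to_slash_url("index.cfm?foo=bar&poo=bear") gives
"index.cfm?foo/bar/poo/bear", and a URL without a query comes back
unchanged.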

>> 07 <a href="index.cfm?foo=bar&regex=fun">
>> 08 <a href="/index.cfm?foo=bar&amp;regex=fun">
> 07 is incorrect, of course.  Not that it's not common practice to forget
> to escape in hrefs.  Depending on the tools you use to extract the href
> from the markup it may or may not be un-escaped already.  But if not, you
> should do that first.

The content that I'm parsing comes from a content editor, and I don't  
really have control over some of the things that the users may enter,  
thus the handling of cases like 07.
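
One way to cope with both 07 and 08 is to normalise the href before
doing anything else with it. The Python sketch below decodes only
"&amp;" and leaves everything else alone; normalize_href is a
hypothetical name. A general-purpose entity decoder may read the "&reg"
at the start of "&regex" in 07 as an entity, which is why this sketch
avoids one.

```python
def normalize_href(href):
    """Collapse the escaped '&amp;' to a bare '&'; leave the rest as-is."""
    # Deliberately narrow: other entities are not decoded here, so an
    # unescaped href such as the one on line 07 passes through unchanged.
    return href.replace("&amp;", "&")
```

After this step the hrefs from 07 and 08 have the same query string and
can go through the same replacement.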

Frank Marion
lists [_at_] frankmarion.com
