[thelist] Regex Riddle

Frank Marion lists at frankmarion.com
Sun Aug 29 00:05:14 CDT 2010


On 2010-08-29, at 12:12 AM, Bill Moseley wrote:
>> Essentially, what I want to do is to replace ampersands (& and &amp;) and
>> equal signs (=) with a forward slash (/). So essentially, I'm going from
>> index.cfm?foo=bar&poo=bear to index.cfm?foo/bar/poo/bear
>
> I guess I'd take a different approach.  I'd use code that knows about URLs
> and then pull out the parts.  Not 100% clear what you are after, though --
> what is a "search engine safe" url?

A search engine safe URL: remember when search engines would choke on
query strings? People figured out how to make the query look like a
folder path. Now I just want to do it to make the URLs short,
memorable, and easy to type.

So instead of

http://www.example.com/index.cfm?foo=bar&poo=bear I can reduce it to
http://www.example.com/foo/bar/poo/bear

Because users can add their own content, and might add an internal link
to it, my approach is to filter the final generated HTML and do a search
and replace on the whole thing.

> And likewise you could pull out the query keys and values and join them with
> a slash.

Oh! That might be a good lead to follow up on.
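
To make that lead concrete, here is a minimal sketch in Python (not
ColdFusion, purely for illustration). The helper name query_to_path is
hypothetical, and it assumes the script name (index.cfm) is dropped so
the result matches the short /foo/bar/poo/bear form. It treats &amp;
the same as a plain &.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, quote


def query_to_path(url):
    """Hypothetical helper: turn ?foo=bar&poo=bear into /foo/bar/poo/bear."""
    # Treat the escaped and the unescaped separator alike.
    url = url.replace("&amp;", "&")
    parts = urlsplit(url)
    # Pull out the query keys and values as (key, value) pairs.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if not pairs:
        return url  # no query string, nothing to rewrite
    # Flatten the pairs to key, value, key, value... and re-encode each one.
    segments = [quote(item, safe="") for pair in pairs for item in pair]
    # Assumption: the original path (index.cfm) is replaced by the new path.
    new_path = "/" + "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", parts.fragment))


print(query_to_path("http://www.example.com/index.cfm?foo=bar&poo=bear"))
```

Only &amp; is normalized here, on purpose: as far as I can tell, a
general-purpose HTML unescape may read the start of a parameter name
such as &regex as the legacy &reg entity.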

>> 07 <a href="index.cfm?foo=bar&regex=fun">
>> 08 <a href="/index.cfm?foo=bar&amp;regex=fun">
>>
>
> 07 is incorrect, of course.  Not that it's not common practice to forget to
> escape in hrefs.  Depending on the tools you use to extract the href from
> the markup it may or may not be un-escaped already.  But if not, you should
> do that first.


The content that I'm parsing comes from a content editor, and I don't  
really have control over some of the things that the users may enter,  
thus the handling of cases like 07.
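
For the search-and-replace route, a rough sketch of the filter in Python
(again only for illustration, not the actual ColdFusion code). The names
INTERNAL_HREF and make_links_sef are made up, and the pattern assumes
that internal links are double-quoted and point at index.cfm, with or
without a leading slash. Both the escaped and the unescaped ampersand
are accepted.

```python
import re

# Assumption: internal links look like href="index.cfm?..." or
# href="/index.cfm?..." and always use double quotes.
INTERNAL_HREF = re.compile(r'href="/?index\.cfm\?([^"]*)"', re.IGNORECASE)


def _rewrite(match):
    # Replace "&amp;", "&" and "=" with "/" inside the captured query string.
    # "&amp;" is listed first so it is consumed whole, not as "&" plus "amp;".
    path = re.sub(r"&amp;|&|=", "/", match.group(1))
    return 'href="/' + path + '"'


def make_links_sef(html):
    """Rewrite every matching internal href in a chunk of generated HTML."""
    return INTERNAL_HREF.sub(_rewrite, html)


sample = (
    '<a href="index.cfm?foo=bar&regex=fun">07</a> '
    '<a href="/index.cfm?foo=bar&amp;regex=fun">08</a>'
)
print(make_links_sef(sample))
```

Links that do not match the pattern, external ones included, are left
untouched.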

--
Frank Marion
lists [_at_] frankmarion.com







