[Javascript] RegExp for parsing search strings

Paul Novitski paul at juniperwebcraft.com
Fri Apr 28 16:22:57 CDT 2006


>Paul Novitski wrote:
>>Given a typical search string potentially consisting of any number 
>>of single terms and quoted phrases, design a regular expression 
>>that splits it into its components, e.g.:
>>source:
>>     item1 "phrase two" item3 item4 "phrase five 5" "phrase six" item7
>>result:
>>     item1
>>     phrase two
>>     item3
>>     item4
>>     phrase five 5
>>     phrase six
>>     item7

At 01:54 PM 4/28/2006, Triche Osborne wrote:
>Okay, I'm going to ask a question which I'm sure has a reasonable 
>answer, but I can't help being curious: Why hasn't the source been 
>impregnated with a delimiter other than a space? This would make it 
>a simple matter of exploding (PHP) or splitting (JS) the string, 
>which avoids the drag that regex imposes on optimization in PHP.


Hey Triche,

I'm happy to report that I haven't the foggiest idea what you're 
talking about.  My problem isn't that I've never encountered the term 
'impregnated' in a programming context before, which I haven't, but 
rather that I'm curious to know how you would prep a string like this 
to be more easily parsable.

The source string might come straight from a GET or POST request.

I can imagine how to split the string on space, locate quotes, and 
reassemble phrases, but this would take several lines of logic and 
I'm hoping to do it in one regexp statement.
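
For comparison, here is a rough sketch of the split-and-reassemble approach described above. The function name splitSearchManually is made up for illustration. It assumes quotes only appear at the start or end of a space-separated word, and it collapses runs of spaces inside a phrase.

```javascript
// Hypothetical helper: split on spaces, then stitch quoted phrases back together.
function splitSearchManually(source) {
  var words = source.split(' ');
  var terms = [];
  var phrase = null; // non-null while we are inside a quoted phrase

  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    if (word === '') continue; // skip the empty strings produced by repeated spaces

    if (phrase === null) {
      if (word.charAt(0) === '"') {
        if (word.length > 1 && word.charAt(word.length - 1) === '"') {
          terms.push(word.slice(1, -1)); // one-word phrase such as "foo"
        } else {
          phrase = [word.slice(1)]; // opening word of a longer phrase
        }
      } else {
        terms.push(word); // plain single term
      }
    } else if (word.charAt(word.length - 1) === '"') {
      phrase.push(word.slice(0, -1)); // closing word: finish the phrase
      terms.push(phrase.join(' '));
      phrase = null;
    } else {
      phrase.push(word); // middle of a phrase
    }
  }

  // Assumption: an unterminated quote keeps whatever was collected so far.
  if (phrase !== null) terms.push(phrase.join(' '));
  return terms;
}
```

It works on the sample string, but it is exactly the "several lines of logic" I would like to avoid.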

More to the point, I wasn't able to compose a regular expression that 
separated single words from quoted phrases and I'd like to learn how 
as part of my overall regexp education.

Perhaps it can't be done in one statement and would require at least 
two (one for the quoted phrases and one for the rest), but given the 
power of regular expressions I'm still hopeful.
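
A two-statement version might look something like the sketch below. The name parseInTwoPasses and the shape of the return value are my own assumptions. Note that it loses the original ordering of words relative to phrases, and a stray unmatched quote would end up stuck to one of the words.

```javascript
// Hypothetical two-pass helper: phrases first, then whatever is left over.
function parseInTwoPasses(source) {
  var phrases = [];

  // Pass 1: collect each quoted phrase and blank it out of the string.
  var rest = source.replace(/"([^"]+)"/g, function (whole, inner) {
    phrases.push(inner);
    return ' ';
  });

  // Pass 2: the remainder is single words separated by whitespace.
  // match() returns null when nothing matches, hence the fallback.
  var words = rest.match(/\S+/g) || [];

  return { phrases: phrases, words: words };
}
```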

My first attempt was something like this:

         /((\"[^\"]+\")|( [^ ]+ ))+/

one or more
         (quotes surrounding one or more non-quote characters)
or
         (spaces surrounding one or more non-space characters)

...hoping that the default greedy nature of regexp would give me the 
quoted phrases first, but it failed miserably.  I can't figure out 
how to grab the quoted expressions and the unquoted words without 
getting them mixed together.
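
For what it's worth, here is a sketch of where the attempt goes wrong and one possible way around it. This is not necessarily the best answer. The engine returns the leftmost match rather than favouring the quoted alternative, and [^ ]+ will happily swallow a quote character. The function name parseSearchString is hypothetical, and the replacement pattern assumes quotes never appear inside a single word.

```javascript
// The first attempt: the match starts at the space before the quote,
// so the "word" branch grabs the opening quote along with the word.
var firstAttempt = /((\"[^\"]+\")|( [^ ]+ ))+/;
var sample = 'item1 "phrase two" item3';
var mixedUp = sample.match(firstAttempt)[0]; // ' "phrase '

// One possible fix: a single global pattern applied repeatedly with exec().
function parseSearchString(source) {
  // Alternative 1: a quoted phrase, capturing the text between the quotes.
  // Alternative 2: a run of characters that are neither whitespace nor quotes.
  var pattern = /"([^"]+)"|([^\s"]+)/g;
  var terms = [];
  var match;

  // With the g flag, each exec() call resumes where the previous match ended.
  while ((match = pattern.exec(source)) !== null) {
    // Exactly one of the two capture groups is set for any given match.
    terms.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return terms;
}

var terms = parseSearchString(
  'item1 "phrase two" item3 item4 "phrase five 5" "phrase six" item7'
);
```

The key differences are that the unquoted branch excludes quote characters, neither branch consumes the surrounding spaces, and the repetition happens in the loop rather than in a trailing + on the group (which would only remember the last capture anyway).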

Cheers,
Paul 



