[thelist] php parse/reg ex

John Hicks johnlist at gulfbridge.net
Tue Apr 25 15:01:46 CDT 2006


Marshall Wood wrote:
> Yes I totally forgot the new line/line ending.  That is ------.
> 
> So that would be
> 
> Name13:::::|||||||||||||||-Value1~|||||Value2~|||||Value4~|||||~|||||
> -----
> Name14:::::||||||||||Value1~|||||
> 
> Name1:::::Value1~|||||
> ------
> Name23:::::Value1~Value2~Value1~|||||
> ------
> Name10:::::~|||||
> 
> How's that?


Clear as mud, I'm afraid. :)

Are you saying that '------' occurs *instead* of line endings? In other
words, there are no line endings in the data?

And what does the empty line between Name14 and Name1 mean?
There's no '------' separating them.

You're the one with the data. It's your job to find a pattern in the 
data that will allow the regex to identify the Names.
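
For instance, if it turned out that every record sits on its own line, and that lines of dashes (or blank lines) do nothing but separate records, something like the following might do. This is only a sketch under those assumptions: parseField() and the sample string are made up for illustration and are not taken from your real data.

```php
<?php
// Hypothetical sample modelled on the lines posted in this thread.
$data = "Name13:::::|||||||||||||||Value1~|||||Value2~|||||Value4~|||||~|||||\n"
      . "-----\n"
      . "Name14:::::||||||||||Value1~|||||\n"
      . "\n"
      . "Name1:::::Value1~|||||\n"
      . "------\n"
      . "Name23:::::Value1~Value2~Value1~|||||\n"
      . "------\n"
      . "Name10:::::~|||||\n";

// Assumption: one record per line; dash-only lines are separators.
function parseField($data)
{
    $result = array();
    // \R+ matches one or more line breaks, so blank lines vanish here.
    foreach (preg_split('/\R+/', trim($data)) as $line) {
        if (preg_match('/^-+$/', $line)) {
            continue; // separator line, carries no data
        }
        // Split "Name:::::rest" at the first run of five colons only.
        $parts = explode(':::::', $line, 2);
        if (count($parts) !== 2) {
            continue; // no ':::::' on this line, so no Name
        }
        // Treat '~' and any run of '|' as value separators; drop empties.
        $values = preg_split('/[~|]+/', $parts[1], -1, PREG_SPLIT_NO_EMPTY);
        $result[trim($parts[0])] = $values;
    }
    return $result;
}

$parsed = parseField($data);
print_r($parsed['Name13']); // Value1, Value2, Value4
```

Note that a repeated Name would overwrite the earlier entry, and a Value containing '~' or '|' would be split apart, so this only holds if neither can happen in the data.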

Also, you said nothing in answer to my other two questions:

 >> --Do Name and Value follow any other rules we can parse by? Perhaps
 >> Name is alphanumeric? Or maybe Value is numeric? Anything like that would
 >> make things a lot simpler. If you can specify the character set used
 >> for Name and Value, it would be much easier than using only ':::::' and
 >> '~' and '|||||' as delimiters.
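
To show why the character set matters, here is a rough sketch that assumes (and it is only an assumption until you confirm it) that a Name consists of nothing but letters and digits. The sample string is invented for the example.

```php
<?php
// Hypothetical sample; assumes Name uses only letters and digits.
$data = 'Name1:::::Value1~|||||Name2:::::~|||||'
      . 'Name3:::::|||||Value1~|||||Value2~|||||';

// A Name is a run of alphanumerics immediately followed by five colons.
preg_match_all('/([A-Za-z0-9]+):{5}/', $data, $matches);

// $matches[1] holds the text captured by the first parenthesised group.
$names = $matches[1];
print_r($names);
```

With a known character class there is no need to look for a preceding '|||||' at all, so the first Name in the field and stray runs of '|' stop being special cases.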

 >> --Can you give us some sample data to play with? :)

--John


> On 4/25/06, John Hicks <johnlist at gulfbridge.net> wrote:
>> Marshall Wood wrote:
>>> I have a field in my database that contains a block of data.  The data
>>> is formatted so it can be parsed.
>>>
>>> Here is an example of the data. I'll give the names of the values
>>> instead of the values themselves; there are usually 30 of them in one
>>> field. Name:::::Value~|||||, the ::::: separates the Name from the
>>> Value, the ~ separates one Value from the next, if there is more than
>>> one Value per Name, and ||||| separates the Name from the next Name,
>>> but not in all cases.  This is an example of one that has no value:
>>> Name:::::~|||||.  Occasionally they get a bit out of whack further
>>> down in the field. Here is an example of one that is out of whack,
>>> but I still need the values that it holds:
>>> Name:::::|||||||||||||||Value~|||||Value~|||||Value~|||||~|||||.
>>>
>>> I have been trying explode() but that does no good. I think it's fine
>>> for the first "Name", but then the Value ends up being the rest of the
>>> data.  :(  Any and all help would be appreciated. I am thinking RegEx
>>> might be the best bet, but a push start would be awesome.
>> Interesting puzzle.
>>
>> This could be a hairy regular expression, so it would help to have a
>> little more information:
>>
>> --You didn't mention line endings. Do they occur, and do they mean
>> anything? Are we to assume there can be more than one Name on a line?
>> By default '.' does not match a newline in PCRE (the 's' modifier
>> changes that), so line endings affect how the pattern is written.
>> There is also a limit on the number of capturing subpatterns in a
>> single expression, but preg_match_all() returns every match of the
>> pattern in an array, so that limit should not matter here.
>>
>> --Do Name and Value follow any other rules we can parse by? Perhaps Name
>> is alphanumeric? Or maybe Value is numeric? Anything like that would
>> make things a lot simpler. If you can specify the character set used
>> for Name and Value, it would be much easier than using only ':::::' and
>> '~' and '|||||' as delimiters.
>>
>> --Can you give us some sample data to play with? :)
>>
>> Generally, I would build a regular expression one step at a time,
>> testing it as I go on real data.
>>
>> Barring that, I'll take a stab at it.
>>
>> Let's start with parsing for 'Name'. It looks like we can identify a
>> Name as a sequence of characters preceded by '|||||' and followed by
>> ':::::'. Since '|' has special meaning in regular expressions, we have
>> to escape it. (':' is literal here, so escaping it is optional but
>> harmless.)
>>
>> (?<=\|\|\|\|\|)Name(?=\:\:\:\:\:)
>>
>> or, to capture any Name (lazily, so the match stops at the nearest ':::::'):
>>
>> (?<=\|\|\|\|\|)(.+?)(?=\:\:\:\:\:)
>>
>> and to find all occurrences of Name, you'd make that into a repeating
>> pattern:
>>
>> (?:(?<=\|\|\|\|\|)(.+?)(?=\:\:\:\:\:))+
>> or ((?<=\|\|\|\|\|).+?(?=\:\:\:\:\:))+
>>
>> But I believe the final '+' is redundant with the preg_match_all() function.
>>
>> So we plug this into php:
>>
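>> // '/' delimiters added; '.+?' is lazy so each match stops at the nearest ':::::'.
>> $MyRegEx = '/((?<=\|\|\|\|\|).+?(?=\:\:\:\:\:))/';
>> // preg_match_all() takes the pattern first, then the subject string.
>> preg_match_all($MyRegEx, $MyData, $MyResultArray);
>> var_dump($MyResultArray);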
>>
>>
>> See if you can get that to run [pray] and, if so, if it correctly picks
>> out all the Names and nothing else.
>>
>> Then check back here... We'll be waiting.
>>
>> --John
>>



