[thelist] php parse/reg ex

Marshall Wood donkieonthehead at gmail.com
Tue Apr 25 14:50:18 CDT 2006


Yes, I totally forgot the newline/line ending.  That is ------.

So that would be

Name13:::::|||||||||||||||-Value1~|||||Value2~|||||Value4~|||||~|||||
-----
Name14:::::||||||||||Value1~|||||

Name1:::::Value1~|||||
------
Name23:::::Value1~Value2~Value1~|||||
------
Name10:::::~|||||

How's that?
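
With one record per line, the sample above can also be taken apart without a single large regular expression. What follows is only a minimal sketch, assuming PHP 7.1 or later, that lines made up solely of dashes are separators, and that neither '~' nor '|' ever appears inside a Value. The function name parseField() and the variable $sample are made up for illustration.

```php
<?php
// Hypothetical helper: turn one field into an array of name => list of values.
// Assumes one record per line and dash-only lines as separators.
function parseField(string $field): array
{
    $result = [];
    foreach (preg_split('/\r\n|\r|\n/', $field) as $line) {
        $line = trim($line);
        // Skip blank lines and dash-only separator lines.
        if ($line === '' || preg_match('/^-+$/', $line)) {
            continue;
        }
        // Split at the first ':::::' only, so the rest of the line stays intact.
        $parts = explode(':::::', $line, 2);
        if (count($parts) < 2) {
            continue; // no Name/Value delimiter on this line
        }
        [$name, $rest] = $parts;
        // Treat '~' and any run of '|' as separators and drop empty pieces.
        $result[$name] = preg_split('/[~|]+/', $rest, -1, PREG_SPLIT_NO_EMPTY);
    }
    return $result;
}

$sample = "Name1:::::Value1~|||||\n------\n"
        . "Name23:::::Value1~Value2~Value1~|||||\n------\n"
        . "Name10:::::~|||||";
print_r(parseField($sample));
```

Passing 2 as the limit to explode() keeps the Value part in one piece, and collapsing runs of '|' is what lets the "out of whack" records come through. The trade-off is that a Value that is deliberately empty between two real Values would be dropped.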

On 4/25/06, John Hicks <johnlist at gulfbridge.net> wrote:
> Marshall Wood wrote:
> > I have a field in my database that contains a block of data.  The data
> > is formatted so it can be parsed.
> >
> > Here is an example of the data.  I'll give the names of the values
> > instead of the values themselves; there are usually 30 of them in one
> > field.  In Name:::::Value~|||||, the ::::: separates the Name from the
> > Value, the ~ separates one Value from the next if there is more than
> > one Value per Name, and ||||| separates one Name from the next Name,
> > but not in all cases.  This is an example of one that has no value:
> > Name:::::~|||||.  Occasionally they get a bit out of whack further
> > down in the field.  Here is an example of one that is out of whack,
> > but I still need the values that it holds:
> > Name:::::|||||||||||||||Value~|||||Value~|||||Value~|||||~|||||.
> >
> > I have been trying explode(), but that does no good.  I think it's
> > fine for the first "Name", but then the Value ends up being the rest
> > of the data.  :(  Any and all help would be appreciated.  I am thinking
> > RegEx might be the best bet, but a push start would be awesome.
>
> Interesting puzzle.
>
> This could be a hairy regular expression, so it would help to have a
> little more information:
>
> --You didn't mention line endings. Do they occur and do they mean
> anything? Are we to assume there can be more than one Name on a line?
> Regular expressions are often applied one line at a time, although
> PCRE can also match across line breaks. As far as I know,
> preg_match_all() returns every match it finds, so the number of
> Names in a field should not be a limitation.
>
> --Do Name and Value follow any other rules we can parse by? Perhaps Name
> is alphanumeric? Or maybe Value is numeric? Anything like that would
> make things a lot simpler. If you can specify the character set used
> for Name and Value, it would be much easier than using only ':::::' and
> '~' and '|||||' as delimiters.
>
> --Can you give us some sample data to play with? :)
>
> Generally, I would build a regular expression one step at a time,
> testing it as I go on real data.
>
> Barring that, I'll take a stab at it.
>
> Let's start with parsing for 'Name'. It looks like we can identify a
> Name as a sequence of characters preceded by '|||||' and followed by
> ':::::'. Since '|' has special meaning in regular expressions, we have
> to escape it. (A bare ':' is not special, so escaping it is optional
> but harmless.)
>
> (?<=\|\|\|\|\|)Name(?=\:\:\:\:\:)
>
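> or, to capture any name (with a lazy '.+?' so the match stops at the
> first ':::::' rather than the last):
>
> (?<=\|\|\|\|\|)(.+?)(?=\:\:\:\:\:)
>
> and to find all occurrences of Name, you'd make that into a repeating
> pattern:
>
> (?:(?<=\|\|\|\|\|)(.+?)(?=\:\:\:\:\:))+
> or ((?<=\|\|\|\|\|).+?(?=\:\:\:\:\:))+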
>
> But I believe the final '+' is redundant on the preg_match_all() function.
>
> So we plug this into php:
>
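> // Lazy '.+?' stops each match at the nearest ':::::'.
> $MyRegEx = '/(?<=\|\|\|\|\|).+?(?=\:\:\:\:\:)/';
> // preg_match_all() takes the pattern first, then the subject string.
> preg_match_all($MyRegEx, $MyData, $MyResultArray);
> var_dump($MyResultArray);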
>
>
> See if you can get that to run [pray] and, if so, if it correctly picks
> out all the Names and nothing else.
>
> Then check back here... We'll be waiting.
>
> --John
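
Given the sample data at the top of this message, where each Name sits at the start of its own line, the Name search can be sketched a little differently. This swaps the '|||||' lookbehind for a start-of-line anchor, so it is a variation on the approach above and not the same pattern. It assumes a Name never contains ':' or a line break. The function name extractNames() and the variable $sample are hypothetical.

```php
<?php
// Hypothetical helper: list every Name in a field with one preg_match_all() call.
// Assumes each Name starts a line and contains no ':' or line-break characters.
function extractNames(string $field): array
{
    // The 'm' modifier makes '^' match at the start of every line,
    // not only at the start of the whole string.
    $pattern = '/^([^:\r\n]+):::::/m';
    preg_match_all($pattern, $field, $matches);
    return $matches[1]; // group 1 holds the captured Names
}

$sample = "Name1:::::Value1~|||||\n------\n"
        . "Name23:::::Value1~Value2~Value1~|||||\n------\n"
        . "Name10:::::~|||||";
print_r(extractNames($sample));
```

Separator lines such as ------ are skipped because they are never followed by ':::::'. If several Names can share one line, the anchor would have to give way to the lookbehind version again.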


