[thelist] php parse/reg ex

Marshall Wood donkieonthehead at gmail.com
Tue Apr 25 15:29:29 CDT 2006


The space between Name14 and Name1 was meant to separate two
examples.  Name13 and Name14 go together, whereas Name1, Name23 and
Name10 go together.

The Names are always alphabetic and the Values are usually alphanumeric.
The email from Dan actually works for the most part. There is some odd
behaviour that stems from values being out of place, like
Name33:::::||||||||||Value1~|||||. Normally it would be
Name33:::::Value1~|||||, but there was an error in data entry, so in
the array from that script this name/value pair starts at index 1
rather than at 0 like the others.

Example:

  'Name32' =>
  array (
    0 => 'Value1',
  ),
  'Name33' =>
  array (
    1 => 'Value1',
  ),

So if I write some code to parse through this and create my XML file,
it might have an issue with the arrays not being uniform: one section
starts at index 1 while another starts at index 0.
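
One way around the uneven indexes, as a minimal sketch only: either
reindex the existing arrays with array_values(), or split each record so
that stray pipes never produce empty slots in the first place. The
function name parseRecord() and the sample records are assumptions made
for illustration, based on the examples in this thread; this is not
Dan's script.

```php
<?php
// Hypothetical helper: turn one "Name:::::Value~|||||" record into
// array(name, values). Assumes '~' and '|' never appear inside a Value.
function parseRecord($record)
{
    // Split the Name from everything after the first ':::::'.
    $parts = explode(':::::', $record, 2);
    if (count($parts) !== 2) {
        return null; // no Name/Value separator found
    }
    list($name, $rest) = $parts;

    // Treat '~' and any run of '|' as separators and drop the empty
    // pieces, so the values always come back indexed from 0.
    $values = preg_split('/[~|]+/', $rest, -1, PREG_SPLIT_NO_EMPTY);

    return array(trim($name), $values);
}

// Alternative: keep the existing parser and reindex its output.
$parsed = array(
    'Name32' => array(0 => 'Value1'),
    'Name33' => array(1 => 'Value1'),
);
// array_map() with a single array keeps the string keys (the Names),
// while array_values() renumbers each inner array from 0.
$uniform = array_map('array_values', $parsed);
```

With either approach every Name ends up with a zero-based list of
values, so the XML-writing loop does not have to care where the bad
data entry happened.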


On 4/25/06, John Hicks <johnlist at gulfbridge.net> wrote:
> Marshall Wood wrote:
> > Yes, I totally forgot the newline/line ending.  That is ------.
> >
> > So that would be
> >
> > Name13:::::|||||||||||||||-Value1~|||||Value2~|||||Value4~|||||~|||||
> > -----
> > Name14:::::||||||||||Value1~|||||
> >
> > Name1:::::Value1~|||||
> > ------
> > Name23:::::Value1~Value2~Value1~|||||
> > ------
> > Name10:::::~|||||
> >
> > How's that?
>
>
> Clear as mud, I'm afraid. :)
>
> Are you saying that '------' occurs *instead* of line endings? In other
> words, there are no line endings in the data?
>
> And what does the empty line between Name14 and Name1 mean?
> There's no '------' separating them.
>
> You're the one with the data. It's your job to find a pattern in the
> data that will allow the regex to identify the Names.
>
> Also, you said nothing in answer to my other two questions:
>
>  >> --Do Name and Value follow any other rules we can parse by? Perhaps
> Name is alphanumeric? Or maybe Value is numeric? Anything like that would
> make things a lot simpler. If you can specify the character set used for
> Name and Value, it would be much easier than using only ':::::' and
> '~' and '|||||' as delimiters.
>
>  >> --Can you give us some sample data to play with? :)
>
> --John
>
>
> > On 4/25/06, John Hicks <johnlist at gulfbridge.net> wrote:
> >> Marshall Wood wrote:
> >>> I have a field in my database that contains a block of data.  The data
> >>> is formatted so it can be parsed.
> >>>
> >>> Here is an example of the data. I'll give the names of the values
> >>> instead of the values themselves; there are usually 30 of them in one
> >>> field. Name:::::Value~|||||: the ::::: separates the Name from the
> >>> Value, the ~ separates one Value from the next if there is more than
> >>> one Value per Name, and ||||| separates the Name from the next Name,
> >>> but not in all cases.  This is an example of one that has no value:
> >>> Name:::::~|||||.  Occasionally they get a bit out of whack further
> >>> down in the field. Here is an example of one that is out of whack,
> >>> but I still need the values that it holds:
> >>> Name:::::|||||||||||||||Value~|||||Value~|||||Value~|||||~|||||.
> >>>
> >>> I have been trying explode(), but that does no good. I think it's fine
> >>> for the first "Name", but then the Value ends up being the rest of the
> >>> data.  :(  Any and all help would be appreciated. I am thinking RegEx
> >>> might be the best bet, but a push start would be awesome.
> >> Interesting puzzle.
> >>
> >> This could be a hairy regular expression, so it would help to have a
> >> little more information:
> >>
> >> --You didn't mention line endings? Do they occur and do they mean
> >> anything? Are we to assume there can be more than one Name on a line?
> >> Generally regular expressions deal with one line at a time. One
> >> limitation is that they can only deal with 100 pattern matches at a
> >> time. (Although the preg_match_all() function might not be subject to
> >> that limitation. I'm not sure.)
> >>
> >> --Do Name and Value follow any other rules we can parse by? Perhaps Name
> >> is alphanumeric? Or maybe Value is numeric? Anything like that would
> >> make things a lot simpler. If you can specify the character set used for
> >> Name and Value, it would be much easier than using only ':::::' and
> >> '~' and '|||||' as delimiters.
> >>
> >> --Can you give us some sample data to play with? :)
> >>
> >> Generally, I would build a regular expression one step at a time,
> >> testing it as I go on real data.
> >>
> >> Barring that, I'll take a stab at it.
> >>
> >> Let's start with parsing for 'Name'. It looks like we can identify a
> >> Name as a sequence of characters preceded by '|||||' and followed by
> >> ':::::'. Since '|' has special meaning in regular expressions, we have
> >> to escape it. Escaping ':' is not strictly required, but it is harmless.
> >>
> >> (?<=\|\|\|\|\|)Name(?=\:\:\:\:\:)
> >>
> >> or (?<=\|\|\|\|\|)(.+)(?=\:\:\:\:\:)
> >>
> >> and to find all occurrences of Name, you'd make that into a repeating
> >> pattern:
> >>
> >> (?:(?<=\|\|\|\|\|)(.+)(?=\:\:\:\:\:))+
> >> or ((?<=\|\|\|\|\|).+(?=\:\:\:\:\:))+
> >>
> >> But I believe the final '+' is redundant with the preg_match_all() function.
> >>
> >> So we plug this into php:
> >>
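> >> // The pattern goes first, then the subject. The lazy .+? makes each
> >> // match stop at the nearest ':::::' instead of the last one.
> >> $MyRegEx = '/(?<=\|\|\|\|\|).+?(?=\:\:\:\:\:)/';
> >> preg_match_all($MyRegEx, $MyData, $MyResultArray);
> >> var_dump($MyResultArray);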
> >>
> >>
> >> See if you can get that to run [pray] and, if so, if it correctly picks
> >> out all the Names and nothing else.
> >>
> >> Then check back here... We'll be waiting.
> >>
> >> --John
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> * * Please support the community that supports you.  * *
> >> http://evolt.org/help_support_evolt/
> >>
> >> For unsubscribe and other options, including the Tip Harvester
> >> and archives of thelist go to: http://lists.evolt.org
> >> Workers of the Web, evolt !
> >>


