[thelist] php parse/reg ex
John Hicks
johnlist at gulfbridge.net
Tue Apr 25 14:39:14 CDT 2006
Marshall Wood wrote:
> I have a field in my database that contains a block of data. The data
> is formatted so it can be parsed.
>
> Here is an example of the data. I'll give the names of the values instead
> of the values themselves; there are usually 30 of them in one field.
> Name:::::Value~|||||, the ::::: separates the Name from the Value, the
> ~ separates the Value from the Value, if there is more than one Value
> per Name, and ||||| separates the Name from the next Name, but not in
> all cases. This is an example of one that has no value.
> Name:::::~|||||. Occasionally they get a bit out of whack, down
> further in the field; here is an example of one that is out of whack,
> but I still need the values that it holds.
> Name:::::|||||||||||||||Value~|||||Value~|||||Value~|||||~|||||.
>
> I have been trying explode() but that does no good. I think it's fine
> for the first "Name", then the Value ends up being the rest of the
> data. :( Any and all help would be appreciated. I am thinking RegEx
> might be the best bet, but a push start would be awesome.
Interesting puzzle.
This could be a hairy regular expression, so it would help to have a
little more information:
--You didn't mention line endings? Do they occur and do they mean
anything? Are we to assume there can be more than one Name on a line?
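Regular expression tools often work one line at a time, although PHP's
preg functions will scan a whole multi-line string. One limitation I
recall is a cap of roughly 100 captured subpatterns per pattern in older
PCRE versions. As far as I know that applies to capture groups inside one
pattern, not to the number of matches preg_match_all() returns, but I'm
not sure.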
--Do Name and Value follow any other rules we can parse by? Perhaps Name
is alphanumeric? Or maybe Value is numeric? Anything like that would
make things a lot simpler. If you can specify the character set used for
Name and Value, it would be much easier than using only ':::::' and
'~' and '|||||' as delimiters.
--Can you give us some sample data to play with? :)
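To illustrate why a known character set helps, here is a minimal sketch. It assumes, purely for illustration, that Names consist only of word characters (letters, digits, underscore) and that Values never contain '~', '|' or a Name-like 'word:::::' run. The function name parse_field() and the sample string are made up; real data may well break these assumptions.

```php
<?php
// Hypothetical helper: returns an array of Name => list of Values.
// Assumes Names match \w+ and Values contain no '~' or '|'.
function parse_field($data) {
    $result = array();
    // Each record: a Name, the ':::::' separator, then everything up to
    // the next 'Name:::::' (or the end of the string).
    preg_match_all('/(\w+):::::(.*?)(?=\w+:::::|$)/s', $data, $sets, PREG_SET_ORDER);
    foreach ($sets as $set) {
        // Treat any run of '~' and '|' as one separator and drop empty
        // pieces, so "out of whack" runs of pipes are tolerated.
        $result[$set[1]] = preg_split('/[~|]+/', $set[2], -1, PREG_SPLIT_NO_EMPTY);
    }
    return $result;
}

// Made-up sample, including a Name with no Value and an "out of whack" one.
$sample = 'Color:::::Red~Blue~|||||Size:::::~|||||'
        . 'Weight:::::|||||||||||||||12~|||||14~|||||~|||||';
print_r(parse_field($sample));
```

With the sample above this should give Color => (Red, Blue), Size => an empty list, and Weight => (12, 14). Anchoring on the Name's character set rather than on '|||||' is what lets it cope with the first Name (which has no '|||||' in front of it) and with the stray pipes.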
Generally, I would build a regular expression one step at a time,
testing it as I go on real data.
Barring that, I'll take a stab at it.
Let's start with parsing for 'Name'. It looks like we can identify a
Name as a sequence of characters preceded by '|||||' and followed by
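':::::'. Since '|' has special meaning in regular expressions, we have
to escape it. (':' does not strictly need escaping here, but the
backslash is harmless.)
(?<=\|\|\|\|\|)Name(?=\:\:\:\:\:)
or (?<=\|\|\|\|\|)(.+?)(?=\:\:\:\:\:)
The '.+?' is non-greedy, so each match stops at the first ':::::' rather
than running on to the last one in the field.
To find all occurrences of Name, you could make that into a repeating
pattern:
(?:(?<=\|\|\|\|\|)(.+?)(?=\:\:\:\:\:))+
or ((?<=\|\|\|\|\|).+?(?=\:\:\:\:\:))+
But I believe the final '+' is redundant with preg_match_all(), which
already returns every match.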
So we plug this into php:
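// Non-greedy '.+?' so each match stops at the first ':::::'.
$MyRegEx = '/(?<=\|\|\|\|\|).+?(?=\:\:\:\:\:)/';
// preg_match_all() takes the pattern first, then the subject string.
preg_match_all($MyRegEx, $MyData, $MyResultArray);
var_dump($MyResultArray);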
See if you can get that to run [pray] and, if so, whether it correctly
picks out all the Names and nothing else.
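One way to check this on made-up data is sketched below. It builds a hypothetical 150-record string in which every Name is preceded by '|||||' (real data may not guarantee that, and the first Name in a field likely is not), then compares a greedy '.+' with a non-greedy '.+?' between the same lookbehind and lookahead. The variable names and the generated data are assumptions for illustration only.

```php
<?php
// Hypothetical test data: |||||Name1:::::x~|||||Name2:::::x~ ... up to Name150.
$data = '';
for ($i = 1; $i <= 150; $i++) {
    $data .= '|||||Name' . $i . ':::::x~';
}

// Greedy: '.+' runs from the first Name to the LAST ':::::' in the string,
// so everything collapses into a single oversized match.
$greedyCount = preg_match_all('/(?<=\|\|\|\|\|).+(?=:::::)/', $data, $greedy);

// Non-greedy: '.+?' stops at the FIRST ':::::' after each '|||||'.
$lazyCount = preg_match_all('/(?<=\|\|\|\|\|).+?(?=:::::)/', $data, $lazy);

echo $greedyCount, "\n";   // prints 1
echo $lazyCount, "\n";     // prints 150
echo $lazy[0][0], "\n";    // prints Name1
```

The second count also suggests that preg_match_all() is not limited to 100 matches, at least for this kind of input.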
Then check back here... We'll be waiting.
--John