[thelist] php parse/reg ex

Dan McCullough dan.mccullough at gmail.com
Tue Apr 25 15:00:32 CDT 2006


Something like this?

$scriptString = $script;

// We have an array to store the results
$scriptArray = array();

// Rows are split by three or more dashes
$rows = preg_split('/[\n\r]*[\-]{3,}[\n\r]*/', trim($scriptString));

// Let's get the keys and values from each row
foreach($rows as $row) {
	// Split key from values
	list($key, $value) = explode(':::::', $row);
	$key   = trim($key);
	$value = trim($value);
	// Split the individual values apart on runs of ~ and |
	$value = preg_split('/(~\|*|~?\|+)+/', $value);
	
	// Filter out the empty values that might be the result of the last preg_split
	foreach($value as $k => $v) {
		if(strlen($v) == 0) unset($value[$k]);
	}
	
	// Store the key/value pair
	$scriptArray[$key] = $value;
}
echo '<pre>';
var_export($scriptArray);
echo '</pre>';
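
If it helps to test this away from the database, here is a rough sketch of the same steps wrapped in a function and run against the sample rows from this thread. The function name parseScript and the $sample string are made up for illustration. It also swaps two things in: the value regex is simplified to /[~|]+/, which should match the same runs of ~ and |, and PREG_SPLIT_NO_EMPTY replaces the unset() loop, so the values come back re-indexed from 0.

```php
<?php
// Hypothetical helper: parseScript is an illustrative name, not from the thread.
function parseScript($scriptString)
{
    $result = array();

    // Rows are split by three or more dashes
    $rows = preg_split('/[\n\r]*-{3,}[\n\r]*/', trim($scriptString));

    foreach ($rows as $row) {
        // Skip anything that has no name/value separator at all
        if (strpos($row, ':::::') === false) {
            continue;
        }

        // Split key from values; the limit of 2 keeps any later ':::::' in the value
        list($key, $value) = explode(':::::', $row, 2);

        // Split on runs of ~ and |, dropping the empty pieces in the same call
        $values = preg_split('/[~|]+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);

        $result[trim($key)] = $values;
    }

    return $result;
}

// Sample rows taken from the thread, joined with dash separator lines
$sample = "Name13:::::|||||||||||||||-Value1~|||||Value2~|||||Value4~|||||~|||||\n"
        . "-----\n"
        . "Name1:::::Value1~|||||\n"
        . "------\n"
        . "Name23:::::Value1~Value2~Value1~|||||\n"
        . "------\n"
        . "Name10:::::~|||||";

var_export(parseScript($sample));
```

A name with no values, like Name10, ends up with an empty array, and the stray dash in "-Value1" is kept because a single dash is not a row separator.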

On 4/25/06, Marshall Wood <donkieonthehead at gmail.com> wrote:
> Yes I totally forgot the new line/line ending.  That is ------.
>
> So that would be
>
> Name13:::::|||||||||||||||-Value1~|||||Value2~|||||Value4~|||||~|||||
> -----
> Name14:::::||||||||||Value1~|||||
>
> Name1:::::Value1~|||||
> ------
> Name23:::::Value1~Value2~Value1~|||||
> ------
> Name10:::::~|||||
>
> How's that?
>
> On 4/25/06, John Hicks <johnlist at gulfbridge.net> wrote:
> > Marshall Wood wrote:
> > > I have a field in my database that contains a block of data.  The data
> > > is formatted so it can be parsed.
> > >
> > > Here is an example of the data. I'll give the names of the values
> > > instead of the values themselves; there are usually 30 of them in one
> > > field. Name:::::Value~|||||: the ::::: separates the Name from the
> > > Value, the ~ separates one Value from the next if there is more than
> > > one Value per Name, and ||||| separates one Name from the next Name,
> > > but not in all cases.  This is an example of one that has no value:
> > > Name:::::~|||||.  Occasionally they get a bit out of whack further
> > > down in the field. Here is an example of one that is out of whack,
> > > but I still need the values that it holds:
> > > Name:::::|||||||||||||||Value~|||||Value~|||||Value~|||||~|||||.
> > >
> > > I have been trying explode() but that does no good. I think it's fine
> > > for the first "Name", but then the Value ends up being the rest of the
> > > data.  :(  Any and all help would be appreciated. I am thinking RegEx
> > > might be the best bet, but a push start would be awesome.
> >
> > Interesting puzzle.
> >
> > This could be a hairy regular expression, so it would help to have a
> > little more information:
> >
> > --You didn't mention line endings? Do they occur and do they mean
> > anything? Are we to assume there can be more than one Name on a line?
> > Generally regular expressions deal with one line at a time. One
> > limitation is that they can only deal with 100 pattern matches at a
> > time. (Although the preg_match_all() function might not be subject to
> > that limitation. I'm not sure.)
> >
> > --Do Name and Value follow any other rules we can parse by? Perhaps Name
> > is alphanumeric? Or maybe Value is numeric? Anything like that would
> > make things a lot simpler. If you can specify the character set used
> > for Name and Value, it would be much easier than using only ':::::' and
> > '~' and '|||||' as delimiters.
> >
> > --Can you give us some sample data to play with? :)
> >
> > Generally, I would build a regular expression one step at a time,
> > testing it as I go on real data.
> >
> > Barring that, I'll take a stab at it.
> >
> > Let's start with parsing for 'Name'. It looks like we can identify a
> > Name as a sequence of characters preceded by '|||||' and followed by
> > ':::::'. Since both '|' and ':' have special meaning in regular
> > expressions, we have to escape them.
> >
> > (?<=\|\|\|\|\|)Name(?=\:\:\:\:\:)
> >
> > or (?<=\|\|\|\|\|)(.+)(?=\:\:\:\:\:)
> >
> > and to find all occurrences of Name, you'd make that into a repeating
> > pattern:
> >
> > or (?:(?<=\|\|\|\|\|)(.+)(?=\:\:\:\:\:))+
> > or ((?<=\|\|\|\|\|).+(?=\:\:\:\:\:))+
> >
> > But I believe the final '+' is redundant on the preg_match_all() function.
> >
> > So we plug this into php:
> >
> > $MyRegEx = '/(?<=\|\|\|\|\|).+(?=\:\:\:\:\:)/';
> > preg_match_all ( $MyRegEx, $MyData, $MyResultArray );
> > var_dump($MyResultArray);
> >
> >
> > See if you can get that to run [pray] and, if so, if it correctly picks
> > out all the Names and nothing else.
> >
> > Then check back here... We'll be waiting.
> >
> > --John
> >
> > --
> >
> > * * Please support the community that supports you.  * *
> > http://evolt.org/help_support_evolt/
> >
> > For unsubscribe and other options, including the Tip Harvester
> > and archives of thelist go to: http://lists.evolt.org
> > Workers of the Web, evolt !
> >


