[thelist] importing a non-standard datadump

Ivo P ipletikosic at gmail.com
Tue Feb 22 18:14:14 CST 2005


This is one weird data file. What is the delimiter? I assume it's the
'|' between fields, except for the last field, which ends at a newline
followed by a single quote (the start of the next record). Given this,
it's safe to assume there are three fields per row.

Also, the single quotes are part of the data and not delimiters, correct?
I ask because at first I thought they were field delimiters, until I
noticed that they are only paired up in the first row; all the other
rows have an odd number of single quotes.
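Under those assumptions ('|' between fields, exactly three fields per row, single quotes kept as data), splitting one cleaned-up record could look roughly like the sketch below. The function name split_record is hypothetical and the type declarations assume PHP 7 or later. Passing a limit of 3 to explode() keeps any extra '|' characters inside the NOTES field instead of producing a fourth field; a '|' inside CUSTNAME would still break it.

```php
<?php
// Hypothetical helper: split one record line into its three fields.
// Assumes '|' separates fields and that the single quotes are part of
// the data, so they are left in place.
function split_record(string $line): array
{
    // Limit of 3: any further '|' characters stay inside the last (NOTES) field.
    $fields = explode('|', $line, 3);
    if (count($fields) !== 3) {
        throw new UnexpectedValueException(
            "Expected 3 fields, got " . count($fields) . ": $line"
        );
    }
    return $fields;
}
```

For example, split_record("'1'|'blah1' '|'fasdf fsdfd sdf1 '") returns the three strings "'1'", "'blah1' '" and "'fasdf fsdfd sdf1 '".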

Also, the snippet came through my email client with
CarriageReturn-LineFeed (CRLF) line endings. Do you need those? I assume
only the newline is needed, so the script drops the carriage returns.
It's normally bad to modify data this way, but it was quicker for
me... :-)

Here is a quick and dirty script that replaces newlines within fields
with spaces. It assumes all of the above and works on the 5-line snippet
you sent.

I whipped it up quickly, so it needs work before production use (error
checking, testing on more than 5 lines of data...).

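<?php
// Input file; the cleaned copy is written next to it as Text-1.txt.out
$test_file = "Text-1.txt";
$out_string = "";

$string_contents = file_get_contents($test_file);
if($string_contents === false)
{
    die("Could not read $test_file\n");
}
$string_size = strlen($string_contents);
for($i = 0; $i < $string_size; $i++)
{
    $current_character = $string_contents[$i];
    // Peek one character ahead; past the end of the file there is nothing to read.
    $peek_next_character = ($i + 1 < $string_size) ? $string_contents[$i + 1] : '';

    // Drop carriage returns so CRLF line endings become plain LF.
    if($current_character == "\r")
    {
        continue;
    }

    // A newline followed by a single quote starts the next record, and a
    // newline at the very end of the file is kept as well; any other newline
    // is inside a field, so it is replaced with a space.
    if($current_character == "\n" && $peek_next_character !== "'" && $peek_next_character !== '')
    {
        $current_character = ' ';
    }
    $out_string .= $current_character;
}

if(file_put_contents($test_file . ".out", $out_string) === false)
{
    die("Could not write $test_file.out\n");
}

?>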
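A more compact variant of the same idea, sketched with a regular expression instead of a character loop. It rests on an extra assumption not confirmed by the sample: every real record starts on a new line with a quoted numeric CUSTNMBR followed by '|' (e.g. '3'|...). That is stricter than "newline followed by a single quote", so a notes line that merely begins with a quote is not mistaken for a new record, although a notes line beginning with something like '12'| still would be. The function name join_wrapped_records is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: join the lines of multi-line NOTES fields.
// Assumption: each record starts with a quoted numeric CUSTNMBR and a '|'.
function join_wrapped_records(string $data): string
{
    // Treat CRLF and LF line endings the same way.
    $data = str_replace("\r\n", "\n", $data);
    // Set aside the final line break(s) so they are not turned into a space.
    $data = rtrim($data, "\n");
    // Replace every newline that is NOT followed by the start of a new record.
    $joined = preg_replace('/\n(?!\'\d+\'\|)/', ' ', $data);
    if ($joined === null) {
        throw new RuntimeException('preg_replace failed');
    }
    return $joined . "\n";
}
```

Usage would be along the lines of file_put_contents("Text-1.txt.out", join_wrapped_records(file_get_contents("Text-1.txt"))); it reads the whole file into memory, which should be fine for 24,000 records.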


On Tue, 22 Feb 2005 12:24:06 -0500, Brian Cummiskey <Brian at hondaswap.com> wrote:
> Hi all-
> 
> I received a data dump from a client, and it is in an impossible format.
> 
> The biggest issue is that the last field does not always fit on one line
> (it contains CR/LF line breaks).
> 
> here's an example:
> 
> 'CUSTNMBR'|'CUSTNAME'|'NOTES'
> '1'|'blah1' '|'fasdf fsdfd sdf1 '
> '2'|'blah2' '|'fasdf fsdfd sdf2 '
> 
> That is fine and easy to import.  However, there are some rows where the
> notes have their own CRLFs in them, like so:
> 
> 'CUSTNMBR'|'CUSTNAME'|'NOTES'
> '1'|'blah1' '|'fasdf fsdfd sdf1'
> '2'|'blah2' '|'fasdf fsdfd sdf2'
> '3'|'blah3' '|'fasdf fsdfd sdf dfdffsd
> fsdfds
> fsdfsd
> fdsf
> sfs3'
> '4'|'blah4' '|'fasdf fsdfd sdf4'
> 
> There are some 24,000 records here, so going by hand one at a time and
> backspacing them out is a last resort.
> 
> If I search/replace on the CRLF, I lose every row break, not just the
> ones inside the multi-line notes.
> 
> How do I import this?
> 
> I have access to MS SQL 2000 and MySQL/phpMyAdmin.  The end result would
> be a table in MS SQL.  If PHP can do something that ASP/MS SQL cannot,
> I can use the import/export of the PHP system to move it to the M$
> system afterwards.
> 
> I appreciate any help
> 

