[thelist] importing a non-standard datadump

Jason Sweeney jason at designshift.com
Wed Feb 23 10:38:53 CST 2005


This looks like a job for a regular expression. (Start theme music.)

 From the small sample that you have listed here, it looks like the 
beginning of each record begins with a single quote, an integer, a 
single quote, a pipe, and a single quote before other stuff appears. The 
syntax will be a little different, depending on your search-n-replace 
tool, but the regex should look roughly like this:

^'[0-9]+'\|'

(or ^'[0-9][0-9]*'|' in vi, where I tested it.) :-)

I'd review your file to be certain all records comform to this format, 
but that's what it looks like to me. So, to eliminate all but the 
newlines between records, search for these record beginnings and add a 
unique dummy string that you will then replace at the end with newlines, 
changing

    '1'|'reccordname'|'recordnotes'

with

    iamauniquestring'1'|'reccordname'|'recordnotes'

Then, do a search and replace eliminating all newlines in your file.

Lastly, just do one last search and replace, replacing 
"Iamauniquestring" with \n.

That "should" take care of it.

jason sweeney


thelist-request at lists.evolt.org wrote:
> Date: Tue, 22 Feb 2005 12:24:06 -0500
> From: Brian Cummiskey <Brian at hondaswap.com>
> To: "thelist at lists.evolt.org" <thelist at lists.evolt.org>
> Subject: [thelist] importing a non-standard datadump
> Message-ID: <421B6AB6.6080104 at hondaswap.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
> Precedence: list
> Reply-To: "thelist at lists.evolt.org" <thelist at lists.evolt.org>
> Message: 9
> 
> Hi all-
> 
> I received a data-dump from a client, and it is in an impossible format.
> 
> The biggest issue is that the last field does not fit all on one line 
> (CR\LF)
> 
> here's an example:
> 
> 
> 'CUSTNMBR'|'CUSTNAME'|'NOTES'
> '1'|'blah1' '|'fasdf fsdfd sdf1 '
> '2'|'blah2' '|'fasdf fsdfd sdf2 '
> 
> 
> that is fine and easy to import.  However, there are some rows where the 
> notes have their own crlf on them like so:
> 
> 'CUSTNMBR'|'CUSTNAME'|'NOTES'
> '1'|'blah1' '|'fasdf fsdfd sdf1'
> '2'|'blah2' '|'fasdf fsdfd sdf2'
> '3'|'blah3' '|'fasdf fsdfd sdf dfdffsd
> fsdfds
> fsdfsd
> fdsf
> sfs3'
> '4'|'blah4' '|'fasdf fsdfd sdf4'
> 
> 
> There's some 24,000 records here, so going by hand one at a time and 
> backspacing them out is a last resort.
> 
> If i search\replace on the CRLF, i lose every row, not just the notes 
> row multi-line.
> 
> How do I import this?
> 
> I have access to MS SQL 2000 and MySQL/phpmyadmin.  The end result would 
> be on a table in MS Sql.  If php can do something that asp/mssql cannot, 
> i can use the import export of the php system to move it to the M$ 
> system afterwards.
> 
> I appreciate any help


More information about the thelist mailing list