[thelist] importing a non-standard datadump
Jason Sweeney
jason at designshift.com
Wed Feb 23 10:38:53 CST 2005
This looks like a job for a regular expression. (Start theme music.)
From the small sample that you have listed here, it looks like the
beginning of each record begins with a single quote, an integer, a
single quote, a pipe, and a single quote before other stuff appears. The
syntax will be a little different, depending on your search-n-replace
tool, but the regex should look roughly like this:
^'[0-9]+'\|'
(or ^'[0-9][0-9]*'|' in vi, where I tested it.) :-)
I'd review your file to be certain all records comform to this format,
but that's what it looks like to me. So, to eliminate all but the
newlines between records, search for these record beginnings and add a
unique dummy string that you will then replace at the end with newlines,
changing
'1'|'reccordname'|'recordnotes'
with
iamauniquestring'1'|'reccordname'|'recordnotes'
Then, do a search and replace eliminating all newlines in your file.
Lastly, just do one last search and replace, replacing
"Iamauniquestring" with \n.
That "should" take care of it.
jason sweeney
thelist-request at lists.evolt.org wrote:
> Date: Tue, 22 Feb 2005 12:24:06 -0500
> From: Brian Cummiskey <Brian at hondaswap.com>
> To: "thelist at lists.evolt.org" <thelist at lists.evolt.org>
> Subject: [thelist] importing a non-standard datadump
> Message-ID: <421B6AB6.6080104 at hondaswap.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
> Precedence: list
> Reply-To: "thelist at lists.evolt.org" <thelist at lists.evolt.org>
> Message: 9
>
> Hi all-
>
> I received a data-dump from a client, and it is in an impossible format.
>
> The biggest issue is that the last field does not fit all on one line
> (CR\LF)
>
> here's an example:
>
>
> 'CUSTNMBR'|'CUSTNAME'|'NOTES'
> '1'|'blah1' '|'fasdf fsdfd sdf1 '
> '2'|'blah2' '|'fasdf fsdfd sdf2 '
>
>
> that is fine and easy to import. However, there are some rows where the
> notes have their own crlf on them like so:
>
> 'CUSTNMBR'|'CUSTNAME'|'NOTES'
> '1'|'blah1' '|'fasdf fsdfd sdf1'
> '2'|'blah2' '|'fasdf fsdfd sdf2'
> '3'|'blah3' '|'fasdf fsdfd sdf dfdffsd
> fsdfds
> fsdfsd
> fdsf
> sfs3'
> '4'|'blah4' '|'fasdf fsdfd sdf4'
>
>
> There's some 24,000 records here, so going by hand one at a time and
> backspacing them out is a last resort.
>
> If i search\replace on the CRLF, i lose every row, not just the notes
> row multi-line.
>
> How do I import this?
>
> I have access to MS SQL 2000 and MySQL/phpmyadmin. The end result would
> be on a table in MS Sql. If php can do something that asp/mssql cannot,
> i can use the import export of the php system to move it to the M$
> system afterwards.
>
> I appreciate any help
More information about the thelist
mailing list