[thelist] sleep, etc to make a script not take resources
Bob Meetin
bobm at dottedi.biz
Wed Oct 14 10:41:23 CDT 2009
James Hardy wrote:
> 2009/10/13 Hassan Schroeder <hassan.schroeder at gmail.com>:
>
>> door #3: Since the regular expression comparison is by far the most
>> computationally intensive part of this, parse the CSV file locally and
>> generate a script of insert statements; upload that, run on DB server.
>>
>> In a loop with sleep()s included, if it makes you feel better. :-)
>>
>>
>
> In my experience of MySQL, using a LOAD DATA INFILE statement[1] to
> insert structured (eg CSV) data uses almost no resources and takes
> very little time compared with generated MySQL scripts with thousands
> of insert statements, which tend to eat up resources and take an age.
>
> In this case, I would use a simple grep (or awk if more manipulation
> is required) command to generate a CSV with only the matching lines
> locally and then simply import this data.
>
> [1] http://dev.mysql.com/doc/refman/5.1/en/load-data.htm

On the one side I have a living database, and on the other I expect to
see numerous inconsistently formatted data files (the 30,000-line .csv
file being one example) that will need to be compared for matching
lines using a variety of pattern-matching techniques. Whether it's done
in MySQL or via scripting (#!/bin/sh, PHP, awk, grep, sed, etc.), it's
going to take a lot of manipulation to get to perhaps 95% accuracy.
Getting much beyond that would likely mean hiring a data entry clerk to
go through the text files and clean up the rubble.

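For what it's worth, here is a rough sketch of the local filtering step, written in Python rather than grep or awk. It is only an illustration under assumptions: the patterns, the function names (normalize, matching_rows, write_matches) and the idea that a row qualifies when any one field matches are all made up for the example, and the real rules would depend on what is in the live database.

```python
import csv
import io
import re

# Hypothetical patterns; the real ones depend on the data being matched.
PATTERNS = [
    re.compile(r"\bdenver\b", re.IGNORECASE),
    re.compile(r"^\d{5}$"),  # a bare 5-digit field, e.g. a ZIP code
]


def normalize(field):
    """Trim and collapse whitespace so inconsistent spacing does not block a match."""
    return " ".join(field.split())


def matching_rows(source, patterns=PATTERNS):
    """Yield cleaned rows in which at least one field matches at least one pattern."""
    for row in csv.reader(source):
        cleaned = [normalize(f) for f in row]
        if any(p.search(f) for p in patterns for f in cleaned):
            yield cleaned


def write_matches(source, dest, patterns=PATTERNS):
    """Write the matching rows to dest as CSV and return how many were written."""
    writer = csv.writer(dest, lineterminator="\n")
    count = 0
    for row in matching_rows(source, patterns):
        writer.writerow(row)
        count += 1
    return count
```

The output file would contain only the cleaned, matching lines, so it could then be bulk-imported in one go instead of being turned into thousands of insert statements. With real files, source and dest would be opened with open(..., newline="") as the csv module recommends.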
I haven't used LOAD DATA often, but I'm pretty familiar with mysqldump
(and mysql to restore the data) as well. They really do take minimal
resources. Last night I got my home office Linux PC set up with MySQL
and set up the db locally so that I can experiment without big brother
looking over my shoulder.

Thanks for all the tips thus far.
-Bob