[thelist] sleep, etc to make a script not take resources

Bob Meetin bobm at dottedi.biz
Wed Oct 14 10:41:23 CDT 2009


James Hardy wrote:
> 2009/10/13 Hassan Schroeder <hassan.schroeder at gmail.com>:
>   
>> door #3: Since the regular expression comparison is by far the most
>> computationally intensive part of this, parse the CSV file locally and
>> generate a script of insert statements; upload that, run on DB server.
>>
>> In a loop with sleep()s included, if it makes you feel better. :-)
>>
>>     
>
> In my experience of MySQL, using a LOAD DATA INFILE statement[1] to
> insert structured (e.g. CSV) data uses almost no resources and takes
> very little time compared with generated MySQL scripts with thousands
> of insert statements, which tend to eat up resources and take an age.
>
> In this case, I would use a simple grep (or awk if more manipulation
> is required) command to generate a CSV with only the matching lines
> locally and then simply import this data.
>
> [1] http://dev.mysql.com/doc/refman/5.1/en/load-data.htm

On one side I have a live database, and on the other I expect to see
numerous inconsistently formatted data files (the 30,000-line .csv file
being one example) that will need to be compared for matching lines
using a variety of pattern-matching techniques.  Whether done in MySQL
or via scripting (#!/bin/sh, PHP, awk, grep, sed, etc.), it's going to
take a lot of manipulation to get to perhaps 95% accuracy.  Getting much
beyond that would likely mean hiring a data entry clerk to go through
the text files and clean up the rubble.
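
To make the local pre-filtering idea concrete, here is a rough Python
sketch of the approach: try several patterns against one column,
normalize whatever matches, and write only those rows to a clean CSV
that could then be bulk loaded.  The patterns, the choice of key column,
and the names normalize() and filter_rows() are all made up for
illustration; the real files would need their own patterns.

```python
import csv
import io
import re

# Hypothetical patterns: each recognizes a differently formatted
# variant of the same key field.
PATTERNS = [
    re.compile(r"^\s*(\d{3})[-. ]?(\d{4})\s*$"),  # "123-4567", "123 4567", "1234567"
    re.compile(r"^\s*#(\d{3})/(\d{4})\s*$"),      # "#123/4567"
]


def normalize(field):
    """Return the canonical 'NNN-NNNN' form, or None if no pattern matches."""
    for pattern in PATTERNS:
        m = pattern.match(field)
        if m:
            return f"{m.group(1)}-{m.group(2)}"
    return None


def filter_rows(src, dst, key_column=0):
    """Copy rows whose key column matches a pattern from src to dst.

    The key column is rewritten in canonical form.
    Returns a (kept, rejected) tuple of row counts.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    kept = rejected = 0
    for row in reader:
        key = normalize(row[key_column]) if len(row) > key_column else None
        if key is None:
            rejected += 1  # leave these for manual cleanup
            continue
        row[key_column] = key
        writer.writerow(row)
        kept += 1
    return kept, rejected


if __name__ == "__main__":
    sample = io.StringIO("123 4567,Alice\n#765/4321,Bob\ngarbage,Carol\n")
    out = io.StringIO()
    print(filter_rows(sample, out))
    print(out.getvalue(), end="")
```

The rejected count gives a quick feel for how far the patterns are from
that 95% mark, and the rows that fail every pattern are the ones a human
would have to look at.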

I haven't used LOAD DATA often, but I'm pretty familiar with mysqldump
(and mysql to restore the data) as well.  They really do take minimal
resources.  Last night I got my home office Linux PC set up with MySQL
and set up the db locally so that I can experiment without Big Brother
looking over my shoulder.

Thanks for all the tips thus far. 

-Bob


