On 9/4/07, Judah McAuley <judah at wiredotter.com> wrote: > > I would consider connecting directly to the main DB yourself every hour > > and loop through an SQL result set, instead of the CSV. > > For a variety of fascinating reasons, this isn't really an option. It is > a great suggestion, however. Even if it were technically possible for you to do this, I would not suggest it. It is much easier to handle exceptions (network outages, failed records, etc.) if you bulk export, transfer file, and bulk import. > > I am no expert over here, but I would suggest that the only alternative > > is of course to compare the records, that is the CSV record versus your > > DB record, to see if an update is necessary or not. > > Agreed. My question then, is what is the most efficient way to do this? > I could load the csv file into a temp table and then find all the rows > that need inserts pretty easily via a join. > > What of the other rows though? What is the most efficient way to see if > two rows are identical for 12 columns? What you are discussing is actually a pretty common scenario. Here's what I would suggest, which (not that this makes it legitimate) is how this is usually handled: 1) bulk export all records where there is the potential for necessary updates/inserts 2) transfer the export file by whatever means appropriate (secure FTP, middleware, other) 3) bulk import all records from the file into a staging table (this is probably *not* a temp table) 4) use a sproc or load tool to update changed records (update strategy) 5) use a sproc or load tool to add new records (insert strategy) Note that there is a slight performance benefit by doing #4 before #5, so might as well do it in that order. Depending on the scale of what you're doing, you may consider turning off indexes during the load. This will also force you to turn off referential integrity during the load, which means you will need to handle these checks yourself or use a feature of your load tool. After the load is complete, you rebuild the indexes. This will cause the load process to run much, much faster. You just need to be careful about the lack of RI. Hope this helps, -- Matt Warden Cincinnati, OH, USA http://mattwarden.com This email proudly and graciously contributes to entropy.