On a weekly basis my application needs to synchronize a flat text file with about 900,000 rows (and growing) into the database. I use the FlatFileInterface module from the App Store and a separate "sync" table before I import the data into the "production" table. The flat-file import takes about 20 minutes to process the huge file. Fair enough.
After that, a separate process synchronizes the contents of the "sync" table into the "production" table. For every row in the "sync" table, the process checks whether it already exists in the "production" table. If it does, the matching "production" row is marked with check "true" so it can stay; if it does not, it is created in the "production" table with check "true". Afterwards, all rows still marked "false" are deleted from the "production" table. This synchronization takes about 45 (!!!) minutes.
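To make sure I'm describing the process clearly, here is a rough sketch of the mark-and-sweep logic in Python, with in-memory dicts standing in for the two tables (the table names and row shape are just placeholders, not my actual domain model):

```python
def synchronize(sync_rows, production):
    """Mark-and-sweep sync, as described above.

    sync_rows:  dict key -> row data, standing in for the "sync" table.
    production: dict key -> {"data": ..., "checked": bool},
                standing in for the "production" table.
    """
    # Phase 1: reset every check flag in "production".
    for row in production.values():
        row["checked"] = False

    # Phase 2: mark rows that already exist, create the ones that don't.
    for key, data in sync_rows.items():
        if key in production:
            production[key]["checked"] = True              # row may stay
        else:
            production[key] = {"data": data, "checked": True}  # new row

    # Phase 3: delete every row that was not marked.
    for key in [k for k, v in production.items() if not v["checked"]]:
        del production[key]

    return production
```

In the real application each phase runs against the database, so the 45 minutes is dominated by per-row retrieves and commits rather than the in-memory logic.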
My question is:
In step 2 I process all 900,000 rows from the "sync" table into the "production" table as one big list. Is there a way to do this in separate, smaller batches?
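What I have in mind is something like the following paging sketch (Python, purely illustrative; the batch size of 5,000 is an arbitrary guess), so that each batch could be retrieved, processed, and committed on its own instead of holding all 900,000 rows in memory at once:

```python
def in_batches(items, batch_size=5000):
    """Yield successive slices of `items` so each slice can be
    processed and committed separately, instead of handling the
    whole list in one pass."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage against the sync rows:
# for batch in in_batches(all_sync_rows):
#     process_and_commit(batch)   # placeholder for the real sync step
```

In the real process this would presumably map to retrieving the "sync" table with an offset and limit per iteration rather than slicing an in-memory list.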