How to Deal with (very) large data table transfers ?

This post isn’t dealing directly with Dynamics GP, but the process I went thru might actually apply equally well, so keep reading and feel free to add your comment.

Last week a customer contacted me to proceed with some data structure changes in one of their database. It wasn’t a big deal as they wanted me to update the field length in one specific field in a single table, changing it from VARCHAR (2000) to VARCHAR(MAX). No problem you may say, just open SSMS and go into the table design mode, change the value and save the table. Not so fast, because due to some indexes and identity fields, SSMS didn’t want me to let apply the changes.. So I had to cancel out and resort to another method.

A little more googling and here came the answer :
ALTER TABLE YourTable ALTER COLUMN YourColumn VARCHAR (MAX) NOT NULL;
I took the TEST database for this initial attempt as it was originally a copy of the LIVE DB, but at a time where there was very little data in it (barely 100 rows), and it took no time (5 secs) to apply the new field size. Behind the scene, SQL is actually copying the whole table data into a ##temp table, change the field size and copy back all the data.

Not a problem at all when the data is small.. but when I came to try this on the LIVE database, I realized that the table contained actually a little more than 170 mio rows! Yes, you read that right: 170 mio rows that would account for about 22GB of size, just for this table in the DB. With that information on hand, I told the customer that this operation better would not be done during regular hours, but rather nightly or over the week-end.

In the evening I’d go back and try to apply the changes to the LIVE data table, but SQL would time out after a long run and the whole process would fail. Too much data to handle. So I decided to create a copy of the table with the new field size and INSERT all the records from the original table into the cloned table. The INSERT INTO from SELECT statement ran for 2-3 hours and guess what, failed due to running out of disk space.. The transaction log file of the DB had grown to 132GB and left zero space on this disk.

Check your recovery mode you’d say! Hey, I’m not a newbie, so that was the 1st thing I did and put the DB from FULL into SIMPLE recovery mode, just because I didn’t wanted this INSERT statement logged all the way thru the 170mio rows. Turns out that SQL still uses some logging when using the INSERT statement, and the funny thing was that I could watch the insert of the rows into the new table, while the log file was growing. Way before it was maxing out the disk space, the 170mio rows had been copied over, but for some un-known reasons I haven’t yet figured out, SQL kept processing on this data and would eventually run out of space, and then rolling back all the transactions, emptying the destination table, back to square zero.

I finally managed to get the data into the destination table by making a lot of space on the drive and let the process run. A few hours later and the whole thing was there. I renamed the old table and gave the new table the original name, so the application could keep writing into the table. As a side note, this table is used to log testing data coming from some kind of quality control processing system that would run batteries of tests to new devices and write all the data into that Database, thus the huge amount of data.. And there was no question about deleting old data, as it had to be kept for Quality insurance purposes.

Few days later, and the developer contacted me to report that all the queries to the new tables were extremely slow running and it would become painful to run reports against the data. A little research returned an interesting information about the VARCHAR field. Turns out that if the size if defined by using MAX, then technically the field can store up to 2GB of data. The major difference is that SQL doesn’t handle queries the same way soon as the field size is bigger than 8000. This is all related to the way blocks of data are stored in SQL server (more about the details can be found here)

As I explained this to the developer, he understood that he might never need 2GB of space to store lengthy strings, but that 2k was not enough from his past experience, though 8k would probably be fine. Off I was again to reduce from VARCHAR(MAX) to VARCHAR(8000), which I thought would be the same process as the first time.. No sir. I ran the process twice and it failed twice, always running out of disk space, no matter how long I’d let it run. Again, the 170mio rows would copy over in about 20-25 minutes, but then the process wouldn’t just stop there, but kept running for another 3-4 hours, before it just couldn’t anymore.

I started searching if there were better ways to copy large chunk of data, and some suggested to use SSIS or the SQL Import/Export Wizard, which creates some kind of SSIS package, and also indicating that you could actually during the process define a batch size for the processing, limiting the amount of logged data.. Nice. But when I tried the wizard, I couldn’t find any place to set a batch size. Turns out that on the old 2008 R2 this wasn’t an option to select from.

Nonetheless, this had to be done and my week-end window was running out of time, so I went off with the wizard generated package to be executed. What I noticed right away that there was no logging by SQL (or very minimal), which means in some ways it was optimizing the use of disk. The downside of this was that the data transfer was extremely slow, like 1 mio rows in a little less than 2 minutes.. Doing my math, I estimated that the whole transfer would take about 4-5 hours. It was end of PM, so I decided that I could manage this until 9-10PM.

As I checked back by around 9:45PM, it had reached 150mio rows and still counting.. It finally completed successfully by 10:20PM and I could set the DB back in FULL recovery mode, take a full backup and rename the tables. The next day, the developer got back to me to confirm that reporting speed was back similar to what it used to be before the changes. Great ! He would get back in a few days to confirm the old table could be deleted. End of the job.

Not so fast! The whole processing time was really bugging me as I couldn’t believe SQL’s engine was so bad to not being able to process data much faster.. I know this is old SQL 2008 R2, so there is no ‘in-memory’ processing, but heck, 5 hours to run a simple table transfer !?!

I thought I would use their DEV server to copy over the DB backup and play a little with it to see if I could come up with a better execution plan. My attempt failed grossly, as SQL wasn’t able to restore the database. The backup was (compressed) only 6GB large, but SQL needed the whole transaction log space to restore it, and as the DEV server’s disk were just not as big as in PROD, I had to find another way.

So why not use the Import/Export wizard again to transfer only a small sample of data from the PROD server to the DEV system ? It wouldn’t take a big payload on the source system to have any major impact, and off I was. 5 minutes later I had 1mio rows in the destination DB, within a single table I had just created from scratch, based on the original table.

Once this was in place, I created a clone table to have the same exact situation, with an originating VARCHAR(MAX) field going to a VARCHAR(8000) destination field.

Thanks to some older blog posts I found ( http://sqlserverplanet.com/data-warehouse/quick-table-transfers-imports-using-ssis-bulk-insert-or-bcp ), I was able figure out why SQL would log all the INSERT statements, even when in SIMPLE recovery mode. But it also gave me some hints about other option, which confirmed that the SSIS package would use special INSERT BULK commands when running.

The T-SQL script found in this blog post helped me to create automatically from every single table the necessary SQL commands to bulk export data into flat files and re-import them into SQL tables: https://blogs.msdn.microsoft.com/patrickgallucci/2007/08/10/generate-bcp-bulk-insert-code-with-tsql/

Then I started taking some performance timing on the process of bulk import. The export had generated a file of about 95kb (yes!) for 1mio rows. There were various format files that were generated as well to provide the necessary header infos for the data import too.

Just to confirm my original issue, I used the “classic” INSERT command from SQL to bring in the 1mio rows from the flat file to the new SQL table. I checked carefully that the transaction logs very cleared and back to original size (1MB) and the DB in SIMPLE recovery mode before starting each process. This is what came up after the first run :

--classic INSERT from file set (total 15 secs, log grew to 650MB)
2019-11-11 11:54:52.9261332
(1000000 row(s) affected)
2019-11-11 11:55:07.0755146

The transaction log had grown to 650MB in 15 secs, just to insert those 1mio rows. Convert that to 170mio rows, and you can imagine the log file grow way above 100+GB. The timing wasn’t that bad, which would confirm the actual data transfer is not what takes the most of it, somewhere around 40-45 minutes, but there were other things happening behind the scene that I couldn’t figure out which would keep the process running forever.

Now, to the BULK INSERT command. As this command allows for defining batch sizes to keep the logging minimal and reused, I started with 100k. The process ran very fast and watching my log grow only to 35MB during the whole process !
I repeated the process, clearing up my log every time and truncating the destination table between each process, I came up with the following results (below is the statement that was used, only changing the BATCHSIZE value):

BULK INSERT [MyNewDB].[dbo].[RESULTSESSIONLAPDETAIL_NEW]
 FROM 'L:\SQL\Output\RESULTSESSIONLAPDETAIL_NEW.Dat' WITH (
         FORMATFILE = 'L:\SQL\Output\RESULTSESSIONLAPDETAIL_NEW.FMT'
         ,BATCHSIZE = 100000
         ,ERRORFILE = 'L:\SQL\Output\BI_RESULTSESSIONLAPDETAIL_NEW.ERR'
         ,TABLOCK
         );GO
-- BULK INSERT from file dataset (total 8.9 secs, log grew to 35MB), batch size of 100k rows
2019-11-11 11:44:40.5956083
[…]
(1000000 row(s) affected)
2019-11-11 11:44:48.5517613
--BULK INSERT from file dataset (total 7.9 secs, log grew to 47MB), batch size of 1Mio rows
2019-11-11 12:21:12.6430472
(1000000 row(s) affected)
(1000000 row(s) affected)
2019-11-11 12:21:19.5539358
--BULK INSERT with batch size of 10k (total 10 secs, log grew to 10Mb only )
2019-11-11 12:26:12.5680765
[…]
(1000000 row(s) affected)
2019-11-11 12:26:22.9109428
--BULK INSERT with batch size of 50k (total 9 secs, log grew to 22MB)
2019-11-11 12:28:36.6617680
[…]
(1000000 row(s) affected)
2019-11-11 12:28:45.5697645
--BULK INSERT with batch size of 200k (total 9 secs, log grew to 70MB)
2019-11-11 12:30:53.0087645
[…]
(1000000 row(s) affected)
2019-11-11 12:31:02.2284236
 --BULK INSERT with batch size of 500k (total 7.8 secs, log grew o 124MB)
2019-11-11 12:35:23.6991811
[..]
(1000000 row(s) affected)
2019-11-11 12:35:31.5148813 
--Repeated 100k & 1Mio to confirm timing
--BULK INSERT with batch size of 100k (total 8.6 secs, log grew o 35MB)
2019-11-11 12:33:09.5195211
[..]
(1000000 row(s) affected)
2019-11-11 12:33:18.1624751
--BULK INSERT w/  batch size of 1Mio rows (total 6.8 secs, log grew to 47MB) 
2019-11-11 12:37:54.1403120
[..]
(1000000 row(s) affected)
2019-11-11 12:38:00.9267035

Strangely enough, there would be some logic behind the growing of the log file based on the batch size, but when jumping from 500k to 1mio, the log would actually end up being smaller !?!. From 124MB with 500k rows per batch, it would be only 47MB for 1Mio rows.

The processing time wouldn’t change much between the various batch sizes, going from 6.8 secs up to 10 secs for the slowest one, which turned out to be also the smallest batch size (10k).. so smaller doesn’t necessarily mean faster.. in this case the 1Mio rows batch size turned out to be the most optimal when it came to processing a large data pool. Using the BULK INSERT technique would have completed the job in less than 20 minutes and kept the log growth in control any time.

Closing note: I was already aware of the BCP command and BULK INSERT from an old support paper from Microsoft when it came to Dynamics GP data migration, and I had it tried out, granted it was a very small data sample as the TWO company isn’t very big, but I was surprised myself by the speed and effectiveness at the time. Here is the link to this old KB article : https://support.microsoft.com/en-us/help/875179/how-to-use-the-bulk-copy-process-bcp-to-export-microsoft-dynamics-gp-d

NB: if you run into permission issues telling you that the authentication failed when trying to export the data into flat files, don’t necessary blame your SQL server login.. As I found out it is often the folder where you try to save the data that is the culprit.

Hope that helps and until next post, wish you happy reading.
@GP_Beat

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.

Up ↑

ReadyXRM

Stuff about Microsoft Dynamics 365 and the Power Platform

The Dynamics Explorer

Exploring the World of Microsoft Dynamics ERP

Paul S. Randal

My contribution to the never ending Dynamics GP journey

SID-500.COM

Windows . PowerShell . Automation . Network Engineering . Cyber Security

JenKuntz.ca

Dynamics GP & other thoughts!

SergesPoint

Tricks and trips about SharePoint

Salim Adamon - Dynamics CRM Blog

Everything Dynamics CRM from a technical, architectural and business perspective

The Official Blog for Training Dynamo LLC

Microsoft Dynamics GP - Tips, Tricks, News and Thoughts

Dynamics GP Tips and Traps For Users and Developers

Talking about Great Plains, Dexterity, SQL, and some C#

%d bloggers like this: