Huge slowdown when archive size increases

Gehard
Posts: 18
Joined: Wed Mar 23, 2016 7:52 am

Post by Gehard »

Hi all,

Great program :) We are looking into the possibility of using it as our main backup program. However, our first test on Windows Server 2003 (yes, 2003) with wimlib 1.9.1 shows an increasing slowdown as the archive size increases:
- We tested with --compress=none to make sure it was not due to the compression algorithm: same problem.
- We tried to back up a full 3 TB disk containing slightly more than 1 million files:
  - It started at a mean of 105 MB/sec over the first 30 GB (all figures are from the wimlib console output).
  - Between 45 and 60 GB, the speed went down to 85 MB/sec.
  - It decreased continuously until 116 GB, where we stopped the process (the source hard drive was only being read once in a while and the Windows UI was lagging a lot, even without compression). The mean speed between 106 GB and 116 GB of data read was 33 MB/sec.
- We also tried copying the directory with Windows Explorer: it completed the full backup at a mean of 130 MB/sec locally and 112 MB/sec over the network.
- We ran a scan with a defrag utility, which showed that the archive file was highly fragmented. The backup drive was completely empty to begin with, and the backup made with Windows Explorer had 0% fragmentation. Tests done on smaller archives already show some fragmentation, even on freshly formatted partitions.

If you need more information, just let us know.
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Post by synchronicity »

Thanks for the information. It should be kept in mind that the read speed and write speed will not necessarily match. Since wimlib performs deduplication of file contents, write speed is expected to decrease over time as the probability that a given file's contents was already encountered increases. That does not mean that the backup is actually slowing down.
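
To illustrate why the write rate can drop while the read rate holds steady, here is a minimal Python sketch of content-hash deduplication. It is not wimlib's actual code: the function name `archive_blobs` and the in-memory set are assumptions made for illustration. It only imitates the general idea of identifying file contents by their SHA-1 digest and storing each distinct content once.

```python
import hashlib


def archive_blobs(blobs):
    """Simulate archiving and return (bytes_read, bytes_written).

    Hypothetical helper, not part of wimlib. Each blob is hashed with
    SHA-1 and counted as written only the first time its digest is seen.
    """
    seen = set()
    bytes_read = 0
    bytes_written = 0
    for data in blobs:
        bytes_read += len(data)  # every file still has to be read
        digest = hashlib.sha1(data).digest()
        if digest in seen:
            continue  # duplicate contents: nothing is written
        seen.add(digest)
        bytes_written += len(data)
    return bytes_read, bytes_written


if __name__ == "__main__":
    files = [b"a" * 1000, b"b" * 1000, b"a" * 1000, b"a" * 1000]
    read, written = archive_blobs(files)
    print(read, written)
```

With two of the four sample files being repeats, the sketch reads 4000 bytes but writes only 2000, so the apparent write speed is half the read speed even though nothing has slowed down.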

Can you provide the exact amount of time it takes to create the archive file with wimlib versus other "backup" methods such as a copy with Windows explorer? The output of 'wiminfo' run on the resulting WIM file may also be useful.

I am also interested in why you observed the WIM file to be heavily fragmented, and I will see if I can reproduce that. Since the data is written sequentially (for the most part), that shouldn't really happen.

Post by Gehard »

synchronicity wrote: Thanks for the information. It should be kept in mind that the read speed and write speed will not necessarily match. Since wimlib performs deduplication of file contents, write speed is expected to decrease over time as the probability that a given file's contents was already encountered increases. That does not mean that the backup is actually slowing down.
We also thought about deduplication. But the hard drive holding the original data was hardly being read anymore, and the backup hard drive was not writing much either. So the slowdown should not come from deduplication.
synchronicity wrote: Can you provide the exact amount of time it takes to create the archive file with wimlib versus other "backup" methods such as a copy with Windows explorer? The output of 'wiminfo' run on the resulting WIM file may also be useful.
With Windows Explorer, it took approximately 6 hours to back up 2.7 TB of data. With wimlib, we stopped the process because of the unresponsive UI and the very slow backup (the LEDs of the 2 hard drives were hardly blinking). Even if it had continued at 33 MB/sec, it would already have taken quite long, but analysis of the speed across the first 116 GB showed that it was slowing down continuously. As we were at 0.11 TB out of 2.7 TB with only about a quarter of the starting speed, we calculated that it would take way too long. This, plus the high fragmentation, led us to talk to you before continuing.
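
As a rough check of these figures, the arithmetic can be written out as a small Python sketch. It assumes decimal units (1 TB = 1,000,000 MB) and a constant speed, which is optimistic given that the speed was still falling; the function names are made up for illustration.

```python
def mean_speed_mb_s(total_tb, hours):
    """Mean throughput in MB/sec for total_tb terabytes copied in `hours`."""
    return total_tb * 1_000_000 / (hours * 3600)


def hours_needed(total_tb, speed_mb_s):
    """Hours needed to copy total_tb terabytes at a constant speed."""
    return total_tb * 1_000_000 / speed_mb_s / 3600


if __name__ == "__main__":
    print(round(mean_speed_mb_s(2.7, 6)))    # Windows Explorer copy
    print(round(hours_needed(2.7, 33), 1))   # wimlib at the last measured mean speed
    print(round(hours_needed(2.7, 105), 1))  # wimlib at the starting speed
```

This gives about 125 MB/sec for the Explorer copy, and about 22.7 hours for 2.7 TB at a constant 33 MB/sec versus about 7.1 hours at 105 MB/sec.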
synchronicity wrote: I am also interested in why you observed the WIM file to be heavily fragmented, and I will see if I can reproduce that. Since the data is written sequentially (for the most part), that shouldn't really happen.
A colleague also tried in the past to write a Python backup script. I don't know if it helps, but he had the problem that files written a few bytes at a time always ended up fragmented on Windows. Only when he made big allocations was the file written in one chunk. Maybe this link http://stackoverflow.com/questions/4552 ... on-windows can explain it?
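
The difference between growing a file through many small writes and reserving its size up front can be sketched as follows. This is a minimal Python illustration only, not the colleague's script and not wimlib code; `write_preallocated` is a hypothetical helper. Whether pre-sizing actually results in a single extent depends on the filesystem and its free space, and on some systems extending a file this way may only create a sparse file without reserving any blocks.

```python
import os
import tempfile


def write_preallocated(path, chunks, total_size):
    """Set the file to total_size bytes first, then fill it chunk by chunk.

    Hypothetical helper for illustration. truncate() extends the file to
    its expected final length before any data is written, so the
    filesystem is told the full size up front instead of seeing many
    small extensions.
    """
    with open(path, "wb") as f:
        f.truncate(total_size)  # set the final file length in one call
        f.seek(0)
        written = 0
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
        f.truncate(written)  # trim if less data arrived than was reserved
    return written


if __name__ == "__main__":
    chunks = [b"x" * 4096 for _ in range(256)]  # 1 MiB in 4 KiB pieces
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "archive.bin")
        n = write_preallocated(target, chunks, total_size=len(chunks) * 4096)
        print(n, os.path.getsize(target))
```

The final `truncate(written)` keeps the file size honest when the amount of data is not known exactly in advance, which is the usual situation for an archive that deduplicates or compresses its input.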

Post by synchronicity »

Well, I don't have the hardware to test 3 TB of data, but when I create an archive from a Windows 7 installation I get completely opposite results from you. wimlib created a WIM archive in just over 2 minutes whereas Windows Explorer took more than 10 minutes to copy all the files. And the resulting WIM archive was completely unfragmented --- all clusters of the file were in a single extent.

So you are going to have to narrow down the problem yourself. You might look at CPU and memory usage as well as disk usage, and check whether there are any other programs running that are consuming resources.

Post by Gehard »

Ok, that's weird. Fragmentation is not our main concern. Decreasing speed was the main reason why we stopped the process.
We also ran a smaller test on Win7 x64 with a 4-core Ivy Bridge CPU at 3.4 GHz and one HDD (which can copy at 40 MB/sec at most when reading from and writing to the same partition). The test was done with --compress=none and wimlib 1.9.1 x64, with all unneeded services and programs deactivated, just like in the first report (no Windows Update, no antivirus, etc.):
As admin, with --snapshot, there was no problem saving the full OS drive: no fragmentation, low CPU usage (about 10% mean), and the HDD was working all the time. It was 70 GB (78 GB before deduplication) in about 400,000 files.
Memory usage of wimlib was stable at about 310 MB, but Windows memory usage changed several times during the full OS backup.

This is not incompatible with our earlier report: in that test the mean speed was still 98 MB/sec after 65 GB of backed-up data, whereas the HDD used in this test can copy at most 40 MB/sec to the same partition.

So to have a chance of seeing the problem on a 4-core Ivy Bridge at 3.4 GHz, you have to back up at more than 100 MB/sec (in the first test we used 2 enterprise HDDs writing at about 130 MB/sec, but a very big SSD could also work), and the speed only really starts to drop below 100 MB/sec after 100 GB have been backed up.

Edit: Note that there were 0 duplicates on the HDD in the first test, only hard links (which I guess are only read once by wimlib). Another difference between the 2 tests: on Win7, the archive stays at 0 KB until the end and only shows its real size when done. On Server 2003, you can see the size increase every second.

Post by Gehard »

It seems the fragmentation problem only happens on Server 2003. The fact that the file size is updated regularly, unlike on Windows 7, shows that the behavior is for some reason not the same.

Post by synchronicity »

Is it possible the slowdown is specific to Windows Server 2003 as well?

Post by Gehard »

Tests are on their way for Win7, but in the meantime, we are pretty sure that the different behavior when writing to the archive causes the slowdown.
The bottleneck is definitely neither of the HDDs, even with high fragmentation, but it seems it is not directly due to wimlib either. CPU usage is much higher on Server 2003 past 100 GB (75% without compression), but wimlib is only responsible for 12% of it, comparable to Win7.
It seems that the difference in the way data is written to the archive, and especially in how the allocation is done (regular small file size increases on Server 2003 vs. a file at 0 KB until the end on Win7), makes the OS fragment the file and leads to increasing CPU overhead over time to find new free blocks.
Would it be possible to have a version that writes to the archive in exactly the same way on Server 2003, so that we get comparable results?

Post by synchronicity »

Any news about the results on Windows 7?

Unfortunately, I don't have a copy of Windows Server 2003 available for testing, and in general it is difficult for me to invest time into testing with outdated operating systems.

Also: wimlib almost always runs exactly the same code on all versions of Windows. So if you really are experiencing a difference specifically in how the WIM file is written, that is likely to be a consequence of how the underlying operating system or filesystem is implemented --- I can't simply change it to be exactly the same (although there could be workarounds).

Post by Gehard »

Just came back from vacation, sorry for the delay. We naturally have a pile of work now :D But it will be done this week. I'll report as soon as we have new information.