digitalmars.D.announce - parallel copy directory, faster than robocopy
- Jay Norwood (2/2) Feb 13 2012 Attached is the source for a small parallel app that copies a source fol...
- Jay Norwood (66/66) Feb 13 2012 ok, so I guess the Add File didn't work for some reason, so here's the s...
- Jay Norwood (72/72) Feb 13 2012 ok, I didn't test that first one very well. It worked for directory cop...
- Jay Norwood (4/4) Feb 14 2012 An improvement is to change this first mkdir to mkdirRecurse.
- deadalnix (4/8) Feb 14 2012 If I could suggest something, it would be great to see this added to
- Sean Cavanaugh (10/12) Feb 14 2012 more of an 'FYI/reminder':
- Jay Norwood (15/25) Feb 14 2012 Yeah, Robocopy has a lot of nice options. Currently the D
- Nick Sabalausky (6/16) Feb 14 2012 Nice!
- Jay Norwood (16/16) Mar 04 2012 I placed the two parallel file operations, rmdir and copy on
- Andrei Alexandrescu (4/18) Mar 05 2012 Sounds great! Next step, should you be interested, is to create a pull
- Jay Norwood (9/12) Mar 05 2012 I considered that. I suppose the wildArgv code could go in
- Danny (3/3) Feb 06 2018 The rule of thumb is use double number of threads of the logical
- dennis luehring (4/6) Mar 05 2012 do you compare single-threaded robocopy with your implementation or
- Jay Norwood (10/13) Mar 05 2012 yes, I tested vs multithread robocopy. As someone pointed out,
- Jay Norwood (55/55) Mar 05 2012 So here is the output of a batch file I just ran on the ssd drive
- jackreacher (1/56) Dec 11 2017
- rumbu (4/5) Feb 07 2018 For a fair comparison, I think that's the command line:
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.
Feb 13 2012
ok, so I guess the Add File didn't work for some reason, so here's the source.

module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length != 3){
        writeln ("need to specify src and dest dir");
        return 0;
    }

    // TODO expand this to handle wildcard

    string dest = argv[$-1];
    foreach(string dir; argv[1..$-1])
    {
        writeln("copying directory: "~ dir );
        auto st1 = Clock.currTime(); //Current time in local time.
        cpdir(dir,dest);
        auto st2 = Clock.currTime(); //Current time in local time.
        auto dif = st2 - st1 ;
        auto ts= dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void cpdir(in char[] pathname ,in char[] dest){
    DirEntry deSrc = dirEntry(pathname);
    string[] files;

    if (!exists(dest)){
        mkdir (dest); // makes dest root
    }
    DirEntry destDe = dirEntry(dest);
    if(!destDe.isDir()){
        throw new FileException( destDe.name, " is not a directory");
    }

    string destName = destDe.name ~ '/';

    if(!deSrc.isDir()){
        copy(deSrc.name,dest);
    }
    else {
        string srcRoot = deSrc.name;
        int srcLen = srcRoot.length;
        string destRoot = destName ~ baseName(deSrc.name);
        mkdir(destRoot);

        // make an array of the regular files only, also create the directory structure
        // Since it is SpanMode.breadth, can just use mkdir
        foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
            if (attrIsDir(e.linkAttributes)){
                string destDir = destRoot ~ e.name[srcLen..$];
                mkdir(destDir);
            }
            else{
                files ~= e.name;
            }
        }

        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files)) {
            string dfn = destRoot ~ fn[srcLen..$];
            copy(fn,dfn);
        }
    }
}
Feb 13 2012
ok, I didn't test that first one very well. It worked for directory copies, but I didn't test non directories. So here is the fixed operation for non directories, where it just copies the single file. So it now does two cases:

  copy regular_file destinationDirectory
  copy folder destinationDirectory

What I'd like to add is wildcard support for something like

  copy folder/* destinationDirectory

I suppose also it could be enhanced to handle all the robocopy options, but I'm just trying out the copy speeds for now.

module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length != 3){
        writeln ("need to specify src and dest dir");
        return 0;
    }

    // TODO expand this to handle wildcard

    string dest = argv[$-1];
    foreach(string dir; argv[1..$-1])
    {
        writeln("copying directory: "~ dir );
        auto st1 = Clock.currTime(); //Current time in local time.
        cpdir(dir,dest);
        auto st2 = Clock.currTime(); //Current time in local time.
        auto dif = st2 - st1 ;
        auto ts= dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void cpdir(in char[] pathname ,in char[] dest){
    DirEntry deSrc = dirEntry(pathname);
    string[] files;

    if (!exists(dest)){
        mkdir (dest); // makes dest root
    }
    DirEntry destDe = dirEntry(dest);
    if(!destDe.isDir()){
        throw new FileException( destDe.name, " is not a directory");
    }

    string destName = destDe.name ~ '/';
    string destRoot = destName ~ baseName(deSrc.name);

    if(!deSrc.isDir()){
        copy(deSrc.name,destRoot);
    }
    else {
        string srcRoot = deSrc.name;
        int srcLen = srcRoot.length;
        mkdir(destRoot);

        // make an array of the regular files only, also create the directory structure
        // Since it is SpanMode.breadth, can just use mkdir
        foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
            if (attrIsDir(e.linkAttributes)){
                string destDir = destRoot ~ e.name[srcLen..$];
                mkdir(destDir);
            }
            else{
                files ~= e.name;
            }
        }

        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files)) {
            string dfn = destRoot ~ fn[srcLen..$];
            copy(fn,dfn);
        }
    }
}
Feb 13 2012
An improvement is to change this first mkdir to mkdirRecurse.

if (!exists(dest)){
    mkdir (dest); // makes dest root
}
Feb 14 2012
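The change suggested above can be sketched as follows. This is a minimal illustration rather than the posted utility: `ensureDestRoot` and the scratch path under `tempDir()` are hypothetical names introduced for the example. `mkdirRecurse` is the std.file call that also creates missing parent directories, where plain `mkdir` throws if a parent is absent.

```d
import std.file : exists, isDir, mkdirRecurse, rmdirRecurse, tempDir;
import std.path : buildPath;

/// Hypothetical helper: make sure the destination root exists, creating any
/// missing parent directories along the way.
void ensureDestRoot(string dest)
{
    if (!exists(dest))
        mkdirRecurse(dest);   // unlike mkdir, also creates missing parents
}

void main()
{
    // Scratch location under the system temp directory (an assumption for the demo).
    string root = buildPath(tempDir(), "cpd_demo_root");
    string dest = buildPath(root, "a", "b", "c");
    if (exists(root))
        rmdirRecurse(root);

    ensureDestRoot(dest);     // plain mkdir would throw here: "a" and "b" are missing
    assert(exists(dest) && isDir(dest));

    rmdirRecurse(root);       // clean up
}
```

With this in place a destination such as `backup/2012/projects` can be given even when only `backup` exists.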
On 14/02/2012 14:29, Jay Norwood wrote:
> An improvement is to change this first mkdir to mkdirRecurse.
>
> if (!exists(dest)){
>     mkdir (dest); // makes dest root
> }

If I could suggest something, it would be great to see this added to std.file, as well as the multithreaded remove we talked about recently in another thread.
Feb 14 2012
On 2/13/2012 10:58 PM, Jay Norwood wrote:
> Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.

More of an 'FYI/reminder':

At a minimum, Robocopy does additional work to preserve the timestamps and attributes of the copies of the files (by default) so it can avoid redundant copies of files in the future. This is undoubtedly creating some additional overhead. It's probably also quite a bit worse with /SEC etc. to copy permissions.

On the plus side, you would have Windows scheduling the IO, which in theory would be able to minimize seeking to some degree, compared to robocopy's serial copying.
Feb 14 2012
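The point about timestamps letting a later run skip redundant copies can be illustrated with a small sketch. It is an assumption about how such a check might look, not a description of Robocopy's internals: `needsCopy` and `copyWithTimes` are hypothetical helpers, and the two-second tolerance is an arbitrary choice to absorb coarse filesystem timestamps. `getTimes`, `setTimes`, `getSize` and `timeLastModified` are existing std.file functions.

```d
import core.time : abs, seconds;
import std.datetime : SysTime;
import std.file : copy, exists, getSize, getTimes, remove, setTimes,
                  tempDir, timeLastModified, write;
import std.path : buildPath;

/// Hypothetical helper: decide whether `src` has to be copied again.
/// Assumption: equal size and (nearly) equal modification time mean "unchanged".
bool needsCopy(string src, string dst)
{
    if (!exists(dst))
        return true;
    if (getSize(src) != getSize(dst))
        return true;
    return abs(timeLastModified(src) - timeLastModified(dst)) > 2.seconds;
}

/// Copy, then stamp the copy with the source's times so that a later
/// needsCopy() call can recognise it as up to date.
void copyWithTimes(string src, string dst)
{
    SysTime accessTime, modificationTime;
    getTimes(src, accessTime, modificationTime);
    copy(src, dst);
    setTimes(dst, accessTime, modificationTime);
}

void main()
{
    string src = buildPath(tempDir(), "cpd_demo_src.txt");
    string dst = buildPath(tempDir(), "cpd_demo_dst.txt");
    if (exists(dst))
        remove(dst);

    write(src, "hello");
    assert(needsCopy(src, dst));      // destination missing
    copyWithTimes(src, dst);
    assert(!needsCopy(src, dst));     // same size, same timestamp

    write(src, "hello, world");       // source changed: size now differs
    assert(needsCopy(src, dst));

    remove(src);
    remove(dst);
}
```

The extra `getTimes`/`setTimes` calls per file are the kind of bookkeeping overhead the post refers to; whether they matter next to the copy itself would have to be measured.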
On Wednesday, 15 February 2012 at 00:11:32 UTC, Sean Cavanaugh wrote:
> more of an 'FYI/reminder':
>
> At a minimum Robocopy does additional work to preserve the timestamps and attributes of the copies of the files (by default) so it can avoid redundant copies of files in the future. This is undoubtedly creating some additional overhead. It's probably also quite a bit worse with /SEC etc to copy permissions.
>
> On the plus side you would have windows scheduling the IO which in theory would be able to minimize seeking to some degree, compared to robocopy's serial copying.

Yeah, Robocopy has a lot of nice options. Currently the D library has copy (srcpath, destpath), which goes directly to the OS copy. If it had something like copy(DirectoryEntry,destpath,options), with the options being like the Robocopy options, that might be more efficient.

On the ssd, seeking is on the order of 0.2msec vs 16msec on my 7200rpm seagate hard drive. I do think seeks on a hard drive will be a problem with all the small, individual file copies. So is Robocopy bundling these up in some way?

I did find a nice solution in std.file for the argv expansion, btw, and posted an example on D.learn. It uses a version of dirEntries that takes an extra pattern parameter; the expansion itself relies on matching that is available in std.path.
Feb 14 2012
"Jay Norwood" <jayn prismnet.com> wrote in message news:jhcplo$1jj8$1 digitalmars.com...
> Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.

Nice!

Is it possible this could increase disk fragmentation though? Or do the filesystem drivers on Win/Lin/etc work in a way that mitigates that possibility?
Feb 14 2012
I placed the two parallel file operations, rmdir and copy, on github in
https://github.com/jnorwood/file_parallel

These combine the std.parallelism operations with the std.file operations to speed up the processing on Windows.

-----------

I also put a useful function that does argv pathname wildcard expansion in
https://github.com/jnorwood/file_utils

This makes use of one of the existing dirEntries calls that has the pattern matching parameter, which enables simple * and ? expansions in Windows args. I'm only allowing expansions in the basename, and only expanding in one level of the directory.

There are example Windows command-line utilities that use each of the functions in file_parallel/examples.

I've only tested these on win7, 64 bit.
Mar 04 2012
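The wildcard expansion described above might look roughly like the sketch below. It is not the code from the file_utils repository: `expandWildcards` is a hypothetical name, and passing unmatched arguments through unchanged is an assumption. What it does rely on is the existing std.file overload `dirEntries(path, pattern, SpanMode)`, which matches the pattern against each entry's base name.

```d
import std.algorithm : sort;
import std.file : dirEntries, exists, mkdirRecurse, rmdirRecurse, SpanMode,
                  tempDir, write;
import std.path : baseName, buildPath, dirName;

/// Hypothetical helper: expand '*' and '?' in the base name of each argument,
/// looking only one directory level deep.
string[] expandWildcards(string[] args)
{
    string[] result;
    foreach (arg; args)
    {
        string dir = dirName(arg);       // directory part, never expanded
        string pattern = baseName(arg);  // only this part may hold wildcards
        string[] matches;
        foreach (entry; dirEntries(dir, pattern, SpanMode.shallow))
            matches ~= entry.name;
        sort(matches);                   // directory order is not guaranteed
        // Assumption: an argument that matches nothing is passed through as-is.
        result ~= matches.length ? matches : [arg];
    }
    return result;
}

void main()
{
    string root = buildPath(tempDir(), "cpd_demo_glob");
    if (exists(root))
        rmdirRecurse(root);
    mkdirRecurse(root);
    foreach (name; ["a.txt", "b.txt", "c.log"])
        write(buildPath(root, name), "x");

    auto expanded = expandWildcards([buildPath(root, "*.txt")]);
    assert(expanded == [buildPath(root, "a.txt"), buildPath(root, "b.txt")]);

    rmdirRecurse(root);
}
```

Note that `dirEntries` throws if the directory part does not exist, so a real utility would probably want to catch that and report the offending argument.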
On 3/4/12 2:53 PM, Jay Norwood wrote:
> I placed the two parallel file operations, rmdir and copy, on github in
> https://github.com/jnorwood/file_parallel
>
> These combine the std.parallelism operations with the std.file operations to speed up the processing on Windows.
>
> -----------
>
> I also put a useful function that does argv pathname wildcard expansion in
> https://github.com/jnorwood/file_utils
>
> This makes use of one of the existing dirEntries calls that has the pattern matching parameter, which enables simple * and ? expansions in Windows args. I'm only allowing expansions in the basename, and only expanding in one level of the directory.
>
> There are example Windows command-line utilities that use each of the functions in file_parallel/examples.
>
> I've only tested these on win7, 64 bit.

Sounds great! Next step, should you be interested, is to create a pull request for phobos so we can integrate your code within.

Andrei
Mar 05 2012
On Monday, 5 March 2012 at 12:48:54 UTC, Andrei Alexandrescu wrote:
> Sounds great! Next step, should you be interested, is to create a pull request for phobos so we can integrate your code within.
>
> Andrei

I considered that. I suppose the wildArgv code could go in std.path and the file operations into std.file, with the pull requests made against those files. I haven't followed the discussions closely enough to know the rules/politics about adding another std library import into those. It would require adding an import of std.parallelism into std.file.
Mar 05 2012
The rule of thumb is to use twice as many threads as there are logical cores. Use GS rich copy 360 enterprise, which supports up to 256 threads at once. Not sure about robocopy.
Feb 06 2018
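For the D utility, a thread count like the one suggested above could be tried with an explicitly sized pool instead of the default `taskPool`. The sketch below is only an illustration: `workerCount` is a hypothetical helper, the factor of two is the poster's rule of thumb rather than a measured optimum, and the array of squares stands in for the list of files. `TaskPool`, `totalCPUs`, `parallel`, `size` and `finish` are existing std.parallelism names.

```d
import std.parallelism : TaskPool, totalCPUs;
import std.stdio : writeln;

/// Hypothetical helper: worker count under the "twice the logical cores"
/// rule of thumb. The factor is an assumption to be tuned by measurement.
size_t workerCount(size_t logicalCores, size_t factor = 2)
{
    return logicalCores * factor;
}

void main()
{
    auto pool = new TaskPool(workerCount(totalCPUs));
    scope (exit) pool.finish();   // let the worker threads terminate

    // Stand-in workload; in the copy utility this would be the file list.
    auto squares = new int[1000];
    foreach (i, ref s; pool.parallel(squares))
        s = cast(int)(i * i);

    assert(squares[10] == 100);
    writeln("workers: ", pool.size);
}
```

Since file copying is mostly I/O bound, the best worker count probably depends on the drive as much as on the core count, so it is worth timing a few values.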
On 14.02.2012 05:58, Jay Norwood wrote:
> Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.

Did you compare your implementation with single-threaded or multithreaded robocopy?

You can tell robocopy to use multiple threads with /MT[:n]
Mar 05 2012
On Monday, 5 March 2012 at 16:35:09 UTC, dennis luehring wrote:
> do you compare single-threaded robocopy with your implementation or multithreaded?
>
> you can command robocopy to use multiple threads with /MT[:n]

Yes, I tested vs multithreaded robocopy. As someone pointed out, robocopy has lots of nice options, which I didn't try to duplicate, and is only about 10% slower on my test. I was happy to see the D app in the same ballpark as robocopy, which means to me that the very simple and clean std.parallelism taskPool foreach loop can produce very good multi-core results in a very concise and readable piece of code. I've done some projects previously using omp pragmas in C++ and it is just so ugly.
Mar 05 2012
So here is the output of a batch file I just ran on the ssd drive for the 1.5GB copy. Robocopy displays that it took around 14 secs, while the release build of the D commandline cpd utility took around 12 secs. That's a pretty consistent result on the ssd drive, which are more sensitive to cpu pr.

06:12 PM

H:\xx8>robocopy /E /NDL /NFL /NC /NS /MT:8 xx8c xx8ca

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
-------------------------------------------------------------------------------

  Started : Mon Mar 05 18:12:33 2012

   Source : H:\xx8\xx8c\
     Dest : H:\xx8\xx8ca\

    Files : *.*

  Options : *.* /NS /NC /NDL /NFL /S /E /COPY:DAT /MT:8 /R:1000000 /W:30

------------------------------------------------------------------------------

100%

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :      2627      2626         1         0         0         0
   Files :     36969     36969         0         0         0         0
   Bytes :   1.502 g   1.502 g         0         0         0         0
   Times :   0:02:05   0:00:12                       0:00:00   0:00:01

   Ended : Mon Mar 05 18:12:47 2012

H:\xx8>time /T
06:12 PM

H:\xx8>rmd xx8ca\*
removing: xx8ca\Cross_Tools
removing: xx8ca\eclipse
removing: xx8ca\gnu
removing: xx8ca\PA
finished! time:17889 ms

H:\xx8>time /T
06:13 PM

H:\xx8>cpd xx8c\* xx8ca
copying: xx8c\Cross_Tools
copying: xx8c\eclipse
copying: xx8c\gnu
copying: xx8c\PA
finished! time: 11681 ms

H:\xx8>time /T
06:13 PM

btw, I just ran robocopy with /mt:1, and it took around 42 seconds on the same drive, which is about what I see with the standard windows copy, including the gui copy. So, at least for these ssd drives the parallel processing results in worthwhile speed-ups.

  Started : Mon Mar 05 18:24:31 2012
    Ended : Mon Mar 05 18:25:13 2012
Mar 05 2012
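The speed-ups implied by those timings can be worked out directly. The sketch below only restates the figures from the post (14 s and 42 s from the robocopy Started/Ended stamps, 11681 ms from cpd); `speedup` is a hypothetical helper, and the elapsed times include process start-up, so robocopy's own "Copied" time of 0:00:12 would give a slightly different picture.

```d
import std.stdio : writefln;

/// Ratio of a baseline time to a candidate time (both in seconds).
double speedup(double baselineSecs, double candidateSecs)
{
    return baselineSecs / candidateSecs;
}

void main()
{
    // Figures taken from the batch output in the post.
    enum robocopyMt8 = 14.0;   // 18:12:33 -> 18:12:47
    enum robocopyMt1 = 42.0;   // 18:24:31 -> 18:25:13
    enum cpd = 11.681;         // "time: 11681 ms"

    writefln("cpd vs robocopy /MT:1:   %.2fx", speedup(robocopyMt1, cpd));
    writefln("cpd vs robocopy /MT:8:   %.2fx", speedup(robocopyMt8, cpd));
    writefln("robocopy /MT:8 vs /MT:1: %.2fx", speedup(robocopyMt1, robocopyMt8));
}
```

On these numbers the parallel copy is roughly 3.6 times faster than the single-threaded run and about 1.2 times faster than robocopy with eight threads, from a single run on one machine.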
On Tuesday, 6 March 2012 at 00:29:01 UTC, Jay Norwood wrote:
> H:\xx8>robocopy /E /NDL /NFL /NC /NS /MT:8 xx8c xx8ca

For a fair comparison, I think that's the command line:

robocopy /E /NDL /NFL /NC /NS /MT:8 /COPY:D /NODCOPY /XJ /R:0 /W:0 /NP /NJH /NJS /256 xx8c xx8ca
Feb 07 2018