digitalmars.D.announce - parallel copy directory, faster than robocopy
- Jay Norwood (2/2) Feb 13 2012 Attached is the source for a small parallel app that copies a source fol...
- Jay Norwood (66/66) Feb 13 2012 ok, so I guess the Add File didn't work for some reason, so here's the s...
- Jay Norwood (72/72) Feb 13 2012 ok, I didn't test that first one very well. It worked for directory cop...
- Jay Norwood (4/4) Feb 14 2012 An improvement is to change this first mkdir to mkdirRecurse.
- deadalnix (4/8) Feb 14 2012 If I could suggest something, it would be great to see this added to
- Sean Cavanaugh (10/12) Feb 14 2012 more of an 'FYI/reminder':
- Jay Norwood (15/25) Feb 14 2012 Yeah, Robocopy has a lot of nice options. Currently the D
- Nick Sabalausky (6/16) Feb 14 2012 Nice!
- Jay Norwood (16/16) Mar 04 2012 I placed the two parallel file operations, rmdir and copy on
- Andrei Alexandrescu (4/18) Mar 05 2012 Sounds great! Next step, should you be interested, is to create a pull
- Jay Norwood (9/12) Mar 05 2012 I considered that. I suppose the wildArgv code could go in
- Danny (3/3) Feb 06 2018 The rule of thumb is use double number of threads of the logical
- dennis luehring (4/6) Mar 05 2012 do you compare single-threaded robocopy with your implementation or
- Jay Norwood (10/13) Mar 05 2012 yes, I tested vs multithread robocopy. As someone pointed out,
- Jay Norwood (55/55) Mar 05 2012 So here is the output of a batch file I just ran on the ssd drive
- jackreacher (1/56) Dec 11 2017
- rumbu (4/5) Feb 07 2018 For a fair comparison, I think that's the command line:
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.
Feb 13 2012
ok, so I guess the Add File didn't work for some reason, so here's the source.
module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length != 3){
        writeln ("need to specify src and dest dir");
        return 0;
    }
    // TODO expand this to handle wildcard
    string dest = argv[$-1];
    foreach(string dir; argv[1..$-1])
    {
        writeln("copying directory: "~ dir );
        auto st1 = Clock.currTime(); //Current time in local time.
        cpdir(dir,dest);
        auto st2 = Clock.currTime(); //Current time in local time.
        auto dif = st2 - st1 ;
        auto ts= dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void cpdir(in char[] pathname ,in char[] dest){
    DirEntry deSrc = dirEntry(pathname);
    string[] files;
    if (!exists(dest)){
        mkdir (dest); // makes dest root
    }
    DirEntry destDe = dirEntry(dest);
    if(!destDe.isDir()){
        throw new FileException( destDe.name, " is not a directory");
    }
    string destName = destDe.name ~ '/';
    if(!deSrc.isDir()){
        copy(deSrc.name,dest);
    }
    else {
        string srcRoot = deSrc.name;
        int srcLen = srcRoot.length;
        string destRoot = destName ~ baseName(deSrc.name);
        mkdir(destRoot);
        // make an array of the regular files only, also create the directory structure
        // Since it is SpanMode.breadth, can just use mkdir
        foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
            if (attrIsDir(e.linkAttributes)){
                string destDir = destRoot ~ e.name[srcLen..$];
                mkdir(destDir);
            }
            else{
                files ~= e.name;
            }
        }
        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files)) {
            string dfn = destRoot ~ fn[srcLen..$];
            copy(fn,dfn);
        }
    }
}
Feb 13 2012
ok, I didn't test that first one very well. It worked for directory copies,
but I didn't test non directories. So here is the fixed operation for non
directories, where it just copies the single file.
So it now does two cases:
copy regular_file destinationDirectory
copy folder destinationDirectory
What I'd like to add is wildcard support for something like
copy folder/* destinationDirectory
I suppose also it could be enhanced to handle all the robocopy options, but I'm
just trying out the copy speeds for now.
module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length != 3){
        writeln ("need to specify src and dest dir");
        return 0;
    }
    // TODO expand this to handle wildcard
    string dest = argv[$-1];
    foreach(string dir; argv[1..$-1])
    {
        writeln("copying directory: "~ dir );
        auto st1 = Clock.currTime(); //Current time in local time.
        cpdir(dir,dest);
        auto st2 = Clock.currTime(); //Current time in local time.
        auto dif = st2 - st1 ;
        auto ts= dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void cpdir(in char[] pathname ,in char[] dest){
    DirEntry deSrc = dirEntry(pathname);
    string[] files;
    if (!exists(dest)){
        mkdir (dest); // makes dest root
    }
    DirEntry destDe = dirEntry(dest);
    if(!destDe.isDir()){
        throw new FileException( destDe.name, " is not a directory");
    }
    string destName = destDe.name ~ '/';
    string destRoot = destName ~ baseName(deSrc.name);
    if(!deSrc.isDir()){
        // single regular file: copy it into the destination directory
        copy(deSrc.name,destRoot);
    }
    else {
        string srcRoot = deSrc.name;
        // .length is size_t; assigning it to int does not compile on 64-bit targets
        size_t srcLen = srcRoot.length;
        mkdir(destRoot);
        // make an array of the regular files only, also create the directory structure
        // Since it is SpanMode.breadth, can just use mkdir
        foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
            if (attrIsDir(e.linkAttributes)){
                string destDir = destRoot ~ e.name[srcLen..$];
                mkdir(destDir);
            }
            else{
                files ~= e.name;
            }
        }
        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files)) {
            string dfn = destRoot ~ fn[srcLen..$];
            copy(fn,dfn);
        }
    }
}
Feb 13 2012
An improvement is to change this first mkdir to mkdirRecurse.
if (!exists(dest)){
mkdir (dest); // makes dest root
}
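A minimal sketch of that change, assuming the rest of cpdir stays as posted. The helper name `ensureDestRoot` is hypothetical and not part of the original program. Unlike mkdir, mkdirRecurse also creates any missing parent directories, so a destination such as a/b/c works even when a and a/b do not exist yet.

```d
import std.file : exists, isDir, mkdirRecurse, FileException;

// Hypothetical helper (not in the original post): make sure the destination
// root exists and is a directory before any files are copied into it.
void ensureDestRoot(string dest)
{
    if (!exists(dest))
    {
        mkdirRecurse(dest); // creates missing parent directories too
    }
    else if (!isDir(dest))
    {
        throw new FileException(dest, "is not a directory");
    }
}
```

In cpdir this would replace the `if (!exists(dest))` block and the isDir check that follows it.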
Feb 14 2012
On 14/02/2012 14:29, Jay Norwood wrote:
An improvement is to change this first mkdir to mkdirRecurse.
if (!exists(dest)){
mkdir (dest); // makes dest root
}
If I could suggest something, it would be great to see this added to
std.file, as well as the multithreaded remove we talked about recently
in another thread.
Feb 14 2012
On 2/13/2012 10:58 PM, Jay Norwood wrote:
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.
more of an 'FYI/reminder':
At a minimum, Robocopy does additional work to preserve the timestamps and attributes of the copies of the files (by default) so it can avoid redundant copies of files in the future. This is undoubtedly creating some additional overhead. It's probably also quite a bit worse with /SEC etc. to copy permissions.
On the plus side, you would have Windows scheduling the IO, which in theory would be able to minimize seeking to some degree, compared to robocopy's serial copying.
Feb 14 2012
On Wednesday, 15 February 2012 at 00:11:32 UTC, Sean Cavanaugh wrote:
more of an 'FYI/reminder': At a minimum Robocopy does additional work to preserve the timestamps and attributes of the copies of the files (by default) so it can avoid redundant copies of files in the future. This is undoubtedly creating some additional overhead. Its probably also quite a bit worse with /SEC etc to copy permissions. On the plus side you would have windows scheduling the IO which in theory would be able to minimize seeking to some degree, compared to robocopy's serial copying.
Yeah, Robocopy has a lot of nice options. Currently the D library has copy(srcpath, destpath), which goes directly to the OS copy. If it had something like copy(DirectoryEntry, destpath, options), with the options being like the Robocopy options, that might be more efficient.
On the ssd, seeking is on the order of 0.2 msec vs 16 msec on my 7200rpm Seagate hard drive. I do think seeks on a hard drive will be a problem with all the small, individual file copies. So is Robocopy bundling these up in some way?
I did find a nice solution in std.file for the argv expansion, btw, and posted an example on D.learn. It uses a version of dirEntries that takes an extra pattern parameter; the pattern matching it uses for the expansion is available in std.path.
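The argv expansion mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the example posted on D.learn: the function name `expandWildArgs` is hypothetical, and only the `dirEntries(path, pattern, mode)` overload from std.file is taken from the library. Wildcards are honored in the last path component only.

```d
import std.algorithm.searching : canFind;
import std.file : dirEntries, SpanMode;
import std.path : baseName, dirName;

// Hypothetical helper: expand * and ? in the basename of each argument.
// Arguments without wildcards are passed through unchanged.
string[] expandWildArgs(string[] args)
{
    string[] result;
    foreach (arg; args)
    {
        if (arg.canFind('*') || arg.canFind('?'))
        {
            // dirName gives the directory to scan, baseName the glob pattern;
            // SpanMode.shallow keeps the expansion to one directory level.
            foreach (e; dirEntries(dirName(arg), baseName(arg), SpanMode.shallow))
                result ~= e.name;
        }
        else
        {
            result ~= arg;
        }
    }
    return result;
}
```

A main function could call `expandWildArgs(argv[1 .. $-1])` to turn something like `folder/*` into the list of sources before copying.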
Feb 14 2012
"Jay Norwood" <jayn prismnet.com> wrote in message news:jhcplo$1jj8$1 digitalmars.com...
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.
Nice!
Is it possible this could increase disk fragmentation though? Or do the filesystem drivers on Win/Lin/etc work in a way that mitigates that possibility?
Feb 14 2012
I placed the two parallel file operations, rmdir and copy, on github in
https://github.com/jnorwood/file_parallel
These combine the std.parallelism operations with the std.file operations to speed up the processing on Windows.
-----------
I also put a useful function that does argv pathname wildcard expansion in
https://github.com/jnorwood/file_utils
This makes use of one of the existing dirEntries calls that has the pattern matching parameter, which enables simple * and ? expansions in Windows args. I'm only allowing expansions in the basename, and only expanding in one level of the directory.
There are example Windows command-line utilities that use each of the functions in file_parallel/examples. I've only tested these on win7, 64 bit.
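For readers who want a feel for the parallel remove without opening the repository, here is a rough sketch. It is an assumption about how such an operation can be built from std.file and std.parallelism, not the code in file_parallel; the name `removeTreeParallel` is hypothetical. Files are deleted in parallel, then the now-empty directories are removed serially, children before parents.

```d
import std.file : attrIsDir, dirEntries, DirEntry, remove, rmdir, SpanMode;
import std.parallelism : taskPool;

// Hypothetical sketch: remove a directory tree, deleting the files in parallel.
void removeTreeParallel(string root)
{
    string[] files;
    string[] dirs;

    // SpanMode.breadth lists a directory before anything inside it.
    foreach (DirEntry e; dirEntries(root, SpanMode.breadth, false))
    {
        if (attrIsDir(e.linkAttributes))
            dirs ~= e.name;
        else
            files ~= e.name;
    }

    // Delete all non-directory entries in parallel.
    foreach (fn; taskPool.parallel(files))
        remove(fn);

    // Walk the directory list backwards so children go before their parents.
    foreach_reverse (d; dirs)
        rmdir(d);
    rmdir(root);
}
```

The directory pass is left serial here for simplicity; whether parallelizing it helps would need to be measured.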
Mar 04 2012
On 3/4/12 2:53 PM, Jay Norwood wrote:
I placed the two parallel file operations, rmdir and copy on github in https://github.com/jnorwood/file_parallel These combine the std.parallelism operations with the std.file operations to speed up the processing on Windows. ----------- I also put a useful function that does argv pathname wildcard expansion in https://github.com/jnorwood/file_utils This makes use of one of the existing dirEntries call that has the pattern matching parameter which enables simple * and ? expansions in windows args. I'm only allowing expansions in the basename, and only expanding in one level of the directory. There are example Windows commandline utilies that use each of the functions in file_parallel/examples. I've only testsd these on win7, 64 bit.
Sounds great! Next step, should you be interested, is to create a pull request for phobos so we can integrate your code within.
Andrei
Mar 05 2012
On Monday, 5 March 2012 at 12:48:54 UTC, Andrei Alexandrescu wrote:
Sounds great! Next step, should you be interested, is to create a pull request for phobos so we can integrate your code within. Andrei
I considered that. I suppose the wildArgv code could go in std.path and the file operations into std.file, with the pull requests made against those files. I haven't followed the discussions closely enough to know the rules/politics about adding another std library import into those. It would require importing std.parallelism into std.file.
Mar 05 2012
The rule of thumb is to use twice as many threads as there are logical cores. Use GS RichCopy 360 Enterprise, which supports up to 256 threads at once. Not sure about robocopy.
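In the D program the thread count can be varied in the same spirit by using a private TaskPool instead of the global taskPool, which by default starts totalCPUs - 1 worker threads. The sketch below is an illustration only: `copyWithWorkers` is a hypothetical helper, and whether twice the number of logical cores is actually the best setting for a given drive is something to measure rather than assume.

```d
import std.file : copy;
import std.path : baseName, buildPath;
import std.parallelism : TaskPool, totalCPUs;

// Hypothetical helper: copy each source file into destDir using a private
// pool with an explicit number of worker threads.
void copyWithWorkers(string[] sources, string destDir, size_t nWorkers)
{
    auto pool = new TaskPool(nWorkers);
    scope (exit) pool.finish(true); // wait for the workers, then shut them down

    // Work unit size 1: each file is handed out to a thread individually.
    foreach (src; pool.parallel(sources, 1))
        copy(src, buildPath(destDir, baseName(src)));
}

// Example call following the rule of thumb above:
//     copyWithWorkers(files, "dest", 2 * totalCPUs);
```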
Feb 06 2018
On 14.02.2012 05:58, Jay Norwood wrote:
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.
Did you compare your implementation with single-threaded or multithreaded robocopy?
You can tell robocopy to use multiple threads with /MT[:n].
Mar 05 2012
On Monday, 5 March 2012 at 16:35:09 UTC, dennis luehring wrote:
do you compare single-threaded robocopy with your implementation or multithreaded? you can command robocopy to use multiple threads with /MT[:n]
Yes, I tested vs multithreaded robocopy. As someone pointed out, robocopy has lots of nice options, which I didn't try to duplicate, and is only about 10% slower on my test. I was happy to see the D app in the same ballpark as robocopy, which means to me that the very simple and clean std.parallelism taskPool foreach loop can produce very good multi-core results in a very concise and readable piece of code. I've done some projects previously using omp pragmas in C++ and it is just so ugly.
Mar 05 2012
So here is the output of a batch file I just ran on the ssd drive
for the 1.5GB copy. Robocopy displays that it took around 14
secs, while the release build of the D commandline cpd utility
took around 12 secs. That's a pretty consistent result on the
ssd drive, which are more sensitive to cpu pr.
06:12 PM
H:\xx8>robocopy /E /NDL /NFL /NC /NS /MT:8 xx8c xx8ca
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Mon Mar 05 18:12:33 2012
Source : H:\xx8\xx8c\
Dest : H:\xx8\xx8ca\
Files : *.*
Options : *.* /NS /NC /NDL /NFL /S /E /COPY:DAT /MT:8 /R:1000000 /W:30
------------------------------------------------------------------------------
100%
------------------------------------------------------------------------------
              Total    Copied   Skipped  Mismatch    FAILED    Extras
   Dirs :      2627      2626         1         0         0         0
  Files :     36969     36969         0         0         0         0
  Bytes :   1.502 g   1.502 g         0         0         0         0
  Times :   0:02:05   0:00:12                       0:00:00   0:00:01
Ended : Mon Mar 05 18:12:47 2012
H:\xx8>time /T
06:12 PM
H:\xx8>rmd xx8ca\*
removing: xx8ca\Cross_Tools
removing: xx8ca\eclipse
removing: xx8ca\gnu
removing: xx8ca\PA
finished! time:17889 ms
H:\xx8>time /T
06:13 PM
H:\xx8>cpd xx8c\* xx8ca
copying: xx8c\Cross_Tools
copying: xx8c\eclipse
copying: xx8c\gnu
copying: xx8c\PA
finished! time: 11681 ms
H:\xx8>time /T
06:13 PM
btw, I just ran robocopy with /mt:1, and it took around 42
seconds on the same drive, which is about what I see with the
standard windows copy, including the gui copy. So, at least for
these ssd drives the parallel processing results in worthwhile
speed-ups.
Started : Mon Mar 05 18:24:31 2012
Ended : Mon Mar 05 18:25:13 2012
Mar 05 2012
On Tuesday, 6 March 2012 at 00:29:01 UTC, Jay Norwood wrote:
Dec 11 2017
On Tuesday, 6 March 2012 at 00:29:01 UTC, Jay Norwood wrote:
H:\xx8>robocopy /E /NDL /NFL /NC /NS /MT:8 xx8c xx8ca
For a fair comparison, I think this is the command line:
robocopy /E /NDL /NFL /NC /NS /MT:8 /COPY:D /NODCOPY /XJ /R:0 /W:0 /NP /NJH /NJS /256 xx8c xx8ca
Feb 07 2018








