digitalmars.D.learn - Performance Issue
Hi,

The code below consumes a lot of memory and runs slowly. Can you suggest how to overcome these issues?

string[][] csizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !globMatch(a.baseName, "*DND*"))
        .map!(a => tuple(a.name, a.size)).array;
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size)).array;
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal += f[0];
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}

From,
Vino.B
Sep 05 2017
On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
> Hi,
>
> The code below consumes a lot of memory and runs slowly. Can you
> suggest how to overcome these issues?
> [...]

Much slower than what?
Sep 05 2017
On Tuesday, 5 September 2017 at 10:28:28 UTC, Stefan Koch wrote:
> On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
>> Hi,
>>
>> The code below consumes a lot of memory and runs slowly. Can you
>> suggest how to overcome these issues?
>> [...]
>
> Much slower than what?

Hi,

This code is used to get the size of folders on a NetApp NAS filesystem. NetApp has its own tool for this task, which is faster than this code by about 15-20 minutes. While going through this website I found that we can use "fold" from std.algorithm.iteration, which should be faster than the plain "+=", so I tried replacing the line

    { subdirTotal += f[0]; }

with

    { subdirTotal = f[0].fold!((a, b) => a + b); }

This produces the required output plus additional output: the next line of the code is supposed to list only folders that are greater than 10 MB, but it now lists all folders (folders whose size is less than 10 MB are listed as well), and I am not sure why.

Program:

string[][] coSizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    Subdata.reserve(100);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !globMatch(a.baseName, "*DND*"))
        .map!(a => tuple(a.name, a.size)).array;
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size)).array;
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal = f[0].fold!((a, b) => a + b);
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}

Output:

C:\Temp\TEAM1\dir1 -> size greater than 10 MB
C:\Temp\TEAM1\dir2 -> size less than 10 MB

From,
Vino.B
Sep 06 2017
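As an aside on the fold change above: fold is meant to collapse a whole range into a single value, so it would normally replace the inner foreach rather than run inside it, and assigning its result to subdirTotal on every iteration overwrites the running total. Below is a minimal sketch of that idea with a hypothetical helper dirTotalBytes; the \\?\ long-path prefix from the post is omitted to keep it portable, and sum would do the same job as the fold.

import std.algorithm : fold, map;
import std.file : SpanMode, dirEntries;
import std.stdio : writeln;

// Hypothetical helper: total size in bytes of one directory tree,
// computed in a single lazy pass over the file sizes.
ulong dirTotalBytes(string dir)
{
    auto sizes = dirEntries(dir, SpanMode.depth).map!(a => a.size); // lazy range of sizes
    return sizes.fold!((a, b) => a + b)(0UL);                       // equivalent to sizes.sum
}

void main()
{
    writeln(dirTotalBytes("."));  // e.g. total bytes under the current directory
}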
On Wednesday, 6 September 2017 at 08:10:35 UTC, Vino.B wrote:
> [...] the next line of the code is supposed to list only folders that
> are greater than 10 MB, but it now lists all folders (folders whose
> size is less than 10 MB are listed as well), and I am not sure why.

Is the size in GB? If so, then

    subdirTotalGB = (subdirTotal/1024/1024);

needs to become

    subdirTotalGB = (subdirTotal/1024/1024/1024);

for it to take effect. But do correct me if I'm wrong, I still haven't had my morning coffee.
Sep 06 2017
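For reference, a tiny self-contained sketch of the two divisions side by side (the byte count is a made-up example, and this is integer division, so results are truncated):

void main()
{
    ulong bytes = 11_811_160_064;              // made-up example, roughly 11 GB
    ulong inMB = bytes / 1024 / 1024;          // 11_264 (megabytes)
    ulong inGB = bytes / 1024 / 1024 / 1024;   // 11 (gigabytes)
    assert(inMB == 11_264 && inGB == 11);
}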
On Wednesday, 6 September 2017 at 10:58:25 UTC, Azi Hassan wrote:
> Is the size in GB? If so, then subdirTotalGB = (subdirTotal/1024/1024);
> needs to become subdirTotalGB = (subdirTotal/1024/1024/1024); for it to
> take effect. But do correct me if I'm wrong, I still haven't had my
> morning coffee.

Hi Azi,

You are correct. I tried to implement fold in a separate small program as below, but I am not able to get the required output. When you execute the program below, the output is:

Output:
[31460]
[31460, 1344448]
[31460, 1344448, 2277663]
[31460, 1344448, 2277663, 2277663]
[31460, 1344448, 2277663, 2277663, 31460]

Setup:
C:\\Temp\\TEST1\\BACKUP : has 2 folders and 2 files in each folder
C:\\Temp\\TEST2\\EXPORT : has 2 folders, with 2 files in one folder and 1 file in the other
Total files : 5

Required output:
[31460, 1344448] - Array 1 for the FS C:\\Temp\\TEST1\\BACKUP
[2277663, 2277663, 31460] - Array 2 for the FS C:\\Temp\\TEST2\\EXPORT

import std.algorithm: filter, map, fold;
import std.parallelism: parallel;
import std.file: SpanMode, dirEntries, isDir;
import std.stdio: writeln;
import std.typecons: tuple;
import std.path: globMatch;
import std.array;

void main ()
{
    ulong[] Alternate;
    string[] Filesys = ["C:\\Temp\\TEST1\\BACKUP", "C:\\Temp\\TEST2\\EXPORT"];
    foreach (FFs; Filesys)
    {
        auto dFiles = dirEntries(FFs, SpanMode.shallow)
            .filter!(a => a.isDir)
            .map!(a => tuple(a.name, a.size)).array;
        foreach (d; dFiles)
        {
            auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
                .map!(a => tuple(a.size)).array;
            foreach (f; parallel(SdFiles, 1))
            {
                Alternate ~= f[0];
                writeln(Alternate);
            }
        }
    }
}

From,
Vino.B
Sep 06 2017
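One likely reason the printed array keeps growing is that Alternate is declared once and never reset, so every folder appends to the same array. A minimal sketch of that guess, with the declaration moved inside the folder loop and one writeln per folder; the parallel foreach is left out here for simplicity:

import std.algorithm : filter;
import std.array : join;
import std.file : SpanMode, dirEntries;
import std.stdio : writeln;

void main()
{
    string[] Filesys = ["C:\\Temp\\TEST1\\BACKUP", "C:\\Temp\\TEST2\\EXPORT"];
    foreach (FFs; Filesys)
    {
        foreach (d; dirEntries(FFs, SpanMode.shallow).filter!(a => a.isDir))
        {
            ulong[] Alternate;   // fresh array for every folder
            foreach (f; dirEntries(join(["\\\\?\\", d.name]), SpanMode.depth))
                Alternate ~= f.size;
            writeln(Alternate);  // one line per folder
        }
    }
}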
On Wednesday, 6 September 2017 at 14:38:39 UTC, Vino.B wrote:
> [...]
> Required output:
> [31460, 1344448] - Array 1 for the FS C:\\Temp\\TEST1\\BACKUP
> [2277663, 2277663, 31460] - Array 2 for the FS C:\\Temp\\TEST2\\EXPORT
> [...]

Hi Azi,

The required output is as below:

[31460] - Array 1 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST1\\BACKUP
[1344448] - Array 2 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST1\\BACKUP
[2277663, 2277663] - Array 3 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST2\\EXPORT
[31460] - Array 4 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST2\\EXPORT
Sep 06 2017
On Wednesday, 6 September 2017 at 15:11:57 UTC, Vino.B wrote:
> Hi Azi,
>
> The required output is as below:
>
> [31460] - Array 1 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST1\\BACKUP
> [1344448] - Array 2 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST1\\BACKUP
> [2277663, 2277663] - Array 3 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST2\\EXPORT
> [31460] - Array 4 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST2\\EXPORT

I tried to create a similar file structure on my Linux machine. Here's the result of ls -R TEST1:

TEST1:
BACKUP

TEST1/BACKUP:
FOLDER1  FOLDER2

TEST1/BACKUP/FOLDER1:
file1  file2  file3

TEST1/BACKUP/FOLDER2:
b1  b2

And here's the output of ls -R TEST2:

TEST2:
EXPORT

TEST2/EXPORT:
FOLDER1  FOLDER2

TEST2/EXPORT/FOLDER1:
file2_1  file2_2  file2_3

TEST2/EXPORT/FOLDER2:
export1  export2  export3  export4

This code outputs the sizes in the format you described:

import std.algorithm: filter, map, fold, each;
import std.parallelism: parallel;
import std.file: SpanMode, dirEntries, DirEntry;
import std.stdio: writeln;
import std.typecons: tuple;
import std.path: globMatch;
import std.array;

void main ()
{
    auto Filesys = ["TEST1/BACKUP", "TEST2/EXPORT"];
    ulong[][] sizes;
    foreach (FFs; Filesys)
    {
        auto dFiles = dirEntries(FFs, SpanMode.shallow)
            .filter!(a => a.isDir)
            .map!(a => a.name);
        foreach (d; dFiles)
        {
            sizes ~= dirEntries(d, SpanMode.depth).map!(a => a.size).array;
        }
    }
    sizes.each!writeln;
}

It outputs the sizes:

[6, 6, 6]
[8, 8]
[8, 8, 8]
[9, 9, 9, 9]

Note that there's no need to store them in ulong[][] sizes; you can display them inside the loop by replacing `sizes ~= dirEntries(d, SpanMode.depth).map!(a => a.size).array;` with `dirEntries(d, SpanMode.depth).map!(a => a.size).writeln;`.

To make sure that it calculates the correct sizes, I made it display the paths instead by making "sizes" string[][] instead of ulong[][] and by replacing map!(a => a.size) with map!(a => a.name) in the second foreach loop:

import std.algorithm: filter, map, each;
import std.file: SpanMode, dirEntries, DirEntry;
import std.stdio: writeln;
import std.array : array;

void main ()
{
    auto Filesys = ["TEST1/BACKUP", "TEST2/EXPORT"];
    string[][] sizes;
    foreach (FFs; Filesys)
    {
        auto dFiles = dirEntries(FFs, SpanMode.shallow)
            .filter!(a => a.isDir)
            .map!(a => a.name);
        foreach (d; dFiles)
        {
            sizes ~= dirEntries(d, SpanMode.depth).map!(a => a.name).array;
        }
    }
    sizes.each!writeln;
}

It outputs the paths as expected:

["TEST1/BACKUP/FOLDER1/file1", "TEST1/BACKUP/FOLDER1/file2", "TEST1/BACKUP/FOLDER1/file3"]
["TEST1/BACKUP/FOLDER2/b1", "TEST1/BACKUP/FOLDER2/b2"]
["TEST2/EXPORT/FOLDER1/file2_3", "TEST2/EXPORT/FOLDER1/file2_1", "TEST2/EXPORT/FOLDER1/file2_2"]
["TEST2/EXPORT/FOLDER2/export2", "TEST2/EXPORT/FOLDER2/export3", "TEST2/EXPORT/FOLDER2/export1", "TEST2/EXPORT/FOLDER2/export4"]
Sep 06 2017
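Building on the snippet above, here is a sketch of how the per-folder arrays could be collapsed into per-folder totals and filtered by a size threshold, which is what the original coSizeDirList is after; the threshold value and the output format are placeholders:

import std.algorithm : filter, map, sum;
import std.file : SpanMode, dirEntries;
import std.stdio : writefln;

void main()
{
    auto Filesys = ["TEST1/BACKUP", "TEST2/EXPORT"];
    enum ulong thresholdBytes = 10;  // placeholder; the real code compares MB or GB
    foreach (FFs; Filesys)
    {
        foreach (d; dirEntries(FFs, SpanMode.shallow).filter!(a => a.isDir))
        {
            // sum makes a single lazy pass over the sizes, no intermediate array
            auto total = dirEntries(d.name, SpanMode.depth).map!(a => a.size).sum;
            if (total > thresholdBytes)
                writefln("%s : %s bytes", d.name, total);
        }
    }
}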
On Wednesday, 6 September 2017 at 18:21:44 UTC, Azi Hassan wrote:
> I tried to create a similar file structure on my Linux machine.
> Here's the result of ls -R TEST1:
> [...]

Upon further inspection it looks like I messed up the output.

> [31460] - Array 1 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST1\\BACKUP
> [1344448] - Array 2 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST1\\BACKUP
> [2277663, 2277663] - Array 3 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST2\\EXPORT
> [31460] - Array 4 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST2\\EXPORT

What files do these sizes correspond to? Shouldn't there be two elements in the first array, because C:\Temp\TEST1\BACKUP\FOLDER1 contains two files?
Sep 06 2017
On Wednesday, 6 September 2017 at 18:44:26 UTC, Azi Hassan wrote:
> Upon further inspection it looks like I messed up the output.
> [...]
> What files do these sizes correspond to? Shouldn't there be two
> elements in the first array, because C:\Temp\TEST1\BACKUP\FOLDER1
> contains two files?

Hi Azi,

I was able to implement "fold"; below is the updated code. Regarding container arrays: I have almost completed my program (Release 1), so it is not a good idea to convert it from standard arrays to container arrays at this point. Starting tomorrow I will be working on Release 2, where I plan to make the above changes. I have not yet finished studying container arrays, so can you help me with how to implement a container array for the code below?

Note: I have raised another thread, "Container Array", asking the same.

string[][] coSizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    Subdata.reserve(100);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir)
        .map!(a => tuple(a.name, a.size)).array;
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size)).array;
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal += f.fold!((a, b) => a + b);
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}

Note to all: I am basically an admin guy; I started learning D a few months ago and found it very interesting, hence I raise so many questions, so please bear with me for a while.

From,
Vino.B
Sep 06 2017
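On the open container-array question: assuming std.container.array.Array is the container meant, here is a minimal sketch of how the appender in coSizeDirList above might be swapped for one; only the storage changes, and the directory-walking loop is assumed to stay as in the post (note also that f.fold over a one-element tuple may not compile, so summing the mapped sizes directly, as in the earlier sketch, is one alternative). Array keeps its payload outside the GC heap, which is its main attraction here.

import std.array : array;
import std.container.array : Array;

// Hypothetical variant: same signature as coSizeDirList, but the results are
// collected in a std.container Array instead of an appender.
string[][] coSizeDirListContainer (string FFs, int SizeDir)
{
    Array!(string[]) Subdata;
    Subdata.reserve(100);            // preallocate, like Subdata.reserve(100) above

    // ... same dirEntries / size-summing loop as in the post, appending with:
    // Subdata ~= [d[0], to!string(subdirTotalGB)];

    return Subdata[].array;          // copy out as a regular string[][]
}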
On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
> Hi,
>
> The code below consumes a lot of memory and runs slowly. Can you
> suggest how to overcome these issues?

You can start by dropping the .array conversions after dirEntries. That way your algorithm will become lazy (as opposed to eager), meaning that it won't allocate an entire array of DirEntry[]. It will, instead, treat the DirEntries one at a time, resulting in less memory consumption.

I didn't understand the join(["\\\\?\\", d[0]]) part, maybe you meant to write join("\\\\?\\", d[0]) ?

If appender is too slow, you can experiment with a dynamic array whose capacity was preallocated:

    string[][] Subdata;
    Subdata.reserve(10000);

In this case Subdata will hold enough space for 10000 string[]s, which will result in better performance.

Here's the updated code (sans .array) in case anyone wants to reproduce the issue:

import std.stdio;
import std.conv;
import std.typecons;
import std.array;
import std.path;
import std.container;
import std.file;
import std.parallelism;
import std.algorithm;

void main()
{
    ".".csizeDirList(1024).each!writeln;
}

string[][] csizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !globMatch(a.baseName, "*DND*"))
        .map!(a => tuple(a.name, a.size));
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size));
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal += f[0];
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}
Sep 05 2017
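On the join question above: join on an array of strings with no separator simply concatenates them, so the original call appears intended to prepend the Windows \\?\ long-path prefix rather than to join with a separator. A tiny sketch of the equivalence (the sample path is made up):

import std.array : join;

void main()
{
    string d0 = `C:\Temp\TEAM1\dir1`;                 // made-up example path
    assert(join(["\\\\?\\", d0]) == "\\\\?\\" ~ d0);  // both spell \\?\ followed by the path
}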
On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
> Hi,
>
> The code below consumes a lot of memory and runs slowly. Can you
> suggest how to overcome these issues?
>
> string[][] csizeDirList (string FFs, int SizeDir)
> {
>     [...]
> }

Try to suppress the globMatch. Given that glob, just a ctRegex would do the job, or, even simpler, `!a.canFind("DND")`.
Sep 06 2017
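A minimal sketch of that suggestion, with the glob replaced by a plain substring search (canFind lives in std.algorithm.searching); listDirs is a hypothetical wrapper around the directory listing from the original post:

import std.algorithm : canFind, filter, map;
import std.file : SpanMode, dirEntries;
import std.path : baseName;
import std.stdio : writeln;
import std.typecons : tuple;

// Same shallow directory listing as in the original post, but folders whose
// base name contains "DND" are skipped with canFind instead of globMatch.
auto listDirs(string FFs)
{
    return dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !a.name.baseName.canFind("DND"))
        .map!(a => tuple(a.name, a.size));
}

void main()
{
    foreach (d; listDirs("."))
        writeln(d[0], " ", d[1]);
}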