digitalmars.D.learn - Performance Issue
Hi,

The code below consumes a lot of memory and runs slowly. Can you suggest how to overcome these issues?

string[][] csizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !globMatch(a.baseName, "*DND*"))
        .map!(a => tuple(a.name, a.size)).array;
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size)).array;
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal += f[0];
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}

From,
Vino.B
Sep 05 2017
On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
> Hi,
>
> The code below consumes a lot of memory and runs slowly. Can you
> suggest how to overcome these issues?
> [...]

Much slower than what?
Sep 05 2017
On Tuesday, 5 September 2017 at 10:28:28 UTC, Stefan Koch wrote:
> On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
>> Hi,
>>
>> The code below consumes a lot of memory and runs slowly. Can you
>> suggest how to overcome these issues?
>> [...]
>
> Much slower than what?

Hi,

This code is used to get the size of folders on a NetApp NAS filesystem. NetApp has its own tool for this task, which is faster than this code by about 15-20 minutes. While going through this website I found that we can use "fold" from std.algorithm.iteration, which should be faster than the plain "+=", so I tried replacing the line

    { subdirTotal += f[0]; }

with

    { subdirTotal = f[0].fold!((a, b) => a + b); }

This produces the required output plus additional output: the next line of the code is supposed to list only folders that are greater than 10 MB, but it now lists all folders (folders whose size is less than 10 MB are listed as well), and I am not sure why.

Program:

string[][] coSizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    Subdata.reserve(100);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !globMatch(a.baseName, "*DND*"))
        .map!(a => tuple(a.name, a.size)).array;
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size)).array;
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal = f[0].fold!((a, b) => a + b);
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}

Output:

C:\Temp\TEAM1\dir1 -> size greater than 10 MB
C:\Temp\TEAM1\dir2 -> size less than 10 MB

From,
Vino.B
Sep 06 2017
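As an aside on the fold change above: fold is meant to collapse a whole range into a single value, so it would normally replace the inner foreach rather than run inside it, and assigning its result to subdirTotal on every iteration overwrites the running total. Below is a minimal sketch of that idea with a hypothetical helper dirTotalBytes; the \\?\ long-path prefix from the post is omitted to keep it portable, and sum would do the same job as the fold.

import std.algorithm : fold, map;
import std.file : SpanMode, dirEntries;
import std.stdio : writeln;

// Hypothetical helper: total size in bytes of one directory tree,
// computed in a single lazy pass over the file sizes.
ulong dirTotalBytes(string dir)
{
    auto sizes = dirEntries(dir, SpanMode.depth).map!(a => a.size); // lazy range of sizes
    return sizes.fold!((a, b) => a + b)(0UL);                       // equivalent to sizes.sum
}

void main()
{
    writeln(dirTotalBytes("."));  // e.g. total bytes under the current directory
}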
On Wednesday, 6 September 2017 at 08:10:35 UTC, Vino.B wrote:
> [...] the next line of the code is supposed to list only folders that
> are greater than 10 MB, but it now lists all folders (folders whose
> size is less than 10 MB are listed as well), and I am not sure why.

Is the size in GB? If so, then

    subdirTotalGB = (subdirTotal/1024/1024);

needs to become

    subdirTotalGB = (subdirTotal/1024/1024/1024);

for it to take effect. But do correct me if I'm wrong, I still haven't had my morning coffee.
Sep 06 2017
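For reference, a tiny self-contained sketch of the two divisions side by side (the byte count is a made-up example, and this is integer division, so results are truncated):

void main()
{
    ulong bytes = 11_811_160_064;              // made-up example, roughly 11 GB
    ulong inMB = bytes / 1024 / 1024;          // 11_264 (megabytes)
    ulong inGB = bytes / 1024 / 1024 / 1024;   // 11 (gigabytes)
    assert(inMB == 11_264 && inGB == 11);
}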
On Wednesday, 6 September 2017 at 10:58:25 UTC, Azi Hassan wrote:
> Is the size in GB? If so, then subdirTotalGB = (subdirTotal/1024/1024);
> needs to become subdirTotalGB = (subdirTotal/1024/1024/1024); for it to
> take effect. But do correct me if I'm wrong, I still haven't had my
> morning coffee.

Hi Azi,

You are correct. I tried to implement fold in a separate small program as below, but I am not able to get the required output. When you execute the program below, the output is:

Output:
[31460]
[31460, 1344448]
[31460, 1344448, 2277663]
[31460, 1344448, 2277663, 2277663]
[31460, 1344448, 2277663, 2277663, 31460]

Setup:
C:\\Temp\\TEST1\\BACKUP : has 2 folders and 2 files in each folder
C:\\Temp\\TEST2\\EXPORT : has 2 folders, with 2 files in one folder and 1 file in the other
Total files : 5

Required output:
[31460, 1344448] - Array 1 for the FS C:\\Temp\\TEST1\\BACKUP
[2277663, 2277663, 31460] - Array 2 for the FS C:\\Temp\\TEST2\\EXPORT

import std.algorithm: filter, map, fold;
import std.parallelism: parallel;
import std.file: SpanMode, dirEntries, isDir;
import std.stdio: writeln;
import std.typecons: tuple;
import std.path: globMatch;
import std.array;

void main ()
{
    ulong[] Alternate;
    string[] Filesys = ["C:\\Temp\\TEST1\\BACKUP", "C:\\Temp\\TEST2\\EXPORT"];
    foreach (FFs; Filesys)
    {
        auto dFiles = dirEntries(FFs, SpanMode.shallow)
            .filter!(a => a.isDir)
            .map!(a => tuple(a.name, a.size)).array;
        foreach (d; dFiles)
        {
            auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
                .map!(a => tuple(a.size)).array;
            foreach (f; parallel(SdFiles, 1))
            {
                Alternate ~= f[0];
                writeln(Alternate);
            }
        }
    }
}

From,
Vino.B
Sep 06 2017
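One likely reason the printed array keeps growing is that Alternate is declared once and never reset, so every folder appends to the same array. A minimal sketch of that guess, with the declaration moved inside the folder loop and one writeln per folder; the parallel foreach is left out here for simplicity:

import std.algorithm : filter;
import std.array : join;
import std.file : SpanMode, dirEntries;
import std.stdio : writeln;

void main()
{
    string[] Filesys = ["C:\\Temp\\TEST1\\BACKUP", "C:\\Temp\\TEST2\\EXPORT"];
    foreach (FFs; Filesys)
    {
        foreach (d; dirEntries(FFs, SpanMode.shallow).filter!(a => a.isDir))
        {
            ulong[] Alternate;   // fresh array for every folder
            foreach (f; dirEntries(join(["\\\\?\\", d.name]), SpanMode.depth))
                Alternate ~= f.size;
            writeln(Alternate);  // one line per folder
        }
    }
}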
On Wednesday, 6 September 2017 at 14:38:39 UTC, Vino.B wrote:
> [...]
> Required output:
> [31460, 1344448] - Array 1 for the FS C:\\Temp\\TEST1\\BACKUP
> [2277663, 2277663, 31460] - Array 2 for the FS C:\\Temp\\TEST2\\EXPORT
> [...]

Hi Azi,

The required output is as below:

[31460] - Array 1 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST1\\BACKUP
[1344448] - Array 2 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST1\\BACKUP
[2277663, 2277663] - Array 3 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST2\\EXPORT
[31460] - Array 4 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST2\\EXPORT
Sep 06 2017
On Wednesday, 6 September 2017 at 15:11:57 UTC, Vino.B wrote:
> Hi Azi,
>
> The required output is as below:
>
> [31460] - Array 1 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST1\\BACKUP
> [1344448] - Array 2 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST1\\BACKUP
> [2277663, 2277663] - Array 3 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST2\\EXPORT
> [31460] - Array 4 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST2\\EXPORT

I tried to create a similar file structure on my Linux machine. Here's the result of ls -R TEST1:

TEST1:
BACKUP

TEST1/BACKUP:
FOLDER1  FOLDER2

TEST1/BACKUP/FOLDER1:
file1  file2  file3

TEST1/BACKUP/FOLDER2:
b1  b2

And here's the output of ls -R TEST2:

TEST2:
EXPORT

TEST2/EXPORT:
FOLDER1  FOLDER2

TEST2/EXPORT/FOLDER1:
file2_1  file2_2  file2_3

TEST2/EXPORT/FOLDER2:
export1  export2  export3  export4

This code outputs the sizes in the format you described:

import std.algorithm: filter, map, fold, each;
import std.parallelism: parallel;
import std.file: SpanMode, dirEntries, DirEntry;
import std.stdio: writeln;
import std.typecons: tuple;
import std.path: globMatch;
import std.array;

void main ()
{
    auto Filesys = ["TEST1/BACKUP", "TEST2/EXPORT"];
    ulong[][] sizes;
    foreach (FFs; Filesys)
    {
        auto dFiles = dirEntries(FFs, SpanMode.shallow)
            .filter!(a => a.isDir)
            .map!(a => a.name);
        foreach (d; dFiles)
        {
            sizes ~= dirEntries(d, SpanMode.depth).map!(a => a.size).array;
        }
    }
    sizes.each!writeln;
}

It outputs the sizes:

[6, 6, 6]
[8, 8]
[8, 8, 8]
[9, 9, 9, 9]

Note that there's no need to store them in ulong[][] sizes; you can display them inside the loop by replacing `sizes ~= dirEntries(d, SpanMode.depth).map!(a => a.size).array;` with `dirEntries(d, SpanMode.depth).map!(a => a.size).writeln;`.

To make sure that it calculates the correct sizes, I made it display the paths instead by making "sizes" string[][] instead of ulong[][] and by replacing map!(a => a.size) with map!(a => a.name) in the second foreach loop:

import std.algorithm: filter, map, each;
import std.file: SpanMode, dirEntries, DirEntry;
import std.stdio: writeln;
import std.array : array;

void main ()
{
    auto Filesys = ["TEST1/BACKUP", "TEST2/EXPORT"];
    string[][] sizes;
    foreach (FFs; Filesys)
    {
        auto dFiles = dirEntries(FFs, SpanMode.shallow)
            .filter!(a => a.isDir)
            .map!(a => a.name);
        foreach (d; dFiles)
        {
            sizes ~= dirEntries(d, SpanMode.depth).map!(a => a.name).array;
        }
    }
    sizes.each!writeln;
}

It outputs the paths as expected:

["TEST1/BACKUP/FOLDER1/file1", "TEST1/BACKUP/FOLDER1/file2", "TEST1/BACKUP/FOLDER1/file3"]
["TEST1/BACKUP/FOLDER2/b1", "TEST1/BACKUP/FOLDER2/b2"]
["TEST2/EXPORT/FOLDER1/file2_3", "TEST2/EXPORT/FOLDER1/file2_1", "TEST2/EXPORT/FOLDER1/file2_2"]
["TEST2/EXPORT/FOLDER2/export2", "TEST2/EXPORT/FOLDER2/export3", "TEST2/EXPORT/FOLDER2/export1", "TEST2/EXPORT/FOLDER2/export4"]
Sep 06 2017
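Building on the snippet above, here is a sketch of how the per-folder arrays could be collapsed into per-folder totals and filtered by a size threshold, which is what the original coSizeDirList is after; the threshold value and the output format are placeholders:

import std.algorithm : filter, map, sum;
import std.file : SpanMode, dirEntries;
import std.stdio : writefln;

void main()
{
    auto Filesys = ["TEST1/BACKUP", "TEST2/EXPORT"];
    enum ulong thresholdBytes = 10;  // placeholder; the real code compares MB or GB
    foreach (FFs; Filesys)
    {
        foreach (d; dirEntries(FFs, SpanMode.shallow).filter!(a => a.isDir))
        {
            // sum makes a single lazy pass over the sizes, no intermediate array
            auto total = dirEntries(d.name, SpanMode.depth).map!(a => a.size).sum;
            if (total > thresholdBytes)
                writefln("%s : %s bytes", d.name, total);
        }
    }
}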
On Wednesday, 6 September 2017 at 18:21:44 UTC, Azi Hassan wrote:
> I tried to create a similar file structure on my Linux machine.
> Here's the result of ls -R TEST1:
> [...]

Upon further inspection it looks like I messed up the output.

> [31460] - Array 1 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST1\\BACKUP
> [1344448] - Array 2 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST1\\BACKUP
> [2277663, 2277663] - Array 3 for folder 1 (all files in folder 1) of the FS C:\\Temp\\TEST2\\EXPORT
> [31460] - Array 4 for folder 2 (all files in folder 2) of the FS C:\\Temp\\TEST2\\EXPORT

What files do these sizes correspond to? Shouldn't there be two elements in the first array, because C:\Temp\TEST1\BACKUP\FOLDER1 contains two files?
Sep 06 2017
On Wednesday, 6 September 2017 at 18:44:26 UTC, Azi Hassan wrote:
> Upon further inspection it looks like I messed up the output.
> [...]
> What files do these sizes correspond to? Shouldn't there be two
> elements in the first array, because C:\Temp\TEST1\BACKUP\FOLDER1
> contains two files?

Hi Azi,

I was able to implement "fold"; below is the updated code. Regarding container arrays: I have almost completed my program (Release 1), so it is not a good idea to convert it from standard arrays to container arrays at this point. Starting tomorrow I will be working on Release 2, where I plan to make the above changes. I have not yet finished studying container arrays, so can you help me with how to implement a container array for the code below?

Note: I have raised another thread, "Container Array", asking the same.

string[][] coSizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    Subdata.reserve(100);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir)
        .map!(a => tuple(a.name, a.size)).array;
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size)).array;
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal += f.fold!((a, b) => a + b);
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}

Note to all: I am basically an admin guy; I started learning D a few months ago and found it very interesting, hence I raise so many questions, so please bear with me for a while.

From,
Vino.B
Sep 06 2017
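On the open container-array question: assuming std.container.array.Array is the container meant, here is a minimal sketch of how the appender in coSizeDirList above might be swapped for one; only the storage changes, and the directory-walking loop is assumed to stay as in the post (note also that f.fold over a one-element tuple may not compile, so summing the mapped sizes directly, as in the earlier sketch, is one alternative). Array keeps its payload outside the GC heap, which is its main attraction here.

import std.array : array;
import std.container.array : Array;

// Hypothetical variant: same signature as coSizeDirList, but the results are
// collected in a std.container Array instead of an appender.
string[][] coSizeDirListContainer (string FFs, int SizeDir)
{
    Array!(string[]) Subdata;
    Subdata.reserve(100);            // preallocate, like Subdata.reserve(100) above

    // ... same dirEntries / size-summing loop as in the post, appending with:
    // Subdata ~= [d[0], to!string(subdirTotalGB)];

    return Subdata[].array;          // copy out as a regular string[][]
}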
On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
> Hi,
>
> The code below consumes a lot of memory and runs slowly. Can you
> suggest how to overcome these issues?

You can start by dropping the .array conversions after dirEntries. That way your algorithm will become lazy (as opposed to eager), meaning that it won't allocate an entire array of DirEntry[]. It will, instead, treat the DirEntries one at a time, resulting in less memory consumption.

I didn't understand the join(["\\\\?\\", d[0]]) part, maybe you meant to write join("\\\\?\\", d[0]) ?

If appender is too slow, you can experiment with a dynamic array whose capacity was preallocated:

    string[][] Subdata;
    Subdata.reserve(10000);

In this case Subdata will hold enough space for 10000 string[]s, which will result in better performance.

Here's the updated code (sans .array) in case anyone wants to reproduce the issue:

import std.stdio;
import std.conv;
import std.typecons;
import std.array;
import std.path;
import std.container;
import std.file;
import std.parallelism;
import std.algorithm;

void main()
{
    ".".csizeDirList(1024).each!writeln;
}

string[][] csizeDirList (string FFs, int SizeDir)
{
    ulong subdirTotal = 0;
    ulong subdirTotalGB;
    auto Subdata = appender!(string[][]);
    auto dFiles = dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !globMatch(a.baseName, "*DND*"))
        .map!(a => tuple(a.name, a.size));
    foreach (d; dFiles)
    {
        auto SdFiles = dirEntries(join(["\\\\?\\", d[0]]), SpanMode.depth)
            .map!(a => tuple(a.size));
        foreach (f; parallel(SdFiles, 1))
        {
            subdirTotal += f[0];
        }
        subdirTotalGB = (subdirTotal / 1024 / 1024);
        if (subdirTotalGB > SizeDir)
        {
            Subdata ~= [d[0], to!string(subdirTotalGB)];
        }
        subdirTotal = 0;
    }
    return Subdata.data;
}
Sep 05 2017
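On the join question above: join on an array of strings with no separator simply concatenates them, so the original call appears intended to prepend the Windows \\?\ long-path prefix rather than to join with a separator. A tiny sketch of the equivalence (the sample path is made up):

import std.array : join;

void main()
{
    string d0 = `C:\Temp\TEAM1\dir1`;                 // made-up example path
    assert(join(["\\\\?\\", d0]) == "\\\\?\\" ~ d0);  // both spell \\?\ followed by the path
}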
On Tuesday, 5 September 2017 at 09:44:09 UTC, Vino.B wrote:
> Hi,
>
> The code below consumes a lot of memory and runs slowly. Can you
> suggest how to overcome these issues?
>
> string[][] csizeDirList (string FFs, int SizeDir)
> {
>     [...]
> }

Try to suppress the globMatch. Given that glob, just a ctRegex would do the job, or, even simpler, `!a.canFind("DND")`.
Sep 06 2017
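A minimal sketch of that suggestion, with the glob replaced by a plain substring search (canFind lives in std.algorithm.searching); listDirs is a hypothetical wrapper around the directory listing from the original post:

import std.algorithm : canFind, filter, map;
import std.file : SpanMode, dirEntries;
import std.path : baseName;
import std.stdio : writeln;
import std.typecons : tuple;

// Same shallow directory listing as in the original post, but folders whose
// base name contains "DND" are skipped with canFind instead of globMatch.
auto listDirs(string FFs)
{
    return dirEntries(FFs, SpanMode.shallow)
        .filter!(a => a.isDir && !a.name.baseName.canFind("DND"))
        .map!(a => tuple(a.name, a.size));
}

void main()
{
    foreach (d; listDirs("."))
        writeln(d[0], " ", d[1]);
}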