
digitalmars.D.learn - What is the memory usage of my app?

reply "Adil" <adil stockopedia.com> writes:
I've written a simple socket-server app that serves securities (stock
market shares) data and allows clients to query over them. The
app starts by loading instrument information from a CSV file into
some structs, then listens on a socket responding to queries. It
doesn't mutate the data or allocate anything substantial.

There are 2 main structs in the app. One stores security data,
and the other groups together securities. They are defined as
follows :

````
__gshared Securities securities;

struct Security
{
           string RIC;
           string TRBC;
           string[string] fields;
           double[string] doubles;

           @nogc @property pure size_t bytes()
           {
               size_t bytes;

               bytes = RIC.sizeof + RIC.length;
               bytes += TRBC.sizeof + TRBC.length;

               foreach(k,v; fields) {
                   bytes += (k.sizeof + k.length + v.sizeof +
v.length);
               }

               foreach(k, v; doubles) {
                   bytes += (k.sizeof + k.length + v.sizeof);
               }

               return bytes + Security.sizeof;
           }
}

struct Securities
{
           Security[] securities;
           private size_t[string] rics;

           // Store offsets for each TRBC group
           ulong[2][string] econSect;
           ulong[2][string] busSect;
           ulong[2][string] IndGrp;
           ulong[2][string] Ind;

           @nogc @property pure size_t bytes()
           {
               size_t bytes;

               foreach(Security s; securities) {
                   bytes += s.sizeof + s.bytes;
               }

               foreach(k, v; rics) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; econSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; busSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; IndGrp) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; Ind) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               return bytes + Securities.sizeof;
           }
}
````

Calling Securities.bytes shows "188 MB", but "ps" shows about 591
MB of Resident memory. Where is the memory usage coming from?
What am I missing?
Apr 16 2015
next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
I'd say this is memory allocated while you load the CSV file. I can't tell much more without seeing the actual code.

Suggestion: Compile with `dmd -vgc` and look where allocations happen, especially in loops.
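For example, here is a hypothetical snippet (not Adil's code) of the kind of lines `-vgc` flags, with messages such as "vgc: indexing an associative array may cause GC allocation" and "vgc: operator ~= may cause GC allocation":

````
void main()
{
    double[string] doubles;
    double[] history;

    doubles["PE"] = 12.5;   // AA indexing - may allocate a new entry
    history ~= 42.0;        // appending - may grow/allocate the array
}
````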
Apr 16 2015
prev sibling next sibling parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
Fwiw, I have been working on something similar.  Others will have 
more experience on the GC, but perhaps you might find this 
interesting.

For CSV files, what I found is that parsing is quite slow (and 
memory intensive).  So rather than parse the same data every 
time, I found it helpful to do so once in a batch that runs on a 
cron job, and write out to msgpack format.

I am not a GC expert, but what happens if you run GC.collect() 
once you are done parsing?

auto loadGiltPrices()
{
	auto data=cast(ubyte[])std.file.read("/hist/msgpack/dmo.pack");
	return cast(immutable)data.unpack!(GiltPriceFromDMO[][string]);
}

struct GiltPriceFromDMO
{
	string name;
	string ISIN;
	KPDateTime redemptionDate;
	KPDateTime closeDate;
	int indexLag;
	double cleanPrice;
	double dirtyPrice;
	double accrued;
	double yield;
	double modifiedDuration;
}

void main(string[] args)
{
	auto gilts=readCSVDMO();
	ubyte[] data=pack(gilts);
	std.file.write("dmo.pack",data);
	writefln("* done");
	data=cast(ubyte[])std.file.read("dmo.pack");
}

On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
           string RIC;
           string TRBC;
           string[string] fields;
           double[string] doubles;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               bytes = RIC.sizeof + RIC.length;
               bytes += TRBC.sizeof + TRBC.length;

               foreach(k,v; fields) {
                   bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
               }

               foreach(k, v; doubles) {
                   bytes += (k.sizeof + k.length + v.sizeof);
               }

               return bytes + Security.sizeof;
           }
 }

 struct Securities
 {
           Security[] securities;
           private size_t[string] rics;

           // Store offsets for each TRBC group
           ulong[2][string] econSect;
           ulong[2][string] busSect;
           ulong[2][string] IndGrp;
           ulong[2][string] Ind;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               foreach(Security s; securities) {
                   bytes += s.sizeof + s.bytes;
               }

               foreach(k, v; rics) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; econSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; busSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; IndGrp) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; Ind) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               return bytes + Securities.sizeof;
           }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
Apr 16 2015
next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
 For CSV files, what I found is that parsing is quite slow (and 
 memory intensive).
If you're sure that CSV reading is the culprit, writing a custom parser could help. It's possible to load a CSV file with almost no memory overhead. What I would do:

- Use std.mmfile with Mode.readCopyOnWrite to map the file into memory.
- Iterate over the lines, and then over the fields using std.algorithm.splitter.
- Don't copy, but return slices into the mapped memory.
- If a field needs to be unescaped, this can be done in-place. Unescaping never makes a string longer, and the original file won't be modified thanks to COW (private mapping).
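A minimal sketch of that approach (assuming a plain comma-separated file with no quoted fields, so the in-place unescaping step is left out):

````
import std.algorithm : splitter;
import std.mmfile : MmFile;
import std.stdio : writeln;

void main(string[] args)
{
    // Map the file copy-on-write; the file on disk is never modified.
    auto mmf = new MmFile(args[1], MmFile.Mode.readCopyOnWrite, 0, null);
    auto text = cast(const(char)[]) mmf[];

    size_t lines;
    foreach (line; text.splitter('\n'))
    {
        // Every field is a slice into the mapped memory - nothing is copied.
        foreach (field; line.splitter(','))
        {
            // ... build your structs from these slices ...
        }
        ++lines;
    }
    writeln(lines, " lines scanned");
}
````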
Apr 16 2015
parent "Adil" <adil stockopedia.com> writes:
On Thursday, 16 April 2015 at 20:33:17 UTC, Marc Schütz wrote:
 On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
 For CSV files, what I found is that parsing is quite slow (and 
 memory intensive).
 If you're sure that CSV reading is the culprit, writing a custom parser could help. It's possible to load a CSV file with almost no memory overhead. What I would do:

 - Use std.mmfile with Mode.readCopyOnWrite to map the file into memory.
 - Iterate over the lines, and then over the fields using std.algorithm.splitter.
 - Don't copy, but return slices into the mapped memory.
 - If a field needs to be unescaped, this can be done in-place. Unescaping never makes a string longer, and the original file won't be modified thanks to COW (private mapping).
These are REALLY USEFUL optimizations! Thanks Marc. Although, I'm still no better with the memory usage.

I've reduced the application to just loading a CSV file into structs. Here is void main :

````
void main(string[] args)
{
    auto text = readText(args[1]);

    foreach(record; csvReader!(string[string])(text, null)) {
        if (!record["RIC"] || !record["TRBCIndCode"]) {
            continue;
        }

        // Add a Security to Securities
        securities.add(record["RIC"], record["TRBCIndCode"], record, []);
    }

    delete text;
    GC.collect();

    writefln("%d securities processed", securities.length);
    writefln("Securities : %d MB", securities.bytes/1024/1024);

    import core.thread;
    Thread.sleep(dur!"seconds"(60));
}
````

The output is :

````
make screener-d-simple; ./screener-d data/instruments-clean.csv
dmd -vgc -ofscreener-d source/simplemain.d source/lib/security.d
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(38): vgc: 'delete' requires GC
source/lib/security.d(105): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(111): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(113): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(115): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(118): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(122): vgc: operator ~= may cause GC allocation
source/lib/security.d(123): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(203): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
20066 securities processed
Securities : 188 MB
````

And yet memory usage is 617 MB.
Apr 17 2015
prev sibling parent "Adil" <adil stockopedia.com> writes:
On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
 Fwiw, I have been working on something similar.  Others will 
 have more experience on the GC, but perhaps you might find this 
 interesting.

 For CSV files, what I found is that parsing is quite slow (and 
 memory intensive).  So rather than parse the same data every 
 time, I found it helpful to do so once in a batch that runs on 
 a cron job, and write out to msgpack format.

 I am not a GC expert, but what happens if you run GC.collect() 
 once you are done parsing?

 auto loadGiltPrices()
 {
 	auto data=cast(ubyte[])std.file.read("/hist/msgpack/dmo.pack");
 	return cast(immutable)data.unpack!(GiltPriceFromDMO[][string]);
 }

 struct GiltPriceFromDMO
 {
 	string name;
 	string ISIN;
 	KPDateTime redemptionDate;
 	KPDateTime closeDate;
 	int indexLag;
 	double cleanPrice;
 	double dirtyPrice;
 	double accrued;
 	double yield;
 	double modifiedDuration;
 }

 void main(string[] args)
 {
 	auto gilts=readCSVDMO();
 	ubyte[] data=pack(gilts);
 	std.file.write("dmo.pack",data);
 	writefln("* done");
 	data=cast(ubyte[])std.file.read("dmo.pack");
 }

 On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. 
 It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
          string RIC;
          string TRBC;
          string[string] fields;
          double[string] doubles;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              bytes = RIC.sizeof + RIC.length;
              bytes += TRBC.sizeof + TRBC.length;

              foreach(k,v; fields) {
                  bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
              }

              foreach(k, v; doubles) {
                  bytes += (k.sizeof + k.length + v.sizeof);
              }

              return bytes + Security.sizeof;
          }
 }

 struct Securities
 {
          Security[] securities;
          private size_t[string] rics;

          // Store offsets for each TRBC group
          ulong[2][string] econSect;
          ulong[2][string] busSect;
          ulong[2][string] IndGrp;
          ulong[2][string] Ind;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              foreach(Security s; securities) {
                  bytes += s.sizeof + s.bytes;
              }

              foreach(k, v; rics) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; econSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; busSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; IndGrp) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; Ind) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              return bytes + Securities.sizeof;
          }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
Laeeth,

GC.collect() made no difference. It seems the memory is being held by the data structures above. I think I may not be accounting for hash table usage properly, or it could be something else.

I only need to work with interday data for now, so the CSV load speed doesn't bother me atm. Great idea on using the mmfile w msgpack! I will try that out.

Adil
Apr 17 2015
prev sibling parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
           string RIC;
           string TRBC;
           string[string] fields;
           double[string] doubles;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               bytes = RIC.sizeof + RIC.length;
               bytes += TRBC.sizeof + TRBC.length;

               foreach(k,v; fields) {
                   bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
               }

               foreach(k, v; doubles) {
                   bytes += (k.sizeof + k.length + v.sizeof);
               }

               return bytes + Security.sizeof;
           }
 }

 struct Securities
 {
           Security[] securities;
           private size_t[string] rics;

           // Store offsets for each TRBC group
           ulong[2][string] econSect;
           ulong[2][string] busSect;
           ulong[2][string] IndGrp;
           ulong[2][string] Ind;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               foreach(Security s; securities) {
                   bytes += s.sizeof + s.bytes;
               }

               foreach(k, v; rics) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; econSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; busSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; IndGrp) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; Ind) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               return bytes + Securities.sizeof;
           }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
After a quick look, it seems like you are only counting the fields' memory in the associative arrays, but forgetting about the internal data structure memory - this is a common mistake.

Depending on D's associative array implementation and growth policies (which I am not familiar with, yet), you might be paying a lot of overhead from having so many of them, all of them holding relatively small types, which makes the overhead/payload ratio very bad. Unfortunately, to my knowledge, there is no way to query the current capacity or load factor of an AA.

If I am reading druntime's code correctly, if your hash table contains at least five elements, you are already paying at least for sizeof(void*) * 31. The 31 grows based on a predefined prime number list you can see here:
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36

I hope you can see how this overhead is gigantic for your case, when you're mapping string -> double, or string -> ulong[2].

In addition, each allocation on the runtime heap incurs a bookkeeping cost of at least one pointer size, *often more*, and often an additional padding cost for alignment requirements. There are a few more hidden costs that you can't easily avoid or even calculate from within your binary that you will see in the size the OS reports.

The solution in your case is to use more flat arrays and fewer AAs. AAs are not a silver bullet! Sometimes it's faster to do a linear/binary search in a contiguous block of an array than to search through an AA. This is very often the case for D's current AA implementation.

Rant: I think D's associative array implementation is pretty bad for such an integral and often used part of the language. Mostly due to it being implemented in the runtime, as opposed to being an inlineable library template, but also because it's using an old-school linked-list approach which is pretty bad for your CPU caches. I generally roll my own hash tables for perf-sensitive scenarios, which are more CPU efficient and almost always also more memory efficient.

Sorry for the wall of text! I thought I'd elaborate a bit more since I rarely see these hidden costs mentioned anywhere, in addition to a general overuse of AAs.
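As a rough illustration of the flat-array idea (a minimal sketch with made-up sector codes and offsets, not Adil's actual layout):

````
import std.algorithm : sort;
import std.range : assumeSorted;
import std.stdio : writeln;

struct SectorOffset
{
    string key;        // e.g. a TRBC sector code
    ulong[2] offsets;  // [start, end) indices into the securities array
}

void main()
{
    // One contiguous allocation instead of one heap node per AA entry.
    SectorOffset[] econSect = [
        SectorOffset("52", [300, 450]),
        SectorOffset("50", [0, 120]),
        SectorOffset("51", [120, 300]),
    ];
    econSect.sort!((a, b) => a.key < b.key);

    // Binary search through the sorted block instead of hashing.
    auto hit = econSect.assumeSorted!((a, b) => a.key < b.key)
                       .equalRange(SectorOffset("51"));
    if (!hit.empty)
        writeln("offsets for 51: ", hit.front.offsets);
}
````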
Apr 17 2015
parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Friday, 17 April 2015 at 14:49:19 UTC, Márcio Martins wrote:
 On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. 
 It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
          string RIC;
          string TRBC;
          string[string] fields;
          double[string] doubles;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              bytes = RIC.sizeof + RIC.length;
              bytes += TRBC.sizeof + TRBC.length;

              foreach(k,v; fields) {
                  bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
              }

              foreach(k, v; doubles) {
                  bytes += (k.sizeof + k.length + v.sizeof);
              }

              return bytes + Security.sizeof;
          }
 }

 struct Securities
 {
          Security[] securities;
          private size_t[string] rics;

          // Store offsets for each TRBC group
          ulong[2][string] econSect;
          ulong[2][string] busSect;
          ulong[2][string] IndGrp;
          ulong[2][string] Ind;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              foreach(Security s; securities) {
                  bytes += s.sizeof + s.bytes;
              }

              foreach(k, v; rics) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; econSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; busSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; IndGrp) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; Ind) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              return bytes + Securities.sizeof;
          }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
 After a quick look, it seems like you are only counting the fields' memory in the associative arrays, but forgetting about the internal data structure memory - this is a common mistake.

 Depending on D's associative array implementation and growth policies (which I am not familiar with, yet), you might be paying a lot of overhead from having so many of them, all of them holding relatively small types, which makes the overhead/payload ratio very bad. Unfortunately, to my knowledge, there is no way to query the current capacity or load factor of an AA.

 If I am reading druntime's code correctly, if your hash table contains at least five elements, you are already paying at least for sizeof(void*) * 31. The 31 grows based on a predefined prime number list you can see here:
 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36

 I hope you can see how this overhead is gigantic for your case, when you're mapping string -> double, or string -> ulong[2].

 In addition, each allocation on the runtime heap incurs a bookkeeping cost of at least one pointer size, *often more*, and often an additional padding cost for alignment requirements. There are a few more hidden costs that you can't easily avoid or even calculate from within your binary that you will see in the size the OS reports.

 The solution in your case is to use more flat arrays and fewer AAs. AAs are not a silver bullet! Sometimes it's faster to do a linear/binary search in a contiguous block of an array than to search through an AA. This is very often the case for D's current AA implementation.

 Rant: I think D's associative array implementation is pretty bad for such an integral and often used part of the language. Mostly due to it being implemented in the runtime, as opposed to being an inlineable library template, but also because it's using an old-school linked-list approach which is pretty bad for your CPU caches. I generally roll my own hash tables for perf-sensitive scenarios, which are more CPU efficient and almost always also more memory efficient.

 Sorry for the wall of text! I thought I'd elaborate a bit more since I rarely see these hidden costs mentioned anywhere, in addition to a general overuse of AAs.
Sorry for the poor grammar - I hate it that I can't edit posts :P
Apr 17 2015
parent "Adil" <ad ad.com> writes:
On Friday, 17 April 2015 at 14:50:29 UTC, Márcio Martins wrote:
 On Friday, 17 April 2015 at 14:49:19 UTC, Márcio Martins wrote:
 On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. 
 It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
         string RIC;
         string TRBC;
         string[string] fields;
         double[string] doubles;

          @nogc @property pure size_t bytes()
         {
             size_t bytes;

             bytes = RIC.sizeof + RIC.length;
             bytes += TRBC.sizeof + TRBC.length;

             foreach(k,v; fields) {
                 bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
             }

             foreach(k, v; doubles) {
                 bytes += (k.sizeof + k.length + v.sizeof);
             }

             return bytes + Security.sizeof;
         }
 }

 struct Securities
 {
         Security[] securities;
         private size_t[string] rics;

         // Store offsets for each TRBC group
         ulong[2][string] econSect;
         ulong[2][string] busSect;
         ulong[2][string] IndGrp;
         ulong[2][string] Ind;

          @nogc @property pure size_t bytes()
         {
             size_t bytes;

             foreach(Security s; securities) {
                 bytes += s.sizeof + s.bytes;
             }

             foreach(k, v; rics) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; econSect) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; busSect) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; IndGrp) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; Ind) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             return bytes + Securities.sizeof;
         }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
 After a quick look, it seems like you are only counting the fields' memory in the associative arrays, but forgetting about the internal data structure memory - this is a common mistake.

 Depending on D's associative array implementation and growth policies (which I am not familiar with, yet), you might be paying a lot of overhead from having so many of them, all of them holding relatively small types, which makes the overhead/payload ratio very bad. Unfortunately, to my knowledge, there is no way to query the current capacity or load factor of an AA.

 If I am reading druntime's code correctly, if your hash table contains at least five elements, you are already paying at least for sizeof(void*) * 31. The 31 grows based on a predefined prime number list you can see here:
 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36

 I hope you can see how this overhead is gigantic for your case, when you're mapping string -> double, or string -> ulong[2].

 In addition, each allocation on the runtime heap incurs a bookkeeping cost of at least one pointer size, *often more*, and often an additional padding cost for alignment requirements. There are a few more hidden costs that you can't easily avoid or even calculate from within your binary that you will see in the size the OS reports.

 The solution in your case is to use more flat arrays and fewer AAs. AAs are not a silver bullet! Sometimes it's faster to do a linear/binary search in a contiguous block of an array than to search through an AA. This is very often the case for D's current AA implementation.

 Rant: I think D's associative array implementation is pretty bad for such an integral and often used part of the language. Mostly due to it being implemented in the runtime, as opposed to being an inlineable library template, but also because it's using an old-school linked-list approach which is pretty bad for your CPU caches. I generally roll my own hash tables for perf-sensitive scenarios, which are more CPU efficient and almost always also more memory efficient.

 Sorry for the wall of text! I thought I'd elaborate a bit more since I rarely see these hidden costs mentioned anywhere, in addition to a general overuse of AAs.
Sorry for the poor grammar - I hate it that I can't edit posts :P
Thank you for your insight Marcio. That was helpful. I'm inclined to agree with you.

I noticed some strange behaviour on behalf of the garbage collector as well. If I run GC.collect() once every 1000 iterations, it seems to shave off 200 MB of mem usage! That's a lot! I experimented with collecting at higher and lower frequencies; one collection per 1000 iterations seemed to get the most savings while keeping load times acceptable.

This suggests, as you said, lots of small allocations, but also that they're not being reclaimed when a GC.collect cycle is run. Not reliably anyway. This is a bit disappointing, but I guess the GC is a WIP. I'm afraid I have no knowledge of the GC to talk intelligently about it any further.

+1 for looking up the druntime source! Reminder to self: check the source of open source projects :)
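For reference, a minimal sketch of the periodic-collection pattern described above (a stand-in loop, not the actual loader code):

````
import core.memory : GC;
import std.stdio : writeln;

void main()
{
    size_t processed;

    foreach (i; 0 .. 20_000)
    {
        // stand-in for parsing one CSV record into temporary strings/AAs
        string[string] record = ["RIC": "X", "TRBCIndCode": "50"];

        if (++processed % 1000 == 0)
            GC.collect();   // reclaim the parsing temporaries periodically
    }

    writeln(processed, " records processed");
}
````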
Apr 18 2015