
digitalmars.D.learn - What is the memory usage of my app?

reply "Adil" <adil stockopedia.com> writes:
I've written a simple socket-server app that serves securities (stock
market shares) data and allows clients to query over them. The
app starts by loading instrument information from a CSV file into
some structs, then listens on a socket responding to queries. It
doesn't mutate the data or allocate anything substantial.

There are 2 main structs in the app. One stores security data,
and the other groups together securities. They are defined as
follows :

````
__gshared Securities securities;

struct Security
{
           string RIC;
           string TRBC;
           string[string] fields;
           double[string] doubles;

           @nogc @property pure size_t bytes()
           {
               size_t bytes;

               bytes = RIC.sizeof + RIC.length;
               bytes += TRBC.sizeof + TRBC.length;

               foreach(k,v; fields) {
                   bytes += (k.sizeof + k.length + v.sizeof +
v.length);
               }

               foreach(k, v; doubles) {
                   bytes += (k.sizeof + k.length + v.sizeof);
               }

               return bytes + Security.sizeof;
           }
}

struct Securities
{
           Security[] securities;
           private size_t[string] rics;

           // Store offsets for each TRBC group
           ulong[2][string] econSect;
           ulong[2][string] busSect;
           ulong[2][string] IndGrp;
           ulong[2][string] Ind;

           @nogc @property pure size_t bytes()
           {
               size_t bytes;

               foreach(Security s; securities) {
                   bytes += s.sizeof + s.bytes;
               }

               foreach(k, v; rics) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; econSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; busSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; IndGrp) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; Ind) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               return bytes + Securities.sizeof;
           }
}
````

Calling Securities.bytes shows "188 MB", but "ps" shows about 591
MB of Resident memory. Where is the memory usage coming from?
What am I missing?
Apr 16 2015
next sibling parent "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> writes:
On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
I'd say this is memory allocated while you load the CSV file. I can't tell much more without seeing the actual code.

Suggestion: Compile with `dmd -vgc` and look where allocations happen, especially in loops.
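For example, here is a hypothetical snippet (not Adil's code) of the kind of lines `-vgc` flags, with messages such as "vgc: indexing an associative array may cause GC allocation" and "vgc: operator ~= may cause GC allocation":

````
void main()
{
    double[string] doubles;
    double[] history;

    doubles["PE"] = 12.5;   // AA indexing - may allocate a new entry
    history ~= 42.0;        // appending - may grow/allocate the array
}
````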
Apr 16 2015
prev sibling next sibling parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
Fwiw, I have been working on something similar.  Others will have 
more experience on the GC, but perhaps you might find this 
interesting.

For CSV files, what I found is that parsing is quite slow (and 
memory intensive).  So rather than parse the same data every 
time, I found it helpful to do so once in a batch that runs on a 
cron job, and write out to msgpack format.

I am not a GC expert, but what happens if you run GC.collect() 
once you are done parsing?

auto loadGiltPrices()
{
	auto data=cast(ubyte[])std.file.read("/hist/msgpack/dmo.pack");
	return cast(immutable)data.unpack!(GiltPriceFromDMO[][string]);
}

struct GiltPriceFromDMO
{
	string name;
	string ISIN;
	KPDateTime redemptionDate;
	KPDateTime closeDate;
	int indexLag;
	double cleanPrice;
	double dirtyPrice;
	double accrued;
	double yield;
	double modifiedDuration;
}

void main(string[] args)
{
	auto gilts=readCSVDMO();
	ubyte[] data=pack(gilts);
	std.file.write("dmo.pack",data);
	writefln("* done");
	data=cast(ubyte[])std.file.read("dmo.pack");
}

On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
           string RIC;
           string TRBC;
           string[string] fields;
           double[string] doubles;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               bytes = RIC.sizeof + RIC.length;
               bytes += TRBC.sizeof + TRBC.length;

               foreach(k,v; fields) {
                   bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
               }

               foreach(k, v; doubles) {
                   bytes += (k.sizeof + k.length + v.sizeof);
               }

               return bytes + Security.sizeof;
           }
 }

 struct Securities
 {
           Security[] securities;
           private size_t[string] rics;

           // Store offsets for each TRBC group
           ulong[2][string] econSect;
           ulong[2][string] busSect;
           ulong[2][string] IndGrp;
           ulong[2][string] Ind;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               foreach(Security s; securities) {
                   bytes += s.sizeof + s.bytes;
               }

               foreach(k, v; rics) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; econSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; busSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; IndGrp) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; Ind) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               return bytes + Securities.sizeof;
           }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
Apr 16 2015
next sibling parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
 For CSV files, what I found is that parsing is quite slow (and 
 memory intensive).
If you're sure that CSV reading is the culprit, writing a custom parser could help. It's possible to load a CSV file with almost no memory overhead. What I would do:

- Use std.mmfile with Mode.readCopyOnWrite to map the file into memory.
- Iterate over the lines, and then over the fields using std.algorithm.splitter.
- Don't copy, but return slices into the mapped memory.
- If a field needs to be unescaped, this can be done in-place. Unescaping never makes a string longer, and the original file won't be modified thanks to COW (private mapping).
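A minimal sketch of that approach (assuming a plain comma-separated file with no quoted fields, so the in-place unescaping step is left out):

````
import std.algorithm : splitter;
import std.mmfile : MmFile;
import std.stdio : writeln;

void main(string[] args)
{
    // Map the file copy-on-write; the file on disk is never modified.
    auto mmf = new MmFile(args[1], MmFile.Mode.readCopyOnWrite, 0, null);
    auto text = cast(const(char)[]) mmf[];

    size_t lines;
    foreach (line; text.splitter('\n'))
    {
        // Every field is a slice into the mapped memory - nothing is copied.
        foreach (field; line.splitter(','))
        {
            // ... build your structs from these slices ...
        }
        ++lines;
    }
    writeln(lines, " lines scanned");
}
````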
Apr 16 2015
parent "Adil" <adil stockopedia.com> writes:
On Thursday, 16 April 2015 at 20:33:17 UTC, Marc Schütz wrote:
 On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
 For CSV files, what I found is that parsing is quite slow (and 
 memory intensive).
 If you're sure that CSV reading is the culprit, writing a custom parser could help. It's possible to load a CSV file with almost no memory overhead. What I would do:

 - Use std.mmfile with Mode.readCopyOnWrite to map the file into memory.
 - Iterate over the lines, and then over the fields using std.algorithm.splitter.
 - Don't copy, but return slices into the mapped memory.
 - If a field needs to be unescaped, this can be done in-place. Unescaping never makes a string longer, and the original file won't be modified thanks to COW (private mapping).
These are REALLY USEFUL optimizations! Thanks Marc. Although, I'm still no better with the memory usage.

I've reduced the application to just loading a CSV file into structs. Here is void main :

````
void main(string[] args)
{
    auto text = readText(args[1]);

    foreach(record; csvReader!(string[string])(text, null)) {
        if (!record["RIC"] || !record["TRBCIndCode"]) {
            continue;
        }

        // Add a Security to Securities
        securities.add(record["RIC"], record["TRBCIndCode"], record, []);
    }

    delete text;
    GC.collect();

    writefln("%d securities processed", securities.length);
    writefln("Securities : %d MB", securities.bytes/1024/1024);

    import core.thread;
    Thread.sleep(dur!"seconds"(60));
}
````

The output is :

````
make screener-d-simple; ./screener-d data/instruments-clean.csv
dmd -vgc -ofscreener-d source/simplemain.d source/lib/security.d
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(30): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(35): vgc: indexing an associative array may cause GC allocation
source/simplemain.d(38): vgc: 'delete' requires GC
source/lib/security.d(105): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(111): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(113): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(115): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(118): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(122): vgc: operator ~= may cause GC allocation
source/lib/security.d(123): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(164): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(173): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(182): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(191): vgc: indexing an associative array may cause GC allocation
source/lib/security.d(203): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-213(213): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-219(219): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-225(225): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
source/lib/security.d-mixin-231(231): vgc: indexing an associative array may cause GC allocation
20066 securities processed
Securities : 188 MB
````

And yet memory usage is 617 MB.
Apr 17 2015
prev sibling parent "Adil" <adil stockopedia.com> writes:
On Thursday, 16 April 2015 at 17:13:25 UTC, Laeeth Isharc wrote:
 Fwiw, I have been working on something similar.  Others will 
 have more experience on the GC, but perhaps you might find this 
 interesting.

 For CSV files, what I found is that parsing is quite slow (and 
 memory intensive).  So rather than parse the same data every 
 time, I found it helpful to do so once in a batch that runs on 
 a cron job, and write out to msgpack format.

 I am not a GC expert, but what happens if you run GC.collect() 
 once you are done parsing?

 auto loadGiltPrices()
 {
 	auto data=cast(ubyte[])std.file.read("/hist/msgpack/dmo.pack");
 	return cast(immutable)data.unpack!(GiltPriceFromDMO[][string]);
 }

 struct GiltPriceFromDMO
 {
 	string name;
 	string ISIN;
 	KPDateTime redemptionDate;
 	KPDateTime closeDate;
 	int indexLag;
 	double cleanPrice;
 	double dirtyPrice;
 	double accrued;
 	double yield;
 	double modifiedDuration;
 }

 void main(string[] args)
 {
 	auto gilts=readCSVDMO();
 	ubyte[] data=pack(gilts);
 	std.file.write("dmo.pack",data);
 	writefln("* done");
 	data=cast(ubyte[])std.file.read("dmo.pack");
 }

 On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. 
 It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
          string RIC;
          string TRBC;
          string[string] fields;
          double[string] doubles;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              bytes = RIC.sizeof + RIC.length;
              bytes += TRBC.sizeof + TRBC.length;

              foreach(k,v; fields) {
                  bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
              }

              foreach(k, v; doubles) {
                  bytes += (k.sizeof + k.length + v.sizeof);
              }

              return bytes + Security.sizeof;
          }
 }

 struct Securities
 {
          Security[] securities;
          private size_t[string] rics;

          // Store offsets for each TRBC group
          ulong[2][string] econSect;
          ulong[2][string] busSect;
          ulong[2][string] IndGrp;
          ulong[2][string] Ind;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              foreach(Security s; securities) {
                  bytes += s.sizeof + s.bytes;
              }

              foreach(k, v; rics) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; econSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; busSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; IndGrp) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; Ind) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              return bytes + Securities.sizeof;
          }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
Laeeth,

GC.collect() made no difference. It seems the memory is being held by the data structures above. I think I may not be accounting for hash table usage properly, or it could be something else.

I only need to work with interday data for now, so the CSV load speed doesn't bother me atm. Great idea on using the mmfile w msgpack! I will try that out.

Adil
Apr 17 2015
prev sibling parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
           string RIC;
           string TRBC;
           string[string] fields;
           double[string] doubles;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               bytes = RIC.sizeof + RIC.length;
               bytes += TRBC.sizeof + TRBC.length;

               foreach(k,v; fields) {
                   bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
               }

               foreach(k, v; doubles) {
                   bytes += (k.sizeof + k.length + v.sizeof);
               }

               return bytes + Security.sizeof;
           }
 }

 struct Securities
 {
           Security[] securities;
           private size_t[string] rics;

           // Store offsets for each TRBC group
           ulong[2][string] econSect;
           ulong[2][string] busSect;
           ulong[2][string] IndGrp;
           ulong[2][string] Ind;

            @nogc @property pure size_t bytes()
           {
               size_t bytes;

               foreach(Security s; securities) {
                   bytes += s.sizeof + s.bytes;
               }

               foreach(k, v; rics) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; econSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; busSect) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; IndGrp) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               foreach(k, v; Ind) {
                   bytes += k.sizeof + k.length + v.sizeof;
               }

               return bytes + Securities.sizeof;
           }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
After a quick look, it seems like you are only counting the fields' memory in the associative arrays, but forgetting about the internal data structure memory - this is a common mistake.

Depending on D's associative array implementation and growth policies (which I am not familiar with, yet), you might be paying a lot of overhead from having so many of them, all of them holding relatively small types, which makes the overhead/payload ratio very bad. Unfortunately, to my knowledge, there is no way to query the current capacity or load factor of an AA.

If I am reading druntime's code correctly, if your hash table contains at least five elements, you are already paying at least for sizeof(void*) * 31. The 31 grows based on a predefined prime number list you can see here:
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36

I hope you can see how this overhead is gigantic for your case, when you're mapping string -> double, or string -> ulong[2].

In addition, each allocation on the runtime heap incurs a bookkeeping cost of at least one pointer size, *often more*, and often an additional padding cost for alignment requirements. There are a few more hidden costs that you can't easily avoid or even calculate from within your binary that you will see in the size the OS reports.

The solution in your case is to use more flat arrays and fewer AAs. AAs are not a silver bullet! Sometimes it's faster to do a linear/binary search in a contiguous block of an array than to search through an AA. This is very often the case for D's current AA implementation.

Rant: I think D's associative array implementation is pretty bad for such an integral and often used part of the language. Mostly due to it being implemented in the runtime, as opposed to being an inlineable library template, but also because it's using an old-school linked-list approach which is pretty bad for your CPU caches. I generally roll my own hash tables for perf-sensitive scenarios, which are more CPU efficient and almost always also more memory efficient.

Sorry for the wall of text! I thought I'd elaborate a bit more since I rarely see these hidden costs mentioned anywhere, in addition to a general overuse of AAs.
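As a rough illustration of the flat-array idea (a minimal sketch with made-up sector codes and offsets, not Adil's actual layout):

````
import std.algorithm : sort;
import std.range : assumeSorted;
import std.stdio : writeln;

struct SectorOffset
{
    string key;        // e.g. a TRBC sector code
    ulong[2] offsets;  // [start, end) indices into the securities array
}

void main()
{
    // One contiguous allocation instead of one heap node per AA entry.
    SectorOffset[] econSect = [
        SectorOffset("52", [300, 450]),
        SectorOffset("50", [0, 120]),
        SectorOffset("51", [120, 300]),
    ];
    econSect.sort!((a, b) => a.key < b.key);

    // Binary search through the sorted block instead of hashing.
    auto hit = econSect.assumeSorted!((a, b) => a.key < b.key)
                       .equalRange(SectorOffset("51"));
    if (!hit.empty)
        writeln("offsets for 51: ", hit.front.offsets);
}
````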
Apr 17 2015
parent reply "Márcio Martins" <marcioapm gmail.com> writes:
On Friday, 17 April 2015 at 14:49:19 UTC, Márcio Martins wrote:
 On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. 
 It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
          string RIC;
          string TRBC;
          string[string] fields;
          double[string] doubles;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              bytes = RIC.sizeof + RIC.length;
              bytes += TRBC.sizeof + TRBC.length;

              foreach(k,v; fields) {
                  bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
              }

              foreach(k, v; doubles) {
                  bytes += (k.sizeof + k.length + v.sizeof);
              }

              return bytes + Security.sizeof;
          }
 }

 struct Securities
 {
          Security[] securities;
          private size_t[string] rics;

          // Store offsets for each TRBC group
          ulong[2][string] econSect;
          ulong[2][string] busSect;
          ulong[2][string] IndGrp;
          ulong[2][string] Ind;

           @nogc @property pure size_t bytes()
          {
              size_t bytes;

              foreach(Security s; securities) {
                  bytes += s.sizeof + s.bytes;
              }

              foreach(k, v; rics) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; econSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; busSect) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; IndGrp) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              foreach(k, v; Ind) {
                  bytes += k.sizeof + k.length + v.sizeof;
              }

              return bytes + Securities.sizeof;
          }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
 After a quick look, it seems like you are only counting the fields' memory in the associative arrays, but forgetting about the internal data structure memory - this is a common mistake.

 Depending on D's associative array implementation and growth policies (which I am not familiar with, yet), you might be paying a lot of overhead from having so many of them, all of them holding relatively small types, which makes the overhead/payload ratio very bad. Unfortunately, to my knowledge, there is no way to query the current capacity or load factor of an AA.

 If I am reading druntime's code correctly, if your hash table contains at least five elements, you are already paying at least for sizeof(void*) * 31. The 31 grows based on a predefined prime number list you can see here:
 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36

 I hope you can see how this overhead is gigantic for your case, when you're mapping string -> double, or string -> ulong[2].

 In addition, each allocation on the runtime heap incurs a bookkeeping cost of at least one pointer size, *often more*, and often an additional padding cost for alignment requirements. There are a few more hidden costs that you can't easily avoid or even calculate from within your binary that you will see in the size the OS reports.

 The solution in your case is to use more flat arrays and fewer AAs. AAs are not a silver bullet! Sometimes it's faster to do a linear/binary search in a contiguous block of an array than to search through an AA. This is very often the case for D's current AA implementation.

 Rant: I think D's associative array implementation is pretty bad for such an integral and often used part of the language. Mostly due to it being implemented in the runtime, as opposed to being an inlineable library template, but also because it's using an old-school linked-list approach which is pretty bad for your CPU caches. I generally roll my own hash tables for perf-sensitive scenarios, which are more CPU efficient and almost always also more memory efficient.

 Sorry for the wall of text! I thought I'd elaborate a bit more since I rarely see these hidden costs mentioned anywhere, in addition to a general overuse of AAs.
Sorry for the poor grammar - I hate it that I can't edit posts :P
Apr 17 2015
parent "Adil" <ad ad.com> writes:
On Friday, 17 April 2015 at 14:50:29 UTC, Márcio Martins wrote:
 On Friday, 17 April 2015 at 14:49:19 UTC, Márcio Martins wrote:
 On Thursday, 16 April 2015 at 12:17:24 UTC, Adil wrote:
 I've written a simple socket-server app that serves securities (stock
 market shares) data and allows clients to query over them. The
 app starts by loading instrument information from a CSV file 
 into
 some structs, then listens on a socket responding to queries. 
 It
 doesn't mutate the data or allocate anything substantial.

 There are 2 main structs in the app. One stores security data,
 and the other groups together securities. They are defined as
 follows :

 ````
 __gshared Securities securities;

 struct Security
 {
         string RIC;
         string TRBC;
         string[string] fields;
         double[string] doubles;

          @nogc @property pure size_t bytes()
         {
             size_t bytes;

             bytes = RIC.sizeof + RIC.length;
             bytes += TRBC.sizeof + TRBC.length;

             foreach(k,v; fields) {
                 bytes += (k.sizeof + k.length + v.sizeof +
 v.length);
             }

             foreach(k, v; doubles) {
                 bytes += (k.sizeof + k.length + v.sizeof);
             }

             return bytes + Security.sizeof;
         }
 }

 struct Securities
 {
         Security[] securities;
         private size_t[string] rics;

         // Store offsets for each TRBC group
         ulong[2][string] econSect;
         ulong[2][string] busSect;
         ulong[2][string] IndGrp;
         ulong[2][string] Ind;

          @nogc @property pure size_t bytes()
         {
             size_t bytes;

             foreach(Security s; securities) {
                 bytes += s.sizeof + s.bytes;
             }

             foreach(k, v; rics) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; econSect) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; busSect) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; IndGrp) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             foreach(k, v; Ind) {
                 bytes += k.sizeof + k.length + v.sizeof;
             }

             return bytes + Securities.sizeof;
         }
 }
 ````

 Calling Securities.bytes shows "188 MB", but "ps" shows about 
 591
 MB of Resident memory. Where is the memory usage coming from?
 What am i missing?
 After a quick look, it seems like you are only counting the fields' memory in the associative arrays, but forgetting about the internal data structure memory - this is a common mistake.

 Depending on D's associative array implementation and growth policies (which I am not familiar with, yet), you might be paying a lot of overhead from having so many of them, all of them holding relatively small types, which makes the overhead/payload ratio very bad. Unfortunately, to my knowledge, there is no way to query the current capacity or load factor of an AA.

 If I am reading druntime's code correctly, if your hash table contains at least five elements, you are already paying at least for sizeof(void*) * 31. The 31 grows based on a predefined prime number list you can see here:
 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L36

 I hope you can see how this overhead is gigantic for your case, when you're mapping string -> double, or string -> ulong[2].

 In addition, each allocation on the runtime heap incurs a bookkeeping cost of at least one pointer size, *often more*, and often an additional padding cost for alignment requirements. There are a few more hidden costs that you can't easily avoid or even calculate from within your binary that you will see in the size the OS reports.

 The solution in your case is to use more flat arrays and fewer AAs. AAs are not a silver bullet! Sometimes it's faster to do a linear/binary search in a contiguous block of an array than to search through an AA. This is very often the case for D's current AA implementation.

 Rant: I think D's associative array implementation is pretty bad for such an integral and often used part of the language. Mostly due to it being implemented in the runtime, as opposed to being an inlineable library template, but also because it's using an old-school linked-list approach which is pretty bad for your CPU caches. I generally roll my own hash tables for perf-sensitive scenarios, which are more CPU efficient and almost always also more memory efficient.

 Sorry for the wall of text! I thought I'd elaborate a bit more since I rarely see these hidden costs mentioned anywhere, in addition to a general overuse of AAs.
Sorry for the poor grammar - I hate it that I can't edit posts :P
Thank you for your insight Marcio. That was helpful. I'm inclined to agree with you.

I noticed some strange behaviour on behalf of the garbage collector as well. If I run GC.collect() once every 1000 iterations, it seems to shave off 200 MB of mem usage! That's a lot! I experimented with collecting at higher and lower frequencies; one collection per 1000 iterations seemed to get the most savings while keeping load times acceptable.

This suggests, as you said, lots of small allocations, but also that they're not being reclaimed when a GC.collect cycle is run. Not reliably anyway. This is a bit disappointing, but I guess the GC is a WIP. I'm afraid I have no knowledge of the GC to talk intelligently about it any further.

+1 for looking up the druntime source! Reminder to self: check the source of open source projects :)
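For reference, a minimal sketch of the periodic-collection pattern described above (a stand-in loop, not the actual loader code):

````
import core.memory : GC;
import std.stdio : writeln;

void main()
{
    size_t processed;

    foreach (i; 0 .. 20_000)
    {
        // stand-in for parsing one CSV record into temporary strings/AAs
        string[string] record = ["RIC": "X", "TRBCIndCode": "50"];

        if (++processed % 1000 == 0)
            GC.collect();   // reclaim the parsing temporaries periodically
    }

    writeln(processed, " records processed");
}
````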
Apr 18 2015