digitalmars.D - BigInt memory usage

bearophile (18/21) Oct 12 2011 On a 32 bit system this program has allocates about 100 MB of RAM:

Jay Norwood (3/36) Oct 12 2011 I haven't looked at the std implementation, but I liked this guy's packa...

kennytm (2/10) Oct 12 2011

Marco Leise (16/40) Oct 13 2011 So the BigInt struct has a size of 3 machine words:

Steven Schveighoffer (7/59) Oct 13 2011 Allocating an array of 4 uints allocates a block large enough to hold 7 ...

Jay Norwood (15/15) Oct 13 2011 I ran your test and measured what is allocated for a couple of individu...

bearophile <bearophileHUGS lycos.com> writes:

On a 32 bit system this program has allocates about 100 MB of RAM:


import std.bigint;
int main() {
    static assert(BigInt.sizeof == 12);
    auto data = new BigInt[2_000_000];
    foreach (i; 0 .. data.length)
        data[i] = BigInt("9223372036854775808123");

    int count;
    foreach (i; 0 .. 2_000_000_000) { count++; }
    return count;
}



It means about 50 bytes/biginteger.

Every zero BigInt struct needs 12 bytes inside the 'data' array.

The represented numeral requires about 73 bits, this means 3 words (with lot of
wasted space):

 l = 9223372036854775808123
 from math import log
 log(l, 2)



72.965784284662092


Using so much memory allows to store less numbers, and makes them slower too (I
have hit both problems). Do you know why D BigInts use so much memory?

Bye,
bearophile

Oct 12 2011

Jay Norwood <jayn prismnet.com> writes:

I haven't looked at the std implementation, but I liked this guy's package in
c++ .  He uses strings to hold the data, but you choose how the string packs
the digits, with one of the choices being base 256 for the most efficient
storage.    It might be a nice fit for a D port.

http://www.hvks.com/Numerical/arbitrary_precision.html

bearophile Wrote:

 On a 32 bit system this program has allocates about 100 MB of RAM:
 
 
 import std.bigint;
 int main() {
     static assert(BigInt.sizeof == 12);
     auto data = new BigInt[2_000_000];
     foreach (i; 0 .. data.length)
         data[i] = BigInt("9223372036854775808123");
 
     int count;
     foreach (i; 0 .. 2_000_000_000) { count++; }
     return count;
 }
 
 
 
 It means about 50 bytes/biginteger.
 
 Every zero BigInt struct needs 12 bytes inside the 'data' array.
 
 The represented numeral requires about 73 bits, this means 3 words (with lot
of wasted space):
 
 l = 9223372036854775808123
 from math import log
 log(l, 2)



 72.965784284662092
 
 
 Using so much memory allows to store less numbers, and makes them slower too
(I have hit both problems). Do you know why D BigInts use so much memory?
 
 Bye,
 bearophile

Oct 12 2011

kennytm <kennytm gmail.com> writes:

Jay Norwood <jayn prismnet.com> wrote:
 I haven't looked at the std implementation, but I liked this guy's
 package in c++ .  He uses strings to hold the data, but you choose how
 the string packs the digits, with one of the choices being base 256 for
 the most efficient storage.    It might be a nice fit for a D port.
 

D's big-int *is* using base-256. (actually, base-(size_t.max+1))

 http://www.hvks.com/Numerical/arbitrary_precision.html
 
 bearophile Wrote:

Oct 12 2011

"Marco Leise" <Marco.Leise gmx.de> writes:

Am 13.10.2011, 02:26 Uhr, schrieb bearophile <bearophileHUGS lycos.com>:

 On a 32 bit system this program has allocates about 100 MB of RAM:


 import std.bigint;
 int main() {
     static assert(BigInt.sizeof == 12);
     auto data = new BigInt[2_000_000];
     foreach (i; 0 .. data.length)
         data[i] = BigInt("9223372036854775808123");

     int count;
     foreach (i; 0 .. 2_000_000_000) { count++; }
     return count;
 }



 It means about 50 bytes/biginteger.

 Every zero BigInt struct needs 12 bytes inside the 'data' array.

 The represented numeral requires about 73 bits, this means 3 words (with  
 lot of wasted space):

 l = 9223372036854775808123
 from math import log
 log(l, 2)



 72.965784284662092


 Using so much memory allows to store less numbers, and makes them slower  
 too (I have hit both problems). Do you know why D BigInts use so much  
 memory?

 Bye,
 bearophile

So the BigInt struct has a size of 3 machine words:
"data": a dynamic array (2 words) pointing to a block of uints
"signed": a flag (1 word including alignment) that can be avoided
           if you only need positive BigUInts.

During parsing the "data" array is set to a predicted size of 4 uints (16  
bytes) for your number, which makes a total of 28 bytes. That is what you  
must expect as a minimum from the implementation. Now when I run your code  
I get between 45 to 46 bytes per BigInt. So that is a difference of 17 to  
18 bytes.
That is the overhead of D's memory management. You get the same 17 to 18  
additional bytes if you replace your BigInt with a dynamic array of uints  
and initialize each of them like this:

      auto data = new uint[][2_000_000];
      foreach (i; 0 .. data.length)
          data[i] = new uint[4];

Oct 13 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 13 Oct 2011 11:00:54 -0400, Marco Leise <Marco.Leise gmx.de> wrote:

 Am 13.10.2011, 02:26 Uhr, schrieb bearophile <bearophileHUGS lycos.com>:

 On a 32 bit system this program has allocates about 100 MB of RAM:


 import std.bigint;
 int main() {
     static assert(BigInt.sizeof == 12);
     auto data = new BigInt[2_000_000];
     foreach (i; 0 .. data.length)
         data[i] = BigInt("9223372036854775808123");

     int count;
     foreach (i; 0 .. 2_000_000_000) { count++; }
     return count;
 }



 It means about 50 bytes/biginteger.

 Every zero BigInt struct needs 12 bytes inside the 'data' array.

 The represented numeral requires about 73 bits, this means 3 words  
 (with lot of wasted space):

 l = 9223372036854775808123
 from math import log
 log(l, 2)



 72.965784284662092


 Using so much memory allows to store less numbers, and makes them  
 slower too (I have hit both problems). Do you know why D BigInts use so  
 much memory?

 Bye,
 bearophile

 So the BigInt struct has a size of 3 machine words:
 "data": a dynamic array (2 words) pointing to a block of uints
 "signed": a flag (1 word including alignment) that can be avoided
            if you only need positive BigUInts.

 During parsing the "data" array is set to a predicted size of 4 uints  
 (16 bytes) for your number, which makes a total of 28 bytes. That is  
 what you must expect as a minimum from the implementation. Now when I  
 run your code I get between 45 to 46 bytes per BigInt. So that is a  
 difference of 17 to 18 bytes.
 That is the overhead of D's memory management. You get the same 17 to 18  
 additional bytes if you replace your BigInt with a dynamic array of  
 uints and initialize each of them like this:

       auto data = new uint[][2_000_000];
       foreach (i; 0 .. data.length)
           data[i] = new uint[4];

Allocating an array of 4 uints allocates a block large enough to hold 7  
(block of 32 bytes)  This is likely where the overhead is coming from.

I'd have to look at the BigInt code, but if it's not using append  
operations, it should probably be creating the array using GC.malloc  
instead of new-ing an array.

-Steve

Oct 13 2011

Jay Norwood <jayn prismnet.com> writes:

 I ran your test and measured what is allocated for a couple of individual
allocations, and then looked at the ptr offsets at the end.

I see 32 bytes being allocated for each new, but at the end I see over 100MB
offset in the ptr range.

If I try with new char[12], which would be the storage needed for sign byte
plus base 256 data, the allocations are 16 bytes with the same test, and around
69MB offsets in the ptr range for all the allocations.

So ... looks to me  the char array implementation would  would be a significant
improvement in terms of required memory for this particular case.

Here is the code
	auto data = new uint[][2_000_000];
	 foreach (i; 0 .. data.length)
	 data[i] = new uint[4];
	 auto p0 = data[0].ptr;
	 auto sz0 =   GC.sizeOf(p0);
	 auto p1 = data[1].ptr;
	 auto sz1 =   GC.sizeOf(p1);
	 auto pn = data[1999999].ptr;
	 writeln("p0=",p0 ," sz0=",sz0," p1=",p1," sz1=",sz1," pn=", pn);

Oct 13 2011

D Programming

C/C++ Programming

Other

digitalmars.D - BigInt memory usage