
digitalmars.D - BigInts in Tango/Phobos

reply bearophile <bearophileHUGS lycos.com> writes:
I have tried to find Don's email address, but so far I have failed (and I have
not seen him on IRC d.tango), so I'm posting here, even though this isn't the best place.

I think this is the data representation of a Tango BigInt:

alias uint BigDigit;
const BigDigit[] ZERO = [0];

struct BigUint {
  BigDigit[] data = ZERO;
}

struct BigInt {
  BigUint data;     // BigInt adds signed arithmetic to BigUint.
  bool sign = false;
}



I think the following representation may be better:

struct BigInt {
  size_t* data;
  int sizeVal;  // sign(sizeVal) is always the sign of the BigInt.
                // If data != null then abs(sizeVal) is the actual size used
                // of the memory block; otherwise sizeVal is the value of this
                // BigInt (useful for small BigInts), with overflow tests too.
  int capacity; // The available size of the block.
}

With this layout, operations on a BigInt are cheap when the number fits in a
32-bit int: the code only has to test whether data is null and then perform a
checked operation on 32-bit values. This fast path can be done in a small struct
method that I hope will always be inlined. If data isn't null, there's a
call to the bigint machinery. When an operation on a 32-bit sizeVal overflows,
the value is promoted to a normal bigint.
Small integers are very common, so speeding up this case is very useful.
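
To make the fast path concrete, here is a minimal sketch in D (the names and
the addBig fallback are illustrative only; the real multi-precision routine
isn't shown):

struct SmallBigInt {
  size_t* data;  // null => sizeVal holds the value itself
  int sizeVal;
  int capacity;
}

SmallBigInt add(SmallBigInt a, SmallBigInt b) {
  if (a.data is null && b.data is null) {
    // Fast path: both values fit in 32 bits; add with an overflow test.
    long sum = cast(long)a.sizeVal + cast(long)b.sizeVal;
    if (sum >= int.min && sum <= int.max) {
      SmallBigInt r;
      r.sizeVal = cast(int)sum;
      return r;  // still a small value, no allocation
    }
    // Overflow: fall through and promote to a real bigint.
  }
  return addBig(a, b);  // the multi-precision machinery, not shown here
}

SmallBigInt addBig(SmallBigInt a, SmallBigInt b) {
  assert(0, "multi-precision path omitted from this sketch");
}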

Overflow tests in C:
http://www.fefe.de/intof.html
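
For instance, a signed-add overflow test in the style that page describes,
written here in D as a sketch:

// True if a + b would overflow a 32-bit signed int.
// Overflow can only happen when both operands have the same sign,
// so test against the representable range before adding.
bool addOverflows(int a, int b) {
  return (b > 0 && a > int.max - b) || (b < 0 && a < int.min - b);
}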

The capacity/sizeVal pair is modeled on the equivalent pair in the C++ vector,
so a BigInt can mutate in place and grow while avoiding some reallocations
(GNU GMP also allows in-place mutation).
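
A minimal sketch of that growth policy, assuming the layout above (the helper
reallocates only when the used size would exceed the capacity, and doubles the
block so repeated growth stays cheap):

// Return a block with room for 'needed' words; 'used' words are live.
size_t* grow(size_t* data, int used, ref int capacity, int needed) {
  if (needed <= capacity)
    return data;  // enough room already: the caller mutates in place
  int newCap = capacity ? capacity * 2 : 4;
  if (newCap < needed)
    newCap = needed;
  auto block = new size_t[newCap];
  block[0 .. used] = data[0 .. used];  // copy the live words over
  capacity = newCap;
  return block.ptr;
}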

In theory there is space inside such a BigInt struct for a 64-bit value (as a
union), but I don't think 64-bit values are common enough to justify the little
slowdown on smaller values.
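
For reference, that union variant might look like the following sketch. Note
that it needs an explicit tag, since a stored long can make the pointer bytes
nonzero, so the data == null test alone can no longer discriminate:

struct BigInt64 {
  union {
    size_t* data;  // valid only when isBig is true
    long value;    // small values up to 64 bits otherwise
  }
  bool isBig;      // explicit tag; this extra check is the small slowdown
  int sizeVal;     // used size of the block when isBig
  int capacity;    // the available size of the block
}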

Bye,
bearophile
Sep 09 2009
parent Don <nospam nospam.com> writes:
bearophile wrote:
 I have tried to find Don's email address, but so far I have failed (and I have
 not seen him on IRC d.tango), so I'm posting here, even though this isn't the best place.
You should post this kind of thing as a ticket in Tango.
 I think this is the data representation of a Tango BigInt:
 
 alias uint BigDigit;
 const BigDigit[] ZERO = [0];
 
 struct BigUint {
   BigDigit[] data = ZERO;
 }
 
 struct BigInt {
   BigUint data;     // BigInt adds signed arithmetic to BigUint.
   bool sign = false;
 }
 
 I think the following representation may be better:
 
 struct BigInt {
   size_t* data;
   int sizeVal;  // sign(sizeVal) is always the sign of the BigInt.
                  // If data != null then abs(sizeVal) is the actual size used
                  // of the memory block; otherwise sizeVal is the value of this
                  // BigInt (useful for small BigInts), with overflow tests too.
   int capacity; // The available size of the block.
 }
 
 With this layout, operations on a BigInt are cheap when the number fits in a
 32-bit int: the code only has to test whether data is null and then perform a
 checked operation on 32-bit values. This fast path can be done in a small struct
 method that I hope will always be inlined. If data isn't null, there's a
 call to the bigint machinery. When an operation on a 32-bit sizeVal overflows,
 the value is promoted to a normal bigint.
Yes, I've considered doing something like this. But I'm actually not happy at all with copy-on-write BigInts as used in Tango; they result in a huge amount of wasted memory allocation. They are the only option in D1, but I'll do it differently in D2. So far I've really just concentrated on getting the low-level routines working and exposing a minimal wrapper for the functionality. My feeling was that the optimal design for BigInt couldn't be settled without reference to BigFloat. Someone had said they were working on a BigFloat, but it doesn't seem to have materialized.
 Small integers are very common, so speeding up this case is very useful.
 
 Overflow tests in C:
 http://www.fefe.de/intof.html
 
 The capacity/sizeVal pair is modeled on the equivalent pair in the C++ vector,
 so a BigInt can mutate in place and grow while avoiding some reallocations
 (GNU GMP also allows in-place mutation).
Yes. But note that in-place mutation is impossible with copy-on-write, so it's D2 only. BigInts are also the perfect place for memory pools: temporary variables can in theory be dealt with extremely efficiently.
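
To illustrate the pool idea (this is a sketch with made-up names, not Tango's
design): keep a free list of digit blocks so that temporaries recycle buffers
instead of going to the allocator each time.

struct DigitPool {
  size_t[][] freeList;  // recycled digit blocks

  // Hand out a block of at least 'words' words, reusing one if possible.
  size_t[] acquire(size_t words) {
    foreach (i, blk; freeList) {
      if (blk.length >= words) {
        freeList[i] = freeList[$ - 1];  // swap-remove from the free list
        freeList.length = freeList.length - 1;
        return blk;
      }
    }
    return new size_t[words];  // pool miss: allocate a fresh block
  }

  // Return a finished temporary's block for later reuse.
  void release(size_t[] blk) {
    freeList ~= blk;
  }
}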
 In theory there is space inside such a BigInt struct for a 64-bit value (as a
 union), but I don't think 64-bit values are common enough to justify the little
 slowdown on smaller values.
You're probably right. But I'm concentrating on compiler bugfixes right now, so BigInt is on hold.
Sep 10 2009