digitalmars.D - customized "new" and pointer alignment

%u <new new.com> writes:

No answer from digitalmars.D.learn, so I'm trying here.

== Posted at 2007/01/29 15:52 to digitalmars.D.learn

I want to do explicit memory allocation for some of my objects,

I'm reading:  http://digitalmars.com/d/memory.html#newdelete

which says:


[...] alignment. This is 8 on win32 systems.

Then on the next section:

http://digitalmars.com/d/memory.html#markrelease

new(size_t sz)
{   void *p;

    p = &buffer[bufindex];
    bufindex += sz;
    if (bufindex > buffer.length)
        throw new OutOfMemory;
    return p;
}

Is this code correct? The object size (sz) could be any integer, so how
can it satisfy the alignment requirement?

If the above code in "Mark/Release" is incorrect, can anyone tell me how to
return aligned memory pointers? And for lots of small objects, does alignment
waste too much memory?

Thanks.

Jan 29 2007

%u <new new.com> writes:

Also buffer is declared as:

void[] buffer;

Is the size of void the same as char?

Jan 29 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"%u" <new new.com> wrote in message news:epm1ep$1ld5$1 digitaldaemon.com...

 Also buffer is declared as:

 void[] buffer;

 Is the size of void the same as char?

The size of a void is (AFAIK) defined to be the smallest addressable (or 
maybe manipulatable) data unit on the machine.  So on most computers, it'll 
be an 8-bit byte.

The size of a char variable is always 8 bits, because it's a UTF-8 
something-or-other.  It's not a codepoint, it's a ...?  But it's always 8 
bits.

So the two sizes are the same mostly by coincidence.
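
For anyone who wants to check this rather than take it on faith, both sizes can be queried with the built-in .sizeof property. This is only a minimal sketch, assuming a reasonably current D compiler; the helper name sameSize is made up for illustration.

```d
// Both checks run at compile time; compilation fails if either is false.
static assert(void.sizeof == 1); // void[] is indexed and sliced one byte at a time
static assert(char.sizeof == 1); // a char is one 8-bit UTF-8 code unit

// Hypothetical helper, only here so the result can also be checked at run time.
bool sameSize()
{
    return void.sizeof == char.sizeof;
}
```

So for the buffer in the question, bufindex and buffer.length both count bytes.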

Jan 29 2007

Chris Paulson-Ellis <chris edesix.com> writes:

Jarrett Billingsley wrote:
 The size of a char variable is always 8 bits, because it's a UTF-8 
 something-or-other.  It's not a codepoint, it's a ...?  But it's always 8 
 bits.

The Unicode term is "code unit".

For the benefit of the Unicode uninitiated, the D spec could be clearer 
on this point. Despite its name, a char variable does not hold a 
character, but rather a single unit of the UTF-8 character encoding.

For example, the UTF-8 code unit sequence 0xE2 0x82 0xAC decodes into 
U+20AC, the Unicode code point for the Euro currency symbol character, €.
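
The Euro example can be reproduced with std.utf.decode from Phobos. This is a sketch only; the wrapper name firstCodePoint is an assumption introduced for illustration and is not part of any library.

```d
import std.utf : decode;

// Decodes the first code point from a sequence of UTF-8 code units.
dchar firstCodePoint(const(char)[] units)
{
    size_t index = 0;            // decode advances this past the units it consumed
    return decode(units, index); // throws a UTFException on malformed input
}

unittest
{
    // The three code units from the example, written as hex escapes.
    const(char)[] euro = "\xE2\x82\xAC";
    assert(euro.length == 3);               // three char code units...
    assert(firstCodePoint(euro) == 0x20AC); // ...but a single code point
}
```

Note that .length on a char[] counts code units, not characters.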

Similarly, the wchar type is defined to be a UTF-16 code unit, which is 
usually the same as the corresponding code point, but not for code 
points > U+FFFF, which are encoded using 2 code units (called a 
surrogate pair).

The dchar type is a UTF-32 code unit. These are the same as the code 
points, except for values > U+10FFFF which are beyond the range of 
Unicode. You are free to use out of range values to mean something 
within your application, but they will never represent Unicode characters.
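
To make the difference between the three types concrete, std.utf.codeLength reports how many code units of a given character type one code point needs. The struct UnitCounts and the function unitCounts below are hypothetical names used only for this sketch.

```d
import std.utf : codeLength;

// Hypothetical record of how many code units one code point needs per encoding.
struct UnitCounts
{
    size_t utf8;  // number of char code units
    size_t utf16; // number of wchar code units
    size_t utf32; // number of dchar code units
}

UnitCounts unitCounts(dchar c)
{
    return UnitCounts(codeLength!char(c), codeLength!wchar(c), codeLength!dchar(c));
}

unittest
{
    // U+20AC (Euro sign) fits in one wchar.
    assert(unitCounts('\u20AC') == UnitCounts(3, 1, 1));
    // U+1D11E is above U+FFFF, so UTF-16 needs a surrogate pair.
    assert(unitCounts('\U0001D11E') == UnitCounts(4, 2, 1));
}
```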

Another complication arises from the fact that the UTF encodings can 
encode "non-character" code points (anything ending in FFFE or FFFF, 
such as U+FFFE or U+3FFFF). Similarly, the "surrogates" (the code points 
with the same values as the code units used by UTF-16 to encode code 
points > U+FFFF) are not characters even though they can be represented 
in UTF-8 or UTF-32. So even a char or wchar sequence that decodes okay 
or a single dchar may not be a "character". Again, you can use these 
code points within your application, but in the words of the code page 
for U+FFF[EF], they are "not valid for interchange".
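
The two categories described above are easy to test for by hand. The predicates below are a sketch with made-up names, and they cover only the cases mentioned in this post; Unicode also reserves U+FDD0..U+FDEF as non-characters, which this sketch ignores.

```d
// True for the surrogate range U+D800..U+DFFF, which UTF-16 uses for pairs.
bool isSurrogateCodePoint(dchar c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

// True for code points within Unicode range whose low 16 bits are FFFE or FFFF.
bool endsInFFFEorFFFF(dchar c)
{
    return c <= 0x10FFFF && (c & 0xFFFE) == 0xFFFE;
}

unittest
{
    assert(isSurrogateCodePoint(0xD800));
    assert(!isSurrogateCodePoint(0x20AC));
    assert(endsInFFFEorFFFF(0xFFFE));
    assert(endsInFFFEorFFFF(0x3FFFF));
    assert(!endsInFFFEorFFFF(0x20AC));
}
```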

Nothing is ever crystal clear in Unicode land.

Chris.

Jan 30 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Chris Paulson-Ellis" <chris edesix.com> wrote in message 
news:epohto$1qsd$1 digitaldaemon.com...
 The Unicode term is "code unit".

Code _unit_.  I thought it was something like that.  I think the people who 
come up with Unicode have a little too much time on their hands.

Jan 30 2007

Julio César Carrascal Urquijo writes:

Jarrett Billingsley wrote:
 Code _unit_.  I thought it was something like that.  I think the people who 
 come up with Unicode have a little too much time on their hands. 

Not all of them. UTF-8 was designed in one night and it is still better 
than UTF-16:

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

Jan 31 2007

Kevin Bealer <kevinbealer gmail.com> writes:

Most allocators align data to the size of the largest primitive type, except 
perhaps 'real'.  The malloc() manpage says the memory is suitably aligned 
'for any data type', which implies at least 64 bits, but I'm not sure I 
would rely on more than 32-bit alignment on a 32-bit system.

In the calling code you can align data with something like this, not 
tested btw:

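// Allocate sz bytes, aligned to a multiple of aln (a power of 2).

void[] data = new void[sz + aln-1]; // over-allocate so an aligned start always fits
size_t addr = cast(size_t) data.ptr; // a pointer must be cast before bitwise arithmetic
size_t shift = (aln - (addr & (aln-1))) & (aln-1); // bytes to skip to reach a multiple of aln
data = data[shift..sz+shift]; // slice the array so it starts at a multiple of aln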

This assumes 'aln' is a power of 2.  I can't imagine why a non-power-of-2 
alignment would be useful, but you could replace "& (x-1)" with "% x" to 
get that.
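
The same rounding can also be done inside the allocator itself, so that every pointer handed out of the buffer is aligned. What follows is a rough sketch, not the code from the Mark/Release documentation: the struct AlignedBuffer, its allocate method, and the default alignment of 8 are all assumptions made for illustration, and it returns null instead of throwing when the buffer is full.

```d
// Hypothetical bump allocator that keeps every returned pointer aligned.
struct AlignedBuffer
{
    void[] buffer;   // backing storage, as in the Mark/Release example
    size_t bufindex; // offset of the next free byte

    // Returns sz bytes whose address is a multiple of aln (a power of 2),
    // or null if the request does not fit in the remaining space.
    void* allocate(size_t sz, size_t aln = 8)
    {
        assert(aln != 0 && (aln & (aln - 1)) == 0, "aln must be a power of 2");

        size_t addr = cast(size_t) buffer.ptr + bufindex;    // address of next free byte
        size_t pad = (aln - (addr & (aln - 1))) & (aln - 1); // bytes to skip for alignment
        size_t avail = buffer.length - bufindex;             // bytes still unused

        if (pad > avail || sz > avail - pad)
            return null;

        size_t start = bufindex + pad;
        bufindex = start + sz;
        return buffer[start .. start + sz].ptr;
    }
}

unittest
{
    auto arena = AlignedBuffer(new ubyte[64]);
    void* a = arena.allocate(3); // odd sizes on purpose
    void* b = arena.allocate(5);
    assert(cast(size_t) a % 8 == 0);
    assert(cast(size_t) b % 8 == 0);
    assert(cast(size_t) b - cast(size_t) a == 8); // 3 bytes used, 5 bytes of padding
}
```

On the memory question: each allocation skips at most aln-1 bytes, so with 8-byte alignment the padding is noticeable only when most objects are just a few bytes long.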

Kevin

Jan 30 2007
