D - Garbage Collection is bad... other comments
- rafael baptista (65/65) Jan 12 2003 I just saw the D spec today... and it is quite wonderful. There are so m...
- Clay (12/13) Jan 12 2003 I disagree. For business applications, it is nice not to have to spend ...
- Evan McClanahan (23/40) Jan 13 2003 I know Mr. Baptisa by reputation, so I think that I can anticipate some
- rafael b aptista (37/43) Jan 13 2003 Yeah, that describes me pretty well. Wow. I had no idea I had a reputati...
- Evan McClanahan (21/75) Jan 13 2003 I applied to your company a long time ago (assuming that you're who I
- Sean L. Palmer (45/88) Jan 13 2003 reputation!
- Mike Wynn (12/21) Jan 13 2003 and you can therefore create all your required objects up front, and man...
- Walter (55/77) Jan 23 2003 Some great comments! My replies are inline below.
- Ilya Minkov (26/26) Jan 28 2003 Hello.
- Russell Lewis (17/27) Jan 13 2003 gc.disable();
- Ilya Minkov (52/52) Jan 13 2003 Please, believe me garbage collection (GC) is the future of game design....
- Burton Radons (18/25) Jan 13 2003 Type-based searching isn't implemented yet. DMD should produce a
- rafael baptista (35/54) Jan 13 2003 You are right. That is why game programmers avoid accessing slow devices...
- Ross Judson (13/23) Jan 13 2003 Dude, what do you think Java (and most other GC-enabled environments)
- Evan McClanahan (11/24) Jan 14 2003 But D is nicer than C, and I'd rather use it, and have it be a general
- Bill Cox (22/25) Jan 24 2003 Hi.
- Ilya Minkov (83/115) Jan 24 2003 Hello.
- Bill Cox (26/40) Jan 25 2003 I didn't make myself clear. I don't write recursive destructors. Our
- Ilya Minkov (32/54) Jan 25 2003 What further work does it do?
- Bill Cox (137/178) Jan 25 2003 Ok. Here's a description of our CASE tool:
- Ilya Minkov (30/32) Jan 31 2003 Seems to be kept secret though. I was searching for it, and was unable
- Sean L. Palmer (18/50) Feb 01 2003 This is why functional languages always copy data.
- Mike Wynn (21/29) Feb 01 2003 the
- Walter (9/16) Feb 05 2003 the
- Ilya Minkov (12/17) Feb 05 2003 Granted the GC is on. If it's off, these all will pollute the memory.
- Walter (4/15) Feb 05 2003 I hear you, but I really think it's worth just taking advantage of the G...
- Bill Cox (36/72) Feb 02 2003 Try: www.viasic.com/download/datadraw.tar.gz
- rafael baptista (33/55) Jan 14 2003 Thats right, a GC will normally allocate memory in large blocks, which i...
- Ross Judson (12/26) Jan 14 2003 Two things come to mind...first, some customers care about performance.
- Ilya Minkov (34/81) Jan 14 2003 OK. Let me see. I've got an OpenGL game somewhere which is written in
- Ilya Minkov (12/13) Jan 14 2003 ^^^^^^^
- Walter (12/14) Jan 23 2003 stalling
- Achillefs Margaritis (91/156) Jan 30 2003 C++ can be fully garbage-collected. Just make a GC super class of all
- Bill Cox (32/91) Jan 31 2003 This looks like fairly useful code in some cases, but it's not real GC.
- Ilya Minkov (51/53) Jan 31 2003 Walter considers C++ too flexible. And he doesn't rob you of C++, he
- Burton Radons (11/16) Jan 31 2003 Let's see what happens when type-based scanning is in first; nobody
- Walter (6/16) Feb 03 2003 I have a lot of ideas about improving GC, such as making it a copying
- Evan McClanahan (4/24) Feb 09 2003 Implementing hardware write barriers? I didn't think that the hardware
- Mike Wynn (4/11) Feb 09 2003 I took that to be; marking the page as read only. and catching the page
- Evan McClanahan (12/18) Feb 09 2003 From what I have read, can't find the link right now, that usually
- Walter (7/10) Feb 03 2003 global
I just saw the D spec today... and it is quite wonderful. There are so many ways that C and C++ could be better, and D addresses them. But I think D is a little too ambitious, to its detriment.

It has a good side: it fixes syntactic and semantic problems with C/C++. It eliminates all kinds of misfeatures and cruft in C++. And D has a bad side: it attempts to implement all kinds of features that are best left to the programmer, not the language spec: built-in string comparison, dynamic arrays, associative arrays - but the worst in my mind is the inclusion of garbage collection. As is pointed out in the D overview, garbage collection makes D unusable for real time programming - and that makes it unusable for kernel programming, device drivers, computer games, any real time graphics, industrial control, etc. It limits the language in a way that is really not necessary. Why not just add language features that make it easier to integrate GC, without making it a part of the language?

For my part I wouldn't use GC on any project in any case. My experience has been that GC is inferior to manually managing memory in every case anyway. Why?

Garbage collection is slower. There is an argument in the garbage collection section of the D documents that GC is somehow faster than manually managing memory. Essentially the argument boils down to this: totally bungling manual memory management is slower than GC. Ok, that is true. If I program with tons of redundant destructors, and have massive memory leaks, and spend lots of time manually allocating and deallocating small buffers, and constantly twiddle thousands of ref counts, then managing memory manually is a disaster. But most programs manage memory properly, and then the overhead of memory management is not a problem.

The argument that GC does not suffer from memory leaks is also invalid. In a GC'd system, if I leave around references to memory that I never intend to use again, I am leaking memory just as surely as if I did not call delete on an allocated buffer in C++.

Finally there is the argument that manually managing memory is somehow difficult and error prone. In C++, managing memory is simple: you create a few templatized container classes at the start of a project. They will allocate and free *all* of the memory in your program. You verify that they work correctly, and then write the rest of your program using these classes. If you are calling "new" all over your C++ code you are asking for trouble. The advantage is that you have full control over how memory is managed in your program. I can plan exactly how I will deal with the different types of memory and where my data structures are placed in memory. I can code explicitly to make sure that certain data structures will be in cache memory at the same time.

I had a similar reaction to having the compiler automatically build me hash tables, string classes and dynamic arrays. I normally code these things myself, with particular attention to how they are going to be used.

The unit test stuff is also redundant. I can choose to implement that stuff whether the language supports it or not. I normally make unit test programs that generate output that I can regress against earlier builds - but I wouldn't want to saddle arbitrary programs with having to run my often quite extensive and time consuming unit tests at startup. Imaginary numbers and bit fields should similarly be left up to the programmer.
I suppose I can just choose not to use these features if I don't want to - but it strikes me as unnecessary cruft - and I subscribe to the idea, beautifully expressed in the D overview, that removing cruft makes a language better. The GC is worse, though, because it's in there operating, arbitrarily stalling my program whether I use it or not.

I urge the implementor of D to think about it this way: you have lots of ideas. Many of them are really good - but it is likely that, despite your best efforts, not all of them are. It would make your project much more likely to succeed if you decoupled the ideas as much as possible, such that the good ones can succeed without being weighed down by the bad. So wherever possible I think you should move functionality from the language spec to a standard library. This way the library can continue to improve, while the language proper could stabilize early.

This is probably old ground - I assume lots of people have already commented on the GC. I looked for a thread on GC and could not find one. I for one would find D a lot more attractive if it did not have GC.
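The container-owned allocation pattern described above looks roughly like this in D - a minimal sketch; the Particle type and pool sizing are illustrative, not from the post:

    struct Particle { float x, y, dx, dy; }

    // One pool owns every Particle in the program; "new" never appears
    // anywhere else for this type. Allocation is an index bump, and
    // freeing everything at once is O(1).
    class ParticlePool
    {
        Particle[] storage;   // one large block, allocated up front
        int next;             // first unused slot

        this(int capacity) { storage = new Particle[capacity]; }

        Particle* alloc()
        {
            assert(next < storage.length, "pool exhausted");
            return &storage[next++];
        }

        void reset() { next = 0; }
    }

A pool like this also gives the cache-placement control mentioned above: all particles sit contiguously, so transforming them touches a predictable range of memory.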
Jan 12 2003
In article <avsfoq$6nc$1 digitaldaemon.com>, rafael baptista says...
> I for one would find D a lot more attractive if it did not have GC.

I disagree. For business applications, it is nice not to have to spend time thinking about memory management, b/c it isn't worth the gains at runtime. For simple scenarios, STL and GC are probably the same complexity for the programmer. When the variable goes out of scope within a method, it can be cleaned up. But for more complicated code, it helps that we do not think about memory management and more about the software. For example, when an object is created in class A and given to class B and class C, and either B or C will clean it up (it is not known ahead of time), a GC is much easier to write for.

It comes down to, "who is D for?". But I wonder if it is possible to create a language that allows for both GC and STL memory management.

disclaimer: I am familiar with STL, but I have never used it.
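The A/B/C scenario above, sketched in D (all names here are invented for illustration): A builds the object, B and C both hold it, and neither has to know who finishes last - the collector reclaims it once both drop their references.

    class Report { char[] text; }

    class B { Report r; void done() { r = null; } }
    class C { Report r; void done() { r = null; } }

    void example()
    {
        Report rpt = new Report();   // created by "A"
        B b = new B();
        C c = new C();
        b.r = rpt;
        c.r = rpt;
        b.done();   // B finishes first this time...
        c.done();   // ...and once C lets go too, the Report is garbage -
                    // no ownership protocol between B and C is needed.
    }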
Jan 12 2003
Clay wrote:
> In article <avsfoq$6nc$1 digitaldaemon.com>, rafael baptista says...
>> I for one would find D a lot more attractive if it did not have GC.
> I disagree. For business applications, it is nice not to have to spend
> time thinking about memory management, b/c it isn't worth the gains at
> runtime. [...]
> It comes down to, "who is D for?". But I wonder if it is possible to
> create a language that allows for both GC and STL memory management.

I know Mr. Baptista by reputation, so I think that I can anticipate some of his concerns (if he's the same person I'm thinking of). As he's a games programmer, he's very concerned with more or less realtime performance, and performance with small bits of memory to allocate. I've thought of many of the same issues, being in the same field, and I think that it's more a matter of problem domain than anything else. I was trying to think of a way to do GC on a console, and I've basically come to the conclusion that it's more trouble than it would be worth. While GC is great for people with less pressing time and space concerns, like yourself, seemingly, it lacks a lot of fine grained control, and regaining that control would almost require writing a special case GC for every game, which would certainly take as much time as all of the memory management and debugging on a project.

I think that his argument is similar to something that I've been saying, which is that memory management should be more cleanly modularized, so that one can pick and choose among strategies, without much language level interference.

The unittest comments are less well founded, since unittests go away in release builds. I imagine that it's easy enough to do testing with the systems that are in there without many problems. Kinda a non-issue for me.

I agree that some of the builtins should be moved into the library, but I don't really have the time right now to go very deeply into it.

Evan
Jan 13 2003
In article <avu432$224e$1 digitaldaemon.com>, Evan McClanahan says...
> I know Mr. Baptista by reputation, so I think that I can anticipate some
> of his concerns (if he's the same person I'm thinking of). As he's a
> games programmer, he's very concerned with more or less realtime
> performance, and performance with small bits of memory to allocate.

Yeah, that describes me pretty well. Wow. I had no idea I had a reputation! Evan, you and I are in perfect agreement on GC, I think - it may be good for some other projects but not for what we do.

> The unittest comments are less well founded, since unittests go away in
> release builds.

Yeah, I don't agree with you about that. I think unit tests are a good idea. Some people would implement them as a series of quick sanity checks on their libraries that would run at load time in debug builds. A proper unit test in my mind is a console app that does an exhaustive check on all of your code, tests all of your assumptions, and prints out a long log of all the things it tried. Every time you make changes to a lib, you run the unit test and then diff the log against a previous run of the log, and make sure that the results are the same - or that they only differ in expected ways. You then check the updated log into source code control. I have known companies that have implemented source code control in such a way that you cannot check your files back in until the system has regressed all the unit tests.

Running unit tests at startup time has many disadvantages:

1. Even in a debug build you have limits on how long you are willing to wait through a unit test. With an offline unit test you can make it do everything you want.

2. Unit tests are better as text oriented apps, and most modern applications are graphical. You would have to make the unit test spew to a log, and then go check the log. Similarly, it is a pain to try to run a gui app in a script. So if you want to make scripts that regress your libraries, you have to make dummy console apps that only run the unit test in a text mode.

3. Unit tests running at application startup time are not running when they should run. You want a unit test to run every time you change the code that it tests - not necessarily every time you start up your main application.

Making a unit test function for a class be a class member function has another disadvantage: you are running the unit test after the class constructor, and the unit test is oriented toward testing only that one instance. To unit test a class I prefer to make a non-member function that exercises the whole API in one function - including all the constructors.

Experienced programmers can disagree about the best way to do unit tests - and that is exactly why unit tests should not be built in to the language. The way that D supports unit tests is not the way most programmers who implement unit tests choose to implement them. So you are codifying into the language something which is not established as the best practice anyway.
Jan 13 2003
rafael b aptista wrote:
> Yeah, that describes me pretty well. Wow. I had no idea I had a
> reputation! Evan, you and I are in perfect agreement on GC, I think - it
> may be good for some other projects but not for what we do.

I applied to your company a long time ago (assuming that you're who I think that you are and not someone with the same name), which is how I know who you are (or might be). Game programming is a smallish world anyway. Some interesting stuff on your site. I assumed that your perspective would be coming from GBA development, and went from there. A GC would be disastrous in that environment, and I think that it would be bad for a lot of system level stuff as well. But then, I think that my perfect language would be almost nothing in the system core, a robust syntax extender, and a biggish library of standard extensions, so that might explain the slant in my perspective.

> Yeah, I don't agree with you about that. I think unit tests are a good
> idea. [...] Experienced programmers can disagree about the best way to do
> unit tests - and that is exactly why unit tests should not be built in to
> the language. The way that D supports unit tests is not the way most
> programmers who implement unit tests choose to implement them. So you are
> codifying into the language something which is not established as the
> best practice anyway.

Ok, I can see your point. I don't think, though, that they're insanely harmful, as you can still write traditional unit tests of the kind that you describe, and if people decide that the unittests don't work as a normal feature, then they'll go away eventually. I've used the other DBC features more, so you're likely right. Without a proper computer or internet connection at home, I haven't had a chance to program in D as much as I would like to, and the programs that I have had a chance to write were untested and undocumented hacks, so what do I know about testing? :)

Evan
Jan 13 2003
"rafael b aptista" <rafael_member pathlink.com> wrote in message news:avukdf$2f0g$1 digitaldaemon.com...In article <avu432$224e$1 digitaldaemon.com>, Evan McClanahan says...reputation!I know Mr. Baptisa by reputation, so I think that I can anticipate some of his concerns (if he's the same person I'm thinking of). As he's a games programmer, he's very concered with more or less realtime performance, and performance with small bits of memory to allocate.Yeah, that describes me pretty well. Wow. I had no idea I had aEvan, you and I are in perfect agreement on GC, I think - it may be goodforsome other projects but no for what we do.On a console game, it's always possible to just allocate the entire memory from the system and manage it yourself. That seems to be what everybody does in C++. In D, you can disable GC, and even if you don't, it doesn't run unless you run out of memory. The problem is that D doesn't have good RAII semantics, so implementing your own memory management is a painful process resembling C++ explicit new and delete. That method is bug ridden. In fact I am not sure D allows you to run ctors on arbitrary memory, so it wouldn't be possible to write your own 'new' function template.idea.The unittest comments are less well founded, since unittest go away in release builds.Yeah, I don't agree with you about that. I think unit tests are a goodSome people would implement them as a series of quick sanity checks ontheirlibraries that would run at load time in debug builds. A proper unit testin mymind is a console app that does an exhaustive check on all of your codeandtests all of your assumptions and prints out a long log of all the thingsittried. Every time you make changes to a lib, you run the unit test andthen diffthe log against a previous run of the log and make sure that the resultsare thesame - or that they only differ in expected ways. You then check intosourcecode control the updated diff log. I have known companies that haveimplementedsource code control in such a way that you cannot check your files back inuntilthe system has regressed all the unit tests. Running unit tests at startup time has many disadvantages: 1. Even in a debug build you have limits on how long you are willing towaitthrough a unit test. With an offline unit test you can make it doeverything youwant. 2. Unit tests are better as text oriented apps, and most modernapplications aregraphical. You would have to make the unit test spew to a log, and then gocheckthe log. Similarly it is a pain to try to run a gui app in a script. So ifyouwant to make scripts that regress your libraries you have to make dummyconsoleapps that only run the unit test in a text mode. 3. Unit tests running at application startup time are not running whentheyshould run. You want a unit test to run every time you change the codethat ittests - not necessarily every time you start up your main application. Making a unit test function for a class be a class member function hasanotherdisadvantage: You are running the unit test after the class constructor,and theunit test is oriented toward testing only that one instance. 
To unit testaclass I prefer to make a non-member function that exercises the whole APIin onefunction - including all the constructors.Unit tests can be global module-scope functions too.Experienced programmers can disagree about the best way to do unit tests -andthat is exactly why unit tests should not be built in to the language.Seems like it's more of a #ifdef UNITTESTS RunUnitTest(); #endif UNITTESTSThe way that D supports unit tests is not the way most programmers whoimplementunit test chose to implement them. So you are codifying into the language something which is not established as the best practice anyway.I kinda like integrating the code for the unit tests into the module it tests. But I would like more control over when they are run. Sean
Jan 13 2003
> On a console game, it's always possible to just allocate the entire
> memory from the system and manage it yourself. That seems to be what
> everybody does in C++. In D, you can disable GC, and even if you don't,
> it doesn't run unless you run out of memory.

and you can therefore create all your required objects up front, and manage them manually (as Java games / embedded programmers do)

> The problem is that D doesn't have good RAII semantics, so implementing
> your own memory management is a painful process resembling C++ explicit
> new and delete. That method is bug ridden. In fact, I am not sure D
> allows you to run ctors on arbitrary memory, so it wouldn't be possible
> to write your own 'new' function template.

agree, and D with placement constructors would be good. the effect on the GC (if not disabled) might be undesirable, and it would allow some nasty 'unions' to be created (construct two objects in the same space). not only would you need to inform the GC that a memory range was under your control, but any object in there would still need to be walked, as it could contain refs to GC managed objects. (one of the stumbling blocks with realtime Java)
Jan 13 2003
Some great comments! My replies are inline below.

"rafael b aptista" <rafael_member pathlink.com> wrote in message news:avukdf$2f0g$1 digitaldaemon.com...
> Running unit tests at startup time has many disadvantages:
> 1. Even in a debug build you have limits on how long you are willing to
> wait through a unit test. With an offline unit test you can make it do
> everything you want.

A unit test is not a complete replacement for a comprehensive test suite. It just tests the functionality of individual parts of the project. Unit tests can also be turned on or off on a module by module basis - once one section is debugged, you can turn off unit tests for it until it gets changed. My own experience with D unit testing has been that it has saved me from many, many embarrassingly stupid bugs. I don't mind waiting a couple seconds on startup for a debug build. If it takes a long time to run the unit tests, it's probably worth looking at the unit tests and pulling the time consuming bits out into an offline test. That's really what I do with the D compiler system itself. The basics are done with unit tests; the comprehensive, time consuming tests are done with a separate test suite. But the unit tests assure me a basic level of functionality every time, without needing to run the long test suite.

> 2. Unit tests are better as text oriented apps, and most modern
> applications are graphical. You would have to make the unit test spew to
> a log, and then go check the log. Similarly, it is a pain to try to run
> a gui app in a script. So if you want to make scripts that regress your
> libraries, you have to make dummy console apps that only run the unit
> test in a text mode.

Unit tests in D are designed to throw an exception when they fail. Catching exceptions and putting up a message box for them isn't a big problem. When I've worked on gui apps, I did use log files extensively. Instead of printf'ing to the console, I just printf'd to a log file.

> 3. Unit tests running at application startup time are not running when
> they should run. You want a unit test to run every time you change the
> code that it tests - not necessarily every time you start up your main
> application.

Doing it when the program starts is a reliable way of doing it. For example, if you have module A and module B, A depends on B, and the code for B changes, it's nice to get the unit test for A run too. Having some external mechanism decide which unit tests to run when particular modules change is difficult to get right, and is certainly wrong if it decides solely on the basis of source file changes.

> Making a unit test function for a class be a class member function has
> another disadvantage: you are running the unit test after the class
> constructor, and the unit test is oriented toward testing only that one
> instance. To unit test a class I prefer to make a non-member function
> that exercises the whole API in one function - including all the
> constructors.

You can do that in D with a module level unit test.

> Experienced programmers can disagree about the best way to do unit
> tests - and that is exactly why unit tests should not be built in to the
> language.

I suggest you could say that about every language feature - just look at this forum! In any case, in my experience, very very few programmers do unit tests. Why? Because it's a pain to do without language support. In D, unit tests are quick and easy to add, they are right next to the code being tested, and so are more likely to get written and be up to date. Running them at startup means they are more likely to be run.

> The way that D supports unit tests is not the way most programmers who
> implement unit tests choose to implement them. So you are codifying into
> the language something which is not established as the best practice
> anyway.

D does not prevent doing unit tests in a non-builtin manner. But (as far as I know) D is the very first language to build in the concept of unit tests, in a quick, convenient, and reliable manner. I suggest that this will encourage a much greater percentage of programmers taking the time to even write any unit tests, which will be all to the better. I won't argue that D is the best way possible to do unit tests; experience using the language over time may easily suggest improvements. But it's a great start, and unit tests are a badly needed language feature.
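For instance, a module-level unit test in D sits right next to the function it checks and throws on failure when tests are compiled in (the gcd function here is just an illustration, not from the post):

    int gcd(int a, int b)
    {
        while (b != 0)
        {
            int t = b;
            b = a % b;
            a = t;
        }
        return a;
    }

    unittest
    {
        // runs at program startup in a build with unit tests enabled;
        // a failing assert throws, pointing straight at the bug
        assert(gcd(12, 8) == 4);
        assert(gcd(9, 28) == 1);
        assert(gcd(0, 5) == 5);
    }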
Jan 23 2003
Hello.

There exists a GC algorithm which can run without stopping the execution at all, in parallel, and is thus okay for realtime systems.

Three Colour Marking GC

Each object has a marking field, which can hold one of three colors:
- white: objects to which no references were found. All objects start out white.
- grey: object is known to be reachable (referenced), but wasn't scanned for references yet;
- black: completely inspected. Shall not be revisited by a scanner, unless it changes color.

The only runtime modification is that whenever pointers are being modified, their containers have to be re-colored from black to grey. After there are no more grey objects, the white ones can be collected. That's how it should work in principle. You can imagine some sufficiently efficient implementations. This algorithm is also, in a modified form, used in OCaml - that's how come games are written in it. After starting out in the mid-seventies, it has spawned a number of related algorithms.

Hm. This leads again to the question of flexible code generation, so that it's possible to plug in one or another GC system, or possibly other memory managers. Or structural optimisations. A complex feature, not only by itself, but also for interface design. But on the other hand it could take some burden off the compiler writer and pass it to library writers.

-i.
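The mark phase described above, as a rough sketch in D (illustrative only - this is not D's collector, and the write barrier that re-greys mutated black objects is reduced to a comment):

    enum Colour { White, Grey, Black }

    class Node
    {
        Colour colour = Colour.White;  // all objects start out white
        Node[] refs;                   // outgoing references
    }

    // Whenever the mutator stores a pointer into a Black node, a write
    // barrier must re-colour that node Grey so it gets rescanned.
    void mark(Node[] roots)
    {
        Node[] greySet;
        foreach (r; roots)
        {
            r.colour = Colour.Grey;
            greySet ~= r;
        }
        while (greySet.length > 0)
        {
            Node n = greySet[greySet.length - 1];
            greySet.length = greySet.length - 1;
            foreach (child; n.refs)
            {
                if (child.colour == Colour.White)
                {
                    child.colour = Colour.Grey;
                    greySet ~= child;
                }
            }
            n.colour = Colour.Black;   // completely inspected
        }
        // no Grey nodes remain: every node still White is unreachable
        // and can be collected
    }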
Jan 28 2003
rafael baptista wrote:
> I just saw the D spec today... and it is quite wonderful. [...] And D has
> a bad side: it attempts to implement all kinds of features that are best
> left to the programmer, not the language spec: built-in string
> comparison, dynamic arrays, associative arrays - but the worst in my mind
> is the inclusion of garbage collection.

gc.disable();

In Walter's compiler (although other compilers will differ in the future, I suppose) the GC only runs when the memory allocation routines have run out of buffers that they've gotten from the OS. That is, the underlying routines for 'new' grab somewhat large blocks of virtual memory from the OS, and piece them up into individual allocations. When the allocator has used all of its memory, it runs the garbage collector BEFORE it asks the OS for more memory. It only talks to the OS if it cannot find any place in its current space to make the allocation.

If you disable the garbage collector, then it just doesn't do GC when the memory allocator runs out of memory. Just use 'new' and 'delete' as you would have in C++. You can also just disable the gc for a small time:

    gc.disable();
    DoTimeCriticalThing();
    gc.enable();
Jan 13 2003
Please, believe me: garbage collection (GC) is the future of game design. "Realtime programming" doesn't really include games and other things called "realtime", since they have to cope with disc access and such, the graphics card, other devices, which are NOT REALTIME! You send some command and wait for it to be completed, and this usually lasts much, much, much, much longer than a GC cycle.

You don't even have to believe me. Read the articles by TAD in the Hugi diskmag, i guess. (www.hugi.de) He proposes highly efficient structures for games, which are highly complex meshes of (concave) objects. Now, as I read this through i was fascinated. But as soon as i thought of implementation, i got sudden strong headaches. As later on i was learning about Hans Boehm's GC for C, then about functional programming in OCaml (a GC-ed language), it became evident to me that such structures as proposed by TAD can be very easily and efficiently implemented with GC, and only this way. Ref-counting ultimately fails in many cases.

To the question of efficiency. Though the current implementation of the D GC is said to be very simple-minded and thus not very efficient, there exists a very efficient implementation for C which is fairly popular. I've read of the experience of creating a simple IDE (Wedit) for LCC-Win32, a very simple C compiler. The author claims that plugging the Hans Boehm GC into the project reduced the time required to load a project to 1/8 of the original, because GC-malloc is simply better optimized than the normal one. The manual says that Boehm GC proves to be significantly faster for almost any application managing many tiny objects, but tends to be slower for applications working with large objects only.

The reason for the first claim is that it performs a number of optimisations (heap compaction), helping keep objects which are used together near each other, which drastically reduces cache misses. And as you might know, the cache is about as fast as the CPU, while RAM speed has been improving much more slowly than CPU/cache speed; the impact is already huge, and is likely to grow in the future. I guess the D GC doesn't do such things yet, but they are to come. These optimisations can only be efficiently implemented in conjunction with a GC - that is, if you do them, you've got a GC basically for free.

The second claim is based upon the fact that Boehm GC tends to spend too much time figuring out what is a pointer and what is not by content. You can limit such pointless search for large pointerless (pun intended) objects, but it doesn't solve the issue in general. The D GC uses type information and *never* searches large masses without use, so you can assume that it should be faster than Boehm GC for large objects. That is, the D GC should also work cleaner.

And if you're still not convinced, go on and turn it off. You're just not doing yourself a favor. The D GC does, once in a while, the same amount of work that ref-counting does, while ref-counting has to do it at almost every operation. You can also cause GC cycles by timer once every now and then, keeping memory usage constantly low and cycle times short, at the cost of making them longer in general. The way i'm currently aware of is to disable and then enable the GC; not sure it works.

regards,
-i./midiclub
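A sketch of the "GC cycles by timer" idea, using the gc disable/enable entry points mentioned in this thread plus a full-collection call as found in later Phobos (std.gc.fullCollect); treat the module and function names as assumptions, and the stubs as placeholders:

    import std.gc;

    bool gameRunning() { return false; }   // stubs for illustration
    void renderFrame() { }

    void runFrames(int framesPerCollect)
    {
        std.gc.disable();                  // nothing collects mid-frame
        int i = 0;
        while (gameRunning())
        {
            renderFrame();
            if (++i == framesPerCollect)   // e.g. once a second at 60 fps
            {
                i = 0;
                std.gc.enable();
                std.gc.fullCollect();      // the pause lands where we chose
                std.gc.disable();
            }
        }
    }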
Jan 13 2003
Ilya Minkov wrote:
> The second claim is based upon the fact that Boehm GC tends to spend too
> much time figuring out what is a pointer and what is not by content. You
> can limit such pointless search for large pointerless (pun intended)
> objects, but it doesn't solve the issue in general. The D GC uses type
> information and *never* searches large masses without use, so you can
> assume that it should be faster than Boehm GC for large objects. That
> is, the D GC should also work cleaner.

Type-based searching isn't implemented yet. DMD should produce a super-optimised scanning function for each allocated and public type in each module. D's GC could also do block allocation for small objects, or for those which have the proper characteristics in previous runs (good but fairly constant use, with high turnover of allocation and freeing). That makes allocation and freeing take just a few cycles and reduces overhead. The effect of this is incalculable at this time, as it gets into cache issues, but I've seen the content-to-pointers ratio go into the hundreds when dealing with mesh vertices (textures were uploaded and then freed, otherwise it would be one pointer for 700Kb of data). I'm very optimistic that I'll be able to continually run the GC on any game. But I'm even more sure that it's worth working for.

If the GC delay is still too great, then all games have sequence points. When the player changes a level or a discrete segment, or is dead, or is clearly AFK, or if the player is swapping out, loading, saving, switching from the menu, or is not being attacked - then is your time.
Jan 13 2003
In article <avv7ve$2tos$1 digitaldaemon.com>, Ilya Minkov says...
> Please, believe me: garbage collection (GC) is the future of game design.

I doubt it.

> "Realtime programming" doesn't really include games and other things
> called "realtime", since they have to cope with disc access and such, the
> graphics card, other devices, which are NOT REALTIME!

You are right. That is why game programmers avoid accessing slow devices like disks in the middle of a game. If you absolutely must get something off disk, you normally stream it in. You set up an interrupt driven process or something like that. So say I need a new model off the disk in a few seconds: I would place a request to the disk streaming subsystem to start getting the stuff off the disk. If you sit there like a fool waiting for your CD drive to spin up, then your game is very choppy.

> You don't even have to believe me.

I don't have to believe you. I write games for a living. ;)

> The author claims that plugging the Hans Boehm GC into the project
> reduced the time required to load a project to 1/8 of the original,
> because GC-malloc is simply better optimized than the normal one. The
> manual says that Boehm GC proves to be significantly faster for almost
> any application managing many tiny objects, but tends to be slower for
> applications working with large objects only.

I don't doubt it. Asking the operating system for a buffer is very very slow. On Windows it actually pauses your thread and does an OS context switch. If you are allocating tiny buffers straight from the OS all the time, you need to go back to programming school. Memory should be allocated from the OS in large chunks and then doled out inside your program as needed - using your own optimized allocators. This is what gets you cache coherency. This is what guarantees that all the vertex data for a model is in cache when you transform it, etc. It is not fair to compare the most highly optimized GC you have ever heard of against bad manual memory management practices.

> The reason for the first claim is that it performs a number of
> optimisations (heap compaction),

There is no heap compaction better than me placing my data structures in memory explicitly for cache coherency. Plus, there is no penalty for me having holes in the memory space that are bigger than a memory page anyway. If a whole page is unused, it will never be loaded into the cache and will never trouble me. Compacting it away is just wasting bandwidth on the bus that I could be using to load textures.

> And if you're still not convinced, go on and turn it off. You're just
> not doing yourself a favor. The D GC does, once in a while, the same
> amount of work that ref-counting does, while ref-counting has to do it at
> almost every operation.

The alternative to GC is not ref counting. Few objects have such an unpredictable life cycle that they cannot be managed any other way. GC essentially embeds the ref counting inside the GC subsystem and makes you do some kind of reference tracking for every memory buffer whether you need it or not. I ref count very few objects in a given project. And when I do, I'm typically ref counting entire objects, not every single buffer. Plus I can manipulate the ref counts only when it matters. I don't have a GC system internally twiddling pointer ref counts mindlessly when all that is happening is that a temporary pointer was made for a function call or some other limited scope.
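A minimal sketch of the "large chunks from the OS, doled out internally" scheme, in D with C's malloc (via std.c.stdlib in Walter's library; alignment and growth policy kept deliberately trivial):

    import std.c.stdlib;

    struct Arena
    {
        byte*  base;       // one big block from the OS
        size_t used;
        size_t capacity;
    }

    Arena arenaCreate(size_t bytes)
    {
        Arena a;
        a.base = cast(byte*) malloc(bytes);   // the only OS round trip
        a.used = 0;
        a.capacity = bytes;
        return a;
    }

    byte* arenaAlloc(Arena* a, size_t bytes)
    {
        bytes = (bytes + 15) & ~cast(size_t) 15;   // keep 16-byte alignment
        assert(a.used + bytes <= a.capacity, "arena exhausted");
        byte* p = a.base + a.used;
        a.used += bytes;                           // bump, no bookkeeping
        return p;
    }

Everything for one model (vertices, indices, per-frame scratch) can come from one arena, which is what keeps it together in cache.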
Jan 13 2003
rafael baptista wrote:
> I don't doubt it. Asking the operating system for a buffer is very very
> slow. [...] Memory should be allocated from the OS in large chunks and
> then doled out inside your program as needed - using your own optimized
> allocators. This is what gets you cache coherency. This is what
> guarantees that all the vertex data for a model is in cache when you
> transform it, etc.

Dude, what do you think Java (and most other GC-enabled environments) do? They allocate big buffers from the OS, then divide them into various categories of storage (in Java's case, various generations). Highly efficient strategies are then used to quickly allocate chunks of memory that are used and discarded quickly. More stable allocations are transferred into storage that has strategies appropriate to that.

If you're doing your own memory allocation code, you have to do a LOT of work, and write a LOT of bug free code, to get to where the Java allocator is. D needs GC. You're doing realtime work; stick with C, which is highly appropriate to that task!

RJ
Jan 13 2003
Ross Judson wrote:
> Dude, what do you think Java (and most other GC-enabled environments) do?
> They allocate big buffers from the OS, then divide them into various
> categories of storage (in Java's case, various generations). [...] D
> needs GC. You're doing realtime work; stick with C, which is highly
> appropriate to that task!

But D is nicer than C, and I'd rather use it and have it be a general usage language, rather than a rapid development language for apps that need more speed than VB. I like GC, don't get me wrong. But without hardware and OS support, GC is always going to have some problems with certain kinds of applications. Since that support doesn't seem to be forthcoming in the near future, D should be ready to accept those cases and have a simple method for *transparently* swapping out what happens when someone writes new or delete. I shouldn't have to be doing anything complicated to switch it all out.

Evan
Jan 14 2003
Hi. I'm new to this newsgroup, but not to language design.

D can't be all things to all people, so we might as well live with its focus. Garbage collection is already there, and that defines the scope of the D language about as much as any other feature. That said, here are my thoughts about GC in general:

GC in general buys very little. Anyone writing complicated programs already has to write recursive destructors to remove objects from relationships with other objects. The value of GC is simply that one extra line (delete myObject;) doesn't have to be typed. People argue that GC helps reduce dangling pointer bugs, but rather than a dangling pointer, you've got a memory leak. While GC doesn't really hurt a language like D for most applications, it trashes its use (as D's author points out) for real time programming (we're not going to see any kernels written in D any time soon). There's also another, weaker reason I dislike GC: synthesis of data structures into hardware really doesn't work in a GC based language.

> If you're doing your own memory allocation code, you have to do a LOT of
> work, and write a LOT of bug free code, to get to where the Java
> allocator is.

In any current language, you'd be right. However, highly optimized memory management can and should be generated by compilers. We do this now where I work. I type no extra code, but get automatically generated free-lists and bug-free recursive destructors. It's kind of nice.

Bill Cox
Jan 24 2003
Hello.

Mind what you say. Though recursive destruction is sufficient for a great number of problems, for some it is not. I have caught myself organising data nicely in a tree, then giving the pointer to some small branch somewhere and destroying the tree. Done. The program doesn't spend even a bit too much memory. Even less than required. Perfect, right? Furthermore, I've accidentally deleted an object which is still in use, but the holder doesn't get notified of it anyway. It just *has* to crash as soon as any access is done. Such domains lead to some necessity to keep track of how many users an object has, and this was *sometimes hard* in Delphi. And error-prone.

This manner of organising data in recursively-destroyed structures - having some central point to keep it all - is typical of experienced programmers, and is not required in most cases where GC is used. It simply requires a different management of data. Avoid common roots. Only actual users may have references to objects. Then they shall be collected if they're of no use and are in the way. Those people like you with their old habits may try GC someday on some program of theirs which was working well, and notice that GC has spoiled it, but the usual cause for that is not obeying these simple rules, which only simplify the design and don't complicate it.

For some uses though, a GC appears to be of no use. For these cases, i see a necessity for a separate heap for by-design non-GCed objects, since there still might be objects which require GC in a multi-module programme, like strings and such. This restricts nothing, and the "mark" phase time would shrink vastly if a significant portion is not GCed - not to mention it would already be very short as soon as mark-methods are implemented. A simple implementation would be that "non-GCed" objects have an empty or programmer-written "mark" method (that is, a method which is called by the GC, so that an object can mark the objects it's using), and should not be deleted even if they have not been marked by any other objects - simply by marking themselves, or by more sophisticated means. I admit that my idea, that every object which provides a corresponding method would be thus informed of GC-collected information, is not always implementable. In some implementations it is quite possible that the only thing it would get is whether it was marked at all or not, but not how often it has been done. But i'd wish to know how many references to "me" exist at the time of the mark phase, if it's possible.

-i.

PS. Embedded comments follow.

Bill Cox wrote:
> GC in general buys very little. Anyone writing complicated programs
> already has to write recursive destructors to remove objects from
> relationships with other objects. The value of GC is simply that one
> extra line (delete myObject;) doesn't have to be typed.

It is a vast help in simple cases. Don't tell me that no one writes simple programs. Or doesn't want to write a complex programme in a simple manner. Simple units can very well evolve into complex programs - in D, i hope, to a much further extent than in C++.

> People argue that GC helps reduce dangling pointer bugs, but rather than
> a dangling pointer, you've got a memory leak.

Only without a GC, or with non-GC-friendly structures, which in my experience tend not to appear by themselves, but rather on purpose with conventional memory management. And don't tell me you've never leaked memory... You may have leaked tiny objects and wouldn't notice it. I did notice sometimes, but only occasionally. A good language-supported GC (eg Caml) *can't* leak memory at a full mark cycle, because it collects all memory not being referenced. Simply NULL out your pointers and you're done. As soon as your application, or possibly some other application, needs memory, they would be collected. Do you really care what your "mem-meter" shows you if you don't intend to use it? Boehm GC can leak memory, but the reason is the absence of language support, which makes it unable to be sure of some things.

> While GC doesn't really hurt a language like D for most applications, it
> trashes its use (as D's author points out) for real time programming
> (we're not going to see any kernels written in D any time soon).

Why? D is not Java. GC is not forced. And else it is inherently not even a bit less efficient than C. Rather more, if Walter decides to get rid of the C calling convention for usual functions. :>

> There's also another, weaker reason I dislike GC: synthesis of data
> structures into hardware really doesn't work in a GC based language.

???

> In any current language, you'd be right. However, highly optimized
> memory management can and should be generated by compilers. We do this
> now where I work. I type no extra code, but get automatically generated
> free-lists and bug-free recursive destructors. It's kind of nice.

Bug-free recursive destruction is not a problem in my eyes. But keeping an eye on always recursively copying objects and not giving away references to them, or on manipulating the refcounter correctly, is tedious. GC would improve performance in certain cases. Copy-on-write? Then you need a ref-counter. Or garbage collection. A generational GC would do very well in such a case. Like Caml shows, which (almost) always uses copy-on-write, eg when you change something in a linked list.

I'd rather be for system-wide GC usage. Example: data passed to OpenGL is being copied. It is ridiculous; simpler would be to use copy-on-change (and OpenGL usually doesn't change data), and to GC the leftovers. It is very probable that the data doesn't have to be copied at all then, because applications rarely change most kinds of data while some frame is drawn. Well, OpenGL includes ways to make data transfers as rare as possible, but their use becomes tedious.
Jan 24 2003
Hi.

> Though recursive destruction is sufficient for a great number of
> problems, for some it is not. I have caught myself organising data
> nicely in a tree, then giving the pointer to some small branch somewhere
> and destroying the tree. Done. The program doesn't spend even a bit too
> much memory. Even less than required. Perfect, right? Furthermore, I've
> accidentally deleted an object which is still in use, but the holder
> doesn't get notified of it anyway. It just *has* to crash as soon as any
> access is done.

I didn't make myself clear. I don't write recursive destructors. Our data management CASE tool generates them for us, 100% automatically. It gets it right every time. We never have any unaccounted-for pointers. It's fast and productive. In fact, we don't even buy Purify, even though we program in C. We just don't need it. Seg-v's are mostly a thing of the past for us.

People using GC are still writing their own destructors, and still getting them wrong. A great way to improve productivity is to use CASE tools to do it for you. I really don't see the need for GC.

As CASE tools are often a sign that the language needs to be extended, I've recently written a language that eliminates the need for the CASE tools we use at work. You have to go further. One goal of the language was to work well with code generators. Every relationship (container class) generates code in both the parent and child classes, and updates the recursive destructor in each class. You can't do that with templates. You don't need a new language to do this, though. As I said, we do this at work in C with code generators.

>> There's also another, weaker reason I dislike GC: synthesis of data
>> structures into hardware really doesn't work in a GC based language.
> ???

Basically, GC makes it harder to implement a program on hardware. The field of reconfigurable computing is dedicated to doing this. GC complicates things.

GC in D is fine. You make your choices in language design, and obviously smart people often include GC. It just makes the language less attractive for the few of us who need to code on the bleeding edge of speed (we write chip design software).

Bill Cox
Jan 25 2003
Bill Cox wrote:
> Hi. I didn't make myself clear. I don't write recursive destructors.
> Our data management CASE tool generates them for us, 100% automatically.
> It gets it right every time. We never have any unaccounted-for pointers.
> It's fast and productive. In fact, we don't even buy Purify, even though
> we program in C. We just don't need it. Seg-v's are mostly a thing of
> the past for us.

What further work does it do? Recursive destruction is by far not the only problem. How is the rest of memory management dealt with? I mean, how to prevent destruction of useful objects, and how to track down objects not used anymore?

> People using GC are still writing their own destructors, and still
> getting them wrong. A great way to improve productivity is to use CASE
> tools to do it for you. I really don't see the need for GC.

Destructors are not required anymore, except for the cases where you need to destroy some hardware together with an object. :> Well, there are a few special cases. People make mistakes with any model. But IMO making no mistakes with GC is very easy. Although i'm an advocate of GC, i'm open to every other solution.

> As CASE tools are often a sign that the language needs to be extended,
> I've recently written a language that eliminates the need for the CASE
> tools we use at work. You have to go further. One goal of the language
> was to work well with code generators. Every relationship (container
> class) generates code in both the parent and child classes, and updates
> the recursive destructor in each class. You can't do that with
> templates. You don't need a new language to do this, though. As I said,
> we do this at work in C with code generators.

Describe the extensions you consider useful in complete detail, else it's of no use for Walter. Or give a link. I appreciate that you have *done* something (esp. something that works), but what you've written here so far gives me no idea of what it is and how it should work. I'm not well-informed about CASE in general, and i bet most of the newsgroup isn't as well.

There are currently multiple directions in which D should go. First, it has to become solid in its current form. Second, backends have to be written for different kinds of use, eg. a very portable VM. And then, i'll be making an effort to make writing compilers in D and language extensions to D easy, and i'll gladly take your proposal into account. Currently i'm looking at writing a COCOM-like library for D, enabling fast and flexible parsing and syntax tree transformation.

> Basically, GC makes it harder to implement a program on hardware. The
> field of reconfigurable computing is dedicated to doing this. GC
> complicates things.

Wouldn't you want to write a program in VHDL or such if you wanted to get hardware as output? Well, i can imagine that nowadays it is possible with higher-level languages. But isn't D overkill? Oh, you're working for an ASIC company. Then obviously "hard" management is better for *you* than a GC. But how should it work? A ref-counter can be a drop-in replacement for a GC (less reliable though), and how do you make the integration with CASE?

-i.
Jan 25 2003
Hi, Ilya.

Ilya Minkov wrote:
> What further work does it do? Recursive destruction is by far not the
> only problem. How is the rest of memory management dealt with? I mean,
> how to prevent destruction of useful objects, and how to track down
> objects not used anymore?

Ok. Here's a description of our CASE tool:

The CASE tool (DataDraw) is pretty simple. It's open-source and compiles under Visual C++ on Windows. I use it on Linux with wine (a Windows emulator). You draw your class schema much as you would with a UML editor, or a database entity-relationship schema editor. You add properties to classes by double clicking on them and adding the fields. Then, to add relationships (container classes) between classes, you select a relationship drawing tool and draw arrows between the classes. By double clicking on the relationship, you get to set its type: one-to-one, linked list, tail-linked list (a linked list with a pointer to the last element), doubly-linked list, static array, dynamic array, name-based hash table, or static inheritance (standard C++ style inheritance). You also get to specify whether or not the children should be deleted when the parent is deleted. There are a couple of other options: relationship names, and whether or not there is a back pointer to the parent embedded in the child.

After describing your data structures, it will generate code for you. There are several code generators to choose from, mostly variations of C coding styles, and one C++ style. We use one of the C styles because it gives us dynamic inheritance, which is a big deal for the kind of software we write (integrated chip design tools that run off of a common database). Static inheritance is great when you are constructing objects from scratch. Our objects already exist in the database, and we need to extend them on the fly. The code generator we use does this with no memory or speed penalty, since it represents all the class data in individual arrays of data. To extend all the objects of a given class, the memory manager just allocates more arrays. Instead of using C style pointers, the code generator typedefs an index as the reference type for the class. We get strong type checking in debug mode by typedefing the handles as pointers to dummy classes, and casting them to ints when accessed. That sounds ugly, but it's all buried under the hood by automatically generated access macros that hide it all.

The memory managers in two of the code generators are quite advanced. First, each class has options you can select for how to manage its memory. Objects of some classes are never individually deleted, only all at once. For those classes, the structures are put into a dynamic array. Allocating an object typically is just a compare and an increment (three machine instructions, I think). For most classes, however, you need both create and delete on the fly. The code generator that generates more normal C code allocates memory one page at a time, and then sub-allocates your objects of a given class out of the pages, which optimizes both paging and cache hits. Free lists are automatically generated and maintained. Structures are also automatically aligned, with fields sorted by size. The funkier code generator we use (the one that uses indexes into arrays instead of pointers to structures) uses dynamic arrays for all properties of all classes. We've tested the performance of our code with both generators, and generally see a 10-20% speed increase using the dynamic array representation instead of structures. This memory manager uses the trick of allocating 50% more memory each time a dynamic array needs more memory. Recursive destructors are also generated, as are some other useful functions.
The recursive destructors delete each child in any relationship marked as needing the children deleted. Other relationships are patched. For example, code to remove the object from linked lists is generated. Of particular interest to us is the very efficient code for our container classes (relationships). A linked list relationship generated by DataDraw embeds the next pointer in the child class, which is a far more efficient implementation, in terms of both speed and memory. In debug mode, all memory accesses are checked, including array indexing. Some of the code generators also generate binary load/save, and full database integrity self-checks. However, we've come to the conclusion that simple binary dumps make a poor save format, since it's hard to remain backwards compatible. Also, since our pointers almost never get trashed, the database integrity self-check is also no longer used. There are ways our database can be trashed. Users of fscanf run into it. We sometimes get seg-v's with too few parameters to printf. All in all, our coding is pretty safe from these kinds of bugs.

>> Hi. I didn't make myself clear. I don't write recursive destructors.
>> Our data management CASE tool generates them for us, 100%
>> automatically. It gets it right every time. We never have any
>> unaccounted-for pointers. It's fast and productive. In fact, we don't
>> even buy Purify, even though we program in C. We just don't need it.
>> Seg-v's are mostly a thing of the past for us.
> What further work does it do? Recursive destruction is by far not the
> only problem. How is the rest of memory management dealt with? I mean,
> how to prevent destruction of useful objects, and how to track down
> objects not used anymore.

I've written a simple code translator from a language that incorporates many of the features of the DataDraw CASE tool. To add the full power of code generators, you need to be able to compile and run code that generates code to be compiled. This is basically templates on steroids. I'm not familiar with the basic container class packages in D yet, so I'll just write out pseudo-code for an example code generator to add fields for a linked list to two classes.

generateRelFields(Relationship rel)
{
    Class parent = rel.parentClass;
    Class child = rel.childClass;

    if(rel.type == LINKED_LIST) {
        generate(parentName:parent.name, childName:child.name, relName:rel.name) {
            extend class <parentName> {
                struct <relName> {
                    <childName> first;
                }
                add(<childName> child) {
                    child.<relName>.next = this.<relName>.first;
                    this.<relName>.first = child;
                }
                remove(<childName> child) {
                    ...
                }
            }
            extend class <childName> {
                struct <relName> {
                    <childName> next;
                    <parentName> owner;
                }
            }
        }
        if(rel.deleteChildren) {
            generate(parentName:parent.name, childName:child.name, relName:rel.name) {
                extend class <parentName> {
                    extend destroy() {
                        <childName> child, nextChild;
                        for(child = this.<relName>.first; child != null; child = nextChild) {
                            nextChild = child.<relName>.next;
                            child.destroy();
                        }
                    }
                }
                extend class <childName> {
                    extend destroy() {
                        if(this.<relName>.owner != null) {
                            this.<relName>.owner.remove(this);
                        }
                    }
                }
            }
        }
    }
}

Note the addition of the generate statement, which is similar to templates. Also notice that classes, methods, and functions can all be extended. This is a code-generator friendly syntax.

>> As CASE tools are often a sign that the language needs to be extended,
>> I've recently written a language that eliminates the need for the CASE
>> tools we use at work. You have to go further. One goal of the language
>> was to work well with code generators.
>> Every relationship (container class) generates code in both the parent
>> and child classes, and updates the recursive destructor in each class.
>> You can't do that with templates. You don't need a new language to do
>> this, though. As I said, we do this at work in C with code generators.
> Describe the extensions you consider useful in complete detail, else
> it's no use for Walter. Or give a link.

>> Basically, GC makes it harder to implement a program on hardware. The
>> field of reconfigurable computing is dedicated to doing this. GC
>> complicates things.
> Wouldn't you want to write a program in VHDL or such if you wanted to
> get hardware as output? Well, i can imagine that nowadays it is
> possible with higher-level languages. But isn't D an overkill? Oh,
> you're working for an ASIC company. Then obviously "hard" management is
> better for *you* than a GC. But how should it work? A ref-counter can
> be a drop-in replacement for a GC (less reliable though), and how do
> you make the integration of CASE?

VHDL (or Verilog) is the standard way to describe hardware. Note that VHDL has a generate statement for generating hardware. IMO, it's VHDL's single most important advantage over Verilog. The reconfigurable computing guys are trying to compile normal programs into hardware. It shouldn't be a goal of D, but it is something I dabble in.

Bill Cox
Jan 25 2003
Hi.

Bill Cox wrote:
> The CASE tool (DataDraw) is pretty simple. It's open-source and
> compiles under Visual C++ on Windows.

Seems to be kept secret though. I was searching for it, and was unable to find it. The description is very interesting, though; i still don't seem to understand some of the mechanics of object deletion. This all seems to be about having a central object repository one way or another, and the scope of objects has to be known explicitly. That is, AFAIU it automates the normal C/C++/Delphi data management idioms?

I still don't understand how this simple (and for D vital) example is handled: string operations. Imagine some function has created a string, and returns it. Now, some operations are done on the string: copying, slicing, and so on. Copying and some cases of slicing (i guess) don't change the string, so it is not copied, for efficiency reasons. Now, the same piece of data has multiple users. Now some user makes a change in the data, desiring to keep the changes private. Then there are two cases:
- No one else owns the data. It can be changed in-place;
- The data has other users. It has to be copied.

This means it needs to be known whether the piece of data has further users. Delphi solves this for strings by maintaining a reference counter. GC simplifies it, by allowing for "late" copying of data; it then finds out what data needs to be removed. It was possible to prove that GC is more efficient than reference counting in the general case of tiny objects and strings.

Come to think of it: why did the X-Window servers constantly leak memory before they had a GC attached? Because they handle small structures, not centralized. And how do you get to know that you don't need a certain part in a centralized data storage?

Or explain to me what i get wrong.

-i.
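For readers unfamiliar with the Delphi scheme: a minimal, single-threaded copy-on-write string looks roughly like this (illustrative only - a real implementation needs thread safety and much more):

#include <cstring>
#include <cstddef>

class CowString {
    struct Rep { int refs; std::size_t len; char *text; };
    Rep *rep;

    void release() {
        if (--rep->refs == 0) { delete[] rep->text; delete rep; }
    }
    static Rep *makeRep(const char *s, std::size_t len) {
        Rep *r = new Rep;
        r->refs = 1;
        r->len = len;
        r->text = new char[len + 1];
        std::memcpy(r->text, s, len);
        r->text[len] = '\0';
        return r;
    }
public:
    CowString(const char *s) : rep(makeRep(s, std::strlen(s))) {}
    CowString(const CowString &o) : rep(o.rep) { ++rep->refs; }  // O(1) copy
    CowString &operator=(const CowString &o) {
        ++o.rep->refs; release(); rep = o.rep; return *this;
    }
    ~CowString() { release(); }

    char get(std::size_t i) const { return rep->text[i]; }
    void set(std::size_t i, char c) {
        if (rep->refs > 1) {               // shared: copy before writing
            Rep *mine = makeRep(rep->text, rep->len);
            --rep->refs;                   // detach from the shared rep
            rep = mine;
        }
        rep->text[i] = c;                  // now provably unshared
    }
};

int main() {
    CowString a("hello");
    CowString b = a;     // shares one buffer; refs == 2
    b.set(0, 'H');       // copies first; a still reads "hello"
}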
Jan 31 2003
This is why functional languages always copy data.

I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users of the data, in which case it can optimize to do the operation in place. An array passed by reference would be exempt from this; however D currently has no way at all to pass arrays by value, so we're forced to deal with the "is it mine, is it yours, is it ours" mess on every front (user code, library code, system code).

Ownership is the kind of thing that can't be verified in a precondition or postcondition; there's no way to tell in general without knowing the entire program. That is a job better suited to a machine than a human programmer. That is why I would like the default to be pass by value, copy on write, *unless* either the programmer requests pass by reference or the compiler can tell for sure, due to scoping/data visibility/data flow analysis, that it's safe not to copy on write. Err on the side of safety.

Sean

"Ilya Minkov" <ilminkov planet-interkom.de> wrote in message news:b1ek0e$u42$1 digitaldaemon.com...
> Seems to be kept secret though. I was searching for it, and was unable
> to find it. The description is very interesting, though; i still don't
> seem to understand some of the mechanics of object deletion. This all
> seems to be about having a central object repository one way or
> another, and the scope of objects has to be known explicitly. That is,
> AFAIU it automates the normal C/C++/Delphi data management idioms?
>
> I still don't understand how this simple (and for D vital) example is
> handled: string operations. Imagine some function has created a string,
> and returns it. Now, some operations are done on the string: copying,
> slicing, and so on. Copying and some cases of slicing (i guess) don't
> change the string, so it is not copied, for efficiency reasons. Now,
> the same piece of data has multiple users. Now some user makes a change
> in the data, desiring to keep the changes private. Then there are two
> cases:
> - No one else owns the data. It can be changed in-place;
> - The data has other users. It has to be copied.
>
> This means it needs to be known whether the piece of data has further
> users. Delphi solves this for strings by maintaining a reference
> counter. GC simplifies it, by allowing for "late" copying of data; it
> then finds out what data needs to be removed. It was possible to prove
> that GC is more efficient than reference counting in the general case
> of tiny objects and strings.
>
> Come to think of it: why did the X-Window servers constantly leak
> memory before they had a GC attached? Because they handle small
> structures, not centralized. And how do you get to know that you don't
> need a certain part in a centralized data storage?
>
> Or explain to me what i get wrong.
>
> -i.
Feb 01 2003
"Sean L. Palmer" <seanpalmer directvinternet.com> wrote in message news:b1ge7e$29kd$1 digitaldaemon.com...This is why functional languages always copy data. I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users ofthedata, in which case it can optimize to do the operation in place. Anarraypassed by reference would be exempt from this; however D currently has no way at all to pass arrays by value, so we're forced to deal with the "isitmine, is it yours, is it ours" mess on every front (user code, librarycode,system code).it is, but there is an issue with threading, if the function has a big delay, say it signal one event and then waits on an other event, before reading or writing the item, which it expects is a copy, and another thread is waiting for that first signal modifies the original (which is was passed by ref or owns) and signals the second event, the value with be the changed and not the previous. this can be avoided by copying in the function prolog if a write is detected any where in the function, or programmer understanding the effects of lazy copy on write. arrays already have odd 'in' behaviour, so I guess its o.k. to perform a lazy copy as I doubt the condition would occur often and as long as there is a way to force a copy array.length += 0 ; // for instance (safe to do on a 0 length array unlike array[0] = array[0];) Mike.
Feb 01 2003
"Sean L. Palmer" <seanpalmer directvinternet.com> wrote in message news:b1ge7e$29kd$1 digitaldaemon.com...This is why functional languages always copy data. I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users ofthedata, in which case it can optimize to do the operation in place. Anarraypassed by reference would be exempt from this; however D currently has no way at all to pass arrays by value, so we're forced to deal with the "isitmine, is it yours, is it ours" mess on every front (user code, librarycode,system code).it is, but there is an issue with threading, if the function has a big delay, say it signal one event and then waits on an other event, before reading or writing the item, which it expects is a copy, and another thread is waiting for that first signal modifies the original (which is was passed by ref or owns) and signals the second event, the value with be the changed and not the previous. this can be avoided by copying in the function prolog if a write is detected any where in the function, or programmer understanding the effects of lazy copy on write. arrays already have odd 'in' behaviour, so I guess its o.k. to perform a lazy copy as I doubt the condition would occur often and as long as there is a way to force a copy array.length += 0 ; // for instance (safe to do on a 0 length array unlike array[0] = array[0];) Mike.
Feb 01 2003
"Sean L. Palmer" <seanpalmer directvinternet.com> wrote in message news:b1ge7e$29kd$1 digitaldaemon.com...I would think that the correct semantics is to always copy on write unless the compiler can tell for absolute sure that there are no other users ofthedata, in which case it can optimize to do the operation in place. Anarraypassed by reference would be exempt from this; however D currently has no way at all to pass arrays by value, so we're forced to deal with the "isitmine, is it yours, is it ours" mess on every front (user code, librarycode,system code).The best solution is, as you suggest, copy on write. In D, you don't really need to know who owns it. Just practice copy on write. Any extra copies will get cleaned up by the gc.
Feb 05 2003
Walter wrote:
> The best solution is, as you suggest, copy on write. In D, you don't
> really need to know who owns it. Just practice copy on write. Any extra
> copies will get cleaned up by the gc.

Granted the GC is on. If it's off, these all will pollute the memory. That's why i was arguing for 2 separate heaps, so that objects are created on the GCed heap by default, or can (if the programmer is confident) be created on a heap completely ignored by the GC. I know that being on the GC heap shouldn't be a performance hit, after the GC is improved - except for the cases when objects are very pointer-rich (trees, lists and such) - but in some cases it just doesn't make sense to scan objects, because the programmer has decided to follow his old C habits and they are all referenced from somewhere anyway. These should also be leak- and dangler-guarded in a debug compile, and completely ignored by the GC in a release compile.
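In C++ terms, the "second heap the collector ignores" idea can be sketched with class-level allocation operators (names invented for illustration; D would need language or library support for the same split):

#include <cstddef>
#include <cstdlib>
#include <new>

struct NonCollected {
    static void *operator new(std::size_t n) {
        void *p = std::malloc(n);       // plain malloc: a GC never sees it
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void *p) { std::free(p); }
};

// Node instances now live outside the collected heap and must be freed
// manually - the programmer has opted to be "confident" here.
struct Node : NonCollected {
    Node *next;
    int value;
};

int main() {
    Node *n = new Node();   // goes through NonCollected::operator new
    delete n;               // manual, deterministic reclamation
}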
Feb 05 2003
"Ilya Minkov" <midiclub 8ung.at> wrote in message news:b1rrn0$gps$1 digitaldaemon.com...Granted the GC is on. If it's off, these all will pollute the memory. That's why a was arguing for 2 separate heaps, so that objects are created on GCed heap by default, or can (if the programmer is confident) created on a heap completely ignored by a GC. I know that being on GC heap shouldn't be a performance hit, after GC is improved. Except for the cases when objects are very pointer-rich (trees, lists and such), but in some cases it just doesn't make sense to scan objects, because the programmer has decided to follow his old C habits and they are all referenced from somewhere anyway. These should also be leak- and dangler-guarded at debug compile, and completely ignored by GC at release compile.I hear you, but I really think it's worth just taking advantage of the GC and not worrying about it.
Feb 05 2003
Hi, Ilya.

Ilya Minkov wrote:
> Seems to be kept secret though. I was searching for it, and was unable
> to find it.

Try: www.viasic.com/download/datadraw.tar.gz

There isn't any active community working on it. The code dates back to 1991, is a bit dated, and has gotten pretty hacked up over the years. Tools like these generally don't get used much. I find it really odd. It seems to me that programmers aren't really focused on productivity. The entire field of CASE is full of good tools that no one ever uses.

> The description is very interesting, though; i still don't seem to
> understand some of the mechanics of object deletion. This all seems to
> be about having a central object repository one way or another, and the
> scope of objects has to be known explicitly. That is, AFAIU it
> automates the normal C/C++/Delphi data management idioms?

Most of the code generators generate simple custom object databases. There are ways, for example, of traversing all the objects of a given class. However, there's really no magic here. It just outputs code that you might be tempted to write, given the class definitions and relationships. The nice thing is that it automates writing code that you might not have time to get right every time, like allocating memory for objects of a given class in pages, and then sub-allocating individual objects out of each page, and maintaining free lists for the class.

> I still don't understand how this simple (and for D vital) example is
> handled: string operations. Imagine some function has created a string,
> and returns it. Now, some operations are done on the string: copying,
> slicing, and so on. Copying and some cases of slicing (i guess) don't
> change the string, so it is not copied, for efficiency reasons. Now,
> the same piece of data has multiple users. Now some user makes a change
> in the data, desiring to keep the changes private. Then there are two
> cases:
> - No one else owns the data. It can be changed in-place;
> - The data has other users. It has to be copied.
> This means it needs to be known whether the piece of data has further
> users. Delphi solves this for strings by maintaining a reference
> counter. GC simplifies it, by allowing for "late" copying of data; it
> then finds out what data needs to be removed. It was possible to prove
> that GC is more efficient than reference counting in the general case
> of tiny objects and strings.

Strings fall into a difficult space. They're not really quite full-blown objects that represent something concrete. DataDraw doesn't deal with them, other than to allow you to copy them in and out of objects in the database. Strings, vectors, and matrices are "value" classes, rather than "object" classes. "Values" tend to live on the stack, and go away at the end of scope. "Objects" tend to live on the heap and have complex relationships to other objects. The problem with strings is having to copy all that data around as you would a simple int value. Yuk.

> Come to think of it: why did the X-Window servers constantly leak
> memory before they had a GC attached? Because they handle small
> structures, not centralized. And how do you get to know that you don't
> need a certain part in a centralized data storage? Or explain to me
> what i get wrong.

Objects in the centralized storage go away when deleted.
It's really not much different than in a GC based system, since you still need to remove objects from their relationships to other objects when you want them to go away. The annoying value objects with associated memory are a problem, but not for the centralized storage, if you don't allow them there.

String manipulation is typically a significant focus for both language design and benchmarks. Strings are a tricky issue that I feel tends to distract us from other important language applications. D seems to have strings nailed.

Bill
Feb 02 2003
Ross Judson says...
> rafael baptista wrote:
>> Asking the operating system for a buffer is very very slow. On Windows
>> it actually pauses your thread, and does an OS context switch. If you
>> are allocating tiny buffers straight from the OS all the time, you
>> need to go back to programming school. Memory should be allocated from
>> the OS in large chunks and then doled out inside your program as
>> needed - using your own optimized allocators.
>
> Dude, what do you think Java (and most other GC-enabled environments)
> do? They allocate big buffers from the OS, then divide it into various
> categories of storage (in Java's case, various generations). Highly
> efficient strategies are then used to quickly allocate chunks of memory
> that are used and discarded quickly. More stable allocations are
> transferred into storage that has strategies appropriate to that. If
> you're doing your own memory allocation code, you have to do a LOT of
> work, and write a LOT of bug free code, to get to where the Java
> allocator is. D needs GC. You're doing realtime work; stick with C,
> which is highly appropriate to that task!
>
> RJ

That's right, a GC will normally allocate memory in large blocks, which is a good practice, and so will anyone doing proper memory management. So what I mean is just that you can't say GC is faster by comparing it to any number of bad practices in manual memory management. There is no doubt that you can make systems to automatically manage memory that are reasonably efficient in a given domain... which is why I want the freedom to write my own. It's not that hard to do and has lots of advantages.

When necessary I will write my own GC-like manager for certain classes with unpredictable life cycles (and only for those). I will write fixed-size allocators for other things. I may have tens of allocators going at once, guaranteeing that certain types of data will end up on the same page of memory - for cache reasons. Sometimes I want to replace my heap manager with a debug version that lets me track memory leaks, or lets me graph what subsystems are using up the most memory, or lets me see what subsystems are giving me the most cache misses, etc. I may want to receive a warning when certain systems are above their memory budget. I may want to know about any buffers I have overwritten. I want low-level control of the memory management all the time. I want maximum flexibility there.

I will accept a slow system-defined "printf" function, or a slow file open dialog, or a fat and slow directory listing function. Those are called infrequently. But I don't want anything between me and the processor, and I don't want anything between me and system memory.

Professional programmers know that their customers only care about performance. No one buys software because the code is reusable, or easy to maintain, or simple for newbie programmers to understand. Your customers buy what is the fastest, has the most features, is easiest to use and makes best use of their hardware. They don't care about anything that makes your life easier. I won't be satisfied with the middle of the road, lowest common denominator, useful-for-all-purposes libraries that ship with my tools. I don't want to cede even one cycle, one byte... nothing to my competitor. Fast, proper, custom-written memory management is not just for games.

Also, ever wonder why you never see any shrink wrapped software written in Java?
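The "large chunks doled out by your own allocator" approach described here is small in code. A minimal fixed-size pool with an intrusive free list, as an illustration (not anyone's production allocator):

#include <cstddef>
#include <new>

// Grabs pages of slots at once; dole-outs and frees are O(1) pointer
// pushes/pops on an intrusive free list threaded through unused slots.
template <class T, std::size_t SlotsPerPage = 1024>
class Pool {
    union Slot {
        Slot *next;                                   // while free: list link
        alignas(T) unsigned char storage[sizeof(T)];  // while live: the object
    };
    struct Page { Page *next; Slot slots[SlotsPerPage]; };
    Page *pages = nullptr;
    Slot *freeList = nullptr;

    void grow() {                       // one big allocation...
        Page *p = new Page;
        p->next = pages;
        pages = p;
        for (std::size_t i = 0; i < SlotsPerPage; ++i) {  // ...sliced into slots
            p->slots[i].next = freeList;
            freeList = &p->slots[i];
        }
    }
public:
    void *allocate() {
        if (!freeList) grow();
        Slot *s = freeList;
        freeList = s->next;
        return s;
    }
    void release(void *mem) {
        Slot *s = static_cast<Slot *>(mem);
        s->next = freeList;
        freeList = s;
    }
    ~Pool() {
        while (pages) { Page *n = pages->next; delete pages; pages = n; }
    }
};

struct Bullet { float x, y; };

int main() {
    Pool<Bullet> bullets;                 // one pool per hot class, so all
    void *mem = bullets.allocate();       // Bullets share pages (cache win)
    Bullet *b = new (mem) Bullet();       // construct in place
    b->~Bullet();
    bullets.release(b);
}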
Jan 14 2003
rafael baptista wrote:
> Professional programmers know that their customers only care about
> performance. No one buys software because the code is reusable, or easy
> to maintain, or simple for newbie programmers to understand. Your
> customers buy what is the fastest, has the most features, is easiest to
> use and makes best use of their hardware. They don't care about
> anything that makes your life easier. I won't be satisfied with the
> middle of the road, lowest common denominator, useful-for-all-purposes
> libraries that ship with my tools. I don't want to cede even one cycle,
> one byte... nothing to my competitor. Fast, proper, custom-written
> memory management is not just for games. Also, ever wonder why you
> never see any shrink wrapped software written in Java?

Two things come to mind. First, some customers care about performance. Most care about _correctness_. The more a programmer has to do himself, the more opportunity there is to blow the correctness goal. There's a reason we don't do much work in assembly any more.

I'd say that there are a decent number of Java client-side applications out there, but they're not all that well known. Writing good client-side Java is hard work. Check out http://www.jgoodies.com for some very polished work by a talented guy.

Ever wonder why you see less and less server-side software written in C? Most of it is done in Java these days... there's a reason for that!

RJ
Jan 14 2003
rafael baptista wrote:
> In article <avv7ve$2tos$1 digitaldaemon.com>, Ilya Minkov says...
>> Please, believe me garbage collection (GC) is the future of game
>> design.
> I doubt it.

OK. Let me see. I've got an OpenGL game somewhere which is written in OCaml. I'll try to compile it and tell you what it yields. I haven't seen it yet; it's not necessarily a complicated one where all that would matter. OCaml is a 100% GC language with mixed powerful functional and imperative features, combined with OO, and there exists a GCC-based native compiler for it.

>> and such, graphic card, other devices, which are NOT REALTIME!
> You are right. That is why game programmers avoid accessing slow
> devices like disks in the middle of a game. If you absolutely must get
> something off disk you normally stream it in. You set up an interrupt
> driven process or something like that. So say I need a new model off
> the disk in a few seconds, I would place a request to the disk
> streaming subsystem to start getting the stuff off the disk. If you sit
> there like a fool waiting for your CD drive to spin up, then your game
> is very choppy.

How do you know the OS doesn't want to access the disk for some purpose? It frequently does. :) BTW, i might be writing a game in a couple of erhh... years, and i'll try to remember what you said. ;)

> I don't have to believe you. I write games for a living. ;)

OK, OK, i respect Your opinion a lot, you old rusty... :>

>> The author claims, that plugging in the Hans Boehm GC into ...
> I don't doubt it. Asking the operating system for a buffer is very very
> slow. On windows it actually pauses your thread, and does an OS context
> switch. If you are allocating tiny buffers straight from the OS all the
> time, you need to go back to programming school. Memory should be
> allocated from the OS in large chunks and then doled out inside your
> program as needed - using your own optimized allocators. This is what
> gets you cache coherency. This is what guarantees that all the vertex
> data for a model is in cache when you transform it, etc.

Naturally, i won't be keeping vertices one by one. ;)

> It is not fair to compare using the most highly optimized GC you have
> ever heard of against bad manual memory management practices.

OK.

> There is no heap compaction better than me ...

Strong Ego. :>

> in memory explicitly for cache coherency. Plus there is no penalty for
> me having holes in the memory space that are bigger than a memory page
> anyway. If a whole page is unused it will never be loaded into the
> cache and will never trouble me. Compacting it away is just wasting
> bandwidth on the bus that I could be using to load textures.

It might just as well only compact the heap if it makes sense :) (probably late though). And it might avoid moving something if not really needed. It's more an implementation issue than one of principle.

> The alternative to GC is not ref counting. Few objects have such an
> unpredictable life cycle that they cannot be managed any other way.

You probably mean unmanaged. :) Better GCs avoid bothering with old objects anyway.

> GC essentially embeds the ref counting inside the GC subsystem and
> makes you do some kind of reference tracking for every memory buffer
> whether you need it or not. I ref count very few objects in a given
> project.

But it does it only once in a while.

> And when I do I'm typically ref counting entire objects, not every
> single buffer. Plus I can manipulate the ref counts only when it
> matters.

I know. I thought about it a lot. Of course even the structures i mentioned can be used with an efficient reference counting.
I even came to the conclusion that it's required, and the GC would be shut off unless either weak pointers or object notification is implemented. Under the last i mean that the GC could run a designated method of every object which provides it, to notify it that the GC cycle has just run, that it has so many "parents", and here's the handle to the rest of the GC statistics, and so on. So you could teach your object to commit suicide after it's had only one parent for a while. This parent would be the central object repository, the cache.

> I don't have a GC system internally twiddling pointer ref counts
> mindlessly when all that is happening is that a temporary pointer was
> made for a function call or some other limited scope.

I don't think it's *that* important. It's just worth an experiment i'll be sure to do someday.

-i.
Jan 14 2003
Ilya Minkov wrote:
> which are highly complex meshes of (concave) objects. Now, as I read
                                      ^^^^^^^
Sorry, it should read "convex". Please remind me if i ever make another mistake like that, and i'll blow my brain out. ;)

It has to do with the fact that a ray hits a forward surface in a convex object only once, so that back-surface culling is sufficient for correct display; no sorting or z-buffer required. You simply start with the block you're in and draw the neighbors clipped through the portals. Though many aspects are not important for modern hardware anymore, the overdraw gets completely eliminated, and similar techniques eliminate overdraw in the other cases.

-i.
Jan 14 2003
"rafael baptista" <rafael_member pathlink.com> wrote in message news:avsfoq$6nc$1 digitaldaemon.com...The GC is worse though - because its in there operating - arbitrarilystallingmy program whether I use it or not.In D, the GC *only* gets called when a call is made to allocate memory. It is not in a separate thread arbitrarilly and randomly running. To deal with time sensitive code, if you preallocate all the data you'll need for the time sensitive portion, you'll never run the risk of the GC running in the middle of it. Note that a similar problem exists if you use malloc - the library can arbitrarilly decide to call the operating system to allocate more memory to the process, which can take arbitrarilly long amounts of time. The only way to guarantee latency, even in C++, is to preallocate the data.
Jan 23 2003
C++ can be fully garbage-collected. Just make a GC super class of all garbage collected classes that upon construction adds the object to a global internal list, then clear the list from time to time. For example:

class GC {
public:
    GC();
    virtual ~GC();
    static void Clear();
};

static CArray<GC*> _gc;

GC::GC()
{
    _gc += this;
}

GC::~GC()
{
    _gc -= this;
}

void GC::Clear()
{
    for(int i = 0; i < _gc.GetSize(); i++) {
        GC *gc = _gc[i];
        _gc[i] = 0;
        delete gc;
    }
    _gc.SetSize(0);
}

class Foo : public GC {
};

int main()
{
    //foo is garbage collected
    Foo *foo = new Foo;

    //clear dangling references; will delete 'foo'
    GC::Clear();
}

C++ is so flexible!!! D should be that way also. Garbage collection should be an option of the programmer.

"rafael baptista" <rafael_member pathlink.com> wrote in message news:avsfoq$6nc$1 digitaldaemon.com...
> I just saw the D spec today... and it is quite wonderful. There are so
> many ways that C and C++ could be better, and D addresses them. But I
> think D is a little too ambitious, to its detriment. It has a good
> side: it fixes syntactic and semantic problems with C/C++. It
> eliminates all kinds of misfeatures and cruft in C++. And D has a bad
> side: it attempts to implement all kinds of features that are best left
> to the programmer - not the language spec: built in string comparison,
> dynamic arrays, associative arrays, but the worst in my mind is the
> inclusion of garbage collection.
>
> As is pointed out in the D overview, garbage collection makes D
> unusable for realtime programming - this makes it unusable for kernel
> programming, device drivers, computer games, any real time graphics,
> industrial control etc. etc. It limits the language in a way that is
> really not necessary. Why not just add in language features that make
> it easier to integrate GC without making it a part of the language.
>
> For my part I wouldn't use GC on any project in any case. My experience
> has been that GC is inferior to manually managing memory in every case
> anyway. Why?
>
> Garbage collection is slower: There is an argument in the garbage
> collection section of the D documents that GC is somehow faster than
> manually managing memory. Essentially the argument boils down to:
> totally bungling manual memory management is slower than GC. Ok, that
> is true. If I program with tons of redundant destructors, and have
> massive memory leaks, and spend lots of time manually allocating and
> deallocating small buffers, and constantly twiddling thousands of ref
> counts, then managing memory manually is a disaster. Most programs
> manage memory properly, and the overhead of memory management is not a
> problem.
>
> The argument that GC does not suffer from memory leaks is also invalid.
> In a GC'd system, if I leave around references to memory that I never
> intend to use again, I am leaking memory just as surely as if I did not
> call delete on an allocated buffer in C++.
>
> Finally there is the argument that manually managing memory is somehow
> difficult and error prone. In C++ managing memory is simple - you
> create a few templatized container classes at the start of a project.
> They will allocate and free *all* of the memory in your program. You
> verify that they work correctly and then write the rest of your program
> using these classes. If you are calling "new" all over your C++ code
> you are asking for trouble. The advantage is that you have full control
> over how memory is managed in your program. I can plan exactly how I
> will deal with the different types of memory, where my data structures
> are placed in memory. I can code explicitly to make sure that certain
> data structures will be in cache memory at the same time.
> I had a similar reaction to having the compiler automatically build me
> hash tables, string classes and dynamic arrays. I normally code these
> things myself with particular attention to how they are going to be
> used. The unit test stuff is also redundant. I can choose to implement
> that stuff whether the language supports it or not. I normally make
> unit test programs that generate output that I can regress against
> earlier builds - but I wouldn't want to saddle arbitrary programs with
> having to run my often quite extensive and time consuming unit tests at
> startup. Imaginary numbers and bit fields should similarly be left up
> to the programmer.
>
> I suppose I can just choose not to use these features if I don't want
> to - but it strikes me as unnecessary cruft - and I subscribe to the
> idea beautifully expressed in the D overview that removing cruft makes
> a language better. The GC is worse though - because it's in there
> operating - arbitrarily stalling my program whether I use it or not.
>
> I urge the implementor of D to think about it this way: you have lots
> of ideas. Many of them are really good - but it is likely that despite
> your best efforts not all of them are. It would make your project much
> more likely to succeed if you decoupled the ideas as much as possible,
> such that the good ones can succeed without being weighed down by the
> bad. So wherever possible I think you should move functionality from
> the language spec to a standard library. This way the library can
> continue to improve, while the language proper could stabilize early.
>
> This is probably old ground - I assume lots of people have probably
> already commented on the GC. I looked for a thread on GC and could not
> find one. I for one would find D a lot more attractive if it did not
> have GC.
Jan 30 2003
Hi, Achillefs.

Achillefs Margaritis wrote:
> C++ can be fully garbage-collected. Just make a GC super class of all
> garbage collected classes that upon construction adds the object to a
> global internal list, then clear the list from time to time. For
> example:
>
> class GC {
> public:
>     GC();
>     virtual ~GC();
>     static void Clear();
> };
>
> static CArray<GC*> _gc;
>
> GC::GC() { _gc += this; }
> GC::~GC() { _gc -= this; }
>
> void GC::Clear()
> {
>     for(int i = 0; i < _gc.GetSize(); i++) {
>         GC *gc = _gc[i];
>         _gc[i] = 0;
>         delete gc;
>     }
>     _gc.SetSize(0);
> }
>
> class Foo : public GC {
> };
>
> int main()
> {
>     //foo is garbage collected
>     Foo *foo = new Foo;
>
>     //clear dangling references; will delete 'foo'
>     GC::Clear();
> }
>
> C++ is so flexible!!! D should be that way also. Garbage collection
> should be an option of the programmer.

This looks like fairly useful code in some cases, but it's not real GC. It simply deletes ALL GC derived objects that have ever been allocated but not manually deleted.

To make C++ do real GC is both very slow and very hard. You can use handle classes that are actual class objects. They get added to the root set when they are declared as local variables. When they're destroyed, they are removed from the root set. The container classes need to use handles that aren't in the root set, but which can be traversed given a pointer to the containing GC-derived object. Yuk.

The more common way in C++ is reference counting, which can be done with handle classes easily, but at a significant performance cost. Also, it doesn't really work. It can GC tree structures, but it fails to delete anything with a loop in it, which most data structures have. Even a linked list typically has the child objects pointing back at the parent. It's a loop. If you want good GC, it needs to be supported directly in the language.

I just don't have any use for GC. As I've said before, it doesn't really buy you much. All it does is change a delete statement into a direct call to the destructor, which you have to write anyway, or you won't get the object out of its relationships with other objects. A better way is to automatically generate recursive destructors. That's how it's done in relational databases (well, almost -- they do it in an interpreted way). Just declare that a relationship deletes its children when the parent is deleted. It couldn't be easier.

I've written languages that use GC (a Scheme interpreter). I even used the cool in-place GC (not mark-and-sweep), which runs just as fast as stop-and-copy (faster if you take caching and paging into account). Sure GC can be fast. But why complicate the language? It just isn't worth it.

Bill
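The cycle argument, restated in modern C++ terms with std::shared_ptr (which postdates this thread) since it makes the count arithmetic visible:

#include <memory>

struct Child;

struct Parent {
    std::shared_ptr<Child> first;       // owning "down" pointer
};

struct Child {
    std::shared_ptr<Parent> owner;      // counted back pointer: the loop
};

int main() {
    std::shared_ptr<Parent> p = std::make_shared<Parent>();
    p->first = std::make_shared<Child>();
    p->first->owner = p;                // cycle closed: p -> child -> p
}   // p goes out of scope, but both counts are still 1, so nothing is
    // destroyed (a leak). Making `owner` a std::weak_ptr<Parent> breaks it.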
Jan 31 2003
Achillefs Margaritis wrote:
> C++ is so flexible!!! D should be that way also. Garbage collection
> should be an option of the programmer.

Walter considers C++ too flexible. And he doesn't rob you of C++, he gives you another language. And you can't implement an efficient GC in C. However, i can't see why such a C++ one can't be implemented in D - if you consider it better, of course.

A GC can profit a lot from modified code generation. A compiler implementation could choose one of several automatic memory management solutions, which would not require any changes in code - that is, they are equivalent to the programmer, but have different efficiency characteristics.

- A simple mark-and-sweep collector, like D is using now, is just the beginning - i have heard that it doesn't profit from code generation yet and hence keeps executable size small;
- The mark phase can be accelerated if the compiler automatically generates an optimised marking method for each class. This is going to have a small size overhead, but reduces the total cost of GC to almost zero;
- A coloring collector can work in the background, and has very good run-time characteristics: it doesn't stop the main process, doesn't necessarily cause wild paging, and keeps the memory footprint rather tight. It might slow down an application a bit, in part because it requires wrapping pointer writes in a GC method. It might also increase the code size;
- Smart reference counting: i have stumbled over a document describing a reference counting memory manager which can also deal with self-referencing structures. Its advantage is that the application uses exactly as much memory as it needs, but the slowdown would be much higher than with a coloring GC, pauses might be the case, and the code size would increase significantly.

I'd say an implementation should be free to provide these or other automatic memory management systems, as far as they are compliant with a GC interface. I suppose that the memory manager interface is minimal on purpose, since things i have proposed before, like "mark-notification", are not possible with any managers except for mark-and-sweep.

I've read yet another, and this time better done, comparison of "static" and "scripting" languages, and it turns out that the decisions of Perl and Python, to allow the programmer to do complex tasks in a simple way, usually shave off half (or more) of the time required for writing an application. It may not be the most optimal way, but it is generally good for programme correctness. There, hundreds of programmers wrote a simple application in C, C++, Perl, Python, Ruby, Java, and TCL. It shows that though C and C++ might lead to an optimal solution, most often they lead to a very bad one, since the programmer tries to accomplish a task in a limited time and simplifies the job. In that example, all entries done in languages with native, simple support for associative arrays were correct and ran reasonably fast, while the others led to a number of broken programmes. It shows again that Walter's decision makes sense, since it makes it possible to write utilities in D about as simply as in Perl, with more confidence, and with faster execution. I'm pretty much sure D would devastatingly win this comparison.

-i.
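As an illustration of the "wrapping pointer writes in a GC method" point in the coloring-collector bullet above: a software write barrier is typically just a recording step placed in front of every pointer store. A sketch with invented names:

#include <vector>

struct Object {
    Object *fields[4];                  // the pointer slots the GC traces
};

static std::vector<Object *> rememberedSet;   // mutated since the last mark

// every pointer store in generated code is funneled through this helper,
// so an incremental collector knows which objects to re-scan (re-grey)
inline void writeField(Object *obj, int slot, Object *value) {
    rememberedSet.push_back(obj);       // the barrier itself: one push
    obj->fields[slot] = value;
}

int main() {
    Object a = Object(), b = Object();
    writeField(&a, 0, &b);              // a is now in the remembered set
}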
Jan 31 2003
Ilya Minkov wrote:
> A GC can profit a lot from modified code generation. A compiler
> implementation could choose one of several automatic memory management
> solutions, which would not require any changes in code - that is, they
> are equivalent to the programmer, but have different efficiency
> characteristics.

Let's see what happens when type-based scanning is in first; nobody knows what effect it'll have on scanning, but in a small utility I'm making (fractal browsing sample for dig) scanning takes 60 milliseconds but is scanning... 802 pointers over 1120001 dwords of data, so 1396 times too much data. Once that is pared down I predict I'll be able to call a full collection every second without having to worry about delays in almost every program. The exception will be VERY heavily pointer-oriented programs. We're talking Lisp level, and programs where you have an "Int" class. I don't care about those.
Jan 31 2003
"Burton Radons" <loth users.sourceforge.net> wrote in message news:b1eo7q$15m6$1 digitaldaemon.com...Let's see what happens when type-based scanning is in first; nobody knows what effect it'll have on scanning, but in a small utility I'm making (fractal browsing sample for dig) scanning takes 60 milliseconds but is scanning... 802 pointers over 1120001 dwords of data, so 1396 times too much data. Once that is pared down I predict I'll be able to call a full collection every second without having to worry about delays in almost every program. The exception will be VERY heavily pointer-oriented programs. We're talking Lisp level and programs where you have an "Int" class. I don't care about those.I have a lot of ideas about improving GC, such as making it a copying generational collector, implementing hardware write barriers, and using type info to mark whole regions as "not containing any pointers", etc. There is a LOT that can be done to boost performance.
Feb 03 2003
Walter wrote:"Burton Radons" <loth users.sourceforge.net> wrote in message news:b1eo7q$15m6$1 digitaldaemon.com...Implementing hardware write barriers? I didn't think that the hardware was there on most memory controllers. EvanLet's see what happens when type-based scanning is in first; nobody knows what effect it'll have on scanning, but in a small utility I'm making (fractal browsing sample for dig) scanning takes 60 milliseconds but is scanning... 802 pointers over 1120001 dwords of data, so 1396 times too much data. Once that is pared down I predict I'll be able to call a full collection every second without having to worry about delays in almost every program. The exception will be VERY heavily pointer-oriented programs. We're talking Lisp level and programs where you have an "Int" class. I don't care about those.I have a lot of ideas about improving GC, such as making it a copying generational collector, implementing hardware write barriers, and using type info to mark whole regions as "not containing any pointers", etc. There is a LOT that can be done to boost performance.
Feb 09 2003
>> I have a lot of ideas about improving GC, such as making it a copying
>> generational collector, implementing hardware write barriers, and
>> using type info to mark whole regions as "not containing any
>> pointers", etc. There is a LOT that can be done to boost performance.
> Implementing hardware write barriers? I didn't think that the hardware
> was there on most memory controllers.

I took that to be: marking the page as read only, and catching the page fault.
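A sketch of that page-protection scheme on POSIX, for illustration only - a real collector must be far more careful (mprotect from a signal handler is not strictly async-signal-safe, and MAP_ANONYMOUS is spelled MAP_ANON on some systems):

#include <sys/mman.h>
#include <unistd.h>
#include <signal.h>
#include <stdint.h>

static char *heap;
static long pageSize;

// record the faulting page as dirty, then unprotect it so the store retries
static void onFault(int, siginfo_t *info, void *) {
    char *page = (char *)((uintptr_t)info->si_addr &
                          ~(uintptr_t)(pageSize - 1));
    // a collector would add `page` to its dirty set here
    mprotect(page, pageSize, PROT_READ | PROT_WRITE);
}

int main() {
    pageSize = sysconf(_SC_PAGESIZE);
    heap = (char *)mmap(0, 16 * pageSize, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa = {};
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = onFault;
    sigaction(SIGSEGV, &sa, 0);

    mprotect(heap, 16 * pageSize, PROT_READ);  // arm the write barrier
    heap[123] = 1;   // first store faults once, is recorded, then succeeds
}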
Feb 09 2003
Mike Wynn wrote:
>> Implementing hardware write barriers? I didn't think that the hardware
>> was there on most memory controllers.
> I took that to be: marking the page as read only, and catching the page
> fault.

From what I have read (can't find the link right now), that usually results in a decrease in performance, because the whole page needs to be scanned once the barrier is faulted. You only know that the page is dirty, not how dirty, and then have to scan 4MB for what is possibly only one allocated object. I for one would rather go with real software read and write barriers, which can then be replaced with hooks to real hardware read and write barriers once they finally start being reincluded in memory chips, which I figure they will be before too long, now that Microsoft is pressing on the whole garbage collected

Evan
Feb 09 2003
"Evan McClanahan" <evan dontSPAMaltarinteractive.com> wrote in message news:b25v5t$2bms$1 digitaldaemon.com...Mike Wynn wrote:on a pentium the page can be 4K or 4M and I believe there is a 2M page too (May be PPro) and I agree that it well designed software write barriers are most likely cheaper (performance, maintance and development) 3 colour non copying concurrent collectors only actually require write and return barriers the people I used to know who write GC's don't believe in h/w gc, partly because it restrict what you can do with the object layouts and partly because you can't change or upgrade the gc. h/w that would assist one type of GC might hinder another. I think it will be a while before hardware gc becomes standard, there are CPU's that don't even have basic MM most old consoles and pda's (although I think WinCE devices have to have MM, I know PalmOS does not require it). and I think the PS2 has MM 'cos it can run linux (not uLinux) and of course if your device has a JavaVM, why bother with hardware MM, when you can do it all in software and save on sillicon. or use Tao intent or similar system. I would expect a CLR bytecode cpu, like the Sun and Arm attempts with Java Bytecode aware CPU's before h/w GC Mike.From what I have read, can't find the link right now, that usually results in a decrease in performance, because the whole page needs to be scanned once the barrer is faulted. You only know that the page is dirty, not how dirty, and then have to scan 4MB for what is possibly only one allocated object. I for one would rather go with real software read and write barriers, which can then be replaced with hooks to real hardware read and write barriers once they finally start being reincluded in memory chips, which I figure that they will be before too long, now that microsoft is pressing on the whole garbage collectedImplementing hardware write barriers? I didn't think that the hardware was there on most memory controllers.I took that to be; marking the page as read only. and catching the page fault.
Feb 09 2003
Evan McClanahan wrote:
> Mike Wynn wrote:
>> I took that to be: marking the page as read only, and catching the
>> page fault.
> From what I have read (can't find the link right now), that usually
> results in a decrease in performance, because the whole page needs to
> be scanned once the barrier is faulted. You only know that the page is
> dirty, not how dirty, and then have to scan 4MB for what is possibly
> only one allocated object. I for one would rather go with real software
> read and write barriers, which can then be replaced with hooks to real
> hardware read and write barriers once they finally start being
> reincluded in memory chips.

Every time the page status is changed, the TLB cache would need to be flushed (an expensive operation); this change would also have to be reflected across multiple processors on an SMP system. All of this would probably require support from the operating system, at least with respect to hooks to preset/reset/get the read/write flag of the page. The scanning of the page could be limited by picking a smaller page size; the 80x86 supports 4Kb as well as 4Mb pages (and 2Mb pages with page addressing extensions - see CR4.5 & CPUID(2)/EDX.6).

C 2003/2/9
Feb 08 2003
"Achillefs Margaritis" <axilmar b-online.gr> wrote in message news:b1ce36$1ir3$1 digitaldaemon.com...C++ can be fully garbage-collected. Just make a GC super class of all garbage collected classes that upon construction adds the object to aglobalinternal list, then clear the list from time to time.I know, I use GC in my C++ programs. There are some problems, though, because the C++ compiler provides no type information to the runtime. The way a C++ compiler is implemented restricts the kind of GC algorithms that can be used.
Feb 03 2003