digitalmars.D - Dynamic Linking & Memory Management

Benji Smith (65/65) Jan 26 2005 I've been interested to read some of the recent discussion about the D

Kris (5/70) Jan 26 2005 The gods be praised! Good example, Benji.

Matthew (13/94) Jan 26 2005 Indeed.

Kris (14/114) Jan 26 2005 Encore!

Matthew (18/140) Jan 26 2005 As I've said before, I want both models supportable, and I don't accept ...

Matthew (3/6) Jan 26 2005 btw, Java-phobic hyperbole aside, I should point out that until D can ha...

John Reimer (5/13) Jan 26 2005 Umm... have you updated your newsreader? Did you see Walter's recent

Matthew (3/15) Jan 26 2005 Coolio. We can get back to telling big-W what a star we think he is, now...

Matthew (8/74) Jan 26 2005 Hear, hear! (And thanks for spending the time to write such an eloquent
Walter (3/17) Jan 26 2005 I agree. I'm working on it.

Kris (4/22) Jan 26 2005 Hallelujah!

John Reimer (7/17) Jan 26 2005 :-D

Matthew (27/45) Jan 26 2005 Cool. I can shut my overbusy chops now! :-)

Kris (5/28) Jan 26 2005 I didn't quite follow all of that, but here's something that Pragma sugg...

Matthew (4/53) Jan 26 2005 I want to be able to write C-API DLLs in D, which I think would be the

Kris (18/41) Jan 26 2005 I should point out all the tricky stuff you're talking about just goes a...

Matthew (48/118) Jan 26 2005 True. But then one has the inordinate hassle of managing a Phobos.DLL

Kris (15/28) Jan 26 2005 So how about this:

Matthew (5/43) Jan 27 2005 What about an application, written in D and statically linked to a GC,

Kris (38/41) Jan 27 2005 Right.

Walter (14/21) Jan 28 2005 utilize that

Kris (10/32) Jan 28 2005 We're misunderstanding each other, Walter. But there's nothing unusual a...

Walter (8/13) Jan 28 2005 The way to do it is construct a pool of those soft references, so the gc

Dave (40/54) Jan 28 2005

pragma (43/51) Jan 28 2005 Hey, sure thing.

Kris (21/74) Jan 28 2005 Good points, Pragma. Another thing to consider, regarding the explicit u...

pragma (27/44) Jan 28 2005 Yep. This is why I've advocated that we all get into the habit of namin...

Kris (21/38) Jan 28 2005 Ahh; I was just referring to the earlier assertion that placing the GC i...

Benji Smith (6/7) Jan 27 2005 Fantastic. Thanks, Walter. I really appreciate how receptive you are

Benji Smith <dlanguage xxagg.com> writes:

I've been interested to read some of the recent discussion about the D
garbage collector, and I'd like to describe my current Java project to
give some perspective on what I think is ideal memory management.

I'm working on a technical analysis and simulation application for
historical stock market data. And despite the fact that I've written
about ten thousand lines of code myself, I'm using third-party
libraries for many aspects of the project. First of all, I'm using
JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
3 different JAR files that I include in my application's classpath.
The MS SQL Server drivers are contained in another 3 JAR files. I'm
also using the Xerces XML parser from the Apache group (3 more JARs),
the JFreeChart graphical charting compontents (5 more JARs), the JUnit
testing framework (1 JAR), a GNU commandline parsing library (1 JAR),
and a few other miscellaneous libraries. All told, I'm importing
functionality from more than fifteen different libraries. And the
application is still very small; by the time I�ve finished developing
it, I�ll probably be using twice as many libraries.

But when I write my code, I can write it as though I'm statically
linking with each of those libraries. I don't need to use special
export semantics when I need to call code from any of those third
party vendors. And with a few of those libraries (the commandline
parsing library in particular), I may end up writing my own
implementation. When I do, I won't have to change the semantics to
reflect the fact that I'm no longer using a compiled library. The rest
of my code can be completely agnostic to whether I'm linking with
source files, compiled class files, class files bundled into a JAR
package, classes generated at runtime through reflection hooks, or
classes loaded dynamically using a custom classloader. Since my
application needs to support third-party plug-in development (users
can load their own classes as custom charting indicators), dynamic
runtime loading of classes is essential to my design.

Ubiquitous static-linking would not be an option for me with this
application.

But there�s another important issue, too: debugging.

Last week I discovered a memory leak somewhere in my application. If I
allowed some of the analysis code to run for a few hours--combing
through all 18 million data points from the last 25 years of stock
market data--the heap would grow from its initial allocation (8 MB) to
its maximum allocation (256 MB). Luckily for me, all of those
allocations take place within a single virtual machine, which uses a
single garbage collector to manage all of the memory from all of the
libraries I'm using. That allows me to use a profiling application to
monitor the allocations of all the objects in the heap and--much more
importantly--to find out which objects are holding references to which
other objects. Within moments, I could see that the JDBC allocations
were getting cleaned up properly, but one of my custom collection
classes was failing�about 2% of the time�to release object references
it was no longer using. After an hour or so of tinkering with the
profiler, I was able to track down and fix that memory leak. The
application now uses a steady 12 MB of heap memory, no matter how long
it runs.

If I were building the same application with D, there would be fifteen
different garbage collectors operating in fifteen different
heap-spaces, and the objects allocated within one heap space might be
referenced by objects in another heap space, each managed by a
different garbage collector. It would be much more difficult to
develop a heap profiling tool that could successfully allow a
developer to navigate through such a fragmented heap space,
particularly if the developer needed to figure out which GC was
supposed to collect each out-of-reach object. Tracking down and fixing
that memory leak probably would have taken a lot longer than an hour
and a half.

Consequently, I strongly support the development of a model within D
that allows for a single GC instance per process. Any other scenario
sounds like a development & debugging nightmare.

Jan 26 2005

Kris <Kris_member pathlink.com> writes:

The gods be praised! Good example, Benji.

This is why it's been noted that D would not pass muster, in much of the
all-important commercial field. 

- Kris


In article <ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com>, Benji Smith says...
I've been interested to read some of the recent discussion about the D
garbage collector, and I'd like to describe my current Java project to
give some perspective on what I think is ideal memory management.

I'm working on a technical analysis and simulation application for
historical stock market data. And despite the fact that I've written
about ten thousand lines of code myself, I'm using third-party
libraries for many aspects of the project. First of all, I'm using
JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
3 different JAR files that I include in my application's classpath.
The MS SQL Server drivers are contained in another 3 JAR files. I'm
also using the Xerces XML parser from the Apache group (3 more JARs),
the JFreeChart graphical charting compontents (5 more JARs), the JUnit
testing framework (1 JAR), a GNU commandline parsing library (1 JAR),
and a few other miscellaneous libraries. All told, I'm importing
functionality from more than fifteen different libraries. And the
application is still very small; by the time I�ve finished developing
it, I�ll probably be using twice as many libraries.

But when I write my code, I can write it as though I'm statically
linking with each of those libraries. I don't need to use special
export semantics when I need to call code from any of those third
party vendors. And with a few of those libraries (the commandline
parsing library in particular), I may end up writing my own
implementation. When I do, I won't have to change the semantics to
reflect the fact that I'm no longer using a compiled library. The rest
of my code can be completely agnostic to whether I'm linking with
source files, compiled class files, class files bundled into a JAR
package, classes generated at runtime through reflection hooks, or
classes loaded dynamically using a custom classloader. Since my
application needs to support third-party plug-in development (users
can load their own classes as custom charting indicators), dynamic
runtime loading of classes is essential to my design.

Ubiquitous static-linking would not be an option for me with this
application.

But there�s another important issue, too: debugging.

Last week I discovered a memory leak somewhere in my application. If I
allowed some of the analysis code to run for a few hours--combing
through all 18 million data points from the last 25 years of stock
market data--the heap would grow from its initial allocation (8 MB) to
its maximum allocation (256 MB). Luckily for me, all of those
allocations take place within a single virtual machine, which uses a
single garbage collector to manage all of the memory from all of the
libraries I'm using. That allows me to use a profiling application to
monitor the allocations of all the objects in the heap and--much more
importantly--to find out which objects are holding references to which
other objects. Within moments, I could see that the JDBC allocations
were getting cleaned up properly, but one of my custom collection
classes was failing�about 2% of the time�to release object references
it was no longer using. After an hour or so of tinkering with the
profiler, I was able to track down and fix that memory leak. The
application now uses a steady 12 MB of heap memory, no matter how long
it runs.

If I were building the same application with D, there would be fifteen
different garbage collectors operating in fifteen different
heap-spaces, and the objects allocated within one heap space might be
referenced by objects in another heap space, each managed by a
different garbage collector. It would be much more difficult to
develop a heap profiling tool that could successfully allow a
developer to navigate through such a fragmented heap space,
particularly if the developer needed to figure out which GC was
supposed to collect each out-of-reach object. Tracking down and fixing
that memory leak probably would have taken a lot longer than an hour
and a half.

Consequently, I strongly support the development of a model within D
that allows for a single GC instance per process. Any other scenario
sounds like a development & debugging nightmare.

Jan 26 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

Indeed.

I, for one, can live with D 1.0 not having dynamic class loading, but
that would be a sine qua non for 2.0. Furthermore, I think 1.0 must have
a cooperative/unified GC architecture between link units, otherwise,
again, we'll just be writing our DLLs in C and all non-compiled in code
to an app will be via D extensions (which, while fun to write
occasionaly, will get *really* tiresome).

Or maybe I'm missing some deeper truth on the viability of D. If so, can
someone please enlighten me so I can stop carping on like a harbinger of
Doom.

The Dr .....


"Kris" <Kris_member pathlink.com> wrote in message
news:ct95fa$2tv5$1 digitaldaemon.com...
 The gods be praised! Good example, Benji.

 This is why it's been noted that D would not pass muster, in much of
 the
 all-important commercial field.

 - Kris


 In article <ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com>, Benji Smith
 says...
I've been interested to read some of the recent discussion about the D
garbage collector, and I'd like to describe my current Java project to
give some perspective on what I think is ideal memory management.

I'm working on a technical analysis and simulation application for
historical stock market data. And despite the fact that I've written
about ten thousand lines of code myself, I'm using third-party
libraries for many aspects of the project. First of all, I'm using
JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
3 different JAR files that I include in my application's classpath.
The MS SQL Server drivers are contained in another 3 JAR files. I'm
also using the Xerces XML parser from the Apache group (3 more JARs),
the JFreeChart graphical charting compontents (5 more JARs), the JUnit
testing framework (1 JAR), a GNU commandline parsing library (1 JAR),
and a few other miscellaneous libraries. All told, I'm importing
functionality from more than fifteen different libraries. And the
application is still very small; by the time I've finished developing
it, I'll probably be using twice as many libraries.

But when I write my code, I can write it as though I'm statically
linking with each of those libraries. I don't need to use special
export semantics when I need to call code from any of those third
party vendors. And with a few of those libraries (the commandline
parsing library in particular), I may end up writing my own
implementation. When I do, I won't have to change the semantics to
reflect the fact that I'm no longer using a compiled library. The rest
of my code can be completely agnostic to whether I'm linking with
source files, compiled class files, class files bundled into a JAR
package, classes generated at runtime through reflection hooks, or
classes loaded dynamically using a custom classloader. Since my
application needs to support third-party plug-in development (users
can load their own classes as custom charting indicators), dynamic
runtime loading of classes is essential to my design.

Ubiquitous static-linking would not be an option for me with this
application.

But there's another important issue, too: debugging.

Last week I discovered a memory leak somewhere in my application. If I
allowed some of the analysis code to run for a few hours--combing
through all 18 million data points from the last 25 years of stock
market data--the heap would grow from its initial allocation (8 MB) to
its maximum allocation (256 MB). Luckily for me, all of those
allocations take place within a single virtual machine, which uses a
single garbage collector to manage all of the memory from all of the
libraries I'm using. That allows me to use a profiling application to
monitor the allocations of all the objects in the heap and--much more
importantly--to find out which objects are holding references to which
other objects. Within moments, I could see that the JDBC allocations
were getting cleaned up properly, but one of my custom collection
classes was failing-about 2% of the time-to release object references
it was no longer using. After an hour or so of tinkering with the
profiler, I was able to track down and fix that memory leak. The
application now uses a steady 12 MB of heap memory, no matter how long
it runs.

If I were building the same application with D, there would be fifteen
different garbage collectors operating in fifteen different
heap-spaces, and the objects allocated within one heap space might be
referenced by objects in another heap space, each managed by a
different garbage collector. It would be much more difficult to
develop a heap profiling tool that could successfully allow a
developer to navigate through such a fragmented heap space,
particularly if the developer needed to figure out which GC was
supposed to collect each out-of-reach object. Tracking down and fixing
that memory leak probably would have taken a lot longer than an hour
and a half.

Consequently, I strongly support the development of a model within D
that allows for a single GC instance per process. Any other scenario
sounds like a development & debugging nightmare.

Jan 26 2005

Kris <Kris_member pathlink.com> writes:

Encore!

And lest we forget: D would happily support the all-important unified GC (per
process) if the GC were simply moved to a shared-lib. Further, Sean has already
invested the majority of effort in carefully extracting the GC from the runtime,
such that this is a near reality.

Walter notes that he's had customer-support difficulties in the past over
shared-libs, due to the vagaries of Win32. Unfortunately, that negative
experience is being reflected directly in the range of valid programming models
effectively supported by the D language. 

We /really/ need to move forward on this issue. Perhaps we can start (yet again)
by asking why Walter feels we're all so much better off with a proliferation of
GC instances instead of just one, easily manageable, instance?

- Kris


In article <ct968s$2uv3$1 digitaldaemon.com>, Matthew says...
Indeed.

I, for one, can live with D 1.0 not having dynamic class loading, but
that would be a sine qua non for 2.0. Furthermore, I think 1.0 must have
a cooperative/unified GC architecture between link units, otherwise,
again, we'll just be writing our DLLs in C and all non-compiled in code
to an app will be via D extensions (which, while fun to write
occasionaly, will get *really* tiresome).

Or maybe I'm missing some deeper truth on the viability of D. If so, can
someone please enlighten me so I can stop carping on like a harbinger of
Doom.

The Dr .....


"Kris" <Kris_member pathlink.com> wrote in message
news:ct95fa$2tv5$1 digitaldaemon.com...
 The gods be praised! Good example, Benji.

 This is why it's been noted that D would not pass muster, in much of
 the
 all-important commercial field.

 - Kris


 In article <ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com>, Benji Smith
 says...
I've been interested to read some of the recent discussion about the D
garbage collector, and I'd like to describe my current Java project to
give some perspective on what I think is ideal memory management.

I'm working on a technical analysis and simulation application for
historical stock market data. And despite the fact that I've written
about ten thousand lines of code myself, I'm using third-party
libraries for many aspects of the project. First of all, I'm using
JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
3 different JAR files that I include in my application's classpath.
The MS SQL Server drivers are contained in another 3 JAR files. I'm
also using the Xerces XML parser from the Apache group (3 more JARs),
the JFreeChart graphical charting compontents (5 more JARs), the JUnit
testing framework (1 JAR), a GNU commandline parsing library (1 JAR),
and a few other miscellaneous libraries. All told, I'm importing
functionality from more than fifteen different libraries. And the
application is still very small; by the time I've finished developing
it, I'll probably be using twice as many libraries.

But when I write my code, I can write it as though I'm statically
linking with each of those libraries. I don't need to use special
export semantics when I need to call code from any of those third
party vendors. And with a few of those libraries (the commandline
parsing library in particular), I may end up writing my own
implementation. When I do, I won't have to change the semantics to
reflect the fact that I'm no longer using a compiled library. The rest
of my code can be completely agnostic to whether I'm linking with
source files, compiled class files, class files bundled into a JAR
package, classes generated at runtime through reflection hooks, or
classes loaded dynamically using a custom classloader. Since my
application needs to support third-party plug-in development (users
can load their own classes as custom charting indicators), dynamic
runtime loading of classes is essential to my design.

Ubiquitous static-linking would not be an option for me with this
application.

But there's another important issue, too: debugging.

Last week I discovered a memory leak somewhere in my application. If I
allowed some of the analysis code to run for a few hours--combing
through all 18 million data points from the last 25 years of stock
market data--the heap would grow from its initial allocation (8 MB) to
its maximum allocation (256 MB). Luckily for me, all of those
allocations take place within a single virtual machine, which uses a
single garbage collector to manage all of the memory from all of the
libraries I'm using. That allows me to use a profiling application to
monitor the allocations of all the objects in the heap and--much more
importantly--to find out which objects are holding references to which
other objects. Within moments, I could see that the JDBC allocations
were getting cleaned up properly, but one of my custom collection
classes was failing-about 2% of the time-to release object references
it was no longer using. After an hour or so of tinkering with the
profiler, I was able to track down and fix that memory leak. The
application now uses a steady 12 MB of heap memory, no matter how long
it runs.

If I were building the same application with D, there would be fifteen
different garbage collectors operating in fifteen different
heap-spaces, and the objects allocated within one heap space might be
referenced by objects in another heap space, each managed by a
different garbage collector. It would be much more difficult to
develop a heap profiling tool that could successfully allow a
developer to navigate through such a fragmented heap space,
particularly if the developer needed to figure out which GC was
supposed to collect each out-of-reach object. Tracking down and fixing
that memory leak probably would have taken a lot longer than an hour
and a half.

Consequently, I strongly support the development of a model within D
that allows for a single GC instance per process. Any other scenario
sounds like a development & debugging nightmare.

Jan 26 2005

"Matthew" <admin.hat stlsoft.dot.org> writes:

 And lest we forget: D would happily support the all-important unified GC (per
 process) if the GC were simply moved to a shared-lib. Further, Sean has already
 invested the majority of effort in carefully extracting the GC from the
runtime,
 such that this is a near reality.

As I've said before, I want both models supportable, and I don't accept that
this is technically infeasible.

Notwithstanding, if both are not, then we must go for a separation between the
pure statically linked model and the 
dynamically linked GC model. If we stay with static linking only, D's a joke,
isn't it?

 Walter notes that he's had customer-support difficulties in the past over
 shared-libs, due to the vagaries of Win32. Unfortunately, that negative
 experience is being reflected directly in the range of valid programming models
 effectively supported by the D language.

I agree. And I think this will kill D. As I've whinged and whined on, I can't
understand how Walter thinks that D will 
be viable with the status quo.

Alas, though Walter has huge amounts of valuable experience and insight (more
than mine, I would hazard), I think he 
fails to recognise, or at least act on, two important facts:

    1. he doesn't have *all* experience. None of us do. And, much more
importantly, ...
    2. many of us do not have any serious problems doing *very successful* (see
below) work in C/C++. If D is not a 
quantum leap forward, _without_ new hassles, then why the hell is anyone ever
going to use it? Because it's better than 
Java?!? Pah!

 We /really/ need to move forward on this issue. Perhaps we can start (yet
again)
 by asking why Walter feels we're all so much better off with a proliferation of
 GC instances instead of just one, easily manageable, instance?

We're not better off with that. We're nowhere with that! Someone turn out the
lights on their way out ...

The Dr .....

(below to be seen): I've worked on several highly commercially important
projects over the last several years, most of 
which have been (primarily) implemented in C++. All the guff that people
generally whinge on about as problems in C++ 
have proved either non-existant, irrelevant, or easily amenable to good
practice. Some of these are still in production, 
2, 4, 5 years later, and have never had a millisecond of downtime. So why do we
need D, if it's going to be 
hassle-bundled?



 In article <ct968s$2uv3$1 digitaldaemon.com>, Matthew says...
Indeed.

I, for one, can live with D 1.0 not having dynamic class loading, but
that would be a sine qua non for 2.0. Furthermore, I think 1.0 must have
a cooperative/unified GC architecture between link units, otherwise,
again, we'll just be writing our DLLs in C and all non-compiled in code
to an app will be via D extensions (which, while fun to write
occasionaly, will get *really* tiresome).

Or maybe I'm missing some deeper truth on the viability of D. If so, can
someone please enlighten me so I can stop carping on like a harbinger of
Doom.

The Dr .....


"Kris" <Kris_member pathlink.com> wrote in message
news:ct95fa$2tv5$1 digitaldaemon.com...
 The gods be praised! Good example, Benji.

 This is why it's been noted that D would not pass muster, in much of
 the
 all-important commercial field.

 - Kris


 In article <ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com>, Benji Smith
 says...
I've been interested to read some of the recent discussion about the D
garbage collector, and I'd like to describe my current Java project to
give some perspective on what I think is ideal memory management.

I'm working on a technical analysis and simulation application for
historical stock market data. And despite the fact that I've written
about ten thousand lines of code myself, I'm using third-party
libraries for many aspects of the project. First of all, I'm using
JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
3 different JAR files that I include in my application's classpath.
The MS SQL Server drivers are contained in another 3 JAR files. I'm
also using the Xerces XML parser from the Apache group (3 more JARs),
the JFreeChart graphical charting compontents (5 more JARs), the JUnit
testing framework (1 JAR), a GNU commandline parsing library (1 JAR),
and a few other miscellaneous libraries. All told, I'm importing
functionality from more than fifteen different libraries. And the
application is still very small; by the time I've finished developing
it, I'll probably be using twice as many libraries.

But when I write my code, I can write it as though I'm statically
linking with each of those libraries. I don't need to use special
export semantics when I need to call code from any of those third
party vendors. And with a few of those libraries (the commandline
parsing library in particular), I may end up writing my own
implementation. When I do, I won't have to change the semantics to
reflect the fact that I'm no longer using a compiled library. The rest
of my code can be completely agnostic to whether I'm linking with
source files, compiled class files, class files bundled into a JAR
package, classes generated at runtime through reflection hooks, or
classes loaded dynamically using a custom classloader. Since my
application needs to support third-party plug-in development (users
can load their own classes as custom charting indicators), dynamic
runtime loading of classes is essential to my design.

Ubiquitous static-linking would not be an option for me with this
application.

But there's another important issue, too: debugging.

Last week I discovered a memory leak somewhere in my application. If I
allowed some of the analysis code to run for a few hours--combing
through all 18 million data points from the last 25 years of stock
market data--the heap would grow from its initial allocation (8 MB) to
its maximum allocation (256 MB). Luckily for me, all of those
allocations take place within a single virtual machine, which uses a
single garbage collector to manage all of the memory from all of the
libraries I'm using. That allows me to use a profiling application to
monitor the allocations of all the objects in the heap and--much more
importantly--to find out which objects are holding references to which
other objects. Within moments, I could see that the JDBC allocations
were getting cleaned up properly, but one of my custom collection
classes was failing-about 2% of the time-to release object references
it was no longer using. After an hour or so of tinkering with the
profiler, I was able to track down and fix that memory leak. The
application now uses a steady 12 MB of heap memory, no matter how long
it runs.

If I were building the same application with D, there would be fifteen
different garbage collectors operating in fifteen different
heap-spaces, and the objects allocated within one heap space might be
referenced by objects in another heap space, each managed by a
different garbage collector. It would be much more difficult to
develop a heap profiling tool that could successfully allow a
developer to navigate through such a fragmented heap space,
particularly if the developer needed to figure out which GC was
supposed to collect each out-of-reach object. Tracking down and fixing
that memory leak probably would have taken a lot longer than an hour
and a half.

Consequently, I strongly support the development of a model within D
that allows for a single GC instance per process. Any other scenario
sounds like a development & debugging nightmare.

Jan 26 2005

"Matthew" <admin.hat stlsoft.dot.org> writes:

"Matthew" <admin.hat stlsoft.dot.org> wrote in message
news:ct9gl9$9k4$1 digitaldaemon.com...
    2. many of us do not have any serious problems doing *very successful* (see
below) work in C/C++. If D is not a 
 quantum leap forward, _without_ new hassles, then why the hell is anyone ever
going to use it? Because it's better 
 than Java?!? Pah!

btw, Java-phobic hyperbole aside, I should point out that until D can handle
scenarios such as outlined in Benji's 
excellent post, D isn't even fit to kiss the bloated arse of Java. And that's a
sad position to be in, to be sure ...

Jan 26 2005

John Reimer <brk_6502 yahoo.com> writes:

On Thu, 27 Jan 2005 12:42:57 +1100, Matthew wrote:

 
 "Matthew" <admin.hat stlsoft.dot.org> wrote in message
news:ct9gl9$9k4$1 digitaldaemon.com...
    2. many of us do not have any serious problems doing *very successful* (see
below) work in C/C++. If D is not a 
 quantum leap forward, _without_ new hassles, then why the hell is anyone ever
going to use it? Because it's better 
 than Java?!? Pah!

 
 btw, Java-phobic hyperbole aside, I should point out that until D can handle
scenarios such as outlined in Benji's 
 excellent post, D isn't even fit to kiss the bloated arse of Java. And that's
a sad position to be in, to be sure ...

Umm... have you updated your newsreader?  Did you see Walter's recent
response in this topic?

It seems the point has been taken. :-)

- John R.

Jan 26 2005

"Matthew" <admin.hat stlsoft.dot.org> writes:

"John Reimer" <brk_6502 yahoo.com> wrote in message
news:pan.2005.01.27.01.43.39.773589 yahoo.com...
 On Thu, 27 Jan 2005 12:42:57 +1100, Matthew wrote:

 "Matthew" <admin.hat stlsoft.dot.org> wrote in message
news:ct9gl9$9k4$1 digitaldaemon.com...
    2. many of us do not have any serious problems doing *very successful* (see
below) work in C/C++. If D is not a
 quantum leap forward, _without_ new hassles, then why the hell is anyone ever
going to use it? Because it's better
 than Java?!? Pah!

 btw, Java-phobic hyperbole aside, I should point out that until D can handle
scenarios such as outlined in Benji's
 excellent post, D isn't even fit to kiss the bloated arse of Java. And that's
a sad position to be in, to be sure ...

 Umm... have you updated your newsreader?  Did you see Walter's recent
 response in this topic?

Been having issues with my cable, and local net.

 It seems the point has been taken. :-)

Coolio. We can get back to telling big-W what a star we think he is, now. <CG>

Jan 26 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

Hear, hear! (And thanks for spending the time to write such an eloquent 
and persuasive post.)

I not only agree with everything you say, I think we should overtly 
state that without support such as this, D is Doomed: Dead, Duck-like, 
save for small self-contained utility programs (for which C, never mind 
C++, suffices adequately, IMO).


"Benji Smith" <dlanguage xxagg.com> wrote in message 
news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 I've been interested to read some of the recent discussion about the D
 garbage collector, and I'd like to describe my current Java project to
 give some perspective on what I think is ideal memory management.

 I'm working on a technical analysis and simulation application for
 historical stock market data. And despite the fact that I've written
 about ten thousand lines of code myself, I'm using third-party
 libraries for many aspects of the project. First of all, I'm using
 JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
 3 different JAR files that I include in my application's classpath.
 The MS SQL Server drivers are contained in another 3 JAR files. I'm
 also using the Xerces XML parser from the Apache group (3 more JARs),
 the JFreeChart graphical charting compontents (5 more JARs), the JUnit
 testing framework (1 JAR), a GNU commandline parsing library (1 JAR),
 and a few other miscellaneous libraries. All told, I'm importing
 functionality from more than fifteen different libraries. And the
 application is still very small; by the time I've finished developing
 it, I'll probably be using twice as many libraries.

 But when I write my code, I can write it as though I'm statically
 linking with each of those libraries. I don't need to use special
 export semantics when I need to call code from any of those third
 party vendors. And with a few of those libraries (the commandline
 parsing library in particular), I may end up writing my own
 implementation. When I do, I won't have to change the semantics to
 reflect the fact that I'm no longer using a compiled library. The rest
 of my code can be completely agnostic to whether I'm linking with
 source files, compiled class files, class files bundled into a JAR
 package, classes generated at runtime through reflection hooks, or
 classes loaded dynamically using a custom classloader. Since my
 application needs to support third-party plug-in development (users
 can load their own classes as custom charting indicators), dynamic
 runtime loading of classes is essential to my design.

 Ubiquitous static-linking would not be an option for me with this
 application.

 But there's another important issue, too: debugging.

 Last week I discovered a memory leak somewhere in my application. If I
 allowed some of the analysis code to run for a few hours--combing
 through all 18 million data points from the last 25 years of stock
 market data--the heap would grow from its initial allocation (8 MB) to
 its maximum allocation (256 MB). Luckily for me, all of those
 allocations take place within a single virtual machine, which uses a
 single garbage collector to manage all of the memory from all of the
 libraries I'm using. That allows me to use a profiling application to
 monitor the allocations of all the objects in the heap and--much more
 importantly--to find out which objects are holding references to which
 other objects. Within moments, I could see that the JDBC allocations
 were getting cleaned up properly, but one of my custom collection
 classes was failing-about 2% of the time-to release object references
 it was no longer using. After an hour or so of tinkering with the
 profiler, I was able to track down and fix that memory leak. The
 application now uses a steady 12 MB of heap memory, no matter how long
 it runs.

 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.

Jan 26 2005

"Walter" <newshound digitalmars.com> writes:

"Benji Smith" <dlanguage xxagg.com> wrote in message
news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.

I agree. I'm working on it.

Jan 26 2005

Kris <Kris_member pathlink.com> writes:

In article <ct9afo$2l8$1 digitaldaemon.com>, Walter says...
"Benji Smith" <dlanguage xxagg.com> wrote in message
news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.

I agree. I'm working on it.

Hallelujah!

I will now shutup about this for a while :-)

(can I get another Hallelujah?)

Jan 26 2005

John Reimer <brk_6502 yahoo.com> writes:

On Thu, 27 Jan 2005 00:16:37 +0000, Kris wrote:

 In article <ct9afo$2l8$1 digitaldaemon.com>, Walter says...

I agree. I'm working on it.

 
 Hallelujah!
 
 I will now shutup about this for a while :-)
 
 (can I get another Hallelujah?)

:-D

This is indeed good news!

And Walter, thanks for listening.  Nice to see that you can be so
resilient despite being pounded like a fence post (but it was for a good
cause). :-)

- John R.

Jan 26 2005

"Matthew" <admin.hat stlsoft.dot.org> writes:

"Walter" <newshound digitalmars.com> wrote in message
news:ct9afo$2l8$1 digitaldaemon.com...
 "Benji Smith" <dlanguage xxagg.com> wrote in message
 news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.

 I agree. I'm working on it.

Cool. I can shut my overbusy chops now! :-)

Please may we have intelligent GCs that can detect each other, within a process
space, and connect up in a sensible, and 
dynamic-lib-unloading proof fashion?

Here's how I think it might work, roughly:

    If the process is written in D, it'll have a GC built-in (statically
linked). [I agree with Walter 100% that 
processes should have a GC built in, and not have to rely on a DLL. Painful for
small process distribution, along with 
other DLL issues ...]
    Now any subsequent D dynamic link-unit, whether that be a DDL (i.e.
'exposing' D classes in an analogous fashion to 
Java's JARs), or a DLL (i.e. exposing a C API/interface, implemented in D),
when loaded, does not create its own GC but 
links to that of the process.
    Easy peasy, because its axiomatic that the life of any loaded dynamic
link-units cannot exceed the life of the host 
process

    If the process is *not* written in D, here's where it gets interesting.
IMO, it should be possible to have the same 
GC-locating mechanism attach a subsequent D link-unit (whether DDL or DLL). The
problem is, how do we keep the first D 
link-unit's code and data locked in memory after the application has unloaded
it.
    One answer would be to just increment the first link-unit's ref count (e.g.
another call to dlopen()/LoadLibrary()) 
and then discard it. This would work fine, never crash, but would cause issues
in long running servers that need to 
unload modules to pick up newer versions.
    Another answer would be that each GC-dependent link-unit would add such a
lock onto the first-one-in, and release it 
when they're released. There may be issues here, however, since one can get
into trouble (un)loading libs during lib 
(un)loading (albeit that I've never run into this outside C++.NET).
    In either case, a linked-to GC would have to expose a method for either
locking it, or passing its instance 
name/handle to be locked

Naturally, we have the issue of how a to-be-created GC detects and connects to
a prior instance, and how this can be 
made thread-safe. That's a topic for discussion ...

The Dr .....

Jan 26 2005

Kris <Kris_member pathlink.com> writes:

In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
Here's how I think it might work, roughly:

    If the process is written in D, it'll have a GC built-in (statically
linked). [I agree with Walter 100% that 
processes should have a GC built in, and not have to rely on a DLL. Painful for
small process distribution, along with 
other DLL issues ...]
    Now any subsequent D dynamic link-unit, whether that be a DDL (i.e.
'exposing' D classes in an analogous fashion to 
Java's JARs), or a DLL (i.e. exposing a C API/interface, implemented in D),
when loaded, does not create its own GC but 
links to that of the process.
    Easy peasy, because its axiomatic that the life of any loaded dynamic
link-units cannot exceed the life of the host 
process

    If the process is *not* written in D, here's where it gets interesting.
IMO, it should be possible to have the same 
GC-locating mechanism attach a subsequent D link-unit (whether DDL or DLL). The
problem is, how do we keep the first D 
link-unit's code and data locked in memory after the application has unloaded
it.
    One answer would be to just increment the first link-unit's ref count (e.g.
another call to dlopen()/LoadLibrary()) 
and then discard it. This would work fine, never crash, but would cause issues
in long running servers that need to 
unload modules to pick up newer versions.
    Another answer would be that each GC-dependent link-unit would add such a
lock onto the first-one-in, and release it 
when they're released. There may be issues here, however, since one can get
into trouble (un)loading libs during lib 
(un)loading (albeit that I've never run into this outside C++.NET).
    In either case, a linked-to GC would have to expose a method for either
locking it, or passing its instance 
name/handle to be locked

Naturally, we have the issue of how a to-be-created GC detects and connects to
a prior instance, and how this can be 
made thread-safe. That's a topic for discussion ...

The Dr .....

I didn't quite follow all of that, but here's something that Pragma suggested
many moons ago: if external link-units were managed by the "one & only" GC, it
could happily reap them when, and only when, there are no more live references.

I know that's a somewhat trivial statement, but it has powerful mojo;

Jan 26 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message 
news:ct9k70$cv5$1 digitaldaemon.com...
 In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
Here's how I think it might work, roughly:

    If the process is written in D, it'll have a GC built-in 
 (statically linked). [I agree with Walter 100% that
processes should have a GC built in, and not have to rely on a DLL. 
Painful for small process distribution, along with
other DLL issues ...]
    Now any subsequent D dynamic link-unit, whether that be a DDL 
 (i.e. 'exposing' D classes in an analogous fashion to
Java's JARs), or a DLL (i.e. exposing a C API/interface, implemented 
in D), when loaded, does not create its own GC but
links to that of the process.
    Easy peasy, because its axiomatic that the life of any loaded 
 dynamic link-units cannot exceed the life of the host
process

    If the process is *not* written in D, here's where it gets 
 interesting. IMO, it should be possible to have the same
GC-locating mechanism attach a subsequent D link-unit (whether DDL or 
DLL). The problem is, how do we keep the first D
link-unit's code and data locked in memory after the application has 
unloaded it.
    One answer would be to just increment the first link-unit's ref 
 count (e.g. another call to dlopen()/LoadLibrary())
and then discard it. This would work fine, never crash, but would 
cause issues in long running servers that need to
unload modules to pick up newer versions.
    Another answer would be that each GC-dependent link-unit would add 
 such a lock onto the first-one-in, and release it
when they're released. There may be issues here, however, since one 
can get into trouble (un)loading libs during lib
(un)loading (albeit that I've never run into this outside C++.NET).
    In either case, a linked-to GC would have to expose a method for 
 either locking it, or passing its instance
name/handle to be locked

Naturally, we have the issue of how a to-be-created GC detects and 
connects to a prior instance, and how this can be
made thread-safe. That's a topic for discussion ...

The Dr .....

 I didn't quite follow all of that, but here's something that Pragma 
 suggested
 many moons ago: if external link-units were managed by the "one & 
 only" GC, it
 could happily reap them when, and only when, there are no more live 
 references.

 I know that's a somewhat trivial statement, but it has powerful mojo;

I want to be able to write C-API DLLs in D, which I think would be the 
fly in that mojo.

Jan 26 2005

Kris <Kris_member pathlink.com> writes:

In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
Here's how I think it might work, roughly:

    If the process is written in D, it'll have a GC built-in (statically
linked). [I agree with Walter 100% that 
processes should have a GC built in, and not have to rely on a DLL. Painful for
small process distribution, along with 
other DLL issues ...]
    Now any subsequent D dynamic link-unit, whether that be a DDL (i.e.
'exposing' D classes in an analogous fashion to 
Java's JARs), or a DLL (i.e. exposing a C API/interface, implemented in D),
when loaded, does not create its own GC but 
links to that of the process.
    Easy peasy, because its axiomatic that the life of any loaded dynamic
link-units cannot exceed the life of the host 
process

    If the process is *not* written in D, here's where it gets interesting.
IMO, it should be possible to have the same 
GC-locating mechanism attach a subsequent D link-unit (whether DDL or DLL). The
problem is, how do we keep the first D 
link-unit's code and data locked in memory after the application has unloaded
it.
    One answer would be to just increment the first link-unit's ref count (e.g.
another call to dlopen()/LoadLibrary()) 
and then discard it. This would work fine, never crash, but would cause issues
in long running servers that need to 
unload modules to pick up newer versions.
    Another answer would be that each GC-dependent link-unit would add such a
lock onto the first-one-in, and release it 
when they're released. There may be issues here, however, since one can get
into trouble (un)loading libs during lib 
(un)loading (albeit that I've never run into this outside C++.NET).
    In either case, a linked-to GC would have to expose a method for either
locking it, or passing its instance 
name/handle to be locked

Naturally, we have the issue of how a to-be-created GC detects and connects to
a prior instance, and how this can be 
made thread-safe. That's a topic for discussion ...

The Dr .....


I should point out all the tricky stuff you're talking about just goes away if
the GC is simply a shared-lib (DLL or DDL). May I ask, Matthew: what is your
discomfort with shared-libs? I ask because I just don't know why the Microsoft
O/S-related version-issue can't be resolved (to an acceptable degree) for one
specific instance (the GC). 

On a regular basis, I build commercial frameworks that dynamically load/unload
mobile code; part of any true solution has to allow for unloading said code, and
the mobile code itself should not be bloated out with multiple instances of the
GC implementation. 

Nor, come to think of it, should it be carrying all the floating-point support
that the damned Object.printf() brings in with it. Both go against Walter's "no
code bloat!" mantra and, franky, are completely unecessary. Oh, and before
anyone say's something about disk-space; the latter is purely about transmission
bandwidth & latency.

Still; I'll try not speculate upon the outcome.

- Kris


I said I'd shutup; I guess I lied :-(

Jan 26 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message 
news:ct9ltc$es1$1 digitaldaemon.com...
 In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
Here's how I think it might work, roughly:

    If the process is written in D, it'll have a GC built-in 
 (statically linked). [I agree with Walter 100% that
processes should have a GC built in, and not have to rely on a DLL. 
Painful for small process distribution, along with
other DLL issues ...]
    Now any subsequent D dynamic link-unit, whether that be a DDL 
 (i.e. 'exposing' D classes in an analogous fashion to
Java's JARs), or a DLL (i.e. exposing a C API/interface, implemented 
in D), when loaded, does not create its own GC but
links to that of the process.
    Easy peasy, because its axiomatic that the life of any loaded 
 dynamic link-units cannot exceed the life of the host
process

    If the process is *not* written in D, here's where it gets 
 interesting. IMO, it should be possible to have the same
GC-locating mechanism attach a subsequent D link-unit (whether DDL or 
DLL). The problem is, how do we keep the first D
link-unit's code and data locked in memory after the application has 
unloaded it.
    One answer would be to just increment the first link-unit's ref 
 count (e.g. another call to dlopen()/LoadLibrary())
and then discard it. This would work fine, never crash, but would 
cause issues in long running servers that need to
unload modules to pick up newer versions.
    Another answer would be that each GC-dependent link-unit would add 
 such a lock onto the first-one-in, and release it
when they're released. There may be issues here, however, since one 
can get into trouble (un)loading libs during lib
(un)loading (albeit that I've never run into this outside C++.NET).
    In either case, a linked-to GC would have to expose a method for 
 either locking it, or passing its instance
name/handle to be locked

Naturally, we have the issue of how a to-be-created GC detects and 
connects to a prior instance, and how this can be
made thread-safe. That's a topic for discussion ...

The Dr .....


 I should point out all the tricky stuff you're talking about just goes 
 away if
 the GC is simply a shared-lib (DLL or DDL).

True. But then one has the inordinate hassle of managing a Phobos.DLL 
for any little utility program.

Note: I use the word 'inordinate' there in context. Having to care about 
Phobos.DLL for a utility would, for me, rule out D as a language for 
implementing all the small tools and utilities (as in 
http://synesis.com.au/systools.html) that I write. That's a 100% 
certainty. Hence, I agree with Walter, _in this regard_, that Phobos 
should not have to be dynamically linked.

 May I ask, Matthew: what is your
 discomfort with shared-libs? I ask because I just don't know why the 
 Microsoft
 O/S-related version-issue can't be resolved (to an acceptable degree) 
 for one
 specific instance (the GC).

I don't want to have to write installers for simple programs / plug-in 
DLLs (e.g. shell extensions; http://shellext.com/).

Note: I *absolutely* want/must have Phobos in a DLL for serious / 
large-scale work. To not have it as such would as surely rule out D for 
my consideration for any large scale project.

Naturally, these provide a contradiction, without a straightforward 
solution. Therefore, I believe that the only viable solution is that D 
does something smarter than C/C++, and support both. It requires a 
degree of sophistication in the runtime arbitration between multiple GC 
creation events, but I hardly think this is an insurmountable problem. 
I'd be very surprised if this is something that we cannot all work 
through, or that will confound big-W's programming skill. In any case, I 
don't see that there's a (commercially) viable alternative.

 On a regular basis, I build commercial frameworks that dynamically 
 load/unload
 mobile code; part of any true solution has to allow for unloading said 
 code, and
 the mobile code itself should not be bloated out with multiple 
 instances of the
 GC implementation.

Exactly.

 Nor, come to think of it, should it be carrying all the floating-point 
 support
 that the damned Object.printf() brings in with it.

This is another issue, but one on which I completely agree.

 Both go against Walter's "no
 code bloat!" mantra and, franky, are completely unecessary.

Agreed.

 Oh, and before
 anyone say's something about disk-space; the latter is purely about 
 transmission
 bandwidth & latency.

Not so. There is a more fundamental objection: coupling should be 
minimised in all circumstances. I might offer something like "anyone 
exposed to enough software engineering on a commercial scale will come 
to believe this" but that'd get me involved in YABA, so I'll just say 
"it has been my unwavering experience, in a multitude of languages, 
technologies, application domains, that increasing coupling is bad, and 
decreasing coupling is good".

If someone were to look at D, then look at Phobos, then look at Object, 
then see the coupling between Object and printf() and the CRT, and 
follow their own experiential logical flow to come to the conclusion 
that "D is crap", I would wish to persuade them otherwise, but I could 
not fault their reasoning.

 Still; I'll try not speculate upon the outcome.

I would think that the recent, singular (?), success of our carping and 
whineing in getting a movement from big-W on the dynamic-linking issue 
offers all kinds of encouragment to those who are yet to hit their D 
sweet spot.

 - Kris

 I said I'd shutup; I guess I lied :-(

It's people like you, Kris, who have the knowledge, the general 
experience, and, perhaps most importantly, the significant real-world D 
experience, who provide the meat on the bones of the instinctual (some 
would say half-cocked) mutterings of the likes of me. Please don't shut 
up.


The Dr .....

Jan 26 2005

Kris <Kris_member pathlink.com> writes:

In article <ct9mvn$g05$1 digitaldaemon.com>, Matthew says...
I don't want to have to write installers for simple programs / plug-in 
DLLs (e.g. shell extensions; http://shellext.com/).

Note: I *absolutely* want/must have Phobos in a DLL for serious / 
large-scale work. To not have it as such would as surely rule out D for 
my consideration for any large scale project.

Naturally, these provide a contradiction, without a straightforward 
solution. Therefore, I believe that the only viable solution is that D 
does something smarter than C/C++, and support both. It requires a 
degree of sophistication in the runtime arbitration between multiple GC 
creation events, but I hardly think this is an insurmountable problem. 
I'd be very surprised if this is something that we cannot all work 
through, or that will confound big-W's programming skill. In any case, I 
don't see that there's a (commercially) viable alternative.

So how about this: 

1) the GC lives within a library (as it does now)

2) there is an optional DLL, compiled with the library GC, and there is a
seperate library shim to bind said DLL instead

3) the developer makes a choice to either (a) link with the library GC directly
or (b) link with the DLL GC shim

4) the choice is made by adding the DLL-shim library-name to the dmd
command-line, causing the linker to select the DLL GC rather than the default
statically-linked GC (the linker will not try to bind the static GC instance if
those symbols have already been satisfied)

Does that covers all bases? There's probably several other ways to acheive a
similar effect. Note that the default, and simple, behaviour is to statically
link the GC ...

- Kris

Jan 26 2005

"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:

"Kris" <Kris_member pathlink.com> wrote in message 
news:ct9s2l$lo5$1 digitaldaemon.com...
 In article <ct9mvn$g05$1 digitaldaemon.com>, Matthew says...
I don't want to have to write installers for simple programs / plug-in
DLLs (e.g. shell extensions; http://shellext.com/).

Note: I *absolutely* want/must have Phobos in a DLL for serious /
large-scale work. To not have it as such would as surely rule out D 
for
my consideration for any large scale project.

Naturally, these provide a contradiction, without a straightforward
solution. Therefore, I believe that the only viable solution is that D
does something smarter than C/C++, and support both. It requires a
degree of sophistication in the runtime arbitration between multiple 
GC
creation events, but I hardly think this is an insurmountable problem.
I'd be very surprised if this is something that we cannot all work
through, or that will confound big-W's programming skill. In any case, 
I
don't see that there's a (commercially) viable alternative.

 So how about this:

 1) the GC lives within a library (as it does now)

 2) there is an optional DLL, compiled with the library GC, and there 
 is a
 seperate library shim to bind said DLL instead

 3) the developer makes a choice to either (a) link with the library GC 
 directly
 or (b) link with the DLL GC shim

 4) the choice is made by adding the DLL-shim library-name to the dmd
 command-line, causing the linker to select the DLL GC rather than the 
 default
 statically-linked GC (the linker will not try to bind the static GC 
 instance if
 those symbols have already been satisfied)

 Does that covers all bases? There's probably several other ways to 
 acheive a
 similar effect. Note that the default, and simple, behaviour is to 
 statically
 link the GC ...

What about an application, written in D and statically linked to a GC, 
that may or may not load a DDL to get some D classes, depending on its 
cmd-line params?

Jan 27 2005

Kris <Kris_member pathlink.com> writes:

In article <ctcbun$v1o$1 digitaldaemon.com>, Matthew says...
What about an application, written in D and statically linked to a GC, 
that may or may not load a DDL to get some D classes, depending on its 
cmd-line params? 

Right.

Well, I had assumed that any developer going to the trouble of supporting
dynamically loadable modules (providing a 'container') would have statically
linked to the DLL version GC, since all those lovely little loadable modules
will be using said DLL anyway. 

There are certain considerations that apply to containers, particularly those of
a dynamic variety. As such, I don't think it's much of a stretch to note that
such designs should use the DLL GC instead. In the end, that takes care of all
those hideously complex issues you noted prior, in a robust manner, and it's
simpler than /consistently/ following all those little details Walter added to
the DMD doc :-)

Having said that, Walter has at least provided the bare-bones. I'll utilize that
to provide a means of hiding the grubby details, such that both dynamic & static
linking of DLLs will be both thoroughly transparent and painless. 

IMO, this kind of thing should ideally be left to the O/S; not re-invented by
the language runtime (someone else had noted this, also). Sometimes one has to
sidestep the O/S, but in this case I don't feel the complexity tradeoffs are
reasonable. 

That is; I believe containers will be simpler and probably more robust if they
avoid trying to do some fancy internal sharing of multiple GC instances. Just
going with a single, shared GC instance, managed by the O/S is the better
option. That simplicity might hopefully lead to more people writing dynamically
loadable code, such as D Servlets. It also makes it easier for others to write
alternate GC implementations, without the added complexity of re-implementing
and thoroughly testing all that GC-sharing 'stuff'!

That's just my opinion, but it is the manner in which I will personally awaken
the two containers currently slumbering within Mango; along with the mobile-code
to go with them :-)

Lastly, I should note that this is just for the dynamic 'containment' style of
programming (the specific case we're talking about). Other types of programs
would link the GC in whatever means was appropriate to them (where static
linking of the static-library GC would be the default, static, behavior).

Thoughts, Matthew? And how many times can one legitimally say 'static' in a
single sentence?

- Kris

p.s. Pragma is building a container also, so I'd like to get his perspective on
this too.

Jan 27 2005

"Walter" <newshound digitalmars.com> writes:

"Kris" <Kris_member pathlink.com> wrote in message
news:ctcgoa$13ro$1 digitaldaemon.com...
 Having said that, Walter has at least provided the bare-bones. I'll

utilize that
 to provide a means of hiding the grubby details, such that both dynamic &

static
 linking of DLLs will be both thoroughly transparent and painless.

 IMO, this kind of thing should ideally be left to the O/S; not re-invented

by
 the language runtime (someone else had noted this, also). Sometimes one

has to
 sidestep the O/S, but in this case I don't feel the complexity tradeoffs

are
 reasonable.

Most of the time, all you need to do is cut & paste from the examples given.
One reason the details are shown is because D is a systems programming
language, and knowing the how & the why of the details means one is much
more likely to use it successfully. It also enables one to modify it for
special purposes.

I also agree that the OS should provide gc services. But I am not in a
position to design an OS <g>, so we must work with what we have.

Jan 28 2005

Kris <Kris_member pathlink.com> writes:

In article <ctd1v8$2239$1 digitaldaemon.com>, Walter says...
"Kris" <Kris_member pathlink.com> wrote in message
news:ctcgoa$13ro$1 digitaldaemon.com...
 Having said that, Walter has at least provided the bare-bones. I'll

utilize that
 to provide a means of hiding the grubby details, such that both dynamic &

static
 linking of DLLs will be both thoroughly transparent and painless.

 IMO, this kind of thing should ideally be left to the O/S; not re-invented

by
 the language runtime (someone else had noted this, also). Sometimes one

has to
 sidestep the O/S, but in this case I don't feel the complexity tradeoffs

are
 reasonable.

Most of the time, all you need to do is cut & paste from the examples given.
One reason the details are shown is because D is a systems programming
language, and knowing the how & the why of the details means one is much
more likely to use it successfully. It also enables one to modify it for
special purposes.

It's /great/ that you documented all the details! 

I also agree that the OS should provide gc services. But I am not in a
position to design an OS <g>, so we mst work with what we have.

We're misunderstanding each other, Walter. But there's nothing unusual about
that :-)

Thanks for addressing the issue. Everyone has their own idea of how to skin the
proverbial cat, but the end result is typically the same: one dead cat.

Can you perhaps enlighten us on how to contruct robust "soft-references"? It
appears that the GC disables all threads whilst reaping allocations, which could
then lead to deadlock between the GC and a soft-reference manager. 

Are all threads (except the GC) halted when a destructor is invoked?

Jan 28 2005

"Walter" <newshound digitalmars.com> writes:

"Kris" <Kris_member pathlink.com> wrote in message
news:cte22g$97t$1 digitaldaemon.com...
 Can you perhaps enlighten us on how to contruct robust "soft-references"?

The way to do it is construct a pool of those soft references, so the gc
won't reap them.

 It
 appears that the GC disables all threads whilst reaping allocations, which

could
 then lead to deadlock between the GC and a soft-reference manager.

 Are all threads (except the GC) halted when a destructor is invoked?

All the threads that the gc knows about (via std.thread). If you create a
thread directly, not using std.thread, the gc won't stop it, scan it, or
know anything about it.

Jan 28 2005

"Dave" <Dave_member pathlink.com> writes:

"Walter" <newshound digitalmars.com> wrote in message 
news:ctd1v8$2239$1 digitaldaemon.com...
 "Kris" <Kris_member pathlink.com> wrote in message
 news:ctcgoa$13ro$1 digitaldaemon.com...
 Having said that, Walter has at least provided the bare-bones. I'll

 utilize that
 to provide a means of hiding the grubby details, such that both dynamic &

 static
 linking of DLLs will be both thoroughly transparent and painless.


<snip>
 Most of the time, all you need to do is cut & paste from the examples 
 given.
 One reason the details are shown is because D is a systems programming
 language, and knowing the how & the why of the details means one is much
 more likely to use it successfully. It also enables one to modify it for
 special purposes.

<snip>

Walter - thank you for the DLL/GC addition!!

I gotta add my $0.02 on this though..

If the code inside DllMain, MyDLL_Initialize and MyDLL_Terminate can be 
handled by some sort of boiler-plate wrapper for 8 of 10 uses, I think it 
would be a /very/ good thing to provide it (while still allowing the 
developer to use the detailed version).

This would be especially true if it would make shared library development 
more portable between Win and the 'nix's for the majority of cases where the 
code in MyDLL_Initialize and MyDLL_Terminate can be handled by an import and 
a few wrapper functions like:

import std.gc;
import std.slinit;        // extern(C) { _minit(), etc... }

version (Windows) {

HINSTANCE g_hInst;

extern (Windows)
    BOOL DllMain(HINSTANCE hInstance, ULONG ulReason, LPVOID pvReserved)
{
    return SL_DllMain(hInstance,ulReason,g_hInst);
}

} // version (Windows)

export void MySharedLib_Initialize(void* gc)
{
    SL_Init(gc);
}

export void MySharedLib_Terminate()
{
    SL_Term();
}

I think it worth the effort just to minimize the code overhead (and learning 
curve and clutter) needed for most shared libs. But if it also turns out 
that the standard copy and paste code (of your example) needs to be 
different between Win and Linux, wrapper functions will make things that 
much more elegant for portable library development, IMHO.

To me, this would coincide with the D philosophy of hiding the messy details 
for the general case while still providing for their use if needed.

- Dave

Jan 28 2005

pragma <pragma_member pathlink.com> writes:

In article <ctcgoa$13ro$1 digitaldaemon.com>, Kris says...
p.s. Pragma is building a container also, so I'd like to get his perspective on
this too.

Hey, sure thing.

Not to dilute Kris' argument here, but I think that Walter has given us what was
needed for GC management between dll's and processes.  I haven't thought around
all the corners of the problem space yet, but it looks more and more to me that
using a separate dll for the GC may acutally further complicate things.  At
first, I didn't think this was so.  But the updated model now creates a 1-to-1
mapping between GC's and processes, irrespective of how many dll's are in use.
To me, that seems a damn fine solution, if not a step in the right direction.

That aside, the bigger issue is class management across dll boundaries.

Most applications do not need to worry about the validity of v-tables and
delegates, since the dll is usually freed at program termination (this goes
especially for static linking).  It is a problem that is not covered by the GC
at all, so it requires additional management; hence Kris' notion of
"Containers".

For those not familiar with the problem, here's what can easily happen.  Say I
have an export from a dll that returns an object of class "Foobar".  I then free
the dll since its no longer needed.  Finally, I attempt to print the contents of
Foobar.

 // given: mylibrary represents a dll
 void makeSegFault(){
    Foobar foo = mylibrary.getNewFoobar();
    mylibrary.unload();
    writeln(foo.toString());
 }

This will segfault since the vtable for 'foo' was a part of the dll.  

Thankfully, the recent GC enhancements allow us to at least keep foo's memory
footprint intact, but the methods are history.  Also, reloading the dll cannot
be reliably used to 'magically' restore that vtable.

This pattern is easier to create than one would think, especially when one is
cramming data into generic AA's and references become widely dispersed inside a
large system.


For DSP, the solution I'm going to use involves a combination of object-proxies
and reference counting of said proxies per dll.  A dll reload will not break
code, since the proxies can be prodded to re-constitute their dll-bound
counterparts.  This way, the proxies can be freely refrenced throughout the
application, save the dll they're interfacing with (feedback would be *bad*).


The only other airtight solution I can think of, would be to apply the GC
pattern to dll's.  This means that a dll is not unloaded until the heap is free
from all refrences into a dll's address space (lazy unloading via garbage
collection).  Adding a given dll's address space as a root to the GC should
cover this.  The only drawback here is that its effectively the same as the
present situation given that you cannot force a dll unload without potentially
breaking something; the real advantage of dll's is to load and unload at will.


Aside: does anyone know what happens if you touch a used .class file while a
Java app is running?  Can Java's ClassLoader be told to unload or reload a
class file that's in use?  I'm curious since I'd like to know how other
platforms have handled this space.


- EricAnderton at yahoo

Jan 28 2005

Kris <Kris_member pathlink.com> writes:

In article <ctdqsl$30sc$1 digitaldaemon.com>, pragma says...
In article <ctcgoa$13ro$1 digitaldaemon.com>, Kris says...
p.s. Pragma is building a container also, so I'd like to get his perspective on
this too.

Hey, sure thing.

Not to dilute Kris' argument here, but I think that Walter has given us what was
needed for GC management between dll's and processes.  I haven't thought around
all the corners of the problem space yet, but it looks more and more to me that
using a separate dll for the GC may acutally further complicate things.  At
first, I didn't think this was so.  But the updated model now creates a 1-to-1
mapping between GC's and processes, irrespective of how many dll's are in use.
To me, that seems a damn fine solution, if not a step in the right direction.

That aside, the bigger issue is class management across dll boundaries.

Most applications do not need to worry about the validity of v-tables and
delegates, since the dll is usually freed at program termination (this goes
especially for static linking).  It is a problem that is not covered by the GC
at all, so it requires additional management; hence Kris' notion of
"Containers".

For those not familiar with the problem, here's what can easily happen.  Say I
have an export from a dll that returns an object of class "Foobar".  I then free
the dll since its no longer needed.  Finally, I attempt to print the contents of
Foobar.

 // given: mylibrary represents a dll
 void makeSegFault(){
    Foobar foo = mylibrary.getNewFoobar();
    mylibrary.unload();
    writeln(foo.toString());
 }

This will segfault since the vtable for 'foo' was a part of the dll.  

Thankfully, the recent GC enhancements allow us to at least keep foo's memory
footprint intact, but the methods are history.  Also, reloading the dll cannot
be reliably used to 'magically' restore that vtable.

This pattern is easier to create than one would think, especially when one is
cramming data into generic AA's and references become widely dispersed inside a
large system.


For DSP, the solution I'm going to use involves a combination of object-proxies
and reference counting of said proxies per dll.  A dll reload will not break
code, since the proxies can be prodded to re-constitute their dll-bound
counterparts.  This way, the proxies can be freely refrenced throughout the
application, save the dll they're interfacing with (feedback would be *bad*).


The only other airtight solution I can think of, would be to apply the GC
pattern to dll's.  This means that a dll is not unloaded until the heap is free
from all refrences into a dll's address space (lazy unloading via garbage
collection).  Adding a given dll's address space as a root to the GC should
cover this.  The only drawback here is that its effectively the same as the
present situation given that you cannot force a dll unload without potentially
breaking something; the real advantage of dll's is to load and unload at will.


Aside: does anyone know what happens if you touch a used .class file while a
Java app is running?  Can Java's ClassLoader be told to unload or reload a
class file that's in use?  I'm curious since I'd like to know how other
platforms have handled this space.


- EricAnderton at yahoo

Good points, Pragma. Another thing to consider, regarding the explicit unloading
of DLLs, is the 'version' issue. If one replaces an existing instance of some
dynamically-loaded module with another, newer version, then the contract between
the container and any existing (remote) clients has effectively been broken.

I note this because each newer version should be loaded as such; as a distinct
and seperate instance in addition to any prior version instances. Doing so leads
to long-term stability.

The upshot is that such a container would not have a regular need to
/explicitly/ drop any particular (and previously loaded) module. Therefore, your
approach of using the GC to manage module 'liveness' is rather suitable. Placing
the GC within a DLL does not complicate this, as far as I can tell.

There's at least one tricky part there: how to know whether or not each
dynamically loaded-module is still actually loaded. I think soft-references
would alleviate that problem, and there are some ways to do that in D, although
there's a subtle danger of deadlock since it appears that the GC halts all other
threads when it reaps the heap :-(

Perhaps Walter could enlighten us on how to construct robust soft-references?

Thinking about this brings up another issue to consider; starting a thread from
within a DLL will potentially cause the GC, and the process, to fail. Something
to be careful of.

Jan 28 2005

pragma <pragma_member pathlink.com> writes:

In article <cte18s$82o$1 digitaldaemon.com>, Kris says...

Good points, Pragma. Another thing to consider, regarding the explicit unloading
of DLLs, is the 'version' issue. If one replaces an existing instance of some
dynamically-loaded module with another, newer version, then the contract between
the container and any existing (remote) clients has effectively been broken.

Yep.  This is why I've advocated that we all get into the habit of naming our
dlls with the version number as a part of the name.  It solves the majority of
these problems.  The other techniques I've proposed in the past, may very well
be suitable in an application-to-application manner.

Overall, this is an area where sufficent (and justified) pushback from Walter
would have us forge an open standard for this kind of thing.  

The upshot is that such a container would not have a regular need to
/explicitly/ drop any particular (and previously loaded) module. Therefore, your
approach of using the GC to manage module 'liveness' is rather suitable. 

I see where you're going with this.  Assuming that the only reason for a reload
is to grab a newer version, you don't need to unload the old one at all.

Placing the GC within a DLL does not complicate this, as far as I can tell.

I'm confused.  Did you mean "manage the dll with the GC" instead?


There's at least one tricky part there: how to know whether or not each
dynamically loaded-module is still actually loaded. I think soft-references
would alleviate that problem, and there are some ways to do that in D, although
there's a subtle danger of deadlock since it appears that the GC halts all other
threads when it reaps the heap :-(

Perhaps Walter could enlighten us on how to construct robust soft-references?

You're talking about having a soft (weak?) reference to the library in question,
correct?  Constructing weaak-refrences in D should be as easy as writing a
wrapper class that tells the GC to ignore the weak-pointer's address when
checking for roots.  Now, checking their validitiy is tough to solve, since the
GC doesn't expose any way to check if a pointer is under it's control (sure you
could use win32, but it's not portable)

And as for deadlock: what if the call to unload a library is called on the GC's
thread via a destructor?  Would that fix the problem?  I suppose if the dll held
some kind of mutex inside of dllmain, that it would cause trouble.  But this may
come back to "Best Practices" for managing such a mechanism.

Thinking about this brings up another issue to consider; starting a thread from
within a DLL will potentially cause the GC, and the process, to fail. Something
to be careful of.

I'll have to take your word for this.  Perhaps you can furnish me with a more
concrete example?  Unless you're inside of dllMain, there shouldn't be any side
effects that I'm aware of.  Also, the MSDN library has a slew of articles of
what to do and not to do inside of dllMain.  The gist of it all is that you
should do the absolute minimum needed inside that routine as to avoid problems
just within win32 itself.

- EricAnderton at yahoo

Jan 28 2005

Kris <Kris_member pathlink.com> writes:

In article <cte4c0$c8n$1 digitaldaemon.com>, pragma says...
In article <cte18s$82o$1 digitaldaemon.com>, Kris says...
Placing the GC within a DLL does not complicate this, as far as I can tell.

I'm confused.  Did you mean "manage the dll with the GC" instead?

Ahh; I was just referring to the earlier assertion that placing the GC itself
within a seperate DLL might actually increase complexity. I don't think it does,
but I could be wrong.


There's at least one tricky part there: how to know whether or not each
dynamically loaded-module is still actually loaded. I think soft-references
would alleviate that problem, and there are some ways to do that in D, although
there's a subtle danger of deadlock since it appears that the GC halts all other
threads when it reaps the heap :-(


And as for deadlock: what if the call to unload a library is called on the GC's
thread via a destructor?  Would that fix the problem?  I suppose if the dll held
some kind of mutex inside of dllmain, that it would cause trouble.  But this may
come back to "Best Practices" for managing such a mechanism.


Deadlock could occur if (a) the destructor is used to unload the module, (b) all
threads are halted whilst the GC runs (and hence during the destructor call),
and (c) the mutex protecting the "module is currently loaded" flag is held by
one of the stalled threads; one which was 'concurrently' asking for a handle to
that specific module. The GC thread would stall on that same mutex. 

One way around this would be to utilize a mutex-free queue, to stack up
destructor requests for unloading reaped module instances -- thereby decoupling
the GC from aforementioned mutex. 


Thinking about this brings up another issue to consider; starting a thread from
within a DLL will potentially cause the GC, and the process, to fail. Something
to be careful of.

I'll have to take your word for this.  Perhaps you can furnish me with a more
concrete example?  


If one assumes that the GC has a valid reason for halting all threads during a
sweep, then any thread it does not know about is a potential threat to
stability. Since Phobos (and thus std.Thread) is still linked statically, all
DLLs will have their own std.Thread instance, yet will be sharing a single GC. 

The single GC only knows about one instance of std.Thread, and subsequently can
only halt those threads created via that particular instance. Any thread created
via a DLL will be noted only within that DLL std.Thread pool, and thus will not
be stalled during a GC sweep. Therein lies trouble :-) 

Full resolution is conceptually trivial, but apparently controversial.

Jan 28 2005

Benji Smith <dlanguage xxagg.com> writes:

On Wed, 26 Jan 2005 14:19:04 -0800, "Walter"
<newshound digitalmars.com> wrote:

I agree. I'm working on it.

Fantastic. Thanks, Walter. I really appreciate how receptive you are
to the whims (er...I mean, the intelligent and informed opinions) of
the people in this ng.

--Benji

Jan 27 2005

D Programming

C/C++ Programming

Other

digitalmars.D - Dynamic Linking & Memory Management