
digitalmars.D - Dynamic Linking & Memory Management

Benji Smith <dlanguage xxagg.com> writes:
I've been interested to read some of the recent discussion about the D
garbage collector, and I'd like to describe my current Java project to
give some perspective on what I think is ideal memory management.

I'm working on a technical analysis and simulation application for
historical stock market data. And despite the fact that I've written
about ten thousand lines of code myself, I'm using third-party
libraries for many aspects of the project. First of all, I'm using
JDBC drivers for MySQL and MS SQL Server. The MySQL drivers consist of
3 different JAR files that I include in my application's classpath.
The MS SQL Server drivers are contained in another 3 JAR files. I'm
also using the Xerces XML parser from the Apache group (3 more JARs),
the JFreeChart graphical charting components (5 more JARs), the JUnit
testing framework (1 JAR), a GNU command-line parsing library (1 JAR),
and a few other miscellaneous libraries. All told, I'm importing
functionality from more than fifteen different libraries. And the
application is still very small; by the time I’ve finished developing
it, I’ll probably be using twice as many libraries.

But when I write my code, I can write it as though I'm statically
linking with each of those libraries. I don't need to use special
export semantics when I need to call code from any of those third
party vendors. And with a few of those libraries (the command-line
parsing library in particular), I may end up writing my own
implementation. When I do, I won't have to change the semantics to
reflect the fact that I'm no longer using a compiled library. The rest
of my code can be completely agnostic to whether I'm linking with
source files, compiled class files, class files bundled into a JAR
package, classes generated at runtime through reflection hooks, or
classes loaded dynamically using a custom classloader. Since my
application needs to support third-party plug-in development (users
can load their own classes as custom charting indicators), dynamic
runtime loading of classes is essential to my design.
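Purely for illustration, the native-side analogue of that plug-in hook
would be the usual dlopen()/dlsym() dance. The sketch below is
hypothetical C -- the indicator_create entry point and the library name
are made up -- but it is the shape of what the classloader gives my
plug-in users for free:

/* Hypothetical sketch only: a native host loading a user-supplied
 * charting-indicator plug-in at runtime.  The "indicator_create" entry
 * point and the library name are made up.  Build with: cc host.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

typedef void *(*indicator_create_fn)(void);

int main(void)
{
    /* Load the user's plug-in; nothing about it is known at link time. */
    void *plugin = dlopen("./libmy_indicator.so", RTLD_NOW);
    if (!plugin) {
        fprintf(stderr, "load failed: %s\n", dlerror());
        return 1;
    }

    /* Resolve the agreed-upon factory symbol. */
    indicator_create_fn create =
        (indicator_create_fn)dlsym(plugin, "indicator_create");
    if (!create) {
        fprintf(stderr, "no entry point: %s\n", dlerror());
        dlclose(plugin);
        return 1;
    }

    void *indicator = create();   /* the host drives it from here on */
    (void)indicator;

    dlclose(plugin);              /* and can unload it when done */
    return 0;
}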

Ubiquitous static-linking would not be an option for me with this
application.

But there’s another important issue, too: debugging.

Last week I discovered a memory leak somewhere in my application. If I
allowed some of the analysis code to run for a few hours--combing
through all 18 million data points from the last 25 years of stock
market data--the heap would grow from its initial allocation (8 MB) to
its maximum allocation (256 MB). Luckily for me, all of those
allocations take place within a single virtual machine, which uses a
single garbage collector to manage all of the memory from all of the
libraries I'm using. That allows me to use a profiling application to
monitor the allocations of all the objects in the heap and--much more
importantly--to find out which objects are holding references to which
other objects. Within moments, I could see that the JDBC allocations
were getting cleaned up properly, but one of my custom collection
classes was failing--about 2% of the time--to release object references
it was no longer using. After an hour or so of tinkering with the
profiler, I was able to track down and fix that memory leak. The
application now uses a steady 12 MB of heap memory, no matter how long
it runs.

If I were building the same application with D, there would be fifteen
different garbage collectors operating in fifteen different
heap-spaces, and the objects allocated within one heap space might be
referenced by objects in another heap space, each managed by a
different garbage collector. It would be much more difficult to
develop a heap profiling tool that could successfully allow a
developer to navigate through such a fragmented heap space,
particularly if the developer needed to figure out which GC was
supposed to collect each out-of-reach object. Tracking down and fixing
that memory leak probably would have taken a lot longer than an hour
and a half.

Consequently, I strongly support the development of a model within D
that allows for a single GC instance per process. Any other scenario
sounds like a development & debugging nightmare.
Jan 26 2005
Kris <Kris_member pathlink.com> writes:
The gods be praised! Good example, Benji.

This is why it's been noted that D would not pass muster, in much of the
all-important commercial field. 

- Kris


In article <ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com>, Benji Smith says...
[...]
Jan 26 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
Indeed.

I, for one, can live with D 1.0 not having dynamic class loading, but
that would be a sine qua non for 2.0. Furthermore, I think 1.0 must have
a cooperative/unified GC architecture between link units, otherwise,
again, we'll just be writing our DLLs in C, and all non-compiled-in
code to an app will be via D extensions (which, while fun to write
occasionally, will get *really* tiresome).

Or maybe I'm missing some deeper truth on the viability of D. If so, can
someone please enlighten me so I can stop carping on like a harbinger of
Doom.

The Dr .....


"Kris" <Kris_member pathlink.com> wrote in message
news:ct95fa$2tv5$1 digitaldaemon.com...
 The gods be praised! Good example, Benji.

 This is why it's been noted that D would not pass muster, in much of
 the
 all-important commercial field.

 - Kris


 In article <ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com>, Benji Smith
 says...
[...]
Jan 26 2005
Kris <Kris_member pathlink.com> writes:
Encore!

And lest we forget: D would happily support the all-important unified GC (per
process) if the GC were simply moved to a shared-lib. Further, Sean has already
invested the majority of effort in carefully extracting the GC from the runtime,
such that this is a near reality.

Walter notes that he's had customer-support difficulties in the past over
shared-libs, due to the vagaries of Win32. Unfortunately, that negative
experience is being reflected directly in the range of valid programming models
effectively supported by the D language. 

We /really/ need to move forward on this issue. Perhaps we can start (yet again)
by asking why Walter feels we're all so much better off with a proliferation of
GC instances instead of just one, easily manageable, instance?

- Kris


In article <ct968s$2uv3$1 digitaldaemon.com>, Matthew says...
[...]
Jan 26 2005
parent reply "Matthew" <admin.hat stlsoft.dot.org> writes:
 And lest we forget: D would happily support the all-important unified GC (per
 process) if the GC were simply moved to a shared-lib. Further, Sean has already
 invested the majority of effort in carefully extracting the GC from the
runtime,
 such that this is a near reality.
As I've said before, I want both models supportable, and I don't accept that this is technically infeasible. Notwithstanding, if both are not, then we must go for a separation between the pure statically linked model and the dynamically linked GC model. If we stay with static linking only, D's a joke, isn't it?
 Walter notes that he's had customer-support difficulties in the past over
 shared-libs, due to the vagaries of Win32. Unfortunately, that negative
 experience is being reflected directly in the range of valid programming models
 effectively supported by the D language.
I agree. And I think this will kill D.

As I've whinged and whined on, I can't understand how Walter thinks
that D will be viable with the status quo. Alas, though Walter has huge
amounts of valuable experience and insight (more than mine, I would
hazard), I think he fails to recognise, or at least act on, two
important facts:

1. he doesn't have *all* experience. None of us do. And, much more
importantly, ...

2. many of us do not have any serious problems doing *very successful*
(see below) work in C/C++.

If D is not a quantum leap forward, _without_ new hassles, then why the
hell is anyone ever going to use it? Because it's better than Java?!?
Pah!
 We /really/ need to move forward on this issue. Perhaps we can start (yet
again)
 by asking why Walter feels we're all so much better off with a proliferation of
 GC instances instead of just one, easily manageable, instance?
We're not better off with that. We're nowhere with that!

Someone turn out the lights on their way out ...

The Dr .....

(below to be seen): I've worked on several highly commercially
important projects over the last several years, most of which have been
(primarily) implemented in C++. All the guff that people generally
whinge on about as problems in C++ has proved either non-existent,
irrelevant, or easily amenable to good practice. Some of these are
still in production, 2, 4, 5 years later, and have never had a
millisecond of downtime. So why do we need D, if it's going to be
hassle-bundled?
[...]
Jan 26 2005
parent reply "Matthew" <admin.hat stlsoft.dot.org> writes:
"Matthew" <admin.hat stlsoft.dot.org> wrote in message
news:ct9gl9$9k4$1 digitaldaemon.com...
    2. many of us do not have any serious problems doing *very successful* (see
below) work in C/C++. If D is not a 
 quantum leap forward, _without_ new hassles, then why the hell is anyone ever
going to use it? Because it's better 
 than Java?!? Pah!
btw, Java-phobic hyperbole aside, I should point out that until D can handle scenarios such as outlined in Benji's excellent post, D isn't even fit to kiss the bloated arse of Java. And that's a sad position to be in, to be sure ...
Jan 26 2005
John Reimer <brk_6502 yahoo.com> writes:
On Thu, 27 Jan 2005 12:42:57 +1100, Matthew wrote:

 
 "Matthew" <admin.hat stlsoft.dot.org> wrote in message
news:ct9gl9$9k4$1 digitaldaemon.com...
    2. many of us do not have any serious problems doing *very successful* (see
below) work in C/C++. If D is not a 
 quantum leap forward, _without_ new hassles, then why the hell is anyone ever
going to use it? Because it's better 
 than Java?!? Pah!
btw, Java-phobic hyperbole aside, I should point out that until D can handle scenarios such as outlined in Benji's excellent post, D isn't even fit to kiss the bloated arse of Java. And that's a sad position to be in, to be sure ...
Umm... have you updated your newsreader? Did you see Walter's recent
response in this topic?

It seems the point has been taken. :-)

- John R.
Jan 26 2005
parent "Matthew" <admin.hat stlsoft.dot.org> writes:
"John Reimer" <brk_6502 yahoo.com> wrote in message
news:pan.2005.01.27.01.43.39.773589 yahoo.com...
[...]
Umm... have you updated your newsreader? Did you see Walter's recent response in this topic?
Been having issues with my cable, and local net.
 It seems the point has been taken. :-)
Coolio. We can get back to telling big-W what a star we think he is, now. <CG>
Jan 26 2005
"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
Hear, hear! (And thanks for spending the time to write such an eloquent 
and persuasive post.)

I not only agree with everything you say, I think we should overtly 
state that without support such as this, D is Doomed: Dead, Duck-like, 
save for small self-contained utility programs (for which C, never mind 
C++, suffices adequately, IMO).


"Benji Smith" <dlanguage xxagg.com> wrote in message 
news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
[...]
Jan 26 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Benji Smith" <dlanguage xxagg.com> wrote in message
news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.
I agree. I'm working on it.
Jan 26 2005
Kris <Kris_member pathlink.com> writes:
In article <ct9afo$2l8$1 digitaldaemon.com>, Walter says...
"Benji Smith" <dlanguage xxagg.com> wrote in message
news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.
I agree. I'm working on it.
Hallelujah! I will now shut up about this for a while :-) (can I get another Hallelujah?)
Jan 26 2005
John Reimer <brk_6502 yahoo.com> writes:
On Thu, 27 Jan 2005 00:16:37 +0000, Kris wrote:

 In article <ct9afo$2l8$1 digitaldaemon.com>, Walter says...
I agree. I'm working on it.
Hallelujah! I will now shut up about this for a while :-) (can I get another Hallelujah?)
:-D This is indeed good news! And Walter, thanks for listening. Nice to
see that you can be so resilient despite being pounded like a fence post
(but it was for a good cause). :-)

- John R.
Jan 26 2005
"Matthew" <admin.hat stlsoft.dot.org> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:ct9afo$2l8$1 digitaldaemon.com...
 "Benji Smith" <dlanguage xxagg.com> wrote in message
 news:ce4gv0dk6v1qgde6o94msecgi4huasdhga 4ax.com...
 If I were building the same application with D, there would be fifteen
 different garbage collectors operating in fifteen different
 heap-spaces, and the objects allocated within one heap space might be
 referenced by objects in another heap space, each managed by a
 different garbage collector. It would be much more difficult to
 develop a heap profiling tool that could successfully allow a
 developer to navigate through such a fragmented heap space,
 particularly if the developer needed to figure out which GC was
 supposed to collect each out-of-reach object. Tracking down and fixing
 that memory leak probably would have taken a lot longer than an hour
 and a half.

 Consequently, I strongly support the development of a model within D
 that allows for a single GC instance per process. Any other scenario
 sounds like a development & debugging nightmare.
I agree. I'm working on it.
Cool. I can shut my overbusy chops now! :-)

Please may we have intelligent GCs that can detect each other, within a
process space, and connect up in a sensible, and
dynamic-lib-unloading-proof fashion?

Here's how I think it might work, roughly:

    If the process is written in D, it'll have a GC built-in (statically
linked). [I agree with Walter 100% that processes should have a GC built
in, and not have to rely on a DLL. Painful for small process
distribution, along with other DLL issues ...]
    Now any subsequent D dynamic link-unit, whether that be a DDL (i.e.
'exposing' D classes in an analogous fashion to Java's JARs), or a DLL
(i.e. exposing a C API/interface, implemented in D), when loaded, does
not create its own GC but links to that of the process.
    Easy peasy, because it's axiomatic that the life of any loaded
dynamic link-unit cannot exceed the life of the host process.

    If the process is *not* written in D, here's where it gets
interesting. IMO, it should be possible to have the same GC-locating
mechanism attach a subsequent D link-unit (whether DDL or DLL). The
problem is, how do we keep the first D link-unit's code and data locked
in memory after the application has unloaded it.
    One answer would be to just increment the first link-unit's ref
count (e.g. another call to dlopen()/LoadLibrary()) and then discard
it. This would work fine, never crash, but would cause issues in
long-running servers that need to unload modules to pick up newer
versions.
    Another answer would be that each GC-dependent link-unit would add
such a lock onto the first-one-in, and release it when it is itself
released. There may be issues here, however, since one can get into
trouble (un)loading libs during lib (un)loading (albeit that I've never
run into this outside C++.NET).
    In either case, a linked-to GC would have to expose a method for
either locking it, or passing its instance name/handle to be locked.

Naturally, we have the issue of how a to-be-created GC detects and
connects to a prior instance, and how this can be made thread-safe.
That's a topic for discussion ...

The Dr .....
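p.s. For concreteness, the locate-and-pin handshake above might look
roughly like this in C. Every name here (d_gc_instance,
acquire_process_gc, the GC struct) is invented for illustration -- this
is a sketch of the idea, not Phobos/runtime code -- and the
thread-safety of the first-one-in race is deliberately glossed over:

/* Sketch: each D link-unit exports a well-known lookup symbol; the
 * first one resolved process-wide "wins", and later arrivals attach to
 * it and pin the module that owns it (the extra dlopen() trick).
 * glibc-flavoured; build with: cc sketch.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

typedef struct GC { int dummy; } GC;

/* Stand-in for the GC statically linked into *this* link-unit. */
static GC local_gc;

/* The well-known symbol each D link-unit would export. */
GC *d_gc_instance(void) { return &local_gc; }

GC *acquire_process_gc(void)
{
    typedef GC *(*lookup_fn)(void);

    /* Ask the dynamic linker whether anybody in the process already
     * exports the lookup symbol (the host exe, or an earlier-loaded
     * D link-unit). */
    lookup_fn lookup = (lookup_fn)dlsym(RTLD_DEFAULT, "d_gc_instance");

    if (lookup && lookup() != &local_gc) {
        /* Someone else got there first: pin the module that owns the
         * GC, so its code and data stay resident even if the host
         * later unloads that module. */
        Dl_info info;
        if (dladdr((void *)lookup, &info) && info.dli_fname)
            dlopen(info.dli_fname, RTLD_NOW | RTLD_NOLOAD);  /* ref++ */
        return lookup();
    }

    /* We are first in: our own statically linked GC becomes the
     * process-wide one. */
    return &local_gc;
}

int main(void)
{
    printf("process GC lives at %p\n", (void *)acquire_process_gc());
    return 0;
}

The interesting bit, as noted, is making that first-one-in race
thread-safe, and deciding when (if ever) the pin may be dropped.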
Jan 26 2005
Kris <Kris_member pathlink.com> writes:
In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
[...]
I didn't quite follow all of that, but here's something that Pragma suggested many moons ago: if external link-units were managed by the "one & only" GC, it could happily reap them when, and only when, there are no more live references. I know that's a somewhat trivial statement, but it has powerful mojo;
Jan 26 2005
parent "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message 
news:ct9k70$cv5$1 digitaldaemon.com...
[...]
I didn't quite follow all of that, but here's something that Pragma suggested many moons ago: if external link-units were managed by the "one & only" GC, it could happily reap them when, and only when, there are no more live references. I know that's a somewhat trivial statement, but it has powerful mojo;
I want to be able to write C-API DLLs in D, which I think would be the fly in that mojo.
Jan 26 2005
Kris <Kris_member pathlink.com> writes:
In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
[...]
I should point out all the tricky stuff you're talking about just goes
away if the GC is simply a shared-lib (DLL or DDL).

May I ask, Matthew: what is your discomfort with shared-libs? I ask
because I just don't know why the Microsoft O/S-related version-issue
can't be resolved (to an acceptable degree) for one specific instance
(the GC).

On a regular basis, I build commercial frameworks that dynamically
load/unload mobile code; part of any true solution has to allow for
unloading said code, and the mobile code itself should not be bloated
out with multiple instances of the GC implementation. Nor, come to
think of it, should it be carrying all the floating-point support that
the damned Object.printf() brings in with it. Both go against Walter's
"no code bloat!" mantra and, frankly, are completely unnecessary. Oh,
and before anyone says something about disk-space: the latter is purely
about transmission bandwidth & latency.

Still; I'll try not to speculate upon the outcome.

- Kris

I said I'd shut up; I guess I lied :-(
Jan 26 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message 
news:ct9ltc$es1$1 digitaldaemon.com...
 In article <ct9hki$afi$1 digitaldaemon.com>, Matthew says...
[...]
I should point out all the tricky stuff you're talking about just goes away if the GC is simply a shared-lib (DLL or DDL).
True. But then one has the inordinate hassle of managing a Phobos.DLL
for any little utility program.

Note: I use the word 'inordinate' there in context. Having to care
about Phobos.DLL for a utility would, for me, rule out D as a language
for implementing all the small tools and utilities (as in
http://synesis.com.au/systools.html) that I write. That's a 100%
certainty.

Hence, I agree with Walter, _in this regard_, that Phobos should not
have to be dynamically linked.
 May I ask, Matthew: what is your
 discomfort with shared-libs? I ask because I just don't know why the 
 Microsoft
 O/S-related version-issue can't be resolved (to an acceptable degree) 
 for one
 specific instance (the GC).
I don't want to have to write installers for simple programs / plug-in
DLLs (e.g. shell extensions; http://shellext.com/).

Note: I *absolutely* want/must have Phobos in a DLL for serious /
large-scale work. To not have it as such would as surely rule out D for
my consideration for any large scale project.

Naturally, these provide a contradiction, without a straightforward
solution. Therefore, I believe that the only viable solution is that D
does something smarter than C/C++, and supports both. It requires a
degree of sophistication in the runtime arbitration between multiple GC
creation events, but I hardly think this is an insurmountable problem.
I'd be very surprised if this is something that we cannot all work
through, or that will confound big-W's programming skill. In any case,
I don't see that there's a (commercially) viable alternative.
 On a regular basis, I build commercial frameworks that dynamically 
 load/unload
 mobile code; part of any true solution has to allow for unloading said 
 code, and
 the mobile code itself should not be bloated out with multiple 
 instances of the
 GC implementation.
Exactly.
 Nor, come to think of it, should it be carrying all the floating-point 
 support
 that the damned Object.printf() brings in with it.
This is another issue, but one on which I completely agree.
 Both go against Walter's "no
 code bloat!" mantra and, franky, are completely unecessary.
Agreed.
 Oh, and before
 anyone says something about disk-space; the latter is purely about 
 transmission
 bandwidth & latency.
Not so. There is a more fundamental objection: coupling should be
minimised in all circumstances. I might offer something like "anyone
exposed to enough software engineering on a commercial scale will come
to believe this" but that'd get me involved in YABA, so I'll just say
"it has been my unwavering experience, in a multitude of languages,
technologies, application domains, that increasing coupling is bad, and
decreasing coupling is good".

If someone were to look at D, then look at Phobos, then look at Object,
then see the coupling between Object and printf() and the CRT, and
follow their own experiential logical flow to come to the conclusion
that "D is crap", I would wish to persuade them otherwise, but I could
not fault their reasoning.
 Still; I'll try not to speculate upon the outcome.
I would think that the recent, singular (?), success of our carping and whining in getting a movement from big-W on the dynamic-linking issue offers all kinds of encouragement to those who are yet to hit their D sweet spot.
 - Kris

 I said I'd shut up; I guess I lied :-(
It's people like you, Kris, who have the knowledge, the general
experience, and, perhaps most importantly, the significant real-world D
experience, who provide the meat on the bones of the instinctual (some
would say half-cocked) mutterings of the likes of me. Please don't shut
up.

The Dr .....
Jan 26 2005
Kris <Kris_member pathlink.com> writes:
In article <ct9mvn$g05$1 digitaldaemon.com>, Matthew says...
[...]
So how about this:

1) the GC lives within a library (as it does now)

2) there is an optional DLL, compiled with the library GC, and there is
a separate library shim to bind said DLL instead

3) the developer makes a choice to either (a) link with the library GC
directly or (b) link with the DLL GC shim

4) the choice is made by adding the DLL-shim library-name to the dmd
command-line, causing the linker to select the DLL GC rather than the
default statically-linked GC (the linker will not try to bind the
static GC instance if those symbols have already been satisfied)

Does that cover all bases? There's probably several other ways to
achieve a similar effect. Note that the default, and simple, behaviour
is to statically link the GC ...

- Kris
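p.s. For concreteness, the shim in 2)/3) might look something like the
following in C. The gc_init()/gc_malloc() entry points and the
"dgc.dll" name are invented for illustration -- they are not the real
Phobos symbols -- but the shape is the point:

/* Hypothetical DLL GC shim: a tiny static library that satisfies the
 * GC entry points by forwarding them to a shared GC DLL, so every
 * link-unit in the process ends up on the same collector. */
#include <windows.h>
#include <stdlib.h>

typedef void  (*gc_init_fn)(void);
typedef void *(*gc_malloc_fn)(size_t size);

static HMODULE      g_gcdll;
static gc_init_fn   g_init;
static gc_malloc_fn g_malloc;

static void bind_gc_dll(void)
{
    if (g_gcdll)
        return;
    g_gcdll = LoadLibraryA("dgc.dll");      /* one shared GC per process */
    if (!g_gcdll)
        abort();                            /* no GC, no party */
    g_init   = (gc_init_fn)  GetProcAddress(g_gcdll, "gc_init");
    g_malloc = (gc_malloc_fn)GetProcAddress(g_gcdll, "gc_malloc");
    if (!g_init || !g_malloc)
        abort();
}

/* Same signatures the statically linked GC would provide; because the
 * shim satisfies these symbols first, the linker never pulls in the
 * static GC object files. */
void gc_init(void)
{
    bind_gc_dll();
    g_init();
}

void *gc_malloc(size_t size)
{
    bind_gc_dll();
    return g_malloc(size);
}

Name the shim library on the dmd command-line and these definitions
satisfy the GC symbols, which is exactly the "already satisfied"
behaviour in 4); leave it off and you get the default static GC.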
Jan 26 2005
parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Kris" <Kris_member pathlink.com> wrote in message 
news:ct9s2l$lo5$1 digitaldaemon.com...
<snip>
So how about this:

1) the GC lives within a library (as it does now)
2) there is an optional DLL, compiled with the library GC, and there is a
   separate library shim to bind said DLL instead
3) the developer makes a choice to either (a) link with the library GC
   directly, or (b) link with the DLL GC shim
4) the choice is made by adding the DLL-shim library-name to the dmd
   command-line, causing the linker to select the DLL GC rather than the
   default statically-linked GC (the linker will not try to bind the static
   GC instance if those symbols have already been satisfied)

Does that cover all bases? There are probably several other ways to achieve
a similar effect. Note that the default, and simple, behaviour is to
statically link the GC ...
What about an application, written in D and statically linked to a GC, that may or may not load a DLL to get some D classes, depending on its cmd-line params?
Jan 27 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <ctcbun$v1o$1 digitaldaemon.com>, Matthew says...
What about an application, written in D and statically linked to a GC, 
that may or may not load a DLL to get some D classes, depending on its 
cmd-line params? 
Right. Well, I had assumed that any developer going to the trouble of supporting dynamically loadable modules (providing a 'container') would have statically linked to the DLL version GC, since all those lovely little loadable modules will be using said DLL anyway. There are certain considerations that apply to containers, particularly those of a dynamic variety. As such, I don't think it's much of a stretch to note that such designs should use the DLL GC instead. In the end, that takes care of all those hideously complex issues you noted prior, in a robust manner, and it's simpler than /consistently/ following all those little details Walter added to the DMD doc :-)

Having said that, Walter has at least provided the bare-bones. I'll utilize that to provide a means of hiding the grubby details, such that both dynamic & static linking of DLLs will be both thoroughly transparent and painless.

IMO, this kind of thing should ideally be left to the O/S; not re-invented by the language runtime (someone else had noted this, also). Sometimes one has to sidestep the O/S, but in this case I don't feel the complexity tradeoffs are reasonable. That is; I believe containers will be simpler and probably more robust if they avoid trying to do some fancy internal sharing of multiple GC instances. Just going with a single, shared GC instance, managed by the O/S, is the better option. That simplicity might hopefully lead to more people writing dynamically loadable code, such as D Servlets. It also makes it easier for others to write alternate GC implementations, without the added complexity of re-implementing and thoroughly testing all that GC-sharing 'stuff'!

That's just my opinion, but it is the manner in which I will personally awaken the two containers currently slumbering within Mango; along with the mobile-code to go with them :-)

Lastly, I should note that this is just for the dynamic 'containment' style of programming (the specific case we're talking about). Other types of programs would link the GC in whatever means was appropriate to them (where static linking of the static-library GC would be the default, static, behaviour).

Thoughts, Matthew? And how many times can one legitimately say 'static' in a single sentence?

- Kris

p.s. Pragma is building a container also, so I'd like to get his perspective on this too.
Jan 27 2005
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Kris" <Kris_member pathlink.com> wrote in message
news:ctcgoa$13ro$1 digitaldaemon.com...
 Having said that, Walter has at least provided the bare-bones. I'll
 utilize that to provide a means of hiding the grubby details, such that
 both dynamic & static linking of DLLs will be both thoroughly transparent
 and painless.

 IMO, this kind of thing should ideally be left to the O/S; not re-invented
 by the language runtime (someone else had noted this, also). Sometimes one
 has to sidestep the O/S, but in this case I don't feel the complexity
 tradeoffs are reasonable.
Most of the time, all you need to do is cut & paste from the examples given. One reason the details are shown is because D is a systems programming language, and knowing the how & the why of the details means one is much more likely to use it successfully. It also enables one to modify it for special purposes.

I also agree that the OS should provide gc services. But I am not in a position to design an OS <g>, so we must work with what we have.
Jan 28 2005
next sibling parent reply Kris <Kris_member pathlink.com> writes:
In article <ctd1v8$2239$1 digitaldaemon.com>, Walter says...
<snip>
Most of the time, all you need to do is cut & paste from the examples given. One reason the details are shown is because D is a systems programming language, and knowing the how & the why of the details means one is much more likely to use it successfully. It also enables one to modify it for special purposes.
It's /great/ that you documented all the details!
I also agree that the OS should provide gc services. But I am not in a
position to design an OS <g>, so we must work with what we have.
We're misunderstanding each other, Walter. But there's nothing unusual about that :-) Thanks for addressing the issue. Everyone has their own idea of how to skin the proverbial cat, but the end result is typically the same: one dead cat.

Can you perhaps enlighten us on how to construct robust "soft-references"? It appears that the GC disables all threads whilst reaping allocations, which could then lead to deadlock between the GC and a soft-reference manager.

Are all threads (except the GC) halted when a destructor is invoked?
Jan 28 2005
parent "Walter" <newshound digitalmars.com> writes:
"Kris" <Kris_member pathlink.com> wrote in message
news:cte22g$97t$1 digitaldaemon.com...
 Can you perhaps enlighten us on how to construct robust "soft-references"?
The way to do it is to construct a pool of those soft references, so the gc won't reap them.
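For instance, something along these lines (just a sketch -- the names are
made up, and the policy for when an entry gets released belongs to the
soft-reference manager):

// Sketch: the pool holds an ordinary (strong) reference on behalf of
// each soft reference, so the collector will not reap the target.
class SoftRefPool
{
    private Object[Object] pool;    // keyed by the target itself

    // register a target; it now stays alive until release() is called
    void retain(Object o)
    {
        pool[o] = o;
    }

    // drop the strong reference; the target becomes collectable again,
    // and any outstanding soft references must treat it as gone
    void release(Object o)
    {
        delete pool[o];
    }
}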
 It appears that the GC disables all threads whilst reaping allocations,
 which could then lead to deadlock between the GC and a soft-reference
 manager.

 Are all threads (except the GC) halted when a destructor is invoked?
All the threads that the gc knows about (via std.thread). If you create a thread directly, not using std.thread, the gc won't stop it, scan it, or know anything about it.
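For example (a sketch, assuming the usual std.thread idiom of deriving from
Thread and overriding run(); the Worker name is made up):

import std.thread;

class Worker : Thread
{
    // created via std.thread, so the gc will suspend this thread and
    // scan its stack during a collection
    override int run()
    {
        // ... allocate freely here ...
        return 0;
    }
}

void spawn()
{
    Worker w = new Worker;
    w.start();

    // by contrast, a thread created directly with CreateThread() or
    // _beginthreadex() is invisible to the gc: not stopped, not scanned
}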
Jan 28 2005
prev sibling parent "Dave" <Dave_member pathlink.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:ctd1v8$2239$1 digitaldaemon.com...
 "Kris" <Kris_member pathlink.com> wrote in message
 news:ctcgoa$13ro$1 digitaldaemon.com...
 Having said that, Walter has at least provided the bare-bones. I'll
 utilize that to provide a means of hiding the grubby details, such that
 both dynamic & static linking of DLLs will be both thoroughly transparent
 and painless.
<snip>
 Most of the time, all you need to do is cut & paste from the examples 
 given.
 One reason the details are shown is because D is a systems programming
 language, and knowing the how & the why of the details means one is much
 more likely to use it successfully. It also enables one to modify it for
 special purposes.
<snip>

Walter - thank you for the DLL/GC addition!!

I gotta add my $0.02 on this though..

If the code inside DllMain, MyDLL_Initialize and MyDLL_Terminate can be 
handled by some sort of boiler-plate wrapper for 8 of 10 uses, I think it 
would be a /very/ good thing to provide it (while still allowing the 
developer to use the detailed version).

This would be especially true if it would make shared library development 
more portable between Win and the 'nixes for the majority of cases where the 
code in MyDLL_Initialize and MyDLL_Terminate can be handled by an import and 
a few wrapper functions like:

import std.gc;
import std.slinit;        // proposed wrapper module: extern(C) { _minit(), etc... }

version (Windows) {

import std.c.windows.windows;   // HINSTANCE, BOOL, ULONG, LPVOID

HINSTANCE g_hInst;

extern (Windows)
BOOL DllMain(HINSTANCE hInstance, ULONG ulReason, LPVOID pvReserved)
{
    // all the attach/detach boiler-plate lives in the wrapper
    return SL_DllMain(hInstance, ulReason, g_hInst);
}

} // version (Windows)

export void MySharedLib_Initialize(void* gc)
{
    SL_Init(gc);        // hand the host's gc to the wrapper
}

export void MySharedLib_Terminate()
{
    SL_Term();
}

I think it worth the effort just to minimize the code overhead (and learning 
curve and clutter) needed for most shared libs. But if it also turns out 
that the standard copy and paste code (of your example) needs to be 
different between Win and Linux, wrapper functions will make things that 
much more elegant for portable library development, IMHO.

To me, this would coincide with the D philosophy of hiding the messy details 
for the general case while still providing for their use if needed.

- Dave
Jan 28 2005
prev sibling parent reply pragma <pragma_member pathlink.com> writes:
In article <ctcgoa$13ro$1 digitaldaemon.com>, Kris says...
p.s. Pragma is building a container also, so I'd like to get his perspective on
this too.
Hey, sure thing. Not to dilute Kris' argument here, but I think that Walter has given us what was needed for GC management between dll's and processes. I haven't thought around all the corners of the problem space yet, but it looks more and more to me that using a separate dll for the GC may actually further complicate things. At first, I didn't think this was so. But the updated model now creates a 1-to-1 mapping between GC's and processes, irrespective of how many dll's are in use. To me, that seems a damn fine solution, if not a step in the right direction.

That aside, the bigger issue is class management across dll boundaries. Most applications do not need to worry about the validity of v-tables and delegates, since the dll is usually freed at program termination (this goes especially for static linking). It is a problem that is not covered by the GC at all, so it requires additional management; hence Kris' notion of "Containers".

For those not familiar with the problem, here's what can easily happen. Say I have an export from a dll that returns an object of class "Foobar". I then free the dll since it's no longer needed. Finally, I attempt to print the contents of Foobar.
 // given: mylibrary represents a (hypothetical) loaded dll
 void makeSegFault(){
    Foobar foo = mylibrary.getNewFoobar();   // object constructed inside the dll
    mylibrary.unload();                      // the dll -- and its vtables -- are gone
    writeln(foo.toString());                 // virtual call through a dead vtable
 }
This will segfault since the vtable for 'foo' was a part of the dll. Thankfully, the recent GC enhancements allow us to at least keep foo's memory footprint intact, but the methods are history. Also, reloading the dll cannot be reliably used to 'magically' restore that vtable. This pattern is easier to create than one would think, especially when one is cramming data into generic AA's and references become widely dispersed inside a large system.

For DSP, the solution I'm going to use involves a combination of object-proxies and reference counting of said proxies per dll (a rough sketch is at the end of this post). A dll reload will not break code, since the proxies can be prodded to re-constitute their dll-bound counterparts. This way, the proxies can be freely referenced throughout the application, save the dll they're interfacing with (feedback would be *bad*).

The only other airtight solution I can think of would be to apply the GC pattern to dll's. This means that a dll is not unloaded until the heap is free from all references into a dll's address space (lazy unloading via garbage collection). Adding a given dll's address space as a root to the GC should cover this. The only drawback here is that it's effectively the same as the present situation, given that you cannot force a dll unload without potentially breaking something; the real advantage of dll's is to load and unload at will.

Aside: does anyone know what happens if you touch a used .class file while a Java app is running? Can Java's ClassLoader be told to unload or reload a class file that's in use? I'm curious since I'd like to know how other platforms have handled this space.

- EricAnderton at yahoo
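p.s. Roughly, the proxy idea looks like this (purely a sketch: Indicator,
Library and the factory call are names I'm making up here, not anything in
Mango or Phobos):

// The proxy is the only reference the rest of the application ever holds.
// The dll-bound object behind it can be dropped when the dll goes away and
// re-constituted after a (re)load, so scattered references never dangle.
interface Indicator
{
    double evaluate(double price);
}

// hypothetical dll wrapper: loading, per-dll reference counting, factory access
class Library
{
    void acquire() { /* ++refcount; (re)load the dll if needed */ }
    void release() { /* --refcount; unload may be deferred until zero */ }
    Indicator createIndicator(char[] name) { return null; /* calls into the dll */ }
}

class IndicatorProxy : Indicator
{
    private Library   lib;      // which dll this proxy fronts
    private char[]    name;     // which indicator to ask that dll for
    private Indicator target;   // dll-bound instance; null until (re)constituted

    this(Library lib, char[] name)
    {
        this.lib  = lib;
        this.name = name;
    }

    double evaluate(double price)
    {
        if (target is null)
        {
            lib.acquire();                       // count this proxy against the dll
            target = lib.createIndicator(name);  // re-constitute the real object
        }
        return target.evaluate(price);
    }

    // called when the dll is about to be unloaded or replaced: drop the
    // dll-bound object but keep the proxy itself (and its users) intact
    void invalidate()
    {
        target = null;
        lib.release();
    }
}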
Jan 28 2005
parent reply Kris <Kris_member pathlink.com> writes:
In article <ctdqsl$30sc$1 digitaldaemon.com>, pragma says...
<snip>
Good points, Pragma. Another thing to consider, regarding the explicit unloading of DLLs, is the 'version' issue. If one replaces an existing instance of some dynamically-loaded module with another, newer version, then the contract between the container and any existing (remote) clients has effectively been broken. I note this because each newer version should be loaded as such; as a distinct and separate instance in addition to any prior version instances. Doing so leads to long-term stability.

The upshot is that such a container would not have a regular need to /explicitly/ drop any particular (and previously loaded) module. Therefore, your approach of using the GC to manage module 'liveness' is rather suitable. Placing the GC within a DLL does not complicate this, as far as I can tell.

There's at least one tricky part there: how to know whether or not each dynamically loaded-module is still actually loaded. I think soft-references would alleviate that problem, and there are some ways to do that in D, although there's a subtle danger of deadlock since it appears that the GC halts all other threads when it reaps the heap :-(

Perhaps Walter could enlighten us on how to construct robust soft-references?

Thinking about this brings up another issue to consider; starting a thread from within a DLL will potentially cause the GC, and the process, to fail. Something to be careful of.
Jan 28 2005
parent reply pragma <pragma_member pathlink.com> writes:
In article <cte18s$82o$1 digitaldaemon.com>, Kris says...

Good points, Pragma. Another thing to consider, regarding the explicit unloading
of DLLs, is the 'version' issue. If one replaces an existing instance of some
dynamically-loaded module with another, newer version, then the contract between
the container and any existing (remote) clients has effectively been broken.
Yep. This is why I've advocated that we all get into the habit of naming our dlls with the version number as a part of the name. It solves the majority of these problems. The other techniques I've proposed in the past may very well be suitable in an application-to-application manner. Overall, this is an area where sufficient (and justified) pushback from Walter would have us forge an open standard for this kind of thing.
The upshot is that such a container would not have a regular need to
/explicitly/ drop any particular (and previously loaded) module. Therefore, your
approach of using the GC to manage module 'liveness' is rather suitable. 
I see where you're going with this. Assuming that the only reason for a reload is to grab a newer version, you don't need to unload the old one at all.
Placing the GC within a DLL does not complicate this, as far as I can tell.
I'm confused. Did you mean "manage the dll with the GC" instead?
There's at least one tricky part there: how to know whether or not each
dynamically loaded-module is still actually loaded. I think soft-references
would alleviate that problem, and there are some ways to do that in D, although
there's a subtle danger of deadlock since it appears that the GC halts all other
threads when it reaps the heap :-(

Perhaps Walter could enlighten us on how to construct robust soft-references?
You're talking about having a soft (weak?) reference to the library in question, correct? Constructing weak references in D should be as easy as writing a wrapper class that tells the GC to ignore the weak-pointer's address when checking for roots (a rough sketch follows below). Now, checking their validity is tough to solve, since the GC doesn't expose any way to check whether a pointer is under its control (sure, you could use win32, but it's not portable).

And as for deadlock: what if the call to unload a library is called on the GC's thread via a destructor? Would that fix the problem? I suppose if the dll held some kind of mutex inside of dllmain, that it would cause trouble. But this may come back to "Best Practices" for managing such a mechanism.
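Something like this is what I have in mind -- purely a sketch (WeakRef is my
name for it, and the complement trick is just one way of hiding the address
from a conservative scan):

// Hide the target's address from the conservative GC by storing its
// one's-complement, a bit pattern a scan will not normally take for
// the original pointer.
class WeakRef
{
    private size_t hidden;      // ~address of the target

    this(Object target)
    {
        hidden = ~cast(size_t) cast(void*) target;
    }

    // And here is the validity problem in a nutshell: if the target has
    // already been collected, this returns a dangling reference, and
    // there is (today) no portable way to detect that from in here.
    Object get()
    {
        return cast(Object) cast(void*) ~hidden;
    }
}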
Thinking about this brings up another issue to consider; starting a thread from
within a DLL will potentially cause the GC, and the process, to fail. Something
to be careful of.
I'll have to take your word for this. Perhaps you can furnish me with a more concrete example? Unless you're inside of dllMain, there shouldn't be any side effects that I'm aware of.

Also, the MSDN library has a slew of articles on what to do and not to do inside of dllMain. The gist of it all is that you should do the absolute minimum needed inside that routine, so as to avoid problems just within win32 itself.

- EricAnderton at yahoo
Jan 28 2005
parent Kris <Kris_member pathlink.com> writes:
In article <cte4c0$c8n$1 digitaldaemon.com>, pragma says...
In article <cte18s$82o$1 digitaldaemon.com>, Kris says...
Placing the GC within a DLL does not complicate this, as far as I can tell.
I'm confused. Did you mean "manage the dll with the GC" instead?
Ahh; I was just referring to the earlier assertion that placing the GC itself within a separate DLL might actually increase complexity. I don't think it does, but I could be wrong.
There's at least one tricky part there: how to know whether or not each
dynamically loaded-module is still actually loaded. I think soft-references
would alleviate that problem, and there are some ways to do that in D, although
there's a subtle danger of deadlock since it appears that the GC halts all other
threads when it reaps the heap :-(
And as for deadlock: what if the call to unload a library is called on the GC's
thread via a destructor?  Would that fix the problem?  I suppose if the dll held
some kind of mutex inside of dllmain, that it would cause trouble.  But this may
come back to "Best Practices" for managing such a mechanism.
Deadlock could occur if (a) the destructor is used to unload the module, (b) all threads are halted whilst the GC runs (and hence during the destructor call), and (c) the mutex protecting the "module is currently loaded" flag is held by one of the stalled threads; one which was 'concurrently' asking for a handle to that specific module. The GC thread would stall on that same mutex.

One way around this would be to utilize a mutex-free queue, to stack up destructor requests for unloading reaped module instances -- thereby decoupling the GC from the aforementioned mutex.
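Something along these lines, perhaps -- just a sketch, with made-up names, a
fixed capacity, and no attention paid to memory-barrier details on SMP boxes:

// Single-producer / single-consumer ring of deferred unload requests.
// The producer is the destructor running on the GC's thread; the consumer
// is an ordinary application thread that does the real unloading later,
// outside of any collection, so the GC never touches the registry mutex.

const int QueueSize = 64;           // capacity is arbitrary for the sketch

void*[QueueSize] pending;           // module handles awaiting unload
int head;                           // written only by the GC/destructor side
int tail;                           // written only by the worker side

// called from a module-wrapper's destructor during a sweep
void deferUnload(void* moduleHandle)
{
    int next = (head + 1) % QueueSize;
    if (next != tail)               // if full, the request is simply dropped here
    {
        pending[head] = moduleHandle;
        head = next;                // publish only after the slot is written
    }
}

// called periodically from a normal application thread
void drainUnloads()
{
    while (tail != head)
    {
        void* h = pending[tail];
        tail = (tail + 1) % QueueSize;
        // ... take the registry mutex and actually unload h here ...
    }
}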
Thinking about this brings up another issue to consider; starting a thread from
within a DLL will potentially cause the GC, and the process, to fail. Something
to be careful of.
I'll have to take your word for this. Perhaps you can furnish me with a more concrete example?
If one assumes that the GC has a valid reason for halting all threads during a sweep, then any thread it does not know about is a potential threat to stability. Since Phobos (and thus std.Thread) is still linked statically, all DLLs will have their own std.Thread instance, yet will be sharing a single GC. The single GC only knows about one instance of std.Thread, and consequently can only halt those threads created via that particular instance. Any thread created via a DLL will be noted only within that DLL's std.Thread pool, and thus will not be stalled during a GC sweep. Therein lies trouble :-)

Full resolution is conceptually trivial, but apparently controversial.
Jan 28 2005
prev sibling parent Benji Smith <dlanguage xxagg.com> writes:
On Wed, 26 Jan 2005 14:19:04 -0800, "Walter"
<newshound digitalmars.com> wrote:

I agree. I'm working on it.
Fantastic. Thanks, Walter. I really appreciate how receptive you are to the whims (er...I mean, the intelligent and informed opinions) of the people in this ng. --Benji
Jan 27 2005