www.digitalmars.com         C & C++   DMDScript  

D - What's the proper way to discard a doubly linked list?

reply Georg Wrede <Georg_member pathlink.com> writes:
I was teaching D to this guy, and by the time we were at
doubly linked lists, the issue of discarding the list
arose. I told him that in the spirit of D, all you'd have
to do is to null all references to the list (list, current,
etc. pointers), or to just have them go out of scope.

But then I got unsure whether this works today in D.

So, does one really have to walk the list and null all the
references between the items, or does the GC understand
about "linked islands"?

I want 2 answers: what's it today, and what's it gonna be
the day everything is Right?
Jan 17 2004
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Walter's garbage collector is a mark-and-sweep collector.  Basically, 
what this means is that it starts by scanning the range of data that it 
knows is not garbage (global variables and the stack(s)).  It finds 
there all the values that might conceivably be pointers and marks the 
things they point to as "not garbage."  It continues on, scanning things 
known to be "not garbage," until it doesn't find anything new.  At that 
point, all memory that has not been scanned is assumed to be garbage.

What that means is that it will realize that your doubly-linked list is 
an island and will garbage collect it.  Since nobody ever points in to 
any of the objects in the list, they will never be marked "not garbage" 
and will get collected.

One of the downsides of this type of collector, however, is that it can 
get false positives; if one object in your doubly-linked list is at 
address 0xdeadbeef, and you happen to have an integer somewhere in your 
program that has the value 0xdeadbeef, then the garbage collector will 
assume that it is a pointer.  (The garbage collector doesn't know about 
  the types of the variables it is scanning.)  In that case, the 
doubly-linked list will not get garbage collected immediately; however, 
whenever your integer value changes, then the next gc pass will clean it up.

Georg Wrede wrote:
 I was teaching D to this guy, and by the time we were at
 doubly linked lists, the issue of discarding the list
 arose. I told him that in the spirit of D, all you'd have
 to do is to null all references to the list (list, current,
 etc. pointers), or to just have them go out of scope.
 
 But then I got unsure whether this works today in D.
 
 So, does one really have to walk the list and null all the
 references between the items, or does the GC understand
 about "linked islands"?
 
 I want 2 answers: what's it today, and what's it gonna be
 the day everything is Right?
 
 
 
Jan 17 2004
next sibling parent Georg Wrede <Georg_member pathlink.com> writes:
In article <bubrab$2kqb$1 digitaldaemon.com>, Russ Lewis says...
What that means is that it will realize that your doubly-linked list is 
an island and will garbage collect it.  Since nobody ever points in to 
any of the objects in the list, they will never be marked "not garbage" 
and will get collected.
Whew, I won't get caught for lying!
One of the downsides of this type of collector, however, is that it can 
get false positives; if one object in your doubly-linked list is at 
address 0xdeadbeef, and you happen to have an integer somewhere in your 
program that has the value 0xdeadbeef, then the garbage collector will 
assume that it is a pointer.  (The garbage collector doesn't know about 
  the types of the variables it is scanning.)  In that case, the 
doubly-linked list will not get garbage collected immediately; however, 
whenever your integer value changes, then the next gc pass will clean it up.
Wow! This means it doesn't care about the structure of things, it just assumes everything is 32-bit, and then looks for matching bit patterns? From this follows that if I have a tightly packed struct that contains bytes at the beginning, then this 0xdeadbeef might just appear as the pattern composed of several unrelated bytes in the structure? Not that I think this is a practical problem. As you say, chances are the bit pattern will probably change before long.
Jan 17 2004
prev sibling next sibling parent reply "C" <dont respond.com> writes:
How often and when is the GC ran ?( good description btw ).

C
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:bubrab$2kqb$1 digitaldaemon.com...
 Walter's garbage collector is a mark-and-sweep collector.  Basically,
 what this means is that it starts by scanning the range of data that it
 knows is not garbage (global variables and the stack(s)).  It finds
 there all the values that might conceivably be pointers and marks the
 things they point to as "not garbage."  It continues on, scanning things
 known to be "not garbage," until it doesn't find anything new.  At that
 point, all memory that has not been scanned is assumed to be garbage.

 What that means is that it will realize that your doubly-linked list is
 an island and will garbage collect it.  Since nobody ever points in to
 any of the objects in the list, they will never be marked "not garbage"
 and will get collected.

 One of the downsides of this type of collector, however, is that it can
 get false positives; if one object in your doubly-linked list is at
 address 0xdeadbeef, and you happen to have an integer somewhere in your
 program that has the value 0xdeadbeef, then the garbage collector will
 assume that it is a pointer.  (The garbage collector doesn't know about
   the types of the variables it is scanning.)  In that case, the
 doubly-linked list will not get garbage collected immediately; however,
 whenever your integer value changes, then the next gc pass will clean it
up.
 Georg Wrede wrote:
 I was teaching D to this guy, and by the time we were at
 doubly linked lists, the issue of discarding the list
 arose. I told him that in the spirit of D, all you'd have
 to do is to null all references to the list (list, current,
 etc. pointers), or to just have them go out of scope.

 But then I got unsure whether this works today in D.

 So, does one really have to walk the list and null all the
 references between the items, or does the GC understand
 about "linked islands"?

 I want 2 answers: what's it today, and what's it gonna be
 the day everything is Right?
Jan 17 2004
next sibling parent "Matthew" <matthew.hat stlsoft.dot.org> writes:
 How often and when is the GC ran ?( good description btw ).
When memory is low. I recently put out a suggestion that DMD creates and activates a low-priority thread if any additional (over and above the main) thread is created, i.e. the program is multi-threaded. Being low priority (e.g. THREAD_PRIORITY_IDLE on Win32) it would not inconvenience any other thread/process, but would cause memory to be cleaned when the machine's idle and, more importantly, will cause the destructors of objects to be called, potentially freeing resources far more scarce than memory. However, no-one seemed particularly keen, so it's not going anywhere for the moment. I think this'll only float when we've got meaty apps out there, and we can performance test the two variants.
 C
 "Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
 news:bubrab$2kqb$1 digitaldaemon.com...
 Walter's garbage collector is a mark-and-sweep collector.  Basically,
 what this means is that it starts by scanning the range of data that it
 knows is not garbage (global variables and the stack(s)).  It finds
 there all the values that might conceivably be pointers and marks the
 things they point to as "not garbage."  It continues on, scanning things
 known to be "not garbage," until it doesn't find anything new.  At that
 point, all memory that has not been scanned is assumed to be garbage.

 What that means is that it will realize that your doubly-linked list is
 an island and will garbage collect it.  Since nobody ever points in to
 any of the objects in the list, they will never be marked "not garbage"
 and will get collected.

 One of the downsides of this type of collector, however, is that it can
 get false positives; if one object in your doubly-linked list is at
 address 0xdeadbeef, and you happen to have an integer somewhere in your
 program that has the value 0xdeadbeef, then the garbage collector will
 assume that it is a pointer.  (The garbage collector doesn't know about
   the types of the variables it is scanning.)  In that case, the
 doubly-linked list will not get garbage collected immediately; however,
 whenever your integer value changes, then the next gc pass will clean it
up.
 Georg Wrede wrote:
 I was teaching D to this guy, and by the time we were at
 doubly linked lists, the issue of discarding the list
 arose. I told him that in the spirit of D, all you'd have
 to do is to null all references to the list (list, current,
 etc. pointers), or to just have them go out of scope.

 But then I got unsure whether this works today in D.

 So, does one really have to walk the list and null all the
 references between the items, or does the GC understand
 about "linked islands"?

 I want 2 answers: what's it today, and what's it gonna be
 the day everything is Right?
Jan 17 2004
prev sibling parent Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
When you try to allocate memory but the allocator doesn't have any free 
memory regions, then it runs the GC *before* it asks the OS for more memory.

I don't know of any other situations when it runs...but there might be 
some, or there might be some new ones in the future.

Russ

C wrote:
 How often and when is the GC ran ?( good description btw ).
 
 C
 "Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
 news:bubrab$2kqb$1 digitaldaemon.com...
 
Walter's garbage collector is a mark-and-sweep collector.  Basically,
what this means is that it starts by scanning the range of data that it
knows is not garbage (global variables and the stack(s)).  It finds
there all the values that might conceivably be pointers and marks the
things they point to as "not garbage."  It continues on, scanning things
known to be "not garbage," until it doesn't find anything new.  At that
point, all memory that has not been scanned is assumed to be garbage.

What that means is that it will realize that your doubly-linked list is
an island and will garbage collect it.  Since nobody ever points in to
any of the objects in the list, they will never be marked "not garbage"
and will get collected.

One of the downsides of this type of collector, however, is that it can
get false positives; if one object in your doubly-linked list is at
address 0xdeadbeef, and you happen to have an integer somewhere in your
program that has the value 0xdeadbeef, then the garbage collector will
assume that it is a pointer.  (The garbage collector doesn't know about
  the types of the variables it is scanning.)  In that case, the
doubly-linked list will not get garbage collected immediately; however,
whenever your integer value changes, then the next gc pass will clean it
up.
Georg Wrede wrote:

I was teaching D to this guy, and by the time we were at
doubly linked lists, the issue of discarding the list
arose. I told him that in the spirit of D, all you'd have
to do is to null all references to the list (list, current,
etc. pointers), or to just have them go out of scope.

But then I got unsure whether this works today in D.

So, does one really have to walk the list and null all the
references between the items, or does the GC understand
about "linked islands"?

I want 2 answers: what's it today, and what's it gonna be
the day everything is Right?
Jan 20 2004
prev sibling next sibling parent reply Georg Wrede <Georg_member pathlink.com> writes:
In article <bubrab$2kqb$1 digitaldaemon.com>, Russ Lewis says...
Walter's garbage collector is a mark-and-sweep collector.  Basically, 
..
One of the downsides of this type of collector, however, is that it can 
get false positives; if one object in your doubly-linked list is at 
address 0xdeadbeef, and you happen to have an integer somewhere in your 
program that has the value 0xdeadbeef, then the garbage collector will 
assume that it is a pointer.
DMD 0.76 docs say: "Do not store pointers into int variables using casts and other tricks. The garbage collector does not scan non-pointer types for roots." This would contradict the 0xdeadbeef example problem. That is, according to the documentation such a problem does not exist!?
Jan 18 2004
next sibling parent reply Patrick Down <pat codemoon.com> writes:
Georg Wrede <Georg_member pathlink.com> wrote in
news:budo3m$2krn$1 digitaldaemon.com: 

 In article <bubrab$2kqb$1 digitaldaemon.com>, Russ Lewis says...
Walter's garbage collector is a mark-and-sweep collector.  Basically, 
..
One of the downsides of this type of collector, however, is that it
can get false positives; if one object in your doubly-linked list is
at address 0xdeadbeef, and you happen to have an integer somewhere in
your program that has the value 0xdeadbeef, then the garbage collector
will assume that it is a pointer.
DMD 0.76 docs say: "Do not store pointers into int variables using casts and other tricks. The garbage collector does not scan non-pointer types for roots." This would contradict the 0xdeadbeef example problem. That is, according to the documentation such a problem does not exist!?
Not necessarily. It is specified this way to allow for smarter D complilers. A smart compiler might generate information about root pointers that the GC could use to make a smart scan of the global space and the stack. If compiler does not generate this type of information the GC can just scan the entire global space and stack. I don't think Walter's DMD generates this type of information yet but in the future it could. Other D compilers could too. So it's a smart idea not to store pointers in non pointer types.
Jan 18 2004
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
While it was 18/1/04 4:13 pm throughout the UK, Patrick Down sprinkled 
little black dots on a white screen, and they fell thus:

<snip>
"Do not store pointers into int variables using casts and other
tricks. The garbage collector does not scan non-pointer types for
roots." 
<snip>
 Not necessarily.  It is specified this way to
 allow for smarter D complilers.  A smart compiler
 might generate information about root pointers
 that the GC could use to make a smart scan of the
 global space and the stack.  If compiler does
 not generate this type of information the GC can
 just scan the entire global space and stack.
 
 I don't think Walter's DMD generates this type of 
 information yet but in the future it could. Other 
 D compilers could too.  So it's a smart idea not 
 to store pointers in non pointer types.
Makes sense, though I'm still a bit puzzled, as it also says: What would be the consequence of this? Speaking of which, are Windows API handles really pointers, as the type definitions imply? non-pointer types, use a union to do it so the garbage collector knows about it." How would the GC know about it, unless the union keeps a record of which member is 'active'? Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jan 23 2004
next sibling parent Luke D <Luke_member pathlink.com> writes:
In article <burget$j36$1 digitaldaemon.com>, Stewart Gordon says...
While it was 18/1/04 4:13 pm throughout the UK, Patrick Down sprinkled 
little black dots on a white screen, and they fell thus:

<snip>
"Do not store pointers into int variables using casts and other
tricks. The garbage collector does not scan non-pointer types for
roots." 
<snip>
 Not necessarily.  It is specified this way to
 allow for smarter D complilers.  A smart compiler
 might generate information about root pointers
 that the GC could use to make a smart scan of the
 global space and the stack.  If compiler does
 not generate this type of information the GC can
 just scan the entire global space and stack.
 
 I don't think Walter's DMD generates this type of 
 information yet but in the future it could. Other 
 D compilers could too.  So it's a smart idea not 
 to store pointers in non pointer types.
Makes sense, though I'm still a bit puzzled, as it also says: What would be the consequence of this? Speaking of which, are Windows API handles really pointers, as the type definitions imply? non-pointer types, use a union to do it so the garbage collector knows about it." How would the GC know about it, unless the union keeps a record of which member is 'active'?
I'm not sure, I don't know much about the inner workings of D, but it would probably scan it even if the non-pointer type is being used. After all, the real danger isn't that the GC will scan non-pointers, it's that it'll miss scanning pointers and collect memory that's still being used.
Jan 23 2004
prev sibling parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Stewart Gordon wrote:
 Makes sense, though I'm still a bit puzzled, as it also says:


 
 What would be the consequence of this? 
Currently, i guess, none. If complex pointer checks get optimised out someday, you could get a segfault in course of collection or even just a diagnostic exception. Besides, it's just a very dangerous strategy in application programming.
 Speaking of which, are Windows 
 API handles really pointers, as the type definitions imply?
In an implementation i know all HANDLEs are defined as ints in D. Doesn't make sense to call them pointers, since (1) they may accept constant values and (2) you cannot dereference them yourself, as even if they are pointers they only are valid within the Windows API implementation and cannot be dereferenced in application software. Besides, noone really knows whether they are pointers, or even just some kinds of unique IDs.

 non-pointer types, use a union to do it so the garbage collector knows 
 about it."
 
 How would the GC know about it, unless the union keeps a record of which 
 member is 'active'?
It wouldn't know, it would just scan anyway. The conservative GC has to be safe and thus err on the side of assuming anything which is not known is a pointer. In particular, no information exists over stack and over freestanding structs. Only objects and parts thereof get type information. However, this could possibly be extended to free-floating structs someday... -eye
Jan 23 2004
parent reply Georg Wrede <Georg_member pathlink.com> writes:
In article <burvq5$1eq0$1 digitaldaemon.com>, Ilya Minkov says...
Stewart Gordon wrote:
 Makes sense, though I'm still a bit puzzled, as it also says:


 
 What would be the consequence of this? 
Actually, the documentation should be changed here! It should say "do not ever store anything in a pointer". You should only assign it values of other pointers (as in foo = bar), or the null value. Since many D users are former C(++) users, it is customary to do all kinds of "wizardry" with pointers. For example, if one wants to make his own upCaseString(), then those people would probably use a pointer, and increment it as an int, to make it point at the next character. The spirit of D is to allow such (stupiditiy), but at yor own risk. (C users have a hard time remembering to leave pointers alone because in C you're the only one ever to use these values, and you "own" the entire program. In D there are always at least two users for these values: you and the GC!) Now, suppose one forgets to use a copy of the original pointer. Later, when GC occurs, then the GC would find no pointer pointing at the string (or actually at the beginning of the area allocated for the string data). The GC would mark this area as unused. Which obviously results in the area being overwritten with other data, sooner or later.

 non-pointer types, use a union to do it so the garbage collector knows 
 about it."
 
 How would the GC know about it, unless the union keeps a record of which 
 member is 'active'?
It doesn't. The above quote means that if you _really need_ to have an int in which you have the pointer value (say, you want to examine its numeral value, do some language research, study memory management, whatever -- or you're developing a data moving GC and you want to study how it changes the location of things), then don't just have it in an int. If you did, then the GC would not see it as a pointer to data, and it would leave it alone. So, to _both_ have the GC know that it is a pointer to data _and_ for you to be able to easily study the values in the pointer, then make a union. The union is nothing that happens in reality, it just tells the compiler that you want to have two different names (and types) for a particular item: to you it's both an int and a pointer.
Jan 24 2004
next sibling parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Georg Wrede wrote:
 Actually, the documentation should be changed here! It should
 say "do not ever store anything in a pointer". You should only
 assign it values of other pointers (as in foo = bar), or the
 null value.
We have a number of allowed operations which result in new pointer adresses. Such as pointer increments/decrements or pointer adressing. Walking outside allocated memory regions automatically produces invalid values. Such cannot be used further.
 Since many D users are former C(++) users, it is customary to
 do all kinds of "wizardry" with pointers. For example, if one
 wants to make his own upCaseString(), then those people would
 probably use a pointer, and increment it as an int, to make 
 it point at the next character. The spirit of D is to allow
 such (stupiditiy), but at yor own risk. (C users have a hard
 time remembering to leave pointers alone because in C you're
 the only one ever to use these values, and you "own" the
 entire program. In D there are always at least two users for
 these values: you and the GC!)
I'm sorry, i have to disagree. Though the practice may be dangerous a bit too often, it is legal. It is not "incrementing as an int", increment of an int increments it by 1. As opposed to that, increment of a pointer is a completely different thing and increments it by the size of the stored object. Quasi advances by one in an array. This practice by itself would be safe, if the array could complain if run over, and would give a standard way not to run past its end. Unfortunately, in C it wasn't until the C++ and the STL. The very same idea of walking pointers was adopted by STL - and i find it good so - but with providing means to not run past the end due to end marker notion. And by generalizing the pointer to be not just memory pointer, but any kind of cursor at all.
 Now, suppose one forgets to use a copy of the original pointer.
 Later, when GC occurs, then the GC would find no pointer pointing
 at the string (or actually at the beginning of the area allocated 
 for the string data). The GC would mark this area as unused. Which
 obviously results in the area being overwritten with other data,
 sooner or later.
It's no problem. GC doesn't need a pointer into the beginning of an allocated region. Pointers into the middle of the region should suffice. Even the pointer to one next element after the end of the array should suffice, since it's called the end marker in C++ iterator lingo. And remember that the calling stack also has another pointer to the region, so it won't die. Only a region which is practically by no means acessible would. -eye
Jan 24 2004
parent Georg Wrede <Georg_member pathlink.com> writes:
In article <buu68m$1u0j$1 digitaldaemon.com>, Ilya Minkov says...
Georg Wrede wrote:
 Actually, the documentation should be changed here! It should
 say "do not ever store anything in a pointer". You should only
 assign it values of other pointers (as in foo = bar), or the
 null value.
We have a number of allowed operations which result in new pointer adresses. Such as pointer increments/decrements or pointer adressing.
What I meant was to not mess with the pointer by bypassing the language's methods. Of course all operations on the pointer that are done with the language, as opposed to directly manipulating the bit pattern, are ok.
Walking outside allocated memory regions automatically produces invalid 
values. Such cannot be used further.

 Since many D users are former C(++) users, it is customary to
 do all kinds of "wizardry" with pointers. For example, if one
I'm sorry, i have to disagree. Though the practice may be dangerous a bit too often, it is legal. It is not "incrementing as an int", increment of an int increments it by 1.
Incrementing a pointer (in C(++) and D) is ok, so we agree on that. The example described a situation where you'd increment it as an int as opposed to incrementing it the proper way. I should have come up with a better example situation.
 Now, suppose one forgets to use a copy of the original pointer.
 Later, when GC occurs, then the GC would find no pointer pointing
 at the string (or actually at the beginning of the area allocated 
 for the string data). The GC would mark this area as unused. Which
 obviously results in the area being overwritten with other data,
 sooner or later.
It's no problem. GC doesn't need a pointer into the beginning of an allocated region. Pointers into the middle of the region should suffice.
Yes, they should. This depends on the particular GC, though, so I wouldn't do this in production code, since the customer may use some other GC.
Even the pointer to one next element after the end of the array should 
suffice, since it's called the end marker in C++ iterator lingo.
I did my Master's thesis about the STL.
And remember that the calling stack also has another pointer to the 
region, so it won't die. 
Again, the example could have been of a more thought-out situation.
Only a region which is practically by no means acessible would.
This was the case here, and I took it for granted everyone knew.
Jan 25 2004
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:buu08u$1kqg$1 digitaldaemon.com...
 Since many D users are former C(++) users, it is customary to
 do all kinds of "wizardry" with pointers. For example, if one
 wants to make his own upCaseString(), then those people would
 probably use a pointer, and increment it as an int, to make
 it point at the next character. The spirit of D is to allow
 such (stupiditiy), but at yor own risk. (C users have a hard
 time remembering to leave pointers alone because in C you're
 the only one ever to use these values, and you "own" the
 entire program. In D there are always at least two users for
 these values: you and the GC!)

 Now, suppose one forgets to use a copy of the original pointer.
 Later, when GC occurs, then the GC would find no pointer pointing
 at the string (or actually at the beginning of the area allocated
 for the string data). The GC would mark this area as unused. Which
 obviously results in the area being overwritten with other data,
 sooner or later.
This is an important issue. D's gc does not require a pointer to the beginning of the object, a pointer to any byte in the object is considered as being a valid reference to the entire object. So, you don't need to keep around a copy of the original pointer, and using pointers to walk a string array will work correctly.
Mar 12 2004
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Georg Wrede" <Georg_member pathlink.com> wrote in message
news:budo3m$2krn$1 digitaldaemon.com...
 DMD 0.76 docs say:

 "Do not store pointers into int variables using casts and other tricks.
The
 garbage collector does not scan non-pointer types for roots."

 This would contradict the 0xdeadbeef example problem. That is,
 according to the documentation such a problem does not exist!?
What that means is the gc can use type information to reduce the number and effect of false positives. (The current gc is a bit primitive in that it does not yet do this, but the door needs to be left open to that improvement.) This cannot be perfect, though, because of unions and the stack.
Mar 12 2004
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:bubrab$2kqb$1 digitaldaemon.com...
 One of the downsides of this type of collector, however, is that it can
 get false positives; if one object in your doubly-linked list is at
 address 0xdeadbeef, and you happen to have an integer somewhere in your
 program that has the value 0xdeadbeef, then the garbage collector will
 assume that it is a pointer.  (The garbage collector doesn't know about
   the types of the variables it is scanning.)  In that case, the
 doubly-linked list will not get garbage collected immediately; however,
 whenever your integer value changes, then the next gc pass will clean it
up. This is definitely a potential problem. In practice, it doesn't happen much. But it *can* happen, and the best way to deal with it is to not have islands that are too large. What I tend to do with large data structure islands is that when I know I'm done with them, I'll 'break it up' by nulling internal references in it so that if a chunk does get 'pinned' by a random integer value, it won't in turn pin the whole data structure.
Mar 12 2004