digitalmars.D.learn - GC and big non-binary trees
- Thomas Kuehne (30/30) Apr 07 2005 -----BEGIN PGP SIGNED MESSAGE-----
- Russ Lewis (43/84) Apr 07 2005 (NOTE: All this assumes the current GC, which is a mark-and-sweep GC).
- Thomas Kuehne (16/36) Apr 07 2005 -----BEGIN PGP SIGNED MESSAGE-----
- Georg Wrede (3/44) Apr 07 2005 Can't you just make a tree of that size with dummy data, and see how it
- Thomas Kuehne (10/12) Apr 07 2005 -----BEGIN PGP SIGNED MESSAGE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'm faced with huge non-binary trees: leafs: > 8000000 subleafs: 1-40 depth: > 17 Saddly the parent member is required and results in crossreferences. Occasinally some branches will be dropped. Should I 1) simply drop the branch by removing it's root from the members list, (huge GC costs) 2) walk down the dropped branch and set parent to null, (medium dropping costs, medium GC costs) 3) or walk down the dropped branch and set parent and members to null? (huge dropping costs) Tests till now indicate that (2) is the apropiate solution for my current trees (~20000 leafs), but the tree size will significantly. Any ideas how the GC will react or how to improve branch dropping and GC performance? Thomas -----BEGIN PGP SIGNATURE----- iD8DBQFCVY6S3w+/yD4P9tIRAqRUAJ4iatJd2DarTOUyqsO7DyWsYvL1vwCePlNY ZCpVDJxy7mjlMIhwFYFTqtA= =Xps0 -----END PGP SIGNATURE-----
Apr 07 2005
(NOTE: All this assumes the current GC, which is a mark-and-sweep GC).
I assume from what you're saying is that the branch is a substantial
portion of the tree.
Remember that, when the GC scans, it only scans the *live* portion of
the tree. So, if you disconnect a large subset of the tree (even if
that subset still has links back to the live data), then the GC will
never scan any of the nodes of the branch. So it doesn't matter whether
you set the 'parent' parameters in them to null or not.
Concept:
The cost of a pass of the GC is proportional to the size of the live set
at the time of the sweep.
On the other hand, having a lot of garbage lingering around means that
you will more quickly run out of available memory, which causes the GC
to run sooner. So, if you are *certain* that the entire branch is
garbage (no outside references into any node), then you could decide to
recursively delete all of the nodes in the branch. That would increase
your 'dropping' costs, but would probably result in better long-term
performance. However, it might not be a good option if you're worried
about the real-time performance of your drop operation.
If you're worried about the real-time performance of your 'dropping'
operation, but don't want to have big GC sweeps either, then you could
implement a delayed drop. To do this, keep a global structure (perhaps
as simple as a dynamic array of references) around which holds a list of
dropped branches. During idle time, you iterate over that array, and
recursively delete all of the nodes in each dropped branch. (Or, just
zero the array, and kick off a GC pass.) Thus, dropping a branch is as
simple as removing it from its parent and storing a reference to the
branch on this array. If you want, you can also disable the GC during
your critical windows, and re-enable it during your idle times.
The effect of this would be to have very fast real-time performance
(drops are quick, and GC never runs during critical windows).
Allocation in that window would also be pretty fast. (Since the GC is
disabled, it never stops to sweep memory; if it ever lacks memory, it
just gets more from the OS.) Yet, during idle times, you free up a lot
of memory.
You would end up with a system where your total memory allocation is
M+C, where M is the maximum amount of memory you would need (typically),
and C is the amount of memory that is allocated/freed in a real-time
window. During the real-time portion of your program, the GC allocates
from C; during the idle portion, you release objects (or the GC collects
them), refilling C.
Hope this makes sense, and at least some of it applies to your work :)
Thomas Kuehne wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I'm faced with huge non-binary trees:
leafs: > 8000000
subleafs: 1-40
depth: > 17
Saddly the parent member is required and results in crossreferences.
Occasinally some branches will be dropped.
Should I
1) simply drop the branch by removing it's root from the members list,
(huge GC costs)
2) walk down the dropped branch and set parent to null,
(medium dropping costs, medium GC costs)
3) or walk down the dropped branch and set parent and members to null?
(huge dropping costs)
Tests till now indicate that (2) is the apropiate solution for my
current trees (~20000 leafs), but the tree size will significantly.
Any ideas how the GC will react or how to improve branch dropping
and GC performance?
Thomas
-----BEGIN PGP SIGNATURE-----
iD8DBQFCVY6S3w+/yD4P9tIRAqRUAJ4iatJd2DarTOUyqsO7DyWsYvL1vwCePlNY
ZCpVDJxy7mjlMIhwFYFTqtA=
=Xps0
-----END PGP SIGNATURE-----
Apr 07 2005
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Russ Lewis schrieb am Thu, 07 Apr 2005 13:34:46 -0700:(NOTE: All this assumes the current GC, which is a mark-and-sweep GC). I assume from what you're saying is that the branch is a substantial portion of the tree. Remember that, when the GC scans, it only scans the *live* portion of the tree. So, if you disconnect a large subset of the tree (even if that subset still has links back to the live data), then the GC will never scan any of the nodes of the branch. So it doesn't matter whether you set the 'parent' parameters in them to null or not.Arg, why did I thought it was a reference counting GC??? Back to the drawing board!Concept: The cost of a pass of the GC is proportional to the size of the live set at the time of the sweep. On the other hand, having a lot of garbage lingering around means that you will more quickly run out of available memory, which causes the GC to run sooner. So, if you are *certain* that the entire branch is garbage (no outside references into any node), then you could decide to recursively delete all of the nodes in the branch. That would increase your 'dropping' costs, but would probably result in better long-term performance. However, it might not be a good option if you're worried about the real-time performance of your drop operation.It's not a realtime application, rather a small memory eating animal! As I know the branch size at dropping time I'll try to delete all leafs in big branches and leaf small branches to the GC. <snip>Hope this makes sense, and at least some of it applies to your work :)Thanks, you gave me some ideas worth playing around with. Thomas -----BEGIN PGP SIGNATURE----- iD8DBQFCViBj3w+/yD4P9tIRAgCSAJ0c+ZihWUGp7s1mn9tWjJAPKHpdgwCglsY6 XGBF0GNp32znHoE1CKqATG8= =GVWQ -----END PGP SIGNATURE-----
Apr 07 2005
Can't you just make a tree of that size with dummy data, and see how it
performs?
Thomas Kuehne wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I'm faced with huge non-binary trees:
leafs: > 8000000
subleafs: 1-40
depth: > 17
Saddly the parent member is required and results in crossreferences.
Occasinally some branches will be dropped.
Should I
1) simply drop the branch by removing it's root from the members list,
(huge GC costs)
2) walk down the dropped branch and set parent to null,
(medium dropping costs, medium GC costs)
3) or walk down the dropped branch and set parent and members to null?
(huge dropping costs)
Tests till now indicate that (2) is the apropiate solution for my
current trees (~20000 leafs), but the tree size will significantly.
Any ideas how the GC will react or how to improve branch dropping
and GC performance?
Thomas
-----BEGIN PGP SIGNATURE-----
iD8DBQFCVY6S3w+/yD4P9tIRAqRUAJ4iatJd2DarTOUyqsO7DyWsYvL1vwCePlNY
ZCpVDJxy7mjlMIhwFYFTqtA=
=Xps0
-----END PGP SIGNATURE-----
Apr 07 2005
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Georg Wrede schrieb am Fri, 08 Apr 2005 00:45:10 +0300:Can't you just make a tree of that size with dummy data, and see how it performs?No, I haven't got enough RAM for that. Thomas -----BEGIN PGP SIGNATURE----- iD8DBQFCVhxq3w+/yD4P9tIRAlxgAJ0ash9wrb3v5FQ2EHc61uP7ZKNd3wCgyjR/ hyhmBwR4s0Esf1kZc5KwEZA= =hUu2 -----END PGP SIGNATURE-----
Apr 07 2005









Thomas Kuehne <thomas-dloop kuehne.thisisspam.cn> 