digitalmars.D.learn - associative arrays
- RenatoL (12/12) Jan 07 2012 Very quick question
- =?UTF-8?B?QWxpIMOHZWhyZWxp?= (11/24) Jan 07 2012 I think that could be a valid design choice. On the other hand, since
- RenatoL (3/3) Jan 07 2012 Yes, i agree this may acceptable. On the other hand if i really
- Jonathan M Davis (19/22) Jan 07 2012 In general, if an element isn't in a container, and you expect it to be ...
- RenatoL (5/5) Jan 07 2012 Yes, Jonathan, you're right.
- bearophile (20/34) Jan 07 2012 This an example that shows that silent failures are sources of bugs...
- Jonathan M Davis (29/31) Jan 07 2012 Why? What's the gain? If you care that the element is already in the AA,...
- Manfred Nowak (18/19) Jan 07 2012 There are some more arguments:
- Jesse Phillips (6/27) Jan 08 2012 Based on these arguments does that mean std.file.remove should
- RenatoL (1/1) Jan 08 2012 Very interesting discussion. Tk u all.
- dennis luehring (5/11) Jan 09 2012 but then it should be called optional_remove - because it "mabye" remove...
- Manfred Nowak (22/24) Jan 09 2012 Exceptions or errors are not _needed_.
- dennis luehring (8/23) Jan 09 2012 so your FileDelete would not return an FileDoesNotExists-Error?
- Manfred Nowak (12/15) Jan 10 2012 Correct.
- Stephen Bennett (15/23) Jan 08 2012 a.remove(x) doesn't fail though. It successfully satisfies its own
- Jonathan M Davis (60/94) Jan 07 2012 ide
- Manfred Nowak (4/4) Jan 07 2012 Jonathan M Davis wrote:
- Jonathan M Davis (3/8) Jan 07 2012 Which part?
- Kapps (11/23) Jan 07 2012 For most languages (such as C# and maybe Java), the Remove method on
- Jonathan M Davis (7/19) Jan 08 2012 There _is_ extra cost in returning bool rather than void, but I'm not at...
- Kapps (4/31) Jan 08 2012 Ah, found the bug / pull request.
- Jonathan M Davis (6/10) Jan 08 2012 Ah, TDPL says that it returns bool. Well, then it definitely needs to be...
- simendsjo (9/19) Jan 08 2012 Wouldn't it make sense to return a pointer to the item being
- Manfred Nowak (3/5) Jan 08 2012 According to the docs this is the intended behavior.
- simendsjo (5/10) Jan 08 2012 The only mention I can see of remove is at the top, and it doesn't state...
- Manfred Nowak (3/4) Jan 08 2012 Yes, I haven't read carefully enough.
- Jonathan M Davis (4/28) Jan 08 2012 Since the element has been removed from the container, what would the po...
- simendsjo (4/32) Jan 08 2012 I was thinking it could be a shorthand for the following:
- Jonathan M Davis (10/14) Jan 08 2012 I take it that you then intend to use item after that? I'm not sure that...
- simendsjo (8/22) Jan 08 2012 Ah.. This should be documented in http://dlang.org/hash-map.html
- Jonathan M Davis (5/33) Jan 08 2012 I confess that it's one of those things that I would have thought would ...
- simendsjo (9/42) Jan 08 2012 Certainly not obvious to me :)
- Jonathan M Davis (37/46) Jan 08 2012 Well, what's obvious to one person is not always obvious to another. I'm...
- simendsjo (7/53) Jan 08 2012 Thanks for your clarifications.
- Jonathan M Davis (6/13) Jan 08 2012 Yes. Depending on the current implementation of AAs and the state of aa ...
- simendsjo (2/15) Jan 08 2012 Thanks! You've saved me a couple of future bugs :)
- Jonathan M Davis (14/32) Jan 08 2012 No problem. It's a bit like how an array could be reallocated after you ...
- Steven Schveighoffer (15/30) Jan 09 2012 Actually, not invalid for the current implementation. I don't know if
- Jonathan M Davis (12/15) Jan 09 2012 Well, like I said, it depends on the current implementation. There are h...
- Andrej Mitrovic (6/10) Jan 09 2012 Funny, I was looking at the docs a few days ago and it said "Adding an
- Steven Schveighoffer (19/30) Jan 09 2012 Yeah, I could have sworn I stated somewhere that the current
- Andrej Mitrovic (5/5) Jan 09 2012 Ok, allow me to temporarily hijack again and ask: Would you mind
- Steven Schveighoffer (6/11) Jan 09 2012 Could this be you?
- Andrej Mitrovic (3/4) Jan 09 2012 Ah, yes. I didn't even notice you've replied to that, sorry. Yes, I'm
- Manfred Nowak (5/6) Jan 08 2012 A relevant part of Andrei's thread on "associative arrays iteration":
- Andrej Mitrovic (6/8) Jan 08 2012 Seems like that would be even more costly. Personally I think
- Kapps (3/13) Jan 09 2012 Looks like this is fixed for 2.058.
Very quick question import std.stdio; void main() { auto arr1 = ["uno":1, "due":2, "tre":3]; arr1.remove("xxx"); } also in this case the compiler does not say anything and the program goes out silently ... why? Would not it be better if an exception was raised? After all if i write: writeln(arr1["xxx"]); runtime expresses its disappointment...
Jan 07 2012
On 01/07/2012 02:11 PM, RenatoL wrote:Very quick question import std.stdio; void main() { auto arr1 = ["uno":1, "due":2, "tre":3]; arr1.remove("xxx"); } also in this case the compiler does not say anything and the program goes out silently ... why? Would not it be better if an exception was raised?I think that could be a valid design choice. On the other hand, since what is required is to remove the element, it is not really an error if the object is already missing. The requirement is met.After all if i write: writeln(arr1["xxx"]); runtime expresses its disappointment...You get the exception more simply by this: auto e = arr1["xxx"]; But that's understandable because the [] operator is supposed to provide access to a usable object. Since there is no general concept of a null object, there is no object to provide access to, so there is nothing else to do but to throw. Ali
Jan 07 2012
Yes, i agree this may acceptable. On the other hand if i really want/have to remove an item i have to be very careful cause a trivial typo could cause a disaster....
Jan 07 2012
On Saturday, January 07, 2012 23:34:05 RenatoL wrote:Yes, i agree this may acceptable. On the other hand if i really want/have to remove an item i have to be very careful cause a trivial typo could cause a disaster....In general, if an element isn't in a container, and you expect it to be there, it's bug in your code and _should_ cause disaster. That is, if your code is assuming that the element must be in the container before you remove it, you should be asserting for that. assert(key in aa); aa.remove(key); This will result in an AssertError if key is not in aa, and upon failure will indicate that you have a bug in your code that you need to fix. If, on the other hand, it's not a bug if that element is not in the container, then why would your code be assuming that it is? In that case, I don't see why you'd care whether it was actually in the container or not, and having remove not do anything if it isn't there is the perfect behavior in that case. So, I'd argue that if your code is assuming that the element is in the container, an assertion should be used, and the fact that remove doesn't throw an exception is irrelevant, since it would be the wrong thing to do anyway. And if your code can't assume that the element is in the container already, then just let remove remove it if it's there and do nothing if it's not. - Jonathan M Davis
Jan 07 2012
Yes, Jonathan, you're right. the question arose precisely from a typo... i had to remove an item with key "length"... i wrote "lengt" and the item never went away... i knew that "lengt" was not in my key list... This kind of mistake is quite tricky, may be using and IDE could help.
Jan 07 2012
RenatoL:Yes, Jonathan, you're right. the question arose precisely from a typo... i had to remove an item with key "length"... i wrote "lengt" and the item never went away... i knew that "lengt" was not in my key list... This kind of mistake is quite tricky, may be using and IDE could help.This an example that shows that silent failures are sources of bugs... So this time I don't agree with Jonathan Davis. Python Zen contains: Errors should never pass silently. And Python associative arrays show it:Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 5 The successive rule of the Python Zen says: Unless explicitly silenced. Python sets show this at work, they have a 'remove' method that throws if the item is missing, plus a 'discard' method that acts more like D associative arrays, in a sense it's a way to silence explicitly the error:d = {1:2, 3:4} del d[5]set([2])s = set([1, 2]) s.discard(1) sset([1, 2])s = set([1, 2]) s.discard(3) sTraceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 3 Maybe D associative arrays too should have both kinds of deleting methods :-) Bye, bearophiles.remove(3)
Jan 07 2012
On Saturday, January 07, 2012 21:54:07 bearophile wrote:Maybe D associative arrays too should have both kinds of deleting methods :-)Why? What's the gain? If you care that the element is already in the AA, then do either assert(key in aa); aa.remove(key); or enforce(key in aa); aa.remove(key); I suppose that that's less efficient than remove throwing on failure, since it has to do a double lookup, but since throwing the exception costs _far_ more than that, I don't think that the extra cost really matters But if you _don't_ care whether the element is in the AA when you remove it, and remove threw when the element wasn't there, then you'd have to do if(key in aa) aa.remove(key); which would result is a double lookup when you don't want one, and performance _would_ matter, since there isn't necessarily another operation which would cost a lot which is overshadowing the cost of the double lookup (as occurs in the case where you throw the exception when the element isn't there). Also, if you considered the key not being there to be a bug (as seems more likely to me), the fact that remove then threw an assertion would force you to do the extra check _anyway_. assert(key in aa); aa.remove(key); So, as far as I can tell, the current situation is more efficient, and it doesn't cost you any expressiveness. You can still have an exception thrown when remove fails if you use enforce before the call if you want an exception thrown when the element isn't there. - Jonathan M Davis
Jan 07 2012
Jonathan M Davis wrote:So, as far as I can tell, the current situation is more efficientThere are some more arguments: 1) Different threads might want to `remove' the same key from the AA. I don't see a reason why only the first served thread should complete the operation without an error. 2) Because there is only one operation for insertion of a key into an AA, only one operation should be used for removing that key. Note: This argument vanishes if insertions are divided in a:declaration of a key as valid in that AA, b:first assignment to the value of a key, c:reassignment to the value of a key 3) Reverse operations should not implement more functionality than the reversed operation. Because `remove' is the reverse operation for inserting a key and that operation does not cause any error, neither should errors be given from `remove'. Note: Reassigning to the value of a key might very well treated as an error. -manfred
Jan 07 2012
Based on these arguments does that mean std.file.remove should not be throwing when a file doesn't exist? Or more precisely should I add a bugzilla entry to change this. I certainly have a lot of if(exists) remove() in my code and end up having to change where I just use remove(). On Sunday, 8 January 2012 at 05:03:45 UTC, Manfred Nowak wrote:Jonathan M Davis wrote:So, as far as I can tell, the current situation is more efficientThere are some more arguments: 1) Different threads might want to `remove' the same key from the AA. I don't see a reason why only the first served thread should complete the operation without an error. 2) Because there is only one operation for insertion of a key into an AA, only one operation should be used for removing that key. Note: This argument vanishes if insertions are divided in a:declaration of a key as valid in that AA, b:first assignment to the value of a key, c:reassignment to the value of a key 3) Reverse operations should not implement more functionality than the reversed operation. Because `remove' is the reverse operation for inserting a key and that operation does not cause any error, neither should errors be given from `remove'. Note: Reassigning to the value of a key might very well treated as an error. -manfred
Jan 08 2012
assert(key in aa); aa.remove(key); So, as far as I can tell, the current situation is more efficient, and it doesn't cost you any expressiveness. You can still have an exception thrown when remove fails if you use enforce before the call if you want an exception thrown when the element isn't there.but then it should be called optional_remove - because it "mabye" removes its like file_delete, DeleteFile (and all the others) on non existing files - why is there an exception/error neeeded if missing? the "maybe"-situation is not that often used in good code - because you can't find your errors if many routines would work like remove
Jan 09 2012
dennis luehring wrote:why is there an exception/error neeeded if missing?Exceptions or errors are not _needed_. Their existence stems from the modell under which the user of the operation _has_ to think about the operation, especially whether it is a:only the outcome of the operation or b:the amount of work to get to this outcome and maybe c:at runtime has to be known, whether the object exists. Jesse Phillips wrote:I have a lot of if(exists) remove() in my codeAs one can see, Jesse does not know, whether the object exists and has to ask, before he in fact removes the object, i.e. doubled access to the file, which may cause a slowdown. As Jesse states the designer of the API has made up false assumptions about his knowledge, described in c:, _and_ his interests, described in b:. Jesse is interested only in the outcome of the operation, as described in a:, Therefore the likely answer to your question stated at the beginning is: the designer of the API made a mistake. -manfred
Jan 09 2012
Am 09.01.2012 22:08, schrieb Manfred Nowak:dennis luehring wrote:so your FileDelete would not return an FileDoesNotExists-Error? it isn't always clear what is "ok" to happen - there are million types of remove-like-semantic, an Jesse's scenario is one type of usage, which other type of remove allows silently non-remove-removes? STL? Python, etc. (how do others handle such cases) would it not help to better understand big code if the remove would be renamed to remove_existing or to add something like this?why is there an exception/error neeeded if missing?Exceptions or errors are not _needed_. Their existence stems from the modell under which the user of the operation _has_ to think about the operation, especially whether it is a:only the outcome of the operation or b:the amount of work to get to this outcome and maybe c:at runtime has to be known, whether the object exists. Jesse Phillips wrote:I have a lot of if(exists) remove() in my codeAs one can see, Jesse does not know, whether the object exists and has to ask, before he in fact removes the object, i.e. doubled access to the file, which may cause a slowdown.
Jan 09 2012
dennis luehring wrote:so your FileDelete would not return an FileDoesNotExists-Error?Correct.would it not help to better understand big code if the remove would be renamed to remove_existing or to add something like this?Maybe. You possibly know about the `rm'-command of *nix-like systems and the by typo inserted space, which makes `rm -r *.obj' to `rm -r * .obj' This will certainly result in the nice error-message: "cannot remove `.obj': No such file or directory" Therefore: trying to help a minority possibly routes the majority. -manfred
Jan 10 2012
On 1/7/2012 8:54 PM, bearophile wrote:a.remove(x) doesn't fail though. It successfully satisfies its own postconditions by placing a in a state whereby it no longer contains x. The real source of bugs in this example is using hardcoded strings instead of named constants. :) Besides, one frequently doesn't know if (x in a) or not ahead of time. Think of evicting some computed intermediate results from a cache when you detect that the underlying resource has changed, for instance. In fact, it's really rare for an item to be removed from a long-lived collection anywhere near the point that it was added. Insisting on existence before removal clutters up the code for an extremely common use case. FWIW, Java's Map and .NET's IDictionary also handle this situation silently. Regards, ~StephenYes, Jonathan, you're right. the question arose precisely from a typo... i had to remove an item with key "length"... i wrote "lengt" and the item never went away... i knew that "lengt" was not in my key list... This kind of mistake is quite tricky, may be using and IDE could help.This an example that shows that silent failures are sources of bugs... So this time I don't agree with Jonathan Davis.
Jan 08 2012
On Saturday, January 07, 2012 15:06:30 Ali =C3=87ehreli wrote:On 01/07/2012 02:11 PM, RenatoL wrote: > Very quick question >=20 > import std.stdio; > void main() > { >=20 > =09auto arr1 =3D ["uno":1, "due":2, "tre":3]; > =09arr1.remove("xxx"); >=20 > } >=20 > also in this case the compiler does not say anything and the > program goes out silently ... why? Would not it be better if an > exception was raised? =20 I think that could be a valid design choice. On the other hand, since=what is required is to remove the element, it is not really an error =ifthe object is already missing. The requirement is met. =20 > After all if i write: >=20 > writeln(arr1["xxx"]); >=20 > runtime expresses its disappointment... =20 You get the exception more simply by this: =20 auto e =3D arr1["xxx"]; =20 But that's understandable because the [] operator is supposed to prov=ideaccess to a usable object. Since there is no general concept of a nul=lobject, there is no object to provide access to, so there is nothing else to do but to throw.You likely know this, but I'll point it out in case a newbie to decide = to get=20 htrowing behavior out of the substript operator. If the element is not in the container, then the subscript operator thr= ows a=20 RangeError if -release isn't used, so it's an Error, not an Exception, = and=20 isn't a Throwable that you're supposed to catch (since the unwinding of= the=20 stack skips destructors, scope statements, finall blocks for any Throwa= bles=20 which aren't either Exception or derived from Exception). Also, without= - release, it does no bounds checking and will likely result in a segfaul= t. So,=20 relying on the subscript operator throwing is a _bad_ idea. The idea is= that=20 you never give it something that isn't actually in the container. If you need to check it, and you're dealing with a dynamic or static ar= ray,=20 then you check the array's length. If you're dealing with an associativ= e=20 array, then you use the in operator to get a pointer to the element fro= m the=20 container and check whether it's null or not (since if it's null, it wa= sn't=20 there). It would probably make remove more expensive if it had to throw when it= didn't=20 find an element, and for the most part, it's rather meaningless whether= it was=20 in the container before that. As pointed out, remove makes sure that th= e=20 element is no longer in the container after it's been called. It has no= real=20 reason to care whether an element was in there before or not. And if _y= ou_=20 don't care, then being forced to check whether an element was in the co= ntainer=20 before trying to remove it would make your code less efficient. The way= remove=20 is now, the check only occurs if you do it yourself and therefore actua= lly=20 _want_ it. And odds are, such a check would be an assertion anyway (and= =20 therefore would be throwing an AssertError on failure, not an Exception= ),=20 since the fact that the element wasn't in there anymore is a bug in you= r code,=20 since your code is assuming that it's there. So, while I can understand someone's first reaction being why remove do= esn't=20 throw if the element isn't there, when you really think through it, it = does=20 make more sense for it not to throw. - Jonathan M Davis
Jan 07 2012
Jonathan M Davis wrote: [...] This sort of explanation is missing in the docs. -manfred
Jan 07 2012
On Sunday, January 08, 2012 00:13:12 Manfred Nowak wrote:Jonathan M Davis wrote: [...] This sort of explanation is missing in the docs.Which part? - Jonathan M Davis
Jan 07 2012
collections returns a boolean indicating whether it was removed. So you could write: enforce(MyAA.remove("lenght")) or bool Result = MyAA.remove("lenght"); assert(Result); I'm not sure why it doesn't in D, but I thought I remembered seeing a pull request or change that added it. Maybe I'm imagining things since I can't find it now. On 07/01/2012 4:11 PM, RenatoL wrote:Very quick question import std.stdio; void main() { auto arr1 = ["uno":1, "due":2, "tre":3]; arr1.remove("xxx"); } also in this case the compiler does not say anything and the program goes out silently ... why? Would not it be better if an exception was raised? After all if i write: writeln(arr1["xxx"]); runtime expresses its disappointment...
Jan 07 2012
On Sunday, January 08, 2012 01:39:24 Kapps wrote:collections returns a boolean indicating whether it was removed. So you could write: enforce(MyAA.remove("lenght")) or bool Result = MyAA.remove("lenght"); assert(Result); I'm not sure why it doesn't in D, but I thought I remembered seeing a pull request or change that added it. Maybe I'm imagining things since I can't find it now.There _is_ extra cost in returning bool rather than void, but I'm not at all sure that it's unreasonable to incur that cost. std.container.RedBlackTree's remove function returns how many elements were removed. Other functions in Phobos do similar in similar situations. It probably merits an enhancement request. - Jonathan M Davis
Jan 08 2012
Ah, found the bug / pull request. https://github.com/D-Programming-Language/dmd/pull/597 http://d.puremagic.com/issues/show_bug.cgi?id=4523 On 08/01/2012 1:39 AM, Kapps wrote:collections returns a boolean indicating whether it was removed. So you could write: enforce(MyAA.remove("lenght")) or bool Result = MyAA.remove("lenght"); assert(Result); I'm not sure why it doesn't in D, but I thought I remembered seeing a pull request or change that added it. Maybe I'm imagining things since I can't find it now. On 07/01/2012 4:11 PM, RenatoL wrote:Very quick question import std.stdio; void main() { auto arr1 = ["uno":1, "due":2, "tre":3]; arr1.remove("xxx"); } also in this case the compiler does not say anything and the program goes out silently ... why? Would not it be better if an exception was raised? After all if i write: writeln(arr1["xxx"]); runtime expresses its disappointment...
Jan 08 2012
On Sunday, January 08, 2012 03:24:22 Kapps wrote:Ah, found the bug / pull request. https://github.com/D-Programming-Language/dmd/pull/597 http://d.puremagic.com/issues/show_bug.cgi?id=4523Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis
Jan 08 2012
On 08.01.2012 10:43, Jonathan M Davis wrote:On Sunday, January 08, 2012 03:24:22 Kapps wrote:Wouldn't it make sense to return a pointer to the item being removed/null? This way you can continue to work with the item, and it would be consistent with `in`. if (auto item = aa.remove(key)) { // some additional work on item } else { // not found // throw or something }Ah, found the bug / pull request. https://github.com/D-Programming-Language/dmd/pull/597 http://d.puremagic.com/issues/show_bug.cgi?id=4523Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis
Jan 08 2012
simendsjo wrote:Wouldn't it make sense to return a pointer to the item being removed/null?According to the docs this is the intended behavior. -manfred
Jan 08 2012
On 08.01.2012 11:09, Manfred Nowak wrote:simendsjo wrote:The only mention I can see of remove is at the top, and it doesn't state return value: http://dlang.org/hash-map.html The pull changes the return value to bool: https://github.com/9rnsr/dmd/commit/f3cc75445775927e2036054811afb686cf4f1624Wouldn't it make sense to return a pointer to the item being removed/null?According to the docs this is the intended behavior. -manfred
Jan 08 2012
simendsjo wrote:it doesn't state return valueYes, I haven't read carefully enough. -manfred
Jan 08 2012
On Sunday, January 08, 2012 11:02:41 simendsjo wrote:On 08.01.2012 10:43, Jonathan M Davis wrote:Since the element has been removed from the container, what would the pointer point to? The element doesn't exist anymore. - Jonathan m DavisOn Sunday, January 08, 2012 03:24:22 Kapps wrote:Wouldn't it make sense to return a pointer to the item being removed/null? This way you can continue to work with the item, and it would be consistent with `in`. if (auto item = aa.remove(key)) { // some additional work on item } else { // not found // throw or something }Ah, found the bug / pull request. https://github.com/D-Programming-Language/dmd/pull/597 http://d.puremagic.com/issues/show_bug.cgi?id=4523Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis
Jan 08 2012
On 08.01.2012 11:27, Jonathan M Davis wrote:On Sunday, January 08, 2012 11:02:41 simendsjo wrote:I was thinking it could be a shorthand for the following: auto item = key in aa; if (item) key.remove(key);On 08.01.2012 10:43, Jonathan M Davis wrote:Since the element has been removed from the container, what would the pointer point to? The element doesn't exist anymore. - Jonathan m DavisOn Sunday, January 08, 2012 03:24:22 Kapps wrote:Wouldn't it make sense to return a pointer to the item being removed/null? This way you can continue to work with the item, and it would be consistent with `in`. if (auto item = aa.remove(key)) { // some additional work on item } else { // not found // throw or something }Ah, found the bug / pull request. https://github.com/D-Programming-Language/dmd/pull/597 http://d.puremagic.com/issues/show_bug.cgi?id=4523Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis
Jan 08 2012
On Sunday, January 08, 2012 11:33:51 simendsjo wrote:I was thinking it could be a shorthand for the following: auto item = key in aa; if (item) key.remove(key);I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis
Jan 08 2012
On 08.01.2012 12:18, Jonathan M Davis wrote:On Sunday, January 08, 2012 11:33:51 simendsjo wrote:Ah.. This should be documented in http://dlang.org/hash-map.html The only documentation for remove is: " Particular keys in an associative array can be removed with the remove function: b.remove("hello"); "I was thinking it could be a shorthand for the following: auto item = key in aa; if (item) key.remove(key);I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis
Jan 08 2012
On Sunday, January 08, 2012 12:35:27 simendsjo wrote:On 08.01.2012 12:18, Jonathan M Davis wrote:I confess that it's one of those things that I would have thought would have been fairly obvious, but it certainly wouldn't hurt to add it. The documentation does tend to be sparse of details in any case. - Jonathan M DavisOn Sunday, January 08, 2012 11:33:51 simendsjo wrote:Ah.. This should be documented in http://dlang.org/hash-map.html The only documentation for remove is: " Particular keys in an associative array can be removed with the remove function: b.remove("hello"); "I was thinking it could be a shorthand for the following: auto item = key in aa; if (item) key.remove(key);I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis
Jan 08 2012
On 08.01.2012 12:57, Jonathan M Davis wrote:On Sunday, January 08, 2012 12:35:27 simendsjo wrote:Certainly not obvious to me :) auto c = new C(); C[string] aa; aa["c"] = c; aa.remove("c"); I guess using c from this point is valid, but if the only reference to c was inside the aa (and we got c using `in`), using c would have been undefined..?On 08.01.2012 12:18, Jonathan M Davis wrote:I confess that it's one of those things that I would have thought would have been fairly obvious, but it certainly wouldn't hurt to add it. The documentation does tend to be sparse of details in any case. - Jonathan M DavisOn Sunday, January 08, 2012 11:33:51 simendsjo wrote:Ah.. This should be documented in http://dlang.org/hash-map.html The only documentation for remove is: " Particular keys in an associative array can be removed with the remove function: b.remove("hello"); "I was thinking it could be a shorthand for the following: auto item = key in aa; if (item) key.remove(key);I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis
Jan 08 2012
On Sunday, January 08, 2012 13:06:06 simendsjo wrote:Certainly not obvious to me :)Well, what's obvious to one person is not always obvious to another. I'm sure that there are plenty of things that Walter thinks should be perfectly obvious which 90% of programmers would never think of. A lot of it depends on what you're experience level is and what you have experience in. In this case, it's very much like dealing with C++ iterators, so if you're used to dealing with them and when they do or don't get invalidate, then the situation with in and remove is probably quite straightforward and intuitive.auto c = new C(); C[string] aa; aa["c"] = c; aa.remove("c"); I guess using c from this point is valid, but if the only reference to c was inside the aa (and we got c using `in`), using c would have been undefined..?Using c is valid regardless. The problem is the in operator. It returns a pointer to the element inside of the AA. So, if you did C* d = aa["c"]; then d would be a pointer to a reference to a C, and after the call to remove, the pointer is pointing to memory which may or may not be part of the AA anymore and which may or may not hold a reference to what c refers to. It's bit like doing this int* a() { int b; return &b; } though the compiler should catch this, since it's obviously wrong. In both cases, you're referring to memory which your really not supposed to be accessing anymore and may point to invalid memory. The case of the AA probably isn't as bad, since either you're pointing to memory on the GC which isn't part of the AA (but hasn't been collected, since you have a pointer to it), or you're pointing to an element in the AA which may still be the same as it was before it was removed, or it may be another element, or it may be garbage memory. You really don't know. It's undefined and therefore shouldn't be used. I don't believe that it's ever safe to assume that the pointer returned by the in operator is still valid after the AA that it comes from has had an element added or removed. It's the same with iterators or ranges with many iterators and ranges. If they point to elements within a container, and that container is altered, they're invalidated and aren't safe to use. So, your example is fine, because it doesn't involve the in operator, and therefore doesn't involve any pointers into the AA. It's only once you have pointers into the AA that you get into trouble. - Jonathan M Davis
Jan 08 2012
On 08.01.2012 13:49, Jonathan M Davis wrote:On Sunday, January 08, 2012 13:06:06 simendsjo wrote:Thanks for your clarifications. Does this mean even this is undefined? aa["a"] = new C(); auto c = "a" in aa; aa["b"] = new C(); // Using c here is undefined as an element was added to aaCertainly not obvious to me :)Well, what's obvious to one person is not always obvious to another. I'm sure that there are plenty of things that Walter thinks should be perfectly obvious which 90% of programmers would never think of. A lot of it depends on what you're experience level is and what you have experience in. In this case, it's very much like dealing with C++ iterators, so if you're used to dealing with them and when they do or don't get invalidate, then the situation with in and remove is probably quite straightforward and intuitive.auto c = new C(); C[string] aa; aa["c"] = c; aa.remove("c"); I guess using c from this point is valid, but if the only reference to c was inside the aa (and we got c using `in`), using c would have been undefined..?Using c is valid regardless. The problem is the in operator. It returns a pointer to the element inside of the AA. So, if you did C* d = aa["c"]; then d would be a pointer to a reference to a C, and after the call to remove, the pointer is pointing to memory which may or may not be part of the AA anymore and which may or may not hold a reference to what c refers to. It's bit like doing this int* a() { int b; return&b; } though the compiler should catch this, since it's obviously wrong. In both cases, you're referring to memory which your really not supposed to be accessing anymore and may point to invalid memory. The case of the AA probably isn't as bad, since either you're pointing to memory on the GC which isn't part of the AA (but hasn't been collected, since you have a pointer to it), or you're pointing to an element in the AA which may still be the same as it was before it was removed, or it may be another element, or it may be garbage memory. You really don't know. It's undefined and therefore shouldn't be used. I don't believe that it's ever safe to assume that the pointer returned by the in operator is still valid after the AA that it comes from has had an element added or removed. It's the same with iterators or ranges with many iterators and ranges. If they point to elements within a container, and that container is altered, they're invalidated and aren't safe to use. So, your example is fine, because it doesn't involve the in operator, and therefore doesn't involve any pointers into the AA. It's only once you have pointers into the AA that you get into trouble. - Jonathan M Davis
Jan 08 2012
On Sunday, January 08, 2012 14:24:32 simendsjo wrote:Thanks for your clarifications. Does this mean even this is undefined? aa["a"] = new C(); auto c = "a" in aa; aa["b"] = new C(); // Using c here is undefined as an element was added to aaYes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to. - Jonathan M Davis
Jan 08 2012
On 08.01.2012 14:40, Jonathan M Davis wrote:On Sunday, January 08, 2012 14:24:32 simendsjo wrote:Thanks! You've saved me a couple of future bugs :)Thanks for your clarifications. Does this mean even this is undefined? aa["a"] = new C(); auto c = "a" in aa; aa["b"] = new C(); // Using c here is undefined as an element was added to aaYes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to. - Jonathan M Davis
Jan 08 2012
On Sunday, January 08, 2012 14:57:51 simendsjo wrote:On 08.01.2012 14:40, Jonathan M Davis wrote:No problem. It's a bit like how an array could be reallocated after you append to it, which causes slices to no longer point to that array, except that unlike the slice the pointer may not point to valid memory. Given that it's managed by the GC, it probably isn't pointing to complete garbage, but it could be used for something else now. In general, if you insert or remove elements from a container, you must be very careful about any references to the data within the container - be they pointers to elements in the container, or iterators or ranges for the container. That's why std.container has the stable functions. They guarantee that the container's existing ranges don't get invalidated when they're called, whereas the ones that aren't stable make no such guarantees. The issues with in are just one of the manifestations of the more general issue. - Jonathan M DavisOn Sunday, January 08, 2012 14:24:32 simendsjo wrote:Thanks! You've saved me a couple of future bugs :)Thanks for your clarifications. Does this mean even this is undefined? aa["a"] = new C(); auto c = "a" in aa; aa["b"] = new C(); // Using c here is undefined as an element was added to aaYes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to. - Jonathan M Davis
Jan 08 2012
On Sun, 08 Jan 2012 08:40:27 -0500, Jonathan M Davis <jmdavisProg gmx.com> wrote:On Sunday, January 08, 2012 14:24:32 simendsjo wrote:Actually, not invalid for the current implementation. I don't know if it's stated whether an AA specifically requires that elements do not re-associate on a rehash. There are certain hash implementations (such as ones that involve a single array), which would invalidate pointers on a rehash. So there is the potential (if it's not a stated requirement to the contrary in the spec) that some future AA implementation would invalidated references. However, I'd say it's unlikely this would happen. BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that adding elements does not invalidated cursors (dcollections' safe version of pointers) as long as you use the default Hash implementation. However, I just noticed this is not stated anywhere in the docs (will fix). -SteveThanks for your clarifications. Does this mean even this is undefined? aa["a"] = new C(); auto c = "a" in aa; aa["b"] = new C(); // Using c here is undefined as an element was added to aaYes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to.
Jan 09 2012
On Monday, January 09, 2012 09:25:14 Steven Schveighoffer wrote:Actually, not invalid for the current implementation. I don't know if it's stated whether an AA specifically requires that elements do not re-associate on a rehash.Well, like I said, it depends on the current implementation. There are hash table implementations where rehashing would invalidate the pointer returned by in, and I don't believe that the spec specificies that D's AA guarantees that rehashing doesn't do that. So in the future, it _could_ be changed to an implementation which invalidates the pointers on rehash. As such, it doesn't really matter what the current implementation does. Relying on the current behavior is unsafe if it's not guaranteed by the spec. Now, if we want to change the spec to make such guarantees, that's fine, but I don't believe it makes them right now. Good to know what the current implementation is doing though. - Jonathan m Davis
Jan 09 2012
On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that adding elements does not invalidated cursors (dcollections' safe version of pointers) as long as you use the default Hash implementation. However, I just noticed this is not stated anywhere in the docs (will fix).Funny, I was looking at the docs a few days ago and it said "Adding an element can invalidate cursors depending on the implementation.", so I just assumed it did invalidate the cursors. I do think those are dcollections v1 docs though. Anyway glad to hear from you about this.
Jan 09 2012
On Mon, 09 Jan 2012 13:35:26 -0500, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:Yeah, I could have sworn I stated somewhere that the current implementation doesn't invalidate cursors, but apparently it's not stated. I think the docs need a lot of TLC before the first release. Some day when I have time... To clarify what I plan to do, the add doc will say something like "adding an element can invalidate cursors depending on the implementation, however the default implementation guarantees not to invalidate them." I don't want to specifically disallow implementations which do invalidate (dcollections' implementations are pluggable so anyone can replace the internals).BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that adding elements does not invalidated cursors (dcollections' safe version of pointers) as long as you use the default Hash implementation. However, I just noticed this is not stated anywhere in the docs (will fix).Funny, I was looking at the docs a few days ago and it said "Adding an element can invalidate cursors depending on the implementation.", so I just assumed it did invalidate the cursors.I do think those are dcollections v1 docs though. Anyway glad to hear from you about this.The D2 docs are somewhat leftover from D1 version. However, I do remember when implementing the hash code that it purposely does not invalidate cursors on a rehash. Ranges can be invalidated, however. Given the implementation, it's actually faster *not* to invalidate them when rehashing because I just relink all the link-list nodes instead of reallocating them. -Steve
Jan 09 2012
Ok, allow me to temporarily hijack again and ask: Would you mind adding opIn_r (or rather the newer opBinaryRight with "in") that forwards to contains() for the HashSet and similar hash-based classes that define contains()? It would make porting code that uses builtin hashes to your own implementation easier.
Jan 09 2012
On Mon, 09 Jan 2012 14:57:36 -0500, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:Ok, allow me to temporarily hijack again and ask: Would you mind adding opIn_r (or rather the newer opBinaryRight with "in") that forwards to contains() for the HashSet and similar hash-based classes that define contains()? It would make porting code that uses builtin hashes to your own implementation easier.Could this be you? http://www.dsource.org/projects/dcollections/ticket/18 If so, this means you are OK with the last suggestion I made? -Steve
Jan 09 2012
On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:Could this be you?Ah, yes. I didn't even notice you've replied to that, sorry. Yes, I'm ok with it.
Jan 09 2012
Jonathan M Davis wrote:In this case, it's very much like dealing with C++ iteratorsA relevant part of Andrei's thread on "associative arrays iteration": http://www.digitalmars.com/d/archives/digitalmars/D/associative_arrays_ iteration_is_finally_here_99576.html#N99614 -manfred
Jan 08 2012
On 1/8/12, simendsjo <simendsjo gmail.com> wrote:Wouldn't it make sense to return a pointer to the item being removed/null?Seems like that would be even more costly. Personally I think returning bool is unnecessary, if we really want to know if something is in the hash we can check with the `in` operator. I filed that bug ages ago simply because it was in TDPL, but I didn't really give it much thought.
Jan 08 2012
Looks like this is fixed for 2.058. https://github.com/D-Programming-Language/dmd/commit/3e23b0f5834acb32eaee20d88c30ead7e03bb2f4 On 08/01/2012 3:43 AM, Jonathan M Davis wrote:On Sunday, January 08, 2012 03:24:22 Kapps wrote:Ah, found the bug / pull request. https://github.com/D-Programming-Language/dmd/pull/597 http://d.puremagic.com/issues/show_bug.cgi?id=4523Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis
Jan 09 2012