digitalmars.D - Improve the OOP ABI
- deadalnix (128/128) Sep 28 2023 Hi all,
- Richard (Rikki) Andrew Cattermole (7/14) Sep 28 2023 TypeInfo isn't guaranteed to be unique to a process when dealing with
- Adam D Ruppe (11/16) Sep 28 2023 This should be fixed, but even if it isn't, that just means using
- deadalnix (5/20) Sep 28 2023 lol, wat?
- Guillaume Piolat (3/12) Sep 28 2023 I think it's the same in C++, and sometimes dynamic_cast ends up
- deadalnix (3/17) Sep 28 2023 What's the reason for this madness?
- Guillaume Piolat (8/11) Sep 29 2023 For example for GCC =>
- deadalnix (7/21) Sep 29 2023 At least the GCC one has some explaination. What stroke me is:
- Guillaume Piolat (6/10) Sep 29 2023 I'm way underqualified. Using RTLD_NOW (and not sure what it
- Richard (Rikki) Andrew Cattermole (18/18) Sep 29 2023 https://linux.die.net/man/3/dlopen
- Walter Bright (3/4) Jan 11 2024 You're right. The ClassInfo can be merged with the vtbl[]. The trouble i...
- FeepingCreature (4/7) Sep 28 2023 We should solve this problem by guaranteeing TypeInfo to be
- Richard (Rikki) Andrew Cattermole (2/2) Sep 28 2023 I said this on Discord, but that means all TypeInfo has to be exported,
- FeepingCreature (8/10) Sep 29 2023 Yep, and as I said, I don't think that harms anything.
- Richard (Rikki) Andrew Cattermole (3/6) Sep 29 2023 The linker can't do that. So you're relying entirely on the compiler for...
- Alexandru Ermicioi (7/9) Sep 29 2023 There might be an issue when your lib uses two other libs, and
- Richard (Rikki) Andrew Cattermole (6/15) Sep 29 2023 It shouldn't be. As long as everything is the same codegen wise (except
- ryuukk_ (9/11) Sep 30 2023 I agree, however let us not get side tracked
- Alexandru Ermicioi (4/7) Oct 01 2023 Thats one extremely biased view on D use. OOP is useful for
- deadalnix (6/13) Oct 01 2023 Indeed, if the basics such as OOP are not implemented right,
- max haughton (11/25) Oct 01 2023 Also worth adding that there is a difference between the midwit
- Jonathan M Davis (30/41) Oct 01 2023 If they're not what the language currently does, it's still a new featur...
- deadalnix (4/6) Nov 13 2023 Here we go:
- Walter Bright (4/4) Jan 11 2024 https://issues.dlang.org/show_bug.cgi?id=24332
- Walter Bright (3/4) Jan 14 2024 This has a PR for it now.
- deadalnix (2/6) Jan 18 2024 Thanks, that's a step in the right direction.
- Walter Bright (2/3) Jan 19 2024 You're welcome. I'd still like a list of the other problems you alluded ...
- Andrea Fontana (3/7) Jan 22 2024 Why md5 and not a faster method?
- ryuukk_ (3/12) Jan 22 2024 That is good opportunity to submit a PR, try it, it's not that
- Andrea Fontana (6/22) Jan 22 2024 I know: I've already submitted PR to dmd and phobos :)
- Andrea Fontana (3/26) Jan 22 2024 The right one is FNV-1a:
- Andrea Fontana (3/5) Jan 22 2024 Probably xxhash is even faster, but there's no public domain
- Hipreme (86/116) Jan 22 2024 FNV1A is used for `hashOf`. HashOf seems to be a lot faster than
- ryuukk_ (6/127) Jan 22 2024 It hashes class name, they'll probably never be more than 30
- Andrea Fontana (4/17) Jan 22 2024 Indeed. And probably xxhash would be even faster if well
- Alexandru Ermicioi (2/3) Jan 22 2024 What algorithm would you suggest?
- Walter Bright (7/8) Jan 22 2024 Md5 has an extremely remote chance of two class names hashing to the sam...
- Richard (Rikki) Andrew Cattermole (12/23) Jan 22 2024 Quite a few months ago I discussed this problem for class comparison and...
- Walter Bright (3/4) Jan 22 2024 https://stackoverflow.com/questions/201705/how-many-random-elements-befo...
- Richard (Rikki) Andrew Cattermole (7/14) Jan 22 2024 If we were gambling on something not happening, those are great odds.
- Walter Bright (9/18) Jan 22 2024 I do understand the desire for mathematical perfection, and feel it myse...
- Bruce Carneal (4/14) Jan 22 2024 Blake3 might be worth a look. It's reportedly faster and
- Walter Bright (2/5) Jan 22 2024 Thanks for the tip. I don't know how to evaluate a hash function for uni...
- deadalnix (10/17) Jan 22 2024 Blake is > 128 bits, so I don't think there is anything really
- Walter Bright (4/6) Jan 22 2024 The hash also happens at compile time, not runtime. The only thing I wou...
- Siarhei Siamashka (8/19) Jan 22 2024 It's probably possible to take just the first (or last) 128 bits
- Jonathan M Davis (8/29) Jan 22 2024 Why on earth would we care if the hash is secure in this context? We
- Siarhei Siamashka (8/41) Jan 22 2024 BLAKE3 was mentioned because it is allegedly faster than MD5
- Bruce Carneal (11/38) Jan 22 2024 I don't think it matters at all except for what it implies about
- Bruce Carneal (11/18) Jan 22 2024 I've not looked into that aspect either. I'd just assumed that
- Andrea Fontana (4/14) Jan 23 2024 So probably you like xxhash128
- Walter Bright (2/5) Jan 23 2024 Thanks! I do, it has a 64 bit version, does anyone know what the collisi...
- David Gileadi (5/12) Jan 23 2024 xxhash has a wiki page about it [1]. Given that it's their page, it's
- Siarhei Siamashka (11/24) Jan 23 2024 If you really need a 64-bit version, then you can simply take the
- deadalnix (10/27) Jan 23 2024 While it is not possible to do better than this, it is certainly
- Siarhei Siamashka (6/30) Jan 23 2024 Go ahead and show me truncated output of any hash, designed to be
- ryuukk_ (11/14) Jan 24 2024 Why try to demotivate people? this could be a good entry point
- Andrea Fontana (5/11) Jan 23 2024 Check this:
- deadalnix (3/9) Jan 23 2024 For 64 bits, your best best is CRC64. However, the risk of
- Alexandru Ermicioi (6/10) Jan 22 2024 Just wondering, could a trie be used instead of hashing algorithm?
- Richard (Rikki) Andrew Cattermole (4/19) Jan 22 2024 That sounds an lot like an assumption that TypeInfo is unique in a
- Walter Bright (2/5) Jan 22 2024 It isn't true because of shared libraries.
- Richard (Rikki) Andrew Cattermole (4/11) Jan 22 2024 Or incremental compilation, the symbol could at least in theory be
Hi all, I have been arguing for quite some time about the fact that there is a well of improvement to make to D that are not necessarily breaking changes or languages addition - and that, in fact, D having more feature than most language, adding feature is clearly not where the impact is. Today I would like to provide a concrete example of things that could be dramatically improved, with no breakage except at the ABI level - which we already do not guarantee. Let's go over these changes. 1/ Downcast to final classes. ```d class A {} final class B : A {} auto foo(A a) { return cast(B) a; } ``` This generates a call into the runtime, making things completely opaque to the optimizer. The algorithm used by the runtime is linear. But there is a dumb constant time solution. because B is final, a is either an instance of B, or it is not (as in, it cannot be an instance of a subclass of B). Therefore, we expect the codegen to look like the following instead: ```d auto foo(A a) { return typeid(a) is typeid(B) ? a : null; } ``` This obviously won't compile but you get the idea. Not only this is constant time, but very simple too, and visible to the optimizer, which means the check can be folded by the opitimizer after other transforms, for instance inlining. 2/ Inline ClassInfo with the vtbl. The current layout of objects is as follow: ``` +-----------+ +-----------+ +-----------+ | __vtbl +--->| typeid +--->| ClassInfo | +-----------+ +-----------+ | | | __monitor | | method0 | | ... | +-----------+ +-----------+ | | | field0 | | method1 | +-----------+ +-----------+ +-----------+ | field1 | | ... | +-----------+ +-----------+ | ... | +-----------+ ``` This causes a ton of extra indirections that are not necessary. instead the following layout ought to be used: ``` +-----------+ +-----------+ | __vtbl +--->| ClassInfo | +-----------+ | | | __monitor | | ... | +-----------+ | | | field0 | +-----------+ +-----------+ | method0 | | field1 | +-----------+ +-----------+ | method1 | | ... | +-----------+ +-----------+ | ... | +-----------+ ``` Alternatively, it is possible to have the pointer point at the first method and subtract to get the typeid. The important part is that there is only one indirection now. 3/ Downcast. Currently, we use a linear algorithm to downcast and we reach this algorithm via a call int he runtime. This prevent the optimizer from doing anything about it, in addition to be slow. The problem is similar to 1/. The whole check can be made trivial if we let the compiler prebake some data for us, namely an array of primary parents. Let's see what it looks like in the first example, when b is not final. ```d class A {} class B : A {} auto foo(A a) { return cast(B) a; } ``` Which we can do as follow: ```d auto foo(A a) { // The primaries generated by the compiler for us. auto tidA = typeid(A); assert(tidA.primaries = [tidA]); auto tidB = typeid(B); assert(tidB.primaries = [tidA, tidB]); auto t = typeid(a); auto depth = tidB.primaries.length - 1; if (t.primary.length <= depth) { return null; } if (t.primary[depth] == tidB) { return a; } return null; } ``` This is starting to look good, now we have general downcast in a form that is not only really fast to perform, but also completely transparent to the optimizer. 4/ Downcast to interfaces. This is getting a bit more tricky, so I won't expand this one fully, but it is also possible to do much better on this front as well. The obvious complication that that interfaces allow for multiple inheritance. The first obvious optimization is to consider the chain of leftmost (or alternatively the longest chain, which allows to cull more candidates faster) inheritance the primaries for the interface and eliminate the whole branch at once using the same strategy as 3/. Now we are left with possible secondaries match. Here, no possibility to remain O(1), we'll have to loop over the other interfaces and repeat the matching process. Thankfully very branch hierarchy are uncommon in practice, but even then, we can use this one weird trick to dramatically cull out the search space: bloom filters. Make the bloom filter 64 bits and simply cull via `(targetBloom & aggregateBloom) == targetBloom` and you can eliminate a good chunk of the search right there. I hope this is the kind of change we can see in D in the future. This is the kind of changes that would make D better in practice, not the next cool feature, but that the current, existing feature, have an quality of implementation that is in line with what can be expected in a modern language.
Sep 28 2023
On 29/09/2023 1:42 AM, deadalnix wrote:1/ Downcast to final classes.```d auto foo(A a) { return typeid(a) is typeid(B) ? a : null; } ```TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative. So this won't work.2/ Inline ClassInfo with the vtbl.ClassInfo is an alias to TypeInfo_Class now. This indirection shouldn't exist. https://github.com/dlang/dmd/blob/a6285c26c0e04b9b3b1d40ae1c3200c834df1480/druntime/src/object.d#L1737
Sep 28 2023
On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) Andrew Cattermole wrote:TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative.This should be fixed, but even if it isn't, that just means using the == operator instead of the is operator; the concept i believe is still sound. (the == operator sux tho, this really should be fixed at the source)ClassInfo is an alias to TypeInfo_Class now. This indirection shouldn't exist.this means emplacing the typeinfo_class in the same memory block as the vtbl instead of linking to it. so the virtual table exists at *(instance + __traits(classInstanceSize, TypeInfo_Class))
Sep 28 2023
On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) Andrew Cattermole wrote:On 29/09/2023 1:42 AM, deadalnix wrote:lol, wat? That sounds crazy.1/ Downcast to final classes.```d auto foo(A a) { return typeid(a) is typeid(B) ? a : null; } ```TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative. So this won't work.It does exist.2/ Inline ClassInfo with the vtbl.ClassInfo is an alias to TypeInfo_Class now. This indirection shouldn't exist. https://github.com/dlang/dmd/blob/a6285c26c0e04b9b3b1d40ae1c3200c834df1480/druntime/src/object.d#L1737
Sep 28 2023
On Thursday, 28 September 2023 at 14:31:33 UTC, deadalnix wrote:I think it's the same in C++, and sometimes dynamic_cast ends up making string comparisons.TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative. So this won't work.lol, wat? That sounds crazy.
Sep 28 2023
On Thursday, 28 September 2023 at 14:41:16 UTC, Guillaume Piolat wrote:On Thursday, 28 September 2023 at 14:31:33 UTC, deadalnix wrote:What's the reason for this madness?I think it's the same in C++, and sometimes dynamic_cast ends up making string comparisons.TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative. So this won't work.lol, wat? That sounds crazy.
Sep 28 2023
On Thursday, 28 September 2023 at 15:39:22 UTC, deadalnix wrote:For example for GCC => https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo#L48 MSVCR => https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/rtti.cpp#L27 I don't understand any of it, just read "strcmp" in a profiler once in a big C++ app. It turns out it was RTTI which was taking a small percentage of time.I think it's the same in C++, and sometimes dynamic_cast ends up making string comparisons.What's the reason for this madness?
Sep 29 2023
On Friday, 29 September 2023 at 12:15:53 UTC, Guillaume Piolat wrote:On Thursday, 28 September 2023 at 15:39:22 UTC, deadalnix wrote:At least the GCC one has some explaination. What stroke me is:For example for GCC => https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo#L48 MSVCR => https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/rtti.cpp#L27 I don't understand any of it, just read "strcmp" in a profiler once in a big C++ app. It turns out it was RTTI which was taking a small percentage of time.I think it's the same in C++, and sometimes dynamic_cast ends up making string comparisons.What's the reason for this madness?but even with weak symbols sometimes names are not merged when objects are loaded with RTLD_LOCALWhich seems to indicate the duplication is not a fatality, but they have to handle it because they might be working with stuff that are loaded with RTLD_LOCAL? What does this mean? is this really a constraint that we have?
Sep 29 2023
On Friday, 29 September 2023 at 12:33:13 UTC, deadalnix wrote:Which seems to indicate the duplication is not a fatality, but they have to handle it because they might be working with stuff that are loaded with RTLD_LOCAL? What does this mean? is this really a constraint that we have?I'm way underqualified. Using RTLD_NOW (and not sure what it does), it seems to work on all platforms, but I certainly do not expect RTTI to work across binaries. C ABI provides minimum headache. Hipreme also do shared libs, and they went the minimal druntime way. We want less of druntime and Phobos, not more of it.
Sep 29 2023
https://linux.die.net/man/3/dlopen RTLD_NOW If this value is specified, or the environment variable LD_BIND_NOW is set to a nonempty string, all undefined symbols in the library are resolved before dlopen() returns. If this cannot be done, an error is returned. That is fine, this is kinda what you should usually use to prevent errors happening elsewhere (which won't be easily understood). RTLD_GLOBAL The symbols defined by this library will be made available for symbol resolution of subsequently loaded libraries. RTLD_LOCAL This is the converse of RTLD_GLOBAL, and the default if neither flag is specified. Symbols defined in this library are not made available to resolve references in subsequently loaded libraries. RTLD_LOCAL basically isolates all exports from one shared library once loaded from being matched to other shared library/executable externs. This is probably what you want when loading plugins.
Sep 29 2023
On 9/28/2023 7:31 AM, deadalnix wrote:It does exist.You're right. The ClassInfo can be merged with the vtbl[]. The trouble is it's a variable size, so that would have to be redone, too.
Jan 11 2024
On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) Andrew Cattermole wrote:TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative.We should solve this problem by guaranteeing TypeInfo to be unique to a process. I don't think that harms anything.
Sep 28 2023
I said this on Discord, but that means all TypeInfo has to be exported, even for executables.
Sep 28 2023
On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) Andrew Cattermole wrote:I said this on Discord, but that means all TypeInfo has to be exported, even for executables.Yep, and as I said, I don't think that harms anything. Note that you can still remove TypeInfo that isn't referenced in an executable/library; the only TypeInfo that needs to be exported are those that you actually look at. And those need to be in the binary anyway. So I really don't see where the problem is here.
Sep 29 2023
On 29/09/2023 8:43 PM, FeepingCreature wrote:Note that you can still remove TypeInfo that isn't referenced in an executable/library; the only TypeInfo that needs to be exported are those that you actually look at.The linker can't do that. So you're relying entirely on the compiler for this.
Sep 29 2023
On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) Andrew Cattermole wrote:I said this on Discord, but that means all TypeInfo has to be exported, even for executables.There might be an issue when your lib uses two other libs, and both of them use lib X but of different versions. This might cause issues with exported structures when they differ from version to version, and etc. Could this also be an issue for exported TypeInfos?
Sep 29 2023
On 29/09/2023 9:04 PM, Alexandru Ermicioi wrote:On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) Andrew Cattermole wrote:It shouldn't be. As long as everything is the same codegen wise (except pointer values), then two TypeInfo's should compare identically. But you almost got to the concern, not when there are two different versions of a package, but when they are the same version, but linked into different binaries and not exported.I said this on Discord, but that means all TypeInfo has to be exported, even for executables.There might be an issue when your lib uses two other libs, and both of them use lib X but of different versions. This might cause issues with exported structures when they differ from version to version, and etc. Could this also be an issue for exported TypeInfos?
Sep 29 2023
On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:D having more feature than most language, adding feature is clearly not where the impact is.I agree, however let us not get side tracked Asking for pattern matching? it's not new feature, it's improving ``switch`` Asking for tagged union? it's not new feature, it's improving ``enum`` However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languages
Sep 30 2023
On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languagesThats one extremely biased view on D use. OOP is useful for application development (irrelevant of your feelings), and D is also used for app development.
Oct 01 2023
On Sunday, 1 October 2023 at 13:07:10 UTC, Alexandru Ermicioi wrote:On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:Indeed, if the basics such as OOP are not implemented right, pilling up more and more on top of it is not going to help. Solid foundations matter. OOP is just a basic feature of modern programing languages and widely used in the industry.However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languagesThats one extremely biased view on D use. OOP is useful for application development (irrelevant of your feelings), and D is also used for app development.
Oct 01 2023
On Sunday, 1 October 2023 at 22:01:56 UTC, deadalnix wrote:On Sunday, 1 October 2023 at 13:07:10 UTC, Alexandru Ermicioi wrote:Also worth adding that there is a difference between the midwit bullshit enterprise OOP, the modern synthesis OOP one might read in a book, and merely classes. Improvements to the latter should be a free win, other than ABI breakage. One of the reasons why I think we need to massively reduce the surface of the D toolchain is that if we don't we massively increase the activation energy to actually implement these changes — hand waving using Arrhenius/Boltzmann, T ~ O(\exp(E_a)), doesn't scale well.On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:Indeed, if the basics such as OOP are not implemented right, pilling up more and more on top of it is not going to help. Solid foundations matter. OOP is just a basic feature of modern programing languages and widely used in the industry.However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languagesThats one extremely biased view on D use. OOP is useful for application development (irrelevant of your feelings), and D is also used for app development.
Oct 01 2023
On Saturday, September 30, 2023 6:21:48 PM MDT ryuukk_ via Digitalmars-d wrote:On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:If they're not what the language currently does, it's still a new feature. They may be features that you want, and they may even be much easier to add than many other features, but they're still not currently part of the language. The main point here is that there are a number of features that we already have which are not implemented as well as they could be, which sometimes results in bugs, and sometimes results in performance issues. Ideally, we would be making sure that what we already have works as advertised and performs well, and in general, issues with language features that we already have are far more likely to cause problems than the fact that we're missing a particular feature that someone might want. New features can definitely improve the language, but we also need to make sure that what we already have is solid. And arguably, the focus is often too much on adding a feature that some particular subset of the community thinks is a "must-have" and not enough on improving what the language already does. And in large part, it's backlash over issues caused by not focusing enough on stability and improving what we already have which has led to the D foundation deciding to temporarily freeze the DIP process and focus more on bug fixing first. We're still going to be getting new features, but when we can identify issues like the one that Deadalnix has brought up here, we really should be trying to fix them. And while OOP is not as large a focus in D as it is in many other programming languages, it's still a paradigm that D supports and many programs use. So, if we can speed it up, that's definitely a win for the language. And since the runtime uses classes, it could actually speed up D programs that generally avoid OOP. And since the compiler uses classes, it might even speed up compilation times, which is a win for everyone. - Jonathan M DavisD having more feature than most language, adding feature is clearly not where the impact is.I agree, however let us not get side tracked Asking for pattern matching? it's not new feature, it's improving ``switch`` Asking for tagged union? it's not new feature, it's improving ``enum`` However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languages
Oct 01 2023
On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:Hi all, [...]Here we go: https://github.com/snazzy-d/sdc/blob/master/test/llvm/downcast.d Except the interface part.
Nov 13 2023
https://issues.dlang.org/show_bug.cgi?id=24332 https://issues.dlang.org/show_bug.cgi?id=24333 https://issues.dlang.org/show_bug.cgi?id=24335 https://issues.dlang.org/show_bug.cgi?id=24336
Jan 11 2024
On 9/28/2023 5:42 AM, deadalnix wrote:1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 14 2024
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:On 9/28/2023 5:42 AM, deadalnix wrote:Thanks, that's a step in the right direction.1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 18 2024
On 1/18/2024 8:20 PM, deadalnix wrote:Thanks, that's a step in the right direction.You're welcome. I'd still like a list of the other problems you alluded to.
Jan 19 2024
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:On 9/28/2023 5:42 AM, deadalnix wrote:Why md5 and not a faster method? Andrea1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:That is good opportunity to submit a PR, try it, it's not that hard and is rewardingOn 9/28/2023 5:42 AM, deadalnix wrote:Why md5 and not a faster method? Andrea1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ AndreaOn Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:That is good opportunity to submit a PR, try it, it's not that hard and is rewardingOn 9/28/2023 5:42 AM, deadalnix wrote:Why md5 and not a faster method? Andrea1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.cOn Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ AndreaOn Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:That is good opportunity to submit a PR, try it, it's not that hard and is rewardingOn 9/28/2023 5:42 AM, deadalnix wrote:Why md5 and not a faster method? Andrea1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.cProbably xxhash is even faster, but there's no public domain implementation.
Jan 22 2024
On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:FNV1A is used for `hashOf`. HashOf seems to be a lot faster than md5 only when you're dealing with smaller strings. I've done a test online and you'll see, this won't do actual difference in compilation time: ```d import std; auto test(T)(scope T delegate() dg) { import std.datetime.stopwatch; StopWatch sw = StopWatch(AutoStart.yes); scope(exit){writeln(sw.peek.total!"msecs");} return dg(); } string repeat(string a, int n) { char[] ret = new char[](a.length*n); foreach(i; 0..n) ret[i*a.length..(i+1)*a.length] = a[]; return cast(string)ret; } void main() { import std.digest.md; foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200]) { string testString = "a".repeat(count); writeln("\nResult for 1_000_000 times executing hashes for a string if size ", count); test(() { foreach(i; 0..1_000_000) hashOf(testString); }); test(() { foreach(i; 0..1_000_000) md5Of(testString); }); } } ``` This is the result I have for this program: ``` Result for 1_000_000 times executing hashes for a string if size 1 12 130 Result for 1_000_000 times executing hashes for a string if size 5 25 132 Result for 1_000_000 times executing hashes for a string if size 10 36 129 Result for 1_000_000 times executing hashes for a string if size 15 41 124 Result for 1_000_000 times executing hashes for a string if size 25 54 127 Result for 1_000_000 times executing hashes for a string if size 35 69 120 Result for 1_000_000 times executing hashes for a string if size 50 94 116 Result for 1_000_000 times executing hashes for a string if size 100 176 217 Result for 1_000_000 times executing hashes for a string if size 200 379 394 ``` As you see, this is a result for 1 million times hashing strings. I can't even think in many situations people would be doing 1 million time hashings. For a big program, you'll probably get 200 string literals. And if you test, it is still 0 milliseconds. So, MD5 is a pretty solution for that problem, and this would be too much effort for actually no gain at all, it could even create duplicate strings for no millisecond gain :)On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.cOn Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ AndreaOn Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:That is good opportunity to submit a PR, try it, it's not that hard and is rewardingOn 9/28/2023 5:42 AM, deadalnix wrote:Why md5 and not a faster method? Andrea1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On Monday, 22 January 2024 at 14:22:33 UTC, Hipreme wrote:On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:It hashes class name, they'll probably never be more than 30 characters, so md5 in your example would be 2x slower I think it worth doing more benchmarks, even few milliseconds gained is worthwhile, that's milliseconds you could spend doing more template work for example, it's additiveOn Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:FNV1A is used for `hashOf`. HashOf seems to be a lot faster than md5 only when you're dealing with smaller strings. I've done a test online and you'll see, this won't do actual difference in compilation time: ```d import std; auto test(T)(scope T delegate() dg) { import std.datetime.stopwatch; StopWatch sw = StopWatch(AutoStart.yes); scope(exit){writeln(sw.peek.total!"msecs");} return dg(); } string repeat(string a, int n) { char[] ret = new char[](a.length*n); foreach(i; 0..n) ret[i*a.length..(i+1)*a.length] = a[]; return cast(string)ret; } void main() { import std.digest.md; foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200]) { string testString = "a".repeat(count); writeln("\nResult for 1_000_000 times executing hashes for a string if size ", count); test(() { foreach(i; 0..1_000_000) hashOf(testString); }); test(() { foreach(i; 0..1_000_000) md5Of(testString); }); } } ``` This is the result I have for this program: ``` Result for 1_000_000 times executing hashes for a string if size 1 12 130 Result for 1_000_000 times executing hashes for a string if size 5 25 132 Result for 1_000_000 times executing hashes for a string if size 10 36 129 Result for 1_000_000 times executing hashes for a string if size 15 41 124 Result for 1_000_000 times executing hashes for a string if size 25 54 127 Result for 1_000_000 times executing hashes for a string if size 35 69 120 Result for 1_000_000 times executing hashes for a string if size 50 94 116 Result for 1_000_000 times executing hashes for a string if size 100 176 217 Result for 1_000_000 times executing hashes for a string if size 200 379 394 ``` As you see, this is a result for 1 million times hashing strings. I can't even think in many situations people would be doing 1 million time hashings. For a big program, you'll probably get 200 string literals. And if you test, it is still 0 milliseconds. So, MD5 is a pretty solution for that problem, and this would be too much effort for actually no gain at all, it could even create duplicate strings for no millisecond gain :)On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.cOn Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ AndreaOn Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:That is good opportunity to submit a PR, try it, it's not that hard and is rewardingOn 9/28/2023 5:42 AM, deadalnix wrote:Why md5 and not a faster method? Andrea1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On Monday, 22 January 2024 at 15:46:02 UTC, ryuukk_ wrote:Indeed. And probably xxhash would be even faster if well optimized, I guess. AndreaAs you see, this is a result for 1 million times hashing strings. I can't even think in many situations people would be doing 1 million time hashings. For a big program, you'll probably get 200 string literals. And if you test, it is still 0 milliseconds. So, MD5 is a pretty solution for that problem, and this would be too much effort for actually no gain at all, it could even create duplicate strings for no millisecond gain :)It hashes class name, they'll probably never be more than 30 characters, so md5 in your example would be 2x slower I think it worth doing more benchmarks, even few milliseconds gained is worthwhile, that's milliseconds you could spend doing more template work for example, it's additive
Jan 22 2024
On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:Why md5 and not a faster method?What algorithm would you suggest?
Jan 22 2024
On 1/22/2024 2:08 AM, Andrea Fontana wrote:Why md5 and not a faster method?Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Jan 22 2024
On 23/01/2024 9:46 AM, Walter Bright wrote:On 1/22/2024 2:08 AM, Andrea Fontana wrote:Quite a few months ago I discussed this problem for class comparison and proposed the hash to deadalnix. But not as a substitute for the string comparison, but as a limiter for how often it needs to be done. The string comparison is the only way to do the comparison reliably. You can't use the pointers nor do the string comparison all the time, too expensive. So by doing the hash comparison of say an int, we can reliably determine if it not a match and fail fast. Sadly no way to success fast without potential problems, which is what you seem to be hand waving the possible problems for so it'll be fast success.Why md5 and not a faster method?Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Jan 22 2024
On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:The string comparison is the only way to do the comparison reliably.https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions One collision in 6 billion hashes per second for 100 years.
Jan 22 2024
On 23/01/2024 11:41 AM, Walter Bright wrote:On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:If we were gambling on something not happening, those are great odds. However if this does occur, silent memory corruption is possible. I don't want to be gambling on whether there will be silent memory corruption. This is my perspective on the topic anyway. I love guarantees of program security.The string comparison is the only way to do the comparison reliably.https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions One collision in 6 billion hashes per second for 100 years.
Jan 22 2024
On 1/22/2024 2:49 PM, Richard (Rikki) Andrew Cattermole wrote:I do understand the desire for mathematical perfection, and feel it myself. But it has a significant and always present performance penalty. Engineering is always about tradeoffs. There's no getting away from it. I'm not concerned about a meteor hitting my house, either, which is far far more likely. BTW, D uses md5 hashes to enable the linker to merge redundant string literals. Other compromises we make for performance include things like floating point rounding.One collision in 6 billion hashes per second for 100 years.If we were gambling on something not happening, those are great odds. However if this does occur, silent memory corruption is possible. I don't want to be gambling on whether there will be silent memory corruption. This is my perspective on the topic anyway. I love guarantees of program security.
Jan 22 2024
On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:On 1/22/2024 2:08 AM, Andrea Fontana wrote:Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Why md5 and not a faster method?Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Jan 22 2024
On 1/22/2024 1:11 PM, Bruce Carneal wrote:Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:On 1/22/2024 1:11 PM, Bruce Carneal wrote:Blake is > 128 bits, so I don't think there is anything really interesting there anyways. md5 isn't cryptographically secure, mostly because 128 bits isn't cryptographically secure. But let's be real here, a collision would 100% be purposeful, there are zero chances of one happening by mistake. I don't see what using a different hash function would buy us. I doubt the hash of class name would be anywhere close to the bottleneck in term of speed.Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
On 1/22/2024 3:31 PM, deadalnix wrote:I don't see what using a different hash function would buy us. I doubt the hash of class name would be anywhere close to the bottleneck in term of speed.The hash also happens at compile time, not runtime. The only thing I would prefer is a hash that's 64 bits instead of 128, so it's one less compare instruction.
Jan 22 2024
On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-last-bits-from-a-sha-256-hash/163#163On 1/22/2024 1:11 PM, Bruce Carneal wrote:Blake is > 128 bits, so I don't think there is anything really interesting there anyways.Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d wrote:On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler? - Jonathan M DavisOn Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-lasOn 1/22/2024 1:11 PM, Bruce Carneal wrote:Blake is > 128 bits, so I don't think there is anything really interesting there anyways.Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis wrote:On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d wrote:BLAKE3 was mentioned because it is allegedly faster than MD5 (this still needs to be confirmed). You can take as many bits of its output as you need and being too large is not a problem. Having high likelihood of collisions would have made it a bad hash from the security standpoint, so you don't need to worry about collisions for your use case either.On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler?On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-lasOn 1/22/2024 1:11 PM, Bruce Carneal wrote:Blake is > 128 bits, so I don't think there is anything really interesting there anyways.Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis wrote:On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d wrote:I don't think it matters at all except for what it implies about collisions. And, again, I don't think any modern hash is bad wrt collisions but I'm not a hash researcher. If it is true that any modern hash is in the "more likely to be smacked by a meteor than experience a collision" category *and* the current (md5) time burn is more than a blip then it's probably worth taking a look at the alternatives. OTOH, it's not like there is a shortage of things to work on... :-)On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler? - Jonathan M DavisOn Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-lasOn 1/22/2024 1:11 PM, Bruce Carneal wrote:Blake3 might be worth a look. It's reportedly faster and stronger than md5.
Jan 22 2024
On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:On 1/22/2024 1:11 PM, Bruce Carneal wrote:I've not looked into that aspect either. I'd just assumed that all modern hashes were quite good and was just looking at the time to compute the hash. I'm not at all concerned about accidental collisions apart from a broken implementation. As you say, more likely that we get smacked by a meteor. I've read hash comparisons claiming blake3 is anywhere from 3X to 9X faster than MD5 on a single core wrt throughput. No idea on how things would work out for this particular use case though, nor even if a 9X throughput upgrade would be noticeable. I'd suspect not, just wanted to put forward an alternative.Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:On 1/22/2024 2:08 AM, Andrea Fontana wrote:So probably you like xxhash128 https://xxhash.com/ AndreaWhy md5 and not a faster method?Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Jan 23 2024
On 1/23/2024 3:49 AM, Andrea Fontana wrote:So probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
On 1/23/24 2:37 PM, Walter Bright wrote:On 1/23/2024 3:49 AM, Andrea Fontana wrote:xxhash has a wiki page about it [1]. Given that it's their page, it's probably unsurprising that xxhash does well. [1]: https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison#collision-studySo probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:On 1/23/2024 3:49 AM, Andrea Fontana wrote:If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is `1.0 / 2^^64` for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:So probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?As an example, if a 64-bit hash is used, there are approximately 1.8×10^^19 different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts (5.38×10^^9) to generate a collision using brute force.[6] This value is called birthday bound[7] and for n-bit codes it could be approximated as 2^^(n/2).I would say that any 64-bit hash is not sufficiently collision resistant for the problem that you are trying to solve.
Jan 23 2024
On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka wrote:On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:While it is not possible to do better than this, it is certainly possible - and in fact way easier than you'd expect - to do worse. I see no reason to move away from md5 here. It works, its speed is nowhere close to the bottleneck, and its pre-image resistance is of no concern for this use case. This thread has gone in the typical bullshit D goes into, arguing about the intricacies of some details that do not matter at all for the value delivered in the end.On 1/23/2024 3:49 AM, Andrea Fontana wrote:If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is `1.0 / 2^^64` for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:So probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
On Wednesday, 24 January 2024 at 01:18:18 UTC, deadalnix wrote:On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka wrote:Go ahead and show me truncated output of any hash, designed to be cryptographically secure, behaving badly. I know that xxhash, ahash, fnv or any other non-secure hashes are prone to have collision issues, as they are primarily optimized for speed while being borderline acceptable from the collision standpoint.On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:While it is not possible to do better than this, it is certainly possible - and in fact way easier than you'd expect - to do worse.On 1/23/2024 3:49 AM, Andrea Fontana wrote:If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is `1.0 / 2^^64` for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:So probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
On Wednesday, 24 January 2024 at 01:18:18 UTC, deadalnix wrote:This thread has gone in the typical bullshit D goes into, arguing about the intricacies of some details that do not matter at all for the value delivered in the end.Why try to demotivate people? this could be a good entry point for a new contributor, it's not "bullshit", what's bullshit is sitting here doing nothing but complain Submit PRs, encourage people to fix things, don't be negative, reading you feels like there is no point even participating, not everyone is a compiler developper, not everyone is able to "improve your OOP ABI", everyone is however able to benchmark and change the hashing function for a faster one If you want "Improve the OOP ABI", it better make the compiler remain fast, that detail, is not futile
Jan 24 2024
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:On 1/23/2024 3:49 AM, Andrea Fontana wrote:Check this: https://fossies.org/linux/xxHash/tests/collisions/README.md All good hash algorithms should be uniform... AndreaSo probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:On 1/23/2024 3:49 AM, Andrea Fontana wrote:For 64 bits, your best best is CRC64. However, the risk of collision starts to be a bit too high to rely exclusively on this.So probably you like xxhash128 https://xxhash.com/Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:On 9/28/2023 5:42 AM, deadalnix wrote:Just wondering, could a trie be used instead of hashing algorithm? It would guarantee that no collisions would happen at all, with better performance than simple comparison. Best regards, Alexandru.1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On 23/01/2024 11:48 AM, Alexandru Ermicioi wrote:On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true. If it was true, we could compare pointers and don't need any of this logic.On 9/28/2023 5:42 AM, deadalnix wrote:Just wondering, could a trie be used instead of hashing algorithm? It would guarantee that no collisions would happen at all, with better performance than simple comparison. Best regards, Alexandru.1/ Downcast to final classes.This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 22 2024
On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true. If it was true, we could compare pointers and don't need any of this logic.It isn't true because of shared libraries.
Jan 22 2024
On 23/01/2024 12:52 PM, Walter Bright wrote:On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:Or incremental compilation, the symbol could at least in theory be hidden in the object file and not be available for merging in the binary. Regardless it can't be assumed if you like your sanity!That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true. If it was true, we could compare pointers and don't need any of this logic.It isn't true because of shared libraries.
Jan 22 2024