www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Improve the OOP ABI

reply deadalnix <deadalnix gmail.com> writes:
Hi all,

I have been arguing for quite some time about the fact that there 
is a well of improvement to make to D that are not necessarily 
breaking changes or languages addition - and that, in fact, D 
having more feature than most language, adding feature is clearly 
not where the impact is.

Today I would like to provide a concrete example of things that 
could be dramatically improved, with no breakage except at the 
ABI level - which we already do not guarantee. Let's go over 
these changes.

1/ Downcast to final classes.

```d
class A {}
final class B : A {}

auto foo(A a) {
     return cast(B) a;
}
```

This generates a call into the runtime, making things completely 
opaque to the optimizer. The algorithm used by the runtime is 
linear. But there is a dumb constant time solution. because B is 
final, a is either an instance of B, or it is not (as in, it 
cannot be an instance of a subclass of B). Therefore, we expect 
the codegen to look like the following instead:

```d
auto foo(A a) {
     return typeid(a) is typeid(B) ? a : null;
}
```

This obviously won't compile but you get the idea. Not only this 
is constant time, but very simple too, and visible to the 
optimizer, which means the check can be folded by the opitimizer 
after other transforms, for instance inlining.

2/ Inline ClassInfo with the vtbl.

The current layout of objects is as follow:

```
+-----------+    +-----------+    +-----------+
|  __vtbl   +--->|  typeid   +--->| ClassInfo |
+-----------+    +-----------+    |           |
| __monitor |    |  method0  |    |    ...    |
+-----------+    +-----------+    |           |
|  field0   |    |  method1  |    +-----------+
+-----------+    +-----------+
|  field1   |    |    ...    |
+-----------+    +-----------+
|    ...    |
+-----------+
```

This causes a ton of extra indirections that are not necessary. 
instead the following layout ought to be used:

```
+-----------+    +-----------+
|  __vtbl   +--->| ClassInfo |
+-----------+    |           |
| __monitor |    |    ...    |
+-----------+    |           |
|  field0   |    +-----------+
+-----------+    |  method0  |
|  field1   |    +-----------+
+-----------+    |  method1  |
|    ...    |    +-----------+
+-----------+    |    ...    |
                  +-----------+
```

Alternatively, it is possible to have the pointer point at the 
first method and subtract to get the typeid. The important part 
is that there is only one indirection now.

3/ Downcast.

Currently, we use a linear algorithm to downcast and we reach 
this algorithm via a call int he runtime. This prevent the 
optimizer from doing anything about it, in addition to be slow. 
The problem is similar to 1/. The whole check can be made trivial 
if we let the compiler prebake some data for us, namely an array 
of primary parents. Let's see what it looks like in the first 
example, when b is not final.

```d
class A {}
class B : A {}

auto foo(A a) {
     return cast(B) a;
}
```

Which we can do as follow:

```d
auto foo(A a) {
     // The primaries generated by the compiler for us.
     auto tidA = typeid(A);
     assert(tidA.primaries = [tidA]);
     auto tidB = typeid(B);
     assert(tidB.primaries = [tidA, tidB]);

     auto t = typeid(a);
     auto depth = tidB.primaries.length - 1;

     if (t.primary.length <= depth) {
         return null;
     }

     if (t.primary[depth] == tidB) {
         return a;
     }

     return null;
}
```

This is starting to look good, now we have general downcast in a 
form that is not only really fast to perform, but also completely 
transparent to the optimizer.

4/ Downcast to interfaces.

This is getting a bit more tricky, so I won't expand this one 
fully, but it is also possible to do much better on this front as 
well. The obvious complication that that interfaces allow for 
multiple inheritance.

The first obvious optimization is to consider the chain of 
leftmost (or alternatively the longest chain, which allows to 
cull more candidates faster) inheritance the primaries for the 
interface and eliminate the whole branch at once using the same 
strategy as 3/.

Now we are left with possible secondaries match. Here, no 
possibility to remain O(1), we'll have to loop over the other 
interfaces and repeat the matching process. Thankfully very 
branch hierarchy are uncommon in practice, but even then, we can 
use this one weird trick to dramatically cull out the search 
space: bloom filters.

Make the bloom filter 64 bits and simply cull via `(targetBloom & 
aggregateBloom) == targetBloom` and you can eliminate a good 
chunk of the search right there.

I hope this is the kind of change we can see in D in the future. 
This is the kind of changes that would make D better in practice, 
not the next cool feature, but that the current, existing 
feature, have an quality of implementation that is in line with 
what can be expected in a modern language.
Sep 28 2023
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 29/09/2023 1:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
 ```d
 auto foo(A a) {
      return typeid(a) is typeid(B) ? a : null;
 }
 ```
TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative. So this won't work.
 2/ Inline ClassInfo with the vtbl.
ClassInfo is an alias to TypeInfo_Class now. This indirection shouldn't exist. https://github.com/dlang/dmd/blob/a6285c26c0e04b9b3b1d40ae1c3200c834df1480/druntime/src/object.d#L1737
Sep 28 2023
next sibling parent Adam D Ruppe <destructionator gmail.com> writes:
On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.
This should be fixed, but even if it isn't, that just means using the == operator instead of the is operator; the concept i believe is still sound. (the == operator sux tho, this really should be fixed at the source)
 ClassInfo is an alias to TypeInfo_Class now. This indirection 
 shouldn't exist.
this means emplacing the typeinfo_class in the same memory block as the vtbl instead of linking to it. so the virtual table exists at *(instance + __traits(classInstanceSize, TypeInfo_Class))
Sep 28 2023
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 29/09/2023 1:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
 ```d
 auto foo(A a) {
      return typeid(a) is typeid(B) ? a : null;
 }
 ```
TypeInfo isn't guaranteed to be unique to a process when dealing with shared libraries. There is a strong chance it'll be duplicative. So this won't work.
lol, wat? That sounds crazy.
 2/ Inline ClassInfo with the vtbl.
ClassInfo is an alias to TypeInfo_Class now. This indirection shouldn't exist. https://github.com/dlang/dmd/blob/a6285c26c0e04b9b3b1d40ae1c3200c834df1480/druntime/src/object.d#L1737
It does exist.
Sep 28 2023
next sibling parent reply Guillaume Piolat <first.name gmail.com> writes:
On Thursday, 28 September 2023 at 14:31:33 UTC, deadalnix wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

 So this won't work.
lol, wat? That sounds crazy.
I think it's the same in C++, and sometimes dynamic_cast ends up making string comparisons.
Sep 28 2023
parent reply deadalnix <deadalnix gmail.com> writes:
On Thursday, 28 September 2023 at 14:41:16 UTC, Guillaume Piolat 
wrote:
 On Thursday, 28 September 2023 at 14:31:33 UTC, deadalnix wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

 So this won't work.
lol, wat? That sounds crazy.
I think it's the same in C++, and sometimes dynamic_cast ends up making string comparisons.
What's the reason for this madness?
Sep 28 2023
parent reply Guillaume Piolat <first.name gmail.com> writes:
On Thursday, 28 September 2023 at 15:39:22 UTC, deadalnix wrote:
 I think it's the same in C++, and sometimes dynamic_cast ends 
 up making string comparisons.
What's the reason for this madness?
For example for GCC => https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo#L48 MSVCR => https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/rtti.cpp#L27 I don't understand any of it, just read "strcmp" in a profiler once in a big C++ app. It turns out it was RTTI which was taking a small percentage of time.
Sep 29 2023
parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 29 September 2023 at 12:15:53 UTC, Guillaume Piolat 
wrote:
 On Thursday, 28 September 2023 at 15:39:22 UTC, deadalnix wrote:
 I think it's the same in C++, and sometimes dynamic_cast ends 
 up making string comparisons.
What's the reason for this madness?
For example for GCC => https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo#L48 MSVCR => https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/rtti.cpp#L27 I don't understand any of it, just read "strcmp" in a profiler once in a big C++ app. It turns out it was RTTI which was taking a small percentage of time.
At least the GCC one has some explaination. What stroke me is:
 but even with weak symbols sometimes names are not merged
 when objects are loaded with RTLD_LOCAL
Which seems to indicate the duplication is not a fatality, but they have to handle it because they might be working with stuff that are loaded with RTLD_LOCAL? What does this mean? is this really a constraint that we have?
Sep 29 2023
parent reply Guillaume Piolat <first.name gmail.com> writes:
On Friday, 29 September 2023 at 12:33:13 UTC, deadalnix wrote:
 Which seems to indicate the duplication is not a fatality, but 
 they have to handle it because they might be working with stuff 
 that are loaded with RTLD_LOCAL? What does this mean? is this 
 really a constraint that we have?
I'm way underqualified. Using RTLD_NOW (and not sure what it does), it seems to work on all platforms, but I certainly do not expect RTTI to work across binaries. C ABI provides minimum headache. Hipreme also do shared libs, and they went the minimal druntime way. We want less of druntime and Phobos, not more of it.
Sep 29 2023
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
https://linux.die.net/man/3/dlopen

RTLD_NOW
If this value is specified, or the environment variable LD_BIND_NOW is 
set to a nonempty string, all undefined symbols in the library are 
resolved before dlopen() returns. If this cannot be done, an error is 
returned.

That is fine, this is kinda what you should usually use to prevent 
errors happening elsewhere (which won't be easily understood).



RTLD_GLOBAL
The symbols defined by this library will be made available for symbol 
resolution of subsequently loaded libraries.

RTLD_LOCAL
This is the converse of RTLD_GLOBAL, and the default if neither flag is 
specified. Symbols defined in this library are not made available to 
resolve references in subsequently loaded libraries.

RTLD_LOCAL basically isolates all exports from one shared library once 
loaded from being matched to other shared library/executable externs. 
This is probably what you want when loading plugins.
Sep 29 2023
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 9/28/2023 7:31 AM, deadalnix wrote:
 It does exist.
You're right. The ClassInfo can be merged with the vtbl[]. The trouble is it's a variable size, so that would have to be redone, too.
Jan 11 2024
prev sibling parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.
We should solve this problem by guaranteeing TypeInfo to be unique to a process. I don't think that harms anything.
Sep 28 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
I said this on Discord, but that means all TypeInfo has to be exported, 
even for executables.
Sep 28 2023
next sibling parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I said this on Discord, but that means all TypeInfo has to be 
 exported, even for executables.
Yep, and as I said, I don't think that harms anything. Note that you can still remove TypeInfo that isn't referenced in an executable/library; the only TypeInfo that needs to be exported are those that you actually look at. And those need to be in the binary anyway. So I really don't see where the problem is here.
Sep 29 2023
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 29/09/2023 8:43 PM, FeepingCreature wrote:
 Note that you can still remove TypeInfo that isn't referenced in an 
 executable/library; the only TypeInfo that needs to be exported are 
 those that you actually look at.
The linker can't do that. So you're relying entirely on the compiler for this.
Sep 29 2023
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I said this on Discord, but that means all TypeInfo has to be 
 exported, even for executables.
There might be an issue when your lib uses two other libs, and both of them use lib X but of different versions. This might cause issues with exported structures when they differ from version to version, and etc. Could this also be an issue for exported TypeInfos?
Sep 29 2023
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 29/09/2023 9:04 PM, Alexandru Ermicioi wrote:
 On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 I said this on Discord, but that means all TypeInfo has to be 
 exported, even for executables.
There might be an issue when your lib uses two other libs, and both of them use lib X but of different versions. This might cause issues with exported structures when they differ from version to version, and etc. Could this also be an issue for exported TypeInfos?
It shouldn't be. As long as everything is the same codegen wise (except pointer values), then two TypeInfo's should compare identically. But you almost got to the concern, not when there are two different versions of a package, but when they are the same version, but linked into different binaries and not exported.
Sep 29 2023
prev sibling next sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:
 D having more feature than most language, adding feature is 
 clearly not where the impact is.
I agree, however let us not get side tracked Asking for pattern matching? it's not new feature, it's improving ``switch`` Asking for tagged union? it's not new feature, it's improving ``enum`` However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languages
Sep 30 2023
next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:
 However, you are wrong at expecting an impact for D by 
 improving OOP, nope, D needs to improve the non-OOP story, more 
 OOP is repulsive, specially for system languages
Thats one extremely biased view on D use. OOP is useful for application development (irrelevant of your feelings), and D is also used for app development.
Oct 01 2023
parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 1 October 2023 at 13:07:10 UTC, Alexandru Ermicioi 
wrote:
 On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:
 However, you are wrong at expecting an impact for D by 
 improving OOP, nope, D needs to improve the non-OOP story, 
 more OOP is repulsive, specially for system languages
Thats one extremely biased view on D use. OOP is useful for application development (irrelevant of your feelings), and D is also used for app development.
Indeed, if the basics such as OOP are not implemented right, pilling up more and more on top of it is not going to help. Solid foundations matter. OOP is just a basic feature of modern programing languages and widely used in the industry.
Oct 01 2023
parent max haughton <maxhaton gmail.com> writes:
On Sunday, 1 October 2023 at 22:01:56 UTC, deadalnix wrote:
 On Sunday, 1 October 2023 at 13:07:10 UTC, Alexandru Ermicioi 
 wrote:
 On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:
 However, you are wrong at expecting an impact for D by 
 improving OOP, nope, D needs to improve the non-OOP story, 
 more OOP is repulsive, specially for system languages
Thats one extremely biased view on D use. OOP is useful for application development (irrelevant of your feelings), and D is also used for app development.
Indeed, if the basics such as OOP are not implemented right, pilling up more and more on top of it is not going to help. Solid foundations matter. OOP is just a basic feature of modern programing languages and widely used in the industry.
Also worth adding that there is a difference between the midwit bullshit enterprise OOP, the modern synthesis OOP one might read in a book, and merely classes. Improvements to the latter should be a free win, other than ABI breakage. One of the reasons why I think we need to massively reduce the surface of the D toolchain is that if we don't we massively increase the activation energy to actually implement these changes — hand waving using Arrhenius/Boltzmann, T ~ O(\exp(E_a)), doesn't scale well.
Oct 01 2023
prev sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, September 30, 2023 6:21:48 PM MDT ryuukk_ via Digitalmars-d 
wrote:
 On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:
 D having more feature than most language, adding feature is
 clearly not where the impact is.
I agree, however let us not get side tracked Asking for pattern matching? it's not new feature, it's improving ``switch`` Asking for tagged union? it's not new feature, it's improving ``enum`` However, you are wrong at expecting an impact for D by improving OOP, nope, D needs to improve the non-OOP story, more OOP is repulsive, specially for system languages
If they're not what the language currently does, it's still a new feature. They may be features that you want, and they may even be much easier to add than many other features, but they're still not currently part of the language. The main point here is that there are a number of features that we already have which are not implemented as well as they could be, which sometimes results in bugs, and sometimes results in performance issues. Ideally, we would be making sure that what we already have works as advertised and performs well, and in general, issues with language features that we already have are far more likely to cause problems than the fact that we're missing a particular feature that someone might want. New features can definitely improve the language, but we also need to make sure that what we already have is solid. And arguably, the focus is often too much on adding a feature that some particular subset of the community thinks is a "must-have" and not enough on improving what the language already does. And in large part, it's backlash over issues caused by not focusing enough on stability and improving what we already have which has led to the D foundation deciding to temporarily freeze the DIP process and focus more on bug fixing first. We're still going to be getting new features, but when we can identify issues like the one that Deadalnix has brought up here, we really should be trying to fix them. And while OOP is not as large a focus in D as it is in many other programming languages, it's still a paradigm that D supports and many programs use. So, if we can speed it up, that's definitely a win for the language. And since the runtime uses classes, it could actually speed up D programs that generally avoid OOP. And since the compiler uses classes, it might even speed up compilation times, which is a win for everyone. - Jonathan M Davis
Oct 01 2023
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:
 Hi all,

 [...]
Here we go: https://github.com/snazzy-d/sdc/blob/master/test/llvm/downcast.d Except the interface part.
Nov 13 2023
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
https://issues.dlang.org/show_bug.cgi?id=24332
https://issues.dlang.org/show_bug.cgi?id=24333
https://issues.dlang.org/show_bug.cgi?id=24335
https://issues.dlang.org/show_bug.cgi?id=24336
Jan 11 2024
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Jan 14 2024
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Thanks, that's a step in the right direction.
Jan 18 2024
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/18/2024 8:20 PM, deadalnix wrote:
 Thanks, that's a step in the right direction.
You're welcome. I'd still like a list of the other problems you alluded to.
Jan 19 2024
prev sibling next sibling parent reply Andrea Fontana <nospam example.org> writes:
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Why md5 and not a faster method? Andrea
Jan 22 2024
next sibling parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Why md5 and not a faster method? Andrea
That is good opportunity to submit a PR, try it, it's not that hard and is rewarding
Jan 22 2024
parent reply Andrea Fontana <nospam example.org> writes:
On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Why md5 and not a faster method? Andrea
That is good opportunity to submit a PR, try it, it's not that hard and is rewarding
I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ Andrea
Jan 22 2024
parent reply Andrea Fontana <nospam example.org> writes:
On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:
 On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Why md5 and not a faster method? Andrea
That is good opportunity to submit a PR, try it, it's not that hard and is rewarding
I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ Andrea
The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.c
Jan 22 2024
next sibling parent Andrea Fontana <nospam example.org> writes:
On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:
 The right one is FNV-1a:
 http://www.isthe.com/chongo/src/fnv/hash_32a.c
Probably xxhash is even faster, but there's no public domain implementation.
Jan 22 2024
prev sibling parent reply Hipreme <msnmancini hotmail.com> writes:
On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:
 On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana 
 wrote:
 On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Why md5 and not a faster method? Andrea
That is good opportunity to submit a PR, try it, it's not that hard and is rewarding
I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ Andrea
The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.c
FNV1A is used for `hashOf`. HashOf seems to be a lot faster than md5 only when you're dealing with smaller strings. I've done a test online and you'll see, this won't do actual difference in compilation time: ```d import std; auto test(T)(scope T delegate() dg) { import std.datetime.stopwatch; StopWatch sw = StopWatch(AutoStart.yes); scope(exit){writeln(sw.peek.total!"msecs");} return dg(); } string repeat(string a, int n) { char[] ret = new char[](a.length*n); foreach(i; 0..n) ret[i*a.length..(i+1)*a.length] = a[]; return cast(string)ret; } void main() { import std.digest.md; foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200]) { string testString = "a".repeat(count); writeln("\nResult for 1_000_000 times executing hashes for a string if size ", count); test(() { foreach(i; 0..1_000_000) hashOf(testString); }); test(() { foreach(i; 0..1_000_000) md5Of(testString); }); } } ``` This is the result I have for this program: ``` Result for 1_000_000 times executing hashes for a string if size 1 12 130 Result for 1_000_000 times executing hashes for a string if size 5 25 132 Result for 1_000_000 times executing hashes for a string if size 10 36 129 Result for 1_000_000 times executing hashes for a string if size 15 41 124 Result for 1_000_000 times executing hashes for a string if size 25 54 127 Result for 1_000_000 times executing hashes for a string if size 35 69 120 Result for 1_000_000 times executing hashes for a string if size 50 94 116 Result for 1_000_000 times executing hashes for a string if size 100 176 217 Result for 1_000_000 times executing hashes for a string if size 200 379 394 ``` As you see, this is a result for 1 million times hashing strings. I can't even think in many situations people would be doing 1 million time hashings. For a big program, you'll probably get 200 string literals. And if you test, it is still 0 milliseconds. So, MD5 is a pretty solution for that problem, and this would be too much effort for actually no gain at all, it could even create duplicate strings for no millisecond gain :)
Jan 22 2024
parent reply ryuukk_ <ryuukk.dev gmail.com> writes:
On Monday, 22 January 2024 at 14:22:33 UTC, Hipreme wrote:
 On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana 
 wrote:
 On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana 
 wrote:
 On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Why md5 and not a faster method? Andrea
That is good opportunity to submit a PR, try it, it's not that hard and is rewarding
I know: I've already submitted PR to dmd and phobos :) For example, this one. Easy, fast, insecure (who cares, in this case). http://www.isthe.com/chongo/tech/comp/fnv/ Andrea
The right one is FNV-1a: http://www.isthe.com/chongo/src/fnv/hash_32a.c
FNV1A is used for `hashOf`. HashOf seems to be a lot faster than md5 only when you're dealing with smaller strings. I've done a test online and you'll see, this won't do actual difference in compilation time: ```d import std; auto test(T)(scope T delegate() dg) { import std.datetime.stopwatch; StopWatch sw = StopWatch(AutoStart.yes); scope(exit){writeln(sw.peek.total!"msecs");} return dg(); } string repeat(string a, int n) { char[] ret = new char[](a.length*n); foreach(i; 0..n) ret[i*a.length..(i+1)*a.length] = a[]; return cast(string)ret; } void main() { import std.digest.md; foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200]) { string testString = "a".repeat(count); writeln("\nResult for 1_000_000 times executing hashes for a string if size ", count); test(() { foreach(i; 0..1_000_000) hashOf(testString); }); test(() { foreach(i; 0..1_000_000) md5Of(testString); }); } } ``` This is the result I have for this program: ``` Result for 1_000_000 times executing hashes for a string if size 1 12 130 Result for 1_000_000 times executing hashes for a string if size 5 25 132 Result for 1_000_000 times executing hashes for a string if size 10 36 129 Result for 1_000_000 times executing hashes for a string if size 15 41 124 Result for 1_000_000 times executing hashes for a string if size 25 54 127 Result for 1_000_000 times executing hashes for a string if size 35 69 120 Result for 1_000_000 times executing hashes for a string if size 50 94 116 Result for 1_000_000 times executing hashes for a string if size 100 176 217 Result for 1_000_000 times executing hashes for a string if size 200 379 394 ``` As you see, this is a result for 1 million times hashing strings. I can't even think in many situations people would be doing 1 million time hashings. For a big program, you'll probably get 200 string literals. And if you test, it is still 0 milliseconds. So, MD5 is a pretty solution for that problem, and this would be too much effort for actually no gain at all, it could even create duplicate strings for no millisecond gain :)
It hashes class name, they'll probably never be more than 30 characters, so md5 in your example would be 2x slower I think it worth doing more benchmarks, even few milliseconds gained is worthwhile, that's milliseconds you could spend doing more template work for example, it's additive
Jan 22 2024
parent Andrea Fontana <nospam example.org> writes:
On Monday, 22 January 2024 at 15:46:02 UTC, ryuukk_ wrote:
 As you see, this is a result for 1 million times hashing 
 strings. I can't even think in many situations people would be 
 doing 1 million time hashings. For a big program, you'll 
 probably get 200 string literals. And if you test, it is still 
 0 milliseconds. So, MD5 is a pretty solution for that problem, 
 and this would be too much effort for actually no gain at all, 
 it could even create duplicate strings for no millisecond gain 
 :)
It hashes class name, they'll probably never be more than 30 characters, so md5 in your example would be 2x slower I think it worth doing more benchmarks, even few milliseconds gained is worthwhile, that's milliseconds you could spend doing more template work for example, it's additive
Indeed. And probably xxhash would be even faster if well optimized, I guess. Andrea
Jan 22 2024
prev sibling next sibling parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:
 Why md5 and not a faster method?
What algorithm would you suggest?
Jan 22 2024
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?
Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Jan 22 2024
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 23/01/2024 9:46 AM, Walter Bright wrote:
 On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?
Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Quite a few months ago I discussed this problem for class comparison and proposed the hash to deadalnix. But not as a substitute for the string comparison, but as a limiter for how often it needs to be done. The string comparison is the only way to do the comparison reliably. You can't use the pointers nor do the string comparison all the time, too expensive. So by doing the hash comparison of say an int, we can reliably determine if it not a match and fail fast. Sadly no way to success fast without potential problems, which is what you seem to be hand waving the possible problems for so it'll be fast success.
Jan 22 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
 The string comparison is the only way to do the comparison reliably.
https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions One collision in 6 billion hashes per second for 100 years.
Jan 22 2024
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 23/01/2024 11:41 AM, Walter Bright wrote:
 On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
 The string comparison is the only way to do the comparison reliably.
https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions One collision in 6 billion hashes per second for 100 years.
If we were gambling on something not happening, those are great odds. However if this does occur, silent memory corruption is possible. I don't want to be gambling on whether there will be silent memory corruption. This is my perspective on the topic anyway. I love guarantees of program security.
Jan 22 2024
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2024 2:49 PM, Richard (Rikki) Andrew Cattermole wrote:
 One collision in 6 billion hashes per second for 100 years.
If we were gambling on something not happening, those are great odds. However if this does occur, silent memory corruption is possible. I don't want to be gambling on whether there will be silent memory corruption. This is my perspective on the topic anyway. I love guarantees of program security.
I do understand the desire for mathematical perfection, and feel it myself. But it has a significant and always present performance penalty. Engineering is always about tradeoffs. There's no getting away from it. I'm not concerned about a meteor hitting my house, either, which is far far more likely. BTW, D uses md5 hashes to enable the linker to merge redundant string literals. Other compromises we make for performance include things like floating point rounding.
Jan 22 2024
prev sibling next sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
 On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?
Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
Blake3 might be worth a look. It's reportedly faster and stronger than md5. https://github.com/BLAKE3-team/BLAKE3
Jan 22 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3
Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Jan 22 2024
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3
Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Blake is > 128 bits, so I don't think there is anything really interesting there anyways. md5 isn't cryptographically secure, mostly because 128 bits isn't cryptographically secure. But let's be real here, a collision would 100% be purposeful, there are zero chances of one happening by mistake. I don't see what using a different hash function would buy us. I doubt the hash of class name would be anywhere close to the bottleneck in term of speed.
Jan 22 2024
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2024 3:31 PM, deadalnix wrote:
 I don't see what using a different hash function would buy us. I doubt the
hash 
 of class name would be anywhere close to the bottleneck in term of speed.
The hash also happens at compile time, not runtime. The only thing I would prefer is a hash that's 64 bits instead of 128, so it's one less compare instruction.
Jan 22 2024
prev sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3
Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Blake is > 128 bits, so I don't think there is anything really interesting there anyways.
It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-last-bits-from-a-sha-256-hash/163#163
Jan 22 2024
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and
 stronger than md5.

 https://github.com/BLAKE3-team/BLAKE3
Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Blake is > 128 bits, so I don't think there is anything really interesting there anyways.
It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las
Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler? - Jonathan M Davis
Jan 22 2024
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis 
wrote:
 On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka 
 via Digitalmars-d wrote:
 On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright 
 wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.

 https://github.com/BLAKE3-team/BLAKE3
Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
Blake is > 128 bits, so I don't think there is anything really interesting there anyways.
It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las
Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler?
BLAKE3 was mentioned because it is allegedly faster than MD5 (this still needs to be confirmed). You can take as many bits of its output as you need and being too large is not a problem. Having high likelihood of collisions would have made it a bad hash from the security standpoint, so you don't need to worry about collisions for your use case either.
Jan 22 2024
prev sibling parent Bruce Carneal <bcarneal gmail.com> writes:
On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis 
wrote:
 On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka 
 via Digitalmars-d wrote:
 On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright 
 wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
It's probably possible to take just the first (or last) 128 bits of BLAKE3 and the result might have better properties than MD5 (considering that MD5 is broken). There are some answers related to "truncated hash" on the Internet about SHA-256, but any cryptographically secure hash is likely to be similar in this aspect: https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las
Why on earth would we care if the hash is secure in this context? We obviously care about the likelihood of collisions, and if that's high enough (which does not seem to be the case for md5), then the hash is unsuitable for comparing class names, but why would crytographic security matter when comparing class names in the compiler? - Jonathan M Davis
I don't think it matters at all except for what it implies about collisions. And, again, I don't think any modern hash is bad wrt collisions but I'm not a hash researcher. If it is true that any modern hash is in the "more likely to be smacked by a meteor than experience a collision" category *and* the current (md5) time burn is more than a blip then it's probably worth taking a look at the alternatives. OTOH, it's not like there is a shortage of things to work on... :-)
Jan 22 2024
prev sibling parent Bruce Carneal <bcarneal gmail.com> writes:
On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3
Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.
I've not looked into that aspect either. I'd just assumed that all modern hashes were quite good and was just looking at the time to compute the hash. I'm not at all concerned about accidental collisions apart from a broken implementation. As you say, more likely that we get smacked by a meteor. I've read hash comparisons claiming blake3 is anywhere from 3X to 9X faster than MD5 on a single core wrt throughput. No idea on how things would work out for this particular use case though, nor even if a 9X throughput upgrade would be noticeable. I'd suspect not, just wanted to put forward an alternative.
Jan 22 2024
prev sibling parent reply Andrea Fontana <nospam example.org> writes:
On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
 On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?
Md5 has an extremely remote chance of two class names hashing to the same value. Some people have argued this is unacceptable, though I opine that the odds are so low they are unimaginable to humans. A replacement that has a perceptible collision rate will be unacceptable. This is not a hash backed up by a string compare. The hash has to be a substitute for the string compare.
So probably you like xxhash128 https://xxhash.com/ Andrea
Jan 23 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Jan 23 2024
next sibling parent David Gileadi <gileadisNOSPM gmail.com> writes:
On 1/23/24 2:37 PM, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128

 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
xxhash has a wiki page about it [1]. Given that it's their page, it's probably unsurprising that xxhash does well. [1]: https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison#collision-study
Jan 23 2024
prev sibling next sibling parent reply Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is `1.0 / 2^^64` for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:
 As an example, if a 64-bit hash is used, there are 
 approximately 1.8×10^^19 different outputs. If these are all 
 equally probable (the best case), then it would take 'only' 
 approximately 5 billion attempts (5.38×10^^9) to generate a 
 collision using brute force.[6] This value is called birthday 
 bound[7] and for n-bit codes it could be approximated as 
 2^^(n/2).
I would say that any 64-bit hash is not sufficiently collision resistant for the problem that you are trying to solve.
Jan 23 2024
parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka 
wrote:
 On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright 
 wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is `1.0 / 2^^64` for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:
While it is not possible to do better than this, it is certainly possible - and in fact way easier than you'd expect - to do worse. I see no reason to move away from md5 here. It works, its speed is nowhere close to the bottleneck, and its pre-image resistance is of no concern for this use case. This thread has gone in the typical bullshit D goes into, arguing about the intricacies of some details that do not matter at all for the value delivered in the end.
Jan 23 2024
next sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Wednesday, 24 January 2024 at 01:18:18 UTC, deadalnix wrote:
 On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka 
 wrote:
 On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright 
 wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
If you really need a 64-bit version, then you can simply take the first 64 bits of MD5. Assuming no defects in a 64-bit hash implementation, the collision probability is `1.0 / 2^^64` for a pair of hashed strings. And it becomes a https://en.wikipedia.org/wiki/Birthday_problem if the number of hashed strings is larger. There's no magic algorithmic sauce that can make a small hash collision resistant. A quote from the Wikipedia article:
While it is not possible to do better than this, it is certainly possible - and in fact way easier than you'd expect - to do worse.
Go ahead and show me truncated output of any hash, designed to be cryptographically secure, behaving badly. I know that xxhash, ahash, fnv or any other non-secure hashes are prone to have collision issues, as they are primarily optimized for speed while being borderline acceptable from the collision standpoint.
Jan 23 2024
prev sibling parent ryuukk_ <ryuukk.dev gmail.com> writes:
On Wednesday, 24 January 2024 at 01:18:18 UTC, deadalnix wrote:
 This thread has gone in the typical bullshit D goes into, 
 arguing about the intricacies of some details that do not 
 matter at all for the value delivered in the end.
Why try to demotivate people? this could be a good entry point for a new contributor, it's not "bullshit", what's bullshit is sitting here doing nothing but complain Submit PRs, encourage people to fix things, don't be negative, reading you feels like there is no point even participating, not everyone is a compiler developper, not everyone is able to "improve your OOP ABI", everyone is however able to benchmark and change the hashing function for a faster one If you want "Improve the OOP ABI", it better make the compiler remain fast, that detail, is not futile
Jan 24 2024
prev sibling next sibling parent Andrea Fontana <nospam example.com> writes:
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
Check this: https://fossies.org/linux/xxHash/tests/collisions/README.md All good hash algorithms should be uniform... Andrea
Jan 23 2024
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/
Thanks! I do, it has a 64 bit version, does anyone know what the collision rate is?
For 64 bits, your best best is CRC64. However, the risk of collision starts to be a bit too high to rely exclusively on this.
Jan 23 2024
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Just wondering, could a trie be used instead of hashing algorithm? It would guarantee that no collisions would happen at all, with better performance than simple comparison. Best regards, Alexandru.
Jan 22 2024
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 23/01/2024 11:48 AM, Alexandru Ermicioi wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.
This has a PR for it now. https://github.com/dlang/dmd/pull/16032
Just wondering, could a trie be used instead of hashing algorithm? It would guarantee that no collisions would happen at all, with better performance than simple comparison. Best regards, Alexandru.
That sounds an lot like an assumption that TypeInfo is unique in a process. Which may not be true. If it was true, we could compare pointers and don't need any of this logic.
Jan 22 2024
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:
 That sounds an lot like an assumption that TypeInfo is unique in a process. 
 Which may not be true.
 If it was true, we could compare pointers and don't need any of this logic.
It isn't true because of shared libraries.
Jan 22 2024
parent "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 23/01/2024 12:52 PM, Walter Bright wrote:
 On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:
 That sounds an lot like an assumption that TypeInfo is unique in a 
 process. Which may not be true.
 If it was true, we could compare pointers and don't need any of this 
 logic.
It isn't true because of shared libraries.
Or incremental compilation, the symbol could at least in theory be hidden in the object file and not be available for merging in the binary. Regardless it can't be assumed if you like your sanity!
Jan 22 2024