digitalmars.D - Improve the OOP ABI

deadalnix (128/128) Sep 28 2023 Hi all,

Richard (Rikki) Andrew Cattermole (7/14) Sep 28 2023 TypeInfo isn't guaranteed to be unique to a process when dealing with

Adam D Ruppe (11/16) Sep 28 2023 This should be fixed, but even if it isn't, that just means using
deadalnix (5/20) Sep 28 2023 lol, wat?

Guillaume Piolat (3/12) Sep 28 2023 I think it's the same in C++, and sometimes dynamic_cast ends up

deadalnix (3/17) Sep 28 2023 What's the reason for this madness?

Guillaume Piolat (8/11) Sep 29 2023 For example for GCC =>

deadalnix (7/21) Sep 29 2023 At least the GCC one has some explaination. What stroke me is:

Guillaume Piolat (6/10) Sep 29 2023 I'm way underqualified. Using RTLD_NOW (and not sure what it

Richard (Rikki) Andrew Cattermole (18/18) Sep 29 2023 https://linux.die.net/man/3/dlopen

Walter Bright (3/4) Jan 11 2024 You're right. The ClassInfo can be merged with the vtbl[]. The trouble i...

FeepingCreature (4/7) Sep 28 2023 We should solve this problem by guaranteeing TypeInfo to be

Richard (Rikki) Andrew Cattermole (2/2) Sep 28 2023 I said this on Discord, but that means all TypeInfo has to be exported,

FeepingCreature (8/10) Sep 29 2023 Yep, and as I said, I don't think that harms anything.

Richard (Rikki) Andrew Cattermole (3/6) Sep 29 2023 The linker can't do that. So you're relying entirely on the compiler for...

Alexandru Ermicioi (7/9) Sep 29 2023 There might be an issue when your lib uses two other libs, and

Richard (Rikki) Andrew Cattermole (6/15) Sep 29 2023 It shouldn't be. As long as everything is the same codegen wise (except

ryuukk_ (9/11) Sep 30 2023 I agree, however let us not get side tracked

Alexandru Ermicioi (4/7) Oct 01 2023 Thats one extremely biased view on D use. OOP is useful for

deadalnix (6/13) Oct 01 2023 Indeed, if the basics such as OOP are not implemented right,

max haughton (11/25) Oct 01 2023 Also worth adding that there is a difference between the midwit

Jonathan M Davis (30/41) Oct 01 2023 If they're not what the language currently does, it's still a new featur...

deadalnix (4/6) Nov 13 2023 Here we go:
Walter Bright (4/4) Jan 11 2024 https://issues.dlang.org/show_bug.cgi?id=24332
Walter Bright (3/4) Jan 14 2024 This has a PR for it now.

deadalnix (2/6) Jan 18 2024 Thanks, that's a step in the right direction.

Walter Bright (2/3) Jan 19 2024 You're welcome. I'd still like a list of the other problems you alluded ...

Andrea Fontana (3/7) Jan 22 2024 Why md5 and not a faster method?

ryuukk_ (3/12) Jan 22 2024 That is good opportunity to submit a PR, try it, it's not that

Andrea Fontana (6/22) Jan 22 2024 I know: I've already submitted PR to dmd and phobos :)

Andrea Fontana (3/26) Jan 22 2024 The right one is FNV-1a:

Andrea Fontana (3/5) Jan 22 2024 Probably xxhash is even faster, but there's no public domain
Hipreme (86/116) Jan 22 2024 FNV1A is used for `hashOf`. HashOf seems to be a lot faster than

ryuukk_ (6/127) Jan 22 2024 It hashes class name, they'll probably never be more than 30

Andrea Fontana (4/17) Jan 22 2024 Indeed. And probably xxhash would be even faster if well

Alexandru Ermicioi (2/3) Jan 22 2024 What algorithm would you suggest?
Walter Bright (7/8) Jan 22 2024 Md5 has an extremely remote chance of two class names hashing to the sam...

Richard (Rikki) Andrew Cattermole (12/23) Jan 22 2024 Quite a few months ago I discussed this problem for class comparison and...

Walter Bright (3/4) Jan 22 2024 https://stackoverflow.com/questions/201705/how-many-random-elements-befo...

Richard (Rikki) Andrew Cattermole (7/14) Jan 22 2024 If we were gambling on something not happening, those are great odds.

Walter Bright (9/18) Jan 22 2024 I do understand the desire for mathematical perfection, and feel it myse...

Bruce Carneal (4/14) Jan 22 2024 Blake3 might be worth a look. It's reportedly faster and

Walter Bright (2/5) Jan 22 2024 Thanks for the tip. I don't know how to evaluate a hash function for uni...

deadalnix (10/17) Jan 22 2024 Blake is > 128 bits, so I don't think there is anything really

Walter Bright (4/6) Jan 22 2024 The hash also happens at compile time, not runtime. The only thing I wou...
Siarhei Siamashka (8/19) Jan 22 2024 It's probably possible to take just the first (or last) 128 bits

Jonathan M Davis (8/29) Jan 22 2024 Why on earth would we care if the hash is secure in this context? We

Siarhei Siamashka (8/41) Jan 22 2024 BLAKE3 was mentioned because it is allegedly faster than MD5
Bruce Carneal (11/38) Jan 22 2024 I don't think it matters at all except for what it implies about

Bruce Carneal (11/18) Jan 22 2024 I've not looked into that aspect either. I'd just assumed that

Andrea Fontana (4/14) Jan 23 2024 So probably you like xxhash128

Walter Bright (2/5) Jan 23 2024 Thanks! I do, it has a 64 bit version, does anyone know what the collisi...

David Gileadi (5/12) Jan 23 2024 xxhash has a wiki page about it [1]. Given that it's their page, it's
Siarhei Siamashka (11/24) Jan 23 2024 If you really need a 64-bit version, then you can simply take the

deadalnix (10/27) Jan 23 2024 While it is not possible to do better than this, it is certainly

Siarhei Siamashka (6/30) Jan 23 2024 Go ahead and show me truncated output of any hash, designed to be
ryuukk_ (11/14) Jan 24 2024 Why try to demotivate people? this could be a good entry point

Andrea Fontana (5/11) Jan 23 2024 Check this:
deadalnix (3/9) Jan 23 2024 For 64 bits, your best best is CRC64. However, the risk of

Alexandru Ermicioi (6/10) Jan 22 2024 Just wondering, could a trie be used instead of hashing algorithm?

Richard (Rikki) Andrew Cattermole (4/19) Jan 22 2024 That sounds an lot like an assumption that TypeInfo is unique in a

Walter Bright (2/5) Jan 22 2024 It isn't true because of shared libraries.

Richard (Rikki) Andrew Cattermole (4/11) Jan 22 2024 Or incremental compilation, the symbol could at least in theory be

deadalnix <deadalnix gmail.com> writes:

Hi all,

I have been arguing for quite some time about the fact that there 
is a well of improvement to make to D that are not necessarily 
breaking changes or languages addition - and that, in fact, D 
having more feature than most language, adding feature is clearly 
not where the impact is.

Today I would like to provide a concrete example of things that 
could be dramatically improved, with no breakage except at the 
ABI level - which we already do not guarantee. Let's go over 
these changes.

1/ Downcast to final classes.

```d
class A {}
final class B : A {}

auto foo(A a) {
     return cast(B) a;
}
```

This generates a call into the runtime, making things completely 
opaque to the optimizer. The algorithm used by the runtime is 
linear. But there is a dumb constant time solution. because B is 
final, a is either an instance of B, or it is not (as in, it 
cannot be an instance of a subclass of B). Therefore, we expect 
the codegen to look like the following instead:

```d
auto foo(A a) {
     return typeid(a) is typeid(B) ? a : null;
}
```

This obviously won't compile but you get the idea. Not only this 
is constant time, but very simple too, and visible to the 
optimizer, which means the check can be folded by the opitimizer 
after other transforms, for instance inlining.

2/ Inline ClassInfo with the vtbl.

The current layout of objects is as follow:

```
+-----------+    +-----------+    +-----------+
|  __vtbl   +--->|  typeid   +--->| ClassInfo |
+-----------+    +-----------+    |           |
| __monitor |    |  method0  |    |    ...    |
+-----------+    +-----------+    |           |
|  field0   |    |  method1  |    +-----------+
+-----------+    +-----------+
|  field1   |    |    ...    |
+-----------+    +-----------+
|    ...    |
+-----------+
```

This causes a ton of extra indirections that are not necessary. 
instead the following layout ought to be used:

```
+-----------+    +-----------+
|  __vtbl   +--->| ClassInfo |
+-----------+    |           |
| __monitor |    |    ...    |
+-----------+    |           |
|  field0   |    +-----------+
+-----------+    |  method0  |
|  field1   |    +-----------+
+-----------+    |  method1  |
|    ...    |    +-----------+
+-----------+    |    ...    |
                  +-----------+
```

Alternatively, it is possible to have the pointer point at the 
first method and subtract to get the typeid. The important part 
is that there is only one indirection now.

3/ Downcast.

Currently, we use a linear algorithm to downcast and we reach 
this algorithm via a call int he runtime. This prevent the 
optimizer from doing anything about it, in addition to be slow. 
The problem is similar to 1/. The whole check can be made trivial 
if we let the compiler prebake some data for us, namely an array 
of primary parents. Let's see what it looks like in the first 
example, when b is not final.

```d
class A {}
class B : A {}

auto foo(A a) {
     return cast(B) a;
}
```

Which we can do as follow:

```d
auto foo(A a) {
     // The primaries generated by the compiler for us.
     auto tidA = typeid(A);
     assert(tidA.primaries = [tidA]);
     auto tidB = typeid(B);
     assert(tidB.primaries = [tidA, tidB]);

     auto t = typeid(a);
     auto depth = tidB.primaries.length - 1;

     if (t.primary.length <= depth) {
         return null;
     }

     if (t.primary[depth] == tidB) {
         return a;
     }

     return null;
}
```

This is starting to look good, now we have general downcast in a 
form that is not only really fast to perform, but also completely 
transparent to the optimizer.

4/ Downcast to interfaces.

This is getting a bit more tricky, so I won't expand this one 
fully, but it is also possible to do much better on this front as 
well. The obvious complication that that interfaces allow for 
multiple inheritance.

The first obvious optimization is to consider the chain of 
leftmost (or alternatively the longest chain, which allows to 
cull more candidates faster) inheritance the primaries for the 
interface and eliminate the whole branch at once using the same 
strategy as 3/.

Now we are left with possible secondaries match. Here, no 
possibility to remain O(1), we'll have to loop over the other 
interfaces and repeat the matching process. Thankfully very 
branch hierarchy are uncommon in practice, but even then, we can 
use this one weird trick to dramatically cull out the search 
space: bloom filters.

Make the bloom filter 64 bits and simply cull via `(targetBloom & 
aggregateBloom) == targetBloom` and you can eliminate a good 
chunk of the search right there.

I hope this is the kind of change we can see in D in the future. 
This is the kind of changes that would make D better in practice, 
not the next cool feature, but that the current, existing 
feature, have an quality of implementation that is in line with 
what can be expected in a modern language.

Sep 28 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 29/09/2023 1:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 ```d
 auto foo(A a) {
      return typeid(a) is typeid(B) ? a : null;
 }
 ```

TypeInfo isn't guaranteed to be unique to a process when dealing with 
shared libraries. There is a strong chance it'll be duplicative.

So this won't work.

 2/ Inline ClassInfo with the vtbl.

ClassInfo is an alias to TypeInfo_Class now. This indirection shouldn't 
exist.

https://github.com/dlang/dmd/blob/a6285c26c0e04b9b3b1d40ae1c3200c834df1480/druntime/src/object.d#L1737

Sep 28 2023

Adam D Ruppe <destructionator gmail.com> writes:

On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

This should be fixed, but even if it isn't, that just means using 
the ==  operator instead of the is operator; the concept i 
believe is still sound.

(the == operator sux tho, this really should be fixed at the 
source)

 ClassInfo is an alias to TypeInfo_Class now. This indirection 
 shouldn't exist.

this means emplacing the typeinfo_class in the same memory block 
as the vtbl instead of linking to it.

so the virtual table exists at *(instance + 
__traits(classInstanceSize, TypeInfo_Class))

Sep 28 2023

deadalnix <deadalnix gmail.com> writes:

On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 29/09/2023 1:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 ```d
 auto foo(A a) {
      return typeid(a) is typeid(B) ? a : null;
 }
 ```

 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

 So this won't work.

lol, wat?

That sounds crazy.

 2/ Inline ClassInfo with the vtbl.

 ClassInfo is an alias to TypeInfo_Class now. This indirection 
 shouldn't exist.

 https://github.com/dlang/dmd/blob/a6285c26c0e04b9b3b1d40ae1c3200c834df1480/druntime/src/object.d#L1737

It does exist.

Sep 28 2023

Guillaume Piolat <first.name gmail.com> writes:

On Thursday, 28 September 2023 at 14:31:33 UTC, deadalnix wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

 So this won't work.

 lol, wat?

 That sounds crazy.

I think it's the same in C++, and sometimes dynamic_cast ends up 
making string comparisons.

Sep 28 2023

deadalnix <deadalnix gmail.com> writes:

On Thursday, 28 September 2023 at 14:41:16 UTC, Guillaume Piolat 
wrote:
 On Thursday, 28 September 2023 at 14:31:33 UTC, deadalnix wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

 So this won't work.

 lol, wat?

 That sounds crazy.

 I think it's the same in C++, and sometimes dynamic_cast ends 
 up making string comparisons.

What's the reason for this madness?

Sep 28 2023

Guillaume Piolat <first.name gmail.com> writes:

On Thursday, 28 September 2023 at 15:39:22 UTC, deadalnix wrote:
 I think it's the same in C++, and sometimes dynamic_cast ends 
 up making string comparisons.

 What's the reason for this madness?

For example for GCC => 
https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo#L48

MSVCR => 
https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/rtti.cpp#L27

I don't understand any of it, just read "strcmp" in a profiler 
once in a big C++ app. It turns out it was RTTI which was taking 
a small percentage of time.

Sep 29 2023

deadalnix <deadalnix gmail.com> writes:

On Friday, 29 September 2023 at 12:15:53 UTC, Guillaume Piolat 
wrote:
 On Thursday, 28 September 2023 at 15:39:22 UTC, deadalnix wrote:
 I think it's the same in C++, and sometimes dynamic_cast ends 
 up making string comparisons.

 What's the reason for this madness?

 For example for GCC => 
 https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libstdc%2B%2B-v3/libsupc%2B%2B/typeinfo#L48

 MSVCR => 
 https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/rtti.cpp#L27

 I don't understand any of it, just read "strcmp" in a profiler 
 once in a big C++ app. It turns out it was RTTI which was 
 taking a small percentage of time.

At least the GCC one has some explaination. What stroke me is:

 but even with weak symbols sometimes names are not merged
 when objects are loaded with RTLD_LOCAL

Which seems to indicate the duplication is not a fatality, but 
they have to handle it because they might be working with stuff 
that are loaded with RTLD_LOCAL? What does this mean? is this 
really a constraint that we have?

Sep 29 2023

Guillaume Piolat <first.name gmail.com> writes:

On Friday, 29 September 2023 at 12:33:13 UTC, deadalnix wrote:
 Which seems to indicate the duplication is not a fatality, but 
 they have to handle it because they might be working with stuff 
 that are loaded with RTLD_LOCAL? What does this mean? is this 
 really a constraint that we have?

I'm way underqualified. Using RTLD_NOW (and not sure what it 
does), it seems to work on all platforms, but I certainly do not 
expect RTTI to work across binaries. C ABI provides minimum 
headache. Hipreme also do shared libs, and they went the minimal 
druntime way. We want less of druntime and Phobos, not more of it.

Sep 29 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

https://linux.die.net/man/3/dlopen

RTLD_NOW
If this value is specified, or the environment variable LD_BIND_NOW is 
set to a nonempty string, all undefined symbols in the library are 
resolved before dlopen() returns. If this cannot be done, an error is 
returned.

That is fine, this is kinda what you should usually use to prevent 
errors happening elsewhere (which won't be easily understood).



RTLD_GLOBAL
The symbols defined by this library will be made available for symbol 
resolution of subsequently loaded libraries.

RTLD_LOCAL
This is the converse of RTLD_GLOBAL, and the default if neither flag is 
specified. Symbols defined in this library are not made available to 
resolve references in subsequently loaded libraries.

RTLD_LOCAL basically isolates all exports from one shared library once 
loaded from being matched to other shared library/executable externs. 
This is probably what you want when loading plugins.

Sep 29 2023

Walter Bright <newshound2 digitalmars.com> writes:

On 9/28/2023 7:31 AM, deadalnix wrote:
 It does exist.

You're right. The ClassInfo can be merged with the vtbl[]. The trouble is it's
a 
variable size, so that would have to be redone, too.

Jan 11 2024

FeepingCreature <feepingcreature gmail.com> writes:

On Thursday, 28 September 2023 at 13:05:01 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 TypeInfo isn't guaranteed to be unique to a process when 
 dealing with shared libraries. There is a strong chance it'll 
 be duplicative.

We should solve this problem by guaranteeing TypeInfo to be 
unique to a process. I don't think that harms anything.

Sep 28 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

I said this on Discord, but that means all TypeInfo has to be exported, 
even for executables.

Sep 28 2023

FeepingCreature <feepingcreature gmail.com> writes:

On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I said this on Discord, but that means all TypeInfo has to be 
 exported, even for executables.

Yep, and as I said, I don't think that harms anything.

Note that you can still remove TypeInfo that isn't referenced in 
an executable/library; the only TypeInfo that needs to be 
exported are those that you actually look at. And those need to 
be in the binary anyway. So I really don't see where the problem 
is here.

Sep 29 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 29/09/2023 8:43 PM, FeepingCreature wrote:
 Note that you can still remove TypeInfo that isn't referenced in an 
 executable/library; the only TypeInfo that needs to be exported are 
 those that you actually look at.

The linker can't do that. So you're relying entirely on the compiler for 
this.

Sep 29 2023

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 I said this on Discord, but that means all TypeInfo has to be 
 exported, even for executables.

There might be an issue when your lib uses two other libs, and 
both of them use lib X but of different versions. This might 
cause issues with exported structures when they differ from 
version to version, and etc. Could this also be an issue for 
exported TypeInfos?

Sep 29 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 29/09/2023 9:04 PM, Alexandru Ermicioi wrote:
 On Thursday, 28 September 2023 at 19:25:15 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 I said this on Discord, but that means all TypeInfo has to be 
 exported, even for executables.

 
 There might be an issue when your lib uses two other libs, and both of 
 them use lib X but of different versions. This might cause issues with 
 exported structures when they differ from version to version, and etc. 
 Could this also be an issue for exported TypeInfos?

It shouldn't be. As long as everything is the same codegen wise (except 
pointer values), then two TypeInfo's should compare identically.

But you almost got to the concern, not when there are two different 
versions of a package, but when they are the same version, but linked 
into different binaries and not exported.

Sep 29 2023

ryuukk_ <ryuukk.dev gmail.com> writes:

On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:
 D having more feature than most language, adding feature is 
 clearly not where the impact is.

I agree, however let us not get side tracked

Asking for pattern matching? it's not new feature, it's improving 
``switch``

Asking for tagged union? it's not new feature, it's improving 
``enum``

However, you are wrong at expecting an impact for D by improving 
OOP, nope, D needs to improve the non-OOP story, more OOP is 
repulsive, specially for system languages

Sep 30 2023

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:
 However, you are wrong at expecting an impact for D by 
 improving OOP, nope, D needs to improve the non-OOP story, more 
 OOP is repulsive, specially for system languages

Thats one extremely biased view on D use. OOP is useful for 
application development (irrelevant of your feelings), and D is 
also used for app development.

Oct 01 2023

deadalnix <deadalnix gmail.com> writes:

On Sunday, 1 October 2023 at 13:07:10 UTC, Alexandru Ermicioi 
wrote:
 On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:
 However, you are wrong at expecting an impact for D by 
 improving OOP, nope, D needs to improve the non-OOP story, 
 more OOP is repulsive, specially for system languages

 Thats one extremely biased view on D use. OOP is useful for 
 application development (irrelevant of your feelings), and D is 
 also used for app development.

Indeed, if the basics such as OOP are not implemented right, 
pilling up more and more on top of it is not going to help. Solid 
foundations matter. OOP is just a basic feature of modern 
programing languages and widely used in the industry.

Oct 01 2023

max haughton <maxhaton gmail.com> writes:

On Sunday, 1 October 2023 at 22:01:56 UTC, deadalnix wrote:
 On Sunday, 1 October 2023 at 13:07:10 UTC, Alexandru Ermicioi 
 wrote:
 On Sunday, 1 October 2023 at 00:21:48 UTC, ryuukk_ wrote:
 However, you are wrong at expecting an impact for D by 
 improving OOP, nope, D needs to improve the non-OOP story, 
 more OOP is repulsive, specially for system languages

 Thats one extremely biased view on D use. OOP is useful for 
 application development (irrelevant of your feelings), and D 
 is also used for app development.

 Indeed, if the basics such as OOP are not implemented right, 
 pilling up more and more on top of it is not going to help. 
 Solid foundations matter. OOP is just a basic feature of modern 
 programing languages and widely used in the industry.

Also worth adding that there is a difference between the midwit 
bullshit enterprise OOP, the modern synthesis OOP one might read 
in a book, and merely classes.

Improvements to the latter should be a free win, other than ABI 
breakage.
One of the reasons why I think we need to massively reduce the 
surface of the D toolchain is that if we don't we massively 
increase the activation energy to actually implement these 
changes —  hand waving using Arrhenius/Boltzmann, T ~ 
O(\exp(E_a)), doesn't scale well.

Oct 01 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, September 30, 2023 6:21:48 PM MDT ryuukk_ via Digitalmars-d 
wrote:
 On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:
 D having more feature than most language, adding feature is
 clearly not where the impact is.

 I agree, however let us not get side tracked

 Asking for pattern matching? it's not new feature, it's improving
 ``switch``

 Asking for tagged union? it's not new feature, it's improving
 ``enum``

 However, you are wrong at expecting an impact for D by improving
 OOP, nope, D needs to improve the non-OOP story, more OOP is
 repulsive, specially for system languages

If they're not what the language currently does, it's still a new feature.
They may be features that you want, and they may even be much easier to add
than many other features, but they're still not currently part of the
language.

The main point here is that there are a number of features that we already
have which are not implemented as well as they could be, which sometimes
results in bugs, and sometimes results in performance issues. Ideally, we
would be making sure that what we already have works as advertised and
performs well, and in general, issues with language features that we already
have are far more likely to cause problems than the fact that we're missing
a particular feature that someone might want.

New features can definitely improve the language, but we also need to make
sure that what we already have is solid. And arguably, the focus is often
too much on adding a feature that some particular subset of the community
thinks is a "must-have" and not enough on improving what the language
already does. And in large part, it's backlash over issues caused by not
focusing enough on stability and improving what we already have which has
led to the D foundation deciding to temporarily freeze the DIP process and
focus more on bug fixing first.

We're still going to be getting new features, but when we can identify
issues like the one that Deadalnix has brought up here, we really should be
trying to fix them. And while OOP is not as large a focus in D as it is in
many other programming languages, it's still a paradigm that D supports and
many programs use. So, if we can speed it up, that's definitely a win for
the language. And since the runtime uses classes, it could actually speed up
D programs that generally avoid OOP. And since the compiler uses classes, it
might even speed up compilation times, which is a win for everyone.

- Jonathan M Davis

Oct 01 2023

deadalnix <deadalnix gmail.com> writes:

On Thursday, 28 September 2023 at 12:42:04 UTC, deadalnix wrote:
 Hi all,

 [...]

Here we go: 
https://github.com/snazzy-d/sdc/blob/master/test/llvm/downcast.d

Except the interface part.

Nov 13 2023

Walter Bright <newshound2 digitalmars.com> writes:

https://issues.dlang.org/show_bug.cgi?id=24332
https://issues.dlang.org/show_bug.cgi?id=24333
https://issues.dlang.org/show_bug.cgi?id=24335
https://issues.dlang.org/show_bug.cgi?id=24336

Jan 11 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

This has a PR for it now.

https://github.com/dlang/dmd/pull/16032

Jan 14 2024

deadalnix <deadalnix gmail.com> writes:

On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

Thanks, that's a step in the right direction.

Jan 18 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/18/2024 8:20 PM, deadalnix wrote:
 Thanks, that's a step in the right direction.

You're welcome. I'd still like a list of the other problems you alluded to.

Jan 19 2024

Andrea Fontana <nospam example.org> writes:

On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

Why md5 and not a faster method?

Andrea

Jan 22 2024

ryuukk_ <ryuukk.dev gmail.com> writes:

On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

 Why md5 and not a faster method?

 Andrea

That is good opportunity to submit a PR, try it, it's not that 
hard and is rewarding

Jan 22 2024

Andrea Fontana <nospam example.org> writes:

On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

 Why md5 and not a faster method?

 Andrea

 That is good opportunity to submit a PR, try it, it's not that 
 hard and is rewarding

I know: I've already submitted PR to dmd and phobos :)

For example, this one. Easy, fast, insecure (who cares, in this 
case).
http://www.isthe.com/chongo/tech/comp/fnv/

Andrea

Jan 22 2024

Andrea Fontana <nospam example.org> writes:

On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana wrote:
 On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

 Why md5 and not a faster method?

 Andrea

 That is good opportunity to submit a PR, try it, it's not that 
 hard and is rewarding

 I know: I've already submitted PR to dmd and phobos :)

 For example, this one. Easy, fast, insecure (who cares, in this 
 case).
 http://www.isthe.com/chongo/tech/comp/fnv/

 Andrea

The right one is FNV-1a:
http://www.isthe.com/chongo/src/fnv/hash_32a.c

Jan 22 2024

Andrea Fontana <nospam example.org> writes:

On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:
 The right one is FNV-1a:
 http://www.isthe.com/chongo/src/fnv/hash_32a.c

Probably xxhash is even faster, but there's no public domain 
implementation.

Jan 22 2024

Hipreme <msnmancini hotmail.com> writes:

On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana wrote:
 On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana 
 wrote:
 On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

 Why md5 and not a faster method?

 Andrea

 That is good opportunity to submit a PR, try it, it's not 
 that hard and is rewarding

 I know: I've already submitted PR to dmd and phobos :)

 For example, this one. Easy, fast, insecure (who cares, in 
 this case).
 http://www.isthe.com/chongo/tech/comp/fnv/

 Andrea

 The right one is FNV-1a:
 http://www.isthe.com/chongo/src/fnv/hash_32a.c


FNV1A is used for `hashOf`. HashOf seems to be a lot faster than 
md5 only when you're dealing with smaller strings. I've done a 
test online and you'll see, this won't do actual difference in 
compilation time:

```d
import std;

     auto test(T)(scope T delegate() dg)
     {
         import std.datetime.stopwatch;
         StopWatch sw = StopWatch(AutoStart.yes);
         scope(exit){writeln(sw.peek.total!"msecs");}
         return dg();
     }

     string repeat(string a, int n)
     {
         char[] ret = new char[](a.length*n);
         foreach(i; 0..n)
             ret[i*a.length..(i+1)*a.length] = a[];
         return cast(string)ret;
     }
     void main()
     {
         import std.digest.md;

         foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200])
         {
         	string testString = "a".repeat(count);

             writeln("\nResult for 1_000_000 times executing 
hashes for a string if size ", count);
             test(()
                  {
                      foreach(i; 0..1_000_000)
                          hashOf(testString);
                  });
             test(()
                  {
                      foreach(i; 0..1_000_000)
                          md5Of(testString);
                  });
         }
     }
```

This is the result I have for this program:

```
Result for 1_000_000 times executing hashes for a string if size 1
12
130

Result for 1_000_000 times executing hashes for a string if size 5
25
132

Result for 1_000_000 times executing hashes for a string if size 
10
36
129

Result for 1_000_000 times executing hashes for a string if size 
15
41
124

Result for 1_000_000 times executing hashes for a string if size 
25
54
127

Result for 1_000_000 times executing hashes for a string if size 
35
69
120

Result for 1_000_000 times executing hashes for a string if size 
50
94
116

Result for 1_000_000 times executing hashes for a string if size 
100
176
217

Result for 1_000_000 times executing hashes for a string if size 
200
379
394
```

As you see, this is a result for 1 million times hashing strings. 
I can't even think in many situations people would be doing 1 
million time hashings. For a big program, you'll probably get 200 
string literals. And if you test, it is still 0 milliseconds. So, 
MD5 is a pretty solution for that problem, and this would be too 
much effort for actually no gain at all, it could even create 
duplicate strings for no millisecond gain :)

Jan 22 2024

ryuukk_ <ryuukk.dev gmail.com> writes:

On Monday, 22 January 2024 at 14:22:33 UTC, Hipreme wrote:
 On Monday, 22 January 2024 at 13:48:20 UTC, Andrea Fontana 
 wrote:
 On Monday, 22 January 2024 at 13:47:34 UTC, Andrea Fontana 
 wrote:
 On Monday, 22 January 2024 at 10:23:55 UTC, ryuukk_ wrote:
 On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana 
 wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright 
 wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

 Why md5 and not a faster method?

 Andrea

 That is good opportunity to submit a PR, try it, it's not 
 that hard and is rewarding

 I know: I've already submitted PR to dmd and phobos :)

 For example, this one. Easy, fast, insecure (who cares, in 
 this case).
 http://www.isthe.com/chongo/tech/comp/fnv/

 Andrea

 The right one is FNV-1a:
 http://www.isthe.com/chongo/src/fnv/hash_32a.c


 FNV1A is used for `hashOf`. HashOf seems to be a lot faster 
 than md5 only when you're dealing with smaller strings. I've 
 done a test online and you'll see, this won't do actual 
 difference in compilation time:

 ```d
 import std;

     auto test(T)(scope T delegate() dg)
     {
         import std.datetime.stopwatch;
         StopWatch sw = StopWatch(AutoStart.yes);
         scope(exit){writeln(sw.peek.total!"msecs");}
         return dg();
     }

     string repeat(string a, int n)
     {
         char[] ret = new char[](a.length*n);
         foreach(i; 0..n)
             ret[i*a.length..(i+1)*a.length] = a[];
         return cast(string)ret;
     }
     void main()
     {
         import std.digest.md;

         foreach(count; [1, 5, 10, 15, 25, 35, 50, 100, 200])
         {
         	string testString = "a".repeat(count);

             writeln("\nResult for 1_000_000 times executing 
 hashes for a string if size ", count);
             test(()
                  {
                      foreach(i; 0..1_000_000)
                          hashOf(testString);
                  });
             test(()
                  {
                      foreach(i; 0..1_000_000)
                          md5Of(testString);
                  });
         }
     }
 ```

 This is the result I have for this program:

 ```
 Result for 1_000_000 times executing hashes for a string if 
 size 1
 12
 130

 Result for 1_000_000 times executing hashes for a string if 
 size 5
 25
 132

 Result for 1_000_000 times executing hashes for a string if 
 size 10
 36
 129

 Result for 1_000_000 times executing hashes for a string if 
 size 15
 41
 124

 Result for 1_000_000 times executing hashes for a string if 
 size 25
 54
 127

 Result for 1_000_000 times executing hashes for a string if 
 size 35
 69
 120

 Result for 1_000_000 times executing hashes for a string if 
 size 50
 94
 116

 Result for 1_000_000 times executing hashes for a string if 
 size 100
 176
 217

 Result for 1_000_000 times executing hashes for a string if 
 size 200
 379
 394
 ```

 As you see, this is a result for 1 million times hashing 
 strings. I can't even think in many situations people would be 
 doing 1 million time hashings. For a big program, you'll 
 probably get 200 string literals. And if you test, it is still 
 0 milliseconds. So, MD5 is a pretty solution for that problem, 
 and this would be too much effort for actually no gain at all, 
 it could even create duplicate strings for no millisecond gain 
 :)

It hashes class name, they'll probably never be more than 30 
characters, so md5 in your example would be 2x slower

I think it worth doing more benchmarks, even few milliseconds 
gained is worthwhile, that's milliseconds you could spend doing 
more template work for example, it's additive

Jan 22 2024

Andrea Fontana <nospam example.org> writes:

On Monday, 22 January 2024 at 15:46:02 UTC, ryuukk_ wrote:
 As you see, this is a result for 1 million times hashing 
 strings. I can't even think in many situations people would be 
 doing 1 million time hashings. For a big program, you'll 
 probably get 200 string literals. And if you test, it is still 
 0 milliseconds. So, MD5 is a pretty solution for that problem, 
 and this would be too much effort for actually no gain at all, 
 it could even create duplicate strings for no millisecond gain 
 :)

 It hashes class name, they'll probably never be more than 30 
 characters, so md5 in your example would be 2x slower

 I think it worth doing more benchmarks, even few milliseconds 
 gained is worthwhile, that's milliseconds you could spend doing 
 more template work for example, it's additive

Indeed. And probably xxhash would be even faster if well 
optimized, I guess.

Andrea

Jan 22 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Monday, 22 January 2024 at 10:08:16 UTC, Andrea Fontana wrote:
 Why md5 and not a faster method?


What algorithm would you suggest?

Jan 22 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?

Md5 has an extremely remote chance of two class names hashing to the same
value. 
Some people have argued this is unacceptable, though I opine that the odds are 
so low they are unimaginable to humans.

A replacement that has a perceptible collision rate will be unacceptable.

This is not a hash backed up by a string compare. The hash has to be a 
substitute for the string compare.

Jan 22 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 23/01/2024 9:46 AM, Walter Bright wrote:
 On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?

 
 Md5 has an extremely remote chance of two class names hashing to the 
 same value. Some people have argued this is unacceptable, though I opine 
 that the odds are so low they are unimaginable to humans.
 
 A replacement that has a perceptible collision rate will be unacceptable.
 
 This is not a hash backed up by a string compare. The hash has to be a 
 substitute for the string compare.

Quite a few months ago I discussed this problem for class comparison and 
proposed the hash to deadalnix.

But not as a substitute for the string comparison, but as a limiter for 
how often it needs to be done.

The string comparison is the only way to do the comparison reliably. You 
can't use the pointers nor do the string comparison all the time, too 
expensive.

So by doing the hash comparison of say an int, we can reliably determine 
if it not a match and fail fast. Sadly no way to success fast without 
potential problems, which is what you seem to be hand waving the 
possible problems for so it'll be fast success.

Jan 22 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
 The string comparison is the only way to do the comparison reliably.


https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions

One collision in 6 billion hashes per second for 100 years.

Jan 22 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 23/01/2024 11:41 AM, Walter Bright wrote:
 On 1/22/2024 12:57 PM, Richard (Rikki) Andrew Cattermole wrote:
 The string comparison is the only way to do the comparison reliably.

 
 
 https://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions
 
 One collision in 6 billion hashes per second for 100 years.

If we were gambling on something not happening, those are great odds.

However if this does occur, silent memory corruption is possible.

I don't want to be gambling on whether there will be silent memory 
corruption.

This is my perspective on the topic anyway. I love guarantees of program 
security.

Jan 22 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2024 2:49 PM, Richard (Rikki) Andrew Cattermole wrote:
 One collision in 6 billion hashes per second for 100 years.

 
 If we were gambling on something not happening, those are great odds.
 
 However if this does occur, silent memory corruption is possible.
 
 I don't want to be gambling on whether there will be silent memory corruption.
 
 This is my perspective on the topic anyway. I love guarantees of program
security.

I do understand the desire for mathematical perfection, and feel it myself. But 
it has a significant and always present performance penalty. Engineering is 
always about tradeoffs. There's no getting away from it.

I'm not concerned about a meteor hitting my house, either, which is far far
more 
likely.

BTW, D uses md5 hashes to enable the linker to merge redundant string literals.

Other compromises we make for performance include things like floating point 
rounding.

Jan 22 2024

Bruce Carneal <bcarneal gmail.com> writes:

On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
 On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?

 Md5 has an extremely remote chance of two class names hashing 
 to the same value. Some people have argued this is 
 unacceptable, though I opine that the odds are so low they are 
 unimaginable to humans.

 A replacement that has a perceptible collision rate will be 
 unacceptable.

 This is not a hash backed up by a string compare. The hash has 
 to be a substitute for the string compare.

Blake3 might be worth a look.  It's reportedly faster and 
stronger than md5.

https://github.com/BLAKE3-team/BLAKE3

Jan 22 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3

Thanks for the tip. I don't know how to evaluate a hash function for uniqueness.

Jan 22 2024

deadalnix <deadalnix gmail.com> writes:

On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3

 Thanks for the tip. I don't know how to evaluate a hash 
 function for uniqueness.

Blake is > 128 bits, so I don't think there is anything really 
interesting there anyways.

md5 isn't cryptographically secure, mostly because 128 bits isn't 
cryptographically secure. But let's be real here, a collision 
would 100% be purposeful, there are zero chances of one happening 
by mistake.

I don't see what using a different hash function would buy us. I 
doubt the hash of class name would be anywhere close to the 
bottleneck in term of speed.

Jan 22 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2024 3:31 PM, deadalnix wrote:
 I don't see what using a different hash function would buy us. I doubt the
hash 
 of class name would be anywhere close to the bottleneck in term of speed.

The hash also happens at compile time, not runtime. The only thing I would 
prefer is a hash that's 64 bits instead of 128, so it's one less compare 
instruction.

Jan 22 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3

 Thanks for the tip. I don't know how to evaluate a hash 
 function for uniqueness.

 Blake is > 128 bits, so I don't think there is anything really 
 interesting there anyways.

It's probably possible to take just the first (or last) 128 bits 
of BLAKE3 and the result might have better properties than MD5 
(considering that MD5 is broken). There are some answers related 
to "truncated hash" on the Internet about SHA-256, but any 
cryptographically secure hash is likely to be similar in this 
aspect: 
https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-last-bits-from-a-sha-256-hash/163#163

Jan 22 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka via Digitalmars-d 
wrote:
 On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and
 stronger than md5.

 https://github.com/BLAKE3-team/BLAKE3

 Thanks for the tip. I don't know how to evaluate a hash
 function for uniqueness.

 Blake is > 128 bits, so I don't think there is anything really
 interesting there anyways.

 It's probably possible to take just the first (or last) 128 bits
 of BLAKE3 and the result might have better properties than MD5
 (considering that MD5 is broken). There are some answers related
 to "truncated hash" on the Internet about SHA-256, but any
 cryptographically secure hash is likely to be similar in this
 aspect:
 https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las


Why on earth would we care if the hash is secure in this context? We
obviously care about the likelihood of collisions, and if that's high enough
(which does not seem to be the case for md5), then the hash is unsuitable
for comparing class names, but why would crytographic security matter when
comparing class names in the compiler?

- Jonathan M Davis

Jan 22 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis 
wrote:
 On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka 
 via Digitalmars-d wrote:
 On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright 
 wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.

 https://github.com/BLAKE3-team/BLAKE3

 Thanks for the tip. I don't know how to evaluate a hash 
 function for uniqueness.

 Blake is > 128 bits, so I don't think there is anything 
 really interesting there anyways.

 It's probably possible to take just the first (or last) 128 
 bits
 of BLAKE3 and the result might have better properties than MD5
 (considering that MD5 is broken). There are some answers 
 related
 to "truncated hash" on the Internet about SHA-256, but any
 cryptographically secure hash is likely to be similar in this
 aspect:
 https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las


 Why on earth would we care if the hash is secure in this 
 context? We obviously care about the likelihood of collisions, 
 and if that's high enough (which does not seem to be the case 
 for md5), then the hash is unsuitable for comparing class 
 names, but why would crytographic security matter when 
 comparing class names in the compiler?

BLAKE3 was mentioned because it is allegedly faster than MD5 
(this still needs to be confirmed). You can take as many bits of 
its output as you need and being too large is not a problem. 
Having high likelihood of collisions would have made it a bad 
hash from the security standpoint, so you don't need to worry 
about collisions for your use case either.

Jan 22 2024

Bruce Carneal <bcarneal gmail.com> writes:

On Tuesday, 23 January 2024 at 03:34:23 UTC, Jonathan M Davis 
wrote:
 On Monday, January 22, 2024 7:33:33 PM MST Siarhei Siamashka 
 via Digitalmars-d wrote:
 On Monday, 22 January 2024 at 23:31:16 UTC, deadalnix wrote:
 On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright 
 wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.




 It's probably possible to take just the first (or last) 128 
 bits
 of BLAKE3 and the result might have better properties than MD5
 (considering that MD5 is broken). There are some answers 
 related
 to "truncated hash" on the Internet about SHA-256, but any
 cryptographically secure hash is likely to be similar in this
 aspect:
 https://crypto.stackexchange.com/questions/161/should-i-use-the-first-or-las


 Why on earth would we care if the hash is secure in this 
 context? We obviously care about the likelihood of collisions, 
 and if that's high enough (which does not seem to be the case 
 for md5), then the hash is unsuitable for comparing class 
 names, but why would crytographic security matter when 
 comparing class names in the compiler?

 - Jonathan M Davis

I don't think it matters at all except for what it implies about 
collisions.  And, again, I don't think any modern hash is bad wrt 
collisions but I'm not a hash researcher.

If it is true that any modern hash is in the "more likely to be 
smacked by a meteor than experience a collision" category *and* 
the current (md5) time burn is more than a blip then it's 
probably worth taking a look at the alternatives.

OTOH, it's not like there is a shortage of things to work on... 
:-)

Jan 22 2024

Bruce Carneal <bcarneal gmail.com> writes:

On Monday, 22 January 2024 at 22:44:20 UTC, Walter Bright wrote:
 On 1/22/2024 1:11 PM, Bruce Carneal wrote:
 Blake3 might be worth a look.  It's reportedly faster and 
 stronger than md5.
 
 https://github.com/BLAKE3-team/BLAKE3

 Thanks for the tip. I don't know how to evaluate a hash 
 function for uniqueness.

I've not looked into that aspect either.  I'd just assumed that 
all modern hashes were quite good and was just looking at the 
time to compute the hash.  I'm not at all concerned about 
accidental collisions apart from a broken implementation.  As you 
say, more likely that we get smacked by a meteor.

I've read hash comparisons claiming blake3 is anywhere from 3X to 
9X faster than MD5 on a single core wrt throughput.  No idea on 
how things would work out for this particular use case though, 
nor even if a 9X throughput upgrade would be noticeable.  I'd 
suspect not, just wanted to put forward an alternative.

Jan 22 2024

Andrea Fontana <nospam example.org> writes:

On Monday, 22 January 2024 at 20:46:26 UTC, Walter Bright wrote:
 On 1/22/2024 2:08 AM, Andrea Fontana wrote:
 Why md5 and not a faster method?

 Md5 has an extremely remote chance of two class names hashing 
 to the same value. Some people have argued this is 
 unacceptable, though I opine that the odds are so low they are 
 unimaginable to humans.

 A replacement that has a perceptible collision rate will be 
 unacceptable.

 This is not a hash backed up by a string compare. The hash has 
 to be a substitute for the string compare.

So probably you like xxhash128

https://xxhash.com/

Andrea

Jan 23 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/

Thanks! I do, it has a 64 bit version, does anyone know what the collision rate
is?

Jan 23 2024

David Gileadi <gileadisNOSPM gmail.com> writes:

On 1/23/24 2:37 PM, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128

 https://xxhash.com/

 
 Thanks! I do, it has a 64 bit version, does anyone know what the 
 collision rate is?

xxhash has a wiki page about it [1]. Given that it's their page, it's 
probably unsurprising that xxhash does well.

[1]: 
https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison#collision-study

Jan 23 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/

 Thanks! I do, it has a 64 bit version, does anyone know what 
 the collision rate is?

If you really need a 64-bit version, then you can simply take the 
first 64 bits of MD5. Assuming no defects in a 64-bit hash 
implementation, the collision probability is `1.0 / 2^^64` for a 
pair of hashed strings. And it becomes a 
https://en.wikipedia.org/wiki/Birthday_problem if the number of 
hashed strings is larger. There's no magic algorithmic sauce that 
can make a small hash collision resistant. A quote from the 
Wikipedia article:

 As an example, if a 64-bit hash is used, there are 
 approximately 1.8×10^^19 different outputs. If these are all 
 equally probable (the best case), then it would take 'only' 
 approximately 5 billion attempts (5.38×10^^9) to generate a 
 collision using brute force.[6] This value is called birthday 
 bound[7] and for n-bit codes it could be approximated as 
 2^^(n/2).

I would say that any 64-bit hash is not sufficiently collision 
resistant for the problem that you are trying to solve.

Jan 23 2024

deadalnix <deadalnix gmail.com> writes:

On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka 
wrote:
 On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright 
 wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/

 Thanks! I do, it has a 64 bit version, does anyone know what 
 the collision rate is?

 If you really need a 64-bit version, then you can simply take 
 the first 64 bits of MD5. Assuming no defects in a 64-bit hash 
 implementation, the collision probability is `1.0 / 2^^64` for 
 a pair of hashed strings. And it becomes a 
 https://en.wikipedia.org/wiki/Birthday_problem if the number of 
 hashed strings is larger. There's no magic algorithmic sauce 
 that can make a small hash collision resistant. A quote from 
 the Wikipedia article:

While it is not possible to do better than this, it is certainly 
possible - and in fact way easier than you'd expect - to do worse.

I see no reason to move away from md5 here. It works, its speed 
is nowhere close to the bottleneck, and its pre-image resistance 
is of no concern for this use case.

This thread has gone in the typical bullshit D goes into, arguing 
about the intricacies of some details that do not matter at all 
for the value delivered in the end.

Jan 23 2024

Siarhei Siamashka <siarhei.siamashka gmail.com> writes:

On Wednesday, 24 January 2024 at 01:18:18 UTC, deadalnix wrote:
 On Tuesday, 23 January 2024 at 22:15:11 UTC, Siarhei Siamashka 
 wrote:
 On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright 
 wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/

 Thanks! I do, it has a 64 bit version, does anyone know what 
 the collision rate is?

 If you really need a 64-bit version, then you can simply take 
 the first 64 bits of MD5. Assuming no defects in a 64-bit hash 
 implementation, the collision probability is `1.0 / 2^^64` for 
 a pair of hashed strings. And it becomes a 
 https://en.wikipedia.org/wiki/Birthday_problem if the number 
 of hashed strings is larger. There's no magic algorithmic 
 sauce that can make a small hash collision resistant. A quote 
 from the Wikipedia article:

 While it is not possible to do better than this, it is 
 certainly possible - and in fact way easier than you'd expect - 
 to do worse.

Go ahead and show me truncated output of any hash, designed to be 
cryptographically secure, behaving badly. I know that xxhash, 
ahash, fnv or any other non-secure hashes are prone to have 
collision issues, as they are primarily optimized for speed while 
being borderline acceptable from the collision standpoint.

Jan 23 2024

ryuukk_ <ryuukk.dev gmail.com> writes:

On Wednesday, 24 January 2024 at 01:18:18 UTC, deadalnix wrote:
 This thread has gone in the typical bullshit D goes into, 
 arguing about the intricacies of some details that do not 
 matter at all for the value delivered in the end.

Why try to demotivate people? this could be a good entry point 
for a new contributor, it's not "bullshit", what's bullshit is 
sitting here doing nothing but complain

Submit PRs, encourage people to fix things, don't be negative, 
reading you feels like there is no point even participating, not 
everyone is a compiler developper, not  everyone is able to 
"improve your OOP ABI", everyone is however able to benchmark and 
change the hashing function for a faster one

If you want "Improve the OOP ABI", it better make the compiler 
remain fast, that detail, is not futile

Jan 24 2024

Andrea Fontana <nospam example.com> writes:

On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/

 Thanks! I do, it has a 64 bit version, does anyone know what 
 the collision rate is?

Check this:
https://fossies.org/linux/xxHash/tests/collisions/README.md

All good hash algorithms should be uniform...

Andrea

Jan 23 2024

deadalnix <deadalnix gmail.com> writes:

On Tuesday, 23 January 2024 at 21:37:42 UTC, Walter Bright wrote:
 On 1/23/2024 3:49 AM, Andrea Fontana wrote:
 So probably you like xxhash128
 
 https://xxhash.com/

 Thanks! I do, it has a 64 bit version, does anyone know what 
 the collision rate is?

For 64 bits, your best best is CRC64. However, the risk of 
collision starts to be a bit too high to rely exclusively on this.

Jan 23 2024

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

Just wondering, could a trie be used instead of hashing algorithm?

It would guarantee that no collisions would happen at all, with 
better performance than simple comparison.

Best regards,
Alexandru.

Jan 22 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 23/01/2024 11:48 AM, Alexandru Ermicioi wrote:
 On Sunday, 14 January 2024 at 09:04:32 UTC, Walter Bright wrote:
 On 9/28/2023 5:42 AM, deadalnix wrote:
 1/ Downcast to final classes.

 This has a PR for it now.

 https://github.com/dlang/dmd/pull/16032

 
 Just wondering, could a trie be used instead of hashing algorithm?
 
 It would guarantee that no collisions would happen at all, with better 
 performance than simple comparison.
 
 Best regards,
 Alexandru.

That sounds an lot like an assumption that TypeInfo is unique in a 
process. Which may not be true.

If it was true, we could compare pointers and don't need any of this logic.

Jan 22 2024

Walter Bright <newshound2 digitalmars.com> writes:

On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:
 That sounds an lot like an assumption that TypeInfo is unique in a process. 
 Which may not be true.
 If it was true, we could compare pointers and don't need any of this logic.

It isn't true because of shared libraries.

Jan 22 2024

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 23/01/2024 12:52 PM, Walter Bright wrote:
 On 1/22/2024 2:50 PM, Richard (Rikki) Andrew Cattermole wrote:
 That sounds an lot like an assumption that TypeInfo is unique in a 
 process. Which may not be true.
 If it was true, we could compare pointers and don't need any of this 
 logic.

 
 It isn't true because of shared libraries.

Or incremental compilation, the symbol could at least in theory be 
hidden in the object file and not be available for merging in the binary.

Regardless it can't be assumed if you like your sanity!

Jan 22 2024

D Programming

C/C++ Programming

Other

digitalmars.D - Improve the OOP ABI