
digitalmars.D - core.traits?

reply Manu <turkeyman gmail.com> writes:
So, druntime has core.internal.traits where a bunch of std.traits have
been mirrored to support internal machinery within druntime.
This is clear evidence that a lot of these traits are really
super-critical to doing basically anything interesting with D.
I have experience with no-phobos projects in the past where I've been
frustrated that I had to mirror all the traits I needed manually.

I suggest that a fair set of std.traits (the no-brainer traits that you
basically can't live without) should be officially moved to
core.traits, so that they are always available to all D users.
Traits are pure templates: they don't emit code and have no impact on
the size of the druntime binary, and they shouldn't significantly
affect build times unless they are instantiated.
...and they're already there in core.internal.traits.

We should move them to core.traits, and that should be their official
home. It really just makes sense. Uncontroversial low-level traits
don't belong in phobos.
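To make that concrete, here are a couple of the no-brainer traits in question, sketched in simplified form (these are illustrations, not the actual Phobos/druntime implementations):

```d
/// true for the built-in integral types
enum isIntegral(T) = is(T == byte)  || is(T == ubyte)  ||
                     is(T == short) || is(T == ushort) ||
                     is(T == int)   || is(T == uint)   ||
                     is(T == long)  || is(T == ulong);

/// strip one level of type qualifiers (simplified sketch of Unqual)
template Unqual(T)
{
    static if (is(T U == const U))          alias Unqual = U;
    else static if (is(T U == immutable U)) alias Unqual = U;
    else static if (is(T U == shared U))    alias Unqual = U;
    else                                    alias Unqual = T;
}

static assert(isIntegral!int && !isIntegral!float);
static assert(is(Unqual!(const int) == int));
```

Both are evaluated entirely at compile time, which is the point: declaring them in druntime costs nothing in the shipped binary.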
Jan 05
next sibling parent reply kinke <noone nowhere.com> writes:
On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:
 We should move them to core.traits, and that should be their 
 official home. It really just makes sense. Uncontroversial 
 low-level traits don't belong in phobos.
I fully agree.
Jan 05
parent Brian <zoujiaqing gmail.com> writes:
On Saturday, 5 January 2019 at 21:31:38 UTC, kinke wrote:
 On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:
 We should move them to core.traits, and that should be their 
 official home. It really just makes sense. Uncontroversial 
 low-level traits don't belong in phobos.
I fully agree.
YES, YES!! Agree!!!
Jan 08
prev sibling next sibling parent reply Seb <seb wilzba.ch> writes:
On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:
 So, druntime has core.internal.traits where a bunch of 
 std.traits have
 been mirrored to support internal machinery within druntime.
 This is clear evidence that a lot of these traits are really
 super-critical to doing basically anything interesting with D.
 I have experience with no-phobos projects in the past where 
 I've been
 frustrated that I had to mirror all the traits I needed 
 manually.

 [...]
I would go even one step further: move everything, and just alias things in std.traits so that no breakage happens.
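The aliasing scheme Seb describes could look something like this sketch (the module layout is hypothetical; `core.traits` does not exist yet):

```d
// Hypothetical post-move layout, simulated in one file.
// The implementation would live in druntime (the proposed core.traits):
enum isPointer(T) = is(T == U*, U);

// ...and std.traits would keep the old name alive with a plain alias,
// so existing user code keeps compiling unchanged:
// alias isPointer = core.traits.isPointer;

static assert(isPointer!(int*));
static assert(!isPointer!int);
```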
Jan 05
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Jan 05, 2019 at 09:49:58PM +0000, Seb via Digitalmars-d wrote:
 On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:
 So, druntime has core.internal.traits where a bunch of std.traits have
 been mirrored to support internal machinery within druntime.
 This is clear evidence that a lot of these traits are really
 super-critical to doing basically anything interesting with D.
 I have experience with no-phobos projects in the past where I've been
 frustrated that I had to mirror all the traits I needed manually.
 
 [...]
I would go even one step further and move everything and just alias things in std.traits, s.t. no breakage happens.
I concur! We've been adding ugly nasty hacks to druntime, or writing code in circumlocutory ways, for far too long now, all because certain basic traits happen to be in std.traits and it's verboten to import Phobos from druntime. It's time to revisit that decision.

The more complex traits should remain in Phobos in order not to complicate druntime too much, but the basic ones also needed in druntime should be moved into druntime, instead of copy-pasta or roll-your-own in druntime.

T

--
It's amazing how careful choice of punctuation can leave you hanging:
Jan 05
prev sibling parent Manu <turkeyman gmail.com> writes:
On Sat, Jan 5, 2019 at 2:04 PM H. S. Teoh via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Sat, Jan 05, 2019 at 09:49:58PM +0000, Seb via Digitalmars-d wrote:
 On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:
 So, druntime has core.internal.traits where a bunch of std.traits have
 been mirrored to support internal machinery within druntime.
 This is clear evidence that a lot of these traits are really
 super-critical to doing basically anything interesting with D.
 I have experience with no-phobos projects in the past where I've been
 frustrated that I had to mirror all the traits I needed manually.

 [...]
I would go even one step further and move everything and just alias things in std.traits, s.t. no breakage happens.
I concur! We've been adding ugly nasty hacks to druntime, or writing code in circumlocutory ways, for far too long now, all because certain basic traits happen to be in std.traits and it's verboten to import Phobos from druntime. It's time to revisit that decision.

The more complex traits should remain in Phobos in order not to complicate druntime too much, but the basic ones also needed in druntime should be moved into druntime, instead of copy-pasta or roll-your-own in druntime.

T

--
It's amazing how careful choice of punctuation can leave you hanging:
Great! So, who's gonna do it? I'm already overloaded with these sorts of refactors >_<
Jan 05
prev sibling next sibling parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:
 So, druntime has core.internal.traits where a bunch of 
 std.traits have
 been mirrored to support internal machinery within druntime.
 This is clear evidence that a lot of these traits are really
 super-critical to doing basically anything interesting with D.
 I have experience with no-phobos projects in the past where 
 I've been
 frustrated that I had to mirror all the traits I needed 
 manually.

 [...]
Makes sense to me.
Jan 05
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 1:12 PM, Manu wrote:
 We should move them to core.traits, and that should be their official
 home. It really just makes sense. Uncontroversial low-level traits
 don't belong in phobos.
Sounds good.
Jan 05
next sibling parent reply Manu <turkeyman gmail.com> writes:
On Sat, Jan 5, 2019 at 11:00 PM Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 1/5/2019 1:12 PM, Manu wrote:
 We should move them to core.traits, and that should be their official
 home. It really just makes sense. Uncontroversial low-level traits
 don't belong in phobos.
Sounds good.
Okay, so this is a challenging effort, since phobos is such a tangled rat's nest of chaos...

But attempting to move some traits immediately calls into question std.meta. I think we can all agree that Alias and AliasSeq should be in druntime along with core traits... but where should they live? Should there be core.meta as well? It's kinda like core.traits, in that it doesn't include runtime code and doesn't increase the payload of druntime.lib for end-users. Perhaps AliasSeq should live somewhere different? I'm feeling like a lean/trimmed-down core.meta might want to exist next to core.traits though; it seems reasonable.

...yes, this process will go on and on. The only way forward is to take each hurdle one at a time... and ideally, in attempting this effort, we can de-tangle a lot of cruft during the process.
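For reference, the two std.meta primitives Manu singles out are tiny; they are essentially this (simplified from std.meta):

```d
// AliasSeq is just a named template parameter sequence...
alias AliasSeq(TList...) = TList;

// ...and Alias forces a symbol or type into a form that can be aliased.
alias Alias(alias a) = a;
alias Alias(T) = T;

alias Types = AliasSeq!(int, double, string);
static assert(Types.length == 3);
static assert(is(Types[1] == double));
```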
Jan 07
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/19 4:25 PM, Manu wrote:
 On Sat, Jan 5, 2019 at 11:00 PM Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 1/5/2019 1:12 PM, Manu wrote:
 We should move them to core.traits, and that should be their official
 home. It really just makes sense. Uncontroversial low-level traits
 don't belong in phobos.
Sounds good.
Okay, so this is a challenging effort, since phobos is such a tangled rat's nest of chaos...

But attempting to move some traits immediately calls into question std.meta. I think we can all agree that Alias and AliasSeq should be in druntime along with core traits... but where should they live? Should there be core.meta as well? It's kinda like core.traits, in that it doesn't include runtime code and doesn't increase the payload of druntime.lib for end-users. Perhaps AliasSeq should live somewhere different? I'm feeling like a lean/trimmed-down core.meta might want to exist next to core.traits though; it seems reasonable.

...yes, this process will go on and on. The only way forward is to take each hurdle one at a time... and ideally, in attempting this effort, we can de-tangle a lot of cruft during the process.
I was going to say core.meta should just have the basic definitions, but why not just dump all of it in there. As you said, they are templates (and so you only pay if you use them), and really part of the core language definition.

There are already parts of std.meta inside core.internal as well, so if we want to be consistent, we should just move it all ;)

-Steve
Jan 07
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d wrote:
[...]
 Okay, so this is a challenging effort, since phobos is such a tangled
 rats nets of chaos...
It's already gotten better over the years in some ways (though not others -- unfortunately I'm afraid std.traits might be one of the places where things have probably gotten more tangled).

It certainly hasn't lived up to the promise of that old Phobos Philosophy page that once existed but has since been removed, of Phobos being a collection of lightweight, self-contained, mostly-orthogonal, reusable components. It has become quite the opposite, where the dependency graph of Phobos modules is approaching pretty close to being a complete graph. (And yes, there are some pretty deep-seated cyclic dependencies that thus far nobody has been able to truly unravel in any satisfactory way.)
 But attempting to move some traits immediately calls into question
 std.meta.  I think we can all agree that Alias and AliasSeq should be
 in druntime along with core traits... but where should it live?
 Should there be core.meta as well? It's kinda like core.traits, in
 that it doesn't include runtime code, it doesn't increase the payload
 of druntime.lib for end-users..
Shouldn't all of core.traits be like that? I'd hardly expect any runtime component to be associated with something called 'traits'.
 Perhaps AliasSeq should live somewhere different?
 I'm feeling like a lean/trimmed-down core.meta might want to exist
 next to core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
 ...yes, this process will go on and on. The only way forward is to
 take each hurdle one at a time... and ideally, in attempting this
 effort, we can de-tangle a lot of cruft during the process.
I'm tempted to say we should put everything in core.traits for now. And just the absolute bare minimum it takes to meet whatever druntime needs, and nothing more.

T

--
I think Debian's doing something wrong, `apt-get install pesticide', doesn't seem to remove the bugs on my system! -- Mike Dresser
Jan 07
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/19 4:41 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d wrote:
 [...]
 Perhaps AliasSeq should live somewhere different?
 I'm feeling like a lean/trimmed-down core.meta might want to exist
 next to core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
std.internal.traits contains pieces of std.meta -- a quick look shows it has AliasSeq (but under the name TypeTuple), allSatisfy, anySatisfy, Filter, staticMap. We might as well just move the whole thing there.

The fear of moving other pieces is real, but only if you look superficially. traits and meta are really part of the language; I can't imagine using D without them (and neither can any of the people who put pieces of those modules into druntime). I don't want to see anything more complex and interdependent get sucked in.

I know the goal for Manu right now is emplace, which does feel like a language feature. But as has been said many times here, std.traits is a no-brainer, and std.meta is really a building block that std.traits uses.

-Steve
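For readers who haven't used them, here is roughly what those four building blocks do, shown via the std.meta names (`isSmall` and `PtrTo` are made-up helpers for the example):

```d
import std.meta : AliasSeq, allSatisfy, anySatisfy, Filter, staticMap;

enum isSmall(T) = T.sizeof <= 4;  // example predicate
alias PtrTo(T) = T*;              // example mapping

alias Ts = AliasSeq!(byte, int, long, double);

static assert(!allSatisfy!(isSmall, Ts)); // long and double are 8 bytes
static assert(anySatisfy!(isSmall, Ts));  // byte and int qualify
static assert(Filter!(isSmall, Ts).length == 2);
static assert(is(staticMap!(PtrTo, Ts)[0] == byte*));
```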
Jan 07
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 04:54:15PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 1/7/19 4:41 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d wrote:
 [...]
 Perhaps AliasSeq should live somewhere different?  I'm feeling
 like a lean/trimmed-down core.meta might want to exist next to
 core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
std.internal.traits contains pieces of std.meta -- a quick look shows it has AliasSeq (but under the name TypeTuple),
What, wut...? `TypeTuple` still exists?! I thought we had gone through a somewhat painful deprecation cycle just to kill it off. Or is that cycle not done yet...?
 allSatisfy, anySatisfy, Filter, staticMap. We might as well just move
 the whole thing there.
I dunno, IMO allSatisfy, anySatisfy, and esp. Filter and staticMap are all heavy-weight templates (not in terms of code complexity, but in the sheer amount of templates that will get instantiated when you use them, AKA compiler slowdown fuel). I'm not so sure they should go into druntime.
 The fear of moving other pieces is real, but only if you look
 superficially.  traits and meta are really part of the language, I
 can't imagine using D without it (and neither can any of the people
 who put pieces of those modules into druntime).
Actually, I find myself redefining AliasSeq locally with a shorter name all the time. Scarily enough, doing that is shorter than typing `import std.meta : AliasSeq;`.
 I don't want to see anything more complex and interdependent get
 sucked in.  I know the goal for Manu right now is emplace, which does
 feel like a language feature. But as has been said many times here,
 std.traits is a no-brainer, and std.meta is really a building block
 that std.traits uses.
[...]

Wait, so you're saying we should move the *entire* std.meta and std.traits to druntime...?

T

--
ASCII stupid question, getty stupid ANSI.
Jan 07
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/19 5:10 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 04:54:15PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 1/7/19 4:41 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d wrote:
 [...]
 Perhaps AliasSeq should live somewhere different?  I'm feeling
 like a lean/trimmed-down core.meta might want to exist next to
 core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
std.internal.traits contains pieces of std.meta -- a quick look shows it has AliasSeq (but under the name TypeTuple),
What, wut...? `TypeTuple` still exists?! I thought we had gone through a somewhat painful deprecation cycle just to kill it off. Or is that cycle not done yet...?
It's an internal definition, so it can be anything it wants it to be. It could be Tuple or Foobar.
 
 
 allSatisfy, anySatisfy, Filter, staticMap. We might as well just move
 the whole thing there.
I dunno, IMO allSatisfy, anySatisfy, and esp. Filter and staticMap are all heavy-weight templates (not in terms of code complexity, but in the sheer amount of templates that will get instantiated when you use them, AKA compiler slowdown fuel). I'm not so sure they should go into druntime.
Again, they are already there, and for a reason.
 The fear of moving other pieces is real, but only if you look
 superficially.  traits and meta are really part of the language, I
 can't imagine using D without it (and neither can any of the people
 who put pieces of those modules into druntime).
Actually, I find myself redefining AliasSeq locally with a shorter name all the time. Scarily enough, doing that is shorter than typing `import std.traits : AliasSeq;`.
The point is to have a universally recognized construct. When your code's documentation says `TList` or whatever, someone has to go figure that out. If it says `AliasSeq`, it's known what it is.
 I don't want to see anything more complex and interdependent get
 sucked in.  I know the goal for Manu right now is emplace, which does
 feel like a language feature. But as has been said many times here,
 std.traits is a no-brainer, and std.meta is really a building block
 that std.traits uses.
[...] Wait, so you're saying we should move the *entire* std.meta and std.traits to druntime...?
At least std.meta. std.traits can possibly be split into critical language-enabling traits and the less important ones, but I don't know off the top of my head which ones those are. But I would say std.meta is composed only of the former variety.

-Steve
Jan 07
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On Mon, Jan 7, 2019 at 2:10 PM H. S. Teoh via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Mon, Jan 07, 2019 at 04:54:15PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 1/7/19 4:41 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d wrote:
 [...]
 Perhaps AliasSeq should live somewhere different?  I'm feeling
 like a lean/trimmed-down core.meta might want to exist next to
 core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
std.internal.traits contains pieces of std.meta -- a quick look shows it has AliasSeq (but under the name TypeTuple),
What, wut...? `TypeTuple` still exists?! I thought we had gone through a somewhat painful deprecation cycle just to kill it off. Or is that cycle not done yet...?
 allSatisfy, anySatisfy, Filter, staticMap. We might as well just move
 the whole thing there.
I dunno, IMO allSatisfy, anySatisfy, and esp. Filter and staticMap are all heavy-weight templates (not in terms of code complexity, but in the sheer amount of templates that will get instantiated when you use them, AKA compiler slowdown fuel).
Ummm, well, yes and no. staticMap is DEFINITELY, like you say, one of the worst, and I really think it needs a language solution. C++'s `...` expansion is exactly staticMap, and I wonder if we need an expression like that in-language to take the load off staticMap, because it's perhaps one of the slowest parts of the language. We can translate almost anything to CTFE, *except* code that invokes staticMap, so I really reckon it needs a language tool.

This is an aside; if we wanna talk about staticMap, we should start a new thread.
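A sketch of why staticMap is expensive: the textbook implementation is linearly recursive, so mapping N items costs on the order of N template instantiations (simplified; newer Phobos versions split the list to reduce recursion depth, but the instantiation count stays proportional to N):

```d
alias AliasSeq(T...) = T;

template staticMap(alias F, T...)
{
    static if (T.length == 0)
        alias staticMap = AliasSeq!();
    else
        // one fresh staticMap instantiation per element
        alias staticMap = AliasSeq!(F!(T[0]), staticMap!(F, T[1 .. $]));
}

alias PtrTo(T) = T*; // example mapping
static assert(is(staticMap!(PtrTo, int, char)[1] == char*));
```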
 I'm not so sure they should go into druntime.
Then you're kinda making a case that NONE of it should go in druntime, because they're such common building blocks. Also, you've already lost the game, because they're already in druntime (in core.internal).
 The fear of moving other pieces is real, but only if you look
 superficially.  traits and meta are really part of the language, I
 can't imagine using D without it (and neither can any of the people
 who put pieces of those modules into druntime).
Actually, I find myself redefining AliasSeq locally with a shorter name all the time. Scarily enough, doing that is shorter than typing `import std.traits : AliasSeq;`.
I've been known to do that too... I could do that for core.traits as well, but that seems pretty lame.
 I don't want to see anything more complex and interdependent get
 sucked in.  I know the goal for Manu right now is emplace, which does
 feel like a language feature. But as has been said many times here,
 std.traits is a no-brainer, and std.meta is really a building block
 that std.traits uses.
[...] Wait, so you're saying we should move the *entire* std.meta and std.traits to druntime...?
I'd like to be more selective than that; at very least, audit every single symbol coming across. But some part of me thinks this may actually be a very reasonable proposal...
Jan 07
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 7, 2019 3:10:02 PM MST H. S. Teoh via Digitalmars-d 
wrote:
 On Mon, Jan 07, 2019 at 04:54:15PM -0500, Steven Schveighoffer via 
Digitalmars-d wrote:
 On 1/7/19 4:41 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d
 wrote:
 [...]

 Perhaps AliasSeq should live somewhere different?  I'm feeling
 like a lean/trimmed-down core.meta might want to exist next to
 core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
std.internal.traits contains pieces of std.meta -- a quick look shows it has AliasSeq (but under the name TypeTuple),
What, wut...? `TypeTuple` still exists?! I thought we had gone through a somewhat painful deprecation cycle just to kill it off. Or is that cycle not done yet...?
We're talking about druntime internals here. TypeTuple was one of the things that got copied into druntime as internal. It was later renamed to AliasSeq in Phobos, but the copied stuff in druntime didn't necessarily get changed, since it was all internal and had nothing to do with the Phobos stuff it originated from.
 allSatisfy, anySatisfy, Filter, staticMap. We might as well just move
 the whole thing there.
I dunno, IMO allSatisfy, anySatisfy, and esp. Filter and staticMap are all heavy-weight templates (not in terms of code complexity, but in the sheer amount of templates that will get instantiated when you use them, AKA compiler slowdown fuel). I'm not so sure they should go into druntime.
Historically, we've tried to only put stuff in druntime that needs to be in druntime for druntime to do what it does. That does sometimes get stretched (e.g. we put all of the OS bindings in druntime rather than just the ones that druntime needs), but Phobos is the standard library, not druntime. druntime is the runtime. If something needs to be there so that the runtime can do its thing, then we put it in druntime, but not much else should be there. Certainly, anything that is there needs to be stuff that's actually core. So, in that respect, it's pretty questionable to move all of std.traits and std.meta into druntime wholesale.

What we had for a while was traits just being copied into druntime where they were needed, resulting in duplicate implementations in various places (especially for TypeTuple). Later, they were largely consolidated into core.internal.traits, but they were still completely separate from Phobos. More recently, the traits in core.internal.traits have had their std.traits implementations turned into simple wrappers that use the core.internal.traits symbols (they're not aliased, because that doesn't work well with the documentation, so you get an extra layer of template with every trait whose implementation is in druntime).

So, if Manu needs a trait for something in druntime, the normal thing to do at this point would be to just add it to core.internal.traits and then potentially make Phobos wrap the druntime symbol. We don't actually need to move anything wholesale, nor do we need something like core.traits which is intended to make the traits publicly available from druntime.

Another issue, if you actually look at std.traits and std.meta, is that there are actually several templates which use other pieces of Phobos. e.g. isInputRange gets used in std.meta.aliasSeqOf, as does std.array.array, and std.traits.packageName uses startsWith.
So, while many of the templates from std.traits and std.meta could be moved, actually trying to move them all wouldn't work without reimplementing yet other pieces of Phobos.

Personally, I think that the only real benefit of having something like core.traits over having core.internal.traits is that you can move the documentation for those symbols to their druntime implementations and make the Phobos implementations actual aliases with just a link to the druntime documentation, instead of needing a thin wrapper template. Anyone wanting to avoid Phobos can still use those traits from std.traits and std.meta just fine, since it wouldn't involve linking against Phobos, just importing the module (even some of the traits that we can't move would work just fine, because they'd just be pulling in other Phobos symbols that don't involve linking).

But it does get pretty weird to have some of the traits in core and some in std with no real obvious distinction, since it's based on what druntime needs. So, in that respect, keeping them internal is cleaner. Either way, I think that it's quite clear that we can't move everything.

- Jonathan M Davis
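The wrapper pattern Jonathan describes, simulated in a single file (names are illustrative; `CoreUnqual` stands in for the druntime-side implementation):

```d
// stands in for the implementation in core.internal.traits
template CoreUnqual(T)
{
    static if (is(T U == const U))          alias CoreUnqual = U;
    else static if (is(T U == immutable U)) alias CoreUnqual = U;
    else                                    alias CoreUnqual = T;
}

/// The public std.traits symbol: a thin wrapper rather than an alias,
/// so it can carry its own documentation -- at the cost of one extra
/// template instantiation per use.
template Unqual(T)
{
    alias Unqual = CoreUnqual!T;
}

static assert(is(Unqual!(const int) == int));
```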
Jan 07
prev sibling parent reply Nick Treleaven <nick geany.org> writes:
On Monday, 7 January 2019 at 21:54:15 UTC, Steven Schveighoffer 
wrote:
 std.internal.traits contains pieces of std.meta -- a quick look 
 shows it has AliasSeq (but under the name TypeTuple),
That's fine internally, but std.traits still uses *Tuple for ten public templates: https://github.com/dlang/phobos/pull/6227 If we move any of those to core.traits, please can we finally fix the names.
 traits and meta are really part of the language,
Some in std.traits are tightly coupled to the language, e.g. isInteger. Some are utility templates, e.g. ConstOf, CopyConstness. I think only the former should be public in druntime. Select and select aren't even traits; they should have been in std.meta.

Except perhaps AliasSeq (and Alias, Instantiate), all of std.meta seems to be utility templates rather than language-feature wrappers.
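The distinction Nick draws, sketched with simplified stand-ins (not the Phobos implementations): one kind of trait answers a question only the compiler knows, the other is mechanical type surgery anyone could write:

```d
// tightly coupled to the language: queries a compiler-known property
enum isStaticArray(T) = is(T : U[n], U, size_t n);

// pure utility: copies const/immutable from one type onto another
// (simplified stand-in for std.traits.CopyConstness)
template CopyConstness(From, To)
{
    static if (is(From == const U, U))          alias CopyConstness = const(To);
    else static if (is(From == immutable U, U)) alias CopyConstness = immutable(To);
    else                                        alias CopyConstness = To;
}

static assert(isStaticArray!(int[3]) && !isStaticArray!(int[]));
static assert(is(CopyConstness!(const int, float) == const(float)));
```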
Jan 08
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/8/19 1:59 PM, Nick Treleaven wrote:
 On Monday, 7 January 2019 at 21:54:15 UTC, Steven Schveighoffer wrote:
 std.internal.traits contains pieces of std.meta -- a quick look shows 
 it has AliasSeq (but under the name TypeTuple),
That's fine internally, but std.traits still uses *Tuple for ten public templates: https://github.com/dlang/phobos/pull/6227 If we move any of those to core.traits, please can we finally fix the names.
You'd have to convince Andrei, as he seemingly nixed the PR.
 traits and meta are really part of the language,
Some in std.traits are tightly coupled to the language, e.g. isInteger. Some are utility templates, e.g. ConstOf, CopyConstness. I think only the former should be public in druntime. Select and select aren't even traits, they should have been in std.meta.
What I mean is that you reach for things like what is available in std.traits quite often when doing template constraints or type manipulation. I'd consider ConstOf and CopyConstness to be in that group.
 Except perhaps AliasSeq, (Alias, Instantiate) all of std.meta seems to 
 be utility templates rather than language feature wrappers.
What I mean is that I consider them part of the D language, not like an optional library feature. But in any case, much of std.traits relies on std.meta. We would at least need the parts of std.meta that are used to build std.traits.

-Steve
Jan 08
prev sibling parent Manu <turkeyman gmail.com> writes:
On Mon, Jan 7, 2019 at 1:42 PM H. S. Teoh via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Mon, Jan 07, 2019 at 01:25:17PM -0800, Manu via Digitalmars-d wrote:
 [...]
 Okay, so this is a challenging effort, since phobos is such a tangled
 rats nets of chaos...
It's already gotten better over the years in some ways (though not others -- unfortunately I'm afraid std.traits might be one of the places where things have probably gotten more tangled).

It certainly hasn't lived up to the promise of that old Phobos Philosophy page that once existed but has since been removed, of Phobos being a collection of lightweight, self-contained, mostly-orthogonal, reusable components. It has become quite the opposite, where the dependency graph of Phobos modules is approaching pretty close to being a complete graph. (And yes, there are some pretty deep-seated cyclic dependencies that thus far nobody has been able to truly unravel in any satisfactory way.)
 But attempting to move some traits immediately calls into question
 std.meta.  I think we can all agree that Alias and AliasSeq should be
 in druntime along with core traits... but where should it live?
 Should there be core.meta as well? It's kinda like core.traits, in
 that it doesn't include runtime code, it doesn't increase the payload
 of druntime.lib for end-users..
Shouldn't all of core.traits be like that? I'd hardly expect any runtime component to be associated with something called 'traits'.
 Perhaps AliasSeq should live somewhere different?
 I'm feeling like a lean/trimmed-down core.meta might want to exist
 next to core.traits though; it seems reasonable.
I'm afraid this would set the wrong precedent -- since there's core.traits for std.traits and core.meta for std.meta, why not also have core.typecons, core.range, and then it's all gonna go downhill from there, and before you know it there's gonna be core.stdio and core.format... *shudder*
Well, I think you're kinda catastrophising here... I said very clearly "The only way forward is to take each hurdle one at a time", and I mean that as literally as possible. One at a time... The question is: core.meta... yeah? No precedents are being set.. we're not opening flood gates, it's a singular question.
 ...yes, this process will go on and on. The only way forward is to
 take each hurdle one at a time... and ideally, in attempting this
 effort, we can de-tangle a lot of cruft during the process.
I'm tempted to say we should put everything in core.traits for now. And just the absolute bare minimum it takes to meet whatever druntime needs, and nothing more.
Are you saying we should put *everything* in core.traits? That is, put AliasSeq in core.traits? It's objectively NOT a 'traits'...
Jan 07
prev sibling parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Saturday, 5 January 2019 at 21:12:54 UTC, Manu wrote:

 We should move them to core.traits, and that should be their 
 official home. It really just makes sense. Uncontroversial 
 low-level traits don't belong in phobos.
I'm not really doing much with D anymore, so I apologize for interrupting the conversation, but I had an idea I wanted to offer just in case it might help.

This was something I tried to tackle about 6 months ago; I called it "UtiliD" (https://forum.dlang.org/post/wgkbamnlraustaycbbya forum.dlang.org). I ultimately failed due to the tangle of Phobos, and because my life priorities changed I gave up and deleted my repository (oops!).

Anyway, my suggestion is to create a new library separate from druntime and phobos that has no dependencies whatsoever (no libc, no libstdc++, no OS dependencies, no druntime dependency, etc.). I mean it; **no dependencies**. Not even object.d. The only thing it should require is a D compiler.

That library can then be imported by druntime, phobos, betterC builds, or even the compiler itself. It will take strict enforcement of the "no dependency" rule and good judgment to keep the scope from ballooning, but it may be a good place for things like `traits`, `meta` and others.

Hope I'm not just making noise.

Mike
Jan 07
next sibling parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 8 January 2019 at 01:44:08 UTC, Mike Franklin wrote:

 Anyway, my suggestion is to create a new library separate from 
 druntime and phobos that has no dependencies whatsoever (no 
 libc, no libstdc++, no OS dependencies, no druntime dependency, 
 etc.).  I mean it; **no dependencies**.  Not even object.d.  
 The only thing it should require is a D compiler.

 That library can then be imported by druntime, phobos, betterC 
 builds, or even the compiler itself. It will take strict 
 enforcement of the "no dependency" rule and good judgment to 
 keep the scope from ballooning, but it may be a good place for 
 things like `traits`, `meta` and others.
I spent some time trying to think through some of the issues with druntime, and came up with this:

Right now, druntime is somewhat of a monolith trying to be too many things:
   * utilities (traits, string utilities, type conversion utilities, etc...)
   * compiler lowerings
   * C standard library bindings
   * C++ standard library bindings
   * Operating system bindings
   * OS abstractions (threads, fibers, context switching, etc...)
   * DWARF implementation
   * TLS implementation
   * GC
   * (probably more)

So, I suggest something like this:
----------------------------------
* core.util - a.k.a. UtiliD - Just utility implementations written in D (e.g. `std.traits`, `std.meta`, etc.). No dependencies whatsoever. No operating system or platform abstractions. No high-level language features (e.g. exceptions).
     * public imports: (none)
     * private imports: (none)

* core.stdc - C standard library bindings - libc functions verbatim; no convenience or utility implementations
     * public imports: (none)
     * private imports: core.util

* core.stdcpp - C++ standard library bindings - libstdc++ data structures verbatim; no convenience or utility implementations
     * public imports: (none)
     * private imports: core.util

* sys - OS/platform bindings - operating system implementations verbatim; no convenience or utility implementations
     * public imports: (none)
     * private imports: core.util

* core.pal - platform/OS abstractions - threads, fibers, context switching, etc.
     * public imports: (none)
     * private imports: core.util, sys, core.stdc

* core.d - compiler support (compiler lowerings, runtime initialization, TLS implementation, DWARF implementation, GC, etc...)
     * public imports: core.util
     * private imports: core.pal

* druntime - just a top-level package containing public imports, aliases, and compiler support. No other implementations.
     * public imports: core.pal, core.d
     * private imports: core.util

* std - Phobos
     * public imports: (none)
     * private imports: druntime

There are likely other suitable ways to organize it, but that's just what I could come up with after thinking through it a little.

I would prefer if each of those were in their own repository and even move some of them to Deimos or dub, but that would probably irritate a lot of people. I'd also prefer to have each of those in their own packages, but D is probably too deep in technical debt for that. (See also https://issues.dlang.org/show_bug.cgi?id=11666)

So, to make it more palatable, I suggest:
-----------------------------------------
   * `core.util` gets its own repository so it can be independently added to other repositories as a self-contained/freestanding dependency

   * `core.stdc`, `core.stdcpp`, `sys`, `core.pal`, and `core.d` all go into the druntime monolith like it is today.

   * Phobos remains much like it is today.

In the context of the discussion at hand, `std.traits`, `std.meta`, and other utilities can be moved to `core.util`. `core.util` can then be added as a dependency to dmd, druntime, and phobos. The rest will probably have to wait for D3 :/

Mike
Jan 07
parent Jacob Carlborg <doob me.com> writes:
On 2019-01-08 06:37, Mike Franklin wrote:

 I spent some time trying to think through some of the issues with 
 druntime, and came up with this:
 
 Right now, druntime is somewhat of a monolith trying to be too many things.
    * utilities (traits, string utilities, type conversion utilities, 
 etc...)
    * compiler lowerings
    * C standard library bindings
    * C++ standard library bindings
    * C standard library bindings
    * Operating system bindings
    * OS abstractions (thread, fibers, context switching, etc...)
    * Compiler lowerings
    * DWARF implementation
    * TLS implementation
    * GC
    * (probably more)
 
 So, I suggest something like this:
 ----------------------------------
 * core.util - a.k.a utiliD - Just utility implementations written in D 
 (e.g `std.traits`, `std.meta`, etc. No dependencies whatsoever. No 
 operating system or platform abstractions. No high-level language 
 features(e.g. exceptions)
      * public imports: (none)
      * private imports: (none)
 
 * core.stdc - C standard library bindings - libc functions verbatim; no 
 convenience or utility implementations
      * public imports: (none)
      * private imports: core.util
 
 * core.stdcpp - C++ standard library bindings - libstdc++ data 
 structures verbatim; no convenience or utility implementations
      * public imports: (none)
      * private imports: core.util
 
 * sys - OS/Platform bindings - operating system implementations 
 verbatim; no convenience or utility implementations
      * public imports: (none)
      * private imports: core.util
 
 * core.pal - Platform/OS abstractions - threads, fibers, context 
 switching, etc.
      * public imports: (none)
      * private imports: core.util, sys, core.libc
 
 * core.d - compiler support (compiler lowerings, runtime initialization, 
 TLS implementation, DWARF implementation, GC, etc...)
      * public imports : core.util
      * private imports : core.pal
 
 * druntime - Just a top-level package containing public imports, 
 aliases, and compiler support. No other implementations
      * public imports: core.pal, core.d
      * private imports: core.util
 
 * std - phobos
      * public imports: (none)
      * private imports: druntime
 
 There are likely other suitable ways to organize it, but that's just 
 what I could come up with after thinking through it a little.
 
 I would prefer if each of those were in their own repository and even 
 move some of them to Deimos or dub, but that would probably irritate a 
 lot of people.  I'd also prefer to have each of those in their own 
 packages, but D is probably too deep in technical debt for that.  (See 
 also https://issues.dlang.org/show_bug.cgi?id=11666)
 
 So, to make it more palatable, I suggest:
 -----------------------------------------
    * `core.util` gets own repository so it can be independently added to 
 other repositories as a self-contained/freestanding dependency
 
    * `core.stdc`, `core.stdcpp`, `sys`, `core.pal`, and `core.d` all go 
 into the druntime monolith like it is today.
 
    * phobos remains much like it is today.
 
 In the context of the discussion at hand, `std.traits`, `std.meta`, and 
 other utilities can be moved to `core.util`. `core.util` can then be 
 added as a dependency to dmd, druntime, and phobos.  The rest will 
 probably have to wait for D3 :/
I like this approach. -- /Jacob Carlborg
Jan 09
prev sibling parent reply kinke <noone nowhere.com> writes:
On Tuesday, 8 January 2019 at 01:44:08 UTC, Mike Franklin wrote:
 Anyway, my suggestion is to create a new library separate from 
 druntime and phobos that has no dependencies whatsoever (no 
 libc, no libstdc++, no OS dependencies, no druntime dependency, 
 etc.).  I mean it; **no dependencies**.  Not even object.d.  
 The only thing it should require is a D compiler.

 That library can then be imported by druntime, phobos, betterC 
 builds, or even the compiler itself. It will take strict 
 enforcement of the "no dependency" rule and good judgment to 
 keep the scope from ballooning, but it may be a good place for 
 things like `traits`, `meta` and others.
I also feel the need for at least one other base library. My focus is on the fundamental compiler support functions, like initializing/comparing/copying arrays and general associative array support, as they are fundamental to the language and its compilers (not talking about TypeInfos, ModuleInfos, Object etc.). I think we need such a base library in order to improve -betterC and its available language features.

The important thing would be to try to reduce the external dependencies of that lib to an absolute minimum, similar to Rust's core library (just 5 symbols: mem{cpy,cmp,set} + rust_begin_panic + rust_eh_personality), although we'll probably need some primitives, e.g., malloc/realloc/free.

If that's possible, using D for bare-metal targets without a C library (e.g., a future WebAssembly version with direct access to GC, or your own OS kernel/firmware) would probably become awesome, as you'd only need to implement maybe a dozen symbols.
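As a rough illustration of how small that symbol surface could be, here is a hedged sketch of the three memory primitives in plain D. On a freestanding target these would be the real `extern(C)` `memcpy`/`memset`/`memcmp`; they carry a hypothetical `_d_` prefix here only so the sketch can link and run next to libc on a hosted system.

```d
// Byte-by-byte reference implementations of the "handful of symbols"
// a freestanding base library would need a target to provide.
extern(C) void* _d_memcpy(void* dst, const void* src, size_t n)
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const ubyte*) src;
    foreach (i; 0 .. n) d[i] = s[i];  // simple forward copy
    return dst;
}

extern(C) void* _d_memset(void* dst, int c, size_t n)
{
    auto d = cast(ubyte*) dst;
    foreach (i; 0 .. n) d[i] = cast(ubyte) c;  // fill with low byte of c
    return dst;
}

extern(C) int _d_memcmp(const void* a, const void* b, size_t n)
{
    auto pa = cast(const ubyte*) a, pb = cast(const ubyte*) b;
    foreach (i; 0 .. n)
        if (pa[i] != pb[i]) return pa[i] - pb[i];  // first differing byte decides
    return 0;
}
```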
Jan 08
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/8/19 4:23 PM, kinke wrote:
 On Tuesday, 8 January 2019 at 01:44:08 UTC, Mike Franklin wrote:
 Anyway, my suggestion is to create a new library separate from 
 druntime and phobos that has no dependencies whatsoever (no libc, no 
 libstdc++, no OS dependencies, no druntime dependency, etc.).  I mean 
 it; **no dependencies**.  Not even object.d. The only thing it should 
 require is a D compiler.

 That library can then be imported by druntime, phobos, betterC builds, 
 or even the compiler itself. It will take strict enforcement of the 
 "no dependency" rule and good judgment to keep the scope from 
 ballooning, but it may be a good place for things like `traits`, 
 `meta` and others.
I also feel the need for at least 1 another base library. My focus is on the fundamental compiler support functions, like initializing/comparing/copying arrays and general associative arrays support, as they are fundamental to the language and their compilers (not talking about TypeInfos, ModuleInfos, Object etc.).
This is self-contradictory, as AA's require TypeInfo. Though I agree with the goal. It's just not a "now" goal; we first need to fix these components so they DON'T depend on such things as TypeInfo. -Steve
Jan 08
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Tuesday, 8 January 2019 at 21:26:51 UTC, Steven Schveighoffer 
wrote:

 I also feel the need for at least 1 another base library. My 
 focus is on the fundamental compiler support functions, like 
 initializing/comparing/copying arrays and general associative 
 arrays support, as they are fundamental to the language and 
 their compilers (not talking about TypeInfos, ModuleInfos, 
 Object etc.).
This is self-contradictory, as AA's require TypeInfo. Though I agree with the goal. It's just not a "now" goal, we first need to fix these components so they DON'T depend on such things as TypeInfo.
Steven is right (as usual) here. There has to be a serious effort to remove the dependency on runtime information that is available at compile-time. I tried quite hard on that in 2017~2018, but I ran into all sorts of problems.

Exhibit A: We can set an array's length in `@safe`, `nothrow`, `pure` code. But, it gets lowered to a runtime hook that is neither `@safe`, `nothrow`, nor `pure` (https://github.com/dlang/druntime/blob/e47a00bff935c3f079bb567a6ec97663ba384487/src/rt/lifetime.d#L1265). In other words, the compiler-runtime interface is a lie. So, if you try to rewrite that as a template to remove the dependency on `TypeInfo`, the template will run through the semantic phase of the compiler, and now you have to be honest, and it doesn't compile. If you then try to make all of the code that `_d_arraysetlengthT` calls `@safe`, `nothrow`, and `pure` to prevent breakage, you'll find that none of it compiles, because the "turtles at the bottom" (i.e. `memcpy`, `malloc`, etc...) aren't `pure` or whatever attribute constraint you're trying to apply.

Exhibit B: I tried to convert `_d_arraycast` to a template in https://github.com/dlang/druntime/pull/2268 and ran into similar problems. Some tried to help with a `pureMalloc` implementation in https://github.com/dlang/druntime/pull/2276, but that didn't go well either. Walter responded with "Since realloc() free's memory, it cannot ever be considered pure." Well, what the hell are we supposed to do then?

IMO, having dynamic stack allocation for arrays and strings will help (https://issues.dlang.org/show_bug.cgi?id=18788). GDC and LDC already provide this, but DMD's implementation is in druntime (https://github.com/dlang/druntime/blob/9a8edfb48e4842180c706ee26ebd8edb10be534/src/rt/alloca.d), so it requires linking in druntime, and now we're at a catch-22. I asked Walter for help with this, as it is beyond my current skills, but he said he didn't have time.

Here's what I think will help:
1. Get `alloca` or dynamic stack array allocation working. This will help a lot because we won't have to reach for `malloc` and friends for simple allocations like generating dynamic assert messages.
2. Convert `memcpy`, `memset`, and `memcmp` to strongly-typed D templates so they can be used in the implementations when converting runtime hooks to templates. I did some exploration on that and published my results at https://github.com/JinShil/memcpyD. Unfortunately, DMD is missing an AVX512 implementation, so I couldn't continue.

Lots of obstacles here, and I don't see it happening without Walter and Andrei making it a priority.

Mike
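The strongly-typed variant described in point 2 can be sketched roughly as follows. This is an illustration of the idea, not the actual memcpyD code, and it only handles plain-old-data types:

```d
// Sketch: the element type is a template parameter, so the copy can
// be specialized per size at compile time instead of dispatching on
// a runtime byte count (and no TypeInfo is needed).
void memcpyD(T)(T* dst, const T* src) @trusted pure nothrow @nogc
{
    static if (T.sizeof <= size_t.sizeof)
    {
        // Small POD types collapse to a single typed assignment.
        *dst = *src;
    }
    else
    {
        // Fallback byte loop; a tuned version would branch to
        // SIMD paths here based on T.sizeof.
        auto d = cast(ubyte*) dst;
        auto s = cast(const ubyte*) src;
        foreach (i; 0 .. T.sizeof) d[i] = s[i];
    }
}
```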
Jan 08
next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Wed, 09 Jan 2019 02:32:50 +0000, Mike Franklin wrote:
 I tried to convert `_d_arraycast` to a template in
 https://github.com/dlang/druntime/pull/2268 and ran into similar
 problems.  Some tried to help with a `pureMalloc` implementation in
 https://github.com/dlang/druntime/pull/2276, but that didn't go well
 either.  Walter responded with "Since realloc() free's memory, it cannot
 ever be considered pure."  Well, what the hell are we supposed to do
 then?
The specific thing that he replied to was having a public symbol for realloc that was considered pure. Perhaps a private fakePureRealloc() would be more palatable?
Jan 08
parent Mike Franklin <slavo5150 yahoo.com> writes:
On Wednesday, 9 January 2019 at 03:32:17 UTC, Neia Neutuladh 
wrote:

 The specific thing that he replied to was having a public 
 symbol for realloc that was considered pure. Perhaps a private 
 fakePureRealloc() would be more palatable?
Perhaps; I'm not sure. The `pureMalloc` implementation is a lot of clever hackery anyway, so I think it would be best to just implement stack-allocated dynamic arrays (i.e. https://issues.dlang.org/show_bug.cgi?id=18788) and avoid the games. That would have solved the immediate need I had for converting runtime hooks to templates, and would help some of that work move forward. Mike
Jan 08
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2019-01-09 03:32, Mike Franklin wrote:

 Here's what I think will help:
 1.  Get `alloca` or dynamic stack array allocation working.  This will 
 help a lot because we won't have to reach for `malloc` and friends for 
 simple allocations like generating dynamic assert messages
What's the problem with "alloca"?
 2.  Convert `memcpy`, `memset`, and `memcmp` to strongly-typed D 
 templates so they can be used in the implementations when converting 
 runtime hooks to templates.  I did some exploration on that and 
 published my results at https://github.com/JinShil/memcpyD.  
 Unfortunately, DMD is missing an AVX512 implementation so I couldn't 
 continue.
What do you mean "couldn't continue"? It's possible to implement "memcpy" without AVX512. Am I missing something? -- /Jacob Carlborg
Jan 09
parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Wednesday, 9 January 2019 at 11:01:46 UTC, Jacob Carlborg 
wrote:
 On 2019-01-09 03:32, Mike Franklin wrote:

 Here's what I think will help:
 1.  Get `alloca` or dynamic stack array allocation working.  
 This will help a lot because we won't have to reach for 
 `malloc` and friends for simple allocations like generating 
 dynamic assert messages
What's the problem with "alloca"?
In DMD you can't use it without linking in the runtime, but in LDC and GDC, you can. One of the goals of implementing these runtime hooks as templates is to make more features available in -betterC builds, or for pay-as-you-go runtime implementations. If you need to link in druntime to get `alloca`, you can't implement the runtime hooks as templates and have them work in -betterC.
 2.  Convert `memcpy`, `memset`, and `memcmp` to strongly-typed 
 D templates so they can be used in the implementations when 
 converting runtime hooks to templates.  I did some exploration 
 on that and published my results at 
 https://github.com/JinShil/memcpyD.  Unfortunately, DMD is 
 missing an AVX512 implementation so I couldn't continue.
What do you mean "couldn't continue"? It's possible to implement "memcpy" without AVX512. Am I missing something?
Yes, it's possible, but I don't think it will ever be accepted if it doesn't perform at least as well as the optimized versions in C or assembly that use AVX512 or other SIMD features. It needs to be at least as good as what libc provides, so we need to be able to leverage these unique hardware features to get the best performance. Mike
Jan 09
next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Wednesday, 9 January 2019 at 11:49:40 UTC, Mike Franklin wrote:
 On Wednesday, 9 January 2019 at 11:01:46 UTC, Jacob Carlborg 
 wrote:
 On 2019-01-09 03:32, Mike Franklin wrote:

 Here's what I think will help:
 1.  Get `alloca` or dynamic stack array allocation working.  
 This will help a lot because we won't have to reach for 
 `malloc` and friends for simple allocations like generating 
 dynamic assert messages
What's the problem with "alloca"?
In DMD you can't use it without linking in the runtime, but in LDC and GDC, you can. One of the goals of implementing these runtime hooks as templates is to make more features available in -betterC builds, or for pay-as-you-go runtime implementations. If you need to link in druntime to get `alloca`, you can't implement the runtime hooks as templates and have them work in -betterC.
 2.  Convert `memcpy`, `memset`, and `memcmp` to 
 strongly-typed D templates so they can be used in the 
 implementations when converting runtime hooks to templates.  
 I did some exploration on that and published my results at 
 https://github.com/JinShil/memcpyD.  Unfortunately, DMD is 
 missing an AVX512 implementation so I couldn't continue.
What do you mean "couldn't continue"? It's possible to implement "memcpy" without AVX512. Am I missing something?
Yes, it's possible, but I don't think it will ever be accepted if it doesn't perform at least as well as the optimized versions in C or assembly that use AVX512 or other SIMD features. It needs to be at least as good as what libc provides, so we need to be able to leverage these unique hardware features to get the best performance.
AVX512 concerns only a very small part of the processors on the market (Skylake, Cannon Lake and Cascade Lake). AMD will never implement it, and the number of people upgrading to one of the Lake CPUs from some recent chip is also not that great. I don't see why not having it implemented yet is blocking anything. People who really need AVX512 performance will have implemented memcpy themselves already, and the others will have to wait a little bit. It's not as if it couldn't be added later. I really don't understand the problem.

This said, another issue with memcpy that very often gets lost is that, because of the fancy benchmarking, its system performance cost is often wrongly assessed, and a lot of heroic effort is put into optimizing big block transfers, while in reality it's mostly called on small (postblit) to medium blocks. Linus Torvalds once had a rant on that subject on realworldtech: https://www.realworldtech.com/forum/?threadid=168200&curpostid=168589
Jan 09
next sibling parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Wednesday, 9 January 2019 at 12:31:13 UTC, Patrick Schluter 
wrote:
 On Wednesday, 9 January 2019 at 11:49:40 UTC, Mike Franklin 
 wrote:
 [...]
AVX512 concerns only a very small part of processors on the market (Skylake, Canon Lake and Cascade Lake). AMD will never implement it and the number of people upgrading to one of the lake cpus from some recent chip is also not that great. I don't see why not having it implemented yet is blocking anything. People who really need AVX512 performance will have implemented memcpy themselves already and for the others, they will have to wait a little bit. It's not as if it couldn't be added later. I really don't understand the problem. This said, another issue with memcpy that very often gets lost is that, because of the fancy benchmarking, its system performance cost is often wrongly assessed, and a lot of heroic efforts are put in optimizing big block transfers, while in reality it's mostly called on small (postblit) to medium blocks. Linus Torvalds had once a rant on that subject on realworldtech. https://www.realworldtech.com/forum/?threadid=168200&curpostid=168589
By reading (quickly) these articles:

- https://lemire.me/blog/2018/04/19/by-how-much-does-avx-512-slow-down-your-cpu-a-first-experiment/
- https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/

it seems that using AVX512 can be good if you pin a thread to a core so that it processes only AVX512 instructions.
Jan 09
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 09, 2019 at 12:31:13PM +0000, Patrick Schluter via Digitalmars-d
wrote:
[...]
 This said, another issue with memcpy that very often gets lost is
 that, because of the fancy benchmarking, its system performance cost
 is often wrongly assessed, and a lot of heroic efforts are put in
 optimizing big block transfers, while in reality it's mostly called on
 small (postblit) to medium blocks.
EXACTLY!!!

Some time ago I took an interest in implementing the equivalent of strchr in the most optimized way possible. For that, I wrote several of my own algorithms and also perused the glibc implementation.

Eventually, I realized that the glibc implementation, which uses fancy 64-bit-word scanning with a lot of setup overhead and messy starting/trailing cases, is optimizing for very large scans, i.e., when the byte being sought occurs only rarely in a very large haystack. In those cases it's at the top of benchmarks. However, in the arguably more common case where the byte being sought occurs relatively frequently in small- to medium-sized haystacks, repeatedly searching the haystack incurs a ton of overhead setting up all that fancy machinery, branch hazards, and what-not, where a plain ole `while (*ptr++ != needle) {}` works much better.

I suspect many of the C library functions of this sort (incl. memcpy + friends) have a tendency to suffer from this sort of premature optimization.

Not to mention that often overly-specialized benchmarks of this sort fail to account for bias caused by the CPU's branch predictor learning the benchmark and the cache hierarchy amortizing the cost of repeatedly searching the same haystack -- things you rarely do in real-life applications. There's a big risk of your "super-optimized" algorithm ending up optimizing for an unrealistic use-case, but having only mediocre or sometimes even poor performance in real-world computations.
 Linus Torvalds had once a rant on that subject on realworldtech.
 https://www.realworldtech.com/forum/?threadid=168200&curpostid=168589
Nice. T -- If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
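For reference, the zero-setup byte loop being contrasted with glibc's word-scanning is only a few lines of D (a sketch; `simpleStrchr` is a made-up name):

```d
// The "plain ole" loop: no alignment setup, no word-at-a-time
// machinery -- often faster on short haystacks with frequent hits.
inout(char)* simpleStrchr(inout(char)* s, char needle)
{
    while (*s != needle)
    {
        if (*s == '\0') return null;  // hit the terminator: not found
        ++s;
    }
    return s;  // points at the first occurrence (or the terminator if needle is '\0')
}
```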
Jan 09
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 9 January 2019 at 17:40:38 UTC, H. S. Teoh wrote:
 [snip]

 EXACTLY!!!

 Some time ago I took an interest in implementing the equivalent 
 of strchr in the most optimized way possible. For that, I wrote 
 several of my own algorithms and also perused the glibc 
 implementation.

 Eventually, I realized that the glibc implementation, which 
 uses fancy 64-bit-word scanning with a lot of setup overhead 
 and messy starting/trailing cases, is optimizing for very large 
 scans, i.e., when the byte being sought occurs only rarely in a 
 very large haystack.  In those cases it's at the top of 
 benchmarks.  However, in the arguably more common case where 
 the byte being sought occurs relatively frequently in small- to 
 medium-sized haystacks, repeatedly searching the haystack 
 incurs a ton of overhead setting up all that fancy machinery, 
 branch hazards, and what-not, where a plain ole `while (*ptr++ 
 != needle) {}` works much better.

 I suspect many of the C library functions of this sort (incl. 
 memcpy + friends) have a tendency to suffer from this sort of 
 premature optimization.

 Not to mention that often overly-specialized benchmarks of this 
 sort fail to account for bias caused by the CPU's branch 
 predictor learning the benchmark and the cache hierarchy 
 amortizing the cost of repeatedly searching the same haystack 
 -- things you rarely do in real-life applications.  There's a 
 big risk of your "super-optimized" algorithm ending up 
 optimizing for an unrealistic use-case, but having only 
 mediocre or sometimes even poor performance in real-world 
 computations.
One thing I like about libmir's sum function http://docs.algorithm.dlang.io/latest/mir_math_sum.html was that the algorithm you use to return the sum can be chosen with an enum on the template. So it's really a collection of different sum algorithms all in one. Set the default as something reasonable and then let the user decide if they want something else.
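That design can be sketched in a few lines of D. The `Summation` enum and both variants below are illustrative stand-ins, not mir's actual implementation:

```d
// One template, several algorithms, selected by an enum parameter
// with a sensible default -- the caller only opts in when needed.
enum Summation { naive, pairwise }

double sum(Summation algo = Summation.naive)(const double[] xs)
{
    static if (algo == Summation.naive)
    {
        double acc = 0;
        foreach (x; xs) acc += x;  // straightforward left-to-right sum
        return acc;
    }
    else
    {
        // Pairwise: recursively split to reduce rounding error on
        // long inputs.
        if (xs.length <= 2)
            return sum!(Summation.naive)(xs);
        const mid = xs.length / 2;
        return sum!algo(xs[0 .. mid]) + sum!algo(xs[mid .. $]);
    }
}
```

Callers get the default with `sum(data)` and can opt in with `sum!(Summation.pairwise)(data)`.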
Jan 09
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 09, 2019 at 06:55:30PM +0000, jmh530 via Digitalmars-d wrote:
[...]
 One thing I like about libmir's sum function
 http://docs.algorithm.dlang.io/latest/mir_math_sum.html
 was that the algorithm you use to return the sum can be chosen with an
 enum on the template. So it's really a collection of different sum
 algorithms all in one. Set the default as something reasonable and
 then let the user decide if they want something else.
That's an excellent idea. Have a generic default algorithm that performs reasonably well in typical use cases, but also give the user the power to choose a different algorithm if he knows that it would work better with his particular use case.

Empowering the user -- over time I've come to learn that this is always the best approach to API design. It's one that has the best chance of standing the test of time. Fancy APIs that don't pay enough attention to this principle tend to eventually fade into obscurity.

T -- I am Ohm of Borg. Resistance is voltage over current.
Jan 09
parent Mike Franklin <slavo5150 yahoo.com> writes:
On Wednesday, 9 January 2019 at 19:25:35 UTC, H. S. Teoh wrote:

 That's an excellent idea.  Have a generic default algorithm 
 that performs reasonably well in typical use cases, but also 
 give the user the power to choose a different algorithm if he 
 knows that it would work better with his particular use case.

 Empowering the user -- over time I've come to learn that this 
 is always the best approach to API design.  It's one that has 
 the best chance of standing the test of time.  Fancy APIs that 
 don't pay enough attention to this principle tend to eventually 
 fade into obscurity.
Yes, this is one of the benefits of making `memcpy(T)(T* dest, T* src)` instead of `memcpy(void* dest, void* src, size_t num)`. One can generate a `memcpy` at compile-time that is optimized for the machine that the program is being compiled on (or for). druntime could expose "memcpy configuration settings" for users to tune at compile-time.

But then you have to deal with distribution of binaries. If you are compiling a binary that you want to be able to run on all Intel 64-bit PCs, for example, you can't do that tuning at compile-time; it has to be done at runtime. Assuming my understanding is correct, Agner Fog's implementation sets a function pointer to the most optimized implementation for the machine the program is running on, based on an inspection of the CPU's capabilities at the first invocation of `memcpy`.

There's a lot of things like this to consider in order to create a professional `memcpy` implementation. Personally, I'd just like to put the infrastructure in place so those more talented than I can tune it. But as I said before, the first PR that puts said infrastructure in place needs to be justified, and I predict it will be difficult to overcome bias and perception. Reading the comments in this thread fills me with a little more optimism that I'm not the only one who thinks it's a good idea.

But, we still need dynamic stack allocation first before any of this can happen.

Mike
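The dispatch-on-first-call pattern attributed to Agner Fog's library can be sketched in D roughly like this. Everything here is illustrative: `hasFancySimd` stands in for a real feature probe (e.g. `core.cpuid`), and both implementations are the same plain loop:

```d
alias CopyFn = void function(void* dst, const void* src, size_t n);

// Baseline implementation that works everywhere.
void genericCopy(void* dst, const void* src, size_t n)
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const ubyte*) src;
    foreach (i; 0 .. n) d[i] = s[i];
}

// Stand-in for an AVX/AVX512 path; same behavior in this sketch.
void simdCopy(void* dst, const void* src, size_t n)
{
    genericCopy(dst, src, n);
}

bool hasFancySimd() { return false; }  // placeholder CPU-feature probe

// First call lands here: pick the best implementation once, rebind
// the pointer, then delegate. Later calls go straight to the target.
void resolveAndCopy(void* dst, const void* src, size_t n)
{
    copyImpl = hasFancySimd() ? &simdCopy : &genericCopy;
    copyImpl(dst, src, n);
}

__gshared CopyFn copyImpl = &resolveAndCopy;
```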
Jan 09
prev sibling parent reply Mike Franklin <slavo5150 yahoo.com> writes:
On Wednesday, 9 January 2019 at 12:31:13 UTC, Patrick Schluter 
wrote:

 AVX512 concerns only a very small part of processors on the 
 market (Skylake, Canon Lake and Cascade Lake). AMD will never 
 implement it and the number of people upgrading to one of the 
 lake cpus from some recent chip is also not that great.
Yes, I agree, and even the newer chips have "Enhanced REP MOVSB and STOSB operation (ERMSB)" which can compensate. See https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf, section 3.7.6.
 I don't see why not having it implemented yet is blocking 
 anything. People who really need AVX512 performance will have 
 implemented memcpy themselves already and for the others, they 
 will have to wait a little bit. It's not as if it couldn't be 
 added later. I really don't understand the problem.
I remember analyzing other implementations of `memcpy` and they were all using AVX512. I had faith that the authors of those implementations (e.g. Agner Fog) knew more than me, so that was what I should be using. Perhaps I should revisit it and just do the best that DMD can do.

But also keep in mind that there's a strategy to getting things accepted in DMD and elsewhere. You are often battling perception. The single most challenging aspect of implementing `memcpy` in D is overcoming bias and justifying it to the obstructionists that see it as a complete waste of time. If I can't implement it with AVX512, simply for the purpose of measurement and comparison, it will be more difficult to justify.
 This said, another issue with memcpy that very often gets lost 
 is that, because of the fancy benchmarking, its system 
 performance cost is often wrongly assessed, and a lot of heroic 
 efforts are put in optimizing big block transfers, while in 
 reality it's mostly called on small (postblit) to medium 
 blocks. Linus Torvalds had once a rant on that subject on 
 realworldtech.
 https://www.realworldtech.com/forum/?threadid=168200&curpostid=168589
I understand. I also encountered a lot of difficulty getting consistent measurements in my exploration. Doing proper measurement and analysis for this kind of thing is a skill in and of itself.

You're right about the small copies being the norm. As part of my exploration, I wrote a logging `memcpy` wrapper to see what kind of copies DMD was doing when it compiled itself, and it was as you describe.

Perhaps I'll give it another go at a later time, but we need to get dynamic stack allocation working first, because many of the runtime hook implementations that will utilize `memcpy` do some error checking and assertions, and we need to be able to generate dynamic error messages for those assertions when the caller is `pure`. We need a solution to this (https://issues.dlang.org/show_bug.cgi?id=18788) first.

Mike
Jan 09
parent reply Ethan <gooberman gmail.com> writes:
On Thursday, 10 January 2019 at 00:10:18 UTC, Mike Franklin wrote:
 I remember analyzing other implementations of `memcpy` and they 
 were all using AVX512.  I had faith in the authors of those 
 implementations (e.g. Agner Fog) that they knew more than me, 
 so that was what I should be using. Perhaps I should revisit it 
 and just do the best that DMD can do.
AVX512 is a superset of AVX2, is a superset of AVX, is a superset of SSE. I expect the implementations you were looking at are actually implemented in SSE, where SSE2 is a baseline expectation for x64 processors. I've done some AVX2 code recently with 256-bit values. The performance is significantly slower on AMD processors. I assume their pipeline internally is still 128 bit as a result, and while my 256-bit code can run faster on Intel it needs to run on AMD so I've dropped to 128-bit instructions at most - effectively keeping my code SSE4.1 compatible. I've done a memset_pattern4[1] implementation in SSE previously. The important instruction group is _mm_stream. Which, you will note, was an instruction group first introduced in SSE1 and hasn't had additional writing stream functions added since SSE 4.1[2]. [1] https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html [2] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=5119,5452,5443,5910,5288,5119,5249,5231&text=_mm_stream
Jan 10
next sibling parent Ethan <gooberman gmail.com> writes:
On Thursday, 10 January 2019 at 10:13:57 UTC, Ethan wrote:
 I've done a memset_pattern4[1] implementation in SSE 
 previously. The important instruction group is _mm_stream. 
 Which, you will note, was an instruction group first introduced 
 in SSE1 and hasn't had additional writing stream functions 
 added since SSE 4.1[2].
Where's the edit button? The last writing stream function was added in SSE2; a streaming load was added in SSE 4.1. I believe I used that load when optimising string compares.
Jan 10
prev sibling parent reply luckoverthere <luckoverthere gmail.cm> writes:
On Thursday, 10 January 2019 at 10:13:57 UTC, Ethan wrote:
 On Thursday, 10 January 2019 at 00:10:18 UTC, Mike Franklin 
 wrote:
 I remember analyzing other implementations of `memcpy` and 
 they were all using AVX512.  I had faith in the authors of 
 those implementations (e.g. Agner Fog) that they knew more 
 than me, so that was what I should be using. Perhaps I should 
 revisit it and just do the best that DMD can do.
AVX512 is a superset of AVX2, is a superset of AVX, is a superset of SSE. I expect the implementations you were looking at are actually implemented in SSE, where SSE2 is a baseline expectation for x64 processors. I've done some AVX2 code recently with 256-bit values. The performance is significantly slower on AMD processors. I assume their pipeline internally is still 128 bit as a result, and while my 256-bit code can run faster on Intel it needs to run on AMD so I've dropped to 128-bit instructions at most - effectively keeping my code SSE4.1 compatible. I've done a memset_pattern4[1] implementation in SSE previously. The important instruction group is _mm_stream. Which, you will note, was an instruction group first introduced in SSE1 and hasn't had additional writing stream functions added since SSE 4.1[2]. [1] https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html [2] https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=5119,5452,5443,5910,5288,5119,5249,5231&text=_mm_stream
That's disappointing to learn. Ryzen has four 128-bit AVX units: 2 of them can only do addition and the other 2 can only do multiplication. Not sure how memory is shared between the units, but if it isn't, it'd need a copy to do an addition followed by a multiplication.
Jan 10
parent reply Ethan <gooberman gmail.com> writes:
On Thursday, 10 January 2019 at 21:01:09 UTC, luckoverthere wrote:
 That's disappointing to learn. Ryzen has four 128-bit AVX 
 units, 2 of them can only do addition and the other 2 can only 
 do multiplication. Not sure how the memory is shared between 
 units but if it isn't then it'd need to copy to be able to do 
 an addition then a multiplication.
The good news though is that Ryzen's 128-bit pipeline outperforms my Skylake i7 with this code. So you could say they've optimised for the majority use case. It's reaaaaaally beneficial to do 256-bit logic for my particular use case here since I'm sampling and operating on 8 32-bit values at a time to produce a 32-bit output. But eh, I've gotta write for the build farm hardware.
Jan 11
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Friday, 11 January 2019 at 09:36:09 UTC, Ethan wrote:
 On Thursday, 10 January 2019 at 21:01:09 UTC, luckoverthere 
 wrote:
 That's disappointing to learn. Ryzen has four 128-bit AVX 
 units, 2 of them can only do addition and the other 2 can only 
 do multiplication. Not sure how the memory is shared between 
 units but if it isn't then it'd need to copy to be able to do 
 an addition then a multiplication.
The good news though is that Ryzen's 128-bit pipeline outperforms my Skylake i7 with this code. So you could say they've optimised for the majority usecase. It's reaaaaaally beneficial to do 256-bit logic for my particular use case here since I'm sampling and operating on 8 32-bit values at a time to produce a 32-bit output. But eh, I've gotta write for the build farm hardware.
Hi Ethan, could you share a piece of code to do that? Thank you
Jan 11
parent reply Ethan <gooberman gmail.com> writes:
On Friday, 11 January 2019 at 11:10:10 UTC, bioinfornatics wrote:
 Hi Ethan, could you share a piece of code to do that?

 Thank you
Not really. 1) It's very context specific 2) It's for my current employer and is subject to the usual code disclosure NDAs
Jan 11
parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Friday, 11 January 2019 at 11:47:20 UTC, Ethan wrote:
 On Friday, 11 January 2019 at 11:10:10 UTC, bioinfornatics 
 wrote:
 Hi Ethan, could you share a piece of code to do that?

 Thank you
Not really. 1) It's very context specific 2) It's for my current employer and is subject to the usual code disclosure NDAs
OK I understand, no problem 😉 So I could try to use this idea for training. As an example: take 8 values of 32 bits and return the sum, or something similar. But I thought AMD had 2 units for addition and 2 for multiplication. I need to get a better understanding of this topic 🤔
Jan 11
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2019-01-09 12:49, Mike Franklin wrote:

 In DMD you can't use it without linking in the runtime, but in LDC and 
 GDC, you can.  One of the goals of implementing these runtime hooks as 
 templates is to make more features available in -betterC builds, or for 
 pay-as-you-go runtime implementations. If you need to link in druntime 
 to get `alloca`, you can't implement the runtime hooks as templates and 
 have them work in -betterC.
Ah, I see.
 Yes, it's possible, but I don't think it will ever be accepted if it 
 doesn't perform at least as well as the optimized versions in C or 
 assembly that use AVX512 or other SIMD features.  It needs to be at 
 least as good as what libc provides, so we need to be able to leverage 
 these unique hardware features to get the best performance.
Perhaps it could be considered as a fallback when a "memcpy" isn't available. -- /Jacob Carlborg
Jan 09
parent Mike Franklin <slavo5150 yahoo.com> writes:
On Wednesday, 9 January 2019 at 19:24:28 UTC, Jacob Carlborg 
wrote:

 Yes, it's possible, but I don't think it will ever be accepted 
 if it doesn't perform at least as well as the optimized 
 versions in C or assembly that use AVX512 or other SIMD 
 features.  It needs to be at least as good as what libc 
 provides, so we need to be able to leverage these unique 
 hardware features to get the best performance.
Perhaps it could be considered as a fallback when a "memcpy" isn't available.
I'm not sure what you mean. DMD currently links in libc, so `memcpy` is always available. Also, it's difficult for me to articulate, but we don't want `void* memcpy(void* destination, const void* source, size_t num)` rewritten in D. We need `void memcpy(T)(T* destination, const T* source)` or some other strongly typed template like that. And as an aside, thanks to https://github.com/dlang/dmd/pull/8504 we now have to be careful about the order of arguments. Anyway, I'm not sure there's much point in hashing this out right now. We need dynamic stack allocation first before any of this can happen because the runtime hooks need to be able to generate dynamic assertion messages in -betterC, and there's only one person I know of that can do that (Walter), and I don't think it's a priority for him right now. Mike
Jan 09