digitalmars.D - Semantics of toString

Justin Johansson (14/14) Nov 05 2009 I assert that the semantics of "toString" or similarly named/purposed me...

Michal Minich (10/34) Nov 05 2009 My practice tells me to use toString only for debugging - to quickly get...
Ary Borenszweig (7/8) Nov 05 2009 A string useful for debugging purposes and, when possible, useful for
Jesse Phillips (5/27) Nov 05 2009 Well some Java author said that the toString method was only intended for...
Justin Whear (4/26) Nov 05 2009 Two things:
Nick Sabalausky (9/28) Nov 05 2009 (Deliberately not reading the other replies before posting...)
Don (3/19) Nov 05 2009 It's a hack from the early days of D. Should be unavailable unless the

Nick Sabalausky (3/7) Nov 05 2009 What don't you like about it?

Don (18/28) Nov 05 2009 It cannot even do the most basic stuff.

Yigal Chripun (17/45) Nov 06 2009 The first issue you raise is IMO a problem with writefln and not with

Pelle Månsson (3/56) Nov 06 2009 How do you do %.3f in {}-notation?

Yigal Chripun (12/68) Nov 06 2009 That is incorrect since in my example I use the format string to switch

Leandro Lucarella (20/28) Nov 06 2009 This is horrible, horrible for internationalization, you just can't assu...

Andrei Alexandrescu (13/40) Nov 06 2009 I think you found a bug in Phobos. I tried this:

Leandro Lucarella (25/54) Nov 06 2009 Yes.

Andrei Alexandrescu (4/52) Nov 06 2009 Thanks!

Yigal Chripun (24/46) Nov 06 2009 F in the above is _not_ a type specifier. It is a format specifier that

Andrei Alexandrescu (4/79) Nov 06 2009 Not sure to what extent it helps, but Phobos supports positional

dsimcha (10/29) Nov 05 2009 readers

div0 (17/20) Nov 05 2009 -----BEGIN PGP SIGNED MESSAGE-----

Justin Johansson (6/26) Nov 05 2009 There are some interesting replies coming along here. Thanks everybody ...

Lutger (4/5) Nov 08 2009 Whatever you got, give it to me as a string for my printf debugging whil...
Lutger (14/22) Nov 08 2009 My other reply didn't take the language agnostic into account, sorry.

Justin Johansson (16/47) Nov 08 2009 Thanks for that Lutger.

Lutger (10/67) Nov 10 2009 Your design makes better sense (to me at least) because it is based on w...

Don (6/75) Nov 10 2009 There is a definite use for such as thing. But the existing toString()

Lutger (5/11) Nov 10 2009 Since you are in the know and probably the biggest toString() hater arou...

Justin Johansson (11/24) Nov 10 2009 I have a feeling (and I may well be wrong) that toString might be used i...

Bill Baxter (17/39) Nov 10 2009 nd:

Justin Johansson (5/47) Nov 10 2009 I think you are right; if I can dig up what it was, and if relevant to t...

Don (7/18) Nov 10 2009 I'm hoping someone will come up with a design.

Justin Johansson (2/26) Nov 10 2009 That's starting to look like a "serialize" method!

Steven Schveighoffer (13/43) Nov 10 2009 As it should. I should be able to print a 10000 element container witho...

Andrei Alexandrescu (6/57) Nov 10 2009 Walter does not feel strongly about Phobos. The save() method in "On

Bill Baxter (13/33) Nov 10 2009 That looks pretty good, actually.

Don (18/56) Nov 10 2009 The thing is, the toString() function is essentially a virtual function

Bill Baxter (15/77) Nov 10 2009 Structs can't have virtual functions... so what do you mean?

grauzone (5/28) Nov 10 2009 Just put it into an "interface DebugOutput", remove Object.toString(),

Don (4/19) Nov 10 2009 How are you supposed to print one with it? It doesn't help.

grauzone (5/26) Nov 10 2009 Structs are a different matter. Nothing dictates that a struct should

Don (3/32) Nov 10 2009 This discussion is about that hack. Yes, it might be unnecessary if

Andrei Alexandrescu (4/28) Nov 10 2009 I think the best option for toString is to take an output range and

Denis Koroskin (4/27) Nov 10 2009 It means toString() must be either a template, or accept an abstract

Andrei Alexandrescu (3/34) Nov 10 2009 It should take an interface.

Bill Baxter (8/46) Nov 10 2009 before

Andrei Alexandrescu (3/40) Nov 10 2009 I am not sure. Opinions as always are welcome.

Bill Baxter (11/64) Nov 10 2009 g

Don (8/59) Nov 11 2009 It also needs to be used by structs, which aren't inherited from Object....

Denis Koroskin (7/49) Nov 11 2009 Some ranges may be polymorphic, so having base interface hierarchy in

Andrei Alexandrescu (5/59) Nov 11 2009 It can't be clone() because it doesn't clone. For example say you have a...

Denis Koroskin (6/65) Nov 11 2009 Well, range doesn't own any of the contents it covers, so deep copy is

Andrei Alexandrescu (5/9) Nov 11 2009 Well so the second sentence contradicts the first. Let me put it another...

Bill Baxter (4/13) Nov 11 2009 makeBreadCrumb() ?

Philippe Sigaud (12/16) Nov 11 2009 different name - opSlice

Denis Koroskin (3/22) Nov 11 2009 It remembers array bounds, not contents.

Steven Schveighoffer (38/41) Nov 12 2009 Bad idea...

Steven Schveighoffer (4/17) Nov 12 2009 Oops, I meant 3 virtual functions -- front, popNext, and empty.

Denis Koroskin (10/32) Nov 12 2009 Output range has only one method: put.

Steven Schveighoffer (18/54) Nov 12 2009 I was referring to range's ability to interact with foreach. An output ...

Andrei Alexandrescu (3/5) Nov 12 2009 I think this particular point is incorrect.

dsimcha (21/26) Nov 12 2009 Most of the overhead from indirect function calls come from the fact tha...

Andrei Alexandrescu (7/47) Nov 12 2009 I think that, on the contrary, working with a delegate is less generic.

Don (7/54) Nov 12 2009 How? It seems to introduce more requirements on the implementation, but

Andrei Alexandrescu (4/59) Nov 12 2009 That seems plausible.

Justin Johansson (5/55) Nov 12 2009 Which you mean -- interfaces, classes or both?

Andrei Alexandrescu (3/56) Nov 12 2009 My understanding is that the costs are comparable.

Andrei Alexandrescu (30/81) Nov 12 2009 You are right. If range interfaces accommodate block transfers, this

Steven Schveighoffer (54/115) Nov 12 2009 IIRC, I don't think C++ iostreams use polymorphism, and I don't think th...

Andrei Alexandrescu (23/137) Nov 12 2009 Oh yes they do. (Did you even google?) Virtual multiple inheritance, the...

Steven Schveighoffer (28/81) Nov 12 2009 From my C++ book, it appears to only use virtual inheritance. I don't ...

Andrei Alexandrescu (14/109) Nov 12 2009 You're right, but there is an issue because as far as I can recall these...

Bill Baxter (16/27) Nov 12 2009 d
Steven Schveighoffer (16/27) Nov 12 2009 Yep, you are right. It appears the reason they do this is so the

Andrei Alexandrescu (8/43) Nov 12 2009 One problem I just realized is that, if we e.g. offer only put(in

Steven Schveighoffer (22/60) Nov 12 2009 char[1] buf;

Andrei Alexandrescu (5/74) Nov 12 2009 I was just thinking of offering an interface that offers utf8 and utf16

Steven Schveighoffer (37/98) Nov 12 2009 :O

Andrei Alexandrescu (6/115) Nov 12 2009 Well a stack-allocated buffer is stack-allocated, and passing a slice

Bill Baxter (5/7) Nov 12 2009 Nonsense! Developers spend a lot of time debugging. Helping people

Andrei Alexandrescu (8/18) Nov 12 2009 Sorry sorry. I just meant to say it's not worth coming with an airtight

Steven Schveighoffer (9/24) Nov 12 2009 The main purpose to serialize is to be able to deserialize. The main

Yigal Chripun (9/39) Nov 12 2009 I'd add to that the a format facility should be locale aware as in .Net.

Steven Schveighoffer (6/8) Nov 12 2009 Debugging is not always done by the developer on his system where a

Steven Schveighoffer (92/104) Nov 12 2009 Some rudimentary attempts at benchmarking:

dsimcha (6/20) Nov 12 2009 Your benchmarks don't show that the direct call is much faster. You had...

Steven Schveighoffer (12/36) Nov 12 2009 The direct call was 5 seconds faster. Divide by 10 billion and you get ...

dsimcha (8/45) Nov 12 2009 Yes, about 0.5 nanoseconds. In other words, if your CPU is roughly 2 GH...

Justin Johansson (10/86) Nov 10 2009 s/over-my-dead-body/over-your-dead-body/ :-)
Bill Baxter (17/94) Nov 10 2009 of

bearophile (4/6) Nov 10 2009 I have added a toString to my copy of the BigInt.
Don (16/97) Nov 10 2009 I almost always want to print the value out in hex. And with some kind

bearophile (7/9) Nov 10 2009 This may help:

Bill Baxter (4/11) Nov 10 2009 Though they may be useful, those don't look to have anything to do

bearophile (4/6) Nov 10 2009 Don has said: "But the performance would still be very poor, and that's ...

Don (3/9) Nov 10 2009 It's problem 2 from my original posts: being able to output something
Bill Baxter (5/9) Nov 10 2009 Maybe it's just my ignorance of BigNum issues, but those links look to

bearophile (13/16) Nov 10 2009 Look the numeral() function inside here from those blog posts:

Bill Baxter (11/25) Nov 10 2009 r by 10, and accumulate the modulus as the digit, converted to ['0', '9'...

bearophile (4/5) Nov 10 2009 You are welcome.

Bill Baxter (5/12) Nov 10 2009 ut
Denis Koroskin (74/191) Nov 10 2009 Yes, it would solve half of the toString problems.

Bill Baxter (55/191) Nov 10 2009 cs

Don (10/161) Nov 10 2009 One thing it doesn't (easily) handle is the case where an int argument

bearophile (4/6) Nov 10 2009 See my post about vectorized lazyness.
Genghis Khan (2/117) Nov 12 2009 Asian users have a prominent...
HOSOKAWA Kenchi (5/32) Nov 12 2009 That is true. UTF8 works well.

Justin Johansson <free beer.com> writes:

I assert that the semantics of "toString" or similarly named/purposed
methods/functions in many PL's (including and not limited to D) is
ill-defined.

To put this statement into perspective, I would be most appreciative
of D NG readers responding with their own idea(s) of what the
semantics of "toString" are (or should be) in a language agnostic
ideology.

If there are more than, say, two or three different views on the said
semantics then my "ill-definition" assertion is surely correct.

If there are no replies on this matter, then guess I'm left
concludeless.

Just thinking in the language round-up that this is (just another) one
of the things we should address as a community.

So what does "toString" mean to you?

**beers,
Justin

**caveat: free beer offer available in-store only

Nov 05 2009

Michal Minich <michal minich.sk> writes:

Hello Justin,

My practice tells me to use toString only for debugging - to quickly get a 
string representation of an object in human-readable format - nothing else, 
ever. So it is good that toString is part of D's Object class.

It is quite unsuitable for, e.g., serializing an object to XML/HTML or other 
formats. You may find out later that your object should be toString-ed not 
only to XML, but now to JSON as well... Better to use a specific method for 
a specific purpose.
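
As a rough illustration of that separation (a minimal sketch; the Point type and the toXml/toJson method names are made up for this example and are not part of any library), toString stays a debugging aid and each output format gets its own method:

```d
import std.format : format;

// Hypothetical type used only for illustration.
struct Point
{
    int x, y;

    // Debugging aid: human-readable, no format guarantees.
    string toString() const
    {
        return format("Point(%s, %s)", x, y);
    }

    // Dedicated method per serialization format.
    string toXml() const
    {
        return format(`<point x="%s" y="%s"/>`, x, y);
    }

    string toJson() const
    {
        return format(`{"x": %s, "y": %s}`, x, y);
    }
}

void main()
{
    auto p = Point(1, 2);
    assert(p.toString() == "Point(1, 2)");
    assert(p.toXml() == `<point x="1" y="2"/>`);
    assert(p.toJson() == `{"x": 1, "y": 2}`);
}
```

Adding a new output format then means adding a method, not changing what toString returns.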

What concerns me more about Object's methods is that opEquals is one of 
them. But that is a different story.

Nov 05 2009

Ary Borenszweig <ary esperanto.org.ar> writes:

Justin Johansson wrote:
 So what does "toString" mean to you?

A string useful for debugging purposes and, when possible, useful for 
programming tasks.

For example in Java there's StringWriter and the toString method returns 
the String being written; I think that's fine. An XML node might return 
its XML representation. But most of the time an object doesn't have a 
use as a string.

Nov 05 2009

Jesse Phillips <jessekphillips+D gmail.com> writes:

Justin Johansson Wrote:

 So what does "toString" mean to you?

Well, some Java author said that the toString method was only intended for
debugging, but list containers use it, so... I don't have that reference :(

You can also check out the question on StackOverflow

http://stackoverflow.com/questions/563676/is-tostring-only-useful-for-debugging

But personally, output to the end-user should not come from toString and
program logic should not be based on the string returned from toString.

Nov 05 2009

Justin Whear <justin economicmodeling.com> writes:

Justin Johansson Wrote:

 So what does "toString" mean to you?

Two things:
1) Primarily for debugging purposes. It's very convenient.
2) As a default behavior in a few cases. For instance, if a listbox widget
hasn't been given a view that knows how to render objects of type Foo, it can
default to rendering the results of toString.

Nov 05 2009

"Nick Sabalausky" <a a.a> writes:

"Justin Johansson" <free beer.com> wrote in message 
news:hcuhet$15a2$1 digitalmars.com...
 So what does "toString" mean to you?

(Deliberately not reading the other replies before posting...)

It means to me, obtain a string-representation of an object (or an instance 
of a non-class type) in whatever form is reasonably appropriate for the 
given type. This string representation might include all data, but this is 
not guaranteed. It might be unique to each object, but this is not 
guaranteed. It might be fully-suitable for serialization, but this is not 
guaranteed.

Nov 05 2009

Don <nospam nospam.com> writes:

Justin Johansson wrote:
 So what does "toString" mean to you?

It's a hack from the early days of D. Should be unavailable unless the 
-debug flag is set, to discourage people from using it. I hate it.

Nov 05 2009

"Nick Sabalausky" <a a.a> writes:

"Don" <nospam nospam.com> wrote in message 
news:hcvf9l$91i$1 digitalmars.com...
 Justin Johansson wrote:
 So what does "toString" mean to you?

 It's a hack from the early days of D. Should be unavailable unless 
 the -debug flag is set, to discourage people from using it. I hate it.

What don't you like about it?

Nov 05 2009

Don <nospam nospam.com> writes:

Nick Sabalausky wrote:
 "Don" <nospam nospam.com> wrote in message 
 news:hcvf9l$91i$1 digitalmars.com...
 Justin Johansson wrote:
 So what does "toString" mean to you?

 It's a hack from the early days of D. Should be unavailable unless 
 the -debug flag is set, to discourage people from using it. I hate it.

 
 What don't you like about it?
 

It cannot even do the most basic stuff.
(1) You can't even make a struct that behaves like an int.

struct MyInt
{
     int z;
     string toString() { .... }
}

void main()
{
    int a = 400;
    MyInt b = 400;
    writefln("%05d %05d", a, b);
    writefln("%x %x", a, b);
}

(2) It doesn't behave like a stream. Suppose you have XmlDoc.toString().
You can't emit the doc piece by piece. You have to create the ENTIRE 
string in one go!
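
For comparison, point (1) can be sketched as follows, assuming a Phobos recent enough to offer the toString overload that receives a writer and a FormatSpec (as far as I know this was not available when this post was written). MyInt is built with MyInt(400) here because a plain "MyInt b = 400;" needs a constructor:

```d
import std.format : FormatSpec, format, formatValue;
import std.range.primitives : isOutputRange;

struct MyInt
{
    int z;

    // Forward whatever format specifier the caller used to the wrapped int.
    void toString(W)(ref W writer, scope const ref FormatSpec!char fmt)
    if (isOutputRange!(W, char))
    {
        formatValue(writer, z, fmt);
    }
}

void main()
{
    int a = 400;
    auto b = MyInt(400);

    // The struct now formats exactly like the built-in int.
    assert(format("%05d %05d", a, b) == "00400 00400");
    assert(format("%x %x", a, b) == "190 190");
}
```

The struct never builds a string itself; it only decides which values to hand to the formatter.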

Nov 05 2009

Yigal Chripun <yigal100 gmail.com> writes:

On 06/11/2009 07:34, Don wrote:
 It cannot even do the most basic stuff.
 (1) You can't even make a struct that behaves like an int.

 struct MyInt
 {
 int z;
 string toString() { .... }
 }

 void main()
 {
 int a = 400;
 MyInt b = 400;
 writefln("%05d %05d", a, b);
 writefln("%x %x", a, b);
 }

 (2) It doesn't behave like a stream. Suppose you have XmlDoc.toString()
 You can't emit the doc, piece by piece. You have to create the ENTIRE
 string in one go!

The first issue you raise is IMO a problem with writefln and not with 
toString, since writefln doesn't handle user-defined types properly.

I think that writefln (btw, horrible name) should only deal with strings 
and their formatting, and all other types need to provide an (optionally 
formatted) string.
A numeric type would provide formatting properties like the number of 
decimal places, thousands separator, etc., while a user-defined 
Specification type could provide a choice of standard format.

auto spec = new Specification(HTML);
string ansi = spec.toString(Specification.ANSI);
string iso = spec.toString(Specification.ISO);


The C-style format string that specifies types is a horrible, horrible 
thing and should be removed.

Regarding the second issue:
foreach (node; XmlDoc.preOrder()) writefln("{0}", node.toString());
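
An alternative to iterating from the outside is to let the type push its pieces into a sink. A minimal sketch, assuming a Phobos whose std.format recognizes the delegate-sink toString overload; the XmlDoc type and its tags field are hypothetical stand-ins:

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical document type; only the sink-based toString matters here.
struct XmlDoc
{
    string[] tags;

    // Emits the document piece by piece instead of building one big string.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        foreach (tag; tags)
        {
            sink("<");
            sink(tag);
            sink("/>");
        }
    }
}

void main()
{
    auto doc = XmlDoc(["a", "b"]);

    // Call the sink directly: each piece arrives separately.
    size_t pieces;
    doc.toString((const(char)[] s) { ++pieces; });
    assert(pieces == 6);

    // std.format picks up the same overload for "%s".
    auto buf = appender!string();
    formattedWrite(buf, "%s", doc);
    assert(buf.data == "<a/><b/>");
}
```

The caller decides where the pieces go (a file, a socket, a buffer), so the whole document never has to exist as a single string.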

Nov 06 2009

Pelle Månsson <pelle.mansson gmail.com> writes:

Yigal Chripun wrote:
 auto spec = new Specification(HTML);
 string ansi = spec.toString(Specification.ANSI);
 string iso = spec.toString(Specification.ISO);

 
 the c style format string that specifies types is a horrible horrible 
 thing and should be removed.

How do you do %.3f in {}-notation?

Your formatting string should be written as writeln(ansi, " ", iso);

Nov 06 2009

Yigal Chripun <yigal100 gmail.com> writes:

On 06/11/2009 12:34, Pelle Månsson wrote:
 How do you do %.3f in {}-notation?

writefln("{0:F3}", value);

 Your formatting string should be written as writeln(ansi, " ", iso);

That is incorrect, since in my example I use the format string to switch 
the order of the strings (hence the numbers inside the {}).

Please go and read the Tango documentation starting with 
http://www.dsource.org/projects/tango/wiki/TutCSharpFormatter
It also has links to the MSDN docs which describe the modifiers, 
for instance:
http://msdn.microsoft.com/en-us/library/dwhawy9k%28VS.100%29.aspx

This is one area in Phobos that needs to be rewritten from scratch or, 
better yet, use Tango. I'm still waiting for when hell will freeze over 
and Tango and Phobos will be merged together in one consistent API.

Nov 06 2009

Leandro Lucarella <llucax gmail.com> writes:

Yigal Chripun, on November 6 at 14:23, you wrote to me:
the c style format string that specifies types is a horrible horrible
thing and should be removed.


 
How do you do %.3f in {}-notation?

 
 writefln("{0:F3}", value);
 
Your formatting string should be written as writeln(ansi, " ", iso);


This is horrible, horrible for internationalization; you just can't assume
how a language orders words.

Anyway, about the type in the format, I think it's nice; as you just
proved, Tango has it too: "{0:F3}" is saying "treat the value as a float
and format it that way". The deal is, the type should not be used to know
the size of the parameter on the stack like in C's printf(); it should be
just a hint to convert the value to another type.

So, type specification is important. Variable reordering is important
too, and you even have it in POSIX's printf():

	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);

(see http://www.opengroup.org/onlinepubs/9699919799/functions/printf.html)

I like printf()'s format (I don't know if it's just because I'm used to it
though :).
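
To make the reordering point concrete on the D side, here is a minimal sketch using the positional %N$ syntax that Phobos' std.format accepts; the two message strings are invented stand-ins for translations:

```d
import std.format : format;

void main()
{
    // Hypothetical translations: same arguments, different word order.
    enum english = "%1$s owes %2$s euros";
    enum reordered = "%2$s euros are owed by %1$s";

    assert(format(english, "Ann", 5) == "Ann owes 5 euros");
    assert(format(reordered, "Ann", 5) == "5 euros are owed by Ann");
}
```

The call site passes the arguments in one fixed order, and only the format string changes per language.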

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
"All mail clients suck. This one just sucks less." -me, circa 1995

Nov 06 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 So, type specification is important. Variables reordering is important
 too, and you even have it in POSIX's printf():
 
 	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
 

I think you found a bug in Phobos. I tried this:

import std.stdio;

void main() {
     int hour = 1, min = 2, precision = 2, sec = 3;
     writef("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
}

and it prints

1:002:003

But it should really print:

1:02:03

right?


Andrei

Nov 06 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, on November 6 at 08:50, you wrote to me:
 
 I think you found a bug in Phobos. I tried this:
 
 import std.stdio;
 
 void main() {
     int hour = 1, min = 2, precision = 2, sec = 3;
     writef("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
 }
 
 and it prints
 
 1:002:003
 
 But it should really print:
 
 1:02:03
 
 right?

Yes.

------------------------
$ cat t.c
#include <stdio.h>

int main() {
	int hour = 1, min = 2, precision = 2, sec = 3;
	printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
	return 0;
}

$ make t
cc     t.c   -o t
$ ./t
1:02:03
-----------------------


Nov 06 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:


Thanks!

http://d.puremagic.com/issues/show_bug.cgi?id=3479

Andrei

Nov 06 2009

Yigal Chripun <yigal100 gmail.com> writes:

On 06/11/2009 15:38, Leandro Lucarella wrote:
 Anyway, about the type in the format, I think it's nice, as you just
 proved, tango have it too "{0:F3}" is saying "treat the value as a float
 and format it that way". The deal is, the type should not be used to know
 the size of the parameter in the stack like in C's printf(), it should be
 just a hint to convert the value to another type.


F in the above is _not_ a type specifier. It is a format specifier that 
means "fixed". Moreover, each type defines its own format specifiers, 
and there's also a way to do custom formatting.
Here are some more examples (from MSDN):

string myName = "Fred";
Console.WriteLine(String.Format("Name = {0}, hours = {1:hh}, minutes = {1:mm}",
    myName, DateTime.Now));
// Depending on the current time, the example displays output like the
// following:
//    Name = Fred, hours = 11, minutes = 30

string FormatString1 = String.Format("{0:dddd MMMM}", DateTime.Now);
string FormatString2 = DateTime.Now.ToString("dddd MMMM");

Console.WriteLine("{0:F}", DateTime.Now); // NOT float
// F for DateTime means Full date/time pattern (long time).

Another aspect of the .NET design is that it's locale-aware, 
e.g.

// Display using pt-BR culture's short date format
DateTime thisDate = new DateTime(2008, 3, 15);
CultureInfo culture = new CultureInfo("pt-BR");
Console.WriteLine(thisDate.ToString("d", culture));  // Displays 15/3/2008

Besides, the printf format is plain unreadable. It's like comparing 
ASCII to Unicode - D moved to native Unicode support and should move to 
this much better design as well.

Nov 06 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Yigal Chripun wrote:
 On 06/11/2009 12:34, Pelle Månsson wrote:
 Yigal Chripun wrote:
 On 06/11/2009 07:34, Don wrote:
 Nick Sabalausky wrote:
 "Don" <nospam nospam.com> wrote in message
 news:hcvf9l$91i$1 digitalmars.com...
 Justin Johansson wrote:
 So what does "toString" mean to you?

 It's a hack from the early days of D. Should be unavailable unless
 the -debug flag is set, to discourage people from using it. I hate 
 it.

 What don't you like about it?

 It cannot even do the most basic stuff.
 (1) You can't even make a struct that behaves like an int.

 struct MyInt
 {
 int z;
 string toString() { .... }
 }

 void main()
 {
 int a = 400;
 MyInt b = 400;
 writefln("%05d %05d", a, b);
 writefln("%x %x", a, b);
 }

 (2) It doesn't behave like a stream. Suppose you have XmlDoc.toString()
 You can't emit the doc, piece by piece. You have to create the ENTIRE
 string in one go!

 The first issue you raise is IMO a problem with writefln and not with
 toString since writefln doesn't handle user-defined types properly.

 I think that writefln (btw, horrible name) should only deal with
 strings and their formatting and all other types need to provide an
 (optionally formatted) string.
 a numeric type would provide formatting of properties like number of
 decimal places, thousands separator, etc while user defined
 specification type could provide a type of standard format.

 auto spec = new Specification(HTML);
 string ansi = spec.toString(Specification.ANSI);
 string iso = spec.toString(Specification.ISO);


 the c style format string that specifies types is a horrible horrible
 thing and should be removed.


 
 How do you do %.3f in {}-notation?

 
 writefln("{0:F3}", value);
 
 Your formatting string should be written as writeln(ansi, " ", iso);

 
 That is incorrect since in my example I use the format string to switch 
 the order of the strings. ( hence the numbers inside the {} )
 
 Please go and read the tango documentation starting with 
 http://www.dsource.org/projects/tango/wiki/TutCSharpFormatter
 it has also links to the MSDN docs which describe the modifiers:
 for instance:
 http://msdn.microsoft.com/en-us/library/dwhawy9k%28VS.100%29.aspx
 
 This is one area in phobos that needs to be rewritten from scratch or 
 better yet, use tango. I'm still waiting for when hell will freeze over 
 and tango and phobos will be merged together in one consistent API.

Not sure to what extent it helps, but Phobos supports positional 
parameters too.
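
For illustration, a minimal sketch of positional arguments with std.format. The helper name swapOrder is made up for this example; only format() and the %N$ position syntax come from Phobos.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: the format string, not the argument list,
// decides the order in which the two values appear.
string swapOrder(string first, string second)
{
    // %2$s picks the second argument, %1$s the first.
    return format("%2$s %1$s", first, second);
}

void main()
{
    writeln(swapOrder("ANSI", "ISO")); // prints "ISO ANSI"

    // A position can be combined with a precision, as in %.3f.
    writeln(format("%1$.3f", 3.14159));
}
```

A translated format string can then reorder the arguments without the call site changing.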

Andrei

Nov 06 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Don (nospam nospam.com)'s article
 Justin Johansson wrote:
 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of D
 NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 If there are more than, say, two or three different views on the said
 semantics then my "ill-definition" assertion is surely correct.

 If there are no replies on this matter, then guess I'm left concludeless.

 Just thinking in the language round-up that this is (just another) one of the
 things we should address as a community.

 So what does "toString" mean to you?

 It's a hack from the early days of D. Should be unavailable unless the
 -debug flag is set, to discourage people from using it. I hate it.

Why?  You've said this several times w/o giving your reason.  IMHO toString()
is a great way to get a default string representation of something.  If you
care about the formatting details, then you use a non-special method.  How else
would you recommend giving objects a sane default string representation?

Nov 05 2009

div0 <div0 users.sourceforge.net> writes:

dsimcha wrote:
 
 Why?  You've said this several times w/o giving your reason.  IMHO toString()
 is a great way to get a default string representation of something.

And that's *exactly* what is wrong.

There is *never* a good default for anything.
Just look at all the discussion of nullable.
(shit people even complain about float.init == NaN)

- --
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk

Nov 05 2009

Justin Johansson <no spam.com> writes:

Don Wrote:

 Justin Johansson wrote:
 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.
 
 To put this statement into perspective, I would be most appreciative of D
 NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.
 
 If there are more than, say, two or three different views on the said
 semantics then my "ill-definition" assertion is surely correct.
 
 If there are no replies on this matter, then guess I'm left concludeless.
 
 Just thinking in the language round-up that this is (just another) one of the
 things we should address as a community.
 
 So what does "toString" mean to you?

 
 It's a hack from the early days of D. Should be unavailable unless the 
 -debug flag is set, to discourage people from using it. I hate it.

There are some interesting replies coming along here.  Thanks everybody for
chipping in.

I must admit though, when I read Don's reply just now the first thought that
went through my mind was "Sweet!"

Justin

Nov 05 2009

Lutger <lutger.blijdestijn gmail.com> writes:

Justin Johansson wrote:

...
 So what does "toString" mean to you?

Whatever you got, give it to me as a string for my printf debugging while my 
debugger is broken.

Nov 08 2009

Lutger <lutger.blijdestijn gmail.com> writes:

Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.
 
 To put this statement into perspective, I would be most appreciative of D
 NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.
 

My other reply didn't take the language agnostic into account, sorry. 

Semantics of toString would depend on the object, I would think there are 
three general types of objects:

1. objects with only one sensible or one clear default string
representation, like integers. Maybe even none of these exist (except
strings themselves?)

2. objects that, given some formatting options or locale, have a clear string
representation. Floating-point numbers, dates, currency and the like.

3. objects that have no sensible default representation.

toString() would not make sense for 3) type objects and only for 2) type 
objects as part of a formatting / localization package. 

toString() as a debugging aid sometimes doubles as a formatter for 1) and 2) 
class objects, but that may be more confusing than it's worth.

Nov 08 2009

Justin Johansson <no spam.com> writes:

Lutger Wrote:

 Justin Johansson wrote:
 
 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.
 
 To put this statement into perspective, I would be most appreciative of D
 NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.
 

 
 My other reply didn't take the language agnostic into account, sorry. 
 
 Semantics of toString would depend on the object, I would think there are 
 three general types of objects:
 
 1. objects with only one sensible or one clear default string
 representation, like integers. Maybe even none of these exist (except
 strings themselves?)
 
 2. objects that, given some formatting options or locale, have a clear string
 representation. Floating-point numbers, dates, currency and the like.
 
 3. objects that have no sensible default representation.
 
 toString() would not make sense for 3) type objects and only for 2) type 
 objects as part of a formatting / localization package. 
 
 toString() as a debugging aid sometimes doubles as a formatter for 1) and 2) 
 class objects, but that may be more confusing than it's worth. 
 

Thanks for that Lutger.

Do you think it would make better sense if programming languages/their libraries
separated functions/methods which are currently loosely purposed as "toString"
into methods which are more specific to the types you suggest (leaving only
the types/classifications and number thereof to argue about)?

In my own D project, I've introduced a toDebugString method and left toString
alone. There are times when I like D's default toString printing out the name
of the object class.  For debug purposes there are also times when I like to
see a string printed out in quotes so you can tell the difference between "123"
and 123.  Then again, and since I'm working on a scripting language, sometimes
I like to see debug output distinguish between different numeric types.
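
A rough sketch of what such a debug conversion could look like. This is not the code from my project: the free function toDebugString and its output format are invented here purely to illustrate the "123" versus 123 distinction.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical debug formatter: strings are quoted, everything else
// is tagged with its static type name.
string toDebugString(T)(T value)
{
    static if (is(T : const(char)[]))
        return `"` ~ value.to!string ~ `"`;
    else
        return T.stringof ~ "(" ~ value.to!string ~ ")";
}

void main()
{
    writeln(toDebugString("123")); // "123"
    writeln(toDebugString(123));   // int(123)
    writeln(toDebugString(123L));  // long(123)
}
```

Keeping this separate from toString means the plain conversion stays free for whatever the type considers its normal text form.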

Anyway, going by the replies on this topic, it looks like most people view
toString as being good for debug purposes and that's about it.

Cheers
Justin

Nov 08 2009

Lutger <lutger.blijdestijn gmail.com> writes:

Justin Johansson wrote:

 Lutger Wrote:
 
 Justin Johansson wrote:
 
 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.
 
 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.
 

 
 My other reply didn't take the language agnostic into account, sorry.
 
 Semantics of toString would depend on the object, I would think there are
 three general types of objects:
 
 1. objects with only one sensible or one clear default string
 representation, like integers. Maybe even none of these exist (except
 strings themselves?)
 
 2. objects that, given some formatting options or locale, have a clear
 string representation. Floating-point numbers, dates, currency and the like.
 
 3. objects that have no sensible default representation.
 
 toString() would not make sense for 3) type objects and only for 2) type
 objects as part of a formatting / localization package.
 
 toString() as a debugging aid sometimes doubles as a formatter for 1) and
 2) class objects, but that may be more confusing than it's worth.
 

 
 Thanks for that Lutger.
 
 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?
 
 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.  Then
 again, and since I'm working on a scripting language, sometimes I like to
 see debug output distinguish between different numeric types.
 
 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.
 
 Cheers
 Justin
 

Your design makes better sense (to me at least) because it is based on why 
you want a string from some object. 

Take .NET for example: it does provide very elaborate and nice formatting
options based on toString() with parameters. For some types however, the
default toString() gives you the name of the type itself which is in no way 
related to formatting an object. You learn to work with it, but I find it a 
bit muddled. 

As a last note, I think people view toString as a debug thing mostly because 
it is very underpowered.

Nov 10 2009

Don <nospam nospam.com> writes:

Lutger wrote:
 Justin Johansson wrote:
 
 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account, sorry.

 Semantics of toString would depend on the object, I would think there are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representation, like integers. Maybe even none of these exist (except
 strings themselves?)

 2. objects that, given some formatting options or locale, have a clear
 string representation. Floating-point numbers, dates, currency and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2) type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1) and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.  Then
 again, and since I'm working on a scripting language, sometimes I like to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 
 Your design makes better sense (to me at least) because it is based on why 
 you want a string from some object. 
 
 Take .NET for example: it does provide very elaborate and nice formatting
 options based on toString() with parameters. For some types however, the
 default toString() gives you the name of the type itself which is in no way 
 related to formatting an object. You learn to work with it, but I find it a 
 bit muddled. 
 
 As a last note, I think people view toString as a debug thing mostly because 
 it is very underpowered.

There is a definite use for such a thing. But the existing toString()
is much, much worse than useless. People think you can do something with
it, but you can't.
E.g., people have asked for BigInt to support toString(). That is an
over-my-dead-body.

Nov 10 2009

Lutger <lutger.blijdestijn gmail.com> writes:

Don wrote:
...
 
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

Since you are in the know and probably the biggest toString() hater around: 
are there plans (or rejections thereof) to change toString() before D2 turns 
gold? Seems to me it could break quite some code.

Nov 10 2009

Justin Johansson <no spam.com> writes:

Lutger Wrote:

 Don wrote:
 ...
 
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 
 Since you are in the know and probably the biggest toString() hater around: 
 are there plans (or rejections thereof) to change toString() before D2 turns 
 gold? Seems to me it could break quite some code.
 

I have a feeling (and I may well be wrong) that toString might be used in
relation to associative arrays.  I implemented an AA recently based upon
a struct key (I think).  Though I cannot remember the exact details I do
remember DMD saying something about toString not implemented and
so without thinking I gave the struct a toString and that kept DMD happy.
Since the code was throw-away I didn't bother to investigate.

Like I say, I cannot remember the details but others may recall some similar
experience.  For all I know it may be a case of RTFM?

beers,
Justin

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 3:59 AM, Justin Johansson <no spam.com> wrote:
 Lutger Wrote:

 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.

 I have a feeling (and I may well be wrong) that toString might be used in
 relation to associative arrays.  I implemented an AA recently based upon
 a struct key (I think).  Though I cannot remember the exact details I do
 remember DMD saying something about toString not implemented and
 so without thinking I gave the struct a toString and that kept DMD happy.
 Since the code was throw-away I didn't bother to investigate.

 Like I say, I cannot remember the details but others may recall some similar
 experience.  For all I know it may be a case of RTFM?

Shouldn't be the case.  From TFM:
"""
Classes can be used as the KeyType. For this to work, the class
definition must override the following member functions of class
Object:

• hash_t toHash()
• bool opEquals(Object)
• int opCmp(Object)
"""

--bb

Nov 10 2009

Justin Johansson <no spam.com> writes:

Bill Baxter Wrote:

 On Tue, Nov 10, 2009 at 3:59 AM, Justin Johansson <no spam.com> wrote:
 Lutger Wrote:

 Don wrote:
 ...
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.

 I have a feeling (and I may well be wrong) that toString might be used in
 relation to associative arrays.  I implemented an AA recently based upon
 a struct key (I think).  Though I cannot remember the exact details I do
 remember DMD saying something about toString not implemented and
 so without thinking I gave the struct a toString and that kept DMD happy.
 Since the code was throw-away I didn't bother to investigate.

 Like I say, I cannot remember the details but others may recall some similar
 experience.  For all I know it may be a case of RTFM?

 
 Shouldn't be the case.  From TFM:
 """
 Classes can be used as the KeyType. For this to work, the class
 definition must override the following member functions of class
 Object:
 
 • hash_t toHash()
 • bool opEquals(Object)
 • int opCmp(Object)
 """
 
 --bb

I think you are right; if I can dig up what it was, and if relevant to this
discussion, I'll post it.  Ignore what I said for the moment.

Just wondering now though and in reference to Lutger's comment

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.



how much core code would be broken if toString was actually banished?

Nov 10 2009

Don <nospam nospam.com> writes:

Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 
 Since you are in the know and probably the biggest toString() hater around: 
 are there plans (or rejections thereof) to change toString() before D2 turns 
 gold? Seems to me it could break quite some code.


I'm hoping someone will come up with a design.

Straw man:

void toString(void delegate(const(char)[]) sink, string fmt) {

// fmt holds the format string from writefln/formatln.
// call sink() to print partial results.

}
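
To make the straw man concrete, here is a minimal sketch under a few assumptions of mine: the MyInt struct echoes the earlier example, the render helper is hypothetical, and formatting is simply delegated to std.format on the wrapped int. The method is called directly here rather than through writefln.

```d
import std.format : format;
import std.stdio : writeln;

struct MyInt
{
    int z;

    // Straw-man signature: fmt carries the format specifier text,
    // sink receives the partial results.
    void toString(scope void delegate(const(char)[]) sink, string fmt) const
    {
        // Reuse int formatting so that "%05d" and "%x" behave as for an int.
        sink(format(fmt.length ? fmt : "%s", z));
    }
}

// Hypothetical helper: gathers everything passed to the sink into a string.
string render(T)(const T value, string fmt)
{
    char[] buf;
    value.toString((const(char)[] s) { buf ~= s; }, fmt);
    return buf.idup;
}

void main()
{
    auto b = MyInt(400);
    writeln(render(b, "%05d")); // prints "00400"
    writeln(render(b, "%x"));   // prints "190"
}
```

The point of the sketch is only that the type sees the format string, which the existing no-argument toString() cannot offer.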

Nov 10 2009

Justin Johansson <no spam.com> writes:

Don Wrote:

 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 
 Since you are in the know and probably the biggest toString() hater around: 
 are there plans (or rejections thereof) to change toString() before D2 turns 
 gold? Seems to me it could break quite some code.

 
 
 I'm hoping someone will come up with a design.
 
 Straw man:
 
 void toString(void delegate(const(char)[]) sink, string fmt) {
 
 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
 
 }

That's starting to look like a "serialize" method!

Nov 10 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 10 Nov 2009 07:49:11 -0500, Justin Johansson <no spam.com> wrote:

 Don Wrote:

 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.


 I'm hoping someone will come up with a design.

 Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt) {

 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.

 }

 That's starting to look like a "serialize" method!

As it should.  I should be able to print a 10000 element container without  
having to load a string representation of 10000 elements in memory.
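
A small sketch of that idea, assuming a sink-style method on a made-up IntList container. The Collected struct and collect helper exist only to make the chunking visible; a real sink would write straight to a file or stdout.

```d
import std.stdio : writeln;

struct IntList
{
    int[] items;

    // Emits the container piece by piece; the text of the whole
    // list is never assembled inside this method.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        import std.conv : to;

        sink("[");
        foreach (i, item; items)
        {
            if (i > 0)
                sink(", ");
            sink(item.to!string); // only one element's text at a time
        }
        sink("]");
    }
}

// Hypothetical demo helper: counts the chunks and gathers the text.
struct Collected
{
    size_t chunks;
    char[] text;
}

Collected collect(const IntList list)
{
    Collected result;
    list.toString((const(char)[] s) { result.chunks++; result.text ~= s; });
    return result;
}

void main()
{
    auto r = collect(IntList([1, 2, 3]));
    writeln(r.text);   // prints "[1, 2, 3]"
    writeln(r.chunks); // prints 7
}
```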

I'd also like to see the name toString changed to something more  
appropriate, like output().

And although I think a direct translation is mostly possible, emulating  
writefln string formatting from tango would be a burden.  I don't know if  
there's any way around it without coming up with some complicated  
"formatting provider" interface/object implementation, and I don't think  
it's worth it.

Unfortunately, I doubt Walter accepts this, it's been proposed in the past  
without success.

-Steve

Nov 10 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 07:49:11 -0500, Justin Johansson <no spam.com> wrote:
 
 Don Wrote:

 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater around:
 are there plans (or rejections thereof) to change toString() before D2 turns
 gold? Seems to me it could break quite some code.


 I'm hoping someone will come up with a design.

 Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt) {

 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.

 }

 That's starting to look like a "serialize" method!

 
 As it should.  I should be able to print a 10000 element container 
 without having to load a string representation of 10000 elements in memory.
 
 I'd also like to see the name toString changed to something more 
 appropriate, like output().
 
 And although I think a direct translation is mostly possible, emulating 
 writefln string formatting from tango would be a burden.  I don't know 
 if there's any way around it without coming up with some complicated 
 "formatting provider" interface/object implementation, and I don't think 
 it's worth it.
 
 Unfortunately, I doubt Walter accepts this, it's been proposed in the 
 past without success.
 
 -Steve

Walter does not feel strongly about Phobos. The save() method in "On
Iteration" intentionally makes it possible to define ranges as interfaces,
which in turn should pave the way towards defining a coherent text
streaming mechanism.

Andrei

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 4:40 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString() before
 D2 turns gold? Seems to me it could break quite some code.


 I'm hoping someone will come up with a design.

 Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt) {

 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.

 }

That looks pretty good, actually.
I guess I would like to see plain no-arg toString() still supported.

A default toString() could be implemented in terms of the fancy one as:

string toString() {
     import std.exception : assumeUnique;
     char[] buf;
     toString( (const(char)[] s) { buf ~= s; }, "" );
     return assumeUnique(buf);
}

could be a mixin in a library I suppose.

I think I would like to see the format strings not necessarily tied to
writefln's particular format.

--bb

Nov 10 2009

Don <nospam nospam.com> writes:

Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 4:40 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString() before
 D2 turns gold? Seems to me it could break quite some code.

 I'm hoping someone will come up with a design.

 Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt) {

 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.

 }

 
 That looks pretty good, actually.
 I guess I would like to see plain no-arg toString() still supported.

The thing is, the toString() function is essentially a virtual function 
present in every struct. Each one of those functions needs a very strong 
justification to exist.

 A default toString() could be implemented in terms of the fancy one as:
 
 string toString() {
      import std.exception : assumeUnique;
      char[] buf;
      toString( (const(char)[] s) { buf ~= s; }, "" );
      return assumeUnique(buf);
 }
 
 could be a mixin in a library I suppose.

More for the benefit of consumers, or producers?

Because
void toString(void delegate(const(char)[]) sink, string fmt) {
    sink("xxx");
}
isn't much more complex than:
string toString()
{
   return "xxx";
}
other than the signature.


 
 I think I would like to see the format strings not necessarily tied to
 writefln's particular format.

I think the format strings are actually pretty similar, Tango vs 
writefln? There might be enough common ground. I think the Tango format 
is a slight superset of the writefln one.

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 7:29 AM, Don <nospam nospam.com> wrote:
 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 4:40 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString()
 before
 D2 turns gold? Seems to me it could break quite some code.

 I'm hoping someone will come up with a design.

 Straw man:

 void toString(void delegate(const(char)[]) sink, string fmt) {

 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.

 }

 That looks pretty good, actually.
 I guess I would like to see plain no-arg toString() still supported.

 The thing is, the toString() function is essentially a virtual function
 present in every struct. Each one of those functions needs a very strong
 justification to exist.

Structs can't have virtual functions... so what do you mean?

 A default toString() could be implemented in terms of the fancy one as:

 string toString() {
     import std.exception : assumeUnique;
     char[] buf;
     toString( (const(char)[] s) { buf ~= s; }, "" );
     return assumeUnique(buf);
 }

 could be a mixin in a library I suppose.

 More for the benefit of consumers, or producers?

Consumers.  I was just thinking it would be a little annoying to have
to reproduce the above 3-line snippet of code every time I want to get
the string version of an object.

But I guess such needs can be adequately served by std.string.format
or sformat.  So scratch that, no old-style toString() needed.
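
A minimal sketch of that route, assuming std.format picks up a sink-style toString on the value being formatted. The Celsius type and the describe helper are invented for the example.

```d
import std.format : format;
import std.stdio : writeln;

struct Celsius
{
    double degrees;

    // Sink-style conversion; no no-argument toString() is defined.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink(format("%.1f", degrees));
        sink(" C");
    }
}

// Hypothetical helper: the caller who wants a plain string asks
// format() for it instead of calling toString() directly.
string describe(Celsius c)
{
    return format("%s", c);
}

void main()
{
    writeln(describe(Celsius(21.5))); // prints "21.5 C"
}
```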

 Because
 void toString(void delegate(const(char)[]) sink, string fmt) {
     sink("xxx");
 }
 isn't much more complex than:
 string toString()
 {
    return "xxx";
 }
 other than the signature.

Yeh, for authors of toString methods it's fine.

Well, a different way to write delegates would be nice, but that's a
different discussion.

 I think I would like to see the format strings not necessarily tied to
 writefln's particular format.

 I think the format strings are actually pretty similar, Tango vs writefln?
 There might be enough common ground. I think the Tango format is a slight
 superset of the writefln one.

Pretty similar, maybe, but I'd be surprised if they just happened to
be identical without any attempt at compatibility having been made.

--bb

Nov 10 2009

grauzone <none example.net> writes:

Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such as thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



How are you supposed to print a BigInt then?

 Since you are in the know and probably the biggest toString() hater 
 around: are there plans (or rejections thereof) to change toString() 
 before D2 turns gold? Seems to me it could break quite some code.

 
 
 I'm hoping someone will come up with a design.
 
 Straw man:
 
 void toString(void delegate(const(char)[]) sink, string fmt) {
 
 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
 
 }

Just put it into an "interface DebugOutput", remove Object.toString(), 
and be done with it. That interface could be defined in the same module 
as writefln or format, and its use will be clear.

Nov 10 2009

Don <nospam nospam.com> writes:

grauzone wrote:
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something 
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



 
 How are you supposed to print a BigInt then?

How are you supposed to print one with it? It doesn't help.
(The problem is even more obvious if you consider BigFloat).

 Just put it into an "interface DebugOutput", remove Object.toString(), 
 and be done with it. That interface could be defined in the same module 
 as writefln or format, and its use will be clear.

BigInt is a struct, so it doesn't have interfaces.

Nov 10 2009

grauzone <none example.net> writes:

Don wrote:
 grauzone wrote:
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something 
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



 How are you supposed to print a BigInt then?

 How are you supposed to print one with it? It doesn't help.
 (The problem is even more obvious if you consider BigFloat).
 
 Just put it into an "interface DebugOutput", remove Object.toString(), 
 and be done with it. That interface could be defined in the same 
 module as writefln or format, and its use will be clear.

 
 BigInt is a struct, so it doesn't have interfaces.

Structs are a different matter. Nothing dictates that a struct should 
have a toString method, or what arguments that method should have, 
right? (There's this compiler/runtime hack to make struct toString work 
with writefln, but now that writefln uses compile time varargs, it can go.)

Nov 10 2009

Don <nospam nospam.com> writes:

grauzone wrote:
 Don wrote:
 grauzone wrote:
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do 
 something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.



 How are you supposed to print a BigInt then?

 How are you supposed to print one with it? It doesn't help.
 (The problem is even more obvious if you consider BigFloat).

 Just put it into an "interface DebugOutput", remove 
 Object.toString(), and be done with it. That interface could be 
 defined in the same module as writefln or format, and its use will be 
 clear.

 BigInt is a struct, so it doesn't have interfaces.

 
 Structs are a different matter. Nothing dictates that a struct should 
 have a toString method, or what arguments that method should have, 
 right? (There's this compiler/runtime hack to make struct toString work 
 with writefln, but now that writefln uses compile time varargs, it can go.)

This discussion is about that hack. Yes, it might be unnecessary if 
compile time varargs work sufficiently well.

Nov 10 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater 
 around: are there plans (or rejections thereof) to change toString() 
 before D2 turns gold? Seems to me it could break quite some code.

 
 
 I'm hoping someone will come up with a design.
 
 Straw man:
 
 void toString(void delegate(const(char)[]) sink, string fmt) {
 
 // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
 
 }

I think the best option for toString is to take an output range and 
write to it. (The sink is a simplified range.)

Andrei

Nov 10 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something  
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater  
 around: are there plans (or rejections thereof) to change toString()  
 before D2 turns gold? Seems to me it could break quite some code.

   I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

 Andrei

It means toString() must be either a template, or accept an abstract  
InputRange interface?

Nov 10 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something 
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater 
 around: are there plans (or rejections thereof) to change toString() 
 before D2 turns gold? Seems to me it could break quite some code.

   I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

 Andrei

 
 It means toString() must be either a template, or accept an abstract 
 InputRange interface?

It should take an interface.

Andrei

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString() before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

So yet another type in object.d?
Or require users to import something specific in every module that's
going to use toString?

--bb

Nov 10 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString() before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

 
 So yet another type in object.d?
 Or require users to import something specific in every module that's
 going to use toString?
 
 --bb

I am not sure. Opinions as always are welcome.

Andrei

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 5:27 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString()
 before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and
 write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

 So yet another type in object.d?
 Or require users to import something specific in every module that's
 going to use toString?

 --bb

 I am not sure. Opinions as always are welcome.

That's why my opinion is that the delegate idea is nice.  :-)

But I guess toString is already defined by Object, right?  So it would
make sense for an interface needed by an Object method to be defined
in object.d.   I suppose it could be an interface defined inside the
Object class itself?  (Does that work? can you define interfaces
inside classes?)

--bb

Nov 10 2009

Don <nospam nospam.com> writes:

Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 5:27 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change toString()
 before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and
 write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

 So yet another type in object.d?
 Or require users to import something specific in every module that's
 going to use toString?

 --bb

 I am not sure. Opinions as always are welcome.

 
 That's why my opinion is that the delegate idea is nice.  :-)
 
 But I guess toString is already defined by Object, right?  So it would
 make sense for an interface needed by an Object method to be defined
 in object.d.   I suppose it could be an interface defined inside the
 Object class itself?  (Does that work? can you define interfaces
 inside classes?)

It also needs to be used by structs, which aren't inherited from Object. 
So I don't see how a nested interface could work.

I suggest a design acceptance criterion: the simplest case should be 
about as simple as:
    return "xxx";
or
    put("xxx");

Nov 11 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 11 Nov 2009 04:27:45 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do  
 something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change  
 toString() before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range and  
 write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

  So yet another type in object.d?
 Or require users to import something specific in every module that's
 going to use toString?
  --bb

 I am not sure. Opinions as always are welcome.

 Andrei

Some ranges may be polymorphic, so having base interface hierarchy in  
Phobos would be useful anyway.

BTW, save() is already implemented and used throughout the Phobos under a  
different name - opSlice (i.e. auto copy = range[]). It's a bikeshed  
discussion, but why save() and not opSlice(), or even clone()?

Nov 11 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Wed, 11 Nov 2009 04:27:45 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do 
 something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString() hater
 around: are there plans (or rejections thereof) to change 
 toString() before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range 
 and write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

  So yet another type in object.d?
 Or require users to import something specific in every module that's
 going to use toString?
  --bb

 I am not sure. Opinions as always are welcome.

 Andrei

 
 Some ranges may be polymorphic, so having base interface hierarchy in 
 Phobos would be useful anyway.
 
 BTW, save() is already implemented and used throughout the Phobos under 
 a different name - opSlice (i.e. auto copy = range[]). It's a bikeshed 
 discussion, but why save() and not opSlice(), or even clone()?

It can't be clone() because it doesn't clone. For example say you have a 
T[] - one would expect clone() actually copies the content. But using 
opSlice is a good idea.

Andrei

Nov 11 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 11 Nov 2009 18:50:47 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 04:27:45 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Bill Baxter wrote:
 2009/11/10 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 On Wed, 11 Nov 2009 02:49:54 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Don wrote:
 Lutger wrote:
 Don wrote:
 ...
 There is a definite use for such a thing. But the existing
 toString()
 is much, much worse than useless. People think you can do  
 something
 with
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is  
 an
 over-my-dead-body.

 Since you are in the know and probably the biggest toString()  
 hater
 around: are there plans (or rejections thereof) to change  
 toString() before
 D2 turns gold? Seems to me it could break quite some code.

  I'm hoping someone will come up with a design.
  Straw man:
  void toString(void delegate(const(char)[]) sink, string fmt) {
  // fmt holds the format string from writefln/formatln.
 // call sink() to print partial results.
  }

 I think the best option for toString is to take an output range  
 and write
 to it. (The sink is a simplified range.)

 Andrei

 It means toString() must be either a template, or accept an abstract
 InputRange interface?

 It should take an interface.

  So yet another type in object.d?
 Or require users to import something specific in every module that's
 going to use toString?
  --bb

 I am not sure. Opinions as always are welcome.

 Andrei

  Some ranges may be polymorphic, so having base interface hierarchy in  
 Phobos would be useful anyway.
  BTW, save() is already implemented and used throughout the Phobos  
 under a different name - opSlice (i.e. auto copy = range[]). It's a  
 bikeshed discussion, but why save() and not opSlice(), or even clone()?

 It can't be clone() because it doesn't clone. For example say you have a  
 T[] - one would expect clone() actually copies the content. But using  
 opSlice is a good idea.

 Andrei

Well, range doesn't own any of the contents it covers, so deep copy is  
impossible.
Yet, there is also .dup array property which pretends to be a standard
way of creating instance copies.

Nov 11 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 Well, range doesn't own any of the contents it covers, so deep copy is 
 impossible.
 Yet, there is also .dup array property which pretends to be a
 standard way of creating instance copies.

Well so the second sentence contradicts the first. Let me put it another 
way: you have the entire vocabulary at your disposal to define save(). 
Wouldn't you think clone() may be a bit more confusing than others?

Andrei

Nov 11 2009

Bill Baxter <wbaxter gmail.com> writes:

2009/11/11 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 Denis Koroskin wrote:
 Well, range doesn't own any of the contents it covers, so deep copy is
 impossible.
 Yet, there is also .dup array property which pretends to be a standard
 way of creating instance copies.

 Well so the second sentence contradicts the first. Let me put it another
 way: you have the entire vocabulary at your disposal to define save().
 Wouldn't you think clone() may be a bit more confusing than others?

makeBreadCrumb() ?
:-)

--bb

Nov 11 2009

Philippe Sigaud <philippe.sigaud gmail.com> writes:

Denis:
BTW, save() is already implemented and used throughout the Phobos under a

different name - opSlice
 (i.e. auto copy = range[]). It's a bikeshed discussion, but why save() and
not opSlice(), or even clone()?

2009/11/11 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>

 It can't be clone() because it doesn't clone. For example say you have a
 T[] - one would expect clone() actually copies the content. But using
 opSlice is a good idea.

I don't get it. Shouldn't save() copy the content?

Do you mean we could use opSlice() (the parameterless version) as a save
function and write "auto r2 = r1[];"?
But, again maybe I don't get something: for dyn. arrays (aka the range
archetype) opSlice is not a save, it's just an alias. So using opSlice
doesn't work for remembering positions with arrays.

   Philippe

Nov 11 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 11 Nov 2009 20:08:52 +0300, Philippe Sigaud  
<philippe.sigaud gmail.com> wrote:

 Denis:
 BTW, save() is already implemented and used throughout the Phobos under  
 a

 different name - opSlice
  (i.e. auto copy = range[]). It's a bikeshed discussion, but why save()  
 and
 not opSlice(), or even clone()?

 2009/11/11 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>

 It can't be clone() because it doesn't clone. For example say you have a
 T[] - one would expect clone() actually copies the content. But using
 opSlice is a good idea.

 I don't get it. Shouldn't save() copy the content?

 Do you mean we could use opSlice() (the parameterless version) as a save
 function and write "auto r2 = r1[];"?
 But, again maybe I don't get something: for dyn. arrays (aka the range
 archetype) opSlice is not a save, it's just an alias. So using opSlice
 doesn't work for remembering positions with arrays.

    Philippe

It remembers array bounds, not contents.
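
A small sketch of the distinction for built-in arrays, in case it helps: slicing with `[]` copies only the pointer and length, so the saved slice keeps its own bounds while still sharing elements, whereas `.dup` copies the elements themselves. The variable names are made up for illustration.

```d
void main()
{
    int[] r1 = [10, 20, 30];
    auto saved = r1[];        // copies pointer and length only

    r1 = r1[1 .. $];          // advance r1; saved keeps its own bounds
    assert(r1.length == 2);
    assert(saved.length == 3);

    saved[1] = 99;            // both slices view the same memory
    assert(r1[0] == 99);

    auto copy = saved.dup;    // .dup allocates and copies the elements
    copy[0] = -1;
    assert(saved[0] == 10);   // the original contents are untouched
}
```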

Nov 11 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

Bad idea...

A range only makes sense as a struct, not an interface/object.  I'll tell  
you why: performance.

Ranges are special in two respects:

1. They are foreachable.  I think everyone agrees that calling 2 interface  
functions per loop iteration is much lower performing than using opApply,  
which calls one delegate function per loop.  My recommendation -- use  
opApply when dealing with polymorphism.  I don't think there's a way  
around this.
2. They are useful for passing to std.algorithm.  But std.algorithm is  
template-interfaced.  No need for using interfaces because the correct  
instantiation will be chosen.
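
To make point 1 concrete, here is a rough sketch of the opApply approach on a class. The `Numbers` class and the `sum` function are hypothetical names for illustration only. The loop body is handed to opApply as a delegate, so each iteration costs one delegate call rather than several virtual calls.

```d
class Numbers
{
    private int[] items;

    this(int[] items) { this.items = items; }

    // foreach over a Numbers instance lowers to a call to this method;
    // dg is the loop body, invoked once per element.
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref item; items)
        {
            if (auto result = dg(item))
                return result;   // non-zero means the body used break/return
        }
        return 0;
    }
}

int sum(Numbers n)
{
    int total = 0;
    foreach (ref int x; n)
        total += x;
    return total;
}

void main()
{
    assert(sum(new Numbers([1, 2, 3])) == 6);
}
```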

If you are intending to add a streaming module that uses ranges, would it  
not be templated for the range type as std.algorithm is?  If not, the next  
logical choice is a delegate, which requires no vtable lookup.  Using an  
interface is just asking for a performance penalty for not much gain.

Here's what I mean by not much gain: I would expect a stream range that  
does output to have a method in it for outputting a buffer (I'd laugh at  
you if you wanted to define a stream range that outputs a character at a  
time).  So the difference between:

x.toString(outputRange, format)

and

x.toString(&outputRange.sink, format)

is pretty darn minimal, and if outputRange is an interface or object, this  
saves a virtual call per buffer write.  Plus the second form is more  
universal, you can pass any delegate, and not have to use a range type to  
wrap a delegate.

Don't fall into the "OOP newbie" trap -- where just because you've found a  
new concept that is amazing, you want to use it for everything.  I say  
this because I've seen in the past where someone discovers the power of  
OOP and then wants to use it for everything, when in some cases, it's  
overkill.  Just look at some Java "classes"...

 From another thread:
 Walter does not feel strongly about Phobos.

Huh?  I feel like this sentence doesn't make sense, so maybe there's a  
typo.

-Steve

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll  
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2  
 interface functions per loop iteration is much lower performing than  
 using opApply, which calls one delegate function per loop.  My  
 recommendation -- use opApply when dealing with polymorphism.  I don't  
 think there's a way around this.

Oops, I meant 3 virtual functions -- front, popNext, and empty.

-Steve

Nov 12 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll  
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2  
 interface functions per loop iteration is much lower performing than  
 using opApply, which calls one delegate function per loop.  My  
 recommendation -- use opApply when dealing with polymorphism.  I don't  
 think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

Output range has only one method: put.

I'm not sure, but I don't think there is a performance difference between  
calling a virtual function through an interface and invoking a delegate.

But I agree passing a delegate is more generic. You can substitute an  
output range with a delegate (obj.toString(&range.put, fmt)) without any  
performance hit, but not vice versa (obj.toString(new  
DelegateWrapRange(&myput), fmt) implies an additional allocation and  
additional indirection per range.put call).
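
A minimal sketch of that substitution, assuming the straw-man signature from earlier in the thread. `Semver` and `Collector` are hypothetical types made up for illustration, with `Collector` standing in for a class-based output range that has a single `put` method.

```d
import std.conv : to;

struct Semver
{
    int major, minor;

    // Straw-man signature from this thread; fmt is ignored in this sketch.
    void toString(scope void delegate(const(char)[]) sink, string fmt) const
    {
        sink(major.to!string);
        sink(".");
        sink(minor.to!string);
    }
}

// Hypothetical class-based output range with one virtual put().
class Collector
{
    char[] data;
    void put(const(char)[] s) { data ~= s; }
}

void main()
{
    auto v = Semver(2, 36);

    // An output range is passed by taking a delegate to its put method.
    auto range = new Collector;
    v.toString(&range.put, "");
    assert(range.data == "2.36");

    // Any other delegate works too, with no wrapper object needed.
    char[] buf;
    v.toString((const(char)[] s) { buf ~= s; }, "");
    assert(buf == "2.36");
}
```

Going the other way, from an arbitrary delegate to a range-typed parameter, would need some wrapper object like the DelegateWrapRange mentioned above.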

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 08:56:06 -0500, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll  
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2  
 interface functions per loop iteration is much lower performing than  
 using opApply, which calls one delegate function per loop.  My  
 recommendation -- use opApply when dealing with polymorphism.  I don't  
 think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

 Output range has only one method: put.

I was referring to range's ability to interact with foreach.  An output  
range wouldn't qualify as a foreachable entity anyways (and rightfully  
so).  Just covering all the bases.

 I'm not sure, but I don't think there is a performance difference  
 between calling a virtual function through an interface and invoking a  
 delegate.

Yes, there is:

A delegate is equivalent to a struct member function call.  (load data  
pointer (i.e. this), push args, call function)
A virtual function uses a vtable to look up the function address, and then  
is equivalent to a struct member call.
An interface function call is equivalent to a virtual call with the added  
penalty that you might have to adjust the 'this' pointer before calling.

 But I agree passing a delegate is more generic. You can substitute an  
 output range with a delegate (obj.toString(&range.put, fmt)) without any  
 performance hit, but not vice versa (obj.toString(new  
 DelegateWrapRange(&myput), fmt) implies an additional allocation and  
 additional indirection per range.put call).

You can use scope classes to avoid the allocation, but you can't get  
around the virtual/interface call penalty.

But even if a range is a struct, it's simply a different form of delegate,  
one in which you undoubtedly call only one member function.  Might as well  
use a delegate to allow the most usefulness.

-Steve

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 A delegate is equivalent to a struct member function call.  (load data 
 pointer (i.e. this), push args, call function)

I think this particular point is incorrect.

Andrei

Nov 12 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Steven Schveighoffer wrote:
 A delegate is equivalent to a struct member function call.  (load data
 pointer (i.e. this), push args, call function)

 I think this particular point is incorrect.
 Andrei

Most of the overhead from indirect function calls comes from the fact that they
(usually) can't be inlined, not because they are indirect.  The struct member
function call is faster mostly because it can be inlined, not because it's
direct.

Here's roughly what the ASM would look like for a call to a member function of a
struct on the stack, if I is a metasyntactic variable for any immediate value:

mov EAX, EBP;  // Copy frame pointer to EAX
add EAX, I;    // Add the offset of the struct to EAX.
push EAX;      // EAX is now the this ptr.  Push it.
call I;        // Call the function.

And for a delegate that lives on the stack:

mov EAX, [EBP + I];  // Move delegate's this ptr into EAX.
push EAX;            // Push delegate's this ptr onto stack.
call [EBP + I];      // Call whatever address is at offset I from EBP.

I've actually benchmarked how much indirect function calls cost compared to
direct calls that aren't inlined.  The short answer is it's not measurable, at
least when calling the same function indirectly in a loop over and over.  It
could in theory cause pipeline stalls because it's a branch, but according to
some Intel optimization manual Don posted here a while back, modern CPUs
predict the address of indirect function calls in their branch predictor.
This means that if the same path is taken again and again, the overhead will
be negligible.
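
For anyone who wants to repeat this kind of measurement, a rough sketch follows. It is not the benchmark referred to above; Counter, sumDirect, sumViaDelegate and timeBoth are hypothetical names, and the only library call assumed is benchmark from std.datetime.stopwatch. Note that the direct version may be inlined by an optimizing compiler, which is exactly the effect described in the post.

```d
import std.datetime.stopwatch : benchmark;

struct Counter
{
    int total;
    // Small member function; a direct call to it may be inlined.
    void add(int x) { total += x; }
}

// Calls the member function directly in a loop.
int sumDirect(int n)
{
    Counter c;
    foreach (i; 0 .. n)
        c.add(i);
    return c.total;
}

// Calls the same member function through a delegate living on the stack.
int sumViaDelegate(int n)
{
    Counter c;
    void delegate(int) dg = &c.add;
    foreach (i; 0 .. n)
        dg(i);
    return c.total;
}

// Returns the elapsed times (direct first, delegate second) for `runs` repetitions.
auto timeBoth(uint runs, int n)
{
    int sink; // accumulates results so the work is not trivially discarded
    return benchmark!(
        () { sink += sumDirect(n); },
        () { sink += sumViaDelegate(n); }
    )(runs);
}
```

Calling timeBoth(1_000, 100_000) and comparing the two durations gives a first impression; results depend heavily on compiler flags and on whether the direct call gets inlined.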

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 
 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll 
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

 
 Output range has only one method: put.
 
 I'm not sure, but I don't think there is a performance difference 
 between calling a virtual function through an interface and invoking a 
 delegate.
 
 But I agree passing a delegate is more generic. You can substitute an 
 output range with a delegate (obj.toString(&range.put, fmt)) without any 
 performance hit, but not vice versa (obj.toString(new 
 DelegateWrapRange(&myput), fmt) implies an additional allocation and 
 additional indirection per range.put call).

I think that, on the contrary, working with a delegate is less generic. 
A delegate is cost-wise much like a class with only one (non-final) 
method. Since we're taking that hit already, we may as well define 
actual interfaces and classes that have multiple methods. That makes 
things more flexible and more efficient.

Andrei

Nov 12 2009

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll 
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

 Output range has only one method: put.

 I'm not sure, but I don't think there is a performance difference 
 between calling a virtual function through an interface and invoking a 
 delegate.

 But I agree passing a delegate is more generic. You can substitute an 
 output range with a delegate (obj.toString(&range.put, fmt)) without 
 any performance hit, but not vice versa (obj.toString(new 
 DelegateWrapRange(&myput), fmt) implies an additional allocation and 
 additional indirection per range.put call).

 
 I think that, on the contrary, working with a delegate is less generic. 
 A delegate is cost-wise much like a class with only one (non-final) 
 method. Since we're taking that hit already, we may as well define 
 actual interfaces and classes that have multiple methods. That makes 
 things more flexible and more efficient.

How? It seems to introduce more requirements on the implementation, but 
I'm not seeing any benefit in exchange.

FWIW, with regard to performance, I can easily imagine the compiler 
being able to perform the equivalent of a "named return value" 
optimisation on a delegate return, giving some chance of inlining.
That's a lot less obvious with an interface.

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
 Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range 
 and write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  
 I'll tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing 
 than using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

 Output range has only one method: put.

 I'm not sure, but I don't think there is a performance difference 
 between calling a virtual function through an interface and invoking 
 a delegate.

 But I agree passing a delegate is more generic. You can substitute an 
 output range with a delegate (obj.toString(&range.put, fmt)) without 
 any performance hit, but not vice versa (obj.toString(new 
 DelegateWrapRange(&myput), fmt) implies an additional allocation and 
 additional indirection per range.put call).

 I think that, on the contrary, working with a delegate is less 
 generic. A delegate is cost-wise much like a class with only one 
 (non-final) method. Since we're taking that hit already, we may as 
 well define actual interfaces and classes that have multiple methods. 
 That makes things more flexible and more efficient.

 
 How? It seems to introduce more requirements on the implementation, but 
 I'm not seeing any benefit in exchange.

The benefit is that it allows writing all character widths.

 FWIW, with regard to performance, I can easily imagine the compiler 
 being able to perform the equivalent of a "named return value" 
 optimisation on a delegate return, giving some chance of inlining.
 That's a lot less obvious with an interface.

That seems plausible.


Andrei

Nov 12 2009

Justin Johansson <no spam.com> writes:

Andrei Alexandrescu Wrote:

 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 
 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll 
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

 
 Output range has only one method: put.
 
 I'm not sure, but I don't think there is a performance difference 
 between calling a virtual function through an interface and invoking a 
 delegate.
 
 But I agree passing a delegate is more generic. You can substitute an 
 output range with a delegate (obj.toString(&range.put, fmt)) without any 
 performance hit, but not vice versa (obj.toString(new 
 DelegateWrapRange(&myput), fmt) implies an additional allocation and 
 additional indirection per range.put call).

 
 I think that, on the contrary, working with a delegate is less generic. 
 A delegate is cost-wise much like a class with only one (non-final) 
 method. Since we're taking that hit already, we may as well define 
 actual interfaces and classes that have multiple methods. That makes 
 things more flexible and more efficient.
 
 Andrei

"Since we're taking that hit already, we may as well define 
 actual interfaces and classes that have multiple methods."

Which do you mean -- interfaces, classes or both?
Don't interfaces have a higher cost than classes?

Justin

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Justin Johansson wrote:
 Andrei Alexandrescu Wrote:
 
 Denis Koroskin wrote:
 On Thu, 12 Nov 2009 16:23:22 +0300, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:22:26 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:

 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

 Bad idea...

 A range only makes sense as a struct, not an interface/object.  I'll 
 tell you why: performance.

 Ranges are special in two respects:

 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 Oops, I meant 3 virtual functions -- front, popNext, and empty.

 -Steve

 Output range has only one method: put.

 I'm not sure, but I don't think there is a performance difference 
 between calling a virtual function through an interface and invoking a 
 delegate.

 But I agree passing a delegate is more generic. You can substitute an 
 output range with a delegate (obj.toString(&range.put, fmt)) without any 
 performance hit, but not vice versa (obj.toString(new 
 DelegateWrapRange(&myput), fmt) implies an additional allocation and 
 additional indirection per range.put call).

 I think that, on the contrary, working with a delegate is less generic. 
 A delegate is cost-wise much like a class with only one (non-final) 
 method. Since we're taking that hit already, we may as well define 
 actual interfaces and classes that have multiple methods. That makes 
 things more flexible and more efficient.

 Andrei

 
 "Since we're taking that hit already, we may as well define 
 actual interfaces and classes that have multiple methods."

 
 Which do you mean -- interfaces, classes or both?
 Don't interfaces have a higher cost than classes?

My understanding is that the costs are comparable.

Andrei

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

 
 Bad idea...
 
 A range only makes sense as a struct, not an interface/object.  I'll 
 tell you why: performance.

You are right. If range interfaces accommodate block transfers, this 
problem may be addressed. I agree that one virtual call per character 
output would be overkill. (I seem to recall it's one of the reasons why 
C++'s iostreams are so inefficient.)

 Ranges are special in two respects:
 
 1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I don't 
 think there's a way around this.

 2. They are useful for passing to std.algorithm.  But std.algorithm is 
 template-interfaced.  No need for using interfaces because the correct 
 instantiation will be chosen.
 
 If you are intending to add a streaming module that uses ranges, would 
 it not be templated for the range type as std.algorithm is?  If not, the 
 next logical choice is a delegate, which requires no vtable lookup.  
 Using an interface is just asking for a performance penalty for not much 
 gain.

I think the cost of calling through the delegate is roughly the same as 
a virtual call.

 Here's what I mean by not much gain: I would expect a stream range that 
 does output to have a method in it for outputting a buffer (I'd laugh at 
 you if you wanted to define a stream range that outputs a character at a 
 time).  So the difference between:

Well I'd laugh at you if you thought I'm that brain dead :o).

 x.toString(outputRange, format)
 
 and
 
 x.toString(&outputRange.sink, format)
 
 is pretty darn minimal, and if outputRange is an interface or object, 
 this saves a virtual call per buffer write.  Plus the second form is 
 more universal, you can pass any delegate, and not have to use a range 
 type to wrap a delegate.
 
 Don't fall into the "OOP newbie" trap -- where just because you've found 
 a new concept that is amazing, you want to use it for everything.  I say 
 this because I've seen in the past where someone discovers the power of 
 OOP and then wants to use it for everything, when in some cases, it's 
 overkill.  Just look at some Java "classes"...

There is no need to worry that I'll fall into at least that particular 
OOP newbie trap.

What I think we should do is define a text output interface that allows 
writing individual characters of all widths and also arrays of all 
widths. That would be a universal means for text output.

interface TextOutputStream {
     void put(dchar); // also accommodates char and wchar
     void put(in char[]);
     void put(in wchar[]);
     void put(in dchar[]);
}

The toString method (re-baptized as toStream) would take such an 
interface. Better ideas are always welcome. Perhaps I'm falling into 
another OOP newbie trap! (Seriously!)
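
As an illustration only, here is how a user type might write itself to such an interface. The interface is the one proposed above; Utf8Buffer and Point are hypothetical names, the toStream signature is an assumption (no final signature has been settled), and the use of std.conv.to for the numbers is just a shortcut for the sketch.

```d
import std.conv : to;
import std.utf : encode;

// The interface as proposed in this post.
interface TextOutputStream
{
    void put(dchar c);      // also accommodates char and wchar
    void put(in char[] s);
    void put(in wchar[] s);
    void put(in dchar[] s);
}

// Hypothetical implementation that transcodes everything to UTF-8 in memory.
class Utf8Buffer : TextOutputStream
{
    private char[] data;

    void put(dchar c)
    {
        char[4] tmp;
        immutable len = encode(tmp, c); // number of UTF-8 code units written
        data ~= tmp[0 .. len];
    }
    void put(in char[] s) { data ~= s; }
    void put(in wchar[] s) { foreach (dchar c; s) put(c); }
    void put(in dchar[] s) { foreach (dchar c; s) put(c); }

    string result() const { return data.idup; }
}

class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }

    // Assumed shape of toStream: write into the stream instead of returning a string.
    void toStream(TextOutputStream o) const
    {
        o.put("Point(");
        o.put(x.to!string);
        o.put(", ");
        o.put(y.to!string);
        o.put(')');
    }
}
```

Each put call here is one interface call per block of text rather than one per character, which is the property the discussion above cares about.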

One possible course of action would be to extend the text output stream 
to print (and possibly format) some or all primitive types, a la today's 
phobos streams. That would make TextOutputStream fatter and more 
diluted, something that I don't like. But then we might define a 
FormattingTextOutputStream that extends TextOutputStream with all that 
stuff.

  From another thread:
 Walter does not feel strongly about Phobos.

 
 Huh?  I feel like this sentence doesn't make sense, so maybe there's a 
 typo.

I meant to say, Walter does not want to do library design.


Andrei

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

  Bad idea...
  A range only makes sense as a struct, not an interface/object.  I'll  
 tell you why: performance.

 You are right. If range interfaces accommodate block transfers, this  
 problem may be addressed. I agree that one virtual call per character  
 output would be overkill. (I seem to recall it's one of the reasons why  
 C++'s iostreams are so inefficient.)

IIRC, I don't think C++ iostreams use polymorphism, and I don't think they  
use the "one char at a time" method.

 Ranges are special in two respects:
  1. They are foreachable.  I think everyone agrees that calling 2  
 interface functions per loop iteration is much lower performing than  
 using opApply, which calls one delegate function per loop.  My  
 recommendation -- use opApply when dealing with polymorphism.  I don't  
 think there's a way around this.

 2. They are useful for passing to std.algorithm.  But std.algorithm is  
 template-interfaced.  No need for using interfaces because the correct  
 instantiation will be chosen.
  If you are intending to add a streaming module that uses ranges, would  
 it not be templated for the range type as std.algorithm is?  If not,  
 the next logical choice is a delegate, which requires no vtable  
 lookup.  Using an interface is just asking for a performance penalty  
 for not much gain.

 I think the cost of calling through the delegate is roughly the same as  
 a virtual call.

Not exactly.  I think you are right that struct member calls are faster  
than delegates, but only slightly.  The difference being that a struct  
member call does not need to load the function address from the stack, it  
can hard-code the address directly.

However, virtual calls have to be lower performing because you are doing  
two indirections, one to the class vtable, then one to the function  
address itself.  Plus those two locations are most likely located on the  
heap, not the stack, and so may not be in the cache.

 x.toString(outputRange, format)
  and
  x.toString(&outputRange.sink, format)
  is pretty darn minimal, and if outputRange is an interface or object,  
 this saves a virtual call per buffer write.  Plus the second form is  
 more universal, you can pass any delegate, and not have to use a range  
 type to wrap a delegate.
  Don't fall into the "OOP newbie" trap -- where just because you've  
 found a new concept that is amazing, you want to use it for  
 everything.  I say this because I've seen in the past where someone  
 discovers the power of OOP and then wants to use it for everything,  
 when in some cases, it's overkill.  Just look at some Java "classes"...

 There is no need to worry that I'll fall into at least that particular  
 OOP newbie trap.

 What I think we should do is define a text output interface that allows  
 writing individual characters of all widths and also arrays of all  
 widths. That would be a universal means for text output.

 interface TextOutputStream {
      void put(dchar); // also accommodates char and wchar
      void put(in char[]);
      void put(in wchar[]);
      void put(in dchar[]);
 }

 The toString method (re-baptized as toStream) would take such an  
 interface. Better ideas are always welcome. Perhaps I'm falling into  
 another OOP newbie trap! (Seriously!)

This still fits within a single function, which takes one of the 3 widths  
(pick one, they can all be translated to each other):

void put(in char[] str)
{
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
}

Note that you probably want to build a buffer of dchars instead of putting  
one at a time, but you get the idea.

Also, putting a single character is probably pretty uncommon, but can be  
handled in a similar fashion.

That being said, one other point that makes all this moot is -- toString  
is for debugging, not for general purpose.  We don't need to support  
everything that is possible.  You should be able to say "hey, toString  
only accepts char[], deal."  Of course, you could substitute wchar[] or  
dchar[], but I think by far char[] is the most common (and is the default  
type for string literals).

That's not to say there is no reason to have a TextOutputStream object.   
Such a thing is perfectly usable for a toString which takes a char[]  
delegate sink, just pass &put.  In fact, there could be a default toString  
function in Object that does just that:

class Object
{
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
}
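
A small, self-contained sketch of the delegate-sink idea on a user type follows. Sink, Release, Collector and render are hypothetical names, the "major" format flag is invented purely to show the fmt parameter doing something, and std.conv.to is used only to keep the example short.

```d
import std.conv : to;

// The sink type under discussion: receives UTF-8 text one block at a time.
alias Sink = void delegate(const(char)[] buf);

struct Release
{
    int major, minor;

    // Sink-based toString: writes pieces instead of building one string.
    void toString(scope Sink put, string fmt) const
    {
        put("v");
        put(major.to!string);
        if (fmt != "major") // "major" is a made-up format flag for this sketch
        {
            put(".");
            put(minor.to!string);
        }
    }
}

// Anything with a matching put method can serve as the sink via &obj.put.
struct Collector
{
    char[] data;
    void put(const(char)[] s) { data ~= s; }
}

// Convenience wrapper that collects the pieces into a string.
string render(T)(const T value, string fmt = "")
{
    Collector c;
    value.toString(&c.put, fmt);
    return c.data.idup;
}
```

Here render(Release(1, 4)) yields "v1.4", and the same toString could just as well be pointed at a file writer or a counting lambda without any wrapper type.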

Of course, then TextOutputStream has to be druntime-accessible, so maybe  
it's not a great idea...  But there are ways around that:

abstract class BaseTextOutputStream : TextOutputStream {
     void format(const Object o, string fmt) { o.toString(&this.put, fmt); }
}

  From another thread:
 Walter does not feel strongly about Phobos.

  Huh?  I feel like this sentence doesn't make sense, so maybe there's a  
 typo.

 I meant to say, Walter does not want to do library design.

I'm trying to remember but I thought he did care about this particular  
issue, but it may be muddled in my memory.  Also note that toString has  
special status from the compiler in regards to structs (that hack with the  
xtoString function in the struct's typeinfo), so it doesn't just affect  
library code.

-Steve

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and 
 write to it. (The sink is a simplified range.)

  Bad idea...
  A range only makes sense as a struct, not an interface/object.  I'll 
 tell you why: performance.

 You are right. If range interfaces accommodate block transfers, this 
 problem may be addressed. I agree that one virtual call per character 
 output would be overkill. (I seem to recall it's one of the reasons 
 why C++'s iostreams are so inefficient.)

 
 IIRC, I don't think C++ iostreams use polymorphism

Oh yes they do. (Did you even google?) Virtual multiple inheritance, the 
works.

http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/

, and I don't think 
 they use the "one char at a time" method.

Well they do offer one char at a time and also a block transfer.

http://msdn.microsoft.com/en-us/library/760t8w1z%28VS.80%29.aspx

I'm not sure how the heck but they still manage to call one virtual 
method per char, otherwise they'd be plenty fast, which they aren't. I 
seem to recall write() has a default implementation that calls put() in 
a loop or something. It's not a topic that I want to study closely. 
iostreams suck, why spend time on learning the quirks of a broken design?

 Ranges are special in two respects:
  1. They are foreachable.  I think everyone agrees that calling 2 
 interface functions per loop iteration is much lower performing than 
 using opApply, which calls one delegate function per loop.  My 
 recommendation -- use opApply when dealing with polymorphism.  I 
 don't think there's a way around this.

 2. They are useful for passing to std.algorithm.  But std.algorithm 
 is template-interfaced.  No need for using interfaces because the 
 correct instantiation will be chosen.
  If you are intending to add a streaming module that uses ranges, 
 would it not be templated for the range type as std.algorithm is?  If 
 not, the next logical choice is a delegate, which requires no vtable 
 lookup.  Using an interface is just asking for a performance penalty 
 for not much gain.

 I think the cost of calling through the delegate is roughly the same 
 as a virtual call.

 
 Not exactly.  I think you are right that struct member calls are faster 
 than delegates, but only slightly.  The difference being that a struct 
 member call does not need to load the function address from the stack, 
 it can hard-code the address directly.
 
 However, virtual calls have to be lower performing because you are doing 
 two indirections, one to the class vtable, then one to the function 
 address itself.  Plus those two locations are most likely located on the 
 heap, not the stack, and so may not be in the cache.

I think the only way to figure is to measure. For one thing I disagree 
with the comment about the cache - a vtable is quite likely to be warm 
after a couple of calls.

I know one thing - Walter's old format function used delegates and it 
was unusably slow.

 x.toString(outputRange, format)
  and
  x.toString(&outputRange.sink, format)
  is pretty darn minimal, and if outputRange is an interface or 
 object, this saves a virtual call per buffer write.  Plus the second 
 form is more universal, you can pass any delegate, and not have to 
 use a range type to wrap a delegate.
  Don't fall into the "OOP newbie" trap -- where just because you've 
 found a new concept that is amazing, you want to use it for 
 everything.  I say this because I've seen in the past where someone 
 discovers the power of OOP and then wants to use it for everything, 
 when in some cases, it's overkill.  Just look at some Java "classes"...

 There is no need to worry that I'll fall into at least that particular 
 OOP newbie trap.

 What I think we should do is define a text output interface that 
 allows writing individual characters of all widths and also arrays of 
 all widths. That would be a universal means for text output.

 interface TextOutputStream {
      void put(dchar); // also accommodates char and wchar
      void put(in char[]);
      void put(in wchar[]);
      void put(in dchar[]);
 }

 The toString method (re-baptized as toStream) would take such an 
 interface. Better ideas are always welcome. Perhaps I'm falling into 
 another OOP newbie trap! (Seriously!)

 
 This still fits within a single function, which takes one of the 3 
 widths (pick one, they can all be translated to each other):
 
 void put(in char[] str)
 {
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
 }
 
 Note that you probably want to build a buffer of dchars instead of 
 putting one at a time, but you get the idea.

I don't get the idea. I'm seeing one virtual call per character.

 Also, putting a single character is probably pretty uncommon, but can be 
 handled in a similar fashion.

I'm not sure about the uncommonality of outputting one character, but it 
may be good to discourage it just to not foster slow code.

 That being said, one other point that makes all this moot is -- toString 
 is for debugging, not for general purpose.  We don't need to support 
 everything that is possible.  You should be able to say "hey, toString 
 only accepts char[], deal."  Of course, you could substitute wchar[] or 
 dchar[], but I think by far char[] is the most common (and is the 
 default type for string literals).

I was hoping we could elevate the usefulness of toString a bit.

 That's not to say there is no reason to have a TextOutputStream object.  
 Such a thing is perfectly usable for a toString which takes a char[] 
 delegate sink, just pass &put.  In fact, there could be a default 
 toString function in Object that does just that:
 
 class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

I'd agree with the delegate idea if we established that UTF-8 is favored 
compared to all other formats.


Andrei

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range and  
 write to it. (The sink is a simplified range.)

  Bad idea...
  A range only makes sense as a struct, not an interface/object.  I'll  
 tell you why: performance.

 You are right. If range interfaces accommodate block transfers, this  
 problem may be addressed. I agree that one virtual call per character  
 output would be overkill. (I seem to recall it's one of the reasons  
 why C++'s iostreams are so inefficient.)

  IIRC, I don't think C++ iostreams use polymorphism

 Oh yes they do. (Did you even google?) Virtual multiple inheritance, the  
 works.

 http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/

 From my C++ book, it appears to only use virtual inheritance.  I don't  
know enough about virtual inheritance to know how that changes function  
calls.

As far as virtual functions, only the destructor is virtual, so there is  
no issue there.

  void put(in char[] str)
 {
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
 }
  Note that you probably want to build a buffer of dchars instead of  
 putting one at a time, but you get the idea.

 I don't get the idea. I'm seeing one virtual call per character.

You missed the note.  I didn't implement it, but you could easily  
implement a stack-allocated buffer to cache the conversions, passing  
multiple converted code-points at once.  But I don't think it's even worth  
discussing per my other points.
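
For completeness, the buffered variant could look roughly like the sketch below. putTranscoded and the 64-element chunk size are assumptions made for illustration; the point is only that the sink is called once per block of converted code points instead of once per character.

```d
// Hypothetical helper: decodes UTF-8 input and hands the sink blocks of dchars.
void putTranscoded(const(char)[] str,
                   scope void delegate(const(dchar)[]) sink)
{
    enum chunk = 64;     // arbitrary size chosen for this sketch
    dchar[chunk] buf;    // stack-allocated scratch buffer, no heap allocation
    size_t used = 0;

    foreach (dchar dc; str) // foreach decodes UTF-8 into code points
    {
        buf[used++] = dc;
        if (used == buf.length)
        {
            sink(buf[]); // flush a full block with a single call
            used = 0;
        }
    }
    if (used > 0)
        sink(buf[0 .. used]); // flush the remainder
}
```

The sink must consume or copy each block before returning, since the buffer is reused for the next one.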

 That being said, one other point that makes all this moot is --  
 toString is for debugging, not for general purpose.  We don't need to  
 support everything that is possible.  You should be able to say "hey,  
 toString only accepts char[], deal."  Of course, you could substitute  
 wchar[] or dchar[], but I think by far char[] is the most common (and  
 is the default type for string literals).

 I was hoping we could elevate the usefulness of toString a bit.

Whatever kind of data the output stream gets, it's going to convert it to  
the format it wants anyways (as for stdout, I think that would be utf8),  
the only benefit is if you have data stored in a different width that you  
wanted to output.  Calling a conversion function in that case I think is  
reasonable enough, and saves the output stream from having to convert/deal  
with it.

In other words, I don't think it's going to be that common a case where  
you need anything other than utf8 output, and therefore I don't think the  
cost of creating an interface, making virtual calls, disallowing simple  
delegate passing etc. is worth the convenience *just in case* you have data  
stored as wchar[] you want to output.

 That's not to say there is no reason to have a TextOutputStream  
 object.  Such a thing is perfectly usable for a toString which takes a  
 char[] delegate sink, just pass &put.  In fact, there could be a  
 default toString function in Object that does just that:
  class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

 I'd agree with the delegate idea if we established that UTF-8 is favored  
 compared to all other formats.

D seems to favor UTF8 -- it is the default type for string literals.  I  
don't think I've ever used dchar, and I usually only use wchar to talk to  
Win32 functions when required.

The question I'd ask is -- how common is it where the versions other than  
char[] would be more convenient?

-Steve

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range 
 and write to it. (The sink is a simplified range.)

  Bad idea...
  A range only makes sense as a struct, not an interface/object.  
 I'll tell you why: performance.

 You are right. If range interfaces accommodate block transfers, this 
 problem may be addressed. I agree that one virtual call per 
 character output would be overkill. (I seem to recall it's one of 
 the reasons why C++'s iostreams are so inefficient.)

  IIRC, I don't think C++ iostreams use polymorphism

 Oh yes they do. (Did you even google?) Virtual multiple inheritance, 
 the works.

 http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/ 

 
  From my C++ book, it appears to only use virtual inheritance.  I don't 
 know enough about virtual inheritance to know how that changes function 
 calls.
 
 As far as virtual functions, only the destructor is virtual, so there is 
 no issue there.

You're right, but there is an issue because as far as I can recall these 
functions' implementations do end up calling a virtual function per char; 
that might be streambuf.overflow. I'm not keen on investigating this any 
further, but I'd be grateful if you shared any related knowledge. At the 
end of the day, there seems to be violent agreement that we don't want 
one virtual call per character or one delegate call per character.

  void put(in char[] str)
 {
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
 }
  Note that you probably want to build a buffer of dchars instead of 
 putting one at a time, but you get the idea.

 I don't get the idea. I'm seeing one virtual call per character.

 
 You missed the note.  I didn't implement it, but you could easily 
 implement a stack-allocated buffer to cache the conversions, passing 
 multiple converted code-points at once.  But I don't think it's even 
 worth discussing per my other points.
 
 That being said, one other point that makes all this moot is -- 
 toString is for debugging, not for general purpose.  We don't need to 
 support everything that is possible.  You should be able to say "hey, 
 toString only accepts char[], deal."  Of course, you could substitute 
 wchar[] or dchar[], but I think by far char[] is the most common (and 
 is the default type for string literals).

 I was hoping we could elevate the usefulness of toString a bit.

 
 Whatever kind of data the output stream gets, it's going to convert it 
 to the format it wants anyways (as for stdout, I think that would be 
 utf8), the only benefit is if you have data stored in a different width 
 that you wanted to output.  Calling a conversion function in that case I 
 think is reasonable enough, and saves the output stream from having to 
 convert/deal with it.
 
 In other words, I don't think it's going to be that common a case where 
 you need anything other than utf8 output, and therefore the cost of 
 creating an interface, making virtual calls, disallowing simple delegate 
 passing etc is worth the convenience *just in case* you have data stored 
 as wchar[] you want to output.

I'm not sure.

http://www.gnu.org/s/libc/manual/html_node/Streams-and-I18N.html#Streams-and-I18N

gnu defines means to set and detect a utf-16 console, which dmd observes 
(grep std/ for fwide). But then I'm not sure how many are using that 
kind of stuff.

 That's not to say there is no reason to have a TextOutputStream 
 object.  Such a thing is perfectly usable for a toString which takes 
 a char[] delegate sink, just pass &put.  In fact, there could be a 
 default toString function in Object that does just that:
  class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

 I'd agree with the delegate idea if we established that UTF-8 is 
 favored compared to all other formats.

 
 D seems to favor UTF8 -- it is the default type for string literals.  I 
 don't think I've ever used dchar, and I usually only use wchar to talk 
 to Win32 functions when required.
 
 The question I'd ask is -- how common is it where the versions other 
 than char[] would be more convenient?

I don't know. I think Asian-language users might give a salient answer.


Andrei

Nov 12 2009

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Nov 12, 2009 at 10:46 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I'd agree with the delegate idea if we established that UTF-8 is favored
 compared to all other formats.

 D seems to favor UTF8 -- it is the default type for string literals.  I
 don't think I've ever used dchar, and I usually only use wchar to talk to
 Win32 functions when required.

 The question I'd ask is -- how common is it where the versions other than
 char[] would be more convenient?

 I don't know. I think Asian-language users might give a salient answer.

This isn't authoritative, but I don't think utf-16 is commonly used in
Japan (except for calling Windows APIs).

If you look at Mozilla the default Japanese encoding listed is
Shift-JIS.  A lot of Japanese email still gets sent as ISO-2022-JP.
Otherwise utf-8 I think.   A quick look at www.asahi.com shows they're
using EUC-JP.  nicovideo.jp is using utf-8.  I seem to recall that my
Japanese Visual Studio even saved files in Utf-8, or at least could be
set to use utf-8.   In short, I think utf-8 is closer to being a
widely accepted standard for documents over there than utf-16 is.

--bb

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I  
 don't know enough about virtual inheritance to know how that changes  
 function calls.
  As far as virtual functions, only the destructor is virtual, so there  
 is no issue there.

 You're right, but there is an issue because as far as I can recall these  
 functions' implementation do end up calling a virtual function per char;  
 that might be streambuf.overflow. I'm not keen on investigating this any  
 further, but I'd be grateful if you shared any related knowledge.

Yep, you are right.  It appears the reason they do this is so the  
conversion to the appropriate width can be done per character (and is a  
no-op for char).

 At the end of the day, there seem to be violent agreement that we don't  
 want one virtual call per character or one delegate call per character.

After running my tests, it appears the virtual call vs. delegate is so  
negligible, and the virtual call vs. direct call is only slightly less  
negligible, I think the virtualness may not matter.  However, I think  
avoiding one *call* per character is a worthy goal.
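
A minimal sketch of what avoiding one call per character could look like, written against current D2/Phobos (which postdates this thread). The helper name `putBuffered` and the 64-byte buffer size are assumptions made for illustration; the idea is only that code points are encoded into a stack buffer and the sink is called once per filled buffer:

```d
import std.utf : encode;

// Hypothetical helper: forwards the code points in `src` to `sink` as UTF-8,
// calling `sink` once per filled buffer instead of once per character.
void putBuffered(scope void delegate(const(char)[]) sink, const(dchar)[] src)
{
    char[64] buf;      // stack-allocated staging buffer (size is arbitrary)
    size_t used = 0;

    foreach (dchar c; src)
    {
        char[4] tmp;
        immutable n = encode(tmp, c); // number of UTF-8 code units written
        if (used + n > buf.length)
        {
            sink(buf[0 .. used]);     // flush before the buffer would overflow
            used = 0;
        }
        buf[used .. used + n] = tmp[0 .. n];
        used += n;
    }
    if (used > 0)
        sink(buf[0 .. used]);         // flush whatever is left
}

void main()
{
    string result;
    int calls;
    putBuffered((const(char)[] s) { result ~= s; ++calls; }, "héllo wörld"d);
    assert(result == "héllo wörld");
    assert(calls == 1); // 13 bytes of UTF-8 fit in a single 64-byte buffer
}
```

The sink must copy the data if it wants to keep it, since the slice points into a stack buffer that is reused.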

This doesn't mean I change my mind :)  I still think there is little  
benefit to having to conjure up an entire object just to convert something  
to a string vs. writing a simple inner function.
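
For comparison, here is a rough sketch of the "simple inner function" approach in current D2. The `Point` type and the `collect` function are made up for illustration and are not part of any proposal in this thread:

```d
import std.conv : to;

struct Point
{
    int x, y;

    // Writes the textual form piece by piece through the sink.
    // (The two to!string calls still allocate; this is only a sketch.)
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("Point(");
        sink(x.to!string);
        sink(", ");
        sink(y.to!string);
        sink(")");
    }
}

void main()
{
    string result;
    // The inner function: no object or interface is needed, just a delegate.
    void collect(const(char)[] chunk) { result ~= chunk; }

    Point(3, 4).toString(&collect);
    assert(result == "Point(3, 4)");
}
```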

One way to find out is to support only char[], and see who complains :)   
It'd be much easier to go from supporting char[] to supporting all the  
widths than going from supporting all to just one.

-Steve

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
   From my C++ book, it appears to only use virtual inheritance.  I 
 don't know enough about virtual inheritance to know how that changes 
 function calls.
  As far as virtual functions, only the destructor is virtual, so 
 there is no issue there.

 You're right, but there is an issue because as far as I can recall 
 these functions' implementation do end up calling a virtual function 
 per char; that might be streambuf.overflow. I'm not keen on 
 investigating this any further, but I'd be grateful if you shared any 
 related knowledge.

 
 Yep, you are right.  It appears the reason they do this is so the 
 conversion to the appropriate width can be done per character (and is a 
 no-op for char).
 
 At the end of the day, there seem to be violent agreement that we 
 don't want one virtual call per character or one delegate call per 
 character.

 
 After running my tests, it appears the virtual call vs. delegate is so 
 negligible, and the virtual call vs. direct call is only slightly less 
 negligible, I think the virtualness may not matter.  However, I think 
 avoiding one *call* per character is a worthy goal.
 
 This doesn't mean I change my mind :)  I still think there is little 
 benefit to having to conjure up an entire object just to convert 
 something to a string vs. writing a simple inner function.
 
 One way to find out is to support only char[], and see who complains :)  
 It'd be much easier to go from supporting char[] to supporting all the 
 widths than going from supporting all to just one.

One problem I just realized is that, if we e.g. offer only put(in 
char[]) or a delegate to that effect, we make it impossible to output 
one character efficiently. The (&c)[0 .. 1] trick will not work in safe 
mode. You'd have to allocate a one-element array dynamically.

Also, many OSs adopted UTF-16 as their standard format. It may be wise 
to design for compatibility.


Andrei

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I  
 don't know enough about virtual inheritance to know how that changes  
 function calls.
  As far as virtual functions, only the destructor is virtual, so  
 there is no issue there.

 You're right, but there is an issue because as far as I can recall  
 these functions' implementation do end up calling a virtual function  
 per char; that might be streambuf.overflow. I'm not keen on  
 investigating this any further, but I'd be grateful if you shared any  
 related knowledge.

  Yep, you are right.  It appears the reason they do this is so the  
 conversion to the appropriate width can be done per character (and is a  
 no-op for char).

 At the end of the day, there seem to be violent agreement that we  
 don't want one virtual call per character or one delegate call per  
 character.

  After running my tests, it appears the virtual call vs. delegate is so  
 negligible, and the virtual call vs. direct call is only slightly less  
 negligible, I think the virtualness may not matter.  However, I think  
 avoiding one *call* per character is a worthy goal.
  This doesn't mean I change my mind :)  I still think there is little  
 benefit to having to conjure up an entire object just to convert  
 something to a string vs. writing a simple inner function.
  One way to find out is to support only char[], and see who complains  
 :)  It'd be much easier to go from supporting char[] to supporting all  
 the widths than going from supporting all to just one.

 One problem I just realized is that, if we e.g. offer only put(in  
 char[]) or a delegate to that effect, we make it impossible to output  
 one character efficiently. The (&c)[0 .. 1] trick will not work in safe  
 mode. You'd have to allocate a one-element array dynamically.

char[1] buf;
buf[0] = c;
put(buf);

Although it would be a useful feature to be able to convert a value type  
to an array of one element reference, especially since that should be as  
safe as taking a slice of a static array.

Another solution, although I'm unaware of the added costs:

void toString(void delegate(in char[]...) put, string fmt);
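
A small sketch of how such a typesafe variadic parameter behaves, shown on a plain function rather than on a delegate type to keep it short. It is written for current D2; the names `buffer` and `put` are illustrative assumptions, and the cost of building the temporary array is not measured here:

```d
string buffer; // module-level only to keep the sketch short

// Typesafe variadic parameter: accepts an existing slice, or any number of
// individual chars, which the compiler gathers into a temporary array.
void put(const(char)[] chunk...)
{
    buffer ~= chunk; // appending copies, so the temporary is not retained
}

void main()
{
    put("n = ");   // an existing slice is passed through
    put('4');      // a single character, no explicit one-element buffer
    put('2', '!'); // several characters in one call
    assert(buffer == "n = 42!");
}
```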

 Also, many OSs adopted UTF-16 as their standard format. It may be wise  
 to design for compatibility.

So you want toString's to look like this?

version(utf16isdefault)
{
   textobj.put("Array: "w);
   ...
}
else
{
   textobj.put("Array: ");
   ...
}

-Steve

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I 
 don't know enough about virtual inheritance to know how that 
 changes function calls.
  As far as virtual functions, only the destructor is virtual, so 
 there is no issue there.

 You're right, but there is an issue because as far as I can recall 
 these functions' implementation do end up calling a virtual function 
 per char; that might be streambuf.overflow. I'm not keen on 
 investigating this any further, but I'd be grateful if you shared 
 any related knowledge.

  Yep, you are right.  It appears the reason they do this is so the 
 conversion to the appropriate width can be done per character (and is 
 a no-op for char).

 At the end of the day, there seem to be violent agreement that we 
 don't want one virtual call per character or one delegate call per 
 character.

  After running my tests, it appears the virtual call vs. delegate is 
 so negligible, and the virtual call vs. direct call is only slightly 
 less negligible, I think the virtualness may not matter.  However, I 
 think avoiding one *call* per character is a worthy goal.
  This doesn't mean I change my mind :)  I still think there is little 
 benefit to having to conjure up an entire object just to convert 
 something to a string vs. writing a simple inner function.
  One way to find out is to support only char[], and see who complains 
 :)  It'd be much easier to go from supporting char[] to supporting 
 all the widths than going from supporting all to just one.

 One problem I just realized is that, if we e.g. offer only put(in 
 char[]) or a delegate to that effect, we make it impossible to output 
 one character efficiently. The (&c)[0 .. 1] trick will not work in 
 safe mode. You'd have to allocate a one-element array dynamically.

 
 char[1] buf;
 buf[0] = c;
 put(buf);

This would not compile in SafeD.

 Although it would be a useful feature to be able to convert a value type 
 to an array of one element reference, especially since that should be as 
 safe as taking a slice of a static array.
 
 Another solution, although I'm unaware of the added costs:
 
 void toString(void delegate(in char[]...) put, string fmt);
 
 Also, many OSs adopted UTF-16 as their standard format. It may be wise 
 to design for compatibility.

 
 So you want toString's to look like this?
 
 version(utf16isdefault)
 {
   textobj.put("Array: "w);
   ...
 }
 else
 {
   textobj.put("Array: ");
   ...
 }
 
 -Steve


I was just thinking of offering an interface that offers utf8 and utf16 
and utf32.
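
One possible shape for such an interface, as a sketch in current D2. The names `TextSink` and `Utf8Sink` are hypothetical, and choosing UTF-8 as the internal format is an assumption of this example only:

```d
import std.utf : toUTF8;

// Hypothetical interface; the name and members are illustrative only.
interface TextSink
{
    void put(const(char)[] s);
    void put(const(wchar)[] s);
    void put(const(dchar)[] s);
}

// A sink that stores UTF-8 and transcodes the wider inputs on the way in.
class Utf8Sink : TextSink
{
    string data;

    void put(const(char)[] s)  { data ~= s; }
    void put(const(wchar)[] s) { data ~= toUTF8(s); }
    void put(const(dchar)[] s) { data ~= toUTF8(s); }
}

void main()
{
    string  a = "Array: "; // UTF-8
    wstring b = "[1, ";    // UTF-16
    dstring c = "2]";      // UTF-32

    auto sink = new Utf8Sink;
    sink.put(a);
    sink.put(b);
    sink.put(c);
    assert(sink.data == "Array: [1, 2]");
}
```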


Andrei

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 16:19:39 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  I  
 don't know enough about virtual inheritance to know how that  
 changes function calls.
  As far as virtual functions, only the destructor is virtual, so  
 there is no issue there.

 You're right, but there is an issue because as far as I can recall  
 these functions' implementation do end up calling a virtual function  
 per char; that might be streambuf.overflow. I'm not keen on  
 investigating this any further, but I'd be grateful if you shared  
 any related knowledge.

  Yep, you are right.  It appears the reason they do this is so the  
 conversion to the appropriate width can be done per character (and is  
 a no-op for char).

 At the end of the day, there seem to be violent agreement that we  
 don't want one virtual call per character or one delegate call per  
 character.

  After running my tests, it appears the virtual call vs. delegate is  
 so negligible, and the virtual call vs. direct call is only slightly  
 less negligible, I think the virtualness may not matter.  However, I  
 think avoiding one *call* per character is a worthy goal.
  This doesn't mean I change my mind :)  I still think there is little  
 benefit to having to conjure up an entire object just to convert  
 something to a string vs. writing a simple inner function.
  One way to find out is to support only char[], and see who complains  
 :)  It'd be much easier to go from supporting char[] to supporting  
 all the widths than going from supporting all to just one.

 One problem I just realized is that, if we e.g. offer only put(in  
 char[]) or a delegate to that effect, we make it impossible to output  
 one character efficiently. The (&c)[0 .. 1] trick will not work in  
 safe mode. You'd have to allocate a one-element array dynamically.

  char[1] buf;
 buf[0] = c;
 put(buf);

 This would not compile in SafeD.

:O

Why not?  I would expect that using a local buffer would be the main way  
for converting non-string things to strings, or to avoid calling the  
delegate/vfunction lots of times.

i.e. if I want to output an integer i:


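if(i == 0) put("0");
else
{
   // sketch for positive i only; the sign is not handled
   char[20] buf;
   size_t idx = buf.length;
   while(i != 0)
   {
     // fill from the end: store the next digit as an ASCII character
     buf[--idx] = cast(char)('0' + i % 10);
     i /= 10;
   }
   put(buf[idx..$]); // no compily in SafeD???
}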

Do I have to allocate a heap buffer in SafeD?

 Also, many OSs adopted UTF-16 as their standard format. It may be wise  
 to design for compatibility.

  So you want toString's to look like this?
  version(utf16isdefault)
 {
   textobj.put("Array: "w);
   ...
 }
 else
 {
   textobj.put("Array: ");
   ...
 }
  -Steve


 I was just thinking of offering an interface that offers utf8 and utf16  
 and utf32.

Yes, and your explanation for this is because many OSes adopt UTF-16 as  
their standard format.  My expectation is that the outputter will convert  
to the required OS format anyways, regardless of what you pass it, so why  
should we write code to cater to what the OS wants?  I'd like to write  
string-handling code once and be done with it, not try to optimize my  
toString functions so that they use the "right" methods for the current  
OS.  I asserted that the only reason you want to use the functions other  
than the char[] version is in the case where your data is *stored* as  
wchar[] or dchar[].  Otherwise, it makes no sense to do the conversion  
because the outputter already does it for you.  So the question becomes,  
how often do you need to output data that's already in dchar[] or wchar[]  
format, and is it worth passing around a list of functions just in case  
you need that, or should you just call a conversion routine the few times  
you need it?
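
To make the alternative concrete, here is a sketch of a char-only sink with the conversion done at the one call site that holds UTF-16 data. It uses current D2 Phobos; `writePath` is a hypothetical routine invented for this example:

```d
import std.utf : toUTF8;

// Hypothetical routine that outputs data which happens to be stored as UTF-16.
void writePath(scope void delegate(const(char)[]) sink, const(wchar)[] path)
{
    sink("path = ");
    sink(path.toUTF8); // explicit conversion at the one place it is needed
}

void main()
{
    string log;
    writePath((const(char)[] s) { log ~= s; }, "C:\\temp"w);
    assert(log == "path = C:\\temp");
}
```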

Let's not forget that this is mainly for debugging...

-Steve

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 16:19:39 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

   From my C++ book, it appears to only use virtual inheritance.  
 I don't know enough about virtual inheritance to know how that 
 changes function calls.
  As far as virtual functions, only the destructor is virtual, so 
 there is no issue there.

 You're right, but there is an issue because as far as I can recall 
 these functions' implementation do end up calling a virtual 
 function per char; that might be streambuf.overflow. I'm not keen 
 on investigating this any further, but I'd be grateful if you 
 shared any related knowledge.

  Yep, you are right.  It appears the reason they do this is so the 
 conversion to the appropriate width can be done per character (and 
 is a no-op for char).

 At the end of the day, there seem to be violent agreement that we 
 don't want one virtual call per character or one delegate call per 
 character.

  After running my tests, it appears the virtual call vs. delegate 
 is so negligible, and the virtual call vs. direct call is only 
 slightly less negligible, I think the virtualness may not matter.  
 However, I think avoiding one *call* per character is a worthy goal.
  This doesn't mean I change my mind :)  I still think there is 
 little benefit to having to conjure up an entire object just to 
 convert something to a string vs. writing a simple inner function.
  One way to find out is to support only char[], and see who 
 complains :)  It'd be much easier to go from supporting char[] to 
 supporting all the widths than going from supporting all to just one.

 One problem I just realized is that, if we e.g. offer only put(in 
 char[]) or a delegate to that effect, we make it impossible to 
 output one character efficiently. The (&c)[0 .. 1] trick will not 
 work in safe mode. You'd have to allocate a one-element array 
 dynamically.

  char[1] buf;
 buf[0] = c;
 put(buf);

 This would not compile in SafeD.

 
 :O
 
 Why not?  I would expect that using a local buffer would be the main way 
 for converting non-string things to strings, or to avoid calling the 
 delegate/vfunction lots of times.

Well a stack-allocated buffer is stack-allocated, and passing a slice 
out of it to a function may cause the function to escape the slice.

 i.e. if I want to output an integer i:
 
 
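 if(i == 0) put("0");
 else
 {
   // sketch for positive i only; the sign is not handled
   char[20] buf;
   size_t idx = buf.length;
   while(i != 0)
   {
     // fill from the end: store the next digit as an ASCII character
     buf[--idx] = cast(char)('0' + i % 10);
     i /= 10;
   }
   put(buf[idx..$]); // no compily in SafeD???
 }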
 
 Do I have to allocate a heap buffer in SafeD?

I'm afraid so. Unless of course you have a put(dchar) routine handy :o).

 Also, many OSs adopted UTF-16 as their standard format. It may be 
 wise to design for compatibility.

  So you want toString's to look like this?
  version(utf16isdefault)
 {
   textobj.put("Array: "w);
   ...
 }
 else
 {
   textobj.put("Array: ");
   ...
 }
  -Steve


 I was just thinking of offering an interface that offers utf8 and 
 utf16 and utf32.

 
 Yes, and your explaination for this is because many OSes adopt UTF-16 as 
 their standard format.  My expectation is that the outputter will 
 convert to the required OS format anyways, regardless of what you pass 
 it, so why should we write code to cater to what the OS wants?  I'd like 
 to write string-handling code once and be done with it, not try to 
 optimize my toString functions so that they use the "right" methods for 
 the current OS.  I asserted that the only reason you want to use the 
 functions other than the char[] version is in the case where your data 
 is *stored* as wchar[] or dchar[].  Otherwise, it makes no sense to do 
 the conversion because the outputter already does it for you.  So the 
 question becomes, how often do you need to output data that's already in 
 dchar[] or wchar[] format, and is it worth passing around a list of 
 functions just in case you need that, or should you just call a 
 conversion routine the few times you need it?
 
 Let's not forget that this is mainly for debugging...

If it's mainly for debugging maybe it's not worth spending time on.

Andrei

Nov 12 2009

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Let's not forget that this is mainly for debugging...

 If it's mainly for debugging maybe it's not worth spending time on.

Nonsense!  Developers spend a lot of time debugging.  Helping people
debug their programs is certainly worth spending time on.

--bb

Nov 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Let's not forget that this is mainly for debugging...

 If it's mainly for debugging maybe it's not worth spending time on.

 
 Nonsense!  Developers spend a lot of time debugging.  Helping people
 debug their programs is certainly worth spending time on.
 
 --bb

Sorry sorry. I just meant to say it's not worth coming up with an 
airtight design. We might afford some extra conversions and extra 
virtual calls I guess.

But that being said, I'd so much want to start thinking of an actual 
text serialization infrastructure. Why develop one later with the 
mention "well use that stuff for debugging only, this is the real stuff"?

Andrei

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 17:13:06 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Bill Baxter wrote:
 On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Let's not forget that this is mainly for debugging...

 If it's mainly for debugging maybe it's not worth spending time on.

  Nonsense!  Developers spend a lot of time debugging.  Helping people
 debug their programs is certainly worth spending time on.
  --bb

 Sorry sorry. I just meant to say it's not worth coming with an airtight  
 design. We might afford some extra conversions and extra virtual calls I  
 guess.

 But that being said, I'd so much want to start thinking of an actual  
 text serialization infrastructure. Why develop one later with the  
 mention "well use that stuff for debugging only, this is the real stuff."

The main purpose to serialize is to be able to deserialize.  The main  
reason to print debug information is so a person can read it.  I don't  
know if those two goals overlap enough.

I think we need both.  Maybe one uses the other, I'm not sure, but a way  
to say "here's how you interact with writefln and friends" would be very  
nice.

-Steve

Nov 12 2009

Yigal Chripun <yigal100 gmail.com> writes:

Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 17:13:06 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Bill Baxter wrote:
 On Thu, Nov 12, 2009 at 1:54 PM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 Let's not forget that this is mainly for debugging...

 If it's mainly for debugging maybe it's not worth spending time on.

  Nonsense!  Developers spend a lot of time debugging.  Helping people
 debug their programs is certainly worth spending time on.
  --bb

 Sorry sorry. I just meant to say it's not worth coming with an 
 airtight design. We might afford some extra conversions and extra 
 virtual calls I guess.

 But that being said, I'd so much want to start thinking of an actual 
 text serialization infrastructure. Why develop one later with the 
 mention "well use that stuff for debugging only, this is the real stuff."

 
 The main purpose to serialize is to be able to deserialize.  The main 
 reason to print debug information is so a person can read it.  I don't 
 know if those two goals overlap enough.
 
 I think we need both.  Maybe one uses the other, I'm not sure, but a way 
 to say "here's how you interact with writefln and friends" would be very 
 nice.
 
 -Steve

I'd add to that that a format facility should be locale-aware, as in .NET,
i.e. (pseudo-code):

auto str = format("{0}", 2.4, CurrentCulture);
// or specify a specific locale

str will be either "2.4" or "2,4" based on locale.

this serves an entirely different purpose from serialization even though 
both have common parts.

you can't and shouldn't try to de-serialize the above text representation.
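
A toy sketch of the idea in current D2. The `Culture` enum and `formatNumber` are hypothetical stand-ins, not an existing Phobos API, and only the decimal separator is handled:

```d
import std.array : replace;
import std.format : format;

// Hypothetical stand-in for a culture/locale object.
enum Culture { enUS, deDE }

string formatNumber(double value, Culture culture)
{
    string text = format("%.1f", value); // yields '.' as the separator here
    return culture == Culture.deDE ? text.replace(".", ",") : text;
}

void main()
{
    assert(formatNumber(2.4, Culture.enUS) == "2.4");
    assert(formatNumber(2.4, Culture.deDE) == "2,4");
}
```

The "2,4" form is meant for a human reader in that locale, which is why it is a poor candidate for reading back in.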

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 16:54:13 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:


  Let's not forget that this is mainly for debugging...

 If it's mainly for debugging maybe it's not worth spending time on.

Debugging is not always done by the developer on his system where a  
debugger is available.  The main use I see for toString is logging (for  
the purpose of debugging post-mortem failures on customer's systems).

-Steve

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 11:14:56 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the cost of calling through the delegate is roughly the same as  
 a virtual call.

 Not exactly.  I think you are right that struct member calls are faster  
 than delegates, but only slightly.  The difference being that a struct  
 member call does not need to load the function address from the stack,  
 it can hard-code the address directly.

 However, virtual calls have to be lower performing because you are doing  
 two indirections, one to the class vtable, then one to the function  
 address itself.  Plus those two locations are most likely located on the  
 heap, not the stack, and so may not be in the cache.

Some rudimentary attempts at benchmarking:

testme.d:

struct S
{
     void foo(int x){}
}

interface I
{
     void foo(int x);
}

class C : I
{
     void foo(int x){}
}

const loopcount = 10_000_000_000L;

void doVirtual()
{
     C c = new C;
     for(auto x = loopcount; x > 0; x--)
         c.foo(x);
}

void doInterface()
{
     I i = new C;
     for(auto x = loopcount; x > 0; x--)
         i.foo(x);
}

void doDelegate()
{
     auto d = new C;
     auto dg = &d.foo;
     for(auto x = loopcount; x > 0; x--)
         dg(x);
}

void doStruct()
{
     S s;
     for(auto x = loopcount; x > 0; x--)
         s.foo(x);
}

void main(char[][] args)
{
     switch(args[1])
     {
         case "virtual":
             doVirtual();
             break;
         case "interface":
             doInterface();
             break;
         case "struct":
             doStruct();
             break;
         case "delegate":
             doDelegate();
             break;
     }
}


[steves steveslaptop testd]$ time ./testme interface

real	1m18.152s
user	1m16.638s
sys	0m0.015s
[steves steveslaptop testd]$ time ./testme virtual

real	1m11.146s
user	1m10.497s
sys	0m0.014s
[steves steveslaptop testd]$ time ./testme struct

real	1m5.828s
user	1m5.249s
sys	0m0.011s
[steves steveslaptop testd]$ time ./testme delegate

real	1m10.464s
user	1m9.856s
sys	0m0.010s


According to this, delegates are slightly faster than virtual calls, but  
not by much.  By far a direct call is faster, but I was surprised at how  
little overhead virtual calls add in relation to the loop counter.  I had  
to use 10 billion loops or else the difference was undetectable.

I used dmd 1.046 -release -O (the -release is needed to get rid of the  
class method checking the invariant every call).

The relevant assembly for calling a virtual method is:

mov	ECX,[EBX]
mov	EAX,EBX
push	dword ptr -8[EBP]
call	dword ptr 014h[ECX]

and the assembly for calling a delegate is:

push	dword ptr -8[EBP]
mov	EAX,-010h[EBP]
call	EBX
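
For anyone who wants to repeat a comparison like this on a current compiler, here is a rough sketch using `std.datetime.stopwatch.benchmark` from today's Phobos (not available in dmd 1.046). The function `timeCalls` and the accumulator field are assumptions of this sketch, and the numbers it prints depend heavily on compiler flags and hardware:

```d
import core.time : Duration;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

interface I { void foo(int x); }
class C : I { int acc; void foo(int x) { acc += x; } }

// Times n calls through a class reference, an interface and a delegate.
Duration[3] timeCalls(uint n)
{
    auto c = new C;
    I i = c;
    auto dg = &c.foo;

    void viaClass()     { c.foo(1); }
    void viaInterface() { i.foo(1); }
    void viaDelegate()  { dg(1); }

    // benchmark runs each function n times and returns one Duration per function.
    return benchmark!(viaClass, viaInterface, viaDelegate)(n);
}

void main()
{
    auto times = timeCalls(1_000_000);
    writeln("virtual:   ", times[0]);
    writeln("interface: ", times[1]);
    writeln("delegate:  ", times[2]);
}
```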

-Steve

Nov 12 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 By far a direct call is faster, but I was surprised at how
 little overhead virtual calls add in relation to the loop counter.  I had
 to use 10 billion loops or else the difference was undetectable.
 I used dmd 1.046 -release -O (the -release is needed to get rid of the
 class method checking the invariant every call).
 The relative assembly for calling a virtual method is:
 mov	ECX,[EBX]
 mov	EAX,EBX
 push	dword ptr -8[EBP]
 call	dword ptr 014h[ECX]
 and the assembly for calling a delegate is:
 push	dword ptr -8[EBP]
 mov	EAX,-010h[EBP]
 call	EBX
 -Steve

Your benchmarks don't show that the direct call is much faster.  You had
inlining disabled.  Was this intentional?  If so, it proves my point that
most of the overhead from virtual calls comes from the fact that they can't
usually be inlined, not because they're virtual.

Nov 12 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 12:38:00 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
  By far a direct call is faster, but I was surprised at how
 little overhead virtual calls add in relation to the loop counter.  I  
 had
 to use 10 billion loops or else the difference was undetectable.
 I used dmd 1.046 -release -O (the -release is needed to get rid of the
 class method checking the invariant every call).
 The relative assembly for calling a virtual method is:
 mov	ECX,[EBX]
 mov	EAX,EBX
 push	dword ptr -8[EBP]
 call	dword ptr 014h[ECX]
 and the assembly for calling a delegate is:
 push	dword ptr -8[EBP]
 mov	EAX,-010h[EBP]
 call	EBX
 -Steve

 Your benchmarks don't show that the direct call is much faster.  You had  
 inlining
 disabled.  Was this intentional?  If so, it proves my point that most of  
 the
 overhead from virtual calls comes from the fact that they can't usually  
 be
 inlined, not because they're virtual.

The direct call was 5 seconds faster.  Divide by 10 billion and you get a  
small but present amount.

Inlining makes the struct member function call disappear (b/c foo does  
nothing!), so it's not really a relevant benchmark.

I did the "struct" version as a baseline.  Consider that the struct  
version is the cost of doing the loop increments, pushing the 'this'  
pointer and argument, and calling the function.  Any difference from that  
is the overhead of virtual/delegate/interface calls.

Inlining is not possible with delegates (yet), so it's not really  
important for this argument.

-Steve

Nov 12 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Thu, 12 Nov 2009 12:38:00 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
  By far a direct call is faster, but I was surprised at how
 little overhead virtual calls add in relation to the loop counter.  I
 had
 to use 10 billion loops or else the difference was undetectable.
 I used dmd 1.046 -release -O (the -release is needed to get rid of the
 class method checking the invariant every call).
 The relative assembly for calling a virtual method is:
 mov	ECX,[EBX]
 mov	EAX,EBX
 push	dword ptr -8[EBP]
 call	dword ptr 014h[ECX]
 and the assembly for calling a delegate is:
 push	dword ptr -8[EBP]
 mov	EAX,-010h[EBP]
 call	EBX
 -Steve

 Your benchmarks don't show that the direct call is much faster.  You had
 inlining
 disabled.  Was this intentional?  If so, it proves my point that most of
 the
 overhead from virtual calls comes from the fact that they can't usually
 be
 inlined, not because they're virtual.

 The direct call was 5 seconds faster.  Divide by 10 billion and you get a
 small but present amount.

Yes, about 0.5 nanoseconds.  In other words, if your CPU is roughly 2 GHz, about
one **clock cycle**.  This is definitely negligible IMHO.
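
The arithmetic behind that estimate can be checked with a few lines of D. This is only a sketch: it takes the "real" times of the virtual and struct runs posted earlier in the thread, and the 2 GHz clock is the assumption stated above, not a measured value. The helper name perCallNs is made up for this example.

```d
import std.stdio : writefln;

// Extra cost per call in nanoseconds, given two total run times in seconds.
double perCallNs(double slowSeconds, double fastSeconds, double calls)
{
    return (slowSeconds - fastSeconds) / calls * 1e9;
}

void main()
{
    // "real" times of the virtual and struct runs, 10 billion iterations each.
    immutable ns = perCallNs(71.146, 65.828, 10e9);
    // Assumed 2 GHz clock: two cycles elapse per nanosecond.
    immutable cycles = ns * 2.0;
    writefln("%.2f ns per call, roughly %.1f cycles", ns, cycles);
}
```

With those figures the difference comes out at about 0.53 ns per call, which is where "about one clock cycle" comes from.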

 Inlining makes the struct member function call disappear (b/c foo does
 nothing!), so it's not really a relevant benchmark.

Right, my point is that the overhead of indirect function calls compared to
direct function calls is too small to ever be worth considering, assuming the
direct function call is not inlined.  However, when the direct function call
may be inlined, this is where indirect calls really hurt because they usually
can't be inlined.

 I did the "struct" version as a baseline.  Consider that the struct
 version is the cost of doing the loop increments, pushing the 'this'
 pointer and argument, and calling the function.  Any difference from that
 is the overhead of virtual/delegate/interface calls.
 Inlining is not possible with delegates (yet), so it's not really
 important for this argument.
 -Steve

Nov 12 2009

Justin Johansson <no spam.com> writes:

Don Wrote:

 Lutger wrote:
 Justin Johansson wrote:
 
 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account, sorry.

 Semantics of toString would depend on the object, I would think there are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representations, like integers. Maybe even none of these exist (except
 strings themselves?)

 2. objects that, given some formatting options or locale have a clear
 string representation. floating points, dates, curreny and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2) type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1) and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.  Then
 again, and since I'm working on a scripting language, sometimes I like to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 
 Your design makes better sense (to me at least) because it is based on why 
 you want a string from some object. 
 
 Take .NET for example: it does provide very elaborate and nice formatting 
 options based and toString() with parameters. For some types however, the 
 default toString() gives you the name of the type itself which is in no way 
 related to formatting an object. You learn to work with it, but I find it a 
 bit muddled. 
 
 As a last note, I think people view toString as a debug thing mostly because 
 it is very underpowered.

 
 There is a definite use for such as thing. But the existing toString() 
 is much, much worse than useless. People think you can do something with 
 it, but you can't.
 eg, people have asked for BigInt to support toString(). That is an 
 over-my-dead-body.

s/over-my-dead-body/over-your-dead-body/ :-)

At least those are the words that Brendan Eich uses when people
seek to make JavaScript multi-threaded.

http://weblogs.mozillazine.org/roadmap/archives/2007/02/threads_suck.html

http://www.teknico.net/misc/fortune/concurrency.en.txt

Google:

http://www.google.com.au/#hl=en&q=Brendan+Eich+"your+dead+body"

Best regards and thanks to all respondents on "toString" topic,

Justin

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account, sorry.

 Semantics of toString would depend on the object, I would think there
 are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representations, like integers. Maybe even none of these exist (except
 strings themselves?)

 2. objects that, given some formatting options or locale have a clear
 string representation. floating points, dates, curreny and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2) type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1)
 and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely
 purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.  Then
 again, and since I'm working on a scripting language, sometimes I like to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 Your design makes better sense (to me at least) because it is based on why
 you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice formatting
 options based and toString() with parameters. For some types however, the
 default toString() gives you the name of the type itself which is in no way
 related to formatting an object. You learn to work with it, but I find it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

 There is a definite use for such as thing. But the existing toString() is
 much, much worse than useless. People think you can do something with it,
 but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

You can definitely do something with it -- printf debugging.  And if I
were using BigInt, that's exactly why I'd want BigInt to have a
toString.  Just out of curiosity, how does someone print out the
value of a BigInt right now?

--bb

Nov 10 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:

 Just out of curiousity, how does someone print out the
 value of a BigInt right now?

I have added a toString to my copy of the BigInt.

Bye,
bearophile

Nov 10 2009

Don <nospam nospam.com> writes:

Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative of
 D NG readers responding with their own idea(s) of what the semantics of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account, sorry.

 Semantics of toString would depend on the object, I would think there
 are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representations, like integers. Maybe even none of these exist (except
 strings themselves?)

 2. objects that, given some formatting options or locale have a clear
 string representation. floating points, dates, curreny and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2) type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1)
 and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely
 purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.  Then
 again, and since I'm working on a scripting language, sometimes I like to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 Your design makes better sense (to me at least) because it is based on why
 you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice formatting
 options based and toString() with parameters. For some types however, the
 default toString() gives you the name of the type itself which is in no way
 related to formatting an object. You learn to work with it, but I find it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

 There is a definite use for such as thing. But the existing toString() is
 much, much worse than useless. People think you can do something with it,
 but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

 
 You can definitely do something with it -- printf debugging.  And if I
 were using BigInt, that's exactly why I'd want BigInt to have a
 toString. 

I almost always want to print the value out in hex. And with some kind 
of digit separators, so that I can see how many digits it has.

  Just out of curiousity, how does someone print out the
 value of a BigInt right now?

In Tango, there's just .toHex() and .toDecimalString(). It needs proper 
formatting options; that's the biggest thing which isn't done. I hit one 
too many compiler segfaults and started patching the compiler instead 
<g>. But I really want a decent toString().

Given a BigInt n, you should be able to just do

writefln("%s %x", n, n);  // Phobos
formatln("{0} {0:X}", n); // Tango

To solve this part of the issue, it would be enough to have toString() 
take a string parameter. (it would be "x" or "X" in this case).

string toString(string fmt);
But the performance would still be very poor, and that's much more 
difficult to solve.
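
For illustration, a format-aware toString along those lines could look roughly like the sketch below. It assumes the format string is passed without the leading '%', and it uses a hypothetical Num struct wrapping a ulong as a stand-in for BigInt so that the example stays short. It is not the Tango or Phobos BigInt API.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical stand-in for BigInt; a ulong keeps the sketch short.
struct Num
{
    ulong value;

    // fmt is the specifier without the leading '%': "s", "x" or "X".
    string toString(string fmt = "s") const
    {
        switch (fmt)
        {
            case "x": return format("%x", value); // lowercase hex
            case "X": return format("%X", value); // uppercase hex
            default:  return format("%d", value); // decimal for "s" and anything else
        }
    }
}

void main()
{
    auto n = Num(255);
    writeln(n.toString());    // decimal
    writeln(n.toString("x")); // lowercase hex
    writeln(n.toString("X")); // uppercase hex
}
```

A formatting function such as writefln could then forward the specifier it parsed to the object. Every call still allocates a fresh string, which is the performance problem mentioned above.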

Nov 10 2009

bearophile <bearophileHUGS lycos.com> writes:

Don:
 But the performance would still be very poor, and that's much more 
 difficult to solve.

This may help:

http://fredrik-j.blogspot.com/2008/07/making-division-in-python-faster.html
http://fredrik-j.blogspot.com/2008/07/division-sequel-with-bonus-material.html
http://bugs.python.org/issue3451

Bye,
bearophile

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 6:11 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Don:
 But the performance would still be very poor, and that's much more
 difficult to solve.

 This may help:

 http://fredrik-j.blogspot.com/2008/07/making-division-in-python-faster.html
 http://fredrik-j.blogspot.com/2008/07/division-sequel-with-bonus-material.html
 http://bugs.python.org/issue3451

Though they may be useful, those don't look to have anything to do
with formatting user types into strings, which is the subject at hand.

--bb

Nov 10 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:

 Though they may be useful, those don't look to have anything to do
 with formatting user types into strings, which is the subject at hand.

Don has said: "But the performance would still be very poor, and that's much
more difficult to solve." And those links show a way to quickly convert a large
multi-precision integer into a string. What is it that I am missing?

Bye,
bearophile

Nov 10 2009

Don <nospam nospam.com> writes:

bearophile wrote:
 Bill Baxter:
 
 Though they may be useful, those don't look to have anything to do
 with formatting user types into strings, which is the subject at hand.

 
 Don has said: "But the performance would still be very poor, and that's much
more difficult to solve." And those links show a way to quickly convert a large
multi-precision integer into a string. What is that I am missing?

It's problem 2 from my original posts: being able to output something 
large (eg an xml doc) in a piece-by-piece manner.

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 7:04 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:

 Though they may be useful, those don't look to have anything to do
 with formatting user types into strings, which is the subject at hand.

 Don has said: "But the performance would still be very poor, and that's much
more difficult to solve." And those links show a way to quickly convert a large
multi-precision integer into a string. What is that I am missing?

Maybe it's just my ignorance of BigNum issues, but those links look to
me to be about division and not generating string representations.  Are
those somehow synonymous in BigInt land?

--bb

Nov 10 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:
 Maybe it's just my ignorance of BigNum issues, but those links look to
 me to be about divsion and not generating string representations.  Are
 those somehow synonymous in BigInt land?

Look at the numeral() function inside here from those blog posts:
http://www.dd.chalmers.se/~frejohl/code/div.py

To convert a positive integer to string you have to keep dividing a number by
10, and accumulate the modulus as the digit, converted to ['0', '9']. When the
number is zero you are done:

n = 541489
result = ""
while n:
    n, digit = divmod(n, 10)
    result = chr(ord("0") + digit) + result

But all those large divisions are slow if the number is huge. So that div.py
python program shows a faster algorithm that does something smarter, to
decrease the computational complexity of all that.
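
For D readers, the same naive digit loop can be sketched as follows. It works on a plain ulong rather than a multi-precision type, and the function name toDecimal is made up for this example. A real BigInt would do one full-width division per digit here, which is exactly the cost the faster algorithm avoids.

```d
import std.stdio : writeln;

// Naive conversion: repeatedly divide by 10 and collect the remainders.
string toDecimal(ulong n)
{
    if (n == 0)
        return "0";

    char[20] buf;            // ulong.max has 20 decimal digits
    size_t pos = buf.length; // fill the buffer from the right
    while (n != 0)
    {
        buf[--pos] = cast(char)('0' + n % 10); // remainder becomes the digit
        n /= 10;
    }
    return buf[pos .. $].idup; // copy the used part into an immutable string
}

void main()
{
    writeln(toDecimal(541489));
}
```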

Bye,
bearophile

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 9:16 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Maybe it's just my ignorance of BigNum issues, but those links look to
 me to be about divsion and not generating string representations.  Are
 those somehow synonymous in BigInt land?

 Look the numeral() function inside here from those blog posts:
 http://www.dd.chalmers.se/~frejohl/code/div.py

 To convert a positive integer to string you have to keep dividing a number by
 10, and accumulate the modulus as the digit, converted to ['0', '9']. When the
 number is zero you are done:
 n = 541489
 result = ""
 while n:
     n, digit = divmod(n, 10)

 But all those large divisions are slow if the number is huge. So that div.py
 python program shows a faster algorithm that does something smarter, to
 decrease the computational complexity of all that.

Well, anyway, slowness of BigInt is not what Don was referring to.  He
was talking about the general slowness of a toString interface that
forces allocating enough memory to hold the entire result, instead of
being able to dole out the result piecemeal.

--bb

Nov 10 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:
 Well, anyway, [...]

You are welcome.

Bye,
bearophile

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

On Tue, Nov 10, 2009 at 4:30 AM, Don <nospam nospam.com> wrote:
  Just out of curiousity, how does someone print out the
 value of a BigInt right now?

 In Tango, there's just .toHex() and .toDecimalString(). Needs proper
 formatting options, it's the biggest thing which isn't done. I hit one too
 many compiler segfaults and starting patching the compiler instead <g>. But
 I really want a decent toString().

Ah, ok.  So there is something, it's just not called "toString".

--bb

Nov 10 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly  
 named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most  
 appreciative of
 D NG readers responding with their own idea(s) of what the  
 semantics of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account,  
 sorry.

 Semantics of toString would depend on the object, I would think  
 there
 are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representations, like integers. Maybe even none of these exist  
 (except
 strings themselves?)

 2. objects that, given some formatting options or locale have a  
 clear
 string representation. floating points, dates, curreny and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2)  
 type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for  
 1)
 and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming  
 languages/their
 libraries separated functions/methods which are currently loosely
 purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString  
 printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.   
 Then
 again, and since I'm working on a scripting language, sometimes I  
 like to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people  
 view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 Your design makes better sense (to me at least) because it is based  
 on why
 you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice  
 formatting
 options based and toString() with parameters. For some types however,  
 the
 default toString() gives you the name of the type itself which is in  
 no way
 related to formatting an object. You learn to work with it, but I  
 find it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

 There is a definite use for such as thing. But the existing toString()  
 is
 much, much worse than useless. People think you can do something with  
 it,
 but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

  You can definitely do something with it -- printf debugging.  And if I
 were using BigInt, that's exactly why I'd want BigInt to have a
 toString.

 I almost always want to print the value out in hex. And with some kind  
 of digit separators, so that I can see how many digits it has.

   Just out of curiousity, how does someone print out the
 value of a BigInt right now?

 In Tango, there's just .toHex() and .toDecimalString(). Needs proper  
 formatting options, it's the biggest thing which isn't done. I hit one  
 too many compiler segfaults and starting patching the compiler instead  
 <g>. But I really want a decent toString().

 Given a BigInt n, you should be able to just do

 writefln("%s %x", n, n);  // Phobos
 formatln("{0} {0:X}", n); // Tango

 To solve this part of the issue, it would be enough to have toString()  
 take a string parameter. (it would be "x" or "X" in this case).

 string toString(string fmt);
 But the performance would still be very poor, and that's much more  
 difficult to solve.

Yes, it would solve half of the toString problems.

Another part (i.e. memory allocation) could be solved by providing an  
optional buffer to the toString:

char[] toString(string format = "s" /* comes from %s which is a default  
qualifier */, char[] buffer = null)
{
     // operate on the buffer, possibly resizing it
     // which is safe and fast - it only allocates
     // when *really* necessary, instead of always, as now
     return buffer;
}

You can use it almost the same way you used it before:

// assumeUnique is needed because we return a mutable string now
string s = assumeUnique(someObject.toString());
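
A small runnable sketch of the buffer idea, under simplifying assumptions: it is a free function for uint only, with the made-up name writeDecimal, and it ignores the format argument. It writes into the caller's buffer when the text fits and allocates only when it does not, so the caller can tell the two cases apart by comparing pointers.

```d
import std.stdio : writeln;

// Writes the decimal text of value into buffer if it fits; otherwise allocates.
char[] writeDecimal(uint value, char[] buffer = null)
{
    char[10] tmp;            // uint.max has 10 decimal digits
    size_t pos = tmp.length; // fill the scratch space from the right
    do
    {
        tmp[--pos] = cast(char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    auto digits = tmp[pos .. $];
    if (buffer.length < digits.length)
        return digits.dup;   // not enough room: fall back to a heap copy

    buffer[0 .. digits.length] = digits[]; // in place, no allocation
    return buffer[0 .. digits.length];
}

void main()
{
    char[16] stack;
    auto inPlace = writeDecimal(1234, stack[]);
    writeln(inPlace, " ", inPlace.ptr is stack.ptr); // the result reuses the stack buffer
    writeln(writeDecimal(1234));                     // no buffer given, so it allocates
}
```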

Optimization example:

int sprintf(string format, ...)
{
     char[512] preallocatedBuffer;
     char[] buffer = preallocatedBuffer[]; // buffer may grow, but
     // initially points to a preallocatedBuffer

     char[] storage = buffer[]; // storage for a current element

     ...
     for (...) { // iterate over qualifiers (and arguments)
         string currentQualifier = format[i..j];
         auto currentArgument = argsTuple[n];

         char[] result = currentArgument.toString(currentQualifier, storage);
         if (result.ptr is storage.ptr) {
             // okay, string was constructed in-place
             storage = storage[result.length..$];
         } else {
             // storage didn't have enough space for the whole
             // string (a reallocation occurred)

             int offset = buffer.length - storage.length;

             // increase the capacity
             buffer.length *= 2;

             // append our string to the buffer
             buffer[offset..offset+result.length] = result[];

             // renew the temporary storage
             storage = preallocatedBuffer[];
         }
     }
     ...
}

Another example:

class Array(T)
{
     // ...
     private T[] elements;

     char[] toString(string format, char[] buffer) {
         auto builder = StringBuilder(buffer); // reallocates when no space left
         builder.append("[");
         foreach (i, o; elements) {
             if (i > 0) builder.append(", "); // separator

             buffer = builder.getBuffer()[builder.length..$];
             char[] result = o.toString(format, buffer);
             if (result.ptr is buffer.ptr) {
                 // no reallocation
                 builder.length += result.length; // without copying
             } else {
                 builder.append(result);
             }
         }

         builder.append("]");

         return builder.toString();
     }
}

auto array = new Array!(int);
array ~= [0, 1, 2, 3, 4];
assert(array.toString() == "[0, 1, 2, 3, 4]");

It's not very easy to take advantage of, but it's usable the old way  
(well, almost).

Any ideas?

Nov 10 2009

Bill Baxter <wbaxter gmail.com> writes:

2009/11/10 Denis Koroskin <2korden gmail.com>:
 On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly
 named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative
 of
 D NG readers responding with their own idea(s) of what the semantics
 of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account, sorry.

 Semantics of toString would depend on the object, I would think there
 are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representations, like integers. Maybe even none of these exist
 (except
 strings themselves?)

 2. objects that, given some formatting options or locale have a clear
 string representation. floating points, dates, curreny and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2)
 type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1)
 and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely
 purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString
 printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.
 Then
 again, and since I'm working on a scripting language, sometimes I like
 to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 Your design makes better sense (to me at least) because it is based on
 why
 you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice
 formatting
 options based and toString() with parameters. For some types however,
 the
 default toString() gives you the name of the type itself which is in no
 way
 related to formatting an object. You learn to work with it, but I find
 it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

 There is a definite use for such as thing. But the existing toString()
 is
 much, much worse than useless. People think you can do something with
 it,
 but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

  You can definitely do something with it -- printf debugging.  And if I
 were using BigInt, that's exactly why I'd want BigInt to have a
 toString.

 I almost always want to print the value out in hex. And with some kind of
 digit separators, so that I can see how many digits it has.

  Just out of curiousity, how does someone print out the
 value of a BigInt right now?

 In Tango, there's just .toHex() and .toDecimalString(). Needs proper
 formatting options, it's the biggest thing which isn't done. I hit one too
 many compiler segfaults and starting patching the compiler instead <g>. But
 I really want a decent toString().

 Given a BigInt n, you should be able to just do

 writefln("%s %x", n, n);  // Phobos
 formatln("{0} {0:X}", n); // Tango

 To solve this part of the issue, it would be enough to have toString()
 take a string parameter. (it would be "x" or "X" in this case).

 string toString(string fmt);
 But the performance would still be very poor, and that's much more
 difficult to solve.

 Yes, it would solve half of the toString problems.

 Another part (i.e. memory allocation) could be solved by providing an
 optional buffer to the toString:

 char[] toString(string format = "s" /* comes from %s which is a default
 qualifier */, char[] buffer = null)
 {
    // operate on the buffer, possibly resizing it
    // which is safe and fast - it only allocates
    // when *really* necessary, instead of always, as now
    return buffer;
 }

With Don's delegate idea, if you do have a toString with special
performance concerns, then it can use its own stack-allocated buffer.

void toString(void delegate(const(char)[]) put, string format)
{
    char[512] preallocBuffer;
    foreach( ... ) {
           ...
           put(preallocBuffer[0..lenUsed]);
    }
}

Which in some cases (like writefln) should be almost as efficient as
passing a buffer in.  It avoids willy-nilly unbounded allocations
anyway.
But the nice thing is that it's easy to upgrade to.  You can keep it
simple and leave toString pretty much like you had it before, just
changing the signature and the return.

void toString(void delegate(const(char)[]) put, string format)
{
    char[] ret;
    foreach( ... ) {
           ...
           ret ~= "...";
    }
    put(ret);  // only this line needed to change for Don-style toString
}

And to get the string you just need to call format:

    assert(std.string.format(thing) == "blah");


If the buffer is going to be passed in, then probably it should be
passed in as a full fledged output stream object with .write() methods
and such.  I don't want to have to worry about buffer management to
write a toString method.  That should be encapsulated.  But it seems
to me that Don's method offers exactly the right minimality of
interface to allow encapsulating that management without requiring it
to be done in a heavy-handed way.

--bb
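
A minimal sketch of the sink-style toString described above, assuming present-day Phobos (D2), where std.format recognizes a toString that takes a sink delegate and where the format string must be passed to format() explicitly. `Point` is a made-up type for illustration, not something from this thread.

```d
import std.format : format, formattedWrite;

// Hypothetical example type; not from the thread.
struct Point
{
    int x, y;

    // Sink-style toString: output is pushed into `sink` piece by piece,
    // so the method itself never builds and returns a string.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        formattedWrite(sink, "(%d, %d)", x, y);
    }
}

void main()
{
    // format() collects whatever the sink receives into one string.
    assert(format("%s", Point(1, 2)) == "(1, 2)");
}
```

The caller decides where the pieces end up: format() appends them to a string, while writefln can forward them to the output stream directly.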

Nov 10 2009

Don <nospam nospam.com> writes:

Bill Baxter wrote:
 2009/11/10 Denis Koroskin <2korden gmail.com>:
 On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam nospam.com> wrote:
 Lutger wrote:
 Justin Johansson wrote:

 Lutger Wrote:

 Justin Johansson wrote:

 I assert that the semantics of "toString" or similarly
 named/purposed
 methods/functions in many PL's (including and not limited to D) is
 ill-defined.

 To put this statement into perspective, I would be most appreciative
 of
 D NG readers responding with their own idea(s) of what the semantics
 of
 "toString" are (or should be) in a language agnostic ideology.

 My other reply didn't take the language agnostic into account, sorry.

 Semantics of toString would depend on the object, I would think there
 are
 three general types of objects:

 1. objects with only one sensible or one clear default string
 representations, like integers. Maybe even none of these exist
 (except
 strings themselves?)

 2. objects that, given some formatting options or locale have a clear
 string representation. floating points, dates, currency and the like.

 3. objects that have no sensible default representation.

 toString() would not make sense for 3) type objects and only for 2)
 type
 objects as part of a formatting / localization package.

 toString() as a debugging aid sometimes doubles as a formatter for 1)
 and
 2) class objects, but that may be more confusing than it's worth.

 Thanks for that Lutger.

 Do you think it would make better sense if programming languages/their
 libraries separated functions/methods which are currently loosely
 purposed
 as "toString" into methods which are more specific to the types you
 suggest (leaving only the types/classifications and number thereof to
 argue about)?

 In my own D project, I've introduced a toDebugString method and left
 toString alone. There are times when I like D's default toString
 printing
 out the name of the object
 class.  For debug purposes there are times also when I like to see a
 string printed
 out in quotes so you can tell the difference between "123" and 123.
  Then
 again, and since I'm working on a scripting language, sometimes I like
 to
 see debug output distinguish between different numeric types.

 Anyway going by the replies on this topic, looks like most people view
 toString as being good for debug purposes and that about it.

 Cheers
 Justin

 Your design makes better sense (to me at least) because it is based on
 why
 you want a string from some object.
 Take .NET for example: it does provide very elaborate and nice
 formatting
 options based on toString() with parameters. For some types however,
 the
 default toString() gives you the name of the type itself which is in no
 way
 related to formatting an object. You learn to work with it, but I find
 it a
 bit muddled.
 As a last note, I think people view toString as a debug thing mostly
 because it is very underpowered.

 There is a definite use for such a thing. But the existing toString()
 is
 much, much worse than useless. People think you can do something with
 it,
 but you can't.
 eg, people have asked for BigInt to support toString(). That is an
 over-my-dead-body.

  You can definitely do something with it -- printf debugging.  And if I
 were using BigInt, that's exactly why I'd want BigInt to have a
 toString.

 I almost always want to print the value out in hex. And with some kind of
 digit separators, so that I can see how many digits it has.

  Just out of curiosity, how does someone print out the
 value of a BigInt right now?

 In Tango, there's just .toHex() and .toDecimalString(). Needs proper
 formatting options, it's the biggest thing which isn't done. I hit one too
 many compiler segfaults and started patching the compiler instead <g>. But
 I really want a decent toString().

 Given a BigInt n, you should be able to just do

 writefln("%s %x", n, n);  // Phobos
 formatln("{0} {0:X}", n); // Tango

 To solve this part of the issue, it would be enough to have toString()
 take a string parameter. (it would be "x" or "X" in this case).

 string toString(string fmt);
 But the performance would still be very poor, and that's much more
 difficult to solve.

 Yes, it would solve half of the toString problems.

 Another part (i.e. memory allocation) could be solved by providing an
 optional buffer to the toString:

 char[] toString(string format = "s" /* comes from %s which is a default
 qualifier */, char[] buffer = null)
 {
    // operate on the buffer, possibly resizing it
    // which is safe and fast - it only allocates
    // when *really* necessary, instead of always, as now
    return buffer;
 }

 
 With Don's delegate idea, if you do have a toString with special
 performance concerns, then it can use its own stack-allocated buffer.
 
 void toString(void delegate(const(char)[]) put, string format)
 {
     char[512] preallocBuffer;
     foreach( ... ) {
            ...
            put(preallocBuffer[0..lenUsed]);
     }
 }

Thanks. 'put' is so much better than 'sink'. <g>

 If the buffer is going to be passed in, then probably it should be
 passed in as a full fledged output stream object with .write() methods
 and such.  I don't want to have to worry about buffer management to
 write a toString method.  That should be encapsulated.  But it seems
 to me that Don's method offers exactly the right minimality of
 interface to allow encapsulating that management without requiring it
 to be done in a heavy-handed way.

One thing it doesn't (easily) handle is the case where an int argument 
gives the length of another one. (eg the "%*s" writefln format). I guess 
this can still be handled (very inefficiently) by converting the 
parameter value into a text number -- generally, though, that'd only be 
for direct interchangeability with a built-in type; you'd normally do 
such things by calling a member function on the struct.

The other issue is grauzone's comment: perhaps compile-time varargs make 
this whole approach obsolete.
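
For readers unfamiliar with the "%*s" case mentioned above, here is a small sketch of how a dynamic width behaves with built-in types, assuming current Phobos std.format. It only shows the width being taken from an int argument; it does not show how a user-defined toString would receive it.

```d
import std.format : format;

void main()
{
    // `*` takes the field width from the int argument that precedes the value.
    assert(format("%*s", 6, "ab") == "    ab");
    assert(format("%*d", 5, 42) == "   42");

    // Same result as writing the width into the format string itself.
    assert(format("%*s", 6, "ab") == format("%6s", "ab"));
}
```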

Nov 10 2009

bearophile <bearophileHUGS lycos.com> writes:

Don:

 It's problem 2 from my original posts: being able to output something 
 large (eg an xml doc) in a piece-by-piece manner.

See my post about vectorized laziness.

Bye,
bearophile

Nov 10 2009

Genghis Khan <genghis outer.mn> writes:

Andrei Alexandrescu Wrote:

 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I think the best option for toString is to take an output range 
 and write to it. (The sink is a simplified range.)

  Bad idea...
  A range only makes sense as a struct, not an interface/object.  
 I'll tell you why: performance.

 You are right. If range interfaces accommodate block transfers, this 
 problem may be addressed. I agree that one virtual call per 
 character output would be overkill. (I seem to recall it's one of 
 the reasons why C++'s iostreams are so inefficient.)

  IIRC, I don't think C++ iostreams use polymorphism

 Oh yes they do. (Did you even google?) Virtual multiple inheritance, 
 the works.

 http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/ 

 
  From my C++ book, it appears to only use virtual inheritance.  I don't 
 know enough about virtual inheritance to know how that changes function 
 calls.
 
 As far as virtual functions, only the destructor is virtual, so there is 
 no issue there.

 
 You're right, but there is an issue because as far as I can recall these 
 functions' implementations do end up calling a virtual function per char; 
 that might be streambuf.overflow. I'm not keen on investigating this any 
 further, but I'd be grateful if you shared any related knowledge. At the 
 end of the day, there seems to be violent agreement that we don't want 
 one virtual call per character or one delegate call per character.
 
  void put(in char[] str)
 {
   foreach(dchar dc; str)
   {
      put((&dc)[0..1]);
   }
 }
  Note that you probably want to build a buffer of dchars instead of 
 putting one at a time, but you get the idea.

 I don't get the idea. I'm seeing one virtual call per character.

 
 You missed the note.  I didn't implement it, but you could easily 
 implement a stack-allocated buffer to cache the conversions, passing 
 multiple converted code-points at once.  But I don't think it's even 
 worth discussing per my other points.
 
 That being said, one other point that makes all this moot is -- 
 toString is for debugging, not for general purpose.  We don't need to 
 support everything that is possible.  You should be able to say "hey, 
 toString only accepts char[], deal."  Of course, you could substitute 
 wchar[] or dchar[], but I think by far char[] is the most common (and 
 is the default type for string literals).

 I was hoping we could elevate the usefulness of toString a bit.

 
 Whatever kind of data the output stream gets, it's going to convert it 
 to the format it wants anyways (as for stdout, I think that would be 
 utf8), the only benefit is if you have data stored in a different width 
 that you wanted to output.  Calling a conversion function in that case I 
 think is reasonable enough, and saves the output stream from having to 
 convert/deal with it.
 
 In other words, I don't think it's going to be that common a case where 
 you need anything other than utf8 output, and therefore the cost of 
 creating an interface, making virtual calls, disallowing simple delegate 
 passing etc is worth the convenience *just in case* you have data stored 
 as wchar[] you want to output.

 
 I'm not sure.
 
 http://www.gnu.org/s/libc/manual/html_node/Streams-and-I18N.html#Streams-and-I18N
 
 gnu defines means to set and detect a utf-16 console, which dmd observes 
 (grep std/ for fwide). But then I'm not sure how many are using that 
 kind of stuff.
 
 That's not to say there is no reason to have a TextOutputStream 
 object.  Such a thing is perfectly usable for a toString which takes 
 a char[] delegate sink, just pass &put.  In fact, there could be a 
 default toString function in Object that does just that:
  class Object
 {
    ...
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
 }

 I'd agree with the delegate idea if we established that UTF-8 is 
 favored compared to all other formats.

 
 D seems to favor UTF8 -- it is the default type for string literals.  I 
 don't think I've ever used dchar, and I usually only use wchar to talk 
 to Win32 functions when required.
 
 The question I'd ask is -- how common is it where the versions other 
 than char[] would be more convenient?

 
 I don't know. I think Asian-language users might give a salient answer.
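
The buffered conversion mentioned in the quoted note (batching converted code points instead of making one delegate call per character) could look roughly like this. It is only a sketch: `putTranscoded` and the buffer size of 64 are assumptions made for illustration, not code from the thread or from Phobos.

```d
// Hypothetical helper: decode UTF-8 input and hand it to a dchar sink
// in chunks, so the sink is called once per chunk rather than per character.
void putTranscoded(const(char)[] str,
                   scope void delegate(const(dchar)[]) sink)
{
    dchar[64] buf;      // stack-allocated staging buffer
    size_t used = 0;

    foreach (dchar dc; str)   // foreach decodes UTF-8 into code points
    {
        buf[used++] = dc;
        if (used == buf.length)
        {
            sink(buf[0 .. used]);   // flush a full chunk
            used = 0;
        }
    }
    if (used)
        sink(buf[0 .. used]);       // flush the remainder
}

void main()
{
    dchar[] collected;
    size_t calls;
    putTranscoded("héllo wörld", (const(dchar)[] chunk) {
        collected ~= chunk;
        ++calls;
    });
    assert(collected == "héllo wörld"d);
    assert(calls == 1);   // 11 code points fit in a single chunk
}
```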

Nov 12 2009

HOSOKAWA Kenchi <hskwk inter7.jp> writes:

Bill Baxter Wrote:

 On Thu, Nov 12, 2009 at 10:46 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I'd agree with the delegate idea if we established that UTF-8 is favored
 compared to all other formats.


 D seems to favor UTF8 -- it is the default type for string literals.  I
 don't think I've ever used dchar, and I usually only use wchar to talk to
 Win32 functions when required.

 The question I'd ask is -- how common is it where the versions other than
 char[] would be more convenient?

 I don't know. I think Asian-language users might give a salient answer.

 
 This isn't authoritative, but I don't think utf-16 is commonly used in
 Japan (except for calling Windows APIs).
 
 If you look at Mozilla the default Japanese encoding listed is
 Shift-JIS.  A lot of Japanese email still gets sent as ISO-2022-JP.
 Otherwise utf-8 I think.   A quick look at www.asahi.com shows they're
 using EUC-JP.  nicovideo.jp is using utf-8.  I seem to recall that my
 Japanese Visual Studio even saved files in Utf-8, or at least could be
 set to use utf-8.   In short, I think utf-8 is closer to being a
 widely accepted standard for documents over there than utf-16 is.
 
 --bb

That is true. UTF8 works well.
Few people now believe in the dream of a fixed-length UTF-16. Surrogate pairs must die.

We, maybe not only Japanese but all Asian users, also need converters between
UTFs and traditional local encodings.
Implementations are up to local users.
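
To illustrate the point about surrogate pairs, a short check of code-unit counts using std.utf from Phobos. The sample characters are chosen only for illustration.

```d
import std.utf : toUTF16, toUTF32;

void main()
{
    // U+1D11E lies outside the Basic Multilingual Plane.
    string clef = "𝄞";
    assert(clef.length == 4);            // four UTF-8 code units
    assert(toUTF16(clef).length == 2);   // a surrogate pair in UTF-16
    assert(toUTF32(clef).length == 1);   // one code point

    // A BMP character still fits in a single UTF-16 code unit.
    assert("あ".length == 3);
    assert(toUTF16("あ").length == 1);
}
```

So UTF-16 is a variable-length encoding just like UTF-8, only with the variable-length case being rarer.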

Nov 12 2009
