digitalmars.D - stdio performance in tango, stdlib, and perl
- Andrei Alexandrescu (See Website For Email) (49/49) Mar 21 2007 I've run a couple of simple tests comparing Perl, D's stdlib (the coming...
- Walter Bright (3/5) Mar 21 2007 Can you add a C++ <iostream> to the mix? I think that would be a very...
- Andrei Alexandrescu (See Website For Email) (28/34) Mar 21 2007 Obliged. Darn, I had to wait a *lot* longer.
- Walter Bright (4/38) Mar 21 2007 This is awesomely bad. Although it's possible to get very fast code out
- Andrei Alexandrescu (See Website For Email) (4/46) Mar 21 2007 I don't know exactly what sync'ing does in C++, but probably it isn't
- Walter Bright (5/10) Mar 21 2007 I think it means bringing the iostream I/O buffer in to sync with the
- Andrei Alexandrescu (See Website For Email) (3/15) Mar 21 2007 Aha, so readln is better _and_ more compatible. Great!
- kris (3/26) Mar 21 2007 Out of interest, how does the currently shipping Phobos fare in this tes...
- Andrei Alexandrescu (See Website For Email) (4/32) Mar 21 2007 I don't have it anymore. Couldn't write a test anyway, because currently...
- James Dennett (27/70) Mar 21 2007 Try the way IOStreams would be used if you didn't want
- torhu (99/113) Mar 21 2007 I did some tests with a 58 MB file, containing one million lines. I'm...
- torhu (6/25) Mar 22 2007 I've run some of the tests with more accurate timing. Andrei's Tango
- kris (7/35) Mar 22 2007 Just for jollies, a briefly optimized tango.io was tried also: it came
- Andrei Alexandrescu (See Website For Email) (7/46) Mar 22 2007 Is it compatible with C's stdio? IOW, would this sequence work?
- kris (39/95) Mar 22 2007 Nope. Tango is for D, not C. In order to make an arguably better library,...
- Andrei Alexandrescu (See Website For Email) (23/124) Mar 22 2007 That's not what my tests show on Linux, where Perl and readln beat Tango...
- kris (17/164) Mar 22 2007 Oh, come now. Yesterday Tango was the "fastest" on your machine, and
- Andrei Alexandrescu (See Website For Email) (16/33) Mar 22 2007 Probably it's a misunderstanding. Yesterday the Tango that did not
- Sean Kelly (7/20) Mar 22 2007 We're in the process of getting an automated nightly snapshot process
- Andrei Alexandrescu (See Website For Email) (7/13) Mar 22 2007 I think you'd make a lot of people happy. Several documented attempts of...
- Sean Kelly (8/13) Mar 22 2007 This page describes one way to use Tango and Phobos together:
- Andrei Alexandrescu (See Website For Email) (29/41) Mar 22 2007 Here's what worked for me. The script also allows compiling dmd programs...
- Sean Kelly (13/57) Mar 22 2007 This is intentional, though it may change later based on user feedback.
- Andrei Alexandrescu (See Website For Email) (5/31) Mar 22 2007 cat is not comparable. Besides, there must be some overhead associated
- torhu (22/48) Mar 22 2007 Couple of more results:
- Sean Kelly (10/51) Mar 22 2007 Oh good. I was hoping someone would test Tango without flushing every
- torhu (3/19) Mar 24 2007 Whoops, can anyone spot the bug? When I fixed it, the time it took to
- Frits van Bommel (2/22) Mar 24 2007 I'm guessing the fact that sizeof(buf) != 1000 ?
- torhu (3/9) Mar 25 2007 I think you were the first to post. Go buy yourself a lollipop, you've
- Sean Kelly (3/23) Mar 24 2007 The fgets(sizeof(buf)) looks like it could affect read performance a tad...
- Andrei Alexandrescu (See Website For Email) (9/81) Mar 22 2007 With your code pasted and wind from behind:
- James Dennett (14/98) Mar 22 2007 IOStreams is a terrible chunk of library design, and
- Andrei Alexandrescu (See Website For Email) (6/13) Mar 22 2007 Indeed. Then you'll be glad to hear that D will soon accommodate smarter...
- Roberto Mariottini (6/16) Mar 22 2007 The portable way to write a newline in C++ is to use the 'endl'
- torhu (5/10) Mar 22 2007 Unless a file is opened in binary mode, '\n' will be translated into
- Deewiant (3/6) Mar 22 2007 But I don't think this is the case in Tango, so Cout(line)("\n") should ...
- Andrei Alexandrescu (See Website For Email) (3/19) Mar 22 2007 Wrong. Newline translation will be correct on both systems.
- Roberto Mariottini (6/13) Mar 23 2007 It depends on how you open the file: 'endl' works even with files open
- James Dennett (13/29) Mar 23 2007 The difference between '\n' and std::endl in C++ is only
- kris (17/75) Mar 21 2007 There's a couple of things to look at here:
- Andrei Alexandrescu (See Website For Email) (23/44) Mar 21 2007 The test code assumed taking a look at each line before printing it, so
- kris (7/35) Mar 21 2007 Just suggesting that the scanning for [\r]\n patterns is likely a good
- Andrei Alexandrescu (See Website For Email) (40/80) Mar 21 2007 Well probably but must be tested. Newlines comprise about 3% of the file...
- kris (12/47) Mar 21 2007 Yeah, I can imagine. Module tango.io.Console at line 119 should have a
- Walter Bright (2/4) Mar 21 2007 The flush on newline should only be done if isatty() returns !=0.
- kris (3/11) Mar 21 2007 yep; if you were to submit a ticket for that, it would be appreciated :)
- Andrei Alexandrescu (See Website For Email) (17/36) Mar 21 2007 Why not? Programs using the standard input and output are ubiquitous,
- Derek Parnell (15/17) Mar 21 2007 Most programs I run that do lots of I/O only take seconds to run, so if
- Davidl (3/16) Mar 21 2007 u r working on database?
- Derek Parnell (10/13) Mar 21 2007 Yep. A light-weight, single-user D/B suitable for "home" applications.
- kris (10/22) Mar 21 2007 If tango were terribly terribly slow instead, then it would be cause for...
- Andrei Alexandrescu (See Website For Email) (12/41) Mar 21 2007 That's great, but by and large, the attitude that "this is the simple
- kris (10/50) Mar 21 2007 Oh, if there's any implication that Tango ought to be "faster" than it
- Andrei Alexandrescu (See Website For Email) (3/47) Mar 22 2007 Do it and let's test.
- kris (5/12) Mar 22 2007 you can try it right now with a Cout(line)("\n");
- Andrei Alexandrescu (See Website For Email) (12/26) Mar 22 2007 On my Linux box:
- Andrei Alexandrescu (See Website For Email) (4/48) Mar 22 2007 Oh, but I forgot it's cheating: uses read/write so it's incompatible
- kris (13/15) Mar 22 2007 How can it possibly be "cheating" when the code was in place before you
- Andrei Alexandrescu (See Website For Email) (10/28) Mar 22 2007 "Principle" I guess. That sounds great. My opinion in the matter is
- kris (11/47) Mar 22 2007 Yep. A thousand pardons for my late night spelling mistake. I'll be sure...
- Sean Kelly (5/13) Mar 22 2007 If I understand you correctly, you're saying that all IO packages must
- Andrei Alexandrescu (See Website For Email) (5/18) Mar 22 2007 I think for stdio, going through the standard C library would be very
- Walter Bright (22/35) Mar 21 2007 One problem with C++, as I mentioned before, is that the
- Andrei Alexandrescu (See Website For Email) (4/24) Mar 21 2007 You could tell from this and my (almost identical) post that Walter's
- kris (11/60) Mar 21 2007 tango.io is not even optimized for this case (unlike the new Phobos
- James Dennett (29/37) Mar 21 2007 This kind of simplistic bashing of a language or library
- Andrei Alexandrescu (See Website For Email) (11/52) Mar 22 2007 For the record, I used gcc 4.1.2 20060928 (prerelease) (Ubuntu
- Walter Bright (21/25) Mar 22 2007 Maybe it is a bit of frustration on my part. I often run into people
- James Dennett (25/55) Mar 22 2007 Good answer. (Yes, seriously.)
- Bill Baxter (7/16) Mar 23 2007 I think there is a tendency to assume that APIs and languages which have...
- Walter Bright (11/27) Mar 23 2007 D bucks conventional wisdom in more than one way. There's a current
- James Dennett (13/43) Mar 24 2007 I'm intrigued by your claim that IOStreams is not thread-safe;
- Andrei Alexandrescu (See Website For Email) (10/52) Mar 24 2007 cout << a << b;
- James Dennett (21/44) Mar 24 2007 As you appear to be saying that printf has to flush every
- Walter Bright (22/54) Mar 24 2007 In order for printf to work right it does not need to flush every time
- James Dennett (30/82) Mar 24 2007 That would be true, except that Andrei wrote that
- Walter Bright (12/48) Mar 24 2007 Ok, but since it is typical to do a flush on newline if isatty(), that
- Andrei Alexandrescu (See Website For Email) (8/41) Mar 24 2007 Lines don't have to appear at exact times, they only must not
- James Dennett (17/58) Mar 24 2007 With sufficiently short lines, where the value of
- Sean Kelly (11/18) Mar 24 2007 ...since they obviously don't have to consider thread-safety when
- Andrei Alexandrescu (See Website For Email) (8/26) Mar 24 2007 Good question(s). Might be also that I/O interface is considerably
- Sean Kelly (17/47) Mar 24 2007 The stream could acquire a lock and pass it to a proxy object which
- Walter Bright (13/26) Mar 24 2007 I disagree. It's been working fine for nearly 20 years now. gcc
- Sean Kelly (8/33) Mar 25 2007 True enough. Though I wonder how much of a factor it is that C++ has no...
- Andrei Alexandrescu (See Website For Email) (8/22) Mar 24 2007 Numbers clearly tell the above is wrong. Here's the thing: I write
- James Dennett (17/42) Mar 24 2007 Except that your test wasn't of the right thing; you
- Andrei Alexandrescu (See Website For Email) (4/41) Mar 24 2007 If you did, fine. I take that part of my argument back. I'll also note
- James Dennett (6/8) Mar 24 2007 Trying to defend IOStreams is certainly a challenge.
- Sean Kelly (12/67) Mar 24 2007 stringstream s;
- Walter Bright (19/26) Mar 24 2007 The trouble with that design is people working on subsystems or
- Andrei Alexandrescu (See Website For Email) (8/38) Mar 24 2007 MS does the same now if I remember correctly: all of its libraries are
- Sean Kelly (7/26) Mar 24 2007 Yup. In fact, I just discovered that Visual Studio 2005 doesn't even
- Walter Bright (15/41) Mar 24 2007 gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for
- James Dennett (53/99) Mar 24 2007 I've seen only a minority of those claims made as part
- Walter Bright (28/94) Mar 24 2007 I think we're in agreement, as I said "one or two", and that such claims...
- 0ffh (13/17) Mar 25 2007 I admit I used to think similar to that, a somewhat longer while ago.
- Dan (6/20) Mar 26 2007 I totally agree that GC is a solid way of cutting bad code, which perfor...
- Derek Parnell (29/75) Mar 21 2007 And exactly how often do people need to write this program? I would have
- Andrei Alexandrescu (See Website For Email) (18/79) Mar 21 2007 Of course. It's not about reproducing the input exactly, but about
- Derek Parnell (24/64) Mar 21 2007 Actually you said "stdio also offers a readln() that creates a new line ...
- Andrei Alexandrescu (See Website For Email) (12/42) Mar 21 2007 Fine. It's just not clear what readln does from its signature. In
- Derek Parnell (13/36) Mar 21 2007
- Roberto Mariottini (20/63) Mar 22 2007 I suspect Walter was thinking on something else at the time.
- Andrei Alexandrescu (See Website For Email) (13/89) Mar 22 2007 Very simple. If the file ends with a newline, the code reproduces it. If...
- Roberto Mariottini (51/125) Mar 23 2007 It's not clearly evident for a non-expert programmer that a new-line is
- Vladimir Panteleev (10/19) Mar 22 2007 I'd just like to say that the chosen naming convention seems a bit unint...
- Daniel Keep (31/53) Mar 22 2007 I suppose it is a little, but I think that's more an issue with text IO
- Vladimir Panteleev (11/17) Mar 22 2007 I was actually talking about the complexity of the source, not the effic...
- Daniel Keep (13/31) Mar 22 2007 import std.string;
- Anders F Björklund (6/10) Mar 22 2007 Actually it is even four:
- Roberto Mariottini (5/17) Mar 22 2007 I have some of these also. Legacy applications are not the most, but...
- Andrei Alexandrescu (See Website For Email) (16/38) Mar 22 2007 That's a mistake, simple as that. Pascal has made many other similar
- Vladimir Panteleev (7/12) Mar 22 2007 Ah, yes, missed that one.
- Roberto Mariottini (4/5) Mar 23 2007 Just add a call to chomp to your benchmarks.
- Walter Bright (518/520) Mar 21 2007 Here's the new std.stdio work in progress (doesn't yet include write())....
- Andrei Alexandrescu (See Website For Email) (21/41) Mar 21 2007 [snip]
- Roberto Mariottini (4/7) Mar 22 2007 Nooo!
- Andrei Alexandrescu (See Website For Email) (4/11) Mar 22 2007 Please justify your statements instead of using emotion, rhetoric, and
- Roberto Mariottini (4/6) Mar 22 2007 See my previous post.
- Derek Parnell (12/19) Mar 21 2007 LOL ... That is odd because in nearly every program I ever write that re...
- Sean Kelly (5/16) Mar 22 2007 For what it's worth, I created a Win32 version of the Unix 'time'
- Walter Bright (3/7) Mar 22 2007 Alternatively,
- Kristian Kilpi (13/20) Mar 22 2007 I...
- Walter Bright (1/1) Mar 22 2007 Thanks for the tip, that needs to be fixed.
- torhu (17/21) Mar 23 2007 Looks useful, my own tool just measures 'real' time. But it breaks when...
- Sean Kelly (5/33) Mar 23 2007 Hm, I suspect IO redirection must be a feature of the shell. It's a bit...
- torhu (4/7) Mar 23 2007 I get the same error. My own tool doesn't have such problems, but it
- Sean Kelly (3/11) Mar 23 2007 Yeah, mine uses CreateProcess and then GetProcessTimes. I'll give the
- Bill Baxter (4/21) Apr 20 2008 I was looking for something like this just the other day.
- Sean Kelly (4/24) Apr 21 2008 I switched web hosts and have yet to re-upload all my old content. I'll...
- Sean Kelly (4/26) Apr 21 2008 Okay, I've uploaded it here:
- Lars Ivar Igesund (30/32) Mar 22 2007 I have uploaded a snapshot with prebuilt libraries to
- Andrei Alexandrescu (See Website For Email) (5/33) Mar 22 2007 5.0s tcat
- Lars Ivar Igesund (13/49) Mar 23 2007 Maybe discuss first why stdio compatibility is needed? Is the equivalent
- Andrei Alexandrescu (See Website For Email) (7/15) Mar 23 2007 As long as the global "stdin" symbol is a FILE*, this would be highly
- Lars Ivar Igesund (8/23) Mar 23 2007 May I then suggest that you create a enhancement/wishlist ticket for thi...
- Davidl (5/35) Mar 22 2007 great job!
- Sean Kelly (3/8) Mar 22 2007 Tango is faster, at least for this particular test.
- Dave (3/15) Mar 22 2007 Which of course begs the question -- Could an overload be added so it do...
- Walter Bright (4/20) Mar 22 2007 Since the data has to be buffered anyway, might as well use stdio's
- Roberto Mariottini (65/133) Mar 27 2007 Hi,
- David B. Held (5/171) Mar 27 2007 Your "questions" hardly seem sincere. Were you not simply posturing for...
- Derek Parnell (35/163) Mar 27 2007 One of the small issues I have with 'readln' appending a newline
I've run a couple of simple tests comparing Perl, D's stdlib (the coming
release), and Tango.

First, I realize I should make an account on dsource.org and post the
following there, but I'll mention here that it's quite disappointing that
Tango's idiomatic method of reading a line from the console
(Cin.nextLine(line), unless I missed something) chose to chop the newline
automatically. The Perl book spends half a page or so explaining why it's
_good_ that the newline is included in the line, and I've been thankful
for that on numerous occasions when writing Perl. Please put the newline
back in the line.

Anyhow, here's the code. The D up-and-coming stdio version:

import std.stdio;

void main()
{
    char[] line;
    while (readln(line))
    {
        write(line);
    }
}

The Tango version:

import tango.io.Console;

void main()
{
    char[] line;
    while (Cin.nextLine(line))
    {
        Cout(line).newline;
    }
}

(The .newline adds back the information that nextLine promptly lost,
sigh.) I'm not sure whether this is the idiomatic way of reading and
writing lines in Tango, but tango.io.Stdout seems to say so: "If you
don't need formatted output or unicode translation, consider using the
module tango.io.Console directly." - which suggests that Console would be
the most primitive stdio library.

The Perl version:

while (<>) {
    print;
}

All programs operate in the same exact boring way: read a line from
stdin, print it, lather, rinse, repeat. I passed a 31 MB text file
(containing a dictionary that I'm using in my research) through each of
the programs above. The output was set to /dev/null. I ran the same
program multiple times before the actual test, so everything is cached
and the process becomes computationally bound. Here are the results
summed for 10 consecutive runs (averaged over 5 epochs):

13.9s Tango
 6.6s Perl
 5.0s std.stdio

Andrei
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:
> I've run a couple of simple tests comparing Perl, D's stdlib (the
> coming release), and Tango.

Can you add a C++ <iostream> to the mix? I think that would be a very
useful additional data point.
Mar 21 2007
Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> I've run a couple of simple tests comparing Perl, D's stdlib (the
>> coming release), and Tango.
> Can you add a C++ <iostream> to the mix? I think that would be a very
> useful additional data point.

Obliged. Darn, I had to wait a *lot* longer.

#include <string>
#include <iostream>

int main()
{
    std::string s;
    while (getline(std::cin, s))
    {
        std::cout << s << '\n';
    }
}

(C++ makes the same mistake wrt newline.)

35.7s cppcat

I seem to remember a trick that puts some more wind into iostream's
sails, so I tried that as well:

#include <string>
#include <iostream>
using namespace std;

int main()
{
    cin.sync_with_stdio(false);
    cout.sync_with_stdio(false);
    string s;
    while (getline(std::cin, s))
    {
        cout << s << '\n';
    }
}

Result:

13.3s cppcat

Andrei
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Obliged. Darn, I had to wait a *lot* longer.
>
> #include <string>
> #include <iostream>
>
> int main()
> {
>     std::string s;
>     while (getline(std::cin, s))
>     {
>         std::cout << s << '\n';
>     }
> }
>
> (C++ makes the same mistake wrt newline.)
>
> 35.7s cppcat

This is awesomely bad. Although it's possible to get very fast code out
of C++, it rarely seems to happen when you write straightforward code.

> I seem to remember a trick that puts some more wind into iostream's
> sails, so I tried that as well:
[snip]
> Result: 13.3s cppcat

Turning off sync is cheating - D's readln does syncing.
Mar 21 2007
Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
[snip]
> This is awesomely bad. Although it's possible to get very fast code
> out of C++, it rarely seems to happen when you write straightforward
> code.
> Turning off sync is cheating - D's readln does syncing.

I don't know exactly what sync'ing does in C++, but probably it isn't
the locking that you are thinking of.

Andrei
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:
>> Turning off sync is cheating - D's readln does syncing.
> I don't know exactly what sync'ing does in C++, but probably it isn't
> the locking that you are thinking of.

I think it means bringing the iostream I/O buffer into sync with the
stdio I/O buffer, i.e. you can mix printf and iostream output and it
will appear in the same order the calls happen in the code. D's readln
is inherently synced in this manner.
Mar 21 2007
Walter Bright wrote:
[snip]
> I think it means bringing the iostream I/O buffer into sync with the
> stdio I/O buffer, i.e. you can mix printf and iostream output and it
> will appear in the same order the calls happen in the code. D's readln
> is inherently synced in this manner.

Aha, so readln is better _and_ more compatible. Great!

Andrei
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:
[snip]
> (C++ makes the same mistake wrt newline.)
>
> 35.7s cppcat
>
> I seem to remember a trick that puts some more wind into iostream's
> sails, so I tried that as well:
[snip]
> Result: 13.3s cppcat

Out of interest, how does the currently shipping Phobos fare in this
test?
Mar 21 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
[snip]
> Out of interest, how does the currently shipping Phobos fare in this
> test?

I don't have it anymore. Couldn't write a test anyway, because currently
Phobos does not offer readln.

Andrei
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:
[snip]
> I seem to remember a trick that puts some more wind into iostream's
> sails, so I tried that as well:
[snip]
> Result: 13.3s cppcat

Try the way IOStreams would be used if you didn't want it to go slowly:

#include <string>
#include <iostream>

int main()
{
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::string s;
    while (std::getline(std::cin, s))
    {
        std::cout << s << '\n';
    }
}

(Excuse the lack of a using directive there; I find the code more
readable without them. YMMV.)

I don't have your sample file or your machine, but for the quick tests I
just ran on this one machine, the code above runs more than 60% faster.
Without using tie(), each read from standard input causes a flush of
standard output (so that, by default, they work appropriately for
console I/O).

It's certainly true that making efficient use of IOStreams needs some
specific knowledge, and that writing an efficient implementation of
IOStreams is far from trivial. But if we're comparing to C++, we should
probably compare to some reasonably efficient idiomatic C++.

-- James
Mar 21 2007
James Dennett wrote:
<snip>
> Try the way IOStreams would be used if you didn't want it to go
> slowly:
>
> #include <string>
> #include <iostream>
>
> int main()
> {
>     std::ios_base::sync_with_stdio(false);
>     std::cin.tie(NULL);
>     std::string s;
>     while (std::getline(std::cin, s))
>     {
>         std::cout << s << '\n';
>     }
> }
<snip>

I did some tests with a 58 MB file, containing one million lines. I'm on
winxp. I ran each test a few times, timing them with a stopwatch. I
threw in a naive C version, and a std.cstream version, just out of
curiosity.

It seems that using cin.tie(NULL) doesn't matter with msvc 7.1, but with
mingw it does. Basically, Tango wins hands down on my system. Whether
the Tango version flushes after each line or not doesn't seem to matter
much on Windows.

Compiled with:
dmd -O -release -inline
gcc -O2 (mingw 3.4.2)
cl /O2 /GX

Fastest first:
tango.io.Console, no flushing (Andrei's): ca 1.5s
C, reusing buffer, gcc & msvc71: ca 3s
James' C++, gcc: 3.5s
Phobos std.cstream, reused buffer: 11s
C w/malloc and free each line, msvc71: 23s
Andrei's C++, gcc: 27s
C w/malloc and free each line, gcc: 37s
Andrei's C++, msvc71: 50s
James' C++, msvc: 51s

---
// Tango
import tango.io.Console;

void main()
{
    char[] line;
    while (Cin.nextLine(line))
    {
        //Cout(line).newline;
        Cout(line)("\n");
    }
}
---

---
// Phobos std.cstream test
import std.cstream;

void main()
{
    char[] buf = new char[1000];
    char[] line;
    while (!din.eof())
    {
        line = din.readLine(buf);
        dout.writeLine(line);
    }
}
---

---
/* C, reusing buffer */
#include <stdio.h>
#include <stdlib.h>

char buf[1000];

int main()
{
    while (fgets(buf, sizeof(buf), stdin))
    {
        fputs(buf, stdout);
    }
    return 0;
}
---

---
/* C test w/malloc and free */
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *buf = malloc(1000);
    while (fgets(buf, sizeof(buf), stdin))
    {
        fputs(buf, stdout);
        free(buf);
        buf = malloc(1000);
    }
    free(buf);
    return 0;
}
---

---
// Andrei's
#include <string>
#include <iostream>

int main()
{
    std::string s;
    while (getline(std::cin, s))
    {
        std::cout << s << '\n';
    }
    return 0;
}
---

---
// James'
#include <string>
#include <iostream>

int main()
{
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::string s;
    while (std::getline(std::cin, s))
    {
        std::cout << s << '\n';
    }
}
---
Mar 21 2007
torhu wrote:
<snip>
> Fastest first:
> tango.io.Console, no flushing (Andrei's): ca 1.5s
> C, reusing buffer, gcc & msvc71: ca 3s
> James' C++, gcc: 3.5s
> Phobos std.cstream, reused buffer: 11s
> C w/malloc and free each line, msvc71: 23s
> Andrei's C++, gcc: 27s
> C w/malloc and free each line, gcc: 37s
> Andrei's C++, msvc71: 50s
> James' C++, msvc: 51s

I've run some of the tests with more accurate timing. Andrei's Tango
code uses 0.9 seconds with no flushing, and 1.6 seconds with flushing.

I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3
seconds.
Mar 22 2007
torhu wrote:
[snip]
> I've run some of the tests with more accurate timing. Andrei's Tango
> code uses 0.9 seconds, with no flushing, and 1.6 seconds with
> flushing. I also tried cat itself, from the gnuwin32 project. cat
> clocks in at 1.3 seconds.

Just for jollies, a briefly optimized tango.io was tried also: it came
in at around 0.7 seconds. On a tripled file-size (3 million lines), that
version is around 23% faster than bog-standard tango.io

Thanks for giving it a whirl, torhu :)

p.s. perhaps Andrei should be using tango for processing those vast
files he has?
Mar 22 2007
kris wrote:
[snip]
> Just for jollies, a briefly optimized tango.io was tried also: it came
> in at around 0.7 seconds. On a tripled file-size (3 million lines),
> that version is around 23% faster than bog-standard tango.io

That's great news!

> Thanks for giving it a whirl, torhu :)
>
> p.s. perhaps Andrei should be using tango for processing those vast
> files he has?

Is it compatible with C's stdio? IOW, would this sequence work?

readln(line);
int c = getchar();

Is 'c' the first character on the next line?

Andrei
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:
[snip]
> Is it compatible with C's stdio? IOW, would this sequence work?
>
> readln(line);
> int c = getchar();
>
> Is 'c' the first character on the next line?

Nope. Tango is for D, not C. In order to make an arguably better
library, one often has to step away from the norm.

Both yourself and Walter have been saying "it needs to be fast and
simple", and that's exactly what Tango is showing: for those who care
deeply about such things, tango.io is shown to be around four times
faster than the fastest C implementation tried (for Andrei's test under
Win32), and a notable fourteen or fifteen times faster than the shipping
phobos equivalent.

If "interaction" between D & C on a shared, global file-handle becomes
some kind of issue due to buffering (and only if) we'll cross that
bridge at that point in time. I'm sure there's a number of solutions
that don't involve restricting D to using a lowest common denominator
approach. There's lots of smart people here who would be willing to help
resolve that if necessary.

The tango.io package is intended to be clean, extensible, simple, and a
whole lot more coherent than certain others. We feel it meets those
goals, and it happens to be quite efficient at the same time. Seems a
bit like sour-grapes to start looking for "issues" with that intent,
particularly when compared to an implementation that proclaims "It peeks
under the hood of C's stdio implementation, meaning it's customized for
Digital Mars' stdio, and gcc's stdio"?

Tango is not meant to be a phobos clone; it doesn't make the same claims
as phobos and it doesn't follow the same rules as phobos. If you need
phobos rules, then use phobos. If you don't like tango.io speed,
extensibility and simplicity, without all the special cases of C IO,
then use phobos. If you want both then, at some point, we'll consider
figuring out how to make your C-oriented corner-cases work with tango.io

Walter wrote: "One of my goals with D is to fix that - the
straightforward, untuned code should get you most of the possible
speed."

Andrei wrote: "Just make the clear and simple code fastest. One thing I
like about D is that it clearly strives to achieve best performance for
simply-written code."

That sentiment is very much what Tango itself is about. You began this
thread by titling it "stdio and Tango IO performance" and noting the
following: "has anyone verified that Tango's I/O performance is up to
snuff? I see it imposes the dynamic-polymorphic approach, and unless
there was some serious performance work going on, it's possible it's
even slower than stdio."

Given the results shown above, I hope we can put that to rest at this
time.
Mar 22 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote:That's not what my tests show on Linux, where Perl and readln beat Tango by a large margin.kris wrote:Nope. Tango is for D, not C. In order to make a arguably better library, one often has to step away from the norm. Both yourself and Walter have been saying "it needs to be fast and simple", and that's exactly what Tango is showing: for those who care deeply about such things, tango.io is shown to be around four times faster than the fastest C implementation tried (for Andrei's test under Win32), and a notable fourteen or fifteen times faster than the shipping phobos equivalent.torhu wrote:That's great news!torhu wrote: <snip>Just for jollies, a briefly optimized tango.io was tried also: it came in at around 0.7 seconds. On a tripled file-size (3 million lines), that version is around 23% faster than bog-standard tango.ioFastest first: tango.io.Console, no flushing (Andrei's): ca 1.5s C, reusing buffer, gcc & msvc71: ca 3s James' C++, gcc: 3.5s Phobos std.cstream, reused buffer: 11s C w/malloc and free each line, msvc71: 23s Andrei's C++, gcc: 27s C w/malloc and free each line, gcc: 37s Andrei's C++, msvc71: 50s James' C++, msvc: 51sI've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.Thanks for giving it a whirl, tohru :) p.s. perhaps Andrei should be using tango for processing those vast files he has?Is it compatible with C's stdio? IOW, would this sequence work? readln(line); int c = getchar(); Is 'c' the first character on the next line?If "interaction" between D & C on a shared, global file-handle becomes some kind of issue due to buffering (and only if) we'll cross that bridge at that point in time. I'm sure there's a number of solutions that don't involve restricting D to using a lowest common denominator approach. 
There's lots of smart people here who would be willing to help resolve that if necessary.Exactly. What I argue for is not adding _gratuitous_ incompatibility. I'm seeing that using read instead of getline on Linux does not add any speed. Then why not use getline and be done with it. Everybody would be happy.The tango.io package is intended to be clean, extensible, simple, and a whole lot more coherent than certain others. We feel it meets those goals, and it happens to be quite efficient at the same time. Seems a bit like sour-grapes to start looking for "issues" with that intent, particularly when compared to an implementation that proclaims "It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio, and gcc's stdio" ?I'm not sure I understand this. For what it's worth, there's no sour grapes in the mix. I *wanted* to switch to Tango to save me future aggravation.Tango is not meant to be a phobos clone; it doesn't make the same claims as phobos and it doesn't follow the same rules as phobos. If you need phobos rules, then use phobos. If you don't like tango.io speed, extensibility and simplicity, without all the special cases of C IO, then use phobos. If you want both then, at some point, we'll consider figuring out how to make your C-oriented corner-cases work with tango.ioThey aren't C-oriented. They are stream-oriented. It just so happens that the OS opens some streams and serves them to you in FILE* format. I have programs that read standard input and write to standard output. They are extremely easy to combine, parallelize, and run on a cluster. After switching from Perl to D for performance considerations, I was in a position of a net loss. Then I've been to hell and back figuring out what the problem was and fixing it. Then I thought, hmmm, maybe I could have avoided all that by switching to Tango. So I tried Tango and it was again a net loss. 
Perl's I/O beats Tango's Cin.Walter wrote: "One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed." Andrei wrote: "Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code." That sentiment is very much what Tango itself is about. You began this thread by titling it "stdio and Tango IO performance" and noting the following: "has anyone verified that Tango's I/O performance is up to snuff? I see it imposes the dynamic-polymorphic approach, and unless there was some serious performance work going on, it's possible it's even slower than stdio. " Given the results shown above, I hope we can put that to rest at this time.Of course you can, it's your library. You look at the results that please you most, I look at the results of my concrete application. I simply can't afford a 50%+ loss in I/O throughput, so I need to stay with Phobos. Why, I don't understand. Andrei
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:Oh, come now. Yesterday Tango was the "fastest" on your machine, and today it is not. And you're now claiming a 50% loss in throughput? I put it to you that you're not being very forthcoming in allowing for changes in tango.io to address this anomoly in your timings? Yesterday I pointed out where to make the change so that you could try tango without the automatic chomp; you didn't bother to do that. There is a change in SVN implementing your request, but you're not bothering to try that either. Instead, you appear to be using empty rhetoric and exaggeration to pit one library against another. That's hardly being helpful, Andrei. Tango has been shown to be very efficient on Win32, and there's no reason to assert that it can't be so on linux. We've seen that flush() is a no-no for linux, and that it has some impact on Win32 also. That can be rectified, as Walter kindly pointed out. If you're serious about giving Tango a shot, then give it some time for the different platform specifics to be addressed. Is that really too much to ask? Of a beta release?Andrei Alexandrescu (See Website For Email) wrote:That's not what my tests show on Linux, where Perl and readln beat Tango by a large margin.kris wrote:Nope. Tango is for D, not C. In order to make a arguably better library, one often has to step away from the norm. Both yourself and Walter have been saying "it needs to be fast and simple", and that's exactly what Tango is showing: for those who care deeply about such things, tango.io is shown to be around four times faster than the fastest C implementation tried (for Andrei's test under Win32), and a notable fourteen or fifteen times faster than the shipping phobos equivalent.torhu wrote:That's great news!torhu wrote: <snip>Just for jollies, a briefly optimized tango.io was tried also: it came in at around 0.7 seconds. 
On a tripled file-size (3 million lines), that version is around 23% faster than bog-standard tango.ioFastest first: tango.io.Console, no flushing (Andrei's): ca 1.5s C, reusing buffer, gcc & msvc71: ca 3s James' C++, gcc: 3.5s Phobos std.cstream, reused buffer: 11s C w/malloc and free each line, msvc71: 23s Andrei's C++, gcc: 27s C w/malloc and free each line, gcc: 37s Andrei's C++, msvc71: 50s James' C++, msvc: 51sI've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.Thanks for giving it a whirl, tohru :) p.s. perhaps Andrei should be using tango for processing those vast files he has?Is it compatible with C's stdio? IOW, would this sequence work? readln(line); int c = getchar(); Is 'c' the first character on the next line?If "interaction" between D & C on a shared, global file-handle becomes some kind of issue due to buffering (and only if) we'll cross that bridge at that point in time. I'm sure there's a number of solutions that don't involve restricting D to using a lowest common denominator approach. There's lots of smart people here who would be willing to help resolve that if necessary.Exactly. What I argue for is not adding _gratuitous_ incompatibility. I'm seeing that using read instead of getline on Linux does not add any speed. They why not use getline and be done with it. Everybody would be happy.The tango.io package is intended to be clean, extensible, simple, and a whole lot more coherent than certain others. We feel it meets those goals, and it happens to be quite efficient at the same time. Seems a bit like sour-grapes to start looking for "issues" with that intent, particularly when compared to an implementation that proclaims "It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio, and gcc's stdio" ?I'm not sure understand this. 
For all it's worth, there's no sour grapes in the mix. I *wanted* to switch to Tango to save me future aggravation.Tango is not meant to be a phobos clone; it doesn't make the same claims as phobos and it doesn't follow the same rules as phobos. If you need phobos rules, then use phobos. If you don't like tango.io speed, extensibility and simplicity, without all the special cases of C IO, then use phobos. If you want both then, at some point, we'll consider figuring out how to make your C-oriented corner-cases work with tango.ioThey aren't C-oriented. They are stream-oriented. It just so happens that the OS opens some streams and serves them to you in FILE* format. I have programs that read standard input and write to standard output. They are extremely easy to combine, parallelize, and run on a cluster. After switching form Perl to D for performance considerations, I was in a position of a net loss. Then I've been to hell and back figuring what the problem was and fixing it. Then I thought, hmmm, maybe I could have avoided all that by switching to Tango. So I tried Tango and it was again a net loss. Perl's I/O beats Tango's Cin.Walter wrote: "One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed." Andrei wrote: "Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code." That sentiment is very much what Tango itself is about. You began this thread by titling it "stdio and Tango IO performance" and noting the following: "has anyone verified that Tango's I/O performance is up to snuff? I see it imposes the dynamic-polymorphic approach, and unless there was some serious performance work going on, it's possible it's even slower than stdio. " Given the results shown above, I hope we can put that to rest at this time.Of course you can, it's your library. 
You look at the results that please you most, I look at the results of my concrete application. I simply can't afford a 50%+ loss in I/O throughput, so I need to stay with Phobos. Why, I don't understand.
Mar 22 2007
kris wrote:Oh, come now. Yesterday Tango was the "fastest" on your machine, and today it is not. And you're now claiming a 50% loss in throughput?Probably it's a misunderstanding. Yesterday the Tango that did not output the newlines was fastest. I don't have Tango code to test a version that reads lines including the newline, so I tried the Cout(line)("\n") thing, which was slow. I'd of course be happy to use something that is faster, no matter where it comes from.I put it to you that you're not being very forthcoming in allowing for changes in tango.io to address this anomaly in your timings? Yesterday I pointed out where to make the change so that you could try tango without the automatic chomp; you didn't bother to do that. There is a change in SVN implementing your request, but you're not bothering to try that either.It's not that I didn't bother; just getting my app to link with Tango was hard for me, so recompiling and rebuilding libtango.a was likely to take me a long time. Furthermore, I don't have svn installed nor admin access on the cluster I work on. If you put a libtango.a somewhere to be found with http or ftp, I'd be glad to download it.Instead, you appear to be using empty rhetoric and exaggeration to pit one library against another. That's hardly being helpful, Andrei. Tango has been shown to be very efficient on Win32, and there's no reason to assert that it can't be so on linux. We've seen that flush() is a no-no for linux, and that it has some impact on Win32 also. That can be rectified, as Walter kindly pointed out. If you're serious about giving Tango a shot, then give it some time for the different platform specifics to be addressed. Is that really too much to ask? Of a beta release?Of course this is great news. There's only one guy using rhetoric in this thread, and that's not me :o). Andrei
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:We're in the process of getting an automated nightly snapshot process set up. The scripts are actually written, and we're sorting out hosting and such. I'm sure someone would be willing to put one online somewhere in the interim. I'll do it myself if I can track down the Linux build scripts. SeanI put it to you that you're not being very forthcoming in allowing for changes in tango.io to address this anomoly in your timings? Yesterday I pointed out where to make the change so that you could try tango without the automatic chomp; you didn't bother to do that. There is a change in SVN implementing your request, but you're not bothering to try that either.It's not that I didn't bother; just getting my app to link with Tango was hard for me, so recompiling and rebuilding libtango.a was likely to take me a long time. Furthermore, I don't have svn installed nor admin access on the cluster I work on.
Mar 22 2007
kris wrote:Tango is not meant to be a phobos clone; it doesn't make the same claims as phobos and it doesn't follow the same rules as phobos. If you need phobos rules, then use phobos. If you don't like tango.io speed, extensibility and simplicity, without all the special cases of C IO, then use phobos. If you want both then, at some point, we'll consider figuring out how to make your C-oriented corner-cases work with tango.ioI think you'd make a lot of people happy. Several documented attempts of installing Tango failed for me, so in the end I figured some way to get programs to compile with a special command line and a modification of dmd.conf. I need to modify dmd.conf whenever I switch between Phobos programs and Tango programs. Andrei
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:Several documented attempts of installing Tango failed for me, so in the end I figured some way to get programs to compile with a special command line and a modification of dmd.conf. I need to modify dmd.conf whenever I switch between Phobos programs and Tango programs.This page describes one way to use Tango and Phobos together: http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation It's Win32-oriented, but the approach should be essentially the same for Linux. One issue with install instructions is that the installation procedure is in flux as we try to simplify/automate it, and some of the documentation is lagging behind. Sean
Mar 22 2007
Sean Kelly wrote:Andrei Alexandrescu (See Website For Email) wrote:Here's what worked for me. The script also allows compiling dmd programs on the fly. For some reason I needed to include libtango.a in the DFLAGS variable. ----------------------------------------- D_BIN=$(dirname $(which dmd)) WHICH=$1 if [ "$WHICH" = "phobos" ]; then DFLAGS="-I$D_BIN/../src/phobos -L-L$D_BIN/../lib -L-L$D_BIN/../../dm/lib" elif [ "$WHICH" = "tango" ]; then DFLAGS="-I$D_BIN/../../tango-0.96-bin -version=Tango -version=Posix" DFLAGS="$DFLAGS -L-L$D_BIN/../../tango-0.96-bin/lib libtango.a" else echo "Please pass either phobos or tango as the first argument" WHICH="" fi if [ ! -z "$WHICH" ]; then shift if [ "$*" != "" ]; then dmd $* else export DFLAGS echo "dmd configured for $WHICH" fi fi ----------------------------------------- AndreiSeveral documented attempts of installing Tango failed for me, so in the end I figured some way to get programs to compile with a special command line and a modification of dmd.conf. I need to modify dmd.conf whenever I switch between Phobos programs and Tango programs.This page describes one way to use Tango and Phobos together: http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation It's Win32-oriented, but the approach should be essentially the same for Linux. One issue with install instructions is that the installation procedure is in flux as we try to simplify/automate it, and some of the documentation is lagging behind.
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:Sean Kelly wrote:This is intentional, though it may change later based on user feedback. That said, my personal belief is that only the compiler runtime code should be implicitly linked, and the rest should be linked via DFLAGS or by some other means. In Tango parlance, this would mean implicitly linking the compiler runtime (libdmd.a), but not the GC code, the Tango runtime, or Tango user code. This is currently quite possible--it just isn't the default configuration because it's unnecessarily complex for most users. For those who are interested however, the process is outlined here: http://www.dsource.org/projects/tango/wiki/TopicAdvancedConfigurationAndrei Alexandrescu (See Website For Email) wrote:Here's what worked for me. The script also allows compiling dmd programs on the fly. For some reason I needed to include libtango.a in the DFLAGS variable.Several documented attempts of installing Tango failed for me, so in the end I figured some way to get programs to compile with a special command line and a modification of dmd.conf. I need to modify dmd.conf whenever I switch between Phobos programs and Tango programs.This page describes one way to use Tango and Phobos together: http://www.dsource.org/projects/tango/wiki/PhobosTangoCooperation It's Win32-oriented, but the approach should be essentially the same for Linux. 
One issue with install instructions is that the installation procedure is in flux as we try to simplify/automate it, and some of the documentation is lagging behind.----------------------------------------- D_BIN=$(dirname $(which dmd)) WHICH=$1 if [ "$WHICH" = "phobos" ]; then DFLAGS="-I$D_BIN/../src/phobos -L-L$D_BIN/../lib -L-L$D_BIN/../../dm/lib" elif [ "$WHICH" = "tango" ]; then DFLAGS="-I$D_BIN/../../tango-0.96-bin -version=Tango -version=Posix" DFLAGS="$DFLAGS -L-L$D_BIN/../../tango-0.96-bin/lib libtango.a" else echo "Please pass either phobos or tango as the first argument" WHICH="" fi if [ ! -z "$WHICH" ]; then shift if [ "$*" != "" ]; then dmd $* else export DFLAGS echo "dmd configured for $WHICH" fi fi -----------------------------------------Thanks. I'll look this over and see about adding it to the wiki. Sean
Mar 22 2007
torhu wrote:torhu wrote: <snip>cat is not comparable. Besides, there must be some overhead associated with that cat, because Linux' cat consistently clocks way faster than all line-oriented tests. AndreiFastest first: tango.io.Console, no flushing (Andrei's): ca 1.5s C, reusing buffer, gcc & msvc71: ca 3s James' C++, gcc: 3.5s Phobos std.cstream, reused buffer: 11s C w/malloc and free each line, msvc71: 23s Andrei's C++, gcc: 27s C w/malloc and free each line, gcc: 37s Andrei's C++, msvc71: 50s James' C++, msvc: 51sI've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.
Mar 22 2007
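For context on why cat outruns every line-oriented test: it copies in large blocks and never scans for newlines. A minimal C sketch of that approach (the helper name `cat_blocks` is invented here, not taken from any of the benchmarks):

```c
#include <stdio.h>

/* Block-oriented copy, roughly what cat does: data moves in 64 KB
   chunks and no byte is ever inspected for a line terminator. */
static void cat_blocks(FILE *in, FILE *out)
{
    char buf[1 << 16];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
}
```

Wired to the console this is just `cat_blocks(stdin, stdout);`.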
torhu wrote:torhu wrote: <snip>Couple of more results: ActiveState Perl 5.8.8: 3.8s. Python 2.5: 3.6s. cat.py: --- import sys sys.stdout.writelines(sys.stdin.xreadlines()) #sys.stdout.writelines(do_stuff_with_each_line(sys.stdin.xreadlines())) #sys.stdout.writelines(do_stuff_with_each_line(s) for s in sys.stdin) --- cat.pl: --- while (<>) { print; } --- I guess that's enough benchmarking for now.Fastest first: tango.io.Console, no flushing (Andrei's): ca 1.5s C, reusing buffer, gcc & msvc71: ca 3s James' C++, gcc: 3.5s Phobos std.cstream, reused buffer: 11s C w/malloc and free each line, msvc71: 23s Andrei's C++, gcc: 27s C w/malloc and free each line, gcc: 37s Andrei's C++, msvc71: 50s James' C++, msvc: 51sI've run some of the tests with more accurate timing. Andrei's Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with flushing. I also tried cat itself, from the gnuwin32 project. cat clocks in at 1.3 seconds.
Mar 22 2007
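For reference, the "C, reusing buffer" entry in the table presumably amounts to a loop like the following (a reconstruction, not torhu's actual code): one fixed buffer, reused for every line.

```c
#include <stdio.h>

/* Line-by-line echo with a single reused buffer.  Unlike the
   malloc-per-line variant, sizeof buf really is 1000 here,
   because buf is an array rather than a pointer. */
static void cat_lines(FILE *in, FILE *out)
{
    char buf[1000];
    while (fgets(buf, sizeof buf, in))
        fputs(buf, out);
}
```

Lines longer than 999 characters are simply echoed in pieces, which is harmless for a pure copy.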
torhu wrote:James Dennett wrote: <snip>...Try the way IOStreams would be used if you didn't want it to go slowly: #include <string> #include <iostream> int main() { std::ios_base::sync_with_stdio(false); std::cin.tie(NULL); std::string s; while (std::getline(std::cin, s)) { std::cout << s << '\n'; } }<snip> I did some tests with a 58 MB file, containing one million lines. I'm on winxp. I ran each test a few times, timing them with a stopwatch. I threw in a naive C version, and a std.cstream version, just out of curiosity. It seems that using cin.tie(NULL) doesn't matter with msvc 7.1, but with mingw it does. Basically, Tango wins hands down on my system. Whether the Tango version flushes after each line or not doesn't seem to matter much on Windows.--- // Tango import tango.io.Console; void main() { char[] line; while (Cin.nextLine(line)) { //Cout(line).newline; Cout(line)("\n"); } } ---Oh good. I was hoping someone would test Tango without flushing every line :-) Basically, Tango's 'newline' method is equivalent to C++'s 'endl' manipulator. It should not be used for every carriage return in normal output for performance-critical applications. Rather, it should be used as the trailing newline after writing a block of data that should be displayed immediately ('flush' is another option if no newline is desired). Sean
Mar 22 2007
torhu wrote:--- /* C test w/malloc and free */ #include <stdio.h> #include <stdlib.h> int main() { char *buf = malloc(1000); while (fgets(buf, sizeof(buf), stdin)) { fputs(buf, stdout); free(buf); buf = malloc(1000); } free(buf); return 0; ---Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.
Mar 24 2007
torhu wrote:torhu wrote:I'm guessing the fact that sizeof(buf) != 1000 ?--- /* C test w/malloc and free */ #include <stdio.h> #include <stdlib.h> int main() { char *buf = malloc(1000); while (fgets(buf, sizeof(buf), stdin)) { fputs(buf, stdout); free(buf); buf = malloc(1000); } free(buf); return 0; ---Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.
Mar 24 2007
Frits van Bommel wrote:torhu wrote:I think you were the first to post. Go buy yourself a lollipop, you've earned it.Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.I'm guessing the fact that sizeof(buf) != 1000 ?
Mar 25 2007
torhu wrote:torhu wrote:The fgets(sizeof(buf)) looks like it could affect read performance a tad :-) Sean--- /* C test w/malloc and free */ #include <stdio.h> #include <stdlib.h> int main() { char *buf = malloc(1000); while (fgets(buf, sizeof(buf), stdin)) { fputs(buf, stdout); free(buf); buf = malloc(1000); } free(buf); return 0; ---Whoops, can anyone spot the bug? When I fixed it, the time it took to run my test went down from about 23 to about 3 seconds.
Mar 24 2007
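For the record, the fix the replies above are hinting at: `sizeof(buf)` is the size of a `char*` (4 or 8 bytes), not 1000, so fgets was returning after only a few characters. A corrected version of the loop, reshaped as a function:

```c
#include <stdio.h>
#include <stdlib.h>

/* The malloc/free-per-line test with the bug fixed: fgets is given
   the real allocation size, not sizeof(buf), which is merely the
   size of the pointer. */
static void cat_malloc(FILE *in, FILE *out)
{
    enum { BUF_LEN = 1000 };
    char *buf = malloc(BUF_LEN);
    while (buf && fgets(buf, BUF_LEN, in)) {
        fputs(buf, out);
        free(buf);             /* deliberately churn the allocator, */
        buf = malloc(BUF_LEN); /* as the original benchmark did     */
    }
    free(buf);
}
```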
James Dennett wrote:Andrei Alexandrescu (See Website For Email) wrote:With your code pasted and wind from behind: 13.5s cppcatWalter Bright wrote:Try the way IOStreams would be used if you didn't want it to go slowly: #include <string> #include <iostream> int main() { std::ios_base::sync_with_stdio(false); std::cin.tie(NULL); std::string s; while (std::getline(std::cin, s)) { std::cout << s << '\n'; } } (Excuse the lack of a using directive there; I find the code more readable without them. YMMV.)Andrei Alexandrescu (See Website For Email) wrote:Obliged. Darn, I had to wait a *lot* longer. #include <string> #include <iostream> int main() { std::string s; while (getline(std::cin, s)) { std::cout << s << '\n'; } } (C++ makes the same mistake wrt newline.) 35.7s cppcat I seem to remember a trick that puts some more wind into iostream's sails, so I tried that as well: #include <string> #include <iostream> using namespace std; int main() { cin.sync_with_stdio(false); cout.sync_with_stdio(false); string s; while (getline(std::cin, s)) { cout << s << '\n'; } } Result: 13.3s cppcatI've ran a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.Can you add a C++ <iostream> to the mix? I think that would be a very useful additional data point.I don't have your sample file or your machine, but for the quick tests I just ran on this one machine, the code above runs move than 60% faster. Without using tie(), each read from standard input causes a flush of standard output (so that, by default, they work appropriately for console I/O). It's certainly true that making efficient use of IOStreams needs some specific knowledge, and that writing an efficient implementation of IOStreams is far from trivial. But if we're comparing to C++, we should probably compare to some reasonably efficient idiomatic C++.The sync_with_stdio and tie tricks are already unknown to most programmers, so it would be an uphill battle to characterize them as idiomatic. 
They are idiomatic for a small group at best. But, obviously not enough. Perl does way better. (Again: gcc on Linux.) Andrei
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:James Dennett wrote:Blasted weather. Never a hurricane when you need one.Andrei Alexandrescu (See Website For Email) wrote:With your code pasted and wind from behind: 13.5s cppcatWalter Bright wrote:Try the way IOStreams would be used if you didn't want it to go slowly: #include <string> #include <iostream> int main() { std::ios_base::sync_with_stdio(false); std::cin.tie(NULL); std::string s; while (std::getline(std::cin, s)) { std::cout << s << '\n'; } } (Excuse the lack of a using directive there; I find the code more readable without them. YMMV.)Andrei Alexandrescu (See Website For Email) wrote:Obliged. Darn, I had to wait a *lot* longer. #include <string> #include <iostream> int main() { std::string s; while (getline(std::cin, s)) { std::cout << s << '\n'; } } (C++ makes the same mistake wrt newline.) 35.7s cppcat I seem to remember a trick that puts some more wind into iostream's sails, so I tried that as well: #include <string> #include <iostream> using namespace std; int main() { cin.sync_with_stdio(false); cout.sync_with_stdio(false); string s; while (getline(std::cin, s)) { cout << s << '\n'; } } Result: 13.3s cppcatI've ran a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.Can you add a C++ <iostream> to the mix? I think that would be a very useful additional data point.IOStreams is a terrible chunk of library design, and its effective use is fiendishly difficult even for fairly trivial tasks. I've implemented large chunks of the C++ standard library, but IOStreams scares me.I don't have your sample file or your machine, but for the quick tests I just ran on this one machine, the code above runs move than 60% faster. Without using tie(), each read from standard input causes a flush of standard output (so that, by default, they work appropriately for console I/O). 
It's certainly true that making efficient use of IOStreams needs some specific knowledge, and that writing an efficient implementation of IOStreams is far from trivial. But if we're comparing to C++, we should probably compare to some reasonably efficient idiomatic C++.The sync_with_stdio and tie tricks are already unknown to most programmers, so it would be an uphill battle to characterize them as idiomatic. They are idiomatic for a small group at best.But, obviously not enough. Perl does way better. (Again: gcc on Linux.)Most of the time I do large text processing jobs in Perl or inside a database; once in a while I use C++, primarily if I need to do trickier calculations. No good reason D shouldn't be able to handle the jobs I use C++ for in this area (though I'd have to get D working on Solaris, and 64-bit support would probably be necessary). -- James
Mar 22 2007
James Dennett wrote: [snip]Most of the time I do large text processing jobs in Perl or inside a database; once in a while I use C++, primarily if I need to do trickier calculations. No good reason D shouldn't be able to handle the jobs I use C++ for in this area (though I'd have to get D working on Solaris, and 64-bit support would probably be necessary).Indeed. Then you'll be glad to hear that D will soon accommodate smarter string literals and probably here-documents, all with interpolation, which should make scripting jobs a snap. Andrei
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote: [...]#include <string> #include <iostream> int main() { std::string s; while (getline(std::cin, s)) { std::cout << s << '\n'; } }The portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files. Ciao
Mar 22 2007
Roberto Mariottini wrote: <snip>The portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files. CiaoUnless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout, stderr are by default in text (not binary) mode.
Mar 22 2007
torhu wrote:Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout, stderr is by default in ascii (not binary) mode.But I don't think this is the case in Tango, so Cout(line)("\n") should also be changed for the benchmarks.
Mar 22 2007
Deewiant wrote:torhu wrote:At the behest of andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout, stderr is by default in ascii (not binary) mode.But I don't think this is the case in Tango, so Cout(line)("\n") should also be changed for the benchmarks.
Mar 22 2007
kris wrote:Deewiant wrote:Only if you've got the latest SVN revision of Tango. If not, use tango.io.FileConst.NewlineString (side note: for easier access, perhaps Print.Eol should be public and assigned to this) in place of "\n".torhu wrote:At the behest of andrei, Cin line-parsing now has an option to include the incoming line-terminator. That makes the "\n" somewhat redundant?Unless a file is opened in binary mode, '\n' will be translated into '\r\n' on Windows. And stdin, stdout, stderr is by default in ascii (not binary) mode.But I don't think this is the case in Tango, so Cout(line)("\n") should also be changed for the benchmarks.
Mar 22 2007
Roberto Mariottini wrote:Andrei Alexandrescu (See Website For Email) wrote: [...]Wrong. Newline translation will be correct on both systems. Andrei#include <string> #include <iostream> int main() { std::string s; while (getline(std::cin, s)) { std::cout << s << '\n'; } }The portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files.
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:Roberto Mariottini wrote:It depends on how you open the file: 'endl' works even with files open in binary mode (the default on most platforms, the default on the average programmer). Or else, say that 'endl' is yet another design error in C++. CiaoThe portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files.Wrong. Newline translation will be correct on both systems.
Mar 23 2007
Roberto Mariottini wrote:Andrei Alexandrescu (See Website For Email) wrote:The difference between '\n' and std::endl in C++ is only that std::endl flushes the stream after writing a newline (well, and uses widen to convert to the character type of the stream, but binary mode makes no difference to that, it's a property of the template parameters of the stream type to which you are writing). C++ doesn't default to binary mode, though on many platforms that's of academic concern only as there is no distinction between text and binary modes. And this is somewhat off-topic for d.D, I think, except in that we'd like D's IO to be better than C++'s. -- JamesRoberto Mariottini wrote:It depends on how you open the file: 'endl' works even with files open in binary mode (the default on most platforms, the default on the average programmer). Or else, say that 'endl' is yet another design error in C++. CiaoThe portable way to write a newline in C++ is to use the 'endl' modifier. Your program is not portable, on Windows it will generate Unix text files.Wrong. Newline translation will be correct on both systems.
Mar 23 2007
Andrei Alexandrescu (See Website For Email) wrote:
> I've run a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.
>
> First, I realize I should make an account on dsource.org and post the following there, but I'll mention here that it's quite disappointing that Tango's idiomatic method of reading a line from the console (Cin.nextLine(line) unless I missed something) chose to chop the newline automatically. The Perl book spends half a page or so explaining why it's _good_ that the newline is included in the line, and I've been thankful for that on numerous occasions when writing Perl. Please put the newline back in the line.
>
> Anyhow, here's the code. The D up-and-coming stdio version:
>
>    import std.stdio;
>
>    void main() {
>        char[] line;
>        while (readln(line)) {
>            write(line);
>        }
>    }
>
> The Tango version:
>
>    import tango.io.Console;
>
>    void main() {
>        char[] line;
>        while (Cin.nextLine(line)) {
>            Cout(line).newline;
>        }
>    }
>
> (The .newline adds back the information that nextLine promptly lost, sigh.) I'm not sure whether this is the idiomatic way of reading and writing lines in Tango, but tango.io.Stdout seems to say so: "If you don't need formatted output or unicode translation, consider using the module tango.io.Console directly." - which suggests that Console would be the most primitive stdio library.
>
> The Perl version:
>
>    while (<>) { print; }
>
> All programs operate in the same exact boring way: read a line from stdin, print it, lather, rinse, repeat. I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've run the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs):
>
>    13.9s Tango
>    6.6s Perl
>    5.0s std.stdio

There are a couple of things to look at here:

1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)

2) the output.newline on each line will cause a flush ~ this may or may not have something to do with it

3) the test would appear to be stressing the parsing of lines just as much (if not more) than the io system itself. All part-and-parcel to a degree, but it may be worth investigating

In order to track this down, we'd be interested to see the results of:

a) Cout.conduit.copy (Cin.conduit);

b) foregoing the output .newline, purely as an experiment

c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Just a heads-up: Console is not the lowest IO level. It wraps both a streaming-buffer and console idioms around the raw IO. Raw IO in tango is based around two virtual methods: read(void[]) and write(void[])
Mar 21 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> 13.9s Tango
>> 6.6s Perl
>> 5.0s std.stdio
> There are a couple of things to look at here:
> 1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)

The test code assumed taking a look at each line before printing it, so speed of line reading and writing was deemed as important, not speed of raw I/O, which we all know how to get.

> 2) the output.newline on each line will cause a flush ~ this may or may not have something to do with it

Probably.

> 3) the test would appear to be stressing the parsing of lines just as much (if not more) than the io system itself. All part-and-parcel to a degree, but it may be worth investigating

I don't understand this.

> In order to track this down, we'd be interested to see the results of:
> a) Cout.conduit.copy (Cin.conduit);

The program wouldn't be comparable with the others.

> b) foregoing the output .newline, purely as an experiment

4.7s tcat

> c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)

Then probably that could be filed as a bug in Tango. The nextLine function should lock the file only once, thus giving each thread an entire line, not a portion of a line. Also, using block-oriented read for reading lines makes Tango incompatible with standard C usage (Tango might read more than one line into its buffers; if a C-level function tries to read from the file, it will be too late). Unfortunately there's no public API for such stuff, so system-specific approaches must be taken. readln on Linux uses Gnu's getline(), which locks the file only once per line. See: http://www.gnu.org/software/libc/manual/html_node/Line-Input.html Unfortunately there's one extra copy going on - from the mallocated buffer into D's gc'd array.
That copy could be optimized away by using Gnu's malloc hooks: http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html Andrei
Mar 21 2007
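The glibc getline() approach Andrei describes can be sketched as follows. This is an illustration assuming a POSIX/glibc system, not the actual phobos implementation; the point is that because getline() keeps the trailing '\n' in the buffer, the naive copy loop is byte-for-byte exact.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/types.h>  // ssize_t

// Copy a stream line by line with POSIX getline(). Unlike C++'s
// std::getline, it keeps the trailing '\n', so the output reproduces
// the input exactly -- including a final line with no newline.
// getline() also grows the buffer itself and locks the FILE* only
// once per call.
void copy_lines(FILE* in, FILE* out) {
    char*   buf = nullptr;
    size_t  cap = 0;
    ssize_t len;
    while ((len = getline(&buf, &cap, in)) != -1)
        fwrite(buf, 1, (size_t)len, out);
    free(buf);
}
```

The extra copy Andrei mentions would come on top of this: moving the malloc'd `buf` contents into a GC-managed D array on every line.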
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:Yep, just trying to isolate thingsAndrei Alexandrescu (See Website For Email) wrote:The test code assumed taking a look at each line before printing it, so speed of line reading and writing was deemed as important, not speed of raw I/O, which we all know how to get.13.9s Tango 6.6s Perl 5.0s std.stdioThere's a couple of things to look at here: 1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)Just suggesting that the scanning for [\r]\n patterns is likely a good chunk of the CPU time3) the test would appear to be stressing the parsing of lines just as much (if not more) than the io system itself. All part-and-parcel to a degree, but it may be worth investigatingI don't understand this.Thanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular test Can you distill the benefits of retaining CR on a readline, please?b) foregoing the output .newline, purely as an experiment4.7s tcat
Mar 21 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote:Well probably but must be tested. Newlines comprise about 3% of the file size.kris wrote:Yep, just trying to isolate thingsAndrei Alexandrescu (See Website For Email) wrote:The test code assumed taking a look at each line before printing it, so speed of line reading and writing was deemed as important, not speed of raw I/O, which we all know how to get.13.9s Tango 6.6s Perl 5.0s std.stdioThere's a couple of things to look at here: 1) if there's an idiom in tango.io, it would be rewriting the example like this: Cout.conduit.copy (Cin.conduit)Just suggesting that the scanning for [\r]\n patterns is likely a good chunk of the CPU time3) the test would appear to be stressing the parsing of lines just as much (if not more) than the io system itself. All part-and-parcel to a degree, but it may be worth investigatingI don't understand this.Thanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular testb) foregoing the output .newline, purely as an experiment4.7s tcatCan you distill the benefits of retaining CR on a readline, please?I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there. Essentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information. Also, stdio also offers a readln() that creates a new line on every call. 
That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; } The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code. In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times. Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style. All of my C++ applications use getline() or fgets() (both of which thankfully do include the newline) and then process the line in-situ. Andrei
Mar 21 2007
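The information-loss argument above can be shown concretely with C++'s std::getline, which chops the newline much as Tango's nextLine does. This is a hypothetical illustration written for this article, not code from either library:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// std::getline chops the newline, so once the input is read there is
// no way to tell whether the last line ended in '\n' or not -- the
// information needed to reproduce the input exactly is gone.
int lines_seen(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    int n = 0;
    while (std::getline(in, line))
        ++n;
    return n;
}
```

Both "a\nb" and "a\nb\n" come back as the same two chopped lines, which is exactly why a copy program built on a chopping reader may emit a file one byte larger (or smaller) than its input.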
Andrei Alexandrescu (See Website For Email) wrote: [snip]Yeah, I can imagine. Module tango.io.Console at line 119 should have a slice in it ... if you change 'j' to be 'i+1' instead, that should remove the chop Tango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO) Having said that, I'm very glad you ran this since it shows how much overhead there is in a flush operation (on *nix) that's very useful to knowWell probably but must be tested. Newlines comprise about 3% of the file size.4.7s tcatThanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular testThat's a valid point [snip]Can you distill the benefits of retaining CR on a readline, please?I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there. Essentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.
Mar 21 2007
kris wrote:
> Having said that, I'm very glad you ran this since it shows how much overhead there is in a flush operation (on *nix); that's very useful to know.

The flush on newline should only be done if isatty() returns != 0.
Mar 21 2007
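A minimal sketch of Walter's suggestion, using POSIX isatty(). This is an illustration, not Tango's or phobos' actual code:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <unistd.h>  // isatty, fileno -- POSIX

// Pay the flush-per-newline cost only when output goes to an
// interactive terminal. When output is redirected to a file or pipe
// (as in the benchmarks above, writing to /dev/null), the stream
// keeps normal block buffering and no per-line flush happens.
void put_line(FILE* out, const char* line) {
    fputs(line, out);
    fputc('\n', out);
    if (isatty(fileno(out)))
        fflush(out);  // keep a terminal responsive, line by line
}
```

This gives interactive users prompt output while letting batch pipelines run at full buffered speed.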
Walter Bright wrote:
> kris wrote:
>> Having said that, I'm very glad you ran this since it shows how much overhead there is in a flush operation (on *nix); that's very useful to know.
> The flush on newline should only be done if isatty() returns != 0.

yep; if you were to submit a ticket for that, it would be appreciated :)
http://www.dsource.org/projects/tango/newticket
Mar 21 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote: [snip]Yum.Yeah, I can imagine. Module tango.io.Console at line 119 should have a slice in it ... if you change 'j' to be 'i+1' instead, that should remove the chopWell probably but must be tested. Newlines comprise about 3% of the file size.4.7s tcatThanks. If tango.io were to retain CR on readln, then it would come out ahead of everything else in this particular testTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not? Programs using the standard input and output are ubiquitous, efficient, and extremely easy to combine. I write them all the time for processing huge amounts of data. I didn't run the tests willy-nilly. I had a Perl script that took a night to run (it scrambles through some 20 GB of data), so I decided to give D a shot. The D equivalent was two times slower. With the new readln, it takes 98 minutes; parallelized, it is hand over fist another five times faster (which was impossible in the previous version because it used 98% CPU). I was actually surprised that nobody noticed phobos' low I/O speed in years. It's a maker or breaker for me and many others. If there's any chance that automated chopping could be removed from Tango, that would be awesome. Also it would be great to fix the incompatibility created by using read/write instead of getline. Andrei
Mar 21 2007
On Wed, 21 Mar 2007 17:11:56 -0700, Andrei Alexandrescu (See Website For Email) wrote:I was actually surprised that nobody noticed phobos' low I/O speed in years. It's a maker or breaker for me and many others.Most programs I run that do lots of I/O only take seconds to run, so if they run 50% slower or faster, not only wouldn't I notice, I wouldn't care. Taking a sip of coffee takes longer than that. That is why I haven't noticed. (Maybe I should continue working on my mini-DataBase library project and give it a good "real world" workout <G>) By the way, I do appreciate you doing this performance comparison and improving Phobos' I/O routine. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 11:19:16 AM
Mar 21 2007
> On Wed, 21 Mar 2007 17:11:56 -0700, Andrei Alexandrescu (See Website For Email) wrote:
>> I was actually surprised that nobody noticed phobos' low I/O speed in years. It's a maker or breaker for me and many others.
> Most programs I run that do lots of I/O only take seconds to run, so if they run 50% slower or faster, not only wouldn't I notice, I wouldn't care. Taking a sip of coffee takes longer than that. That is why I haven't noticed. (Maybe I should continue working on my mini-DataBase library project and give it a good "real world" workout <G>) By the way, I do appreciate you doing this performance comparison and improving Phobos' I/O routine.

u r working on database? i have a feeling that SQL ain't really suitable for database-related development, any better idea?
Mar 21 2007
On Thu, 22 Mar 2007 10:22:28 +0800, Davidl wrote:u r working on database? i have a feeling that SQL ain't really suitable for databse-related development, any better idea?Yep. A light-weight, single-user D/B suitable for "home" applications. It has its own DSL so I'm hoping to eventually use some of D's new mixin goodies to help generate optimal code from high-level Database statements. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 1:49:41 PM
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modular [snip]Tango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?I was actually surprised that nobody noticed phobos' low I/O speed in years. It's a maker or breaker for me and many others.That assumes IO performance wasn't brought up as an issue before ;)If there's any chance that automated chopping could be removed from Tango, that would be awesome. Also it would be great to fix the incompatibility created by using read/write instead of getline.Sure; could you submit a ticket for it, please, lest it fall by the wayside? http://www.dsource.org/projects/tango/newticket
Mar 21 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote:That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?[snip]For the \n, read/write, or both? :o) AndreiI was actually surprised that nobody noticed phobos' low I/O speed in years. It's a maker or breaker for me and many others.That assumes IO performance wasn't brought up as an issue before ;)If there's any chance that automated chopping could be removed from Tango, that would be awesome. Also it would be great to fix the incompatibility created by using read/write instead of getline.Sure; could you submit a ticket for it, please, lest it fall by the wayside? http://www.dsource.org/projects/tango/newticket
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:Oh, if there's any implication that Tango ought to be "faster" than it is, then I suspect you're being unjust, Andrei. You'll be hard pressed to find, for example, some routine that hits the heap where that should be avoided. The library was built to avoid such pitfalls That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;) [snip]Andrei Alexandrescu (See Website For Email) wrote:That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?Both, if you prefer?Sure; could you submit a ticket for it, please, lest it fall by the wayside? http://www.dsource.org/projects/tango/newticketFor the \n, read/write, or both? :o)
Mar 21 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote:Do it and let's test. Andreikris wrote:Oh, if there's any implication that Tango ought to be "faster" than it is, then I suspect you're being unjust, Andrei. You'll be hard pressed to find, for example, some routine that hits the heap where that should be avoided. The library was built to avoid such pitfalls That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)Andrei Alexandrescu (See Website For Email) wrote:That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:[snip]you can try it right now with a Cout(line)("\n"); The option to eschew the chop is checked in also. You'll perhaps see from the Win32 tests that tango.io is pretty darned fast anyway?That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)Do it and let's test.
Mar 22 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>> That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)
>> Do it and let's test.
> [snip]
> you can try it right now with a Cout(line)("\n"); The option to eschew the chop is checked in also. You'll perhaps see from the Win32 tests that tango.io is pretty darned fast anyway?

On my Linux box:

   import tango.io.Console;

   void main() {
       char[] line;
       while (Cin.nextLine(line)) {
           Cout(line)("\n");
       }
   }

7.8s tcat

Andrei
Mar 22 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote:Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is. Andreikris wrote:Oh, if there's any implication that Tango ought to be "faster" than it is, then I suspect you're being unjust, Andrei. You'll be hard pressed to find, for example, some routine that hits the heap where that should be avoided. The library was built to avoid such pitfalls That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)Andrei Alexandrescu (See Website For Email) wrote:That's great, but by and large, the attitude that "this is the simple version; if you want performance, you gotta work for it" is precisely what I don't like about certain languages and APIs. This is, for example, why not everybody really condemns C++ iostreams in spite of them being a pinnacle of counter-performance in any contest, be it beauty, size, or speed. People know that C++ can do fast I/O and are driven by the attitude that you gotta work for it - there's no other way. Just make the clear and simple code fastest. One thing I like about D is that it clearly strives to achieve best performance for simply-written code.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote: [snip]Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.How can it possibly be "cheating" when the code was in place before you contrived this test ;) I think you have to stretch a bit to find some /common/ and truly valid cases where what you refer to is important enough to warrant such attention. If it truly were to become an issue (people actually run into problems with it on a regular basis) then tango.io could be changed to special-case this kind of thing; but at this time we prefer to avoid such things and adhere to the KISS principal instead. FWIW, tango.io could trivially be sped up significantly on this 'test' -- as it stands, the implementation is quite pedestrian in nature ;)
Mar 22 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
> [snip]
>> Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.
> How can it possibly be "cheating" when the code was in place before you contrived this test ;) I think you have to stretch a bit to find some /common/ and truly valid cases where what you refer to is important enough to warrant such attention. If it truly were to become an issue (people actually run into problems with it on a regular basis) then tango.io could be changed to special-case this kind of thing; but at this time we prefer to avoid such things and adhere to the KISS principal instead.

"Principle" I guess. That sounds great. My opinion in the matter is simple - D's stdio uses C's FILE*, stdio lib, and all. Moreover, it gives the programmer full access to them. It would be only nice, if it does not cost too much, to not be gratuitously incompatible with them. That's all. If you want to take the other route, you better disable access to C's getchar et al.

> FWIW, tango.io could trivially be sped up significantly on this 'test' -- as it stands, the implementation is quite pedestrian in nature ;)

The 'test' is not a 'test', it's a test deriving from my attempts to find the bottleneck in a real D program.

Andrei
Mar 22 2007
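The interoperability argument can be illustrated in C terms: two different stdio calls share the same FILE* buffer and therefore stay in sync, which is exactly what breaks when a library reads ahead with raw read(2). This is a hypothetical sketch; `char_after_first_line` is not part of any library discussed here.

```cpp
#include <cassert>
#include <cstdio>

// fgets() and fgetc() go through the same FILE* buffer, so they
// interleave correctly: fgets() consumes exactly one line, and
// fgetc() picks up at the very next byte. A line reader built on
// raw read(2) would already have pulled later bytes into its own
// private buffer, and a subsequent plain C call would miss them.
char char_after_first_line(FILE* in) {
    char line[256];
    if (!fgets(line, sizeof line, in))  // consumes one line, newline included
        return '\0';
    return (char)fgetc(in);             // continues where fgets stopped
}
```

This is the concrete meaning of "compatible with C's stdio" in this exchange: any mix of C-level and library-level reads on the same stream sees a single consistent position.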
Andrei Alexandrescu (See Website For Email) wrote:kris wrote:Yep. A thousand pardons for my late night spelling mistake. I'll be sure to reciprocate in future also, if that would be helpful?Andrei Alexandrescu (See Website For Email) wrote: [snip]"Principle" I guess.Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.How can it possibly be "cheating" when the code was in place before you contrived this test ;) I think you have to stretch a bit to find some /common/ and truly valid cases where what you refer to is important enough to warrant such attention. If it truly were to become an issue (people actually run into problems with it on a regular basis) then tango.io could be changed to special-case this kind of thing; but at this time we prefer to avoid such things and adhere to the KISS principal instead.That sounds great. My opinion in the matter is simple - D's stdio use C's FILE*, stdio lib, and all. Moreover, it gives the programmer full access to them. It would be only nice, if it does not cost too much, to not be gratuitously incompatible with them. That's all. If you want to take the other route, you better disable access to C's getchar et al.Yes, thanks for that option. It is certainly one approach that has been considered before, and a trivial one to implement. We'll probably cross that bridge when we reach it. It's worth noting, however, that Tango is focused for usage with D programs; not C BTW: the use of gratuitous here is wholly out of context; some might interpret the usage as an implication that Tango is based upon a whim ;)It's being referred to as a "benchmark" Andrei; I was trying to be somewhat less political by calling it a 'test'. Many pardonsFWIW, tango.io could trivially be sped up significantly on this 'test' -- as it stands, the implementation is quite pedestrian in nature ;)The 'test' is not a 'test', it's a test deriving from my attempts to find the bottleneck in a real D program.
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
>> That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)
> Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.

If I understand you correctly, you're saying that all IO packages must go through the standard C library so they stay in sync with the C IO routines? What is the point of read/write, ReadFile/WriteFile, etc, then?

Sean
Mar 22 2007
Sean Kelly wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>> That aside, tango.io appears to be fast enough and simple enough. The fastest in this case, even, assuming we do something useful about the CR chop, .newline is adjusted, or "\n" is used instead ;)
>> Oh, but I forgot it's cheating: uses read/write so it's incompatible with C's stdio, which phobos is.
> If I understand you correctly, you're saying that all IO packages must go through the standard C library so they stay in sync with the C IO routines? What is the point of read/write, ReadFile/WriteFile, etc, then?

I think for stdio, going through the standard C library would be very advisable. If, on the other hand, a library chooses to implement a file abstraction not exposing FILE*, it could use whichever means.

Andrei
Mar 22 2007
kris wrote:Andrei Alexandrescu (See Website For Email) wrote:One problem with C++, as I mentioned before, is that the straightforward, out of the box coding techniques don't get you fast code. One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed. I think the wc benchmark shows this off. Having to recode one's programs to speed them up is a big productivity sapper. (The most egregious examples of this are people forced to recode bits of their python/java/ruby app in C++.) What makes stdio so worth the effort to speed up is because the payoff is evident in 80-90% of the programs out there. Optimizing your own program speeds up only your own program - optimizing the library speeds everyone up. Tango doesn't need to be terribly, terribly slow to be a cause for concern. It only needs to be slower than C++/Perl/Java to be a problem, because then it is a convenient excuse for people to not switch to D. The conventional wisdom with C++ is that: 1) C++ code is inherently faster than in any other language 2) iostream has a great design 3) iostream is uber fast because it uses templates to inline everything Andrei's benchmark blows that out of the water. Even interpreted Perl beats the pants off of C++ iostreams.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?
Mar 21 2007
Walter Bright wrote:kris wrote:You could tell from this and my (almost identical) post that Walter's propaganda got me thoroughly brainwashed :o). AndreiAndrei Alexandrescu (See Website For Email) wrote:One problem with C++, as I mentioned before, is that the straightforward, out of the box coding techniques don't get you fast code. One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed. I think the wc benchmark shows this off.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?
Mar 21 2007
Walter Bright wrote:kris wrote:tango.io is not even optimized for this case (unlike the new Phobos code), and yet it is still faster than all others once the flush() is removed? The earlier point is only that optimization can easily be premature and misguided; typically better to get a flexible and effective design instead. This should not have given anyone cause to assume, assert, or imply that tango is in any way inefficient -- apparently that needs to be clarified ;) For the record, my perspective of "terribly, terribly slow" is pretty much where C++ landed in this particular caseAndrei Alexandrescu (See Website For Email) wrote:One problem with C++, as I mentioned before, is that the straightforward, out of the box coding techniques don't get you fast code. One of my goals with D is to fix that - the straightforward, untuned code should get you most of the possible speed. I think the wc benchmark shows this off. Having to recode one's programs to speed them up is a big productivity sapper. (The most egregious examples of this are people forced to recode bits of their python/java/ruby app in C++.) What makes stdio so worth the effort to speed up is because the payoff is evident in 80-90% of the programs out there. Optimizing your own program speeds up only your own program - optimizing the library speeds everyone up. Tango doesn't need to be terribly, terribly slow to be a cause for concern. It only needs to be slower than C++/Perl/Java to be a problem, because then it is a convenient excuse for people to not switch to D. The conventional wisdom with C++ is that: 1) C++ code is inherently faster than in any other language 2) iostream has a great design 3) iostream is uber fast because it uses templates to inline everything Andrei's benchmark blows that out of the water. Even interpreted Perl beats the pants off of C++ iostreams.kris wrote:[snip]If tango were terribly terribly slow instead, then it would be cause for concern. 
If I have some program that needs to run faster I'll find a way to do just that; another reason why tango.io is fairly modularTango should still come out in front, although I have to say that benchmarks don't really tell very much in general i.e. doesn't mean much of anything important whether tango "wins" this or not (IMO)Why not?
Mar 21 2007
Walter Bright wrote: [snip]The conventional wisdom with C++ is that: 1) C++ code is inherently faster than in any other language 2) iostream has a great design 3) iostream is uber fast because it uses templates to inline everything Andrei's benchmark blows that out of the water. Even interpreted Perl beats the pants off of C++ iostreams.This kind of simplistic bashing of a language or library design based on testing of some unnamed implementation(s) of that library doesn't give D a good image. There are other benchmarks that show C++ IOStreams beating C's stdio on performance. Those are also meaningless out of context. There are real issues with some of the design of IOStreams. There are very real problems with many implementations of IOStreams. There are also good implementations that perform pretty well, but overall IOStreams is not widely viewed in the C++ community as "having a great design", just as having a design that's OK, and a lot safer and more cleanly extensible than C's stdio. Of course C++ code isn't inherently faster than any other language, and I've not come across anyone saying that it is. And one of the main problems with IOStreams is that it makes excessive use of virtual functions in ways that inhibit inlining, particularly in typical implementations which drag in locale support even for programs that do not use it. The C++ community recognizes these problems, and the C++ committee has addressed some of them (through exposition) in its Technical Report on C++ performance. I'm at a loss to understand why you would write what you did. It seems to be a straw man, but maybe there was something else to it -- frustration that people assume that D must be slower than C++? -- James
Mar 21 2007
James Dennett wrote:Walter Bright wrote: [snip]For the record, I used gcc 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5).The conventional wisdom with C++ is that: 1) C++ code is inherently faster than in any other language 2) iostream has a great design 3) iostream is uber fast because it uses templates to inline everything Andrei's benchmark blows that out of the water. Even interpreted Perl beats the pants off of C++ iostreams.This kind of simplistic bashing of a language or library design based on testing of some unnamed implementation(s) of that library doesn't give D a good image. There are other benchmarks that show C++ IOStreams beating C's stdio on performance. Those are also meaningless out of context.There are real issues with some of the design of IOStreams. There are very real problems with many implementations of IOStreams. There are also good implementations that perform pretty well, but overall IOStreams is not widely viewed in the C++ community as "having a great design", just as having a design that's OK, and a lot safer and more cleanly extensible than C's stdio. Of course C++ code isn't inherently faster than any other language, and I've not come across anyone saying that it is. And one of the main problems with IOStreams is that it makes excessive use of virtual functions in ways that inhibit inlining, particularly in typical implementations which drag in locale support even for programs that do not use it. The C++ community recognizes these problems, and the C++ committee has addressed some of them (through exposition) in its Technical Report on C++ performance. I'm at a loss to understand why you would write what you did. It seems to be a straw man, but maybe there was something else to it -- frustration that people assume that D must be slower than C++?I don't know why he wrote that, but my perception is that iostreams have always been "on the verge of an efficient implementation" for eight years now. 
What I've seen repeatedly year after year whenever I sat down to run a test was performance that makes iostreams practically unusable for any serious coding. I'd be faster at moving molasses upstream on a cold day. I am amazed at how iostreams have managed to maintain this clout for so long. If they were a guy, I'd love to know his trick. :o) Andrei
Mar 22 2007
James Dennett wrote:I'm at a loss to understand why you would write what you did. It seems to be a straw man, but maybe there was something else to it -- frustration that people assume that D must be slower than C++?Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D code runs faster than conventional C++, tell me in various ways that it can't be true. I must have: 1) written bad C++ code 2) lied 3) used a sabotaged C++ compiler 4) written some magic optimization that only works on that carefully crafted benchmark So, I have some justification in saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware such conventional wisdom is not true. I have a lot of experience in making C++ code run fast. It doesn't come easy; it takes a lot of work back and forth with a profiler. It usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks trying to make 0-terminated strings fast. I want the natural, straightforward D code to be (at least close to) the best performing way to implement an algorithm.
Mar 22 2007
Walter Bright wrote:James Dennett wrote:Good answer. (Yes, seriously.) It's certainly true that for code doing large amounts of I/O where performance was an issue, I've always avoided IOStreams in these situations; no implementation I've used has been anywhere near fast enough. IOStreams is also a pain where robustness is required. It's most useful for simple tools that are used in tame environments. The last time I had to get out a profiler to optimize C++ code, it turned out to mostly be an exercise in avoiding (a) a terribly inefficient implementation of std::string, and (b) a mind-bogglingly inefficient implementation of strftime. Which I guess illustrates how important it is that the out-of-the-box, natural ways to write code should have performance that is not too far removed from optimal. It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have the same density of experts; if D amasses a million users it's a safe bet that most of them won't be as sharp as the average D user is today. -- JamesI'm at a loss to understand why you would write what you did. It seems to be a straw man, but maybe there was something else to it -- frustration that people assume that D must be slower than C++?Maybe it is a bit of frustration on my part. I often run into people who, when faced with benchmarks showing that conventional D code runs faster than conventional C++, tell me in various ways that it can't be true. I must have: 1) written bad C++ code 2) lied 3) used a sabotaged C++ compiler 4) written some magic optimization that only works on that carefully crafted benchmark So, I have some justification in saying what I did about the conventional wisdom of C++. I also know that the top tier of experienced C++ programmers are well aware such conventional wisdom is not true. 
I have a lot of experience in making C++ code run fast. It doesn't come easy, it takes a lot of work back and forth with a profiler. It usually involves going around the C++ runtime library. That experience has certainly strongly influenced the design of D. I don't wish to have to write custom I/O just to get good I/O performance. I don't wish to keep doing all the clever string hacks trying to make 0 terminated strings fast. I want the natural, straightforward D code to be (at least close to) the best performing way to implement an algorithm.
Mar 22 2007
James Dennett wrote:Walter Bright wrote:It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have some a density of experts; if D amasses a million users it's a safe bet than most of them won't be as sharp as the average D user is today.I think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true. --bb
Mar 23 2007
Bill Baxter wrote:James Dennett wrote:D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Walter Bright wrote:It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have some a density of experts; if D amasses a million users it's a safe bet than most of them won't be as sharp as the average D user is today.I think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.
Mar 23 2007
Walter Bright wrote:Bill Baxter wrote:Which "wrong" assertions are those?James Dennett wrote:D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Walter Bright wrote:It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have some a density of experts; if D amasses a million users it's a safe bet than most of them won't be as sharp as the average D user is today.I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables? -- JamesI think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.
Mar 24 2007
James Dennett wrote:Walter Bright wrote:cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o). Btw, does tango provide such a guarantee for code such as Cout(a)(b)? From the construct, my understanding is that it doesn't. AndreiBill Baxter wrote:Which "wrong" assertions are those?James Dennett wrote:D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Walter Bright wrote: It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have some a density of experts; if D amasses a million users it's a safe bet than most of them won't be as sharp as the average D user is today.I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. 
Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?I think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.
Mar 24 2007
Andrei Alexandrescu (See Website For Email) wrote:James Dennett wrote:[snip]As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone. It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream, or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level? Within a process, this level of safety could be achieved with only a little (usually redundant) synchronization. Which is useful for debugging or simplistic logging, but not for anything else I've seen. (IOStreams has this wrong, in different ways: it's not just the order of output that's ill-defined if a stream is used concurrently across multiple threads. Nasal demons are also possible, I hear.)I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).Btw, does tango provide such a guarantee for code such as Cout(a)(b)? From the construct, my understanding is that it doesn't.I'll leave that for the Tango experts to answer. -- James
Mar 24 2007
James Dennett wrote:Andrei Alexandrescu (See Website For Email) wrote:In order for printf to work right it does not need to flush every time (you're right that it would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream). D's implementation of writef does the same thing. D's writef also wraps the whole thing in a try-finally, making it exception safe. Iostreams' cout << a << b; results in the equivalent of: (cout->out(a))->out(b); The trouble is, there's no place to hang the lock acquire/release, nor the try-finally. It's a fundamental design problem.cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream,It does exactly one lock acquire/release for each printf, not for each character written.or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level?
Within a process, this level of safety could be achieved with only a little (usually redundant) synchronization.The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.Which is useful for debugging or simplistic logging,but not for anything else I've seen. (IOStreams has this wrong, in different ways: it's not just the order of output that's ill-defined if a stream is used concurrently across multiple threads. Nasal demons are also possible, I hear.)
Mar 24 2007
Walter Bright wrote:James Dennett wrote:That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.Andrei Alexandrescu (See Website For Email) wrote:In order for printf to work right it does not need to flush every time (you're right that it would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream).cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.D's implementation of writef does the same thing. D's writef also wraps the whole thing in a try-finally, making it exception safe. Iostreams' cout << a << b; results in the equivalent of: (cout->out(a))->out(b); The trouble is, there's no place to hang the lock acquire/release, nor the try-finally. It's a fundamental design problem.There's a place: locked(cout) << a << b; can be made to do the job, using RAII to lock at the start of the expression and unlock at the end.Right. 
I certainly did not intend to imply that any serious design would be silly enough to lock for each character written (which would be fairly useless synchronization in any case).It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream,It does exactly one lock acquire/release for each printf, not for each character written.Most libraries ought not to do so; coding dependencies on globals into libraries is generally poor design. The problem is not that users would have to write synchronization. Usually they need to do that. A problem would be if some low-level locking inside the I/O subsystems gave the impression that the user did *not* need to synchronize their own code. It's not quite as simple as this. One (possibly killer) argument for building synchronization into low-level libraries is to reduce the cost of dealing with support issues from bemused users who expected not to have to consider thread-safety when sharing streams between threads. -- Jamesor does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level? Within a process, this level of safety could be achieved with only a little (usually redundant) synchronization.The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.
Mar 24 2007
James Dennett wrote:Walter Bright wrote: That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.Ok, but since it is typical to do a flush on newline if isatty(), that seems to resolve these inter-process problems.There's a place: locked(cout) << a << b; can be made do the job, using RAII to lock at the start of the expression and unlock at the end.I don't think it is that easy, see: http://docs.sun.com/source/819-3690/Multithread.html and http://www.atnf.csiro.au/computing/software/sol2docs/manuals/c++/lib_ref/MT.htmlRight. I certainly did not intend to imply that any serious design would be silly enough to lock for each character written (which would be fairly useless synchronization in any case).It's needed if only to avoid corrupting the I/O buffer itself.I think it is unreasonable to tell users they cannot use standard cin/cout/cerr in standard ways in their library code.The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.Most libraries ought not to do so; coding dependencies on globals into libraries is generally poor design.The problem is not that users would have to write synchronization. Usually they need to do that. A problem would be if some low-level locking inside the I/O subsystems gave the impression that the user did *not* need to synchronize their own code. It's not quite as simple as this. One (possibly killer) argument for building synchronization into low-level libraries is to reduce the cost of dealing with support issues from bemused users who expected not to have to consider thread-safety when sharing streams between threads.I think it is a killer argument. 
Multithreaded programming is hard enough without heaping more burdens on the user.
Mar 24 2007
James Dennett wrote:Walter Bright wrote:Lines don't have to appear at exact times, they only must not interleave. So printf does not have to flush often. I've used printf-level atomicity for a long time on various systems and it works perfectly. Is it a system-dependent assumption? I don't know. It sure is there and is very helpful on all systems I used it with. AndreiJames Dennett wrote:That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.Andrei Alexandrescu (See Website For Email) wrote:In order for printf to work right it does not need to flush every time (you're right that it would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream).cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.
Mar 24 2007
Andrei Alexandrescu (See Website For Email) wrote:James Dennett wrote:With sufficiently short lines, where the value of "sufficiently" depends on which platform and which kind of file descriptor you're writing to. printf is likely to end up calling write with no locking; write isn't atomic past a certain (or uncertain) size, and has no reason to make the boundary coincide with the end of a line.Walter Bright wrote:Lines don't have to appear at exact times, they only must not interleave. So printf does not have to flush often. I've used printf-level atomicity for a long time on various systems and it works perfectly.James Dennett wrote:That would be true, except that Andrei wrote that the guarantee applied to separate processes, and that can only be guaranteed if you both use some kind of synchronization between the processes *and* flush the stream. Andrei's claim went beyond mere thread-safety, and that was what I responded to.Andrei Alexandrescu (See Website For Email) wrote:In order for printf to work right it does not need to flush every time (you're right in that would lead to terrible performance). The usual thing that printf does is only do a flush if isatty() comes back with true. In fact, flushing the output at the end of each printf would not mitigate multithreading problems at all. In order for printf to be thread safe, all that's necessary is for it to acquire/release the C stream lock (C's implementation of stdio has a lock associated with each stream).cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.Is a system-dependent assumption? I don't know. 
It sure is there and is very helpful on all systems I used it with.Can you name one specific system where this is documented as working reliably, or where it can be shown to do so? I've *seen* interleaving between processes, and lived with it in debugging code for performance reasons, but for reliable output have used other mechanisms. I understood this to be a widely known problem with printf, write et al. -- James
Mar 24 2007
James Dennett wrote:It's not quite as simple as this. One (possibly killer) argument for building synchronization into low-level libraries is to reduce the cost of dealing with support issues from bemused users who expected not to have to consider thread-safety when sharing streams between threads....since they obviously don't have to consider thread-safety when sharing other objects between threads. I'll admit that a global output object might be seen as somehow magic to those who don't really understand what 'cout' represents, for example. But how much of a problem would this really be? The argument against building locking into C++ containers seems fairly well-settled, so why does there seem to be so much contention about output? Is it that producing predictable behavior is easier or that the cost of locking is less of an issue since IO is expensive anyway? Sean
Mar 24 2007
Sean Kelly wrote:James Dennett wrote:Good question(s). Might be also that I/O interface is considerably simpler than container interface. The classic example of failure of method-level synchronization with containers is if (!cont.empty()) cont.pop(); With I/O, most of the time, covert synchronization at the call level is all you need. AndreiIt's not quite as simple as this. One (possibly killer) argument for building synchronization into low-level libraries is to reduce the cost of dealing with support issues from bemused users who expected not to have to consider thread-safety when sharing streams between threads....since they obviously don't have to consider thread-safety when sharing other objects between threads. I'll admit that a global output object might be seen as somehow magic to those who don't really understand what 'cout' represents, for example, how much of a problem would this really be? The argument against building locking into C++ containers seems fairly well-settled, so why does there seem to be so much contention about output? Is it that producing predictable behavior is easier or that the cost of locking is less of an issue since IO is expensive anyway?
Mar 24 2007
Walter Bright wrote:D's implementation of writef does the same thing. D's writef also wraps the whole thing in a try-finally, making it exception safe. Iostreams' cout << a << b; results in the equivalent of: (cout->out(a))->out(b); The trouble is, there's no place to hang the lock acquire/release, nor the try-finally. It's a fundamental design problem.The stream could acquire a lock and pass it to a proxy object which closes the lock on destruction. This would work fine in C++ where the lifetime of such objects is deterministic, but the design is incredibly awkward.This is still far too granular for most uses. About the only time I actually use output without explicit synchronization are for throw-away debug output.It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream,It does exactly one lock acquire/release for each printf, not for each character written.This is a valid point, but how often is it actually used in practice? Libraries generally do not perform error output of their own, and applications typically have a coherent approach for output. In my time as a programmer, I can't think of a single instance where default synchronization to an output device actually mattered. I can certainly appreciate this for its predictable behavior, but I don't know how often that predictability would actually matter to me.or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level? Within a process, this level of safety could be achieved with only a little (usually redundant) synchronization.The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.Exactly. 
SeanWhich is useful for debugging or simplistic logging,but not for anything else I've seen.
Mar 24 2007
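Sean's proxy idea can be sketched in C++. This is a hypothetical illustration, not Tango or Phobos code: a temporary holds the lock for the duration of a chained `<<` expression and flushes once on destruction, which also supplies the try-finally behavior Walter mentions, since destructors run during stack unwinding. A `std::string` stands in for the real device so the effect is observable:

```cpp
#include <mutex>
#include <sstream>
#include <string>

std::mutex g_sink_mutex;
std::string g_sink;   // stands in for the real output device

// Proxy temporary: acquires the lock on construction, buffers the chained
// values, and flushes them as one unit when destroyed at the end of the
// full expression.
class LockedWriter {
public:
    LockedWriter(std::mutex& m, std::string& sink) : lock_(m), sink_(sink) {}
    ~LockedWriter() { sink_ += buf_.str(); }   // one append, still locked
    template <typename T>
    LockedWriter& operator<<(const T& v) {
        buf_ << v;
        return *this;
    }
private:
    std::lock_guard<std::mutex> lock_;
    std::string& sink_;
    std::ostringstream buf_;
};
```

Usage is `LockedWriter(g_sink_mutex, g_sink) << "x = " << 42;`, which holds the lock for exactly one statement. This also shows why Sean calls the design awkward: the locking is invisible at the call site and costs a temporary per statement.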
Sean Kelly wrote:I disagree. It's been working fine for nearly 20 years now. gcc implements it the same way, and it's hardly unusable for most uses.It does exactly one lock acquire/release for each printf, not for each character written.This is still far too granular for most uses.It apparently comes up often enough in C++ to merit 59,000 hits on "multithreaded iostreams" and many web pages outlining attempts to solve the problem. It is a problem that is solved by every C stdio for multithreaded environments, although the C standard does not mention the word "thread". Multithreading threatens to become far more common, not less, as we move to multicore machines. If that isn't compelling, ok, but I suggest at a minimum that Tango not lock into a design that *precludes* adding thread synchronization without changing user code.The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.This is a valid point, but how often is it actually used in practice? Libraries generally do not perform error output of their own, and applications typically have a coherent approach for output. In my time as a programmer, I can't think of a single instance where default synchronization to an output device actually mattered. I can certainly appreciate this for its predictable behavior, but I don't know how often that predictability would actually matter to me.
Mar 24 2007
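For reference, C stdio already exposes the per-stream lock that makes each printf atomic: POSIX `flockfile`/`funlockfile` let a caller widen that lock to a group of calls. A POSIX-only sketch, with `tmpfile()` standing in for a stream shared between threads:

```cpp
#include <cstdio>
#include <string>

// Each printf/fputs on its own already does one lock/unlock on the
// stream. flockfile/funlockfile widen that to a group of calls, so a
// multi-call record cannot be interleaved by another thread.
std::string demo_atomic_group() {
    FILE* f = tmpfile();
    if (!f) return "";
    flockfile(f);             // stream lock is recursive for this thread
    fputs("a=", f);
    fprintf(f, "%d", 1);
    fputs(";b=", f);
    fprintf(f, "%d", 2);
    funlockfile(f);
    rewind(f);
    char buf[64] = {0};
    if (!fgets(buf, sizeof buf, f)) buf[0] = '\0';
    fclose(f);
    return buf;
}
```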
Walter Bright wrote:Sean Kelly wrote:True enough. Though I wonder how much of a factor it is that C++ has no built-in support for multithreading, and if this has a positive or negative effect on the number of questions.It apparently comes up often enough in C++ to merit 59,000 hits on "multithreaded iostreams" and many web pages outlining attempts to solve the problem.The problem is such synchronization would be invented and added on by the user, making it impossible to combine disparate libraries that write to stderr, for example, in a multithreading environment.This is a valid point, but how often is it actually used in practice? Libraries generally do not perform error output of their own, and applications typically have a coherent approach for output. In my time as a programmer, I can't think of a single instance where default synchronization to an output device actually mattered. I can certainly appreciate this for its predictable behavior, but I don't know how often that predictability would actually matter to me.It is a problem that is solved by every C stdio for multithreaded environments, although the C standard does not mention the word "thread". Multithreading threatens to become far more common, not less, as we move to multicore machines. If that isn't compelling, ok, but I suggest at a minimum that Tango not lock into a design that *precludes* adding thread synchronization without changing user code.True enough. I suppose that if nothing else, the option for synchronized output to stdout, stderr, and stdlog should be somehow available without user changes, as you say. Sean
Mar 25 2007
James Dennett wrote:As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.Numbers clearly tell the above is wrong. Here's the thing: I write programs that write lines to files. If I use cout, they don't work. If I use fprintf, they do work, and 10 times faster. And that's that.It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream, or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level?The C standard library takes care of it without me having to do anything in particular.Within a process, this level of safety could be achieved with only a little (usually redundant) synchronization. Which is useful for debugging or simplistic logging, but not for anything else I've seen.I do not concur. Andrei
Mar 24 2007
Andrei Alexandrescu (See Website For Email) wrote:James Dennett wrote:Only if they apply to the above.As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.Numbers clearly tell the above is wrong.Here's the thing: I write programs that write lines to files. If I use cout, they don't work. If I use fprintf, the do work, and 10 times faster. And that's that.Except that your test wasn't of the right thing; you probably didn't test code that guaranteed atomicity of writes between different processes.I've never seen a C library that guarantees atomicity of writes between processes on a Unix-like system. The documentation of some systems does guarantee atomicity of sufficiently small writes to certain types of file descriptors, but I've not seen any Unix-like system that guarantees atomicity for writes of unlimited sizes; in some cases they can even be interrupted before the full amount is written. I've certainly seen the result of C's *printf *not* being synchronized between processes on a wide variety of systems.It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream, or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level?The C standard library takes care of it without me having to do anything in particular.With my description of my own experience? ;) -- JamesWithin a process, this level of safety could be achieved with only a little (usually redundant) synchronization. Which is useful for debugging or simplistic logging,but not for anything else I've seen.I do not concur.
Mar 24 2007
James Dennett wrote:Andrei Alexandrescu (See Website For Email) wrote:If you did, fine. I take that part of my argument back. I'll also note that that doesn't make iostreams any more defensible :o). AndreiJames Dennett wrote:Only if they apply to the above.As you appear to be saying that printf has to flush every time it's used, I'd guess that it's unusable for performance reasons alone.Numbers clearly tell the above is wrong.Here's the thing: I write programs that write lines to files. If I use cout, they don't work. If I use fprintf, the do work, and 10 times faster. And that's that.Except that your test wasn't of the right thing; you probably didn't test code that guaranteed atomicity of writes between different processes.I've never seen a C library that guarantees atomicity of writes between processes on a Unix-like system. The documentation of some systems does guarantee atomicity of sufficiently small writes to certain types of file descriptors, but I've not seen any Unix-like system that guarantees atomicity for writes of unlimited sizes; in some cases they can even be interrupted before the full amount is written. I've certainly seen the result of C's *printf *not* being synchronized between processes on a wide variety of systems.It's also really hard to implement such a guarantee on most platforms without using some kind of process-shared mutex, file lock, or similar. Does printf really incur that kind of overhead every time something is written to a stream, or does its implementation make use of platform-specific knowledge on which writes are atomic at the OS level?The C standard library takes care of it without me having to do anything in particular.
Mar 24 2007
Andrei Alexandrescu (See Website For Email) wrote: [snip]I'll also note that that doesn't make iostreams any more defensible :o).Trying to defend IOStreams is certainly a challenge. I think I've tried enough, given what a sick puppy it is, and now should leave it to suffer in peace. -- James
Mar 24 2007
Andrei Alexandrescu (See Website For Email) wrote:James Dennett wrote:stringstream s; s << a << b; cout << s.str(); ;-)Walter Bright wrote:cout << a << b; can't guarantee that a and b will be adjacent in the output. In contrast, printf(format, a, b); does give that guarantee. Moreover, that guarantee is not between separate threads in the same process, it's between whole processes! Guess which of the two is usable :o).Bill Baxter wrote:Which "wrong" assertions are those?James Dennett wrote:D bucks conventional wisdom in more than one way. There's a current debate going on among people involved in the next C++ standardization effort about whether to include garbage collection or not. The people involved are arguably the top tier of C++ programmers. But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Walter Bright wrote: It might be harsh, but not entirely unjustified, to say that the "conventional wisdom" of many communities of programmers is a long, long way from being wise. As the community behind a language grows larger, there is a natural tendency for it not to have the same density of experts; if D amasses a million users it's a safe bet that most of them won't be as sharp as the average D user is today.I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment. Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading. 
Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?I think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.Btw, does tango provide such a guarantee for code such as Cout(a)(b)? From the construct, my understanding is that it doesn't.No. There really isn't any way to do automatic locking with chained opCall barring the use of proxy objects or something equally nasty. Also, it hurts efficiency to always lock regardless of whether the user is performing IO in multiple threads. The preferred method here is: synchronized( Cout ) Cout( a )( b )( c )(); Sean
Mar 24 2007
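James's `stringstream` workaround deserves spelling out: format the whole record locally, then emit it in one piece, so within a process the fragments of one record cannot interleave (between processes it still depends on the underlying single write). A minimal sketch with an illustrative `format_record` helper:

```cpp
#include <sstream>
#include <string>

// Build the whole record locally, then hand it over as one string. The
// call site can then emit it with a single << or fwrite, so the pieces
// of one record cannot be split up by another thread's output.
std::string format_record(int a, const std::string& b) {
    std::ostringstream s;
    s << a << ' ' << b << '\n';
    return s.str();
}
```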
Sean Kelly wrote:There really isn't any way to do automatic locking with chained opCall barring the use of proxy objects or something equally nasty. Also, it hurts efficiency to always lock regardless of whether the user is performing IO in multiple threads. The preferred method here is: synchronized( Cout ) Cout( a )( b )( c )();The trouble with that design is people working on subsystems or libraries, which will be combined by others into a working whole. Since it is extra work to add the synchronized statement, odds are pretty good it won't happen. Then, the whole gets erratic multithreading performance. Ideally, things should be inverted so that thread safety is the default behavior, and the extra-efficiency-dammit-I-know-what-I'm-doing is the extra work. One way to solve this problem is to use variadic templates as outlined in http://www.digitalmars.com/d/variadic-function-templates.html Back in the early days of Windows NT, when multithreaded programming was introduced to a mass platform, C compilers typically shipped with two runtime libraries - a single threaded one "for efficiency", and a multithreaded one. Also, to do multithreaded code, one had to predefine _MT or throw a command line switch. Inevitably, this was overlooked, and endless bugs consumed endless time. I made the decision early on to only ship threadsafe libraries, and have _MT always on. I've never regretted it, I'm sure it saved me a lot of tech support time, and avoided the perception that the compiler didn't work with multithreading.
Mar 24 2007
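The variadic approach Walter links to is D's, but the same shape can be sketched in C++17 with a fold expression (names here are illustrative; a `std::string` sink stands in for stdout so the one-lock-per-call behavior is checkable):

```cpp
#include <mutex>
#include <sstream>
#include <string>

std::mutex io_mutex;
std::string sink;   // stands in for stdout so the result is observable

// One call formats everything, then takes the lock exactly once; there
// are no chained temporaries to interleave, and the lock_guard makes
// the locked region exception safe. (C++17 fold expression.)
template <typename... Args>
void writeln(const Args&... args) {
    std::ostringstream s;
    (s << ... << args);
    s << '\n';
    std::lock_guard<std::mutex> lock(io_mutex);
    sink += s.str();
}
```

Because the whole record arrives in a single call, the library can choose the locking policy once, which is exactly the inversion Walter argues for: thread safety by default, opting out as the extra work.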
Walter Bright wrote:Sean Kelly wrote:MS does the same now if I remember correctly: all of its libraries are MT by default. I agree with Walter's sentiment that Cout(a)(b) is a design mistake. Fortunately, now we have compile-time variadic functions, which will make it easy to correct the design - Cout(a, b) can be made just as good without having to chase typeinfo's at runtime. AndreiThere really isn't any way to do automatic locking with chained opCall barring the use of proxy objects or something equally nasty. Also, it hurts efficiency to always lock regardless of whether the user is performing IO in multiple threads. The preferred method here is: synchronized( Cout ) Cout( a )( b )( c )();The trouble with that design is people working on subsystems or libraries, which will be combined by others into a working whole. Since it is extra work to add the synchronized statement, odds are pretty good it won't happen. Then, the whole gets erratic multithreading performance. Ideally, things should be inverted so that thread safety is the default behavior, and the extra-efficiency-dammit-I-know-what-I'm-doing is the extra work. One way to solve this problem is to use variadic templates as outlined in http://www.digitalmars.com/d/variadic-function-templates.html Back in the early days of Windows NT, when multithreaded programming was introduced to a mass platform, C compilers typically shipped with two runtime libraries - a single threaded one "for efficiency", and a multithreaded one. Also, to do multithreaded code, one had to predefine _MT or throw a command line switch. Inevitably, this was overlooked, and endless bugs consumed endless time. I made the decision early on to only ship threadsafe libraries, and have _MT always on. I've never regretted it, I'm sure it saved me a lot of tech support time, and avoided the perception that the compiler didn't work with multithreading.
Mar 24 2007
Andrei Alexandrescu (See Website For Email) wrote:Walter Bright wrote:Yup. In fact, I just discovered that Visual Studio 2005 doesn't even provide a single-threaded build option any more. In some ways it's a relief because it's allowed me to drop two build options and remove a bunch of #if defined(_MT) clauses.Back in the early days of Windows NT, when multithreaded programming was introduced to a mass platform, C compilers typically shipped with two runtime libraries - a single threaded one "for efficiency", and a multithreaded one. Also, to do multithreaded code, one had to predefine _MT or throw a command line switch. Inevitably, this was overlooked, and endless bugs consumed endless time. I made the decision early on to only ship threadsafe libraries, and have _MT always on. I've never regretted it, I'm sure it saved me a lot of tech support time, and avoided the perception that the compiler didn't work with multithreading.MS does the same now if I remember correctly: all of its libraries are MT by default.I agree with Walter's sentiment that Cout(a)(b) is a design mistake. Fortunately, now we have compile-time variadic functions, which will make it easy to correct the design - Cout(a, b) can be made just as good without having to chase typeinfo's at runtime.Agreed. Sean
Mar 24 2007
James Dennett wrote:Walter Bright wrote:gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Which "wrong" assertions are those?Note the reliance here on global state that is neither thread nor exception safe: std::ios_base::fmtflags flags_save = std::cout.flags(); std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << std::endl; std::cout.flags(flags_save);I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment.I think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading.That's not an excuse, as 1) multithreading was common long before C++98 was written and 2) multithreading and exception safety was thought about and accounted for in much of the rest of the library design, despite threading not being official.Is there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?All I can do is point to the example above.
Mar 24 2007
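Walter's save/restore snippet can at least be made exception safe with an RAII guard, in the spirit of `boost::io::ios_flags_saver` (the `FlagsSaver` class here is an illustrative sketch, not the Boost implementation). It addresses only the exception-safety half of the complaint; the flags remain shared per-stream state:

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// RAII guard: saves the format flags on entry and restores them in the
// destructor, so the restore also happens if a << in between throws.
class FlagsSaver {
public:
    explicit FlagsSaver(std::ios_base& s) : s_(s), flags_(s.flags()) {}
    ~FlagsSaver() { s_.flags(flags_); }
private:
    std::ios_base& s_;
    std::ios_base::fmtflags flags_;
};

std::string demo_flags() {
    std::ostringstream out;
    {
        FlagsSaver guard(out);
        out << std::left << std::setw(8) << 456 << '|';
    }                                  // std::left undone here
    out << std::setw(8) << 789;        // right-aligned again
    return out.str();
}
```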
Walter Bright wrote:James Dennett wrote:I've seen only a minority of those claims made as part of the C++ committee discussions of GC. However: GC *is* often used as a crutch by programmers who cannot or do not want to take time to make a design in which ownership is clear. GC is unsuitable for *some* types of mission critical applications. These are true. It's also true that: Effective use of GC is not restricted to lazy/sloppy/ less capable programmers, and can be used by experts to produce software that is more reliable in certain ways. GC is suitable for some types of mission critical applications. GC can affect performance, either positively or negatively. GC can affect memory footprint. Working with GC can cause resource management issues because many programmers are often tempted to think less carefully about these issues when there is a garbage collector to mitigate some of the damage. Almost all discussions of the pros and cons of GC are simplistic and unbalanced.Walter Bright wrote:gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Which "wrong" assertions are those?True, exception-safety is an issue. There is not a threading issue in the code above *except* that it uses a global variable without synchronization; you explicitly coded reliance on global state, by using a global variable. 
Unfortunately that's easily done with the IOStreams interface.Note the reliance here on global state that is neither thread nor exception safe: std::ios_base::fmtflags flags_save = std::cout.flags(); std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << std::endl; std::cout.flags(flags_save);I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment.I think there is a tendency to assume that APIs and languages which have (A) been around a long time and (B) been used by millions of people will probably be close to optimal. It just makes sense that that would be the case. Unfortunately, it's all too often just not true.I just find it strange that C++, a language meant for building speedy applications, would incorporate iostreams, which is slow, not thread safe, and not exception safe.I wasn't aiming to make an excuse. I was merely noting that it's not surprising. IOStreams was old before the 1998 standard was published; this was a case of the standards committee doing what it was supposed to do, i.e., standardizing existing practice.Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading.That's not an excuse, as 1) multithreading was common long before C++98 was written and 2) multithreading and exception safety was thought about and accounted for in much of the rest of the library design, despite threading not being official.I see exception-safety issues, but no threading issue apart from *if* your code fails to synchronize access to a global variable. 
So far as I can tell, there are not thread-safety issues unless multiple threads share a stream without synchronization (which is just as much of a defect as if they shared a container without synchronization). Automatic synchronization tends to be at the wrong level, just as in the case of containers etc. Most often in robust code it's redundant to make a stream synchronize itself. Anyway, I was just hoping to find out something I didn't already know. One thing we do know is that IOStreams is not the gold standard for I/O interfaces, though it does have strengths in extensibility and type-safety compared to the alternatives in most C-like languages. -- JamesIs there something deeper in IOStreams that you consider to be thread-unsafe, or is it just the matter of its global variables?All I can do is point to the example above.
Mar 24 2007
James Dennett wrote:Walter Bright wrote:I think we're in agreement, as I said "one or two", and that such claims are not made in general by the top tier of C++ programmers.James Dennett wrote:I've seen only a minority of those claims made as part of the C++ committee discussions of GC.Walter Bright wrote:gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.But still, there are one or two that repeat the conventional (and wrong) wisdom about garbage collection. Such conventional wisdom is much more common among the general population of C++ programmers.Which "wrong" assertions are those?Almost all discussions of the pros and cons of GC are simplistic and unbalanced.It's not humanly possible to mention every pro and every con in every discussion. Nobody is making a claim of absolutes, either. For every example, sure, you can find a counter-example. That doesn't mean one cannot have a meaningful discussion about the pros and cons of adding gc, and it doesn't mean we can't dismiss certain arguments against gc, like it being a crutch for lazy programmers.That's the design of iostreams - reliance on global state with no multithreading protection. Using std::left is not a mistake on my part, it is a feature of iostreams. Also, cout << a << b; has multithreading problems as well, as if two threads are writing to stdout, the output of a and b can be interleaved with the other thread's output. Note that: writefln(a, b); is both exception safe and thread safe - there will be no interleaving of output.True, exception-safety is an issue. There is not a threading issue in the code above *except* that it uses a global variable without synchronization; you explicitly coded reliance on global state, by using a global variable. 
Unfortunately that's easily done with the IOStreams interface.I'm intrigued by your claim that IOStreams is not thread-safe; the IOStreams framework is thread-safe in the same way that the STL is thread-safe. The one minor difference is that IOStreams exposes some global variables, which is unfortunate as they can easily be used in inappropriate ways in a multi-threaded environment.Note the reliance here on global state that is neither thread nor exception safe: std::ios_base::fmtflags flags_save = std::cout.flags(); std::cout << 123 << '|' << std::left << std::setw(8) << 456 << "|" << 789 << std::endl; std::cout.flags(flags_save);Iostreams was substantially redesigned for C++98. Iostreams has undergone two major, incompatible overhauls since it originally debuted. You can see the old ones in DMC++'s <iostream.h> and <oldstr/stream.h>.I wasn't aiming to make an excuse. I was merely noting that it's not surprising. IOStreams was old before the 1998 standard was published; this was a case of the standards committee doing what it was supposed to do, i.e., standardizing existing practice.Then again, that is unsurprising as C++ does not yet officially incorporate support for multi-threading.That's not an excuse, as 1) multithreading was common long before C++98 was written and 2) multithreading and exception safety was thought about and accounted for in much of the rest of the library design, despite threading not being official.I see exception-safety issues, but no threading issue apart from *if* your code fails to synchronize access to a global variable. 
So far as I can tell, there are not thread-safety issues unless multiple threads share a stream without synchronization (which is just as much of a defect as if they shared a container without synchronization).You can use C's stdio and D's stdio (and even mix them) without exception safety problems or need for the user to supply any synchronization.Automatic synchronization tends to be at the wrong level, just as in the case of containers etc. Most often in robust code it's redundant to make a stream synchronize itself. Anyway, I was just hoping to find out something I didn't already know. One thing we do know is that IOStreams is not the gold standard for I/O interfaces, though it does have strengths in extensibility and type-safety compared to the alternatives in most C-like languages.I agree it has strengths in extensibility and type-safety. But I set that against its poor performance, exception unsafety, and threading problems, and conclude it is not a design that should be emulated.
Mar 24 2007
Walter Bright wrote:I admit I used to think similar to that, a somewhat longer while ago. What made me change my mind was that Greenspun's Tenth Rule also includes GC: I find that doing the dynamic memory management myself results not only in bigger and more fragile source code, but also may perform worse than GC unless I go about it very warily. I think it is just not efficient to put a lot of work into that with every application - it's much more efficient if somebody solves the problem *once*, and properly, and that's that. Happy hacking, Frank p.s. Thanks for your work... ;-)Which "wrong" assertions are those?gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.
Mar 25 2007
Walter Bright wrote:0ffh Wrote:Which "wrong" assertions are those?gc is a crutch for lazy/sloppy/less capable programmers, gc isn't for mission critical industrial apps, gc is for academic unusable languages, etc.I admit I used to think similar to that, a somewhat longer while ago. What made me change my mind was that Greenspun's Tenth Rule also includes GC: I find that doing the dynamic memory management myself results not only in bigger and more fragile source code, but also may perform worse than GC unless I go about it very warily. I think it is just not efficient to put a lot of work into that with every application - it's much more efficient if somebody solves the problem *once*, and properly, and that's that.I totally agree that GC is a solid way of cutting bad code, which performs far worse than the usually trivial overhead of having a GC. I do think though that it should be somewhat easier to declare something as not being under the gc's influence so that when we want to be wary and we're scratching for an extra 10% performance in a loop, we can do so more readily. ~~ At first I was astonished to see my 26kb source compiled to a whopping 82kb. I was wondering if it imported all of phobos... Now I've realized that that extra mass did all the dynamic array stuff, associative array stuff, gc and phobos. Things that would have taken me just as much in source to write...
Mar 26 2007
On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For Email) wrote:And exactly how often do people need to write this program? I would have thought that the need to exactly reproduce the input is kind of rare, because most programs read stuff to manipulate or deduce things from it, and not to replicate it.Can you distill the benefits of retaining CR on a readline, please?I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there. Essentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; } The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.By "convoluted", you mean something like this ... 
char[] line; while ( io.readln(line) == io.Success ) { ++dictionary[line]; }In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times.And yet my code nearly always looks like ... line = trim_right(readln()); because I then have to parse the data contained in the line and white space (blank, tab and new line) at the end of a line is just usually cruft. On the other hand, as I have to trim the line anyhow, I guess it doesn't matter if the routine ensures a new line or not. Another interesting twist is that some text files omit the new-line on the last line in the file.Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style. All of my C++ applications use getline() or fgets() (both of which thankfully do include the newline) and then process the line in-situ.I conclude that we tend to write different types of apps. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 10:55:43 AM
Mar 21 2007
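Andrei's point about keeping the newline has a direct C analogue: `fgets` also keeps the terminator, so the naive copy loop is byte-exact even when the final line lacks a newline. A sketch:

```cpp
#include <cstdio>
#include <string>

// fgets, like the readln discussed above, keeps the '\n' when there is
// one. Plain append therefore reproduces the input byte for byte,
// including a final line that has no newline at all.
std::string copy_stream(FILE* in) {
    std::string out;
    char buf[256];
    while (fgets(buf, sizeof buf, in))
        out += buf;   // no newline handling needed: nothing was lost
    return out;
}
```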
Derek Parnell wrote:On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For Email) wrote:Of course. It's not about reproducing the input exactly, but about having all of the information in the input available to the program.And exactly how often do people need to write this program? I would have thought that the need to exactly reproduce the input is kind of rare, because most programs read stuff to manipulate or deduce things from it, and not to replicate it.Can you distill the benefits of retaining CR on a readline, please?I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there. Essentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte. This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.I said that the API would force people to write convoluted code if it wanted to offer char[] readln(). Consequently, your code is buggy in the likely case io.readln overwrites its buffer, which is mute testimony to the validity of my point :o).Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; } The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) 
An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.By "convoluted", you mean something like this ... char[] line; while ( io.readln(line) == io.Success ) { ++dictionary[line]; }I often do that too. And I'm glad I can remove information I don't need, because clearly I couldn't add back information I've lost. It should be pointed out that my point generalizes to more than newlines. I plan to add to phobos two routines that efficiently and atomically implement the following: read_delim(FILE*, char[] buf, dchar delim); and read_delim(FILE*, char[] buf, char delim[]); For such functions, particularly the last one, it is vital that the delimiter is KEPT in the resulting buffer. AndreiIn the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times.And yet my code nearly always looks like ... line = trim_right(readln());
Mar 21 2007
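Andrei's byte-accuracy argument is easy to check against C's own line reader. A minimal sketch, assuming POSIX getline() (which, like the readln discussed above, keeps the '\n' in the buffer): writing back exactly what was read reproduces the input even when the final line has no trailing newline.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Byte-for-byte copy of 'in' to 'out' using POSIX getline(), which,
 * like the proposed readln, keeps the newline in the buffer. Since
 * nothing is chopped, no spurious newline can be appended to a final
 * line that lacked one. */
static void copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t n;
    while ((n = getline(&line, &cap, in)) != -1)
        fwrite(line, 1, (size_t)n, out);  /* write exactly what was read */
    free(line);
}
```

The loop mirrors the `while (readln(line)) { write(line); }` idiom: the write side emits only the bytes the read side produced, so the copy is exact.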
On Wed, 21 Mar 2007 17:21:40 -0700, Andrei Alexandrescu (See Website For Email) wrote:Derek Parnell wrote:Actually you said "stdio also offers a readln() that creates a new line on every call" and so does my fictitious "io.readln(line)". It cannot overwrite its buffer because it creates the buffer. io.Status readln(out char[] pBuffer) { pBuffer.length = io.FirstGuessLength; // Note: This routine expands/contracts the buffer as required. fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer); // If I get this far then the low-level I/O system didn't fail me. return io.Success; }I said that the API would force people to write convoluted code if it wanted to offer char[] readln(). Consequently, your code is buggy in the likely case io.readln overwrites its buffer, which is mute testimony to the validity of my point :o).Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; } The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.By "convoluted", you mean something like this ... char[] line; while ( io.readln(line) == io.Success ) { ++dictionary[line]; }It should be pointed out that my point generalizes to more than newlines. 
I plan to add to phobos two routines that efficiently and atomically implement the following: read_delim(FILE*, char[] buf, dchar delim); and read_delim(FILE*, char[] buf, char delim[]); For such functions, particularly the last one, it is vital that the delimiter is KEPT in the resulting buffer.And that would be because it stops at the leftmost 'delim' that is contained in "char[] delim" so the caller needs to know which one stopped the input stream? I presume that this would support Unicode characters too? -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 11:26:34 AM
Mar 21 2007
Derek Parnell wrote:Actually you said "stdio also offers a readln() that creates a new line on every call" and so does my fictitious "io.readln(line)". It cannot overwrite its buffer because it creates the buffer. io.Status readln(out char[] pBuffer) { pBuffer.length = io.FirstGuessLength; // Note: This routine expands/contracts the buffer as required. fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer); // If I get this far then the low-level I/O system didn't fail me. return io.Success; }Fine. It's just not clear what readln does from its signature. In contrast, stdio offers size_t readln(char[]) and char[] readln(), with clear semantics.It should be pointed out that my point generalizes to more than newlines. I plan to add to phobos two routines that efficiently and atomically implement the following: read_delim(FILE*, char[] buf, dchar delim); and read_delim(FILE*, char[] buf, char delim[]); For such functions, particularly the last one, it is vital that the delimiter is KEPT in the resulting buffer.And that would be because it stops at the leftmost 'delim' that is contained in "char[] delim" so the caller needs to know which one stopped the input stream? I presume that this would support Unicode characters too?It's the other way around: you read till the _last_ character of the delimiter, and you look back in the buffer. If the buffer has the delimiter as suffix, you're done. Otherwise, repeat (while appending to the buffer). This should work with Unicode streams too, although I'm not an expert in the matter. My point is that at end-of-file you may want to know whether the delimiter was correctly present, as is required in certain protocols. Andrei
Mar 21 2007
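The suffix-check loop Andrei describes can be sketched in a few lines of C. This is a hypothetical illustration, not the Phobos read_delim signature: stop to check only when the byte just read equals the delimiter's last character, then look back to see whether the whole delimiter is now a suffix of the buffer. The delimiter stays in the result, so the caller can tell a properly terminated record from a truncated one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of reading up to a multi-char delimiter, per the
 * strategy in the post above. The delimiter is KEPT in the buffer, so
 * an unterminated final record at EOF is detectable by the caller. */
static size_t read_delim_sketch(FILE *fp, char *buf, size_t cap,
                                const char *delim)
{
    size_t dlen = strlen(delim), n = 0;
    int c;
    while (n + 1 < cap && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        /* Only when we see the delimiter's LAST char is a suffix
         * comparison worth doing. */
        if ((char)c == delim[dlen - 1] && n >= dlen &&
            memcmp(buf + n - dlen, delim, dlen) == 0)
            break;              /* delimiter found as suffix: done */
    }
    buf[n] = '\0';
    return n;                   /* bytes stored, delimiter included */
}
```

A record read at EOF that does not end with the delimiter is exactly the "fragment of a broken transmission" case discussed later in the thread.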
On Wed, 21 Mar 2007 17:57:51 -0700, Andrei Alexandrescu (See Website For Email) wrote:Derek Parnell wrote:Actually you said "stdio also offers a readln() that creates a new line on every call" and so does my fictitious "io.readln(line)". It cannot overwrite its buffer because it creates the buffer. io.Status readln(out char[] pBuffer) { pBuffer.length = io.FirstGuessLength; // Note: This routine expands/contracts the buffer as required. fill_the_buffer_with_chars_until_EOL_or_EOF(pBuffer); // If I get this far then the low-level I/O system didn't fail me. return io.Success; }Fine. It's just not clear what readln does from its signature. In contrast, stdio offers size_t readln(char[]) and char[] readln(), with clear semantics.read_delim(FILE*, char[] buf, char delim[]);It's the other way around:Right ... it was the "from its signature ... with clear semantics" that had me fooled.My point is that at end-of-file you may want to know whether the delimiter was correctly present, as is required in certain protocols.Yes. A very good point indeed. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 12:07:34 PM
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote: [...]I suspect Walter was thinking on something else at the time.Can you distill the benefits of retaining CR on a readline, please?I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there.Essentially it's about information. The naive loop: while (readln(line)) { write(line); }I'm completely against that awful mess of code.is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte.Are you sure? Can you elaborate more on this?This is the kind of imprecision that makes the difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.Same here.Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; }This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.What is your definition of "convolute"? 
I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times.Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to cease using it. The fact that this narrow-minded idea comes from Perl is not surprising.Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style.I use them, and I find them very comfortable. Again your definition of 'intelligent' is particular. If you find Perl 'intelligent', this says a lot.All of my C++ applications use getline() or fgets() (both of which thankfully do include the newline) and then process the line in-situ.You obviously program only for one single platform. Being portable is way more complex than this. Ciao
Mar 22 2007
Roberto Mariottini wrote:Andrei Alexandrescu (See Website For Email) wrote: [...] >What exactly would be bad about it?I suspect Walter was thinking on something else at the time.Can you distill the benefits of retaining CR on a readline, please?I am pasting fragments from an email to Walter. He suggested this at a point, and I managed to persuade him to keep the newline in there.Essentially it's about information. The naive loop: while (readln(line)) { write(line); }I'm completely against that awful mess of code.Very simple. If the file ends with a newline, the code reproduces it. If not, the code gratuitously appends a newline.is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte.Are you sure? Can you elaborate more on this?> This is the kind of imprecision that makes theYes, wrong, very wrong. Except it's not me who's wrong :o).difference between a well-designed API and an almost-good one. Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.Same here.Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; }This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.You are objectively wrong. The code is portable. Newline translation takes care of it. Just try it.The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) 
An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.What is your definition of "convoluted"? I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.What can I say? Thanks! I'm enlightened!In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times.Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to cease using it. The fact that this narrow-minded idea comes from Perl is not surprising.To each their own :o). Oh, probably you could explain how I can read a string containing spaces, followed by ":" and a number with scanf. Takes one line in Perl and D's readfln (not yet distributed).Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style.I use them, and I find them very comfortable. Again your definition of 'intelligent' is particular. If you find Perl 'intelligent', this says a lot.Yep, I saw that :o). AndreiAll of my C++ applications use getline() or fgets() (both of which thankfully do include the newline) and then process the line in-situ.You obviously program only for one single platform. Being portable is way more complex than this.
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:Roberto Mariottini wrote:[...]It's not clearly evident for a non-expert programmer that a new-line is appended at each line. Take any programmer from any language of your choice and ask what this snippet is supposed to do. This is against immediate comprehension of code.What exactly would be bad about it?Essentially it's about information. The naive loop: while (readln(line)) { write(line); }I'm completely against that awful mess of code.A newline is two bytes here.Very simple. If the file ends with a newline, the code reproduces it. If not, the code gratuitously appends a newline.is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte.Are you sure? Can you elaborate more on this?A text file is not a binary file. A newline at end of file is completely irrelevant. On the other hand, no code should break if the last newline is there or not. The problem with your code is that the last line comes out different from the others.Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.Ehm, can you elaborate on how good it is to put a '\n' at the end of any string when working with: - databases - communication programs - interprocess communication - distributed computingYes, wrong, very wrong. Except it's not me who's wrong :o).Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; }This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.Say 'subjectively'. Assignments in boolean expressions should be avoided. 
The average programmer knows something about this magic, but fears to touch it, and never completely understands it. Still, any programmer from any language would think that this code ends at the first empty line. Here is one of the many possible non-convoluted versions: char[] line = readln(); while (line.length > 0) { ++dictionary[chomp(line)]; line = readln(); } And this is how it should be: char[] line = readln(); while (line != null) { ++dictionary[line]; line = readln(); }You are objectively wrong.The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.What is your definition of "convoluted"? I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.The code is portable. Newline translation takes care of it. Just try it.Newline translation is an old problem with C, C++ and now with D. Nothing can be resolved with newline translation. Opening a file in binary mode on Unix and treating it like a text file works only as long as the program is run on Unix. Newline translation is prone to portability errors, thus non-portable. In my experience, newline translation poses more portability problems than it solves.You'd be more enlightened if you had to work with big CGI scripts written in Perl, and eventually had to convert them to JSP to make the average (available) programmers able to work on them. Sure, with Perl you can do many things in less than 10 lines. But keep it under 10 lines, or you are in trouble.What can I say? Thanks! 
I'm enlightened!In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times.Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to cease using it. The fact that this narrow-minded idea comes from Perl is not surprising.scanf(" :%d", &i); CiaoTo each their own :o). Oh, probably you could explain how I can read a string containing spaces, followed by ":" and a number with scanf. Takes one line in Perl and D's readfln (not yet distributed).Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style.I use them, and I find them very comfortable. Again your definition of 'intelligent' is particular. If you find Perl 'intelligent', this says a lot.
Mar 23 2007
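For the record, Roberto's one-liner reads only the number and drops the string part. scanf can capture the string too with a scanset conversion, which, unlike %s, does not stop at embedded blanks. A small sketch (the input string and helper name here are made up for illustration):

```c
#include <stdio.h>

/* Parse input of the form "some string with spaces:NUMBER".
 * The scanset %[^:] consumes everything up to the ':' including
 * blanks, which %s cannot do. Returns the conversion count
 * (2 on success). 'name' must have room for at least 64 bytes. */
static int parse_labeled_number(const char *input, char *name, int *value)
{
    return sscanf(input, "%63[^:]:%d", name, value);
}
```

This gets scanf close to the Perl/readfln one-liner mentioned above, although error reporting (partial matches, overlong fields) still has to be handled by inspecting the return value.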
On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email) <SeeWebsiteForEmail erdani.org> wrote:Essentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); }I'd just like to say that the chosen naming convention seems a bit unintuitive to me out of the following reasons: 1) it seems odd that what you read with readln(), you need to write with write() and not writeln(). 2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings. 3) in my personal experience (of a number of smaller and larger console applications), it's much more often that I need to work with the contents of lines (without line endings), rather than with. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why would you use standard input/output for binary data anyway. 4) it's much easier to add a line ending than to remove it. Based on the above reasons, I would like to suggest to let readln() chop line endings, and perhaps have another function (getline?) which keeps them. -- Best regards, Vladimir mailto:thecybershadow gmail.com
Mar 22 2007
Vladimir Panteleev wrote:On Thu, 22 Mar 2007 01:40:15 +0200, Andrei Alexandrescu (See Website For Email) <SeeWebsiteForEmail erdani.org> wrote:I suppose it is a little, but I think that's more an issue with text IO in general; for instance, even *if* readln discarded the line ending, readln and writeln wouldn't be symmetric anyway! If you expect them to be, then you're in for a nasty surprise :PEssentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); }I'd just like to say that the chosen naming convention seems a bit unintuitive to me out of the following reasons: 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.Well, that's Pascal/Delphi/etc., not D.3) in my personal experience (of a number of smaller and larger console applications), it's much more often that I need to work with the contents of lines (without line endings), rather than with. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why would you use standard input/output for binary data anyway.That's a valid point; I rarely need the line endings, that said, see [1] :)4) it's much easier to add a line ending than to remove it.Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy. What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?Based on the above reasons, I would like to suggest to let readln() chop line endings, and perhaps have another function (getline?) 
which keeps them.[1] There have been a few times I've needed the line-ending, and it's a major pain when your IO library simply refuses to give it to you. It should be that the call gives you the whole line *including* line-endings, but since stripping the line of its ending is so common there should be either another function to do that, or a nice shortcut to get it done. Maybe we need readln and readlt for "read line and trim"... </2c> -- Daniel -- int getRandomNumber() { return 4; // chosen by fair dice roll. // guaranteed to be random. } http://xkcd.com/ v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Mar 22 2007
On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists gmail.com> wrote:I was actually talking about the complexity of the source, not the efficiency of the generated code. When readln gives you the line with a line ending, you have three cases: 1) a CR/LF line ending (Windows) 2) LF line ending (Unix) 3) no line ending at all (EOF) You'd need to account for each of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.4) it's much easier to add a line ending than to remove it.Actually, it's not. Removing a line ending is as simple as slicing the string. *Adding* a line ending could involve a heap allocation, at least a full copy.What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :) -- Best regards, Vladimir mailto:thecybershadow gmail.com
Mar 22 2007
Vladimir Panteleev wrote:On Thu, 22 Mar 2007 11:03:12 +0200, Daniel Keep <daniel.keep.lists gmail.com> wrote:import std.string; auto line = readln().chomp(); :)I was actually talking about the complexity of the source, not the efficiency of the generated code. When readln gives you the line with a line ending, you have three cases: 1) a CR/LF line ending (Windows) 2) LF line ending (Unix) 3) no line ending at all (EOF) You'd need to account for every of these when removing the line endings - and write this code every time you're writing an app which just needs the contents of lines from standard input - which, as you have agreed, is quite common.What's more, how can you be sure there was a line-ending there at all? What if it's the last line and it didn't have a line ending before EOF?IMHO, most tools which work with standard input don't really need to know if the last line has a line break at the end :)
Mar 22 2007
Vladimir Panteleev wrote:When readln gives you the line with a line ending, you have three cases: 1) a CR/LF line ending (Windows) 2) LF line ending (Unix) 3) no line ending at all (EOF)Actually there is even a fourth: 4) CR line ending (Mac) But that's just for files coming from the old Mac OS (9), normally Mac OS X uses Unix linefeeds for line endings... --anders
Mar 22 2007
Anders F Björklund wrote:Vladimir Panteleev wrote:When readln gives you the line with a line ending, you have three cases: 1) a CR/LF line ending (Windows) 2) LF line ending (Unix) 3) no line ending at all (EOF)Actually there is even a fourth: 4) CR line ending (Mac) But that's just for files coming from the old Mac OS (9), normally Mac OS X uses Unix linefeeds for line endings...I have some of these also. Legacy applications are not the most, but they work, and for me that's it. Ciao
Mar 22 2007
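All four endings listed above can be trimmed by one small helper. A sketch in C (the name chomp mirrors the Perl/Phobos routine mentioned in this thread; the implementation is illustrative, not the library's):

```c
#include <string.h>

/* Strip one trailing line ending in place, covering the cases above:
 * CRLF (Windows), LF (Unix), bare CR (classic Mac OS), or no ending
 * at all (last line before EOF). Returns the trimmed length. */
static size_t chomp(char *line)
{
    size_t n = strlen(line);
    if (n && line[n - 1] == '\n')
        n--;                    /* LF, or the LF half of CRLF */
    if (n && line[n - 1] == '\r')
        n--;                    /* bare CR, or the CR half of CRLF */
    line[n] = '\0';
    return n;
}
```

Because the two checks run in order, CRLF loses both bytes while a line with no ending is returned unchanged, which is why keeping the ending in readln and trimming on demand costs only a slice.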
Vladimir Panteleev wrote:On Thu, 22 Mar 2007 01:40:15 +0200, ... wrote:"Read a line. Write what you've read. Rinse. Lather. Repeat."Essentially it's about information. The naive loop: while (readln(line)) { write(line); } is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); }I'd just like to say that the chosen naming convention seems a bit unintuitive to me out of the following reasons: 1) it seems odd that what you read with readln(), you need to write with write() and not writeln().2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.That's a mistake, simple as that. Pascal has made many other similar mistakes, see http://www.lysator.liu.se/c/bwk-on-pascal.html.3) in my personal experience (of a number of smaller and larger console applications), it's much more often that I need to work with the contents of lines (without line endings), rather than with. If you need to copy data while preserving line endings, I would recommend using binary buffers for files - and I've no idea why would you use standard input/output for binary data anyway.I understand that. But again, getting rid of information when you have it is a much better proposition than regaining information you've irremediably lost. Think that a file is produced by a utility or transmission that sends messages separated by a single-char or multi-char separator. If your reading primitive omits the separator, you don't know whether the last line is a fragment of a broken transmission or a valid line. "Just call chomp."4) it's much easier to add a line ending than to remove it.It's already been said: it's cheaper to remove it in all circumstances.Based on the above reasons, I would like to suggest to let readln() chop line endings, and perhaps have another function (getline?) 
which keeps them.For more balkanization, cognitive load, and confusion? "Just call chomp." Andrei
Mar 22 2007
On Thu, 22 Mar 2007 18:14:14 +0200, Andrei Alexandrescu (See Website For Email) <SeeWebsiteForEmail erdani.org> wrote:Vladimir Panteleev wrote:<offtopic> That article has been written a quarter of a century ago, and doesn't really represent the state of the latest Pascal versions/implementations out there (the most prominent being Borland Delphi and FreePascal). That said, switching from Pascal to D is still quite a great experience for me nevertheless. </offtopic>2) Pascal/Delphi/etc. have the ReadLn and WriteLn functions, but Pascal's ReadLn doesn't preserve line endings.That's a mistake, simple as that. Pascal has made many other similar mistakes, see http://www.lysator.liu.se/c/bwk-on-pascal.html."Just call chomp."Ah, yes, missed that one. <nitpick> But even so, you'd have to check for line endings twice - when reading the stdin stream, and when calling chomp ;) </nitpick> -- Best regards, Vladimir mailto:thecybershadow gmail.com
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote: [...]"Just call chomp."Just add a call to chomp to your benchmarks. Ciao
Mar 23 2007
kris wrote:c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)Here's the new std.stdio work in progress (doesn't yet include write()). Feel free to leverage it as you see fit for Tango. Some features of note: 1) It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio, and gcc's stdio. 2) It throws on I/O errors. 3) Unlike C's stdio, it can handle streams of either wide or regular chars. 4) It does not go as far as directly using Posix read/write functions or Windows API functions. We wished to avoid that in the interests of interoperability with C's stdio. 5) It is fully interoperable with, and is synced with, C's stdio. 6) Note how nicely scope(exit) makes the code more readable! 
----------------------------------------
// Written in the D programming language.

/* Written by Walter Bright and Andrei Alexandrescu
 * www.digitalmars.com
 * Placed in the Public Domain.
 */

/********************************
 * Standard I/O functions that extend $(B std.c.stdio).
 * $(B std.c.stdio) is automatically imported when importing
 * $(B std.stdio).
 * Macros:
 *      WIKI=Phobos/StdStdio
 */

module std.stdio;

public import std.c.stdio;

import std.format;
import std.utf;
import std.string;
import std.gc;
import std.c.stdlib;
import std.c.string;
import std.c.stddef;

version (DigitalMars)
{
    version (Windows)
    {
        // Specific to the way Digital Mars C does stdio
        version = DIGITAL_MARS_STDIO;
    }
}

version (DIGITAL_MARS_STDIO)
{
}
else
{
    // Specific to the way Gnu C does stdio
    version = GCC_IO;
    import std.c.linux.linux;
}

version (DIGITAL_MARS_STDIO)
{
    extern (C)
    {
        /* **
         * Digital Mars under-the-hood C I/O functions
         */
        int _fputc_nlock(int, FILE*);
        int _fputwc_nlock(int, FILE*);
        int _fgetc_nlock(FILE*);
        int _fgetwc_nlock(FILE*);
        int __fp_lock(FILE*);
        void __fp_unlock(FILE*);
    }
    alias _fputc_nlock FPUTC;
    alias _fputwc_nlock FPUTWC;
    alias _fgetc_nlock FGETC;
    alias _fgetwc_nlock FGETWC;
    alias __fp_lock FLOCK;
    alias __fp_unlock FUNLOCK;
}
else version (GCC_IO)
{
    /* **
     * Gnu under-the-hood C I/O functions; see
     * http://www.gnu.org/software/libc/manual/html_node/I_002fO-on-Streams.html#I_002fO-on-Streams
     */
    extern (C)
    {
        int fputc_unlocked(int, FILE*);
        int fputwc_unlocked(wchar_t, FILE*);
        int fgetc_unlocked(FILE*);
        int fgetwc_unlocked(FILE*);
        void flockfile(FILE*);
        void funlockfile(FILE*);
        ssize_t getline(char**, size_t*, FILE*);
        ssize_t getdelim (char**, size_t*, int, FILE*);
    }
    alias fputc_unlocked FPUTC;
    alias fputwc_unlocked FPUTWC;
    alias fgetc_unlocked FGETC;
    alias fgetwc_unlocked FGETWC;
    alias flockfile FLOCK;
    alias funlockfile FUNLOCK;
}
else
{
    static assert(0, "unsupported C I/O system");
}

/*********************
 * Thrown if I/O errors happen.
 */
class StdioException : Exception
{
    uint errno;                 // operating system error code

    this(char[] msg)
    {
        super(msg);
    }

    this(uint errno)
    {
        char* s = strerror(errno);
        super(std.string.toString(s).dup);
    }

    static void opCall(char[] msg)
    {
        throw new StdioException(msg);
    }

    static void opCall()
    {
        throw new StdioException(getErrno());
    }
}

private void writefx(FILE* fp, TypeInfo[] arguments, void* argptr, int newline=false)
{
    int orientation;
    orientation = fwide(fp, 0);

    /* Do the file stream locking at the outermost level
     * rather than character by character.
     */
    FLOCK(fp);
    scope(exit) FUNLOCK(fp);

    if (orientation <= 0)       // byte orientation or no orientation
    {
        void putc(dchar c)
        {
            if (c <= 0x7F)
            {
                FPUTC(c, fp);
            }
            else
            {
                char[4] buf;
                char[] b;

                b = std.utf.toUTF8(buf, c);
                for (size_t i = 0; i < b.length; i++)
                    FPUTC(b[i], fp);
            }
        }

        std.format.doFormat(&putc, arguments, argptr);
        if (newline)
            FPUTC('\n', fp);
    }
    else if (orientation > 0)   // wide orientation
    {
        version (Windows)
        {
            void putcw(dchar c)
            {
                assert(isValidDchar(c));
                if (c <= 0xFFFF)
                {
                    FPUTWC(c, fp);
                }
                else
                {
                    wchar[2] buf;

                    buf[0] = cast(wchar) ((((c - 0x10000) >> 10) & 0x3FF) + 0xD800);
                    buf[1] = cast(wchar) (((c - 0x10000) & 0x3FF) + 0xDC00);
                    FPUTWC(buf[0], fp);
                    FPUTWC(buf[1], fp);
                }
            }
        }
        else version (linux)
        {
            void putcw(dchar c)
            {
                FPUTWC(c, fp);
            }
        }
        else
        {
            static assert(0);
        }

        std.format.doFormat(&putcw, arguments, argptr);
        if (newline)
            FPUTWC('\n', fp);
    }
}

/***********************************
 * Arguments are formatted per the
 * $(LINK2 std_format.html#format-string, format strings)
 * and written to $(B stdout).
 */
void writef(...)
{
    writefx(stdout, _arguments, _argptr, 0);
}

/***********************************
 * Same as $(B writef), but a newline is appended
 * to the output.
 */
void writefln(...)
{
    writefx(stdout, _arguments, _argptr, 1);
}

/***********************************
 * Same as $(B writef), but output is sent to the
 * stream fp instead of $(B stdout).
 */
void fwritef(FILE* fp, ...)
{ writefx(fp, _arguments, _argptr, 0); } /*********************************** * Same as $(B writefln), but output is sent to the * stream fp instead of $(B stdout). */ void fwritefln(FILE* fp, ...) { writefx(fp, _arguments, _argptr, 1); } /********************************** * Read line from stream fp. * Returns: * null for end of file, * char[] for line read from fp, including terminating '\n' * Params: * fp = input stream * Throws: * $(B StdioException) on error * Example: * Reads $(B stdin) and writes it to $(B stdout). --- import std.stdio; int main() { char[] buf; while ((buf = readln()) != null) writef("%s", buf); return 0; } --- */ char[] readln(FILE* fp = stdin) { char[] buf; readln(fp, buf); return buf; } /********************************** * Read line from stream fp and write it to buf[], * including terminating '\n'. * * This is often faster than readln(FILE*) because the buffer * is reused each call. Note that reusing the buffer means that * the previous contents of it need to be copied if needed. * Params: * fp = input stream * buf = buffer used to store the resulting line data. buf * is resized as necessary. * Returns: * 0 for end of file, otherwise * number of characters read * Throws: * $(B StdioException) on error * Example: * Reads $(B stdin) and writes it to $(B stdout). --- import std.stdio; int main() { char[] buf; while (readln(stdin, buf)) writef("%s", buf); return 0; } --- */ size_t readln(FILE* fp, inout char[] buf) { version (DIGITAL_MARS_STDIO) { FLOCK(fp); scope(exit) FUNLOCK(fp); if (__fhnd_info[fp._file] & FHND_WCHAR) { /* Stream is in wide characters. * Read them and convert to chars. 
*/ static assert(wchar_t.sizeof == 2); buf.length = 0; int c2; for (int c; (c = FGETWC(fp)) != -1; ) { if ((c & ~0x7F) == 0) { buf ~= c; if (c == '\n') break; } else { if (c >= 0xD800 && c <= 0xDBFF) { if ((c2 == FGETWC(fp)) != -1 || c2 < 0xDC00 && c2 > 0xDFFF) { StdioException("unpaired UTF-16 surrogate"); } c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00); } std.utf.encode(buf, c); } } if (ferror(fp)) StdioException(); return buf.length; } auto sz = std.gc.capacity(buf.ptr); //auto sz = buf.length; buf = buf.ptr[0 .. sz]; if (fp._flag & _IONBF) { /* Use this for unbuffered I/O, when running * across buffer boundaries, or for any but the common * cases. */ L1: char *p; if (sz) { p = buf.ptr; } else { sz = 64; p = cast(char*) std.gc.malloc(sz); std.gc.hasNoPointers(p); buf = p[0 .. sz]; } size_t i = 0; for (int c; (c = FGETC(fp)) != -1; ) { if ((p[i] = c) != '\n') { i++; if (i < sz) continue; buf = p[0 .. i] ~ readln(fp); return buf.length; } else { buf = p[0 .. i + 1]; return i + 1; } } if (ferror(fp)) StdioException(); buf = p[0 .. i]; return i; } else { int u = fp._cnt; char* p = fp._ptr; int i; if (fp._flag & _IOTRAN) { /* Translated mode ignores \r and treats ^Z as end-of-file */ char c; while (1) { if (i == u) // if end of buffer goto L1; // give up c = p[i]; i++; if (c != '\r') { if (c == '\n') break; if (c != 0x1A) continue; goto L1; } else { if (i != u && p[i] == '\n') break; goto L1; } } if (i > sz) { buf = cast(char[])std.gc.malloc(i); std.gc.hasNoPointers(buf.ptr); } if (i - 1) memcpy(buf.ptr, p, i - 1); buf[i - 1] = '\n'; if (c == '\r') i++; } else { while (1) { if (i == u) // if end of buffer goto L1; // give up auto c = p[i]; i++; if (c == '\n') break; } if (i > sz) { buf = cast(char[])std.gc.malloc(i); std.gc.hasNoPointers(buf.ptr); } memcpy(buf.ptr, p, i); } fp._cnt -= i; fp._ptr += i; buf = buf[0 .. i]; return i; } } else version (GCC_IO) { if (fwide(fp, 0) > 0) { /* Stream is in wide characters. * Read them and convert to chars. 
*/ FLOCK(fp); scope(exit) FUNLOCK(fp); version (Windows) { buf.length = 0; int c2; for (int c; (c = FGETWC(fp)) != -1; ) { if ((c & ~0x7F) == 0) { buf ~= c; if (c == '\n') break; } else { if (c >= 0xD800 && c <= 0xDBFF) { if ((c2 == FGETWC(fp)) != -1 || c2 < 0xDC00 && c2 > 0xDFFF) { StdioException("unpaired UTF-16 surrogate"); } c = ((c - 0xD7C0) << 10) + (c2 - 0xDC00); } std.utf.encode(buf, c); } } if (ferror(fp)) StdioException(); return buf.length; } else version (linux) { buf.length = 0; for (int c; (c = FGETWC(fp)) != -1; ) { if ((c & ~0x7F) == 0) buf ~= c; else std.utf.encode(buf, cast(dchar)c); if (c == '\n') break; } if (ferror(fp)) StdioException(); return buf.length; } else { static assert(0); } } char *lineptr = null; size_t n = 0; auto s = getdelim(&lineptr, &n, '\n', fp); if (s < 0) { if (ferror(fp)) StdioException(); buf.length = 0; // end of file return 0; } scope(exit) free(lineptr); buf = buf.ptr[0 .. std.gc.capacity(buf.ptr)]; if (s <= buf.length) { buf.length = s; buf[] = lineptr[0 .. s]; } else { buf = lineptr[0 .. s].dup; } return s; } else { static assert(0); } } /** ditto */ size_t readln(inout char[] buf) { return readln(stdin, buf); }
Mar 21 2007
Walter Bright wrote:kris wrote:[snip]c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)Here's the new std.stdio work in progress (doesn't yet include write()). Feel free to leverage it as you see fit for Tango. Some features of note: 1) It peeks under the hood of C's stdio implementation, meaning it's customized for Digital Mars' stdio, and gcc's stdio. 2) It throws on I/O errors. 3) Unlike C's stdio, it can handle streams of either wide or regular chars. 4) It does not go as far as directly using Posix read/write functions or Windows API functions. We wished to avoid that in the interests of interoperability with C's stdio. 5) It is fully interoperable with, and is synced with, C's stdio. 6) Note how nicely scope(exit) makes the code more readable!private void writefx(FILE* fp, TypeInfo[] arguments, void* argptr, int newline=false)[snip] Oh, I meant to say that a while ago: some experiments I've done show that doing formatting with templates and direct calls is significantly faster than the way writefx does it (with a delegate). Probably that should be changed. writefln is still very slow. Changing the loop to: void main() { char[] line; while (readln(line)) { writef("%s", line); } } yields: 17.8s dcat Fortunately, up-and-coming template features will allow the library to detect statically known format strings and parse them to render the most efficient writing method. And reading, too. I already have a prototype readfln function that statically figures out the correctness of its format string. Andrei
Mar 21 2007
Walter Bright wrote:/********************************** * Read line from stream fp and write it to buf[], * including terminating '\n'.Nooo! Please get rid of such an awful Perl-ish hack! Ciao
Mar 22 2007
Roberto Mariottini wrote:Walter Bright wrote:Please justify your statements instead of using emotion, rhetoric, and implied assumptions. Andrei/********************************** * Read line from stream fp and write it to buf[], * including terminating '\n'.Nooo! Please get rid of such a awful Perl-ish hack!
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote: [...]Please justify your statements instead of using emotion, rhetoric, and implied assumptions.See my previous post. Ciao
Mar 22 2007
On Wed, 21 Mar 2007 14:36:10 -0700, Andrei Alexandrescu (See Website For Email) wrote:I'll mention here that it's quite disappointing that Tango's idiomatic method of reading a line from the console (Cin.nextLine(line) unless I missed something) chose to chop the newline automatically. The Perl book spends half a page or so explaining why it's _good_ that the newline is included in the line, and I've been thankful for that on numerous occasions when writing Perl.LOL ... That is odd because in nearly every program I ever write that reads text lines, the first thing I need to do after I read in the line is to strip off the bloody newline character.Please put the newline back in the line.... but leave in the option of reading it without a newline attached. -- Derek (skype: derek.j.parnell) Melbourne, Australia "Justice for David Hicks!" 22/03/2007 10:12:54 AM
Mar 21 2007
Andrei Alexandrescu (See Website For Email) wrote:I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've run the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs): 13.9s Tango 6.6s Perl 5.0s std.stdioFor what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.
Mar 22 2007
Sean Kelly wrote:For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.Alternatively, http://www.digitalmars.com/techtips/timing_code.html
Mar 22 2007
On Thu, 22 Mar 2007 19:45:16 +0200, Walter Bright <newshound digitalmars.com> wrote:Sean Kelly wrote:For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.Alternatively, http://www.digitalmars.com/techtips/timing_code.htmlBTW, the following line (printed in bold in 'timing_code.html'): auto Timer t = new Timer(); uses 'auto' instead of 'scope'. There is also another identical line at the bottom of the page. I think I should also mention that DMD v1.007 uses 'auto' in the error message 'Error: variable XXX reference to auto class must be auto' (happens when a scope class object is declared without the 'scope' keyword). Is this minor glitch corrected in DMD v1.009?
Mar 22 2007
Thanks for the tip, that needs to be fixed.
Mar 22 2007
Sean Kelly wrote: <snip>For what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.Looks useful, my own tool just measures 'real' time. But it breaks when using redirection, either way: redirect stdin: --- c:\prog\test\linetest>ptime cat <test.txt cat: -: Bad file descriptor real 0m0.000s user 0m0.010s sys 0m0.000s --- redirect stdout: --- c:\prog\test\linetest>ptime cat test.txt >NUL --- The last one outputs nothing. Printing to stderr would fix that.
Mar 23 2007
torhu wrote:Sean Kelly wrote: <snip>Hm, I suspect IO redirection must be a feature of the shell. It's a bit of a hack, but this may work "ptime cmd /c cat < test.txt." I'll see how complicated a real fix would be. SeanFor what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.Looks useful, my own tool just measures 'real' time. But it breaks when using redirection, either way: redirect stdin: --- c:\prog\test\linetest>ptime cat <test.txt cat: -: Bad file descriptor real 0m0.000s user 0m0.010s sys 0m0.000s --- redirect stdout: --- c:\prog\test\linetest>ptime cat test.txt >NUL --- The last one outputs nothing. Printing to stderr would fix that.
Mar 23 2007
Sean Kelly wrote:Hm, I suspect IO redirection must be a feature of the shell. It's a bit of a hack, but this may work "ptime cmd /c cat < test.txt." I'll see how complicated a real fix would be.I get the same error. My own tool doesn't have such problems, but it only uses the standard C system() function. Which might be too limited for what your tool does.
Mar 23 2007
torhu wrote:Sean Kelly wrote:Yeah, mine uses CreateProcess and then GetProcessTimes. I'll give the docs a look later and see if I can figure out why it's not working.Hm, I suspect IO redirection must be a feature of the shell. It's a bit of a hack, but this may work "ptime cmd /c cat < test.txt." I'll see how complicated a real fix would be.I get the same error. My own tool doesn't have such problems, but it only uses the standard C system() function. Which might be too limited for what your tool does.
Mar 23 2007
Sean Kelly wrote:Andrei Alexandrescu (See Website For Email) wrote:I was looking for something like this just the other day. Link seems to be dead these days. Is there a new URL for it? --bbI passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've ran the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs): 13.9s Tango 6.6s Perl 5.0s std.stdioFor what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.
Apr 20 2008
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s articleSean Kelly wrote:I switched web hosts and have yet to re-upload all my old content. I'll see about getting this zipfile up in the next few days. SeanAndrei Alexandrescu (See Website For Email) wrote:I was looking for something like this just the other day. Link seems to be dead these days. Is there a new URL for it?I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've ran the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs): 13.9s Tango 6.6s Perl 5.0s std.stdioFor what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.
Apr 21 2008
Sean Kelly wrote:== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s articleOkay, I've uploaded it here: http://invisibleduck.org/sean/tmp/ptime.zip SeanSean Kelly wrote:I switched web hosts and have yet to re-upload all my old content. I'll see about getting this zipfile up in the next few days.Andrei Alexandrescu (See Website For Email) wrote:I was looking for something like this just the other day. Link seems to be dead these days. Is there a new URL for it?I passed a 31 MB text file (containing a dictionary that I'm using in my research) through each of the programs above. The output was set to /dev/null. I've ran the same program multiple times before the actual test, so everything is cached and the process becomes computationally-bound. Here are the results summed for 10 consecutive runs (averaged over 5 epochs): 13.9s Tango 6.6s Perl 5.0s std.stdioFor what it's worth, I created a Win32 version of the Unix 'time' command recently. Not too complicated, but if anyone is interested, I have it here: http://www.invisibleduck.org/~sean/tmp/ptime.zip It's a quick and dirty implementation, but works for how I typically use it.
Apr 21 2008
Andrei Alexandrescu (See Website For Email) wrote:I've ran a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.I have uploaded a snapshot with prebuilt libraries to http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line dmd -O -release -inline io.d libtango.a io.d ------- import tango.io.Console; void main() { char[] line; // true means that newlines are retained while (Cin.nextLine(line, true)) Cout(line); } -------- For the sake of reference, I created a file with 1.8 million (equal) lines, at a total of 133 Megabytes. I ran it through the above program, and your Perl program. System is a PentiumM-1.86GHz, 1.5GB RAM, Kubuntu 7.04, DMD 1.009. Average times perl program : 1.65 seconds (real), 1.45 seconds (user) Average times tango program: 1.08 seconds (real), 0.91 seconds (user) Note that I also tried without the optimization flags to DMD, which resulted in times that were about 10% faster than Perl. -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Mar 22 2007
Lars Ivar Igesund wrote:Andrei Alexandrescu (See Website For Email) wrote:5.0s tcat Neat! Now that we got the performance problem out of the way, let's discuss stdio compatibility. I suggest you use getline on GNU platforms. AndreiI've ran a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.I have uploaded a snapshot with prebuilt libraries to http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line dmd -O -release -inline io.d libtango.a io.d ------- import tango.io.Console; void main() { char[] line; // true means that newlines are retained while (Cin.nextLine(line, true)) Cout(line); } --------
Mar 22 2007
Andrei Alexandrescu (See Website For Email) wrote:Lars Ivar Igesund wrote:Maybe discuss first why stdio compatibility is needed? Is the equivalent functionality missing in Tango, and if so, would implementing it in Tango remove this need for compatibility? Then consider the hypothetical situation where all of libc functionality (including posix functionality currently used in Tango, system calls, etc) is exchanged with an equivalent libd. Somewhat depending on answer above, would same reasoning apply? -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the TangoAndrei Alexandrescu (See Website For Email) wrote:5.0s tcat Neat! Now that we got the performance problem out of the way, let's discuss stdio compatibility. I suggest you use getline on GNU platforms. AndreiI've ran a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.I have uploaded a snapshot with prebuilt libraries to http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line dmd -O -release -inline io.d libtango.a io.d ------- import tango.io.Console; void main() { char[] line; // true means that newlines are retained while (Cin.nextLine(line, true)) Cout(line); } --------
Mar 23 2007
Lars Ivar Igesund wrote:As long as the global "stdin" symbol is a FILE*, this would be highly recommendable. And given that phobos does offer stdin as a FILE*, stdio compatibility is important for programs that want to use phobos and tango simultaneously (e.g., a library using phobos linked with another one using tango). AndreiNeat! Now that we got the performance problem out of the way, let's discuss stdio compatibility. I suggest you use getline on GNU platforms. AndreiMaybe discuss first why stdio compatibility is needed? Is the equivalent functionality missing in Tango, and if so, would implementing it in Tango remove this need for compatibility?
Mar 23 2007
Andrei Alexandrescu (See Website For Email) wrote:Lars Ivar Igesund wrote:May I then suggest that you create a enhancement/wishlist ticket for this? Thanks :) -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the TangoAs long as the global "stdin" symbol is a FILE*, this would be highly recommendable. And given that phobos does offer stdin as a FILE*, stdio compatibility is important for programs that want to use phobos and tango simultaneously (e.g., a library using phobos linked with another one using tango).Neat! Now that we got the performance problem out of the way, let's discuss stdio compatibility. I suggest you use getline on GNU platforms. AndreiMaybe discuss first why stdio compatibility is needed? Is the equivalent functionality missing in Tango, and if so, would implementing it in Tango remove this need for compatibility?
Mar 23 2007
Great job! I didn't know I/O performance could vary over such a wide range. And thanks for the great work from the Tango team. Heh, is D's I/O now as fast as C's, or is Tango even faster than C's I/O?Andrei Alexandrescu (See Website For Email) wrote:I've ran a couple of simple tests comparing Perl, D's stdlib (the coming release), and Tango.I have uploaded a snapshot with prebuilt libraries to http://larsivi.net/files/tango-SNAPSHOT-20070322.tar.gz The prebuilt libraries are in the lib/ folder. Install libphobos.a as usual, and add libtango.a to your compile command. The test program (io.d below) should be compiled using the line dmd -O -release -inline io.d libtango.a io.d ------- import tango.io.Console; void main() { char[] line; // true means that newlines are retained while (Cin.nextLine(line, true)) Cout(line); } -------- For the sake of reference, I created a file with 1.8 million (equal) lines, at a total of 133 Megabytes. I ran it through the above program, and your Perl program. System is a PentiumM-1.86GHz, 1.5GB RAM, Kubuntu 7.04, DMD 1.009. Average times perl program : 1.65 seconds (real), 1.45 seconds (user) Average times tango program: 1.08 seconds (real), 0.91 seconds (user) Note that I also tried without the optimization flags to DMD, which resulted in times that were about 10% faster than Perl.
Mar 22 2007
Davidl wrote:great job! i didn't know I/O performance could variate in such a great range. and thanks for the great job from tango team. heh, now d's I/O is as fast as c ? or tango is even faster than C's I/O?Tango is faster, at least for this particular test. Sean
Mar 22 2007
Walter Bright Wrote:Andrei Alexandrescu (See Website For Email) wrote:That's exactly what it does... Quite a few times I've had to 'optimize' C++ iostream code using sync_with_stdio().Walter Bright wrote:I think it means bringing the iostream I/O buffer in to sync with the stdio I/O buffer, i.e. you can mix printf and iostream output and it will appear in the same order the calls happen in the code.Turning off sync is cheating - D's readln does syncing.I don't know exactly what sync'ing does in C++, but probably it isn't the locking that you are thinking of.D's readln is inherently synced in this manner.Which of course begs the question -- Could an overload be added so it doesn't sync (not the default)? Might be worth a test, and if the difference is significant keep it.
Mar 22 2007
Dave wrote:Walter Bright Wrote:Since the data has to be buffered anyway, might as well use stdio's buffer. I don't know why iostream felt the need to reimplement the buffers - certainly it isn't for performance <g>.Andrei Alexandrescu (See Website For Email) wrote:That's exactly what it does... Quite a few times I've had to 'optimize' C++ iostream code using sync_with_stdio().Walter Bright wrote:I think it means bringing the iostream I/O buffer in to sync with the stdio I/O buffer, i.e. you can mix printf and iostream output and it will appear in the same order the calls happen in the code.Turning off sync is cheating - D's readln does syncing.I don't know exactly what sync'ing does in C++, but probably it isn't the locking that you are thinking of.D's readln is inherently synced in this manner.Which of course begs the question -- Could an overload be added so it doesn't sync (not the default)? Might be worth a test, and if the difference is significant keep it.
Mar 22 2007
Hi, I have got no reply to my questions. Can somebody answer them? Ciao -------- Original Message -------- Subject: Re: stdio performance in tango, stdlib, and perl Date: Fri, 23 Mar 2007 10:08:24 +0100 From: Roberto Mariottini <rmariottini mail.com> Organization: Digital Mars Newsgroups: digitalmars.D References: <4601A54A.8050307 erdani.org> <etsbup$2c5t$1 digitalmars.com> <4601B819.6080001 erdani.org> <etse2m$2fa2$1 digitalmars.com> <4601C25F.9050107 erdani.org> <ettem8$qgl$1 digitalmars.com> <4602C66E.4020100 erdani.org> Andrei Alexandrescu (See Website For Email) wrote:Roberto Mariottini wrote:[...]It's not clearly evident for a non-expert programmer that a new-line is appended at each line. Take any programmer from any language of your choice and ask what this snippets is supposed to do. This is against immediate comprehension of code.What exactly would be bad about it?Essentially it's about information. The naive loop: while (readln(line)) { write(line); }I'm completely against that awful mess of code.A newline is two bytes here.Very simple. If the file ends with a newline, the code reproduces it. If not, the code gratuitously appends a newline.is guaranteed 100% to produce an accurate copy of its input. The version that chops lines looks like: while (readln(line)) { writeln(line); } This may or may not add a newline to the output, possibly creating a file larger by one byte.Are you sure? Can you elaborate more on this?A text file is not a binary file. A newline at end of file is completely irrelevant. On the other side, no code should break if the last newline is there or not. 
The problem with your code is that the last line comes different from the others.Moreover, with the automated chopping it is basically impossible to write a program that exactly reproduces its input because readln essentially loses information.Ehm, can you elaborate how good is to put a '\n' at the end of any string when working with: - databases - communication programs - interprocess communication - distributed computingYes, wrong, very wrong. Except it's not me who's wrong :o).Also, stdio also offers a readln() that creates a new line on every call. That is useful if you want fresh lines every read: char[] line; while ((line = readln()).length > 0) { ++dictionary[line]; }This way you'll get two different dictionaries on Windows and on Unix. Wrong, very wrong.Say 'subjectively'. Assignments in boolean expressions should be avoided. The average programmer knows something about this magic, but fears to touch it, and never completely understand it. Still, any programmer from any language would think that this code ends at the first empty line. Here is one of the many possible non-convoluted versions: char[] line = readln(); while (line.length > 0) { ++dictionary[chomp(line)]; line = readln(); } And this is how it should be: char[] line = readln(); while (line != null) { ++dictionary[line]; line = readln(); }You are objectively wrong.The code _just works_ because an empty line means _precisely_ and without the shadow of a doubt that the file has ended. (An I/O error throws an exception, and does NOT return an empty line; that is another important point.) An API that uses automated chopping should not offer such a function because an empty line may mean that an empty line was read, or that it's eof time. So the API would force people to write convoluted code.What is your definition of "convolute"? I find your code 'convolute', 'unclear', 'buggy' and 'unportable'.The code is portable. Newline translation takes care of it. 
Just try it.Newline translation is an old problem with C, C++ and now with D. Nothing can be resolved with newline translation. Opening a file in binary mode on Unix and treating it like a text file works only as long as the program is run on Unix. Newline translation is prone to portability errors, thus non-portable. In my experience, newline translations pose more portability problems than they solve.You'll be more enlightened if you had to work with big CGI scripts written in Perl, and eventually had to convert them to JSP to make the average (available) programmers able to work on them. Sure, with Perl you can do many things in less than 10 lines. But keep it less than 10 lines, or you are in trouble.What can I say? Thanks! I'm enlightened!In the couple of years I've used Perl I've thanked the Perl folks for their readline decision numerous times.Perl is something the world should get rid of, quickly. Perl is wrong, Perl is evil, Perl is useless. You don't need Perl, try to cease using it. The fact that this narrow-minded idea comes from Perl is not surprising.scanf(" :%d", &i); CiaoTo each their own :o). Oh, probably you could explain how I can read a string containing spaces, followed by ":" and a number with scanf. Takes one line in Perl and D's readfln (not yet distributed).Ever tried to do cin or fscanf? You can't do any intelligent input with them because they skip whitespace and newlines like it's out of style.I use them, and I find them very comfortable. Again your definition of 'intelligent' is particular. If you find Perl 'intelligent', this says a lot.
Mar 27 2007
Roberto Mariottini wrote:Hi, I have got no reply to my questions. Can somebody answer them?Your "questions" hardly seem sincere. Were you not simply posturing for your position? Or do you want to see endless debate on chomp() vs. no chomp()? Dave<snip>
Mar 27 2007
On Tue, 27 Mar 2007 16:27:57 +0200, Roberto Mariottini wrote:

> Hi, I have got no reply to my questions. Can somebody answer them?
>
> Ciao
>
> -------- Original Message --------
> Subject: Re: stdio performance in tango, stdlib, and perl
>
> Andrei Alexandrescu (See Website For Email) wrote:
> > Roberto Mariottini wrote:
> [...]
> >>> Essentially it's about information. The naive loop:
> >>>
> >>>    while (readln(line)) {
> >>>        write(line);
> >>>    }
> >>
> >> I'm completely against that awful mess of code.
> >
> > What exactly would be bad about it?
>
> It's not clearly evident to a non-expert programmer that a newline is
> appended to each line. Take any programmer from any language of your
> choice and ask what this snippet is supposed to do. This works against
> immediate comprehension of code.

One of the small issues I have with 'readln' appending the newline character(s) at the end of a line is that such characters are not actually a part of the text line; they are delimiters that separate one line from another. In essence they are the same type of thing as the null byte that marks the end of a C-style string.

If the purpose of returning the newline character(s) from readln() is to inform the caller that a complete line was actually read in, then I would have thought that this is 'optional' data that the caller could choose to know about or not. If I call readln() and a complete line was not read in, I would consider this an exception.

And by the way, a text file that does not terminate with a newline is not an exception in my point of view, as this could be just a situation in which a delimiting newline is not required (there is nothing to delimit the last line from).

> >>> is guaranteed 100% to produce an accurate copy of its input. The
> >>> version that chops lines looks like:
> >>>
> >>>    while (readln(line)) {
> >>>        writeln(line);
> >>>    }
> >>>
> >>> This may or may not add a newline to the output, possibly creating a
> >>> file larger by one byte.
> >>
> >> Are you sure? Can you elaborate more on this?
> >
> > Very simple. If the file ends with a newline, the code reproduces it.
> > If not, the code gratuitously appends a newline.
>
> A newline is two bytes here.

Some readln() implementations disregard the actual newline as supplied by the operating system and just append a single 0x0A byte on all operating systems. And when it comes to outputting this, it is transformed back into the appropriate newline sequence for the running opsys.

> >>> Moreover, with the automated chopping it is basically impossible to
> >>> write a program that exactly reproduces its input because readln
> >>> essentially loses information.
>
> A text file is not a binary file. A newline at end of file is completely
> irrelevant.

Exactly. It is merely a delimiter *between* lines.

> On the other hand, no code should break whether the last newline is
> there or not. The problem with your code is that the last line comes out
> different from the others.

The last line does not need a delimiter, so some systems make it optional.

> >>> Also, stdio also offers a readln() that creates a new line on every
> >>> call. That is useful if you want fresh lines every read:
> >>>
> >>>    char[] line;
> >>>    while ((line = readln()).length > 0) {
> >>>        ++dictionary[line];
> >>>    }
> >>
> >> This way you'll get two different dictionaries on Windows and on Unix.
> >> Wrong, very wrong.
> >
> > Yes, wrong, very wrong. Except it's not me who's wrong :o).
>
> Ehm, can you explain how good it is to put a '\n' at the end of every
> string when working with:
> - databases
> - communication programs
> - interprocess communication
> - distributed computing

Does not make a lot of sense to me either. Like I said earlier, the first thing I usually do when reading a line is to remove the damned newline character(s).

> >>> The code _just works_ because an empty line means _precisely_ and
> >>> without the shadow of a doubt that the file has ended. (An I/O error
> >>> throws an exception, and does NOT return an empty line; that is
> >>> another important point.) An API that uses automated chopping should
> >>> not offer such a function because an empty line may mean that an
> >>> empty line was read, or that it's eof time. So the API would force
> >>> people to write convoluted code.
> >>
> >> What is your definition of "convoluted"?
> >> I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.
> >
> > You are objectively wrong.
>
> Say 'subjectively'. Assignments in boolean expressions should be
> avoided. The average programmer knows something about this magic, but
> fears to touch it, and never completely understands it. Still, any
> programmer from any language would think that this code ends at the
> first empty line. Here is one of the many possible non-convoluted
> versions:
>
>    char[] line = readln();
>    while (line.length > 0) {
>        ++dictionary[chomp(line)];
>        line = readln();
>    }
>
> And this is how it should be:
>
>    char[] line = readln();
>    while (line != null) {
>        ++dictionary[line];
>        line = readln();
>    }

This depends on distinguishing between an empty line and a null line.

> > The code is portable. Newline translation
> > takes care of it. Just try it.
>
> Newline translation is an old problem with C, C++ and now with D.
> Nothing can be resolved with newline translation. Opening a file in
> binary mode on Unix and treating it like a text file works only as long
> as the program is run on Unix. Newline translation is prone to
> portability errors, thus non-portable. In my experience, newline
> translation poses more portability problems than it solves.

Unless it is done right by the compiler/language and does not have to be done by the code writer each time. Much like a GC system.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
Mar 27 2007