digitalmars.D - formattedRead whitespace quirks (compared to scanf)
- Gordon (67/67) Dec 25 2013 Hello,
- Ali Çehreli (12/26) Dec 25 2013 That is by design. Since a string can contain space characters, the
- Gordon (6/13) Dec 25 2013 Thank you, Ali - this explains a few other discrepancies I found
- Andrei Alexandrescu (5/25) Dec 25 2013 Yah, that's intentional. scanf has its usefulness slashed to a fraction
- Gordon (11/14) Dec 25 2013 I like the fixes; it's just that the interface *looks* the same
Hello,

Trying to use "std.file.slurp" as a generic file loader, I encountered a quirk in the way formattedRead works compared to the (standard?) scanf.

It seems that the "%s" format does not stop at whitespace if the "%s" is the last item in the format string.

A simple comparison would be:

-- in C --
char a[100] = {0};
char *input = "hello world 42";
sscanf(input, "%s", a);
-- in D --
string a;
string input = "hello world 42";
formattedRead(input,"%s", &a);
-----------

In "C", the variable "a" would contain only "hello";
In "D", the variable "a" would contain "hello world 42";

BUT,
If the format string were "%s %s %d" (and we had three variables), then "formattedRead()" would behave exactly like "sscanf()".

Complete code to illustrate the issue:

-- scanf_test.c --
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    const char *text = "hello world 42";

    char a[100] = {0};
    char b[100] = {0};
    char c[100] = {0};
    int pos = 0;

    /* Three %s conversions: each one stops at whitespace. */
    sscanf(text, "%s %s %s", a, b, c);
    printf("reading all-at-once: a='%s' b='%s' c='%s'\n", a, b, c);

    /* A single %s reads one word; %n stores the number of characters consumed. */
    sscanf(text, "%s%n", a, &pos);
    printf("reading first word: a='%s' (remaining text='%s')\n", a, text + pos);

    return 0;
}
------------------

--- formattedread_test.d ---
import std.string;
import std.format;
import std.stdio;

void main()
{
    string a,b,c;
    string text = "hello world 42";

    // Three %s specifiers separated by spaces: one word each.
    formattedRead(text,"%s %s %s", &a, &b, &c);
    writeln("reading all-at-once: a = '",a,"' b='",b,"' c='",c,"'");

    // A lone trailing %s: consumes the rest of the input.
    text = "hello world 42";
    formattedRead(text, "%s", &a);
    writeln("reading first word: a = '",a,"' remaining text='",text,"'");
}
----------------------------

The output:

$ gcc -o scanf_test scanf_test.c
$ ./scanf_test
reading all-at-once: a='hello' b='world' c='42'
reading first word: a='hello' (remaining text=' world 42')

$ rdmd formattedread_test.d
reading all-at-once: a = 'hello' b='world' c='42'
reading first word: a = 'hello world 42' remaining text=''
Dec 25 2013
On 12/25/2013 12:43 PM, Gordon wrote:
> -- in C --
> char a[100] = {0};
> char *input = "hello world 42";
> sscanf(input, "%s", a);
> -- in D --
> string a;
> string input = "hello world 42";
> formattedRead(input,"%s", &a);
> -----------
>
> In "C", the variable "a" would contain only "hello";
> In "D", the variable "a" would contain "hello world 42";
>
> BUT,
> If the format string would be "%s %s %d" (and we had three variables),
> then "formattedRead()" would behave exactly like "sscanf()".

That is by design. Since a string can contain space characters, the normal behavior is to read everything as a part of the string. scanf is defined differently.

However, just like scanf, a space character in the format string matches zero or more whitespace characters in the input and ignores them. So, "%s %s %d" means to read and ignore the spaces between the three values that you are reading.

By the way, you almost never need anything but %s in D; unlike scanf, formattedRead() actually knows the exact types of its parameters, so %s works for integers as well.

Ali
Dec 25 2013
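The two points in Ali's reply above (a space in the format string delimits a "%s", and "%s" adapts to the type of the argument) can be sketched in D roughly as follows. This is only an illustration, not code from the thread: the helper names firstWord and parseRecord are made up for the example, and the trick of putting a space after "%s" to get scanf-like "stop at the first space" behavior relies on formattedRead reading a string up to the next literal character of the format.

```d
import std.format : formattedRead;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: read only the first space-delimited word.
// The space after %s gives the specifier something to stop at, and
// then matches the whitespace that follows the word in the input.
string firstWord(ref string input)
{
    string word;
    formattedRead(input, "%s ", &word);
    return word;
}

// Hypothetical helper: %s is used for every field; the third field is
// parsed as an int because the argument is an int, no %d needed.
Tuple!(string, string, int) parseRecord(string input)
{
    string a, b;
    int n;
    formattedRead(input, "%s %s %s", &a, &b, &n);
    return tuple(a, b, n);
}

void main()
{
    string text = "hello world 42";
    writeln("first word: '", firstWord(text), "'");
    writeln("remaining text: '", text, "'");

    auto rec = parseRecord("hello world 42");
    writeln("a='", rec[0], "' b='", rec[1], "' n=", rec[2]);
}
```

Note that firstWord takes its input by ref, because formattedRead consumes the part of the string it has parsed, just as in the original test program.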
On Wednesday, 25 December 2013 at 21:06:46 UTC, Ali Çehreli wrote:
> On 12/25/2013 12:43 PM, Gordon wrote:
> > In "C", the variable "a" would contain only "hello";
> > In "D", the variable "a" would contain "hello world 42";
>
> That is by design. Since a string can contain space characters, the
> normal behavior is to read everything as a part of the string.
> scanf is defined differently.

Thank you, Ali - this explains a few other discrepancies I found (or just didn't expect).

Since it all started with "slurp" for me, I now see that using "slurp" correctly is much trickier than I expected. Oh well...
Dec 25 2013
On 12/25/13 1:06 PM, Ali Çehreli wrote:
> On 12/25/2013 12:43 PM, Gordon wrote:
> > -- in C --
> > char a[100] = {0};
> > char *input = "hello world 42";
> > sscanf(input, "%s", a);
> > -- in D --
> > string a;
> > string input = "hello world 42";
> > formattedRead(input,"%s", &a);
> > -----------
> >
> > In "C", the variable "a" would contain only "hello";
> > In "D", the variable "a" would contain "hello world 42";
> >
> > BUT,
> > If the format string would be "%s %s %d" (and we had three variables),
> > then "formattedRead()" would behave exactly like "sscanf()".
>
> That is by design. Since a string can contain space characters, the
> normal behavior is to read everything as a part of the string.
> scanf is defined differently.

Yah, that's intentional. scanf has its usefulness slashed to a fraction because of the way it handles strings. People added %[...] to compensate for that; I chose to just fix it.

Andrei
Dec 25 2013
On Thursday, 26 December 2013 at 00:22:14 UTC, Andrei Alexandrescu wrote:
> Yah, that's intentional. scanf has its usefulness slashed to a
> fraction because of the way it handles strings. People added %[...]
> to compensate for that; I chose to just fix it.

I like the fixes; it's just that the interface *looks* the same (format specifiers and all), but the implementation is different (and mostly undocumented? or perhaps I missed it...), so the results were surprising to me.

Related: I've submitted a tiny patch to count parsed variables in a tuple, here:
https://github.com/D-Programming-Language/phobos/pull/1812

Hope this is the right way to send patches...

-gordon
Dec 25 2013