digitalmars.D - formattedRead whitespace quirks (compared to scanf)
- Gordon (67/67) Dec 25 2013 Hello,
- Ali Çehreli (12/26) Dec 25 2013 That is by design. Since a string can contain space characters, the
- Gordon (6/13) Dec 25 2013 Thank you, Ali - this explains few other discrepancies I found
- Andrei Alexandrescu (5/25) Dec 25 2013 Yah, that's intentional. scanf has its usefulness slashed to a fraction
- Gordon (11/14) Dec 25 2013 I like the fixes, it just that the interface *looks* the same
Hello,
Trying to use "std.file.slurp" as a generic file loader,
I encountered a quirk in the way formattedRead works compared to
the (standard?) scanf.
It seems using "%s" format does not stop at whitespace if the
"%s" is the last item in the format string.
A simple comparison would be:
-- in C --
char a[100] = {0};
const char *input = "hello world 42";
sscanf(input, "%s", a);
-- in D --
string a;
string input = "hello world 42";
formattedRead(input,"%s", &a);
-----------
In "C", the variable "a" would contain only "hello";
In "D", the variable "a" would contain "hello world 42";
BUT,
if the format string were "%s %s %d" (and we had three
variables), then "formattedRead()" would behave exactly like
"sscanf()".
Complete code to illustrate the issue:
-- scanf_test.c --
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    const char *text = "hello world 42";
    char a[100] = {0};
    char b[100] = {0};
    char c[100] = {0};
    int pos = 0;
    /* Pass the arrays themselves (they decay to char *), not &a. */
    sscanf(text, "%s %s %s", a, b, c);
    printf("reading all-at-once: a='%s' b='%s' c='%s'\n", a, b, c);
    /* %n stores how many characters have been consumed so far. */
    sscanf(text, "%s%n", a, &pos);
    printf("reading first word: a='%s' (remaining text='%s')\n", a, text + pos);
    return 0;
}
------------------
--- formattedread_test.d ---
import std.string;
import std.format;
import std.stdio;
void main()
{
    string a, b, c;
    string text = "hello world 42";
    formattedRead(text, "%s %s %s", &a, &b, &c);
    writeln("reading all-at-once: a = '", a, "' b='", b, "' c='", c, "'");
    text = "hello world 42";
    formattedRead(text, "%s", &a);
    writeln("reading first word: a = '", a, "' remaining text='", text, "'");
}
----------------------------
The output:
$ gcc -o scanf_test scanf_test.c
$ ./scanf_test
reading all-at-once: a='hello' b='world' c='42'
reading first word: a='hello' (remaining text=' world 42')
$ rdmd formattedread_test.d
reading all-at-once: a = 'hello' b='world' c='42'
reading first word: a = 'hello world 42' remaining text=''
Dec 25 2013
On 12/25/2013 12:43 PM, Gordon wrote:
> -- in C --
> char a[100] = {0};
> char *input = "hello world 42";
> sscanf(input, "%s", &a);
> -- in D --
> string a;
> string input = "hello world 42";
> formattedRead(input,"%s", &a);
> -----------
>
> In "C", the variable "a" would contain only "hello";
> In "D", the variable "a" would contain "hello world 42";
>
> BUT,
> If the format string would be "%s %s %d" (and we had three variables),
> then "formattedRead()" would behave exactly like "sscanf()".
That is by design. Since a string can contain space characters, the
normal behavior is to read everything as a part of the string. scanf
is defined differently.
However, just like scanf, a space character in the format string matches
zero or more whitespace characters in the input and ignores them. So,
"%s %s %d" means to read and ignore the spaces between the three values
that you are reading.
By the way, you almost never need anything but %s in D; unlike scanf,
formattedRead() actually knows the exact types of its parameters, so %s
works for integers as well.
Ali
Dec 25 2013
On Wednesday, 25 December 2013 at 21:06:46 UTC, Ali Çehreli wrote:
> On 12/25/2013 12:43 PM, Gordon wrote:
>> In "C", the variable "a" would contain only "hello";
>> In "D", the variable "a" would contain "hello world 42";
> That is by design. Since a string can contain space characters, the
> normal behavior is to read everything as a part of the string. scanf
> is defined differently.
Thank you, Ali - this explains a few other discrepancies I found
(or just didn't expect). Since it all started with "slurp" for me, I
now see that using "slurp" correctly is much trickier than I expected.
Oh well..
Dec 25 2013
On 12/25/13 1:06 PM, Ali Çehreli wrote:
> On 12/25/2013 12:43 PM, Gordon wrote:
>> -- in C --
>> char a[100] = {0};
>> char *input = "hello world 42";
>> sscanf(input, "%s", &a);
>> -- in D --
>> string a;
>> string input = "hello world 42";
>> formattedRead(input,"%s", &a);
>> -----------
>>
>> In "C", the variable "a" would contain only "hello";
>> In "D", the variable "a" would contain "hello world 42";
>>
>> BUT,
>> If the format string would be "%s %s %d" (and we had three variables),
>> then "formattedRead()" would behave exactly like "sscanf()".
> That is by design. Since a string can contain space characters, the
> normal behavior is to read everything as a part of the string. scanf
> is defined differently.
Yah, that's intentional. scanf has its usefulness slashed to a fraction
because of the way it handles strings. People added %[...] to compensate
for that; I chose to just fix it.
Andrei
Dec 25 2013
On Thursday, 26 December 2013 at 00:22:14 UTC, Andrei Alexandrescu wrote:
> Yah, that's intentional. scanf has its usefulness slashed to a fraction
> because of the way it handles strings. People added %[...] to compensate
> for that; I chose to just fix it.
I like the fixes, it's just that the interface *looks* the same
(format specifiers and all), but the implementation is different (and
mostly undocumented? or perhaps I missed it...) - so the results were
surprising to me.
Related, I've submitted a tiny patch to count parsed variables in a
tuple, here:
https://github.com/D-Programming-Language/phobos/pull/1812
Hope this is the right way to send patches...
-gordon
Dec 25 2013