digitalmars.D - formattedRead whitespace quirks (compared to scanf)

reply "Gordon" <me home.com> writes:
Hello,

Trying to use "std.file.slurp" as a generic file loader,
I encountered a quirk in the way formattedRead works compared to 
the (standard?) scanf.

It seems using "%s" format does not stop at whitespace if the 
"%s" is the last item in the format string.

A simple comparison would be:

-- in C --
char a[100] = {0};
const char *input = "hello world 42";
sscanf(input, "%s", a);
-- in D --
string a;
string input = "hello world 42";
formattedRead(input,"%s", &a);
-----------

In "C", the variable "a" would contain only "hello";
In "D", the variable "a" would contain "hello world 42";

BUT,
If the format string would be "%s %s %d" (and we had three 
variables), then "formattedRead()" would behave exactly like 
"sscanf()".

Complete code to illustrate the issue:

-- scanf_test.c --
#include <stdio.h>

int main(void)
{
    const char *text = "hello world 42";
    char a[100] = {0};
    char b[100] = {0};
    char c[100] = {0};
    int pos = 0;

    /* Three %s conversions: each one stops at whitespace. */
    sscanf(text, "%99s %99s %99s", a, b, c);
    printf("reading all-at-once: a='%s' b='%s' c='%s'\n", a, b, c);

    /* A single %s still stops at the first space; %n records how
       many characters were consumed so far. */
    sscanf(text, "%99s%n", a, &pos);
    printf("reading first word: a='%s' (remaining text='%s')\n",
           a, text + pos);
    return 0;
}
------------------

--- formattedread_test.d ---
import std.string;
import std.format;
import std.stdio;

void main()
{
    string a, b, c;
    string text = "hello world 42";

    // Three %s specifiers separated by spaces: each reads one word.
    formattedRead(text, "%s %s %s", &a, &b, &c);
    writeln("reading all-at-once: a = '", a, "' b='", b, "' c='", c, "'");

    // A lone trailing %s consumes the rest of the input, spaces included.
    text = "hello world 42";
    formattedRead(text, "%s", &a);
    writeln("reading first word: a = '", a, "' remaining text='", text, "'");
}
----------------------------

The output:
$ gcc -o scanf_test scanf_test.c
$ ./scanf_test
reading all-at-once: a='hello' b='world' c='42'
reading first word: a='hello' (remaining text=' world 42')

$ rdmd formattedread_test.d
reading all-at-once: a = 'hello' b='world' c='42'
reading first word: a = 'hello world 42' remaining text=''
Dec 25 2013
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 12/25/2013 12:43 PM, Gordon wrote:

 -- in C --
 char a[100] = {0};
 const char *input = "hello world 42";
 sscanf(input, "%s", a);
 -- in D --
 string a;
 string input = "hello world 42";
 formattedRead(input,"%s", &a);
 -----------

 In "C", the variable "a" would contain only "hello";
 In "D", the variable "a" would contain "hello world 42";

 BUT,
 If the format string would be "%s %s %d" (and we had three variables),
 then "formattedRead()" would behave exactly like "sscanf()".
That is by design. Since a string can contain space characters, the normal behavior is to read everything as part of the string. scanf is defined differently.

However, just like scanf, a space character in the format string matches zero or more whitespace characters in the input and ignores them. So, "%s %s %d" means to read and ignore the spaces between the three data items that you are reading.

By the way, you almost never need anything but %s in D; unlike scanf, formattedRead() actually knows the exact types of its parameters, so %s works for integers as well.

Ali
Dec 25 2013
next sibling parent "Gordon" <me home.com> writes:
On Wednesday, 25 December 2013 at 21:06:46 UTC, Ali Çehreli wrote:
 On 12/25/2013 12:43 PM, Gordon wrote:
 In "C", the variable "a" would contain only "hello";
 In "D", the variable "a" would contain "hello world 42";
That is by design. Since a string can contain space characters, the normal behavior is to read everything as part of the string. scanf is defined differently.
Thank you, Ali - this explains a few other discrepancies I found (or just didn't expect). Since it all started with "slurp" for me, I now see that using "slurp" correctly is much trickier than I expected. Oh well...
Dec 25 2013
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/25/13 1:06 PM, Ali Çehreli wrote:
 On 12/25/2013 12:43 PM, Gordon wrote:

  > -- in C --
  > char a[100] = {0};
  > const char *input = "hello world 42";
  > sscanf(input, "%s", a);
  > -- in D --
  > string a;
  > string input = "hello world 42";
  > formattedRead(input,"%s", &a);
  > -----------
  >
  > In "C", the variable "a" would contain only "hello";
  > In "D", the variable "a" would contain "hello world 42";
  >
  > BUT,
  > If the format string would be "%s %s %d" (and we had three variables),
  > then "formattedRead()" would behave exactly like "sscanf()".

 That is by design. Since a string can contain space characters, the
 normal behavior is to read everything as part of the string. scanf
 is defined differently.
Yah, that's intentional. scanf has its usefulness slashed to a fraction because of the way it handles strings. People added %[...] to compensate for that; I chose to just fix it.

Andrei
Dec 25 2013
parent "Gordon" <me home.com> writes:
On Thursday, 26 December 2013 at 00:22:14 UTC, Andrei 
Alexandrescu wrote:
 Yah, that's intentional. scanf has its usefulness slashed to a 
 fraction because of the way it handles strings. People added 
 %[...] to compensate for that; I chose to just fix it.
I like the fixes, it's just that the interface *looks* the same (format specifiers and all), but the implementation is different (and mostly undocumented? or perhaps I missed it...) - so the results were surprising to me.

Related, I've submitted a tiny patch to count parsed variables in a tuple, here:
https://github.com/D-Programming-Language/phobos/pull/1812

Hope this is the right way to send patches...
-gordon
Dec 25 2013