digitalmars.D.bugs - inconsistent behavior of std.string.split

zwang <nehzgnaw gmail.com> writes:

According to the documentation:
<spec>
char[][] split(char[] s)
     Split s[] into an array of words, using whitespace as the delimiter.

char[][] split(char[] s, char[] delim)
     Split s[] into an array of words, using delim[] as the delimiter.
</spec>

Intuitively, split(s) should be equivalent to split(s, " \t\f\r\n\v").
But the former function discards empty strings while the latter keeps them.
The following example demonstrates the difference.

<code>
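import std.stdio;
import std.string;

// Each call is followed by the output it printed.
void main() {
	writefln(std.string.split("0  3", " ")); // [0,,3]  explicit delimiter: empty string kept
	writefln(std.string.split("0  3"));      // [0,3]   whitespace: empty string dropped
	writefln(std.string.split("    ", " ")); // [,,,,]  five empty strings
	writefln(std.string.split("    "));      // []      no words at all
}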
</code>

Aug 20 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

Yeah, the one that takes a delimiter string should skip any zero-length 
strings between delimiters.  The whitespace version keeps skipping 
characters until it hits a non-whitespace one, but the delimiter version 
starts a new string after every delimiter, when it should just keep reading 
delimiters until it hits a non-delimiter sequence.
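
To make the two scanning strategies described above concrete, here is a minimal sketch in current D. Both helper names are hypothetical and written only for illustration; this is not the Phobos source, and it handles a single delimiter character rather than a delimiter string.

```d
import std.stdio;

// Hypothetical helper: starts a new field after every delimiter, so two
// adjacent delimiters produce a zero-length field between them.
string[] splitKeepingEmpty(string s, char delim)
{
    string[] fields;
    size_t start = 0;
    foreach (i, c; s)
    {
        if (c == delim)
        {
            fields ~= s[start .. i]; // may be zero-length
            start = i + 1;
        }
    }
    fields ~= s[start .. $]; // trailing field, possibly empty
    return fields;
}

// Hypothetical helper: keeps reading delimiters until it reaches a
// non-delimiter character, so a run of delimiters acts as one separator.
string[] splitSkippingEmpty(string s, char delim)
{
    string[] fields;
    size_t i = 0;
    while (i < s.length)
    {
        while (i < s.length && s[i] == delim) ++i; // skip the delimiter run
        size_t start = i;
        while (i < s.length && s[i] != delim) ++i; // read one field
        if (i > start) fields ~= s[start .. i];
    }
    return fields;
}

void main()
{
    assert(splitKeepingEmpty("0  3", ' ') == ["0", "", "3"]);
    assert(splitSkippingEmpty("0  3", ' ') == ["0", "3"]);
    assert(splitKeepingEmpty("    ", ' ').length == 5);
    assert(splitSkippingEmpty("    ", ' ').length == 0);
    writeln("both strategies behave as described");
}
```

The first helper mirrors what the delimiter version of split does in the example from the original post, and the second mirrors the whitespace version.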

Aug 20 2005

zwang <nehzgnaw gmail.com> writes:

Keeping zero-length strings is sometimes useful, for example when parsing 
a CSV or tab-delimited file. A better solution might be two versions of 
split that handle consecutive delimiters differently, or two additional 
split overloads for the special case of whitespace delimiters.
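
One way the "two versions" idea could look is sketched below. It assumes current D2 Phobos (std.array.split and std.algorithm.iteration.filter), not the 2005 library discussed in this thread, and splitFields with its keepEmpty flag is a hypothetical wrapper, not an existing library function.

```d
import std.algorithm.iteration : filter;
import std.array : array, split;
import std.stdio;

// Hypothetical wrapper: the caller chooses whether zero-length fields
// between consecutive delimiters are kept or dropped.
string[] splitFields(string s, string delim, bool keepEmpty)
{
    auto fields = s.split(delim); // keeps empty fields
    return keepEmpty ? fields : fields.filter!(f => f.length != 0).array;
}

void main()
{
    // CSV-style row with an empty middle column: the column must survive.
    assert(splitFields("a,,c", ",", true) == ["a", "", "c"]);
    // Word-style splitting: the empty field is dropped.
    assert(splitFields("a,,c", ",", false) == ["a", "c"]);
    writeln("ok");
}
```

A single flag keeps one code path for both cases; separate overloads, as suggested above, would make the choice visible in the function name instead.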

Aug 20 2005

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"zwang" <nehzgnaw gmail.com> wrote in message 
news:de7e7l$18se$1 digitaldaemon.com...
 Keeping zero-length strings is sometimes useful, for example, when parsing 
 a CSV or tab-delimited file. A better solution might be two versions of 
 split that handle consecutive delimiters differently. Or another two 
 overloaded split functions for the special case of whitespace delimiters.

Good point.

Aug 20 2005
