digitalmars.D.bugs - [Issue 5977] New: String splitting with empty separator

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5977

           Summary: String splitting with empty separator
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Keywords: patch
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



This D2 program seems to go into an infinite loop (dmd 2.053beta):


import std.string;
void main() {
    split("a test", "");
}

------------------------

My suggestion is to add code like this in std.array.split():

if (delim.length == 0)
    return split(s);

This means that an empty splitting string is treated like splitting on generic
whitespace. This is useful in code like:

auto foo(string txt, string delim="") {
    return txt.split(delim);
}


This means that calling foo without a delim argument splits txt on whitespace,
while passing a delim splits on the given string. This allows using both forms
of split in foo() without if conditions. Python does the same, except that it
uses None instead of an empty string.
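
For comparison, here is a minimal sketch of what foo() has to look like today
without the proposed change. The name splitOrWhitespace is hypothetical (it is
not part of Phobos); it only uses the two existing std.array.split overloads
and checks for the empty delimiter by hand:

```d
import std.array : split;

/// Hypothetical helper, not part of Phobos: an empty `delim` means
/// "split on whitespace", anything else is used as the separator.
string[] splitOrWhitespace(string txt, string delim = "")
{
    if (delim.length == 0)
        return txt.split();   // existing whitespace-splitting overload
    return txt.split(delim);  // existing split-on-separator overload
}
```

The proposal would fold the if condition into split() itself, so that the
wrapper body becomes a single call.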


The modified split is something like this (there is an isSomeString!S2 test
because strings are special: they aren't generic arrays, and splitting on
whitespace is meaningful for strings only):


Unqual!(S1)[] split(S1, S2)(S1 s, S2 delim)
if (isForwardRange!(Unqual!S1) && isForwardRange!S2)
{
    Unqual!S1 us = s;
    if (isSomeString!S2 && delim.length == 0)
    {
        return split(s);
    }
    else
    {
        auto app = appender!(Unqual!(S1)[])();
        foreach (word; std.algorithm.splitter(us, delim))
        {
            app.put(word);
        }
        return app.data;
    }
}


Besides this change, I presume std.algorithm.splitter() also needs to test for
an empty delim.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 10 2011
d-bugmail puremagic.com writes:




Alternative: throw an ArgumentError("delim argument is empty") exception if
delim is empty.
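
A minimal sketch of this alternative, written as a wrapper rather than as a
patch to Phobos. The name checkedSplit is hypothetical, and since D has no
ArgumentError class the sketch assumes std.exception.enforce, which throws a
plain Exception:

```d
import std.array : split;
import std.exception : enforce;

/// Hypothetical wrapper, not part of Phobos: rejects an empty separator
/// up front instead of passing it on to split().
string[] checkedSplit(string s, string delim)
{
    // enforce throws an Exception carrying this message when the test fails.
    enforce(delim.length != 0, "delim argument is empty");
    return s.split(delim);
}
```

With this wrapper, checkedSplit("a test", "") fails immediately with an
exception, while non-empty separators behave exactly like split().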

Sep 25 2011
d-bugmail puremagic.com writes:


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |daniel350 bigpond.com



*** Issue 8551 has been marked as a duplicate of this issue. ***

Oct 22 2012
d-bugmail puremagic.com writes:


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |monarchdodra gmail.com
         AssignedTo|nobody puremagic.com        |monarchdodra gmail.com




> This D2 program seems to go in infinte loop (dmd 2.053beta):
>
> import std.string;
> void main() {
>     split("a test", "");
> }
>
> ------------------------
>
> My suggestion is to add code like this in std.array.split():
>
> if (delim.length == 0)
>     return split(s);
>
> This means that en empty splitting string is like splitting on generic
> whitespace. This is useful in code like:
>
> auto foo(string txt, string delim="") {
>     return txt.split(delim);
> }

I think it is a bad idea on two counts:

1. If the user wanted that behavior, he would have written it as such. A user
   who actually passes an empty range as the separator probably did not mean
   to split on whitespace.
2. It would also introduce a deviation in behavior between strings and
   non-strings. Supposing r is empty:

    "hello world".split(""); // OK, splits on whitespace
    [1, 2].split(r);         // Derp.

> Alternative: throw an ArgumentError("delim argument is empty") exception if
> delim is empty.

I *really* think that is a *much* saner approach. Splitting with an empty
separator is just not logical. Trying to force a default behavior in that
scenario is wishful thinking (IMO). I think it should throw an error. I'll
implement this.
Oct 22 2012
d-bugmail puremagic.com writes:


hsteoh quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh quickfur.ath.cx



FWIW, in Perl, splitting on an empty string simply returns an array of
characters. I think that better reflects the symmetry with join("", array).
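
A rough sketch of what that behavior could look like in D. The names
splitChars and roundTrips are hypothetical, and the sketch assumes that "one
character" means one code point, which is what iterating a D string as a range
yields:

```d
import std.algorithm : map;
import std.array : array, join;
import std.conv : to;

/// Hypothetical helper, not part of Phobos: returns one string per
/// code point of `s`, as a Perl-style split on "" would.
string[] splitChars(string s)
{
    // Iterating a string as a range yields dchar code points;
    // each one is converted back into its own UTF-8 string.
    return s.map!(c => c.to!string).array;
}

/// Joining the pieces with an empty separator restores the input.
bool roundTrips(string s)
{
    return splitChars(s).join("") == s;
}
```

Under these assumptions, splitChars("abc") gives ["a", "b", "c"], and joining
the result with "" gives back "abc".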

Jan 03 2013