digitalmars.D - [review] new string type (take 2)
- Steven Schveighoffer (402/402) Jan 13 2011 Based on the suggestions of others, I have updated my string type.
- Steven Wawryk (18/39) Jan 13 2011 Same for both.
- Steven Schveighoffer (19/66) Jan 14 2011 Exactly, that is what my string type allows. You can either do it at th...
- Andrei Alexandrescu (4/66) Jan 14 2011 I equally respectfully disagree. I think random access is defined as
- Steven Schveighoffer (11/29) Jan 15 2011 I'd actually go as far as saying that any container you can do a quick
- Andrei Alexandrescu (6/36) Jan 15 2011 Here you are trying to equate a dictionary (map/hashtable) with an
- Steven Schveighoffer (17/56) Jan 15 2011 I see a lot of similarity. Both have sparsely populated indexes. Both ...
- Andrei Alexandrescu (21/78) Jan 15 2011 I think this is way off. We're comparing a highly structured topology
- Steven Wawryk (22/30) Jan 16 2011 Andrei,
- Andrei Alexandrescu (21/43) Jan 16 2011 One thing that I don't think is understood is that Walter is the key
Based on the suggestions of others, I have updated my string type. changes: * opApply for iterating with foreach w/ index * indexing the string at an invalid location (i.e. not the start of a code point) throws a RangeError (does that make sense, or should it be an exception?). * charStart is now public so you can use it to ensure you are accessing the start of a code point * validIdx new function that tells you if your index is at the start of a code point. * data property which gets the underlying T[] * Added free functions for string_t!dchar to make it have the same properties as the other string types. * Added an ability to assign to a T[] array for ease of use. * Added a ptr property so it works seamlessly with code that currently uses strings (but we still need $, however it appears this isn't implemented yet in dmd). * fully documented Here it is: // Written in the D programming language. /** Copyright: Copyright Andrei Alexandrescu and Steven Schveighoffer 2008-2010 License: <a href="http://www.boost.org/LICENSE_1_0.txt">Boost License 1.0</a>. Authors: $(WEB erdani.org, Andrei Alexandrescu, Steven Schveighoffer) Copyright Andrei Alexandrescu and Steven Schveighoffer 2008-2010. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) */ import std.utf; import std.traits; import core.exception; // BUG version = norangeopapplyoverload; /** * Narrow string type. This string implements utf-8 or utf-16 depending on the * character type. * * The string type is a bi-directional range of dchars, with a monotonic * indexing scheme. Essentially, because dchars are encoded at variable * widths, indexing and slicing based on dchars would be an O(n) operation. * Therefore, we allow indexing and slicing, but based on the 'code-unit' or * unit of encoding (wchar or char). This means some indexes are valid and * some are not (those which do not point to the start of an encoded dchar are * not). * * While this might seem confusing, it is very rare that one needs arbitrary * index access. In order to achieve this you can either ensure to yourself * that the string's code-unit to dchar ratio is 1 to 1 (never more than one * code-unit to encode a dchar), or you can use the charStart funciton to * ensure a valid index. */ struct string_t(T) if (is(Unqual!T == char) || is(Unqual!T == wchar)) { private T[] _data; /** * Constructor, builds a string based on given array data. */ this(T[] data) { this._data = data; } /** * Provide access to underlying array data. */ property T[] data() { return _data; } /** * forward access to the ptr part of the array. This allows the most * efficient operations when one knows what he is doing. */ property T* ptr() { return _data.ptr; } /** * Finds the largest valid index in the string that is <= idx. * Essentially, this can be used to convert arbitrary indexes into valid * indexes. */ size_t charStart(size_t idx) const { static if(is(Unqual!T == wchar)) { immutable c = _data.ptr[idx]; if(c >= 0xDC00 && c <= 0xDFFF) { // surrogate pair detected, verify we have at least 2 wchars, // and that both wchars are properly encoded. assert(idx > 0, "Invalid UTF character at beginning of string"); return idx-1; } else return idx; } else { const p = _data.ptr + idx; if ((p[0] & 0b1100_0000) != 0b1000_0000) { return idx; } else if (idx >= 1 && (p[-1] & 0b1100_0000) != 0b1000_0000) { return idx - 1; } else if (idx >= 2 && (p[-2] & 0b1100_0000) != 0b1000_0000) { return idx - 2; } else if (idx >= 3 && (p[-3] & 0b1100_0000) != 0b1000_0000) { return idx - 3; } else { assert(false, "Invalid UTF character in string"); } } } /** * Returns true if the given index starts an encoded dchar. */ bool validIdx(size_t idx) { if(idx >= _data.length) { if(idx is _data.length) return true; // index one beyond the string is valid. return false; } immutable c = _data[idx]; static if(is(Unqual!T == wchar)) { // make sure this isn't the second character of a surrogate pair return (c < 0xDC00 || c > 0xDFFF); } else // char { return ((c & 0b1100_0000) != 0b1000_0000); } } /** * remove the first code-point from the string. */ void popFront() { auto nc = std.utf.stride(_data, 0); assert(nc <= _data.length && nc != 0xFF, "Invalid sequence at beginning of string"); _data = _data[nc .. $]; } /** * Remove the last code-point from the string. */ void popBack() { immutable n = _data.length; assert(n, "Attempting to pop back of an empty string"); _data = _data.ptr[0..charStart(n-1)]; } /** * Get the first code-point in the string */ property dchar front() const { assert(_data.length, "Attempting to fetch the front of an empty string"); size_t i = 0; return decode(_data, i); } /** * Get the last code-point in the string */ property dchar back() const { immutable n = _data.length; assert(n, "Attempting to fetch the back of an empty string"); auto idx = charStart(n-1); return std.utf.decode(_data, idx); } /** * Does the string contain any data? */ property bool empty() const { return !_data.length; } /** * Copy the string (trivial function, needed for range definitions) */ property typeof(this) save() { return this; } /** * support read-only random access via code unit index. * * Note that an invalid idx (one that does not start a code point) results * in an exception */ dchar opIndex(size_t idx) { if(idx is _data.length || !validIdx(idx)) throw new RangeError(__FILE__, __LINE__); return std.utf.decode(_data, idx); } /** * slice the whole string */ string_t opSlice() { return this; } /** * Slice based on valid start and end indexes. * * Throws RangeError if start or end are not valid indexes */ string_t opSlice(size_t start, size_t end) { if(end < start || !validIdx(start) || !validIdx(end)) throw new RangeError(__FILE__, __LINE__); return string_t(_data[start..end]); } /** * Get the number of code units in the string. * * Note that this is specifically not called length because length * generally implies the number of elements in a range. Since dchar is our * element type, and the number of dchars cannot be determined in O(1) * time, using the name length would be incorrect. */ property size_t codeUnits() const { return _data.length; } /** * Append a string to this string. * * TODO: support appending any string type to this string type. */ ref string_t opOpAssign(string op, U)(U data) if (op == "~" && is(U == string_t)) { _data ~= data._data; return this; } /** * Support appending any types that the underlying array can support. */ ref string_t opOpAssign(string op, U)(U data) if (op == "~" && !is(U == string_t) && is(typeof(_data ~= U.init))) { _data ~= data; return this; } /** * Concatenation between two strings */ string_t opBinary(string op, U)(U data) if (op == "~" && is(U == string_t)) { return string_t(_data ~ data._data); } /** * Support any concatenation that is supported by the underlying data * array. */ string_t opBinary(string op, U)(U data) if (op == "~" && !is(U == string_t) && is(typeof(_data ~ U.init))) { return string_t(_data ~ data); } // note, this should not be required, it should use the range interface but // the compiler doesn't allow both to coexist for foreach. version(norangeopapplyoverload) { /** * Foreach with just dchars. Note, you cannot actually change the * data, despite the argument being ref. opApply requires ref. */ int opApply(scope int delegate(ref dchar d) dg) { dchar d; size_t idx = 0; immutable len = _data.length; int result = 0; while(result == 0 && idx < len) { d = std.utf.decode(_data, idx); result = dg(d); } return result; } } /** * Foreach over the string with accompanied index. * * Note, the refs are required for foreach, you cannot change the index or * the data in the string. */ int opApply(scope int delegate(ref size_t idx, ref dchar d) dg) { dchar d; size_t idx = 0; immutable len = _data.length; int result = 0; while(result == 0 && idx < len) { size_t tmpidx = idx; d = std.utf.decode(_data, idx); result = dg(tmpidx, d); } return result; } /** * Assign a string */ string_t opAssign(U)(U u) if(is(U == string_t)) { this._data = u._data; return this; } /** * Assign a string from another type */ string_t opAssign(U)(U u) if (!is(U == string_t) && is(typeof(_data = u))) { _data = u; return this; } } /** * String type for dchar */ template string_t(T) if (is(Unqual!T == dchar)) { alias T[] string_t; } // support string functions for dchar that aren't already defined. // TODO: do we need this one? // TODO: should be inout instead of a template property T[] data(T)(T[] t) if (is(Unqual!T == dchar)) { return t; } /** * Finds the largest valid index in the string that is <= idx. * Essentially, this can be used to convert arbitrary indexes into valid * indexes. */ size_t charStart(const(dchar)[] t, size_t idx) { return idx; } /** * Returns true if the given index starts an encoded dchar. */ bool validIdx(const(dchar)[] t, size_t idx) { return idx <= t.length; } // TODO: do we need this one? property size_t codeUnits(const(dchar)[] t) { return t.length; } /** begin test code **/ import std.stdio; alias string_t!(immutable char) mystring; alias string_t!(immutable wchar) mywstring; alias string_t!(immutable dchar) mydstring; void main() { auto str = mystring("hello"); foreach(dchar d; str) { } str ~= " world"; str ~= mystring("!!!"); writeln(str.data); mystring str2 = "blah blah"; str2 = str; str2 = "blah blah"; }
Jan 13 2011
On 14/01/11 02:25, Steven Schveighoffer wrote:On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk <stevenw acres.com.au> wrote:myString vs myString.dataI like the direction you're taking but have some quibbles about details. Specifically, I'd go for a more complete separation into random-access code-unit ranges and bidirectional code-point ranges:Thanks for taking the time. I will respond to your points, but please make your rebuttals to the new thread I'm about to create with an updated string type.I don't see a need for _charStart, opIndex, opSlice and codeUnits. If the underlying T[] can be returned by a property, then these can be done through the code-unit array, which is random-access.But that puts extra pain on the user for not much reason. Currently, strings slice in one operation, you are proposing that we slice in three operations: 1. get the underlying array2. slice itSame for both.3. reconstruct a string based on the slice.myOtherString = find(myString, 'x'); vs myOtherString = find(myString.data, 'x'); You may see extra pain. I see extra control. The user is making it explicit at what level (code-unit/code-point/grapheme/whatever) of range he/she wants the called algorithm to be working on.Plus, if you remove opIndex, you are restricting the usefulness of the range. Note that this string type already will decode dchars out of the front and back, why not just give that ability to the middle of thestring? Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data". - other Steve
Jan 13 2011
On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw acres.com.au> wrote:On 14/01/11 02:25, Steven Schveighoffer wrote: > On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk <stevenw acres.com.au> > wrote: > >> >> I like the direction you're taking but have some quibbles about >> details. Specifically, I'd go for a more complete separation into >> random-access code-unit ranges and bidirectional code-point ranges: > > Thanks for taking the time. I will respond to your points, but please > make your rebuttals to the new thread I'm about to create with an > updated string type. > >> I don't see a need for _charStart, opIndex, opSlice and codeUnits. If >> the underlying T[] can be returned by a property, then these can be >> done through the code-unit array, which is random-access. > > But that puts extra pain on the user for not much reason. Currently, > strings slice in one operation, you are proposing that we slice in three > operations: > > 1. get the underlying array myString vs myString.data > 2. slice it Same for both. > 3. reconstruct a string based on the slice. myOtherString = find(myString, 'x'); vs myOtherString = find(myString.data, 'x'); You may see extra pain. I see extra control. The user is making it explicit at what level (code-unit/code-point/grapheme/whatever) of range he/she wants the called algorithm to be working on.Exactly, that is what my string type allows. You can either do it at the code-point (and probably grapheme, discussion in progress) level, or you can do it at the code-unit level. I don't see how restricting the user to only doing it at the code-unit level is not more painful.> Plus, if you remove opIndex, you are restricting the usefulness of the > range. Note that this string type already will decode dchars out of the > front and back, why not just give that ability to the middle of the string? Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length. You are also forgetting one thing, the main reason why a string type is better than the array -- it's possible to slice a code-unit array using indexes that create an invalid range. With my type it is not possible to do that (it throws an exception). We want the basic user to use strings properly (and inform them of their errors at the site of the error), and if an advanced user wants more control, they can jump down to the code-unit level by accessing the data property.- other Stevehehe, you can be Steve' :) -Steve
Jan 14 2011
On 1/14/11 5:06 AM, Steven Schveighoffer wrote:On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw acres.com.au> wrote:I equally respectfully disagree. I think random access is defined as accessing the ith element in O(1) time. That's not the case here. AndreiOn 14/01/11 02:25, Steven Schveighoffer wrote:Exactly, that is what my string type allows. You can either do it at the code-point (and probably grapheme, discussion in progress) level, or you can do it at the code-unit level. I don't see how restricting the user to only doing it at the code-unit level is not more painful.On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk<stevenw acres.com.au>wrote:threeI like the direction you're taking but have some quibbles about details. Specifically, I'd go for a more complete separation into random-access code-unit ranges and bidirectional code-point ranges:Thanks for taking the time. I will respond to your points, but please make your rebuttals to the new thread I'm about to create with an updated string type.I don't see a need for _charStart, opIndex, opSlice and codeUnits. If the underlying T[] can be returned by a property, then these can be done through the code-unit array, which is random-access.But that puts extra pain on the user for not much reason. Currently, strings slice in one operation, you are proposing that we slice inoperations: 1. get the underlying arraymyString vs myString.data2. slice itSame for both.3. reconstruct a string based on the slice.myOtherString = find(myString, 'x'); vs myOtherString = find(myString.data, 'x'); You may see extra pain. I see extra control. The user is making it explicit at what level (code-unit/code-point/grapheme/whatever) of range he/she wants the called algorithm to be working on.I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length.Plus, if you remove opIndex, you are restricting the usefulness of the range. Note that this string type already will decode dchars out of the front and back, why not just give that ability to the middle of thestring? Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".
Jan 14 2011
On Fri, 14 Jan 2011 12:03:28 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 1/14/11 5:06 AM, Steven Schveighoffer wrote:I'd actually go as far as saying that any container you can do a quick lookup given an index (O(lgn) or better) is random-access. For example, a dictionary file of english words sorted alphabetically can look up any word definition in O(lgn) time, but you can't look up the nth word in O(lgn) time. What do you call this type of container if not random access? Indexed maybe? Whatever term you call it, I'd call it that, and say opIndex belongs for that container. -SteveOn Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw acres.com.au> wrote:I equally respectfully disagree. I think random access is defined as accessing the ith element in O(1) time. That's not the case here.Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length.
Jan 15 2011
On 1/15/11 11:15 CST, Steven Schveighoffer wrote:On Fri, 14 Jan 2011 12:03:28 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Here you are trying to equate a dictionary (map/hashtable) with an otherwise unstructured variable length encoded array. I don't think there is much similarity.On 1/14/11 5:06 AM, Steven Schveighoffer wrote:I'd actually go as far as saying that any container you can do a quick lookup given an index (O(lgn) or better) is random-access. For example, a dictionary file of english words sorted alphabetically can look up any word definition in O(lgn) time, but you can't look up the nth word in O(lgn) time.On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw acres.com.au> wrote:I equally respectfully disagree. I think random access is defined as accessing the ith element in O(1) time. That's not the case here.Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length.What do you call this type of container if not random access? Indexed maybe? Whatever term you call it, I'd call it that, and say opIndex belongs for that container. -SteveI'm not sure how you can call it, but it's definitely not random access. Andrei
Jan 15 2011
On Sat, 15 Jan 2011 13:19:34 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 1/15/11 11:15 CST, Steven Schveighoffer wrote:I see a lot of similarity. Both have sparsely populated indexes. Both cannot determine their length unless you save it separately. Both are encoded on an array of fixed-length elements. Both store their data sorted by index. The difference is that the index is a size_t for one and a string for the other.On Fri, 14 Jan 2011 12:03:28 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Here you are trying to equate a dictionary (map/hashtable) with an otherwise unstructured variable length encoded array. I don't think there is much similarity.On 1/14/11 5:06 AM, Steven Schveighoffer wrote:I'd actually go as far as saying that any container you can do a quick lookup given an index (O(lgn) or better) is random-access. For example, a dictionary file of english words sorted alphabetically can look up any word definition in O(lgn) time, but you can't look up the nth word in O(lgn) time.On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw acres.com.au> wrote:I equally respectfully disagree. I think random access is defined as accessing the ith element in O(1) time. That's not the case here.Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length.The former complaint was "because this != random access, opIndex doesn't belong." If we can get past the terminology of what this is, I think at least opSlice belongs and I still think opIndex belongs. Can you not see the value of 'give me the element at index X' giving you back a full element and not just a code-unit? If opIndex doesn't belong unless it's a random access container (i.e. array), then we should get rid of the operator for hashes and sorted trees. -SteveWhat do you call this type of container if not random access? Indexed maybe? Whatever term you call it, I'd call it that, and say opIndex belongs for that container. -SteveI'm not sure how you can call it, but it's definitely not random access.
Jan 15 2011
On 1/15/11 2:32 PM, Steven Schveighoffer wrote:On Sat, 15 Jan 2011 13:19:34 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I think this is way off. We're comparing a highly structured topology with a simple linear encoding.On 1/15/11 11:15 CST, Steven Schveighoffer wrote:I see a lot of similarity. Both have sparsely populated indexes. Both cannot determine their length unless you save it separately. Both are encoded on an array of fixed-length elements. Both store their data sorted by index. The difference is that the index is a size_t for one and a string for the other.On Fri, 14 Jan 2011 12:03:28 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Here you are trying to equate a dictionary (map/hashtable) with an otherwise unstructured variable length encoded array. I don't think there is much similarity.On 1/14/11 5:06 AM, Steven Schveighoffer wrote:I'd actually go as far as saying that any container you can do a quick lookup given an index (O(lgn) or better) is random-access. For example, a dictionary file of english words sorted alphabetically can look up any word definition in O(lgn) time, but you can't look up the nth word in O(lgn) time.On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw acres.com.au> wrote:I equally respectfully disagree. I think random access is defined as accessing the ith element in O(1) time. That's not the case here.Because at the code-point level it *isn't* a random-access range and the index makes no sense at the code-point level, only at the code-unit level. It's encouraging the confusion of 2 distinctly different abstractions or "views" of the same data. All the slicing and indexing you're artificially putting in the code-point range is already available in the code-unit range, and its only benefit is to allow the user to save typing ".data".I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length.I do see the value. I think it's essential to reckon that (1) that is not random access, (2) that allows people to write wrong code because they believe they have random access.The former complaint was "because this != random access, opIndex doesn't belong." If we can get past the terminology of what this is, I think at least opSlice belongs and I still think opIndex belongs. Can you not see the value of 'give me the element at index X' giving you back a full element and not just a code-unit?What do you call this type of container if not random access? Indexed maybe? Whatever term you call it, I'd call it that, and say opIndex belongs for that container. -SteveI'm not sure how you can call it, but it's definitely not random access.If opIndex doesn't belong unless it's a random access container (i.e. array), then we should get rid of the operator for hashes and sorted trees.The existence of opIndex for hashes and ordered trees is not being contended, and throws red herrings into the discussion. It's all very simple. At the end of the day, a string type must make a Boolean decision: the user is aware of an underlying variable-length encoding, or not. Allow me to enumerate the ways I know that hide the variable encoding 100% from the user: 1. Expose a bidirectional range. Now, for a variety of reasons, it is efficient to have access to the index in the supporting structure for the encoding. That's great, but it should be understood that offering such access allows people who don't understand the underlying variable-length encoding to write wrong code. Bottom line: you can't abstract away the variable-length encoding and offer indexes at the same time. Andrei
Jan 15 2011
Andrei, I note that your specific criticisms of Steve's proposal all relate to the "random-access" of the bidirectional code-point range. This is essentially my objection too. As you are the key person to determine whether it is adopted, what do you think of the remainder of his proposal? Apart from the indexing and slicing of code-points I see a lot of merit to the proposal, especially that it would mean that std.algorithm would not need to special-case for strings any more. It would need only be concerned about whether it has a random-access range or a bidirectional range. The user could choose whether to pass to std.algorithm any of: * a random-access code-unit range * a bidirectional code-point range * a bidirectional grapheme range, etc according to the outcome of the other discussions In the interest of moving this on, would it become acceptable to you if: 1. indexing and slicing of the code-point range were removed? 2. any additional ranges are exposed to the user according to decisions made about graphemes, etc? 3. other constructive criticisms were accommodated? Steve On 15/01/11 03:33, Andrei Alexandrescu wrote:On 1/14/11 5:06 AM, Steven Schveighoffer wrote:I respectfully disagree. A stream built on fixed-sized units, but with variable length elements, where you can determine the start of an element in O(1) time given a random index absolutely provides random-access. It just doesn't provide length.I equally respectfully disagree. I think random access is defined as accessing the ith element in O(1) time. That's not the case here. Andrei
Jan 16 2011
On 1/16/11 6:35 PM, Steven Wawryk wrote:Andrei, I note that your specific criticisms of Steve's proposal all relate to the "random-access" of the bidirectional code-point range. This is essentially my objection too. As you are the key person to determine whether it is adopted, what do you think of the remainder of his proposal?One thing that I don't think is understood is that Walter is the key person here. This is a compiler and druntime change, and a big one. My only possible role is that I can call Walter on Skype with a strong case. Walter has a strong "wanking detector". This whole changing the string type, which is working well, with a type that has its own problems, breaks a lot of code, and at the end of the day makes small improvements - that will trigger the wanking detector for sure, and I simply don't have enough good arguments to counter that.Apart from the indexing and slicing of code-points I see a lot of merit to the proposal, especially that it would mean that std.algorithm would not need to special-case for strings any more.This is a misunderstanding. std.algorithm needs no special casing for strings. It uses isXxxRange which work correctly for strings. Functions in std.algorithm are special-cased for strings when they could gain in efficiency. This is because there is no VLERange abstraction.It would need only be concerned about whether it has a random-access range or a bidirectional range. The user could choose whether to pass to std.algorithm any of: * a random-access code-unit range * a bidirectional code-point range * a bidirectional grapheme range, etc according to the outcome of the other discussions In the interest of moving this on, would it become acceptable to you if: 1. indexing and slicing of the code-point range were removed? 2. any additional ranges are exposed to the user according to decisions made about graphemes, etc? 3. other constructive criticisms were accommodated? SteveI believe the proposed scheme: 1. Changes the language in a major way; 2. Is highly disruptive; 3. Improves the status quo in only minor ways. I'd be much more willing to improve things by e.g. defining the representation() function I talked about a bit ago, and other less disruptive additions. Andrei
Jan 16 2011