digitalmars.D - std.stringbuffer
- Janice Caron (89/89) Apr 29 2008 Hi all,
- Jarrett Billingsley (3/9) Apr 29 2008 Might I ask why a StringBuffer class would be necessary?
- Janice Caron (2/3) Apr 29 2008 Walter has vetoed that one, so it's moot now. :-)
- Bruno Medeiros (12/21) Apr 29 2008 I'm with Jarrett here, why the hell do we need a StringBuffer class?
- Me Here (49/63) Apr 29 2008 As one of those that has requested "a standard library of string functions...
- Janice Caron (12/24) Apr 30 2008 Yeah, I got that from an earlier post when someone said "What you need
- Sean Kelly (9/14) Apr 30 2008 This would only work for large arrays I'm afraid, given the GC
- Janice Caron (11/13) Apr 30 2008 So does Phobos. std.gc.realloc().
- Sean Kelly (6/19) Apr 30 2008 It's perhaps worth noting here that C++ objects don't typically minimize
- Me Here (60/76) Apr 30 2008 I did laugh. Not quite "any colour you like so long as its black", but
- Janice Caron (20/34) Apr 30 2008 One is file, the other is a folder. std.string is a file, so it can't
- Me Here (31/32) Apr 30 2008 What's in a name? Pre-conceptions of other worlds and other tools.
- Janice Caron (10/13) Apr 30 2008 I've kind of lost track of the number of times I've said this in
- Matti Niemenmaa (12/16) Apr 30 2008 It's possible that, in some obscure case, you can't uppercase UTF-16 in ...
- Janice Caron (8/10) Apr 30 2008 Perhaps surprisingly, that's not so. This is because the alphabets of
- Janice Caron (9/9) Apr 30 2008 Oh, sorry, I didn't read your whole post before replying.
- Matti Niemenmaa (7/12) Apr 30 2008 You're right, of course. I was referring more to some hypothetical toUpp...
- terranium (2/4) Apr 30 2008 does this have any practical use?
- Janice Caron (7/11) Apr 30 2008 Private use characters can be used for invented alphabets, e.g.
- terranium (2/5) Apr 30 2008 really?
- Janice Caron (7/12) Apr 30 2008 Yes really.
- Sean Kelly (10/24) Apr 30 2008 In all fairness, you can uppercase UTF-8 in place so long as none of
- Me Here (34/48) Apr 30 2008 Ignoring for the moment Matti's pronouncement that this is an obscure an...
- Frits van Bommel (9/26) Apr 30 2008 Actually, you can't uppercase UTF-16 and UTF-32 in-place either if you
- Janice Caron (12/14) Apr 30 2008 I know about that, and for the future I have plans for a proper
- Spacen Jasset (11/28) May 01 2008 I think uppercasing non ascii (english) characters is a more of
- Janice Caron (12/15) May 01 2008 The Unicode Standard defines casing unambiguously for all characters.
- Steven Schveighoffer (6/20) May 01 2008 What about inPlaceToUpperASCII(char[] str)?
- Robert Fraser (10/52) Apr 30 2008 I like StringBuffers :-). Did Walter veto the idea completely or did he
- Janice Caron (4/6) Apr 30 2008 Yeah, he said not a class. And that was probably my fault because in
- Bill Baxter (6/16) Apr 30 2008 Herein lies the genius in Tango's naming conventions. You *can* have
- Steven Schveighoffer (3/18) Apr 30 2008 Not on Windoze :)
- Sean Kelly (6/24) Apr 30 2008 It should still work, I believe. The source file will have a .d extensi...
- Steven Schveighoffer (6/35) Apr 30 2008 Excellent point, I completely forgot that even though you import std.Str...
- Bill Baxter (4/40) May 01 2008 Yes it works fine on Windows too. I pretty much work only on Windows
- Bruno Medeiros (5/12) May 01 2008 Something like this would be completely unacceptable not to work on Wind...
- Steven Schveighoffer (3/10) May 01 2008 I was wrong, look at my response to Sean. Sorry about that.
- Adam D. Ruppe (5/10) Apr 30 2008 --
- Sean Kelly (17/24) Apr 30 2008 D arrays do have this feature, thanks to a suggestion by Derek Parnell. ...
- Me Here (95/95) Apr 30 2008 As my ascii art was screwed by the time it got to the server, here is a
- Janice Caron (4/9) Apr 30 2008 Sorry, I meant
- Pedro Ferreira (2/16) May 02 2008 Weren't 'void[]'s banned?
- Janice Caron (10/12) Apr 29 2008 That's why we're having this discussion.
- Bruno Medeiros (10/25) May 01 2008 "mutable versions were called "by mistake" "? I don't think that point
- Frits van Bommel (7/26) May 01 2008 What if you wanted a modified copy of the input, but that input happened...
- Steven Schveighoffer (14/40) May 01 2008 Any modifying versions would take mutable strings, COW version would req...
- Bruno Medeiros (8/36) May 01 2008 Yes, the idea to distinguish them with a different name sounds good
- Frits van Bommel (18/27) May 01 2008 I don't like 'doToUpper', but something like 'makeUpper' could be a good...
- Simen Kjaeraas (3/6) May 01 2008 So anyone who uses alphabets other than pure english will have to write ...
- Pedro Ferreira (17/31) May 02 2008 (snip)
Hi all,

More than one person has complained about the lack of string functions in Phobos which operate on mutable chars. In the thread titled "Is all this Invariant ****....", I suggested creating a new module, std.stringbuffer, to contain two things: (1) a StringBuffer class, and (2) parallel mutable versions of the functions in std.string. Walter OKed the idea, so it looks like that's a go.

To that end, I've looked through the functions in std.string and sorted them into different groups. I think it's important to get the API right, so comments are welcome on all of the below.

The following functions are incorrectly declared in std.string, because they are currently declared to take strings, not const(char)[]. They should be:

    long atoi(in char[] s)
    real atof(in char[] s)
    size_t count(in char[] s, in char[] sub)
    bool inPattern(dchar c, in char[] pattern)
    int inPattern(dchar c, in char[][] patterns)
    size_t countchars(in char[] s, in char[] pattern)
    bool isNumeric(in char[] s, in bool bAllowSep = false)
    size_t column(in char[] str, int tabsize = 8)

The following functions are badly declared in std.string, because they are declared to take and return strings. With the following change, they become type-agnostic:

    size_t isEmail(in char[] s)
    size_t isURL(in char[] s)

The following function is the /only/ function currently in std.string which takes an optional mutable buffer to use instead of allocating on the heap. For consistency, let's put the mutable version into std.stringbuffer, and let std.string have an invariant version, as follows:

    string soundex(string s)

The remaining functions go in std.stringbuffer. The following functions all take an optional mutable buffer as input, into which to write the return value to avoid allocation:
    char[] tolower(in char[] s, char[] buffer=null)
    char[] toupper(in char[] s, char[] buffer=null)
    char[] capitalize(in char[] s, char[] buffer=null)
    char[] capwords(in char[] s, char[] buffer=null)
    char[] repeat(in char[] s, size_t n, char[] buffer=null)
    char[] join(in char[][] words, in char[] sep, char[] buffer=null)
    char[] ljustify(in char[] s, int width, char[] buffer=null)
    char[] rjustify(in char[] s, int width, char[] buffer=null)
    char[] center(in char[] s, int width, char[] buffer=null)
    char[] zfill(in char[] s, int width, char[] buffer=null)
    char[] replace(in char[] s, in char[] from, in char[] to, char[] buffer=null)
    char[] replaceSlice(in char[] s, in char[] slice, in char[] replacement, char[] buffer=null)
    char[] insert(in char[] s, size_t index, in char[] sub, char[] buffer=null)
    char[] expandtabs(in char[] str, int tabsize=8, char[] buffer=null)
    char[] entab(in char[] s, int tabsize=8, char[] buffer=null) // in place?
    char[] maketrans(in char[] from, in char[] to, char[] buffer=null)
    char[] translate(in char[] s, in char[] transtab, in char[] delchars, char[] buffer=null)
    char[] succ(in char[] s, char[] buffer=null)
    char[] soundex(in char[] s, char[] buffer=null)
    char[] wrap(in char[] s, int columns = 80, in char[] firstindent = null, in char[] indent = null, int tabsize = 8, char[] buffer=null)

I am uncertain about the following functions. They could be declared to take a mutable buffer as input, consistent with the above. /Or/ they could operate on data in place. Opinions are welcome.

    char[] removechars(in char[] s, in char[] pattern, char[] buffer=null) // in place?
    char[] squeeze(in char[] s, in char[] pattern = null, char[] buffer=null) // in place?
    char[] tr(in char[] str, in char[] from, in char[] to, in char[] modifiers=null, char[] buffer=null) // in place?
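The following is a minimal sketch of the two conventions being weighed, written in present-day D rather than the 2008 Phobos API. Only the `repeat` signature comes from the list above; both bodies are hypothetical. `repeat` follows the optional-buffer convention: it writes into `buffer` when the buffer is large enough and allocates otherwise. `removechars` shows the in-place alternative, which works because deleting characters can only shrink UTF-8 text. As a simplifying assumption, `pattern` is treated here as a plain set of characters, without the range syntax that std.string patterns support.

```d
import std.algorithm.searching : canFind;
import std.utf : decode;

// Option 1 (hypothetical body): write the result into the caller's buffer
// when it is big enough; otherwise allocate a new array.
char[] repeat(in char[] s, size_t n, char[] buffer = null)
{
    immutable needed = s.length * n;
    char[] result = buffer.length >= needed ? buffer[0 .. needed] : new char[needed];
    foreach (k; 0 .. n)
        result[k * s.length .. (k + 1) * s.length] = s[];
    return result;
}

// Option 2 (hypothetical body): operate in place. Kept code points are
// copied towards the front and the shortened slice is returned.
// `pattern` is a plain set of characters (no ranges), an assumption.
char[] removechars(char[] s, in char[] pattern)
{
    size_t w = 0, r = 0;
    while (r < s.length)
    {
        immutable start = r;
        immutable dchar c = decode(s, r);   // advances r past one code point
        if (!pattern.canFind(c))
            foreach (k; start .. r)
                s[w++] = s[k];              // w <= k, so copying forward is safe
    }
    return s[0 .. w];
}
```

Returning a slice from both functions lets a caller chain them without caring which convention a given function uses.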
The following functions need to be overloaded for both const and mutable input:

    char[][] split(char[] s)
    const(char)[][] split(const(char)[] s)
    char[][] split(char[] s, in char[] delim)
    const(char)[][] split(const(char)[] s, in char[] delim)
    char[][] splitlines(char[] s)
    const(char)[][] splitlines(const(char)[] s)
    char[] stripl(char[] s)
    const(char)[] stripl(const(char)[] s)
    char[] stripr(char[] s)
    const(char)[] stripr(const(char)[] s)
    char[] strip(char[] s)
    const(char)[] strip(const(char)[] s)
    char[] chop(char[] s)
    const(char)[] chop(const(char)[] s)

I'm not sure what to do about the following one. AAs of mutable arrays are notoriously difficult to get bug-free. Should we bother with this one?

    char[][char[]] abbrev(in char[][] values) // May be impractical

Finally, what do we all think about the inconsistent capitalization throughout std.string? (toupper versus toString, capwords versus endsWith, etc.)
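In later D2 versions, the `inout` qualifier lets one function replace each const/mutable pair listed above, because the result takes on the mutability of the argument. The sketch below is hedged: `stripAscii` is a hypothetical name, and unlike std.string's strip it only recognizes ASCII whitespace.

```d
import std.ascii : isWhite;

// One function instead of a const/mutable overload pair: `inout` makes the
// returned slice char[], const(char)[] or string to match the argument.
inout(char)[] stripAscii(inout(char)[] s)
{
    size_t b = 0, e = s.length;
    while (b < e && isWhite(s[b]))
        ++b;
    while (e > b && isWhite(s[e - 1]))
        --e;
    return s[b .. e];
}
```

Because the result is a slice of the input, editing the result of a mutable call edits the original array.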
Apr 29 2008
"Janice Caron" <caron800 googlemail.com> wrote in message news:mailman.508.1209497029.2351.digitalmars-d puremagic.com...
> Hi all, More than one person has complained about the lack of string functions in Phobos which operate on mutable chars. In the thread titled "Is all this Invariant ****....", I suggested creating a new module, std.stringbuffer, to contain two things: (1) a StringBuffer class

Might I ask why a StringBuffer class would be necessary?
Apr 29 2008
2008/4/29 Jarrett Billingsley <kb3ctd2 yahoo.com>:
> Might I ask why a StringBuffer class would be necessary?

Walter has vetoed that one, so it's moot now. :-)
Apr 29 2008
A couple of thoughts:

Janice Caron wrote:
> Hi all, More than one person has complained about the lack of string functions in Phobos which operate on mutable chars. In the thread titled "Is all this Invariant ****....", I suggested creating a new module, std.stringbuffer, to contain two things: (1) a StringBuffer class (2) parallel mutable versions of the functions in std.string.

I'm with Jarrett here: why the hell do we need a StringBuffer class? 'string' is not a class either, so just use char[]. I would recommend aliasing char[] to 'mstring' (short for mutable string). I think such an alias is more readable than 'char[]'.

Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren? I don't think it makes sense to have another package if one opts for the (2) solution.

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
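For illustration, here is Bruno's suggested alias in present-day D syntax. `mstring` and the helper `freeze` are hypothetical names, not Phobos symbols. The point is that `string` is `immutable(char)[]`, so getting a `string` from an `mstring` takes an explicit copy (`idup`). After that copy, the two arrays are independent.

```d
alias mstring = char[];  // Bruno's suggested name; hypothetical, not in Phobos

// Makes an immutable copy; later edits to the mutable original don't affect it.
string freeze(const(char)[] m)
{
    return m.idup;
}
```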
Apr 29 2008
Bruno Medeiros wrote:
> A couple of thoughts:
>> std.stringbuffer, to contain two things: (1) a StringBuffer class (2) parallel mutable versions of the functions in std.string.
> I'm with Jarrett here, why the hell do we need a StringBuffer class? 'string' is not a class either, so just use char[]. I would recommend aliasing char[] to 'mstring' (short for mutable string). I think such an alias is more readable than 'char[]'.
> Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren? I don't think it makes sense to have another package if one opts for the (2) solution.

As one of those that has requested "a standard library of string functions that accept and return mutable strings, i.e. char[]", I see no reason it should be a class; free functions seem to work just fine. A class would just be bloat.

I would be perfectly happy for these to co-exist in the std.string space. Indeed, I would prefer it. If a separate namespace is deemed /essential/, then I see no reason to go with the misleading name std.StringBuffer, and I certainly didn't "ask for" it. As far as I recall, that was Janice's own suggestion.

For preference, if a separate namespace is absolutely necessary, I'd go for:

    std.string.mutable

It's the right namespace, and it does what it says on the tin.

Further, /if/ I had any input to the design, then the suggestion that I pass in preallocated buffers to accommodate the mutated data, if it needs to grow, would be scotched forthwith. They should look and work in exactly the same way as the existing v2 std.string functions, taking the same number and order of parameters. Just char[] (or a compatible alias) instead of string.

If the buffers need to grow, then allocate space from wherever (I assume the heap) std.string allocates from now. If they do not change size, then return the original intact. If they shrink, and if the D array internals permit this, then adjust the .length attribute whilst leaving the actual allocation unchanged. That way it is there for use should further mutations cause it to grow again. This also helps prevent heap fragmentation if the functions are called on heap-allocated data.

Finally, if the retention of unused but allocated space in an array is a feature of the current design, then I would add a debug-time warning indicating when a char[] has had to be grown. These warnings could be used during development to adjust the preallocated size of arrays to be large enough to accommodate all (most? typical?) requirements.

In summary: mutable string functions should do exactly the same as the invariant functions do now, except only reallocate if necessary and (optionally) issue a warning under debug if they have to.

It seems almost as if a template solution could be used, except that I think the additional conditional code would hamper the performance of both instantiations. Unless D's templating is capable of optimising away branches of code that relate to the /other/ type instantiations? I've had no occasion to use templates in D yet, so that might be pie in the sky.

--
Apr 29 2008
2008/4/30 Me Here <p9e883002 sneakemail.com>:
> If a separate namespace is deemed /essential/, then I see no reason to go with the misleading name std.StringBuffer, and I certainly didn't "ask for" it. As far as I recall, that was Janice's own suggestion.

Yeah, I got that from an earlier post when someone said "What you need is a string buffer" in response to some question. The name can be anything we want it to be.

> For preference, if a separate namespace is absolutely necessary, I'd go for: std.string.mutable

Except "std.string.anything" :-) "std.string" is a module, so it can't also be a package. That's a limitation of the D language.

> Finally, if the retention of unused but allocated space in an array is a feature of the current design, then I would add a debug-time warning indicating when a char[] has had to be grown. These could be used during development to adjust the preallocated size of arrays to be large enough to accommodate all (most? typical?) requirements.

I would support the addition of some function like gc.minimise(char[]) which returned all the unused space following the end of the array back to the gc, without any copying of the used part. I wouldn't be able to write that, though; the gc is not my area of expertise.
Apr 30 2008
== Quote from Janice Caron (caron800 googlemail.com)'s article
> I would support the addition of some function like gc.minimise(char[]) which returned all the unused space following the end of the array back to the gc, without any copying of the used part. I wouldn't be able to write that though - the gc is not my area of expertise.

This would only work for large arrays, I'm afraid, given the GC implementation for D--it uses fixed-size blocks until the block size is 4096 bytes or larger. Also, the shrinking would be done in chunks of 4096 bytes, so a fairly substantial size change would have to occur for anything to happen at all. That said, things get a lot easier if moving the block is allowed. Tango even exposes a GC.realloc() routine which will do this for you.

Sean
Apr 30 2008
2008/4/30 Sean Kelly <sean invisibleduck.org>:
> Tango even exposes a GC.realloc() routine which will do this for you.

So does Phobos: std.gc.realloc(). However, realloc() doesn't promise not to copy, and not copying is the objective.

Thanks for all the cool info, but I just think programmers would feel more "comfortable" if, after they've done all their in-place string manipulations, they could call some minimizing function, even if only to give them a warm fuzzy feeling that they're not wasting any more memory than is necessary.

Frankly, it could even be implemented as a do-nothing function. That way, at least the "blame" for excessive memory use passes from the programmer to Phobos, and future gc implementations might do things differently.
Apr 30 2008
== Quote from Janice Caron (caron800 googlemail.com)'s article
> 2008/4/30 Sean Kelly <sean invisibleduck.org>:
>> Tango even exposes a GC.realloc() routine which will do this for you.
> So does Phobos. std.gc.realloc(). However, realloc() doesn't promise not to copy, and not copying is the objective. Thanks for all the cool info, but I just think programmers would feel more "comfortable" if, after they've done all their in-place string manipulations, they could call some minimizing function, even if only to give them a warm fuzzy feeling that they're not wasting any more memory than is necessary.

It's perhaps worth noting here that C++ objects don't typically minimize either. That's why Scott Meyers (?) proposed the idiom:

    std::vector<T>(myVector).swap(myVector);

> Frankly, it could even be implemented as a do-nothing function. That way, at least "blame" for excessive memory use passes from the programmer to Phobos, and future gc implementations might do things differently.

Fair enough.

Sean
Apr 30 2008
Janice Caron wrote:
> The name can be anything we want it to be. ... Except "std.string.anything" :-)

I did laugh. Not quite "any colour you like so long as it's black", but close :)

> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.

Now, this is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.

>> Finally, if the retention of unused but allocated space in an array is a feature of the current design, then I would add a debug-time warning indicating when a char[] has had to be grown. These could be used during development to adjust the preallocated size of arrays to be large enough to accommodate all (most? typical?) requirements.
> I would support the addition of some function like gc.minimise(char[]) which returned all the unused space following the end of the array back to the gc, without any copying of the used part. I wouldn't be able to write that though - the gc is not my area of expertise.

I /think/ you may have misunderstood my intent here. That's unsurprising, because it was badly outlined. And I'm not at all sure that D works this way.

In Perl, for example, an array can be pre-sized but then set to be empty. That is, it can have space preallocated to it, but contain nothing. Likewise, strings have two length attributes internally:

- one denotes the length of the contents, as would be returned to the program by the length() function;
- one indicates the actual length of the RAM allocated to it.

This allows, for example, chomp() to simply adjust a number (the program-visible length) and do no adjustment or reallocation at all. It can also adjust the left-hand end of the contents, effectively foreshortening the string, again without adjusting the allocation.

So visually, a scalar holding a string might at some point in its life look something like this (this ASCII art is going to come out a mess on the server, but...):

    header
    [offset    ]    |-------+
    [actualLen ]    |-------------------------------------------------------->
    [pgmVisible]            |---------------------------------------->
    [pointer   ]----v       |
                    [][][][][the contents the program can see is here][][][][]

Basically, it starts out with offset zero and only as much padding (if any) as is required to bring it to suitable alignment. But if you remove characters at the end (chomp or chop), then the padding grows as the contents shrink, and nothing is allocated. If you remove characters from the front of the string, the offset accommodates that and the allocation doesn't change.

And if further mutations expand the string, then these spaces are reused before a new allocation is made. If, for example, you know you are going to build a long string up piecewise from small appendages, you can initialise it to some length big enough for the expected final length and then truncate it (assign '' to it). It will retain its allocation, even though the program-visible length is zero. Then, as you add stuff to it, it grows into the allocation.

My point was this: /if/ D's arrays have a similar capability to be preallocated large and empty and grow into the space, then when a mutation requires a reallocation of a mutable array because it has outgrown its original allocation, a debug-enabled warning saying by how much might allow the programmer to preallocate the initial mutable array larger and so avoid reallocation at runtime.

There's a whole heap of speculation here about what might be going on inside D that I have no real knowledge of at all.

Note: there is no suggestion here that D should work this way. Only that, if it does allow preallocation of array sizes, then a warning when a mutation causes allocation would allow the programmer to best use that facility.

Cheers, b.

--
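In present-day D, arrays do support growing into preallocated space through the built-in `reserve`, `capacity` and `assumeSafeAppend` from druntime's object module. The following is a hedged sketch of the debug-time reallocation warning described above. `appendNoting` is a hypothetical helper, and its warning is only compiled in when building with `-debug`.

```d
import std.stdio : stderr;

// Appends `piece` to `buf`. In a -debug build, prints a warning when the
// append had to move the data, i.e. the reserved capacity was too small.
void appendNoting(ref char[] buf, in char[] piece)
{
    const before = buf.ptr;
    buf ~= piece;
    debug
    {
        if (before !is null && buf.ptr !is before)
            stderr.writefln("appendNoting: reallocated, length now %s", buf.length);
    }
}
```

Usage follows the Perl pattern: call `buf.reserve(n)` up front, append freely, and empty the buffer with `buf = buf[0 .. 0]; buf.assumeSafeAppend();`. These steps keep the allocation so that later appends reuse it.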
Apr 30 2008
2008/4/30 Me Here <p9e883002 sneakemail.com>:
>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
> Now, this is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.

One is a file, the other is a folder. std.string is a file, so it can't also be a folder.

> I /think/ you may have misunderstood my intent here. Unsurprising, because it was badly outlined. And I'm not at all sure that D works this way. In Perl, for example, an array can be pre-sized but then set to be empty. That is, it can have space preallocated to it, but contain nothing. Likewise, strings have two length attributes internally: - one denotes the length of the contents, as would be returned to the program by the length() function. - one indicates the actual length of the RAM allocated to it.

Well, that's what a StringBuffer would do, but nobody seemed to like the idea. A string contains two pieces of information: (1) ptr, and (2) length. A StringBuffer would carry a third piece of information: (3) capacity. (Actually, in general it would be Buffer!(T), with StringBuffer just being a special case.)

Built-in strings do have a capacity, but it's not carried around in a field. Instead, to find the capacity of an array, you have to call std.gc.capacity(array), and I can't see how that can avoid a performance hit. Increasing the length of a D array doesn't necessarily mean reallocating (although, as noted above, the code has to do some work to find out the capacity), but it /does/ mean re-initialising the newly exposed elements. Again, that has to be a performance hit. With a Buffer!(), you could increase the length (up to capacity) not only without reallocating but also without reinitializing, just by changing the value of an int.

But <shrugs> - if people don't want StringBuffers, who am I to argue?
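Janice's Buffer!(T) idea can be sketched as a small struct that carries the capacity alongside the data. This is a hypothetical illustration, not a Phobos type. Growing within capacity only updates an integer, so elements exposed again after a shrink keep their old values instead of being re-initialised. Growing past capacity reallocates; the doubling policy used for that is an assumption.

```d
// Hypothetical Buffer!T: carries (data, length, capacity) so that growing
// within capacity is just an integer update.
struct Buffer(T)
{
    private T[] store;   // allocated storage; store.length is the capacity
    private size_t len;  // number of elements currently in use

    this(size_t initialCapacity) { store = new T[initialCapacity]; }

    @property size_t length() const { return len; }
    @property size_t capacity() const { return store.length; }

    // Within capacity: only `len` changes, nothing is re-initialised.
    // Beyond capacity: reallocate (doubling, an assumed policy) and copy.
    @property void length(size_t n)
    {
        if (n > store.length)
        {
            auto bigger = new T[n > 2 * store.length ? n : 2 * store.length];
            bigger[0 .. len] = store[0 .. len];
            store = bigger;
        }
        len = n;
    }

    void put(T x)
    {
        length = len + 1;
        store[len - 1] = x;
    }

    // The visible contents, as an ordinary slice.
    inout(T)[] opSlice() inout { return store[0 .. len]; }
}
```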
Apr 30 2008
Janice Caron wrote:
> But <shrugs> - if people don't want StringBuffers, who am I to argue?

What's in a name? Preconceptions of other worlds and other tools. Specifically Java. Additionally, the casing suggests a class?

For my part, I simply want string functions that operate on char[]s, because I perceive that, for the type of mutations I am currently doing, invariant strings would incur too high a cost.

Suppose your StringBuffer concept would accept and manipulate char[]s and not require the instantiation, initialisation and syntax of an object. By that I mean: having used a string function upon my char[], I can still apply slice operations to it using the standard syntax, and then apply another string function, and then another slice. Or I can even apply a string function to a slice of a larger string and mutate that larger string in place through the slice:

    char[] a = ...2000 chars from somewhere.
    char[] field1 = a[ 312 .. 357 ];
    field1.toUpper();
    char[] checksum = a[ $-16 .. $ ];
    checksum = md5hex( a );
    ...

If so, then I will be very happy. Beyond that, I have no requirements :)

All the stuff about warnings and internal and external lengths was just speculation about what might be going on inside, on the basis of what I know, have seen (Perl) and have personally implemented (not Perl).

Cheers, b.

Ps. Is there a paper/article/reference on the reasoning behind invariant strings somewhere?

--
Apr 30 2008
2008/4/30 Me Here <p9e883002 sneakemail.com>:
> char[] a = ...2000 chars from somewhere.
> char[] field1 = a[ 312 .. 357 ];
> field1.toUpper();

I've kind of lost track of the number of times I've said this in recent days, but... You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c).

If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8.)
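Uppercasing *ASCII* letters in place is safe even on UTF-8 data. Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so those bytes are never touched. The minimal sketch below uses the name `inPlaceToUpperASCII` that Steven Schveighoffer suggests elsewhere in the thread; the body is hypothetical. It also shows the slice behaviour Me Here is asking for: changing a slice changes the larger array it points into.

```d
// Uppercases ASCII letters in place and leaves every other byte alone.
// Multi-byte UTF-8 sequences consist only of bytes >= 0x80, so they are
// never modified and the text stays valid UTF-8.
void inPlaceToUpperASCII(char[] s)
{
    foreach (ref c; s)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
}
```

Non-ASCII letters such as U+00F6 are left lowercase. That is the price of never changing the byte length.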
Apr 30 2008
Janice Caron wrote:
> If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8).

It's possible that, in some obscure case, you can't uppercase UTF-16 in place either. A code point in the private use area (U+E000 to U+F8FF), which can be represented with one UTF-16 code unit, may uppercase to something in the supplementary private use areas (U+F0000 upwards), whose code points require two UTF-16 code units each. Of course, the toUpper function in question must be aware of this configuration of the private use areas.

This is an extremely contrived case and I doubt it'll ever come up in practice, anywhere, but in theory it might. <g>

--
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
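Matti's point comes down to UTF-16 code-unit counts, which std.utf.codeLength reports. The sketch below is in present-day D. The two range predicates are hypothetical helpers whose bounds are the Unicode private use areas: U+E000..U+F8FF in the BMP, plus planes 15 and 16.

```d
import std.utf : codeLength;

// Private use area inside the BMP: one UTF-16 code unit per code point.
bool isBmpPrivateUse(dchar c)
{
    return c >= 0xE000 && c <= 0xF8FF;
}

// Supplementary private use areas (planes 15 and 16): two UTF-16 code units.
bool isSupplementaryPrivateUse(dchar c)
{
    return (c >= 0xF0000 && c <= 0xFFFFD) || (c >= 0x100000 && c <= 0x10FFFD);
}

// Number of UTF-16 code units needed to encode `c`.
size_t utf16Units(dchar c)
{
    return codeLength!wchar(c);
}
```

So a private mapping from U+E000 to U+F0000 would grow the text from one wchar to two.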
Apr 30 2008
2008/4/30 Matti Niemenmaa <see_signature for.real.address>:
> It's possible that, in some obscure case, you can't uppercase UTF-16 in place either.

Perhaps surprisingly, that's not so. This is because the alphabets of *ALL* living languages exist within Unicode's "Basic Multilingual Plane" (...which is to say, they can be encoded in a single wchar). The characters outside the BMP (...those which need a dchar, not a wchar...) are the letters of dead languages, or other special symbols. The probability that a letter from a living language will uppercase to a letter of a dead language is as near to zero as makes no odds.
Apr 30 2008
Oh, sorry, I didn't read your whole post before replying. <embarrassed>.

OK, so private use characters might be a contrived exception. BUT, nobody expects toUpper() to acknowledge private use characters. That would require a run-time extensibility mechanism which is way beyond what toUpper() does now, and probably beyond anything it will do any time soon.

Maybe some future Unicode library with a registerPrivateUseCharacters() function might cover that functionality, but there are no plans for that on the table right now. (And even then - as you say - it's a /very/ contrived case.)
Apr 30 2008
Janice Caron wrote:
> OK, so private use characters might be a contrived exception. BUT, nobody expects toUpper() to acknowledge private use characters. That would require a run-time extensibility mechanism which is way beyond what toUpper() does now, and likely beyond anything it's ever likely to do any time soon.

You're right, of course. I was referring more to some hypothetical toUpper() function rather than one which I would expect to find in any standard library---the generic case of "uppercasing a character" as opposed to std.string[buffer].toUpper.

--
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Apr 30 2008
Matti Niemenmaa Wrote:
> A code point in the private use area (U+E000 to U+F8FF), which can be represented with one UTF-16 code unit, may uppercase to something in the

Does this have any practical use?
Apr 30 2008
2008/4/30 terranium <spam here.lot>:
> Matti Niemenmaa Wrote:
>> A code point in the private use area (U+E000 to U+F8FF), which can be represented with one UTF-16 code unit, may uppercase to something in the
> does this have any practical use?

Private use characters can be used for invented alphabets, e.g. Klingon, or my-made-up-funky-alphabet. You can define them to be whatever you want. However, the mechanism for /interpreting/ such characters is outside the scope of Unicode. All co-operating applications have to have the same knowledge of what those characters "mean".
Apr 30 2008
Janice Caron Wrote:
> You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c).

Really?
Apr 30 2008
2008/4/30 terranium <spam here.lot>:
> Janice Caron Wrote:
>> You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c).
> really?

Yes, really.

    toUpper( '\u2C65' ) == '\u023A'
    toLower( '\u023A' ) == '\u2C65'

'\u023A' requires two bytes in UTF-8.
'\u2C65' requires three bytes in UTF-8.

Not a problem in UTF-16, of course.
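Janice's example can be checked against present-day Phobos, where std.uni.toUpper and toLower on a single dchar apply the simple case mapping. `upperKeepsUtf8Width` is a hypothetical helper that tests whether a character could be uppercased in place in a char[].

```d
import std.uni : toLower, toUpper;
import std.utf : codeLength;

// True if the simple uppercase form of `c` needs the same number of UTF-8
// bytes as `c`, i.e. if `c` could be uppercased in place in a char[].
bool upperKeepsUtf8Width(dchar c)
{
    return codeLength!char(toUpper(c)) == codeLength!char(c);
}
```

For U+2C65 this returns false: its uppercase form U+023A is one byte shorter in UTF-8. Both characters are still a single code unit in UTF-16.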
Apr 30 2008
== Quote from Janice Caron (caron800 googlemail.com)'s article
> 2008/4/30 Me Here <p9e883002 sneakemail.com>:
>> char[] a = ...2000 chars from somewhere.
>> char[] field1 = a[ 312 .. 357 ];
>> field1.toUpper();
> I've kind of lost track of the number of times I've said this in recent days, but... You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c). If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8).

In all fairness, you can uppercase UTF-8 in place so long as none of the characters within the string require a multi-byte capital. Thus one questionable strategy would be to uppercase in place until the first multibyte conversion is required. The obvious downside is that the original buffer may end up partially capitalized, with the fully capitalized result returned in a new buffer. I'm sure people processing ASCII text would love this, but I can see it causing problems elsewhere.

Sean
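Here is Sean's strategy sketched in present-day D; `toUpperInPlaceOrCopy` is a hypothetical name. It uppercases one code point at a time and overwrites the original bytes for as long as each uppercase form has the same UTF-8 length. On the first mismatch, it returns a freshly allocated, fully uppercased copy. As Sean warns, the original buffer is then left partially uppercased.

```d
import std.uni : toUpper;
import std.utf : codeLength, decode, encode;

// Hypothetical helper: uppercase in place while byte lengths match; on the
// first length change, return a newly allocated fully uppercased copy.
char[] toUpperInPlaceOrCopy(char[] s)
{
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        immutable dchar c = decode(s, i);   // reads one code point, advances i
        immutable dchar u = toUpper(c);     // simple case mapping
        if (codeLength!char(u) != i - start)
        {
            // Prefix s[0 .. start] is already uppercased; finish in a copy.
            char[] result = s[0 .. start].dup;
            foreach (dchar d; s[start .. $])
                encode(result, toUpper(d)); // appends UTF-8 for d's uppercase
            return result;
        }
        char[4] tmp;
        immutable n = encode(tmp, u);       // same length as the original bytes
        s[start .. i] = tmp[0 .. n];
    }
    return s;
}
```

Callers can tell which case occurred by comparing the returned `.ptr` with the input's.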
Apr 30 2008
Janice Caron wrote:
> 2008/4/30 Me Here <p9e883002 sneakemail.com>:
>> char[] a = ...2000 chars from somewhere.
>> char[] field1 = a[ 312 .. 357 ];
>> field1.toUpper();
> I've kind of lost track of the number of times I've said this in recent days, but... You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c). If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8).

Ignoring for the moment Matti's pronouncement that this is an obscure and unlikely event, it really depends upon how the library is coded. For example, the case change could be effected in place for the majority of cases where it can be. On the occasion that it cannot, and a runtime exception is raised, catch the error and use replaceSlice to handle it:

    import std.stdio;
    import std.string;

    int main( char[][] args )
    {
        char[] s = "the quick brown fox".dup;
        try {
            s[ 8 .. 9 ] = "\u1234";
        }
        catch {
            s = s.replaceSlice( s[ 8 .. 9 ], "\u1234" );
        }
        writefln( s );
        return 0;
    }

Though it would be (much) nicer if the built-in lvalue slice handled this for us. I was just disappointed, for the second time, to (re)discover this limitation of D's slicing. I had forgotten, because other languages I have used don't have it. This is one of those decisions that I doubt I will ever agree with. But I'm just another jerk on the internet with an opinion, and we all know what that is analogous to.

If the language doesn't handle it, then the library should. If it doesn't, then I will have to. And you, and Bill and Fred and Sue...

Cheers, b.

--
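Present-day Phobos offers a library route that does not rely on catching an error: std.array.replaceInPlace resizes the array when the replacement length differs. The sketch below is hedged. `assignSlice` is a hypothetical helper that overwrites in place when the lengths match and otherwise defers to replaceInPlace. For char[], replaceInPlace may build a new array rather than truly working in place.

```d
import std.array : replaceInPlace;

// Hypothetical helper: replace s[from .. to] with `repl`.
// Same length: plain slice assignment, no allocation.
// Different length: let the library resize the array.
void assignSlice(ref char[] s, size_t from, size_t to, const(char)[] repl)
{
    if (to - from == repl.length)
        s[from .. to] = repl[];
    else
        replaceInPlace(s, from, to, repl);
}
```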
Apr 30 2008
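For comparison, here is one way to get the same effect in current D without catching the length-mismatch failure (current D reports that as an Error, which is not meant to be caught). This is a sketch only: assignSlice is a hypothetical name, and it only handles char[].

```d
/// Hypothetical helper: replace s[lo .. hi] with rep.
/// Same length: copy in place, no allocation.
/// Different length: build a new array and rebind s to it.
void assignSlice(ref char[] s, size_t lo, size_t hi, const(char)[] rep)
{
    assert(lo <= hi && hi <= s.length, "slice out of range");
    if (rep.length == hi - lo)
    {
        s[lo .. hi] = rep[];
        return;
    }
    auto r = new char[s.length - (hi - lo) + rep.length];
    r[0 .. lo] = s[0 .. lo];                 // text before the slice
    r[lo .. lo + rep.length] = rep[];        // the replacement
    r[lo + rep.length .. $] = s[hi .. $];    // text after the slice
    s = r;
}

unittest
{
    char[] s = "the quick brown fox".dup;
    auto before = s.ptr;
    assignSlice(s, 4, 5, "Q");          // 1 byte for 1 byte: stays in place
    assert(s == "the Quick brown fox" && s.ptr is before);
    assignSlice(s, 8, 9, "\u1234");     // 1 byte for 3 bytes: reallocates
    assert(s == "the Quic\u1234 brown fox" && s.length == 21);
}
```

The reallocating branch is what an automatically resizing slice assignment would have to do as well: the array may move, so any other slice still referring to the old memory keeps the old text.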
Janice Caron wrote:
> 2008/4/30 Me Here <p9e883002 sneakemail.com>:
>> char[] a = ...2000 chars from somewhere.
>> char[] field1 = a[ 312 .. 357 ];
>> field1.toUpper();
> I've kind of lost track of the number of times I've said this in recent days, but... You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c). If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8.)

Actually, you can't uppercase UTF-16 and UTF-32 in place either if you want to be entirely correct. For example: \u00df ("ß") --> \u0053 \u0053 ("SS"). This increases the byte count for both UTF-16 and UTF-32. (This does work for UTF-8, though, since \u00df happens to require 2 UTF-8 code units, and each \u0053 only one.)
(See <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt> for what should be a complete list of characters with similar annoying casing properties.)
Apr 30 2008
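The code-unit arithmetic above can be checked directly with D's typed string literals (a plain literal is UTF-8, the w suffix gives UTF-16, the d suffix UTF-32). This sketch only counts code units; it does not call any casing function.

```d
unittest
{
    // U+00DF LATIN SMALL LETTER SHARP S versus its full uppercase "SS".
    assert("\u00DF".length  == 2 && "SS".length  == 2); // UTF-8: same size, fits in place
    assert("\u00DF"w.length == 1 && "SS"w.length == 2); // UTF-16: grows by one code unit
    assert("\u00DF"d.length == 1 && "SS"d.length == 2); // UTF-32: grows by one code unit
}
```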
2008/5/1 Frits van Bommel <fvbommel remwovexcapss.nl>:
> Actually, you can't uppercase UTF-16 and UTF-32 in-place either if you want to be entirely correct. For example: \u00df ("ß") --> \u0053 \u0053 ("SS").

I know about that, and for the future I have plans for a proper Unicode lib with normalisation, full casing, etc. However, none of that is the job of std.string.toUpper() or std.string.toLower(). These functions only need to do /simple/ casing, not /full/ casing, and in /simple/ casing, one dchar always maps to one dchar. In particular, '\u00DF' maps to '\u00DF'. In full casing, toLower('\u1E9E') (LATIN CAPITAL LETTER SHARP S) is '\u00DF' (LATIN SMALL LETTER SHARP S), but the converse is not true. What fun! :-)
But full casing is not the concern of std.string (nor of std.stringbuffer, or whatever we end up calling it), so we don't need to worry about that here.
Apr 30 2008
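The simple-casing behaviour described above can be seen with current Phobos's std.uni (assuming a recent D release; the 2008 std.uni was different), whose dchar overloads of toUpper and toLower apply the one-to-one mappings:

```d
import std.uni : toLower, toUpper;

unittest
{
    // U+00DF has no single-code-point uppercase, so simple casing leaves it alone
    // (full casing would give "SS", which is not a single dchar).
    assert(toUpper('\u00DF') == '\u00DF');
    // U+1E9E LATIN CAPITAL LETTER SHARP S lowercases to U+00DF...
    assert(toLower('\u1E9E') == '\u00DF');
    // ...but U+00DF does not uppercase back to U+1E9E.
    assert(toUpper('\u00DF') != '\u1E9E');
}
```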
Janice Caron wrote:
> 2008/4/30 Me Here <p9e883002 sneakemail.com>:
>> char[] a = ...2000 chars from somewhere.
>> char[] field1 = a[ 312 .. 357 ];
>> field1.toUpper();
> I've kind of lost track of the number of times I've said this in recent days, but... You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c). If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8.)

I think uppercasing non-ASCII (non-English) characters is more of a specialised business anyway (some languages have no notion of upper case, and yet others depend on context), which often should be performed by a presentation layer. People need a toupper/tolower all the time, and 90% of the time they use it on strings that are in the ASCII range, often because they deal with protocols, file formats and other such things. In that case Phobos's string.toupper shouldn't really be doing work outside of ASCII, in my opinion anyway. This also means that a string can be uppercased in place.
May 01 2008
On 01/05/2008, Spacen Jasset <spacenjasset yahoo.co.uk> wrote:
> I think uppercasing non-ASCII (non-English) characters is more of a specialised business anyway (some languages have no notion of upper case, and yet others depend on context), which often should be performed by a presentation layer.

The Unicode Standard defines casing unambiguously for all characters. Yes, toupper() of a Chinese character will leave it unchanged, but it's still defined, and that is /not/ locale dependent.
However, casing in place is possible for UTF-8 if you're prepared to throw an exception for those (extremely rare) cases when the sequence length changes. So that means you'd need two versions, the in-place version
toUpperInPlace(char[] s) // might throw
and the general version
char[] toUpper(const(char)[] s, char[] buffer=null)
That could be done.
May 01 2008
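A rough sketch of the two functions proposed above, written against current Phobos (std.utf's decode, encode and codeLength, plus std.uni's simple toUpper). The exception class is made up, and note that current Phobos already has its own std.uni.toUpperInPlace, which reallocates rather than throwing; these functions are a separate illustration, not that API. This version checks the whole string before writing anything, so when it throws the input is untouched (avoiding the half-capitalized buffer problem). std.uni is imported under a renamed import so the sketch's own toUpper does not clash with Phobos's.

```d
import std.algorithm.comparison : max;
import std.utf : codeLength, decode, encode;
import uni = std.uni;

/// Hypothetical exception type for the in-place version.
class CaseLengthException : Exception
{
    this(string msg) { super(msg); }
}

/// Uppercases s in place; throws (leaving s untouched) if any character's
/// uppercase form would need a different number of UTF-8 bytes.
void toUpperInPlace(char[] s)
{
    // Pass 1: check every character before changing anything.
    for (size_t i = 0; i < s.length; )
    {
        immutable start = i;
        immutable dchar c = decode(s, i);       // advances i past one sequence
        if (codeLength!char(uni.toUpper(c)) != i - start)
            throw new CaseLengthException("UTF-8 length changes; cannot case in place");
    }
    // Pass 2: every replacement is now known to fit exactly.
    for (size_t i = 0; i < s.length; )
    {
        immutable start = i;
        immutable dchar u = uni.toUpper(decode(s, i));
        char[4] tmp;
        immutable n = encode(tmp, u);
        s[start .. start + n] = tmp[0 .. n];
    }
}

/// General version: writes into buffer when it is big enough, else grows it.
char[] toUpper(const(char)[] s, char[] buffer = null)
{
    size_t n = 0;
    foreach (dchar c; s)                        // decodes UTF-8 as it goes
    {
        char[4] tmp;
        immutable len = encode(tmp, uni.toUpper(c));
        if (n + len > buffer.length)
            buffer.length = max(2 * buffer.length, n + len); // may reallocate
        buffer[n .. n + len] = tmp[0 .. len];
        n += len;
    }
    return buffer[0 .. n];
}

unittest
{
    char[] a = "hello, w\u00F6rld".dup;   // ö and Ö are both 2 bytes
    toUpperInPlace(a);
    assert(a == "HELLO, W\u00D6RLD");

    char[] b = "x\u0250".dup;             // turned a: 2 bytes -> 3 bytes
    bool threw = false;
    try { toUpperInPlace(b); }
    catch (CaseLengthException e) { threw = true; }
    assert(threw && b == "x\u0250");      // unchanged, thanks to the check pass

    char[16] storage;
    auto r = toUpper("caf\u00E9", storage[]);
    assert(r == "CAF\u00C9" && r.ptr is storage.ptr); // no allocation needed
}
```

The two-pass design costs an extra scan, but it means a caller who catches the exception can fall back to the general version on the original, unmodified text.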
"Janice Caron" wrote
> 2008/4/30 Me Here:
>> char[] a = ...2000 chars from somewhere.
>> char[] field1 = a[ 312 .. 357 ];
>> field1.toUpper();
> I've kind of lost track of the number of times I've said this in recent days, but... You cannot uppercase in place, because for any given dchar, c, the number of UTF-8 bytes required to express c may be different from the number of UTF-8 bytes required to express toupper(c). If any of you have plans to uppercase or lowercase UTF-8 in place, forget that now. It just ain't possible. (You can uppercase ASCII, UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition, is UTF-8.)

What about inPlaceToUpperASCII(char[] str)? In other words, yeah, toUpper can take a UTF-8 string and return a UTF-8 string, but I can see a use for a function that expects to receive ASCII and uppercases in place. The function would be a lot simpler in any case :)
-Steve
May 01 2008
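The ASCII-only variant suggested above is short enough to sketch in full. This assumes std.ascii from current Phobos; inPlaceToUpperASCII is the name from the post, not an existing library function. Because std.ascii.toUpper only changes 'a' to 'z', the bytes of multi-byte UTF-8 sequences (all 0x80 or above) pass through unchanged, so it is safe on non-ASCII input too: those characters simply keep their original case.

```d
import std.ascii : toUpper;

/// Uppercases the ASCII letters of str in place; never allocates.
/// Non-ASCII code units (>= 0x80) are left as they are, so valid UTF-8 stays valid.
void inPlaceToUpperASCII(char[] str)
{
    foreach (ref c; str)
        c = cast(char) toUpper(c);
}

unittest
{
    char[] a = "id=42;name=caf\u00E9;".dup;
    inPlaceToUpperASCII(a[11 .. 16]);       // uppercase one field, like a[ 312 .. 357 ]
    assert(a == "id=42;name=CAF\u00E9;");   // é is not ASCII, so it stays lowercase
}
```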
Janice Caron wrote:
> 2008/4/30 Me Here <p9e883002 sneakemail.com>:
>>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
>> Now. This is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.
> One is file, the other is a folder. std.string is a file, so it can't also be a folder.
>> I /think/ you may have misunderstood my intent here. Unsurprising cos it was badly outlined. And I'm not at all sure that D works this way. In, for example, Perl, an array can be pre-sized but then set to be empty. That is, it can have space preallocated to it, but contain nothing. Likewise strings have two length attributes internally:
>> - one denotes the length of the contents, as would be returned to the program by the length() function.
>> - one indicates the actual length of the RAM allocated to it.
> Well, that's what a StringBuffer would do, but nobody seemed to like the idea. A string contains two pieces of information: (1) ptr, and (2) length. A StringBuffer would carry a third piece of information: (3) capacity. (Actually, in general it would be Buffer!(T), with StringBuffer just being a special case.)
> Built-in strings do have a capacity, but it's not carried around in a field. Instead, to find the capacity of an array, you have to call std.gc.capacity(array) - and I can't see how there can not be a performance hit there. Increasing the length of a D array doesn't necessarily mean reallocating (although as noted above, the code has to do some work to find out the capacity), but it /does/ mean re-initialising the newly exposed elements. Again, that has to be a performance hit. With a Buffer!(), you could increase the length (up to capacity) not only without reallocating but also without reinitializing, just by changing the value of an int.
> But <shrugs> - if people don't want StringBuffers, who am I to argue?

I like StringBuffers :-). Did Walter veto the idea completely, or did he say "not a class"? I'd use a struct - there's no extra bloat, the interface can be encapsulated, and people can use a pointer if they're passing it between functions (since it will most often be used within the scope of a single function anyway). Or just pass it on the stack, if it's guaranteed to only be 3 DWORDs. My suggestion (grain of salt) is to represent them similarly to the way mtext does, by using two bits somewhere to hold the character type (char, wchar, dchar) and changing character types as needed.
Apr 30 2008
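A minimal sketch of the Buffer!(T) idea as a struct rather than a class, assuming current D. The names Buffer, put, clear and data are illustrative, not a proposed API. It keeps ptr and capacity in a slice of its own storage plus a separate "used" count, so capacity is a plain field read rather than a GC query, and clear() keeps the allocation for reuse. (Newly allocated storage is still default-initialized by new; the saving is that shrinking and refilling within capacity never reallocates.)

```d
import std.algorithm.comparison : max;

/// Growable buffer: ptr and capacity live in `store`, the length in `used`.
struct Buffer(T)
{
    private T[] store;    // allocated storage; store.length is the capacity
    private size_t used;  // number of elements currently in use

    this(size_t initialCapacity) { store = new T[initialCapacity]; }

    size_t length() const { return used; }
    size_t capacity() const { return store.length; }

    /// Appends items, doubling the storage when it runs out.
    void put(const(T)[] items)
    {
        if (used + items.length > store.length)
        {
            auto bigger = new T[max(2 * store.length, used + items.length)];
            bigger[0 .. used] = store[0 .. used];
            store = bigger;
        }
        store[used .. used + items.length] = items[];
        used += items.length;
    }

    /// Empties the buffer but keeps its storage for reuse.
    void clear() { used = 0; }

    /// The elements in use, as a slice of the internal storage.
    T[] data() { return store[0 .. used]; }
}

alias StringBuffer = Buffer!char;

unittest
{
    auto sb = StringBuffer(16);
    sb.put("hello");
    sb.put(", world");
    assert(sb.data == "hello, world" && sb.capacity == 16);
    auto p = sb.data.ptr;
    sb.clear();
    sb.put("bye");
    assert(sb.data == "bye" && sb.data.ptr is p); // same storage reused
}
```

Being a struct, it costs exactly its three fields (pointer, capacity, used length) when passed by value, which matches the "3 DWORDs" estimate on a 32-bit target.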
2008/4/30 Robert Fraser <fraserofthenight gmail.com>:
> I like StringBuffers :-). Did Walter veto the idea completely, or did he say "not a class"?

Yeah, he said not a class. And that was probably my fault, because in my first post on this thread I used the word "class".
Janice
Apr 30 2008
Janice Caron wrote:
> 2008/4/30 Me Here <p9e883002 sneakemail.com>:
>>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
>> Now. This is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.
> One is file, the other is a folder. std.string is a file, so it can't also be a folder.

Herein lies the genius in Tango's naming conventions. You *can* have both a package std.string and a module named std.String. If you consistently use different case for package and module names, then you can have your cake and eat it too.
--bb
Apr 30 2008
"Bill Baxter" wrote
> Janice Caron wrote:
>> 2008/4/30 Me Here :
>>>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
>>> Now. This is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.
>> One is file, the other is a folder. std.string is a file, so it can't also be a folder.
> Herein lies the genius in Tango's naming conventions. You *can* have both a package std.string and a module named std.String. If you consistently use different case for package and module names, then you can have your cake and eat it too.

Not on Windoze :)
-Steve
Apr 30 2008
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
> "Bill Baxter" wrote
>> Janice Caron wrote:
>>> 2008/4/30 Me Here :
>>>>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
>>>> Now. This is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.
>>> One is file, the other is a folder. std.string is a file, so it can't also be a folder.
>> Herein lies the genius in Tango's naming conventions. You *can* have both a package std.string and a module named std.String. If you consistently use different case for package and module names, then you can have your cake and eat it too.
> Not on Windoze :)

It should still work, I believe. The source file will have a .d extension and the folder won't, so there shouldn't be a filesystem collision. Or are you saying that the compiler does some checking behind the scenes anyway? I'll admit I've never actually tried this.
Sean
Apr 30 2008
"Sean Kelly" wrote
> == Quote from Steven Schveighoffer
>> "Bill Baxter" wrote
>>> Janice Caron wrote:
>>>> 2008/4/30 Me Here :
>>>>>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
>>>>> Now. This is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.
>>>> One is file, the other is a folder. std.string is a file, so it can't also be a folder.
>>> Herein lies the genius in Tango's naming conventions. You *can* have both a package std.string and a module named std.String. If you consistently use different case for package and module names, then you can have your cake and eat it too.
>> Not on Windoze :)
> It should still work, I believe. The source file will have a .d extension and the folder won't, so there shouldn't be a filesystem collision. Or are you saying that the compiler does some checking behind the scenes anyway? I'll admit I've never actually tried this.

Excellent point, I completely forgot that even though you import std.String, you are really looking at the file std/String.d. In that case, I think you are right, it would work on Windoze.
-Steve
Apr 30 2008
Steven Schveighoffer wrote:
> "Sean Kelly" wrote
>> == Quote from Steven Schveighoffer
>>> "Bill Baxter" wrote
>>>> Janice Caron wrote:
>>>>> 2008/4/30 Me Here :
>>>>>>> "std.string" is a module, so it can't also be a package. That's a limitation of the D language.
>>>>>> Now. This is where you show me up to be nothing but a pretender in this forum. I have no idea what the distinction is between those two in D.
>>>>> One is file, the other is a folder. std.string is a file, so it can't also be a folder.
>>>> Herein lies the genius in Tango's naming conventions. You *can* have both a package std.string and a module named std.String. If you consistently use different case for package and module names, then you can have your cake and eat it too.
>>> Not on Windoze :)
>> It should still work, I believe. The source file will have a .d extension and the folder won't, so there shouldn't be a filesystem collision. Or are you saying that the compiler does some checking behind the scenes anyway? I'll admit I've never actually tried this.
> Excellent point, I completely forgot that even though you import std.String, you are really looking at the file std/String.d. In that case, I think you are right, it would work on Windoze.
> -Steve

Yes, it works fine on Windows too. I pretty much work only on Windows, testing things occasionally on VMWare Linux.
--bb
May 01 2008
Steven Schveighoffer wrote:
> "Bill Baxter" wrote
> Not on Windoze :)
> -Steve

It would be completely unacceptable for something like this not to work on Windows.
-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
May 01 2008
"Bruno Medeiros" wrote
> Steven Schveighoffer wrote:
>> Not on Windoze :)
>> -Steve
> Something like this would be completely unacceptable not to work on Windows.

I was wrong, look at my response to Sean. Sorry about that.
-Steve
May 01 2008
On Thu, May 01, 2008 at 02:19:51AM +0900, Bill Baxter wrote:
> Herein lies the genius in Tango's naming conventions. You *can* have both a package std.string and a module named std.String. If you consistently use different case for package and module names, then you can have your cake and eat it too.
> --bb

Does that work on Windows?
-- 
Adam D. Ruppe
http://arsdnet.net
Apr 30 2008
== Quote from Me Here (p9e883002 sneakemail.com)'s article
> My point was that /if/ D's arrays have a similar capability, to be preallocated large and empty and grow into the space, then when a mutation requires a reallocation of a mutable array because it has outgrown its original allocation, a debug-enabled warning saying by how much might allow the programmer to preallocate the initial mutable array larger and so avoid reallocation at runtime.

D arrays do have this feature, thanks to a suggestion by Derek Parnell. That is, reducing the array's length property does not cause a reallocation, even when length is set to zero. Thus it is possible to do:

void fn( inout char[] buf )
{
    buf.length = 1024; // preallocate 1024 bytes of storage
    buf.length = 0;
    buf ~= "hello"; // will copy into preallocated buffer
}

Thus the proper way to discard a buffer is to do:

buf = null;

I think for specific buffers it's probably enough to print their length when you're done filling them and then explicitly preallocate the next run based on this info. Tango also offers a means of performing program-level preallocation via GC.reserve() for people so inclined.
Sean
Apr 30 2008
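A caveat for anyone trying the example above with a current D2 compiler: since the runtime gained protection against array stomping, appending to an array whose length was shrunk normally reallocates, unless assumeSafeAppend is called first, and reserve() is the usual way to preallocate now. A sketch assuming a current druntime (ref replaces D1's inout here, and fill is a hypothetical name):

```d
/// Reuses buf's existing allocation for new contents.
void fill(ref char[] buf)
{
    buf.length = 0;
    buf.assumeSafeAppend();  // promise the runtime nothing else uses the old contents
    buf ~= "hello";          // written into the existing block, no reallocation
}

unittest
{
    char[] buf;
    assert(buf.reserve(1024) >= 1024);   // preallocate room for at least 1024 chars
    buf ~= "earlier contents";
    auto p = buf.ptr;
    fill(buf);
    assert(buf == "hello" && buf.ptr is p);
}
```

Without the assumeSafeAppend call, the append in fill would copy "hello" into a fresh block, because the runtime cannot know whether another slice still refers to the old contents.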
As my ASCII art was screwed up by the time it got to the server, here is a better illustration of what goes on. This is long and wordy and maybe of no interest, but it does illustrate the point I was trying to make.

[0] Perl> use Devel::Peek;;
allocated.
SV = NULL(0x0) at 0x194a9cc
  REFCNT = 1
  FLAGS = ()
[0] Perl> Dump $s;;
SV = PV(0x2252e8) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,pPOK)
length in case we pass it to C the first 5 characters
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
start of the buffer
  PV = 0x191e1a9 ( "abcde" . ) "fghijklmnopqrstuvwxyz"\0
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  IV = 5 (OFFSET)
  PV = 0x191e1a9 ( "abcde" . ) "fghijklmnop"\0
[0] Perl> $s = 'XX' . $s;; Prepend some new stuff back
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  IV = 5 (OFFSET)
  PV = 0x191e1a9 ( "abcde" . ) "XXfghijklmnop"\0
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  IV = 5 (OFFSET)
  PV = 0x191e1a9 ( "abcde" . ) "XXfghijklmnopXX"\0
  LEN = 22
offset space
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  PV = 0x191e1a9 ( "abcde" . ) "XXfghijklmnopXX??????"\0
  CUR = 21
  LEN = 22
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,pPOK)
  CUR = 23
  LEN = 27
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,pPOK)
  CUR = 26
  LEN = 27
[0] Perl> Dump $s;;
SV = PVIV(0x2256ec) at 0x194a9cc
  REFCNT = 1
  FLAGS = (POK,pPOK)
memory.
  CUR = 27
  LEN = 28
Apr 30 2008
2008/4/30 Janice Caron <caron800 googlemail.com>:
> I would support the addition of some function like gc.minimise(char[]) which returned all the unused space following the end of the array back to the gc, without any copying of the used part. I wouldn't be able to write that though - the gc is not my area of expertise.

Sorry, I meant
std.gc.minimise(void[] array)
This function doesn't exist right now.
Apr 30 2008
Janice Caron wrote:
> 2008/4/30 Janice Caron <caron800 googlemail.com>:
>> I would support the addition of some function like gc.minimise(char[]) which returned all the unused space following the end of the array back to the gc, without any copying of the used part. I wouldn't be able to write that though - the gc is not my area of expertise.
> Sorry, I meant
> std.gc.minimise(void[] array)
> This function doesn't exist right now.

Weren't 'void[]'s banned?
May 02 2008
2008/4/30 Bruno Medeiros <brunodomedeiros+spam com.gmail>:
> Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren?

That's why we're having this discussion. The idea is that std.string can be optimised for invariant strings, while std.stringbuffer could be optimised for mutable strings. There are pros and cons for separate modules. I don't think Walter wants std.string "polluted" by all these functions he doesn't much care for. Also, it would be bad if mutable versions were called "by mistake", with consequent unexpected behavior.
But keep discussing. The people I want to hear from most are the people calling for mutable string functions.
Apr 29 2008
Janice Caron wrote:
> 2008/4/30 Bruno Medeiros <brunodomedeiros+spam com.gmail>:
>> Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren?
> That's why we're having this discussion. The idea is that std.string can be optimised for invariant strings, while std.stringbuffer could be optimised for mutable strings. There are pros and cons for separate modules. I don't think Walter wants std.string "polluted" by all these functions he doesn't much care for. Also, it would be bad if mutable versions were called "by mistake" with consequent unexpected behavior.

"mutable versions were called "by mistake""? I don't think that point applies to D. After all, the purpose of the immutability system is for the compiler to check that this won't happen, so unless there is some compiler bug, that shouldn't happen in D.

> But keep discussing. The people I want to hear from most are the people calling for mutable string functions.

You may find that a large segment of those people are using Tango, and so they might not participate much in this Phobos design discussion.
-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
May 01 2008
Bruno Medeiros wrote:
> Janice Caron wrote:
>> 2008/4/30 Bruno Medeiros <brunodomedeiros+spam com.gmail>:
>>> Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren?
>> That's why we're having this discussion. The idea is that std.string can be optimised for invariant strings, while std.stringbuffer could be optimised for mutable strings. There are pros and cons for separate modules. I don't think Walter wants std.string "polluted" by all these functions he doesn't much care for. Also, it would be bad if mutable versions were called "by mistake" with consequent unexpected behavior.
> "mutable versions were called "by mistake" "? I don't think that point applies to D, after all, the purpose of the immutability system is for the compiler to check that this won't happen, so unless there is some compiler bug, that shouldn't happen in D.

What if you wanted a modified copy of the input, but that input happened to be mutable? The modifying versions should have some distinguishing characteristic to separate them from the COW versions. I'd say either a different function name or an extra out-buffer parameter (as long as they still work if the buffer is the same array as the normal input).
May 01 2008
"Frits van Bommel" wrote
> Bruno Medeiros wrote:
>> Janice Caron wrote:
>>> 2008/4/30 Bruno Medeiros:
>>>> Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren?
>>> That's why we're having this discussion. The idea is that std.string can be optimised for invariant strings, while std.stringbuffer could be optimised for mutable strings. There are pros and cons for separate modules. I don't think Walter wants std.string "polluted" by all these functions he doesn't much care for. Also, it would be bad if mutable versions were called "by mistake" with consequent unexpected behavior.
>> "mutable versions were called "by mistake" "? I don't think that point applies to D, after all, the purpose of the immutability system is for the compiler to check that this won't happen, so unless there is some compiler bug, that shouldn't happen in D.
> What if you wanted a modified copy of the input, but that input happened to be mutable? The modifying versions should have some distinguishing characteristic to separate them from the COW versions. I'd say either a different function name or an extra out-buffer parameter (as long as they still work if the buffer is the same array as the normal input).

Any modifying versions would take mutable strings; COW versions would require invariant strings. They would be able to go in the same module, because there would be no ambiguity. But if you have non-modifying versions that you want to use on mutable strings, those would most likely take a const pointer. Those would have to be named differently than the invariant versions, because invariant implicitly casts to const.
Besides all this, it is good to separate them into 2 different modules, because the linker includes all functions that are in a module, not just the ones that are used. So if you are of the persuasion to only use mutable or only use COW functions, then you probably don't want to link in the other versions if you can help it.
-Steve
May 01 2008
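The point that modifying and COW versions can share one name, because D can overload on mutability, can be sketched like this (assuming current D; upper is a hypothetical name and only handles ASCII to keep it short). It also illustrates the concern about wanting a copy of mutable input: a char[] argument selects the in-place overload, so a caller who wants a copy has to ask for one, e.g. with .idup.

```d
import std.ascii : toUpper;
import std.exception : assumeUnique;

/// Modifying version: mutable input is uppercased in place and returned.
char[] upper(char[] s)
{
    foreach (ref c; s)
        c = cast(char) toUpper(c);
    return s;
}

/// Copy-on-write version: invariant input is never touched; a new string is returned.
string upper(string s)
{
    auto copy = s.dup;            // fresh mutable copy
    upper(copy);                  // picks the char[] overload
    return assumeUnique(copy);    // no other reference to copy exists
}

unittest
{
    char[] m = "abc".dup;
    assert(upper(m) is m && m == "ABC");          // in place: same array comes back
    string s = "xyz";
    assert(upper(s) == "XYZ" && s == "xyz");      // COW: original unchanged
    char[] n = "def".dup;
    assert(upper(n.idup) == "DEF" && n == "def"); // explicit copy of mutable data
}
```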
Frits van Bommel wrote:
> Bruno Medeiros wrote:
>> Janice Caron wrote:
>>> 2008/4/30 Bruno Medeiros <brunodomedeiros+spam com.gmail>:
>>>> Also, is there a reason why these mutable functions shouldn't be in std.string, together with their invariant/const brethren?
>>> That's why we're having this discussion. The idea is that std.string can be optimised for invariant strings, while std.stringbuffer could be optimised for mutable strings. There are pros and cons for separate modules. I don't think Walter wants std.string "polluted" by all these functions he doesn't much care for. Also, it would be bad if mutable versions were called "by mistake" with consequent unexpected behavior.
>> "mutable versions were called "by mistake" "? I don't think that point applies to D, after all, the purpose of the immutability system is for the compiler to check that this won't happen, so unless there is some compiler bug, that shouldn't happen in D.
> What if you wanted a modified copy of the input, but that input happened to be mutable?

Hmm, I see what you mean, yes, that could happen.

> The modifying versions should have some distinguishing characteristic to separate them from the COW versions. I'd say either a different function name or an extra out-buffer parameter (as long as they still work if the buffer is the same array as the normal input).

Yes, the idea to distinguish them with a different name sounds good (names like "doToUpper", maybe?). So that means you agree it should be in the same package? :P
-- 
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
May 01 2008
Bruno Medeiros wrote:
> Frits van Bommel wrote:
>> The modifying versions should have some distinguishing characteristic to separate them from the COW versions. I'd say either a different function name or an extra out-buffer parameter (as long as they still work if the buffer is the same array as the normal input).
> Yes, the idea to distinguish them with a different name sounds good (names like "doToUpper", maybe?). So that means you agree it should be in the same package? :P

I don't like 'doToUpper', but something like 'makeUpper' could be a good convention. That makes it pretty clear they're modifying the input, I think. I don't particularly care what package they're in, but their names should make it clear what they do. Especially if you're working with both sets of functions in the same module...
Looking at the Phobos2 std.string docs, I do think some of those functions could benefit from at least a const(char)[] overload so they'll work with non-invariant parameters too. The ones that don't even return string data[1] should probably just replace all invariant parameters with const ones. Of course, for the rest the return type of const overloads could be debated. (First question: should they ever return a slice? If not, should the return type be mutable or invariant[2]?)

[1]: In particular: inPattern(), size_t count*(), bool is*() and size_t column() are the ones I saw.
[2]: It shouldn't be const though, that'd be pointless: returning newly allocated memory as const means it's effectively invariant anyway.
May 01 2008
Spacen Jasset Wrote:
> string.toupper shouldn't really be doing work outside of ascii, in my opinion anyway. This also means that a string can be uppercased in place.

So anyone who uses alphabets other than pure English will have to write their own function to uppercase their strings, even though the Unicode standard defines how it should work, and D is supposed to support Unicode?
--Simen
May 01 2008
Janice Caron wrote:
> Hi all,
> More than one person has complained about the lack of string functions in Phobos which operate on mutable chars. In the thread titled "Is all this Invariant ****....", I suggested creating a new module, std.stringbuffer, to contain two things:
> (1) a StringBuffer class
> (2) parallel mutable versions of the functions in std.string.
> Walter OKed the idea, so it looks like that's a go. To that end, I've looked through the functions in std.string and sorted them into different groups. I think it's important to get the API right so comments are welcome on all of the below:
> (snip)

I agree with this and will welcome the module. I've had to do some ugly .idup and .dup around a compiler I coded to accommodate various functions around Phobos (such as writeLine from OutputStream).
I'd like to suggest, though, the use of template code:
T[] split(T)(in data)
and performing a static if inside. It'd save the hassle of maintaining two modules separately, which are bound to end up with different functions some day. For example, say that a function is added to std.string and not to std.stringbuffer. Also, it would be easier to maintain documentation consistency.
On an extra note, ASCII and UTF variants could be taken care of in a single function. That would require a lot of work, though.
Well, should you require assistance, gimme a shout.
Cheers
May 02 2008
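The single-template approach could look something like this: one function body, with a static if choosing in-place modification for mutable input and copy-on-write for const or invariant input. It is only a sketch under current D; upperAscii is a hypothetical name, and it handles ASCII only.

```d
import std.ascii : toUpper;
import std.exception : assumeUnique;

/// T is char, const(char) or immutable(char).
auto upperAscii(T)(T[] s)
    if (is(immutable(T) == immutable(char)))
{
    static if (is(T == char))
    {
        // Mutable input: modify in place, like the proposed std.stringbuffer.
        foreach (ref c; s)
            c = cast(char) toUpper(c);
        return s;
    }
    else
    {
        // const/invariant input: copy-on-write, like std.string.
        auto copy = s.dup;
        foreach (ref c; copy)
            c = cast(char) toUpper(c);
        return assumeUnique(copy);
    }
}

unittest
{
    char[] m = "abc".dup;
    assert(upperAscii(m) is m && m == "ABC");
    string s = "xyz";
    auto t = upperAscii(s);
    static assert(is(typeof(t) == string));
    assert(t == "XYZ" && s == "xyz");
}
```

Only one branch of the static if is compiled for each instantiation, so the mutable and copy-on-write behaviours share one name and one set of documentation, which is the maintenance benefit suggested above.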