digitalmars.D - std.string will get the boot
- Andrei Alexandrescu (31/31) Jan 29 2010 I plan a few improvements to Phobos that will improve string handling.
- bearophile (8/16) Jan 29 2010 32 bits are not enough to represent certain "characters", they need more...
- Andrei Alexandrescu (13/29) Jan 29 2010 I think it's a tad late for that.
- Simen kjaeraas (5/9) Jan 30 2010 So adding aliases to object.d is not possible this late in the process?
- Andrei Alexandrescu (3/13) Jan 30 2010 That would be possible.
- Denis Koroskin (4/12) Jan 31 2010 Everyone can do that on their own. I see no reason to pollute the
- Simen kjaeraas (5/20) Jan 31 2010 Nor do I. I was only inquiring as to its feasibility.
- Lionello Lunesu (10/34) Jan 30 2010 I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF
- Michel Fortin (22/37) Jan 30 2010 32-bit is enough to cover all code points. But there are many combining
- Simen kjaeraas (13/19) Jan 31 2010 struct Typedef( T ) {
- Denis Koroskin (8/25) Jan 31 2010 a =
- Lionello Lunesu (12/36) Feb 01 2010 Using alias you lose all type safety.
- Jacob Carlborg (6/37) Jan 29 2010 I would keep std.string for string specific functions and perhaps
- Ali Çehreli (12/15) Jan 29 2010 I've been thinking about characters lately and have realized that
- Andrei Alexandrescu (6/25) Jan 29 2010 My thoughts exactly. In fact I'm thinking of generalizing toupper and
- Jacob Carlborg (6/22) Jan 29 2010 I'm not sure I really understand this, probably because I don't know
- Ali Çehreli (25/54) Jan 29 2010 'i' and 'i' are the same "character", because they have the same ASCII
- Andrei Alexandrescu (8/17) Jan 29 2010 My idea of functions for upper/lowercase would help you solve exactly
- dsimcha (5/7) Jan 29 2010 Please, no. I **HATE** fine-grained imports like Tango has. I don't wa...
- Jonathan M Davis (14/24) Jan 29 2010 We need a balance. Fine-grained can be great, but if it's too fine-grain...
- Lutger (12/20) Jan 29 2010 I like how naturaldocs, which is similar to ddoc helps with this: by
- Lutger (3/4) Jan 29 2010 sorry, wrong anchor:
- Andrei Alexandrescu (7/34) Jan 29 2010 I think the idea of tags is awesome, particularly because it doesn't
- Lutger (5/39) Jan 29 2010 Cool, tags are even better (naturaldocs groups aren't tags really). How
- bearophile (9/14) Jan 29 2010 I am far from expert about such hairy matters, so I can be wrong. This i...
- Lutger (4/8) Jan 29 2010 This is about the documentation, which at the moment is based on the
- Andrei Alexandrescu (9/26) Jan 29 2010 I don't think it would be too far-fetched to define and use tags for
- bearophile (6/9) Jan 29 2010 A next step is to allow to import all names with a specified tag, even i...
- Robert Jacques (6/36) Jan 29 2010 By the way, in the short term you could greatly improve the usability of ...
- Andrei Alexandrescu (5/45) Jan 29 2010 That jump to index is automatically generated. I can have it sorted
- Clemens (3/28) Feb 02 2010 I think you may misunderstand what the "alias this" construct does. It d...
I plan a few improvements to Phobos that will improve string handling.

Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges. Also, std.range will define s.front and s.back for strings to return the correctly decoded dchar. Naturally, s.popFront and s.popBack will yank an entire encoded character, which is what you want most of the time anyway. (You're still free to do s = s[1 .. $] if that's what you need.)

These changes will have the great effect of enabling std.algorithm to work with strings correctly without any further impedance adaptation. (At some point I'd defined byDchar to wrap a string as a bidirectional range; it works, but of course it's much better without an intermediary.)

Following that change, I plan to eliminate std.string entirely and roll all of its functionality into std.algorithm. This is because I noticed that I'd like many string functions to be available for other data types, and also because people who want to define their own non-UTF encodings can benefit from the support that UTF already has. (As an example, startsWith or endsWith are very useful not only with strings, but general data as well.)

A possible idea would be to move algorithms out of std.string and roll std.utf and std.encoding into std.string. That way std.string becomes something UTF-specific, which may be sensible.

One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).

Any ideas are welcome.

Andrei
Jan 29 2010
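The range behaviour described in this post can be sketched as follows. This is a minimal illustration, assuming a later Phobos release in which the primitives live in std.range.primitives and strings decode on iteration as proposed; the helper countCodePoints is a hypothetical name, not part of Phobos.

```d
import std.range.primitives : empty, front, popFront,
    isBidirectionalRange, isRandomAccessRange;

// Narrow strings are bidirectional but not random-access ranges.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!isRandomAccessRange!(wchar[]));
// Arrays of dchar hold one code point per element, so they stay random-access.
static assert(isRandomAccessRange!(dchar[]));

/// Hypothetical helper: counts code points by walking the string
/// with the range primitives.
size_t countCodePoints(string s)
{
    size_t n = 0;
    for (; !s.empty; s.popFront()) // popFront drops one whole encoded character
    {
        dchar c = s.front;         // front is the decoded code point
        ++n;
    }
    return n;
}

void main()
{
    string s = "a\u00F1b";           // U+00F1 takes two UTF-8 code units
    assert(s.length == 4);           // length counts code units
    assert(s.front == 'a');
    assert(countCodePoints(s) == 3); // iteration counts code points
    assert(s[1 .. $].length == 3);   // slicing still works on code units
}
```

Slicing with s[1 .. $] remains available for code that deliberately works on code units.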
Andrei Alexandrescu:
> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.

32 bits are not enough to represent certain "characters"; they need more than one such dchar. So dchar too may be a bidirectional range.

I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 may be better...

Sometimes I have ugly 7-bit ASCII strings, and I am not sure I want to be forced to use cast(ubyte[]) every time I use an algorithm on them :-)

> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation.

It's not just a matter of documentation: to choose among n items a human needs more time as n grows (people who design important menus in GUIs must be aware of this). So huge APIs slow down programming. A possible solution is to keep the std.string module, but make it just a list of aliases and thin wrappers around functions of std.algorithm, tuned for string processing (for example, I usually don't need tolower on generic arrays; some operations are mostly useful for strings).

Bye,
bearophile
Jan 29 2010
bearophile wrote:
> Andrei Alexandrescu:
>> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.

[citation needed]

> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...

I think it's a tad late for that.

> Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be forced to use cast(ubyte[]) every time I use an algorithm on them :-)

That's exactly one of the cases in which my change would help. char is UTF-8, so that's out as an option for expressing ASCII characters. You'll be able to define your own type:

struct AsciiChar { ubyte datum; ... }

Then express stuff in terms of AsciiChar[] etc.

>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation.
> It's not just a matter of documentation: to choose among n items a human needs more time as n grows (people that design important menus in GUIs must be aware of this). So huge APIs slow down programming. A possible solution is to keep the std.string module, but make it just a list of aliases and thin wrappers around functions of std.algorithm, tuned for string processing (example I usually don't need tolower on generic arrays), there are some operations that are mostly useful for strings).

That's a good possibility.

Andrei
Jan 29 2010
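A minimal sketch of the AsciiChar idea from this post, under stated assumptions: only the struct and its datum field come from the post, while the conversion helper toAscii and its error message are hypothetical. Because each element is exactly one character, an AsciiChar[] keeps random access.

```d
import std.exception : enforce;
import std.range.primitives : isRandomAccessRange;

/// One 7-bit ASCII character per element, as suggested in the post.
struct AsciiChar
{
    ubyte datum;
}

// An array of a user-defined element type is still a random-access range.
static assert(isRandomAccessRange!(AsciiChar[]));

/// Hypothetical helper: copies a string into AsciiChar[],
/// throwing if any code unit falls outside 7-bit ASCII.
AsciiChar[] toAscii(const(char)[] s)
{
    auto result = new AsciiChar[](s.length);
    foreach (i, char c; s) // iterate code units, no decoding
    {
        enforce(c < 0x80, "input is not 7-bit ASCII");
        result[i] = AsciiChar(cast(ubyte) c);
    }
    return result;
}

void main()
{
    auto a = toAscii("trim me");
    assert(a.length == 7);
    assert(a[5].datum == 'm');     // constant-time indexing by character
    assert(a[$ - 1].datum == 'e');
}
```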
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> bearophile wrote:
>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
> I think it's a tad late for that.

So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.

-- 
Simen
Jan 30 2010
Simen kjaeraas wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> bearophile wrote:
>>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
>> I think it's a tad late for that.
> So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.

That would be possible.

Andrei
Jan 30 2010
On Sun, 31 Jan 2010 01:30:41 +0300, Simen kjaeraas <simen.kjaras gmail.com> wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> bearophile wrote:
>>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
>> I think it's a tad late for that.
> So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.

Everyone can do that on their own. I see no reason to pollute the namespace.
Jan 31 2010
On Sun, 31 Jan 2010 15:09:28 +0100, Denis Koroskin <2korden gmail.com> wrote:
> On Sun, 31 Jan 2010 01:30:41 +0300, Simen kjaeraas <simen.kjaras gmail.com> wrote:
>> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>>> bearophile wrote:
>>>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
>>> I think it's a tad late for that.
>> So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.
> Everyone can do that on their own. I see no reason to pollute the namespace.

Nor do I. I was only inquiring as to its feasibility.

-- 
Simen
Jan 31 2010
On 30-1-2010 1:59, Andrei Alexandrescu wrote:
> bearophile wrote:
>> Andrei Alexandrescu:
>>> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
>> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
> [citation needed]

I also doubt that 32 bits are not enough. In fact, Unicode has 0x10FFFF as the highest code point.

>> Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be forced to use cast(ubyte[]) every time I use an algorithm on them :-)
> That's exactly one of the cases in which my change would help. char is UTF-8, so that's out as an option for expressing ASCII characters. You'll be able to define your own type: struct AsciiChar { ubyte datum; ... } Then express stuff in terms of AsciiChar[] etc.

I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?

By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?

L.
Jan 30 2010
On 2010-01-30 22:06:06 -0500, Lionello Lunesu <lio lunesu.remove.com> said:
> On 30-1-2010 1:59, Andrei Alexandrescu wrote:
>> bearophile wrote:
>>> Andrei Alexandrescu:
>>>> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
>>> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
>> [citation needed]
> I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF as the highest code point.

32-bit is enough to cover all code points. But there are many combining code points in Unicode, allowing you to combine a diacritic with various other characters, such as an acute accent with a 'k'. Some of these combinations exist in precombined form and are considered equivalent. So if you want to count the number of characters the user actually sees instead of counting code points, then you need to take these combining code points into account.

But if you really wanted to iterate over "characters" instead of code points, note that it can become quite hard if you take into account double diacritics, combining diacritic signs placed across two letters. So I think it's reasonable to have dchar, a code point, as the base unit for iterating over a string.

http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/Unicode_normalization

Another interesting case:
http://en.wikipedia.org/wiki/Combining_grapheme_joiner

Unicode, isn't it great?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Jan 30 2010
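The difference between code units, code points and user-visible characters can be sketched with the 'k' plus acute accent example from this post. This assumes a later Phobos release that provides std.uni.byGrapheme; nothing comparable is claimed to exist at the time of the thread.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'k' followed by U+0301 COMBINING ACUTE ACCENT
    string s = "k\u0301";

    assert(s.length == 3);                // UTF-8 code units: 1 + 2
    assert(s.walkLength == 2);            // code points (dchar iteration)
    assert(s.byGrapheme.walkLength == 1); // what the user sees as one character
}
```

Iterating by dchar, as the post argues, stays the cheap default; counting graphemes is an extra, explicit step.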
Lionello Lunesu <lio lunesu.remove.com> wrote:
> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?

struct Typedef( T ) {
    T payload;
    alias payload this;
}

Usage:
alias Typedef!( int ) myInt;

Is this what you want?

> By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?

As far as I have understood (I am no Unicode guru), in some locales toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a strict subset of UTF-8 is not always true.

-- 
Simen
Jan 31 2010
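A small sketch of what the Typedef wrapper from this post does and does not guarantee. The struct is the one posted; the alias name MyInt and the function twice are illustrative assumptions. The wrapper is a distinct type, but alias this lets it convert implicitly to the underlying type, so it can be passed wherever an int is expected.

```d
struct Typedef(T)
{
    T payload;
    alias payload this; // forwards to the payload, including implicit conversion
}

alias MyInt = Typedef!int;

// Illustrative function that knows nothing about MyInt.
int twice(int x)
{
    return 2 * x;
}

void main()
{
    auto a = MyInt(21);

    static assert(!is(MyInt == int)); // a distinct type...
    static assert(is(MyInt : int));   // ...that converts implicitly to int
    static assert(!is(int : MyInt));  // but not the other way around

    assert(twice(a) == 42);           // accepted through alias this
    int plain = a;
    assert(plain == 21);
}
```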
On Sun, 31 Jan 2010 11:34:03 +0300, Simen kjaeraas <simen.kjaras gmail.com> wrote:
> Lionello Lunesu <lio lunesu.remove.com> wrote:
>> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?
> struct Typedef( T ) { T payload; alias payload this; }
> Usage: alias Typedef!( int ) myInt;
> Is this what you want?
>> By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?
> As far as I have understood (I am no Unicode guru), in some locales toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a strict subset of UTF-8 is not always true.

I only know one example (in Turkish):

i <-> İ
ı <-> I

That's a big issue because toUpper/toLower needs a locale to provide the correct result.
Jan 31 2010
On 31-1-2010 16:34, Simen kjaeraas wrote:
> Lionello Lunesu <lio lunesu.remove.com> wrote:
>> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?
> struct Typedef( T ) { T payload; alias payload this; }
> Usage: alias Typedef!( int ) myInt;
> Is this what you want?

Using alias you lose all type safety. I remember Andrei mentioned that he and Walter couldn't agree whether typedef should behave as a sub or super class. I think it should not be looked at from an inheritance perspective, but just considered a wrapper struct with a ctor that takes the underlying type.

>> By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?
> As far as I have understood (I am no Unicode guru), in some locales toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a strict subset of UTF-8 is not always true.

True, but then that upper or lowercase result would no longer be ASCII. As long as you stick to ASCII, char[] should work just fine. So, toLower and toUpper can accept ASCII char[] but always output one of those new char ranges. Problem fixed :)

L.
Feb 01 2010
On 1/29/10 18:36, Andrei Alexandrescu wrote:
> I plan a few improvements to Phobos that will improve string handling.
>
> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges. Also, std.range will define s.front and s.back for strings to return the correctly decoded dchar. Naturally, s.popFront and s.popBack will yank an entire encoded character, which is what you want most of the time anyway. (You're still free to do s = s[1 .. $] if that's what you need.)
>
> These changes will have the great effect of enabling std.algorithm to work with strings correctly without any further impedance adaptation. (At some point I'd defined byDchar to wrap a string as a bidirectional range; it works, but of course it's much better without an intermediary.)
>
> Following that change, I plan to eliminate std.string entirely and roll all of its functionality into std.algorithm. This is because I noticed that I'd like many string functions to be available for other data types, and also because people who want to define their own non-UTF encodings can benefit from the support that UTF already has.

I would keep std.string for string-specific functions and perhaps publicly import std.algorithm. For example, functions like tolower, icmp and toStringz.

> (As an example, startsWith or endsWith are very useful not only with strings, but general data as well.)
>
> A possible idea would be to move algorithms out of std.string and roll std.utf and std.encoding into std.string. That way std.string becomes something UTF-specific, which may be sensible.
>
> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).

Perhaps it's time to start adding more packages than just std. Make std.algorithm a package and try to split it into several modules.

> Any ideas are welcome.
>
> Andrei
Jan 29 2010
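The quoted remark that startsWith and endsWith are useful beyond strings can be illustrated briefly. This sketch assumes a later Phobos release in which std.algorithm has been split into a package and these functions live in std.algorithm.searching.

```d
import std.algorithm.searching : endsWith, startsWith;

void main()
{
    // The same generic algorithms serve strings...
    assert("std.string".startsWith("std."));
    assert(!"hello".endsWith("hell"));

    // ...and arbitrary arrays, with either a range or a single element as needle.
    assert([1, 2, 3, 4].startsWith([1, 2]));
    assert([1.5, 2.5, 3.5].endsWith(3.5));
}
```

A string-specific module can still re-export such functions with a public import, which is the arrangement suggested in the reply above.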
Jacob Carlborg wrote:
> I would keep std.string for string specific functions and perhaps publicly import std.algorithm. For example functions like: tolower, icmp and toStringz.

I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information.

Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet.

Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation.

Ali
Jan 29 2010
Ali Çehreli wrote:
> Jacob Carlborg wrote:
>> I would keep std.string for string specific functions and perhaps publicly import std.algorithm. For example functions like: tolower, icmp and toStringz.
> I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information.
> Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet.
> Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation.

My thoughts exactly. In fact I'm thinking of generalizing toupper and tolower for strings to take an optional trie mapping strings to strings. That way correct capitalization can be done for any string, given a good collection of capitalization patterns.

Andrei
Jan 29 2010
On 1/29/10 22:18, Ali Çehreli wrote:
> Jacob Carlborg wrote:
>> I would keep std.string for string specific functions and perhaps publicly import std.algorithm. For example functions like: tolower, icmp and toStringz.
> I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information.
> Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet.
> Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation.
> Ali

I'm not sure I really understand this, probably because I don't know much about how Unicode works. I'm thinking out loud: if "i", as you have in "ali", has the corresponding "İ" as upper case, wouldn't that be another character than the English "i"? If so, I'm not sure I see the problem. If not, I see the problem.
Jan 29 2010
Jacob Carlborg wrote:
> On 1/29/10 22:18, Ali Çehreli wrote:
>> Jacob Carlborg wrote:
>>> I would keep std.string for string specific functions and perhaps publicly import std.algorithm. For example functions like: tolower, icmp and toStringz.
>> I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information.
>> Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet.
>> Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation.
>> Ali
> I'm not sure I really understand this, probably because I don't know much about how Unicode works. I'm thinking out loud: If "i", as you have in "ali", have the corresponding "İ" as upper case wouldn't that be another character than the English "i"?

'i' and 'i' are the same "character", because they have the same ASCII and Unicode values in different alphabets. But it is not the same "letter" when they are part of different text.

The iİ (and ıI) issue is probably too special. A number of Turkic alphabets chose ASCII 'i', probably for historical reasons. Unicode did not define a separate code point for 'i' either, probably because those alphabets already were using the ASCII 'i'.

> If so, I'm not sure I see the problem. If not, I see the problem.

The letter 'i' (and I) is special but the issue is valid for any other letter: Is it valid to compare an 'i' in English text to an 'i' in German text? I think it's only valid at the lowest data representation level.

And ASCII never claims to be more than a code table for "information interchange". That part is fine. The problem is with the use of certain ranges of the ASCII table as the English alphabet. It is unfortunate that it works... :)

D is great in that it supports three separate Unicode encodings in the language, but encodings are at a lower level of abstraction than "letters".

I am not sure what data is used for toUniUpper and toUniLower in std.uni, but they can't work correctly without alphabet information. They favor the ASCII layout, probably for historical reasons.

I think the problems with using the ASCII table for sorting are well known. A more interesting example is with the Azeri alphabet: it uses the ASCII xX characters, but sorts them after hH.

Ali
Jan 29 2010
Ali Çehreli wrote:
> D is great that it supports three separate Unicode encodings in the language, but encodings are at a lower level of abstraction than "letters".
> I am not sure what data is used for toUniUpper and toUniLower in std.uni, but they can't work correctly without alphabet information. They favor the ASCII layout probably for historical reasons.
> I think the problems with using the ASCII table for sorting is well known. A more interesting example is with the Azeri alphabet: it uses the ASCII xX characters, but sorts them after hH.

My idea of functions for upper/lowercase would help you solve exactly the issue you mention. A conversion trie as an optional parameter would allow capitalizing Straße as STRASSE and ali as ALİ. The trie will match the longest substring of the original string and will have translation strings in the nodes. The way capitalization is done will depend on the way you set up the table.

Andrei
Jan 29 2010
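The table-driven capitalization described in this post could look roughly like the sketch below. It is only an approximation under stated assumptions: the function name upperWith is hypothetical, a plain associative array stands in for the proposed trie (so the longest match is found by a linear scan rather than a trie walk), and the fallback uses std.uni.toUpper and std.utf.stride from a later Phobos release.

```d
import std.algorithm.searching : startsWith;
import std.uni : toUpper;
import std.utf : stride;

/// Hypothetical sketch: upper-cases `s`, preferring the longest matching
/// entry of `table` at each position and falling back to the default mapping.
string upperWith(string s, string[string] table)
{
    string result;
    while (s.length)
    {
        // Find the longest key that is a prefix of the remaining input.
        string best;
        foreach (key; table.byKey)
            if (key.length > best.length && s.startsWith(key))
                best = key;

        if (best.length)
        {
            result ~= table[best];      // use the translation from the table
            s = s[best.length .. $];
        }
        else
        {
            immutable n = stride(s, 0); // code units in the next code point
            result ~= toUpper(s[0 .. n]);
            s = s[n .. $];
        }
    }
    return result;
}

void main()
{
    string[string] german = ["\u00DF": "SS"];  // sharp s becomes SS
    string[string] turkish = ["i": "\u0130"];  // i becomes dotted capital I
    string[string] none;

    assert(upperWith("Stra\u00DFe", german) == "STRASSE");
    assert(upperWith("ali", turkish) == "AL\u0130");
    assert(upperWith("jim", none) == "JIM");
}
```

As the post says, the result depends entirely on how the table is set up: the same input "ali" yields a different capitalization with the Turkish table than with an empty one.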
== Quote from Jacob Carlborg (doob me.com)'s article
> Perhaps it's time to start adding more packages than just the std. Make std.algorithm a package and try to split it into several modules.

Please, no. I **HATE** fine-grained imports like Tango has. I don't want to write tons of boilerplate at the top of every file just to have access to a bunch of closely related functionality. If this is done, **PLEASE** at least make a std.algorithm.all that publicly imports everything in the old std.algorithm.
Jan 29 2010
dsimcha wrote:
> == Quote from Jacob Carlborg (doob me.com)'s article
>> Perhaps it's time to start adding more packages than just the std. Make std.algorithm a package and try to split it into several modules.
> Please, no. I **HATE** fine-grained imports like Tango has. I don't want to write tons of boilerplate at the top of every file just to have access to a bunch of closely related functionality. If this is done, **PLEASE** at least make a std.algorithm.all that publicly imports everything in the old std.algorithm.

We need a balance. Fine-grained can be great, but if it's too fine-grained, it gets hard to find things and you have to import a ton of modules. Not fine-grained enough, however, and you have a hard time finding things because there's so much to search through in each module, though importing what you need is easy.

Personally, I'm fine with std.algorithm being split into sub-modules. It's already fairly large and splitting it up would make a lot of sense. But then a solution allowing you to import large portions, if not all of it, at once would definitely be nice. It's why being able to do something like import std.*; and have it recursively grab every sub-module would be nice. But std.algorithm.all is a good idea.

- Jonathan M Davis
Jan 29 2010
On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
...
> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).
> Any ideas are welcome.
> Andrei

I like how naturaldocs, which is similar to ddoc, helps with this: by adding a group tag. See this example of a summary of a class:
http://www.naturaldocs.org/documenting/reference.html#Example_Class

Probably it is possible to come up with categories for algorithm like:
- functional tools
- searching and sorting
- string utilities
...

Arguably a more D-like alternative is to make std.algorithm a package and each 'category' a module of that package.
Jan 29 2010
On 01/29/2010 09:13 PM, Lutger wrote:
> http://www.naturaldocs.org/documenting/reference.html#Example_Class

Sorry, wrong anchor:
http://www.naturaldocs.org/documenting/reference.html#Summaries
Jan 29 2010
Lutger wrote:
> On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
> ...
>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).
>> Any ideas are welcome.
>> Andrei
> I like how naturaldocs, which is similar to ddoc helps with this: by adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class
> Probably it is possible to come up with categories for algorithm like:
> - functional tools
> - searching and sorting
> - string utilities
> ...
> Arguably a more D like alternative is to make std.algorithm a package and each 'category' a module of that package.

I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later.

Andrei
Jan 29 2010
On 01/29/2010 09:18 PM, Andrei Alexandrescu wrote:
> Lutger wrote:
>> On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
>> ...
>>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).
>>> Any ideas are welcome.
>>> Andrei
>> I like how naturaldocs, which is similar to ddoc helps with this: by adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class
>> Probably it is possible to come up with categories for algorithm like:
>> - functional tools
>> - searching and sorting
>> - string utilities
>> ...
>> Arguably a more D like alternative is to make std.algorithm a package and each 'category' a module of that package.
> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later.
> Andrei

Cool, tags are even better (naturaldocs groups aren't really tags). How are you going to do so? Perhaps it is better to reserve this as a standard ddoc section marked 'to be implemented'? This way everybody can benefit eventually.
Jan 29 2010
Andrei Alexandrescu:
> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it.

A hierarchical D/Python-like module system isn't the only way to organize blocks of code. Both future Windows file system and Google Email use tags to create groups of items in a less disjoint way. But I don't know if it's possible to design the equivalent of a module system based on tags instead of a hierarchy of modules/packages (and superpackages). It seems a cute idea.

>> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
> [citation needed]

I am far from expert about such hairy matters, so I can be wrong. This is from Wikipedia:
http://en.wikipedia.org/wiki/UTF-32

"Though a fixed number of bytes per code point seems convenient, it is not used as much as the other Unicode encodings. It makes truncation slightly easier but not significantly so compared to UTF-8 and UTF-16. It does not make calculating the displayed width of a string any easier except in very limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks also mean editors cannot treat one code point as being the same as one unit for editing."

That paragraph of text also links to:
http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/CJK

Bye,
bearophile
Jan 29 2010
On 01/29/2010 09:43 PM, bearophile wrote:
> Andrei Alexandrescu:
>> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it.
> A hierarchical D/Python-like module system isn't the only way to organize blocks of code. Both future Windows file system and Google Email use tags to create groups of items in a less disjoint way. But I don't know if it's possible to design the equivalent of a module system based on tags instead of a hierarchy of modules/packages (and superpackages). It seems a cute idea.

This is about the documentation, which at the moment is based on the module system, type system and order of declarations. Such tags allow for better indexes, organization and search through the docs.
Jan 29 2010
Lutger wrote:
> On 01/29/2010 09:43 PM, bearophile wrote:
>> Andrei Alexandrescu:
>>> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it.
>>
>> A hierarchical D/Python-like module system isn't the only way to organize blocks of code. Both future Windows file system and Google Email use tags to create groups of items in a less disjoint way. But I don't know if it's possible to design the equivalent of a module system based on tags instead of a hierarchy of modules/packages (and superpackages). It seems a cute idea.
>
> This is about the documentation, which at the moment is based on the module system, type system and order of declarations. Such tags allow for better indexes, organization and search through the docs.

I don't think it would be too far-fetched to define and use tags for selective imports a la:

    // inside std.algorithm
    tag(string, comparison) bool startsWith(...)(...) { ... }

    // in client code
    // get everything tagged with "string"
    import std.algorithm : tag(string);

Andrei
Jan 29 2010
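The `tag(...)` syntax in the post above is hypothetical and is not valid D. A rough approximation of the tagging half can be sketched with user-defined attributes, which entered the language after this thread. Everything named here (`tag`, `hasTag`, `startsWithDemo`, `lessDemo`) is invented for illustration, and the sketch only queries tags at compile time; it does not implement tag-based imports.

```d
/// Hypothetical tag attribute; not part of Phobos.
struct tag
{
    string name;
}

/// True if `sym` carries a `@tag` attribute with the given name.
template hasTag(alias sym, string name)
{
    enum bool hasTag = () {
        // The attribute list is a compile-time sequence, so this loop is unrolled.
        foreach (attr; __traits(getAttributes, sym))
        {
            static if (is(typeof(attr) == tag))
            {
                if (attr.name == name)
                    return true;
            }
        }
        return false;
    }();
}

// One item can carry several tags, so the groups need not be disjoint.
@tag("string") @tag("comparison")
bool startsWithDemo(string haystack, string needle)
{
    return haystack.length >= needle.length
        && haystack[0 .. needle.length] == needle;
}

@tag("comparison")
bool lessDemo(int a, int b)
{
    return a < b;
}

static assert(hasTag!(startsWithDemo, "string"));
static assert(hasTag!(startsWithDemo, "comparison"));
static assert(!hasTag!(lessDemo, "string"));
```

A documentation generator or a compile-time filter could, in principle, use a predicate like `hasTag` to group symbols by tag, but that part is left out of the sketch.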
Andrei Alexandrescu:
> // in client code
> // get everything tagged with "string"
> import std.algorithm : tag(string);

A next step is to allow importing all names with a specified tag, even if such names are spread across more than one source file (the compiler can create a JSON text file to speed up this retrieval):

    import tag(string);

To keep things tidy I think it's better to minimize the number of different tags inside each file, so they are similar to modules anyway: perfect hierarchies are sometimes too rigid to represent real-life complexities, but an approximate hierarchy is tidier and simpler to understand than an amorphous soup of tags.

Bye,
bearophile
Jan 29 2010
On Fri, 29 Jan 2010 15:18:14 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Lutger wrote:
>> On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
>> ...
>>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome.
>>>
>>> Andrei
>>
>> I like how naturaldocs, which is similar to ddoc, helps with this: by adding a group tag. See this example of a summary of a class:
>> http://www.naturaldocs.org/documenting/reference.html#Example_Class
>>
>> Probably it is possible to come up with categories for algorithm like:
>> - functional tools
>> - searching and sorting
>> - string utilities
>> ...
>>
>> Arguably a more D like alternative is to make std.algorithm a package and each 'category' a module of that package.
>
> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later.
>
> Andrei

By the way, in the short term you could greatly improve the usability of std.algorithm by cleaning up the index ("jump to") at the top of the file. A simple alphabetical listing would be great, and you could easily start grouping links under categories (which would eventually become tags).
Jan 29 2010
Robert Jacques wrote:
> On Fri, 29 Jan 2010 15:18:14 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> Lutger wrote:
>>> On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
>>> ...
>>>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome.
>>>>
>>>> Andrei
>>>
>>> I like how naturaldocs, which is similar to ddoc, helps with this: by adding a group tag. See this example of a summary of a class:
>>> http://www.naturaldocs.org/documenting/reference.html#Example_Class
>>>
>>> Probably it is possible to come up with categories for algorithm like:
>>> - functional tools
>>> - searching and sorting
>>> - string utilities
>>> ...
>>>
>>> Arguably a more D like alternative is to make std.algorithm a package and each 'category' a module of that package.
>>
>> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later.
>>
>> Andrei
>
> By the way, in the short term you could greatly improve the usability of std.algorithm by cleaning up the index ("jump to") at the top of the file. A simple alphabetical listing would be great and you could easily start grouping links under categories (which would eventually become tags)

That "jump to" index is automatically generated. I can have it sorted alphabetically, which makes sense for large lists. But then should I also list components in alphabetical order?

Andrei
Jan 29 2010
Lionello Lunesu Wrote:
> On 31-1-2010 16:34, Simen kjaeraas wrote:
>> Lionello Lunesu <lio lunesu.remove.com> wrote:
>>> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?
>>
>> struct Typedef( T ) {
>>     T payload;
>>     alias payload this;
>> }
>>
>> Usage:
>>
>> alias Typedef!( int ) myInt;
>>
>> Is this what you want?
>
> Using alias you lose all type safety. I remember Andrei mentioned that he and Walter couldn't agree whether typedef should behave as a sub or super class. I think it should not be looked at from an inheritance perspective, but just consider it as a wrapper struct with a ctor that takes the underlying type.

I think you may misunderstand what the "alias this" construct does. It does exactly what you ask for:
http://www.digitalmars.com/d/2.0/class.html#AliasThis
Feb 02 2010
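To make the disagreement above easier to follow, here is a small sketch of how the `Typedef` wrapper from the thread behaves with `alias this`. The function names `takesInt` and `takesMyInt` are invented for the example, and the alias is written `MyInt` in current D alias syntax; treat it as an illustration under those assumptions rather than a statement of what either poster had in mind.

```d
struct Typedef(T)
{
    T payload;
    alias payload this;  // a Typedef!T can be used wherever a T is expected
}

alias MyInt = Typedef!int;

int takesInt(int x)
{
    return x;
}

int takesMyInt(MyInt m)
{
    return m.payload;
}

unittest
{
    auto a = MyInt(5);

    assert(takesInt(a) == 5);    // implicit conversion to the underlying int
    assert(a + 1 == 6);          // arithmetic forwards to payload
    assert(takesMyInt(a) == 5);

    // The reverse direction is rejected: a bare int is not a MyInt.
    static assert(!__traits(compiles, takesMyInt(5)));
}
```

So the wrapper acts like a subtype of its underlying type: it converts to `int` silently, but an `int` has to be wrapped explicitly. Note that two separate aliases of `Typedef!int` would still name the same type. Phobos later gained a `std.typecons.Typedef` built on a similar idea.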