digitalmars.D - Transcoding - Summary
- Arcane Jill (52/52) Aug 17 2004 We have two separate problems:
- Arcane Jill (4/11) Aug 17 2004 Nick, I think your work falls into category (5). If you want that job, I...
- Derek (7/72) Aug 17 2004 I hope I'm not stating the bleeding obvious, but you are talking about T...
- Walter (8/10) Aug 17 2004 be
- Regan Heath (16/28) Aug 17 2004 Did you miss the thread that mentioned that sentence structure in variou...
- Russ Lewis (10/27) Aug 17 2004 This isn't strictly a requirement of the formatting tools. Perhaps a
- Regan Heath (16/44) Aug 17 2004 The disadvantage being that the above idea is harder to maintain, there
- Arcane Jill (26/48) Aug 18 2004 Well, I didn't mean to cause trouble here. :)
- Derek Parnell (17/28) Aug 17 2004 I think that AJ was suggesting that there exists a business need for a t...
- antiAlias (28/80) Aug 18 2004 Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Ma...
- Arcane Jill (13/15) Aug 18 2004 Not really interested because (a) there's one in std.utf, and (b) I coul...
- antiAlias (10/14) Aug 18 2004 genericity,
- Arcane Jill (4/6) Aug 19 2004 If you have an account on dsource, you can contact me privately there. M...
- J C Calvarese (10/21) Aug 18 2004 I'm not volunteering to single-handedly re-document std.stream, but I
We have two separate problems:

(1) formatted I/O
(2) unformatted I/O

For unformatted I/O, we need the ability to read a sequence of dchars from some source, and the ability to write a sequence of dchars to some sink. The class which acts as a dchar source must perform decoding from some underlying ubyte source. The class which acts as a dchar sink must perform encoding to some underlying ubyte sink.

The source and sink could be anything - a string; a console; a file; a socket - even a simple counter which counts bytes and throws away data. So, to keep things generic, I shall use the terms "ubyte source", "ubyte sink", "dchar source" and "dchar sink". The traditional terms are:

ubyte source = input stream
ubyte sink = output stream
dchar source = reader
dchar sink = writer

(I'm using new terms merely in order to avoid confusion with objects in std.stream, mango.io, and Java).

For formatted I/O, we need:

(1a) a replacement for printf() which emits a formatted sequence of dchars to an arbitrary dchar sink
(1b) a replacement for scanf() which parses a sequence of dchars obtained from an arbitrary dchar source

Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.

Observe that if the output of (1a) is plumbed into an encoder, and the input to (1b) is plumbed into a decoder, then formatted transcoding is achieved. This makes our printf/scanf replacements relatively easy to write. They are likely to require very little modification from the existing format()/unformat() routines, with essentially the only difference being that they must be dchar-based, not char-based. (Random-access of the arguments would be a new feature, however, though not necessarily an urgent one).

Another oft-voiced requirement is that transcoding be independent of any particular string/stream implementation. (I suspect that if Phobos streams were fully-featured, fully-documented, bug-free and intuitive, then nobody would be asking for this requirement. But as things are, the requirement is there).

So ... listed below are the jobs which need to be done. Volunteers are requested for any unclaimed jobs:

(1) The source and sink interfaces need to be nailed down.
(2) Given (1), dchar-based format()/unformat() replacements can be written.
(3) Given (1), encoder and decoder classes/interfaces can be written.
(4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
(5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
(6) Will somebody /please/ document std.stream?

I volunteer for (1) and (3). I'm hoping Sean will volunteer for (2). AntiAlias's excellent ideas for throughput enhancement using buffers are part of (1) and (3), so I suggest AntiAlias and I send each other code back and forth until we are both happy with it.

Volunteers still needed for (4), (5) and (6) (though (4) and (5) are dependent upon (3)). Anyone who's a dab hand at Wiki might like to volunteer for (6).

Arcane Jill
Aug 17 2004
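The layering described in the post above (a dchar sink that encodes into an underlying ubyte sink) could be sketched roughly as follows. This is only an illustration under assumptions: the names UbyteSink, DcharSink, CountingSink and Utf8Encoder are hypothetical and were never agreed in the thread, and UTF-8 is picked simply because std.utf already provides an encoder for it.

```d
import std.utf : encode;

/// Hypothetical interfaces; the thread never fixed their names or shape.
interface UbyteSink { void put(ubyte b); }
interface DcharSink { void put(dchar c); }

/// A ubyte sink that only counts bytes and throws the data away.
class CountingSink : UbyteSink
{
    size_t count;
    void put(ubyte b) { ++count; }
}

/// A dchar sink that encodes each character as UTF-8 into a ubyte sink.
class Utf8Encoder : DcharSink
{
    private UbyteSink sink;
    this(UbyteSink sink) { this.sink = sink; }

    void put(dchar c)
    {
        char[4] buf;
        // std.utf.encode fills buf and returns the number of code units written.
        immutable n = encode(buf, c);
        foreach (ch; buf[0 .. n])
            sink.put(cast(ubyte) ch);
    }
}

void main()
{
    auto counter = new CountingSink;
    DcharSink sink = new Utf8Encoder(counter);
    foreach (dchar c; "héllo"d)
        sink.put(c);
    assert(counter.count == 6); // 'é' takes two bytes in UTF-8
}
```

A formatter written against DcharSink alone would not need to know which encoding or which stream library sits underneath, which is the independence the post asks for.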
In article <cfsm6d$va0$1 digitaldaemon.com>, Arcane Jill says...

> (1) The source and sink interfaces need to be nailed down.
> (2) Given (1), dchar-based format()/unformat() replacements can be written.
> (3) Given (1), encoder and decoder classes/interfaces can be written.
> (4) Given (3), classes can be written to attach our encoders/decoders to std and mango streams, to strings, etc.
> (5) Given (3), encoders and decoders for SPECIFIC encodings can now be written.
> (6) Will somebody /please/ document std.stream?

Nick, I think your work falls into category (5). If you want that job, I guess it's yours, but if so, please wait for (3) before you start.

Jill
Aug 17 2004
On Tue, 17 Aug 2004 10:21:01 +0000 (UTC), Arcane Jill wrote:

> We have two separate problems:
> (1) formatted I/O
> (2) unformatted I/O
> [...]

I hope I'm not stating the bleeding obvious, but you are talking about TEXT I/O, aren't you? There is also a lot of other I/O that is not text-based - sound and image files, databases, etc.

--
Derek
Melbourne, Australia
Aug 17 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...

> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.

I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).
Aug 17 2004
On Tue, 17 Aug 2004 15:00:28 -0700, Walter <newshound digitalmars.com> wrote:

> "Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
>
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).

Did you miss the thread that mentioned that sentence structure differs between languages? Example:

english :- "The DOG is BIG"
other   :- ".. BIG .. DOG"

(I don't actually know any other languages)

So, it would be kind of useful to be able to define the format strings as:

english :- "The $1 is $2"
other   :- ".. $2 .. $1"

and be able to go:

printf(format[lang_id],"DOG","BIG");

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 17 2004
Regan Heath wrote:

> Did you miss the thread that mentioned that sentence structure in various languages differ? Example:
>
> english :- "The DOG is BIG"
> other   :- ".. BIG .. DOG"
>
> (I don't actually know any other languages)
>
> So, it would be kind of useful to be able to define the format strings as:
>
> english :- "The $1 is $2"
> other   :- ".. $2 .. $1"
>
> and be able to go:
>
> printf(format[lang_id],"DOG","BIG");

This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function? Your code could look (very roughly) like this:

    char[] formatString  = LookupNLSFormat (msgID, language);
    char[] reorderString = LookupNLSReorder(msgID, language);
    vwritef(formatString, doArgumentReorder(reorderString, <args>));

The advantage here is that you can do reordering for NLS support but writef stays simple.
Aug 17 2004
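To make the reorder-then-format idea above concrete, here is a minimal sketch. It rests on assumptions the post does not spell out: the reorder string is taken to be a comma-separated list of 1-based positions such as "2,1", and the arguments are restricted to strings held in an array, since true varargs cannot be reshuffled this simply. Only the name doArgumentReorder comes from the post; its signature here is invented.

```d
import std.array : split;
import std.conv : to;
import std.exception : enforce;
import std.stdio : writefln;

/// Hypothetical signature: reorderSpec lists 1-based positions, e.g. "2,1".
string[] doArgumentReorder(string reorderSpec, string[] args)
{
    string[] reordered;
    foreach (field; reorderSpec.split(","))
    {
        // to!size_t throws on anything that is not a plain number (including spaces).
        immutable position = field.to!size_t;
        enforce(position >= 1 && position <= args.length,
                "reorder position out of range");
        reordered ~= args[position - 1];
    }
    return reordered;
}

void main()
{
    auto args = ["DOG", "BIG"];

    // Same arguments, two orderings; the second sentence is an invented
    // word order for illustration, not a real translation.
    auto english = doArgumentReorder("1,2", args);
    auto other   = doArgumentReorder("2,1", args);

    writefln("The %s is %s", english[0], english[1]);
    writefln("%s is what the %s is", other[0], other[1]);
    assert(other == ["BIG", "DOG"]);
}
```

The formatter itself stays purely sequential here; the cost is that each message needs both a format string and a matching reorder string.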
On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis <spamhole-2001-07-16 deming-os.org> wrote:

> Regan Heath wrote:
>> Did you miss the thread that mentioned that sentence structure in various languages differ? Example:
>>
>> english :- "The DOG is BIG"
>> other   :- ".. BIG .. DOG"
>>
>> (I don't actually know any other languages)
>>
>> So, it would be kind of useful to be able to define the format strings as:
>>
>> english :- "The $1 is $2"
>> other   :- ".. $2 .. $1"
>>
>> and be able to go:
>>
>> printf(format[lang_id],"DOG","BIG");
>
> This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function? Your code could look (very roughly) like this:
>
>     char[] formatString  = LookupNLSFormat (msgID, language);
>     char[] reorderString = LookupNLSReorder(msgID, language);
>     vwritef(formatString, doArgumentReorder(reorderString, <args>));
>
> The advantage here is that you can do reordering for NLS support but writef stays simple.

The disadvantage is that the above idea is harder to maintain: there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ...

How hard or complex is it to implement a writef that can do:

writef("The %1 is %2","dog","big");

(%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)

I can't see it being a particularly big leap from what it currently does.

Also consider:

writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 17 2004
In article <opscwr50rl5a2sq9 digitalmars.com>, Regan Heath says...

> On Tue, 17 Aug 2004 19:45:47 -0700, Russ Lewis
>> This isn't strictly a requirement of the formatting tools. Perhaps a library function which, given a number of varargs, reordered them and passed them to another function? Your code could look (very roughly) like this:
>>
>>     char[] formatString  = LookupNLSFormat (msgID, language);
>>     char[] reorderString = LookupNLSReorder(msgID, language);
>>     vwritef(formatString, doArgumentReorder(reorderString, <args>));
>
> The disadvantage being that the above idea is harder to maintain, there are 2 things that define how the message is displayed, 2 things in which a mistake could be made, 2 things in which you have to make changes, ..
>
> How hard or complex is it to implement a writef that can do:
>
> writef("The %1 is %2","dog","big");
>
> (%1 and %2 can be changed to any symbol that fits with the current symbol set used in writef)
>
> I can't see it being a particularly big leap from what it currently does.
>
> Also consider:
>
> writef("A really long %1 that contains the same %1 several times. %1's like this could be quite common, yes?","string");

Well, I didn't mean to cause trouble here. :)

Anyway. I'm agreeing with Regan, and slightly disagreeing with Walter. There /is/ a need to be able to do that. Sorry, but that's a requirement. It's not an /urgent/ requirement, but you can bet vast sums of money that internationalization will start to become more and more of an issue once other transcoding issues have been dealt with.

Russ's idea is good, but obviously not /as/ good as simply coming up with an improved printf() replacement. Right now, POSIX-printf() can do this random-access, but D's writef() can't. It's not urgent, and we'll solve it in time. But it /is/ an internationalization issue, and it won't go away.

Arcane Jill
Aug 18 2004
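For reference, the POSIX-style random access mentioned above uses numbered specifiers of the form %n$. The sketch below assumes a present-day Phobos, whose std.format accepts that syntax as far as I am aware; it says nothing about the writef() being discussed in the thread, and the second sentence is an invented word order rather than a real translation.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    // "%1$s" means "the first argument, formatted with %s".
    writeln(format("The %1$s is %2$s", "dog", "big"));

    // The same arguments, consumed in a different order.
    writeln(format("%2$s is what the %1$s is", "dog", "big"));

    // One argument referenced more than once.
    writeln(format("A %1$s here and the same %1$s there", "string"));
}
```

With this style, a per-language table of format strings is enough: the argument order in the call never changes, only the template does.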
On Tue, 17 Aug 2004 15:00:28 -0700, Walter wrote:

> "Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...
>> Further, for reasons of internationalization, our printf replacement must be able to random-access its variadic arguments.
>
> I disagree with this requirement. It breaks the nice way that std.format works. The only place where reordering the arguments is useful is in date/time formatting, and a specialized formatter would be suitable for that (and there are many other nice things one can do with a specialized date/time formatter).

I think that AJ was suggesting that there exists a business need for a type of formatter that can express, in its template, the order in which arguments will appear in the resultant string, regardless of the order in which they are presented to the formatter.

For example (contrived for simplicity):

    char[] Msg;
    char[] temp;
    if (gUserLang == LANG_english)
        temp = "%{1}s %{2}s %{3}s %{4}s %{5}s\n";
    else
        temp = "%{2}s %{1}s %{5}s %{4}s %{3}s\n";
    Msg = expand(temp, pSubjectDesc, pSubject, pVerb, pObjectDesc, pObject);
    writef(Msg);

--
Derek
Melbourne, Australia
18/Aug/04 10:31:55 AM
Aug 17 2004
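The expand() call in the example above is not defined anywhere in the thread. One way it might work is sketched below, on the assumption that a placeholder is exactly "%{n}s" with n a 1-based argument number and that all arguments are strings; the parsing rules and error handling are guesses made for illustration.

```d
import std.conv : to;
import std.exception : enforce;
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical: replaces every "%{n}s" in templ with args[n - 1].
string expand(string templ, string[] args...)
{
    string result;
    string rest = templ;
    while (true)
    {
        immutable open = rest.indexOf("%{");
        if (open < 0)
        {
            result ~= rest;              // no more placeholders
            break;
        }
        result ~= rest[0 .. open];       // literal text before the placeholder
        rest = rest[open + 2 .. $];

        immutable close = rest.indexOf("}s");
        enforce(close >= 0, "unterminated %{n}s placeholder");

        immutable n = rest[0 .. close].to!size_t;
        enforce(n >= 1 && n <= args.length,
                "placeholder refers to a missing argument");
        result ~= args[n - 1];
        rest = rest[close + 2 .. $];
    }
    return result;
}

void main()
{
    // Argument order: subject description, subject, verb, object description, object.
    enum english = "%{1}s %{2}s %{3}s %{4}s %{5}s";
    enum other   = "%{2}s %{1}s %{5}s %{4}s %{3}s"; // the contrived ordering from the post

    writeln(expand(english, "big", "dog", "chases", "small", "cat"));
    writeln(expand(other,   "big", "dog", "chases", "small", "cat"));
}
```

Because the template alone decides the order, the call site stays identical for every language, and a placeholder may be repeated or omitted without touching the code.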
Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Mango; if you're interested, I'll send it on.

"Arcane Jill" <Arcane_member pathlink.com> wrote in message news:cfsm6d$va0$1 digitaldaemon.com...

> We have two separate problems:
> (1) formatted I/O
> (2) unformatted I/O
> [...]
Aug 18 2004
In article <cfv1d0$26s7$1 digitaldaemon.com>, antiAlias says...

> Jill ~ I have a utf-8 transcoder that I'm using as a plaything within Mango; if you're interested, I'll send it on.

Not really interested because (a) there's one in std.utf, and (b) I could write my own in just a few lines of code anyway. But we're really talking about general concepts here, not specific encodings. We need to get the architecture "right" first - which I guess means, in a form that everyone is happy with - and /then/ we start plugging in specific encodings. UTF-8 is one of the easiest, so I'm really not troubled by it. (ASCII is /the/ easiest, obviously).

Antialias, it was you who came up with some ideas for throughput enhancement using buffers. I think we can use those ideas without sacrificing genericity, which is why I suggested we collaborate on the generic interface. Would you be interested in that?

Jill
Aug 18 2004
"Arcane Jill" <Arcane_member pathlink.com> wrote in message

> Antialias, it was you who came up with some ideas for throughput enhancement using buffers. I think we can use those ideas without sacrificing genericity, which is why I suggested we collaborate on the generic interface. Would you be interested in that?

Sure, Jill. That's what I was attempting <g> Was offering a transcoder built in the manner suggested; to experiment with said interface. Sometimes it's easier to deal with a more concrete entity as opposed to something completely virtual -- if nothing else, it should serve to more fully describe the suggested approach.

I'll need an email address, if this module would be of any value to you?
Aug 18 2004
In article <cg01qr$13u$1 digitaldaemon.com>, antiAlias says...

> I'll need an email address, if this module would be of any value to you?

If you have an account on dsource, you can contact me privately there. My username is "Arcane Jill".

Jill
Aug 19 2004
Arcane Jill wrote:
...

> (6) Will somebody /please/ document std.stream?
>
> I volunteer for (1) and (3). I'm hoping Sean will volunteer for (2). AntiAlias's excellent ideas for throughput enhancement using buffers are part of (1) and (3), so I suggest AntiAlias and I send each other code back and forth until we are both happy with it.
>
> Volunteers still needed for (4), (5) and (6) (though (4) and (5) are dependent upon (3)). Anyone who's a dab hand at Wiki might like to volunteer for (6).
>
> Arcane Jill

I'm not volunteering to single-handedly re-document std.stream, but I did start a wiki page for that purpose: http://www.prowiki.org/wiki4d/wiki.cgi?DocComments/Phobos/StdStream

(Anyone can edit it by clicking on the "Edit" link in the upper right corner of the page.)

--
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Aug 18 2004