D - other languages for output.writeLine

"Y.Tomino" <demoonlit inter7.jp> writes:

Hello.

DMD accepts Unicode identifiers when the source file is written in UTF-8.
But we can't output non-ASCII letters (Japanese, etc.).

This code fixes that, so non-ASCII letters can be written to the console from
UTF-8 source code.

YT

When DMD 0.74 was released, Walter wrote:
That is a problem, I'm not sure what to do about it. One thing I have been
looking for is a mapping from Shift-JIS to unicode. Do you have such a
table?

----

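typedef char[] mstring; // multi-byte (system ANSI code page) encoded string

// Code page 0 is CP_ACP, so s is interpreted as ANSI text, not as UTF-8.
wchar[] toUTF16(mstring s)
{
 wchar[] result;
 // The first call, with a null buffer, returns the required length in wchars.
 result.length = MultiByteToWideChar(0, 0, s, s.length, null, 0);
 MultiByteToWideChar(0, 0, s, s.length, result, result.length);
 return result;
}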

class Console : File
{
 this(HANDLE _handle, FileMode _mode){ super(_handle, _mode); }
 override void writeString(char[] s)
 {
  if(s.length > 0){
   DWORD written;
   wchar[] w = toUTF16(s);
   if(WriteConsoleW(handle, &w[0], w.length, &written, null) == FALSE){
    mstring m = toMBCS(w);
    if(WriteConsoleA(handle, &m[0], m.length, &written, null) == FALSE){
     super.writeString(m); // for redirect
    }
   }
  }
 }
 override void write(char[] s)
 {
  super.write(s.length);
  writeExact(&s[0], s.length * char.size); // for binary
 }
 static this()
 {
  std.stream.stdout = new Console(std.stream.stdout.handle(), FileMode.Out);
  std.stream.stderr = new Console(std.stream.stderr.handle(), FileMode.Out);
 }
}

Nov 22 2003

"Y.Tomino" <demoonlit inter7.jp> writes:

Sorry, I made an editing mistake.
Here is toMBCS.

----

mstring toMBCS(wchar[] s)
{
 mstring result;
 result.length = WideCharToMultiByte(0, 0, s, s.length, null, 0, null, null);
 WideCharToMultiByte(0, 0, s, s.length, result, result.length, null, null);
 return result;
}

Nov 22 2003

"Walter" <walter digitalmars.com> writes:

I'm puzzled why it's necessary to convert to wide char and then back to
multi byte?


"Y.Tomino" <demoonlit inter7.jp> wrote in message
news:bpp6od$1lfm$1 digitaldaemon.com...
 <snip>

Nov 22 2003

"Y.Tomino" <demoonlit inter7.jp> writes:

Because WriteFile can't output Unicode letters to the console.
WriteConsoleW works correctly.
String literals in the source code are UTF-8, so they are first converted to
UTF-16 for WriteConsoleW.

But WriteConsoleW doesn't work on Windows 95/98/Me.
The Microsoft Platform SDK says:
 Implemented as Unicode and ANSI versions on Windows NT/2000/XP. Also
 supported by Microsoft Layer for Unicode.
So it calls WriteConsoleA if WriteConsoleW fails.
WriteConsoleA's argument must be a multi-byte string.
A multi-byte string is not UTF-8, so it has to be converted with
WideCharToMultiByte.
Since Unicode has many more characters than MBCS (Shift-JIS), it tries
WriteConsoleW first.

And when output is redirected (C:\>myexe > output.txt),
the console API may fail, so it has to call super.writeString.
But for redirected output, a multi-byte encoded text file is the natural
result, as with other programs.
Therefore it passes "m" instead of "s" to super.writeString.

Thanks.
YT

"Walter" <walter digitalmars.com> wrote in message
news:bppk03$28l6$1 digitaldaemon.com...
 I'm puzzled why it's necessary to convert to wide char and then back to
 multi byte?

Nov 23 2003

"Y.Tomino" <demoonlit inter7.jp> writes:

Sorry, WriteConsoleA is the same as WriteFile in this case, so it is unnecessary.

This part:

mstring m = toMBCS(w);
if(WriteConsoleA(handle, &m[0], m.length, &written, null) == FALSE){
  super.writeString(m); // for redirect
}

can be reduced to:

mstring m = toMBCS(w);
super.writeString(m); // for 95/98/Me and redirect

Nov 23 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Y.Tomino wrote:
 wchar[] toUTF16(mstring s)
 {
  wchar[] result;
  result.length = MultiByteToWideChar(0, 0, s, s.length, null, 0);
  MultiByteToWideChar(0, 0, s, s.length, result, result.length);
  return result;
 }

<snip>
  override void writeString(char[] s)
  {
   if(s.length > 0){
    DWORD written;
    wchar[] w = toUTF16(s);


This will only work if the current system codepage is UTF-8, since
MultiByteToWideChar assumes that the input string is in the current code
page. Passing CP_UTF8 to MultiByteToWideChar won't help either, because
that is only supported on Win98 and up.

Seems to me that the only way to do this is to manually convert the
string from UTF-8 to UTF-16 (not that much of a deal). The Win32
functions won't help you much because there's absolutely no Unicode
support on Win95.

    if(WriteConsoleW(handle, &w[0], w.length, &written, null) == FALSE){

This call might be a little dangerous. WriteConsoleW is not supported on
Win9x, so there's no guarantee that it won't cause a crash on some
systems or return an undefined result (or is there some explicit
guarantee somewhere in the docs?).

It would probably be better to check whether the OS is an NT variant and
call the W and A versions accordingly.

Something like:

OSVERSIONINFO osVersion;
osVersion.dwOSVersionInfoSize = OSVERSIONINFO.sizeof; // must be set before the call
GetVersionEx(&osVersion);

if(osVersion.dwPlatformId==VER_PLATFORM_WIN32_NT)
	WriteConsoleW(...);
else
{
	mstring m = toMBCS(w);
	WriteConsoleA(...);
}


Hauke

Nov 23 2003
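
The manual UTF-8 to UTF-16 conversion suggested above might look roughly like the following. This is only a sketch, written for a current D compiler rather than DMD 0.7x. The name utf8ToUtf16 is made up for illustration, and std.utf.toUTF16 in Phobos already does the same job with stricter validation (this sketch does not reject overlong forms or encoded surrogates).

```d
import std.stdio : writeln;

/// Hypothetical helper: decodes UTF-8 by hand and re-encodes it as UTF-16.
/// Throws on malformed or truncated input.
wchar[] utf8ToUtf16(const(char)[] s)
{
    wchar[] result;
    size_t i = 0;
    while (i < s.length)
    {
        uint c = cast(ubyte) s[i++];
        int extra;                           // continuation bytes still to read
        if (c < 0x80)                { extra = 0; }             // ASCII
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; extra = 1; }  // 2-byte form
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; }  // 3-byte form
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; }  // 4-byte form
        else throw new Exception("invalid UTF-8 lead byte");

        foreach (_; 0 .. extra)
        {
            if (i >= s.length || (s[i] & 0xC0) != 0x80)
                throw new Exception("invalid UTF-8 continuation byte");
            c = (c << 6) | (s[i++] & 0x3F);
        }
        if (c > 0x10FFFF)
            throw new Exception("code point out of range");

        if (c < 0x10000)
        {
            result ~= cast(wchar) c;         // BMP: a single UTF-16 unit
        }
        else
        {
            c -= 0x10000;                    // otherwise a surrogate pair
            result ~= cast(wchar) (0xD800 | (c >> 10));
            result ~= cast(wchar) (0xDC00 | (c & 0x3FF));
        }
    }
    return result;
}

void main()
{
    auto w = utf8ToUtf16("日本語 abc");
    writeln(w.length, " UTF-16 code units");
}
```

The result could then be handed to WriteConsoleW without going through MultiByteToWideChar at all, so it does not depend on CP_UTF8 support.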

"Y.Tomino" <demoonlit inter7.jp> writes:

 wchar[] toUTF16(mstring s)
 {
  wchar[] result;
  result.length = MultiByteToWideChar(0, 0, s, s.length, null, 0);
  MultiByteToWideChar(0, 0, s, s.length, result, result.length);
  return result;
 }

 <snip>
  override void writeString(char[] s)
  {
   if(s.length > 0){
    DWORD written;
    wchar[] w = toUTF16(s);


 This will only work if the current system codepage is UTF-8, since
 MultiByteToWideChar assumes that the input string is in the current code
 page. Passing CP_UTF8 to MultiByteToWideChar won't help either, because
 that is only supported on Win98 and up.

Sorry, it's my editing mistake. My toUTF16(mstring) is not used.
The toUTF16 called from writeString is std.utf.toUTF16(char[]),
because D's typedef is strong.
(I mistakenly copied my wrong toUTF16 instead of toMBCS :-)

 This call might be a little dangerous. WriteConsoleW is not supported on
 Win9x, so there's no guarantee that it won't cause a crash on some
 systems or return an undefined result (or is there some explicit
 guarantee somewhere in the docs?).

I think the ~W APIs return FALSE with GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
on Win9x...
Would it crash or give an undefined result?

YT

Nov 23 2003

"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:

 This call might be a little dangerous. WriteConsoleW is not supported on
 Win9x, so there's no guarantee that it won't cause a crash on some
 systems or return an undefined result (or is there some explicit
 guarantee somewhere in the docs?).

 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...

That would be my expectation of any unimplemented API function on Win9x (as
long as it actually exists, of course)

Nov 23 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Y.Tomino wrote:
 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
 Will it crash or undefined result ?

Maybe, maybe not. In my experience, when you're dealing with the Windows
API you had better not rely on anything that is not explicitly stated
in the documentation. Otherwise there will quite often be some obscure
combination of Windows version, system language and system DLL versions
that will violate your assumption.

So, since testing on all possible Windows configurations is close to
impossible I usually stick to the documented stuff.

Hauke

Nov 23 2003

"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bprg1c$1s8k$1 digitaldaemon.com...
 Y.Tomino wrote:
 I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
 on Win9x...
 Will it crash or undefined result ?

 Maybe, maybe not. In my experience, when you're dealing with the Windows
 API you had better not rely on anything that is not explicitly stated
 in the documentation. Otherwise there will quite often be some obscure
 combination of Windows version, system language and system DLL versions
 that will violate your assumption.

It is my understanding that all unimplemented functions in the Win32 API for a
given operating system cause the thread error to be set to
ERROR_CALL_NOT_IMPLEMENTED.

 So, since testing on all possible Windows configurations is close to
 impossible I usually stick to the documented stuff.

Your caution is worthy, and I agree in most cases. However, I think in this
case it is safe to go with GetLastError.

Cheers

Matthew

Nov 23 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Matthew Wilson wrote:
 It is my understanding that all unimplemented functions in the Win32 for a
 given operating system cause the thread error to be set to
 ERROR_CALL_NOT_IMPLEMENTED.

Well, that can only be true for functions that already existed when the 
operating system was shipped, right?

But I agree, if the "Ansi" version is supported, then the missing 
Unicode function will probably return NOT_IMPLEMENTED (or some other 
error - you can never be sure!).

However, the Unicode function might fail for other reasons as well and 
maybe the ANSI version doesn't. Could be a simple case of not having 
enough free memory for the Unicode strings, but just enough for the ANSI 
version. An automated fallback might cause inconsistency within the 
program and its data (e.g. mixed ANSI and Unicode data in a file or 
something similar). If you go the fallback route you'd have to at least 
check the error code. If you want to be on the safe side, that is.

I find it easier to just check for NTness. Since this boolean doesn't 
change, it can be checked once at startup and then stored, so you won't 
have to call GetVersionEx every time you have to decide between Ansi and 
Unicode versions. This might be something that could be done by Phobos - 
something like std.os.windows.isWinNT().

Hauke

Nov 24 2003
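
The cached platform check described above could be sketched like this. It is written for a current D compiler using druntime's core.sys.windows bindings. The names isWinNT and runningOnNT are hypothetical (nothing in the thread says Phobos has such a function), and on non-Windows targets the sketch simply reports false. Note that GetVersionEx is deprecated on recent Windows releases, so this illustrates the idea rather than recommending it.

```d
import std.stdio : writeln;

version (Windows)
{
    import core.sys.windows.windows;

    // Queried once in the module constructor, then only read.
    private __gshared bool runningOnNT;

    shared static this()
    {
        OSVERSIONINFOA info;
        // The size field must be filled in before calling GetVersionEx.
        info.dwOSVersionInfoSize = OSVERSIONINFOA.sizeof;
        // The ANSI entry point is used because it exists on every platform.
        if (GetVersionExA(&info))
            runningOnNT = info.dwPlatformId == VER_PLATFORM_WIN32_NT;
    }
}
else
{
    private enum runningOnNT = false;   // assumption: not Windows, so not NT
}

/// Hypothetical equivalent of the proposed std.os.windows.isWinNT().
bool isWinNT()
{
    return runningOnNT;
}

void main()
{
    writeln(isWinNT() ? "use the W functions" : "use the A functions");
}
```

Because the value is fixed for the lifetime of the process, callers pay for the system call only once.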

"Y.Tomino" <demoonlit inter7.jp> writes:

But I agree, if the "Ansi" version is supported, then the missing
Unicode function will probably return NOT_IMPLEMENTED (or some other
error - you can never be sure!).

Even in the case of NT/2000/XP, WriteConsoleW may fail if the
standard-output handle has been redirected.
(GetLastError() = ERROR_INVALID_HANDLE)
WriteConsoleA may fail, too.

A simple way is this: if WriteConsoleW fails, pass the ANSI (MBCS)-converted
string to WriteFile.
WriteFile can write an ANSI string to both the console and a redirected file.

Thanks.
YT

Nov 24 2003

"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:

 It is my understanding that all unimplemented functions in the Win32 for a
 given operating system cause the thread error to be set to
 ERROR_CALL_NOT_IMPLEMENTED.

 Well, that can only be true for functions that already existed when the
 operating system was shipped, right?

Sure. I don't think anyone's suggesting otherwise.

I don't understand your point.

 But I agree, if the "Ansi" version is supported, then the missing
 Unicode function will probably return NOT_IMPLEMENTED (or some other
 error - you can never be sure!).

Naturally a particular function may be incorrectly written. What I'm saying
is that it is a design feature of Win9x that a stubbed (as opposed to
entirely missing) function will set the NOT_IMPL value to the thread error.

 However, the Unicode function might fail for other reasons as well and
 maybe the ANSI version doesn't. Could be a simple case of not having
 enough free memory for the Unicode strings, but just enough for the ANSI
 version. An automated fallback might cause inconsistency within the
 program and its data (e.g. mixed ANSI and Unicode data in a file or
 something similar). If you go the fallback route you'd have to at least
 check the error code. If you want to be on the safe side, that is.

This doesn't make any kind of sense to me. Why would anyone call a function
without allocating the appropriate amount of memory, other than through
their own incompetence? And why would such incompetence only manifest when
doing Unicode programming, and not ANSI?

 I find it easier to just check for NTness. Since this boolean doesn't
 change, it can be checked once at startup and then stored, so you won't
 have to call GetVersionEx every time you have to decide between Ansi and
 Unicode versions. This might be something that could be done by Phobos -
 something like std.os.windows.isWinNT().

That's entirely true. In fact, this would be more appropriate as a robust
and consistent implementation. But, given that, why not simply use MSLU, and
take all the hassles from us poor overworked D people and utilise the
industry, late in the day though it may be, of Microsoft? The newsgroup for
MSLU is well serviced, the library is free and redistributable, it is easy to
use, and works well.

Matthew

Nov 24 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Matthew Wilson wrote:
>>>It is my understanding that all unimplemented functions in the Win32 for a
>>>given operating system cause the thread error to be set to
>>>ERROR_CALL_NOT_IMPLEMENTED.
>>
>>Well, that can only be true for functions that already existed when the
>>operating system was shipped, right?
>
> Sure. I don't think anyone's suggesting otherwise.
>
> I don't understand your point.

The point I was trying to make is that you cannot generally assume that
all unimplemented functions return ERROR_CALL_NOT_IMPLEMENTED. This is
not really that much of an issue for Unicode functions that have an
implemented Ansi version, but there are other functions that only exist
on NT that may not have a stub on Win9x.

I guess I'm just saying that a consistent way to handle these issues
would be preferable instead of trying to deduce which functions are
"stub-unimplemented" as opposed to non-existent.

  >>However, the Unicode function might fail for other reasons as well and
>>maybe the ANSI version doesn't. Could be a simple case of not having
>>enough free memory for the Unicode strings, but just enough for the ANSI
>>version. An automated fallback might cause inconsistency within the
>>program and its data (e.g. mixed ANSI and Unicode data in a file or
>>something similar). If you go the fallback route you'd have to at least
>>check the error code. If you want to be on the safe side, that is.
>
> This doesn't make any kind of sense to me. Why would anyone call a function
> without allocating the appropriate amount of memory, other than through
> their own incompetence? And why would such incompetence only manifest when
> doing Unicode programming, and not ANSI?

Simple example: you have 5000 bytes of free disk space and want to write 
a 4000 character string to a file, using an imaginary 
WriteStringToFileA/W function.

Your system is Win2000, so WriteStringToFileW exists.

However, the call to WriteStringToFileW will fail because this
implementation needs 8000 bytes of disk space. The Ansi version will
succeed, though, since it only needs 4000 bytes. If you automatically
fall back to the Ansi version without checking the error code, then you 
end up writing Ansi data into a file that was supposed to hold Unicode data.

>>I find it easier to just check for NTness. Since this boolean doesn't
>>change, it can be checked once at startup and then stored, so you won't
>>have to call GetVersionEx every time you have to decide between Ansi and
>>Unicode versions. This might be something that could be done by Phobos -
>>something like std.os.windows.isWinNT().
>
> That's entirely true. In fact, this would be more appropriate as a robust
> and consistent implementation. But, given that, why not simply use MSLU, and
> take all the hassles from we poor overworked D people and utilise the
> industry, late in the day though it may be, of Microsoft. The ng for MSLU is
> well serviced, the library is free and redistributable, it is easy to use,
> and works well.

Because AFAIK the MSLU is not installed on any Win9x system by default. 
Certainly not on Win95. So you'd have to ship it with every application. 
For some applications that may be acceptable, but for others it might 
not. For example, it wouldn't be possible to write a ZIP self-extractor 
in D, because the .exe file would need an additional DLL to extract its 
contents.


Hauke

Nov 24 2003
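
The point about checking the error code before falling back can be illustrated with a small decision function. This is a sketch under stated assumptions, not code from the thread. Action and afterWideWrite are hypothetical names, and the Win32 error values are redefined locally so that it compiles on any platform. Which errors justify a fallback is a policy choice. Here only "not implemented" (a stubbed W function on Win9x) and "invalid handle" (redirected output, as reported earlier in the thread) do, while something like a full disk is reported instead of silently switching encodings.

```d
import std.stdio : writeln;

// Standard Win32 error codes, repeated here so the sketch is self-contained.
enum : uint
{
    ERROR_INVALID_HANDLE       = 6,    // e.g. stdout redirected to a file
    ERROR_DISK_FULL            = 112,  // the disk-space case from the example
    ERROR_CALL_NOT_IMPLEMENTED = 120,  // stubbed function on Win9x
}

enum Action { done, fallBackToAnsi, reportError }

/// Hypothetical policy: decides what to do after calling a W function,
/// given its success flag and the GetLastError() value.
Action afterWideWrite(bool succeeded, uint lastError)
{
    if (succeeded)
        return Action.done;

    switch (lastError)
    {
        case ERROR_CALL_NOT_IMPLEMENTED:   // no Unicode support here
        case ERROR_INVALID_HANDLE:         // not a console handle
            return Action.fallBackToAnsi;
        default:                           // a real failure: do not mix encodings
            return Action.reportError;
    }
}

void main()
{
    writeln(afterWideWrite(false, ERROR_CALL_NOT_IMPLEMENTED));
    writeln(afterWideWrite(false, ERROR_DISK_FULL));
}
```

In the disk-space example, the 4000-character string needs 4000 * 2 = 8000 bytes as UTF-16, which exceeds the 5000 bytes available, while the 4000-byte ANSI form would fit. Under this policy that failure is surfaced rather than hidden by the fallback.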

"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:

 Because AFAIK the MSLU is not installed on any Win9x system by default.

Correct

 Certainly not on Win95. So you'd have to ship it with every application.

True. And I certainly acknowledge the problems this causes.

 For some applications that may be acceptable, but for others it might
 not. For example, it wouldn't be possible to write a ZIP self-extractor
 in D, because the .exe file would need an additional DLL to extract its
 contents.

The alternative is to have the requisite amount of equivalent code built
into the library. This is an approach I've taken often.

It's a swings & roundabouts deal.

I would certainly prefer the statically bound approach, but I'm aware of
what a huge job it would be to make this work.

Matthew

Nov 24 2003

Raiko <phantom2023 hotmail.com> writes:

Hauke Duden wrote:
 Y.Tomino wrote:
  I think ~W API return FALSE and GetLastError() = ERROR_CALL_NOT_IMPLEMENTED
  on Win9x...
  Will it crash or undefined result ?

 Maybe, maybe not. In my experience, when you're dealing with the Windows
 API you had better not rely on anything that is not explicitly stated
 in the documentation. Otherwise there will quite often be some obscure
 combination of Windows version, system language and system DLL versions
 that will violate your assumption.

 So, since testing on all possible Windows configurations is close to
 impossible I usually stick to the documented stuff.

 Hauke

Just to jump in for a second.

A lot of Unicode APIs are supported on Win9x if you have the Unicode layer,

e.g. WriteConsoleW (from the Platform SDK docs):

Windows Me/98/95:  WriteConsoleW is supported by the Microsoft Layer for 
Unicode. To use this, you must add certain files to your application, as 
outlined in Microsoft Layer for Unicode on Windows Me/98/95 Systems.

Sorry for being out of place :)

Nov 23 2003

"Matthew Wilson" <matthew.hat stlsoft.dot.org> writes:

You're not out of place! :-)

Using MSLU might be an option. It's redistributable, and pretty reliable.
(In fact, the December issue of Windows Developer Network contains an
interesting article on the issue, by one of our foremost authors ...)

Cheers

Matthew

Nov 23 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Raiko wrote:
 Alot of Unicode APIs are supported in Win9x if you have the Unicode layer

Unfortunately, that would require every D application to ship with the
MSLU DLL. It's pretty small by today's standards, granted, but I don't
think that it should be required.

Besides, the MSLU does have some quirks. There are quite a lot of bugs 
in there when it comes to error handling or rarely used functions. And 
Microsoft doesn't really support it well either.

And, of course, much of the GUI stuff is not included in the MSLU 
(Common Controls!).

Hauke

Nov 24 2003

"Julio César Carrascal Urquijo" <adnoctum phreaker.net> writes:

 Unicode. To use this, you must add certain files to your application, as
 outlined in Microsoft Layer for Unicode on Windows Me/98/95 Systems.


There's always ICU, which is included in Parrot (Perl 6's engine).

http://oss.software.ibm.com/icu/userguide/index.html

Nov 24 2003
