digitalmars.D.bugs - [Issue 2742] New: std.stdio assumes console works in utf-8

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742

           Summary: std.stdio assumes console works in utf-8
           Product: D
           Version: 2.025
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Keywords: spec
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: maxmo pochta.ru


This assumption is wrong on Windows. One *can* set the console code page to
UTF-8 and the font to Lucida Console, but this is an unusual configuration, so
console programs can't work out of the box. This leaves std.stdio useless. As
far as I know, this also applies to Phobos1. If this is not going to be fixed,
it should be documented.


-- 
Mar 18 2009
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742






There was a brief discussion on this
http://tinyurl.com/c789py
but it was inconclusive.

Meanwhile, check out
http://pr.stewartsplace.org.uk/d/sutil/


-- 
Mar 30 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742






But how many DOS or Windows console apps in the real world output UTF-8? 
Presumably not many, considering that no versions of DOS and only a few 
versions of Windows support it.  There's also a causal loop in that even 
modern Windows versions don't come with the console code page set to 65001 
by default.  I don't know what is likely to break this loop, but I doubt 
that the restrictiveness of one language's standard library is going to do 
it.

There is PoshConsole
http://poshconsole.codeplex.com/
It's all .net and WPF, therefore UTF-16, but it's way different architecture
and interface.

BTW cmd has /u switch for (redirected) unicode output, I use it sometimes.

-- 
Mar 30 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 11 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742


Stewart Gordon <smjg iname.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com



*** Issue 4522 has been marked as a duplicate of this issue. ***

-- 
Jul 28 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




14:24:54 PDT ---
Any fresh ideas on how to fix this?

-- 
Sep 26 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




I suppose the way to go about it is to create wrapper stream classes that
provide encoding conversion.  And have ready-made instances for stdin/out/err,
with the codepage detected at launch.

The difficulty I can see is seekability, but this probably isn't needed given
that it'll be primarily for stdio (which are inherently not seekable) and text
files (for which seeking isn't particularly useful).
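
A minimal sketch of that idea, assuming a current druntime where core.sys.windows.windows exposes GetConsoleOutputCP and WideCharToMultiByte. The helper names detectConsoleCodePage and encodeFor are hypothetical and not part of Phobos. The code page is read once at launch, and text is re-encoded just before it is written:

```d
import std.stdio : stdout;
import std.utf : toUTF16;

version (Windows) import core.sys.windows.windows;

/// Hypothetical helper: the console's output code page, or 0 when unknown
/// (no console attached, or not running on Windows).
uint detectConsoleCodePage()
{
    version (Windows)
        return GetConsoleOutputCP();
    else
        return 0;
}

/// Hypothetical helper: re-encode UTF-8 text for the given code page.
/// Returns the input bytes unchanged when no conversion is possible or needed.
const(ubyte)[] encodeFor(uint codePage, string text)
{
    auto utf8 = cast(const(ubyte)[]) text;
    version (Windows)
    {
        if (codePage == 0 || codePage == 65001 || text.length == 0)
            return utf8;
        auto wide = toUTF16(text);
        // The first call asks for the required buffer size, the second converts.
        immutable n = WideCharToMultiByte(codePage, 0, wide.ptr,
            cast(int) wide.length, null, 0, null, null);
        if (n <= 0)
            return utf8;
        auto buf = new ubyte[n];
        WideCharToMultiByte(codePage, 0, wide.ptr, cast(int) wide.length,
            cast(char*) buf.ptr, n, null, null);
        return buf;
    }
    else
        return utf8;
}

void main()
{
    immutable cp = detectConsoleCodePage(); // detected once, at launch
    stdout.rawWrite(encodeFor(cp, "£ú\n"));
}
```

Since output is written strictly front to back, the sketch never needs to seek. Reading from stdin would need the reverse conversion, which is not shown here.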

-- 
Sep 26 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




10:58:53 PDT ---
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=114211

-- 
Sep 29 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




11:02:43 PDT ---
This can be a good test for dchar[]-looking ranges.

-- 
Sep 29 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




11:39:42 PDT ---
Looking at std.stdio, an easy fix will be to make sure all IO goes through
File.write, which calls LockingTextWriter.put, which actually tries to do the
correct transcoding. You just need to have target codepage in File, and use it
in LockingTextWriter.put. The first thing is to statically import
core.stdc.stdio to minimize and control its usage.

A nicer design, though, would be properly implemented .NET-style
Streams/TextStreams, however you want them to work in D.
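
To illustrate the transcoding step only, here is a small sketch that uses std.encoding with Windows-1252 as a fixed stand-in for the target code page. The function name toWindows1252 is hypothetical, and this is not how LockingTextWriter.put is actually implemented; a real fix would pick the target encoding at run time from the code page stored in File:

```d
import std.encoding : transcode, Windows1252String;
import std.stdio : stdout;

/// Hypothetical stand-in for the conversion step inside a put() method:
/// re-encode UTF-8 text as Windows-1252 bytes.
immutable(ubyte)[] toWindows1252(string text)
{
    Windows1252String converted;
    transcode(text, converted);
    return cast(immutable(ubyte)[]) converted;
}

void main()
{
    // rawWrite emits the converted bytes without further text processing.
    stdout.rawWrite(toWindows1252("£ú\n"));
}
```

Funnelling every write through one conversion point like this is what makes the "all IO goes through File.write" part of the suggestion matter.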

-- 
Sep 29 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




20:01:07 PDT ---
According to this page http://codesnippets.joyent.com/posts/show/414
you can get and set the codepage via the
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage] key's OEMCP
value.

Setting the codepage requires a restart though. Also, changing the codepage has
other effects, e.g. using ALT+Numpad keys is handled differently (with codepage
1252 you don't have to prepend a zero when using ALT+Numkey apparently).

Here's how to fetch the value:
import std.stdio;
import std.windows.registry;

void main()
{
    // Open the NLS code page key under HKEY_LOCAL_MACHINE.
    Key HKLM = Registry.localMachine;
    Key SFW = HKLM.getKey(r"SYSTEM\CurrentControlSet\Control\Nls\CodePage");

    // OEMCP is stored as a REG_SZ string, so read it as one.
    auto codePage = SFW.getValue("OEMCP").value_SZ();
    writeln(codePage);
}

Note that the key type is REG_SZ, a string, not a binary value. So if you want
to set the code page programmatically, you have to call:
SFW.setValue("OEMCP", "1252");

One more thing, there was this comment:
"Change the code page in your registry and you may not be able to reboot your
windows anymore."

That sounds kind of scary. Perhaps all of this should be left to the user to do
and just document it somewhere in the docs.

-- 
May 24 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742





> According to this page http://codesnippets.joyent.com/posts/show/414
> you can get and set the codepage via the
> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage] key's OEMCP
> value.
>
> Setting the codepage requires a restart though.

Not if you do it using chcp on the command line, or (presumably) SetConsoleCP
in the Windows API.

> Also, changing the codepage has other effects, e.g. using ALT+Numpad
> keys is handled differently (with codepage 1252 you don't have to
> prepend a zero when using ALT+Numkey apparently).
<snip>

I don't have to prepend a zero anyway.  It just produces a different character
if I do.  Traditionally at least, with a 0 it types a character from the ANSI
set, and without a 0 it types a character from the OEM set.  But as I test it
(Win7), it depends on what font the command prompt is set to.

----- Lucida Console or Consolas -----
C:\Users\StewartGordon>chcp 850
Active code page: 850

C:\Users\StewartGordon>£úœ£
'£úœ£' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\StewartGordon>chcp 1252
Active code page: 1252

C:\Users\StewartGordon>£úœ£
----- Raster Fonts -----
C:\Users\StewartGordon>chcp 850
Active code page: 850

C:\Users\StewartGordon>£úo£
'£úo£' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\StewartGordon>chcp 1252
Active code page: 1252

C:\Users\StewartGordon>ú·£ú
----------

The sequence of strange characters is Alt+0163, Alt+163, Alt+0156, Alt+156 in
each case.

-- 
May 25 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742


Vladimir Panteleev <thecybershadow gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybershadow gmail.com



05:02:20 PDT ---
Since no one seems to have mentioned this here yet:

http://msdn.microsoft.com/en-us/library/ms686036(v=vs.85).aspx

-- 
May 25 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742




08:20:37 PDT ---
And maybe this too (for input):
http://msdn.microsoft.com/en-us/library/ms686013%28v=vs.85%29.aspx
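
For reference, a rough sketch of switching the console itself to UTF-8 at run time instead of transcoding. It assumes the relevant calls are SetConsoleCP and SetConsoleOutputCP as declared in a current druntime's core.sys.windows.windows. The ConsoleCodePages struct and swapConsoleCodePages function are hypothetical names:

```d
import std.stdio : stdout, writeln;

version (Windows) import core.sys.windows.windows;

/// Hypothetical pair of console code pages; 0 means "leave unchanged".
struct ConsoleCodePages
{
    uint input;
    uint output;
}

/// Hypothetical helper: apply the wanted code pages and return the previous
/// ones. Does nothing and returns zeros when not on Windows.
ConsoleCodePages swapConsoleCodePages(ConsoleCodePages wanted)
{
    ConsoleCodePages previous;
    version (Windows)
    {
        previous = ConsoleCodePages(GetConsoleCP(), GetConsoleOutputCP());
        if (wanted.input != 0)
            SetConsoleCP(wanted.input);
        if (wanted.output != 0)
            SetConsoleOutputCP(wanted.output);
    }
    return previous;
}

void main()
{
    // 65001 is the UTF-8 code page mentioned earlier in this thread.
    auto saved = swapConsoleCodePages(ConsoleCodePages(65001, 65001));
    scope (exit) swapConsoleCodePages(saved); // restore the user's settings

    writeln("£ú");
    stdout.flush(); // make sure the text is out before the code page is restored
}
```

This changes the console for every program sharing it until the old values are restored, and the characters still only display correctly with a suitable font such as Lucida Console.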

-- 
May 26 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2742


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|spec                        |
                 CC|                            |bugzilla digitalmars.com



00:29:35 PST ---
Not a language spec issue.

-- 
Jan 23 2012