digitalmars.D.learn - extended characterset output

reply anonymous <anon ymous.org> writes:
What's the proper way to output all characters in the extended 
character set?

```d
void main()
{
     foreach(char c; 0 .. 256)
     {
        write(isControl(c) ? '.' : c);
     }
}
```

Expected output:
```
................................ 

```

Actual output:
```
................................ 

```

Works as expected in python.

Thanks
Apr 07 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/7/22 23:13, anonymous wrote:
 What's the proper way to output all characters in the extended character
 set?
It is not easy to answer because there are a number of concepts here that may make it trivial or complicated. The configuration of the output device matters. Is it set to Windows-1252 or are you using Unicode strings in Python?
 ```d
 void main()
 {
      foreach(char c; 0 .. 256)
'char' is wrong there because 'char' has a very special meaning in D: A UTF-8 code unit. Not a full Unicode character in many cases, especially in the "extended" set. I think your problem will be solved simply by replacing 'char' with 'dchar' there: foreach (dchar c; ... However, isControl() below won't work because isControl() only knows about the ASCII table. It would miss the unprintable characters above 127.
      {
         write(isControl(c) ? '.' : c);
      }
 }
 ```
This works:

import std.stdio;

bool isPrintableLatin1(dchar value) {
  if (value < 32) {
    return false;
  }

  if (value > 126 && value < 161) {
    return false;
  }

  return true;
}

void main() {
  foreach (dchar c; 0 .. 256) {
    write(isPrintableLatin1(c) ? c : '.');
  }

  writeln();

  // import std.encoding;
  // foreach(ubyte c; 0 .. 256) {
  //   if (isPrintableLatin1(c)) {
  //     Latin1Char[1] from = [ cast(Latin1Char)c ];
  //     string to;
  //     transcode(from, to);
  //     write(to);
  //   } else {
  //     write('.');
  //   }
  // }
  // writeln();
}

I left some code commented-out, which I experimented with. (That works as well.)

Ali
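To illustrate the point about 'char' being a UTF-8 code unit, here is a minimal sketch. The helper name utf8Units is hypothetical and only for illustration; std.utf.encode is the Phobos function it relies on. It shows that an ASCII code point fits in one 'char', while a code point from the "extended" range such as U+00E9 needs two (0xC3 0xA9), so a lone 'char' cannot hold it.

```d
import std.stdio : writefln;
import std.utf : encode;

/// Hypothetical helper: returns the UTF-8 code units (D `char`s)
/// that encode a single code point.
char[] utf8Units(dchar codePoint)
{
    char[4] buf;
    immutable len = encode(buf, codePoint);
    return buf[0 .. len].dup;
}

void main()
{
    // 'A' (U+0041) is ASCII and fits in a single code unit.
    writefln("U+0041 -> %(%02X %)", cast(ubyte[]) utf8Units('A'));

    // U+00E9 is above 127 and takes two code units in UTF-8.
    writefln("U+00E9 -> %(%02X %)", cast(ubyte[]) utf8Units('\u00E9'));
}
```

Iterating with 'dchar' and letting write() do the encoding avoids emitting a single byte that is not valid UTF-8 on its own.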
Apr 08 2022
next sibling parent reply anonymous <anon ymous.org> writes:
On Friday, 8 April 2022 at 08:36:33 UTC, Ali Çehreli wrote:
 On 4/7/22 23:13, anonymous wrote:
 What's the proper way to output all characters in the extended character set?
It is not easy to answer because there are a number of concepts here that may make it trivial or complicated. The configuration of the output device matters. Is it set to Windows-1252 or are you using Unicode strings in Python?
I'm running Ubuntu and my default language is en_US.UTF-8.
 ```d
 void main()
 {
      foreach(char c; 0 .. 256)
'char' is wrong there because 'char' has a very special meaning in D: A UTF-8 code unit. Not a full Unicode character in many cases, especially in the "extended" set. I think your problem will be solved simply by replacing 'char' with 'dchar' there: foreach (dchar c; ...
I tried that. It didn't work.
 However, isControl() below won't work because isControl() only 
 knows about the ASCII table. It would miss the unprintable 
 characters above 127.

      {
         write(isControl(c) ? '.' : c);
      }
 }
 ```
Oh okay, that may have been the reason.
 This works:

 import std.stdio;

 bool isPrintableLatin1(dchar value) {
   if (value < 32) {
     return false;
   }

   if (value > 126 && value < 161) {
     return false;
   }

   return true;
 }

 void main() {
   foreach (dchar c; 0 .. 256) {
     write(isPrintableLatin1(c) ? c : '.');
   }
Nope... running this code, I get a bunch of digits as the output. The dots don't even show up. Maybe I'm drunk or lacking sleep.

Weird, I got this strange feeling that this problem stemmed from the compiler I'm using (GDC), so I installed DMD. Would you believe everything worked fine afterwards? That includes the original version, where I used isControl and 'dchar' instead of 'char'. I wonder why that is?

Thanks Ali.
Apr 08 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/8/22 02:51, anonymous wrote:

 Weird, I got this strange feeling that this problem stemmed from the
 compiler I'm using (GDC)
Some distributions install an old gdc. What version is yours?

Ali
Apr 08 2022
parent anonymous <anon ymous.com> writes:
On Friday, 8 April 2022 at 15:06:41 UTC, Ali Çehreli wrote:
 On 4/8/22 02:51, anonymous wrote:

 Weird, I got this strange feeling that this problem stemmed from the
 compiler I'm using (GDC)
Some distributions install an old gdc. What version is yours? Ali
Not sure actually. I just did "apt install gdc" and assumed the latest available. Let me check. Here's the version output (10.3.0?):

anon ymous:~/$ gdc --version
gdc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
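The GCC release number does not say which D language version the compiler implements. A minimal sketch that prints this from inside a program, using the compiler-provided special tokens __VENDOR__ and __VERSION__, could look like the following. It only reports what the compiler says about itself; it does not by itself explain the difference in output between GDC and DMD.

```d
import std.stdio : writefln;

void main()
{
    // __VENDOR__ is a string naming the compiler vendor and
    // __VERSION__ is an integer for the D language version it implements.
    writefln("vendor: %s, D version: %s", __VENDOR__, __VERSION__);
}
```

Building the same file with both compilers and comparing the two lines shows how far apart the language versions are.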
Apr 08 2022
prev sibling parent anonymous <anon ymous.org> writes:
On Friday, 8 April 2022 at 08:36:33 UTC, Ali Çehreli wrote:
[snip]
 However, isControl() below won't work because isControl() only 
 knows about the ASCII table. It would miss the unprintable 
 characters above 127.
[snip]
This actually works because I'm using std.uni.isControl() instead of std.ascii.isControl().
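A small sketch of that difference, assuming U+0085 (NEXT LINE, a C1 control character) as the sample value: std.ascii.isControl() only considers the ASCII range, while std.uni.isControl() follows the Unicode control category.

```d
import std.stdio : writeln;
static import std.ascii;
static import std.uni;

void main()
{
    // U+0085 is a control character, but it lies above the ASCII range.
    enum dchar nel = '\u0085';

    writeln(std.ascii.isControl(nel)); // false: not an ASCII control character
    writeln(std.uni.isControl(nel));   // true: Unicode control character
}
```

With only "import std.uni;" in scope, a bare isControl(c) call resolves to the Unicode version, which is why the original loop also masks the unprintable characters between 127 and 159.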
Apr 08 2022