
digitalmars.D.learn - UTF-8 strings and endianness

"denizzzka" <4denizzz gmail.com> writes:
Hi!

How do I convert a D string to big endian?
How do I convert big-endian data to a D string?
Oct 29 2012
"Adam D. Ruppe" <destructionator gmail.com> writes:
UTF-8 isn't affected by endianness.
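
To make that concrete, here is a minimal sketch (not from the original post) that dumps the bytes behind a D `string`. It assumes a reasonably recent Phobos, where `std.string.representation` is available; the sample text is made up for illustration.

```d
import std.stdio : writefln;
import std.string : representation;

void main()
{
    // "\u00E9" is 'é'; the escape avoids depending on the source file encoding.
    string s = "h\u00E9llo";

    // View the string's UTF-8 code units as raw bytes (no copy is made).
    immutable(ubyte)[] bytes = s.representation;

    // Every code unit is a single byte, so there is nothing to byte-swap:
    // this sequence is identical on little- and big-endian machines.
    writefln("%(%02x %)", bytes);

    // 'é' (U+00E9) is stored as the two bytes C3 A9, in that order.
    immutable ubyte[] expected = [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F];
    assert(bytes == expected);
}
```

So a D `string` can be written to a file or socket byte for byte, with no endianness conversion step.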
Oct 29 2012
"denizzzka" <4denizzz gmail.com> writes:
On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
> UTF-8 isn't affected by endianness.
Ok, thanks!
Oct 29 2012
"Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
> UTF-8 isn't affected by endianness.
If this is true, why does the BOM have marks for big and little endian?
http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
Oct 30 2012
"Tobias Pankrath" <tobias pankrath.net> writes:
On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>> UTF-8 isn't affected by endianness.
> If this is true, why does the BOM have marks for big and little endian?
> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
UTF8 has only one?
Oct 30 2012
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
10/30/2012 5:17 PM, Tobias Pankrath wrote:
> On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
>> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>>> UTF-8 isn't affected by endianness.
>> If this is true, why does the BOM have marks for big and little endian?
>> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
> UTF8 has only one?
Even Wiki knows the simple truth:
 "Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8"
-- Dmitry Olshansky
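
Since the UTF-8 BOM is only a signature, about the only thing a program does with it is detect and skip it. A minimal sketch of that, assuming the input is already a D `string`; the helper name `stripUtf8Bom` is hypothetical, not a Phobos function:

```d
/// Returns `text` without a leading UTF-8 signature (bytes EF BB BF), if any.
/// Hypothetical helper for illustration only.
string stripUtf8Bom(string text)
{
    if (text.length >= 3 && text[0] == 0xEF && text[1] == 0xBB && text[2] == 0xBF)
        return text[3 .. $];
    return text;
}

void main()
{
    // "\uFEFF" is U+FEFF, which UTF-8 always encodes as EF BB BF.
    assert(stripUtf8Bom("\uFEFFhello") == "hello");

    // Text without the signature is returned unchanged.
    assert(stripUtf8Bom("hello") == "hello");
}
```

There is only one possible byte sequence to check for, which is why UTF-8 has a single BOM while UTF-16 and UTF-32 each have two.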
Oct 30 2012
"Jesse Phillips" <Jessekphillips+D gmail.com> writes:
On Tuesday, 30 October 2012 at 17:17:36 UTC, Tobias Pankrath wrote:
> On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
>> On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
>>> UTF-8 isn't affected by endianness.
>> If this is true, why does the BOM have marks for big and little endian?
>> http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
> UTF8 has only one?
Oops, I got mixed up and thought he had just said "UTF isn't ..."
Oct 30 2012
Jordi Sayol <g.sayol yahoo.es> writes:
On 29/10/12 16:17, denizzzka wrote:
> Hi!
>
> How do I convert a D string to big endian?
> How do I convert big-endian data to a D string?

UTF-8 is always big endian.
-- Jordi Sayol
Oct 29 2012
"denizzzka" <4denizzz gmail.com> writes:
On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
> On 29/10/12 16:17, denizzzka wrote:
>> Hi!
>>
>> How do I convert a D string to big endian?
>> How do I convert big-endian data to a D string?
>
> UTF-8 is always big endian.
Yes. (I thought the problem was here, but it turned out to be somewhere else.)
Oct 29 2012
"denizzzka" <4denizzz gmail.com> writes:
On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
> On 29/10/12 16:17, denizzzka wrote:
>> Hi!
>>
>> How do I convert a D string to big endian?
>> How do I convert big-endian data to a D string?
>
> UTF-8 is always big endian.
Oops, what?

Q: Is the UTF-8 encoding scheme the same irrespective of whether the underlying processor is little endian or big endian?

A: Yes. Since UTF-8 is interpreted as a sequence of bytes, there is no endian problem as there is for encoding forms that use 16-bit or 32-bit code units. Where a BOM is used with UTF-8, it is only used as an encoding signature to distinguish UTF-8 from other encodings; it has nothing to do with byte order.
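
For contrast, byte order does matter once the text is re-encoded with 16-bit code units. A rough sketch of producing UTF-16BE bytes from a D `string`, assuming Phobos' `std.utf.toUTF16` and `std.bitmanip.nativeToBigEndian`; the function name `toUtf16BE` is made up for this example:

```d
import std.bitmanip : nativeToBigEndian;
import std.utf : toUTF16;

/// Encodes `s` as UTF-16 with big-endian byte order (no BOM is written).
/// Hypothetical helper for illustration only.
ubyte[] toUtf16BE(string s)
{
    ubyte[] result;
    foreach (wchar unit; s.toUTF16)
    {
        // Each 16-bit code unit is split into two bytes, high byte first.
        ubyte[2] be = nativeToBigEndian(cast(ushort) unit);
        result ~= be[];
    }
    return result;
}

void main()
{
    // 'h' is U+0068 and "\u00E9" ('é') is U+00E9: one UTF-16 code unit each.
    ubyte[] expected = [0x00, 0x68, 0x00, 0xE9];
    assert(toUtf16BE("h\u00E9") == expected);
}
```

A little-endian variant would emit the same two bytes per code unit in the opposite order, which is exactly the ambiguity the UTF-16 BOM exists to resolve.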
Oct 29 2012