
digitalmars.D.learn - Unexpected behaviour using remove on char[]

IGotD- <nise nise.com> writes:
I have a part in my code that uses remove

buffer.remove(tuple(0, size));

with

char[] buffer

What I discovered is that remove doesn't remove size bytes. It 
removes whole multibyte characters and counts each one as a single 
step. The result was of course that I got out-of-bounds exceptions 
as it went past the end.

When I changed char[] to ubyte[] my code started to work 
correctly again.

According to the documentation a char is an "unsigned 8 bit 
(UTF-8 code unit)", so you would expect to be working on 
bytes. I presume that under the hood there are range iterators at 
work and those operate on multibyte characters. However, iterating 
over single-byte characters is also an option, and you don't know 
which one happens underneath.

I'm a bit confused, when should I expect that the primitives work 
with single versus multibyte chars in array operations?
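
As a rough illustration of the question above, the following sketch contrasts the two views of the same char[]: the array view (code units, i.e. bytes) and the range view used by Phobos algorithms (decoded code points). The helper names codeUnitCount and codePointCount are hypothetical and exist only for this example.

```d
import std.range : ElementEncodingType, ElementType, walkLength;
import std.stdio : writeln;

// Hypothetical helper: array view, counts UTF-8 code units (bytes).
size_t codeUnitCount(const(char)[] s)
{
    return s.length;
}

// Hypothetical helper: range view, walks the string and so counts
// decoded code points rather than bytes.
size_t codePointCount(const(char)[] s)
{
    return s.walkLength;
}

// The range primitives report dchar as the element type of char[],
// even though the storage type is char.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementEncodingType!(char[]) == char));

void main()
{
    char[] buffer = "abcçd".dup;      // 'ç' occupies two code units
    writeln(codeUnitCount(buffer));   // 6
    writeln(codePointCount(buffer));  // 5
}
```

Indexing, slicing and .length use the first view, while generic range algorithms use the second, which would explain why a byte count passed to a range algorithm can overshoot.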
Oct 25 2020
Ali Çehreli <acehreli yahoo.com> writes:
On 10/25/20 12:29 PM, IGotD- wrote:

 What I discovered is that remove doesn't really remove size number of
 bytes but also removed entire multibyte characters and consider that one
 step. The result was of course that I got out of bounds exceptions as it
 went past the end.

This is the infamous "auto decode" at work.
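
A small sketch of what auto decoding means in practice, assuming a current Phobos: the range primitive front decodes a full code point from a char[], while std.utf.byCodeUnit opts out and iterates the raw code units. The function names firstViaRange and unitsViaByCodeUnit are made up for this example.

```d
import std.range : front, walkLength;
import std.utf : byCodeUnit;

// Hypothetical helper: front on a char[] decodes and yields a dchar.
dchar firstViaRange(const(char)[] s)
{
    return s.front;
}

// Hypothetical helper: byCodeUnit disables decoding, so walking the
// result counts code units instead of code points.
size_t unitsViaByCodeUnit(const(char)[] s)
{
    return s.byCodeUnit.walkLength;
}

void main()
{
    char[] s = "çd".dup;                  // bytes: 0xC3 0xA7 'd'

    assert(s[0] == 0xC3);                 // indexing sees one byte
    assert(firstViaRange(s) == 'ç');      // the range sees one character
    assert(s.walkLength == 2);            // two code points
    assert(unitsViaByCodeUnit(s) == 3);   // three code units
}
```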
 When I changed char[] to ubyte[] my code started to work correctly again.

Instead of changing the type, you can temporarily treat them as ubyte[]
with std.string.representation:

import std.stdio;
import std.string;

void print(R)(R range) {
  writefln!"%-(%s, %)"(range);
}

void main() {
  auto s = "abcçd".dup;
  print(s);
  print(s.representation);
}

Prints:

a, b, c, ç, d
97, 98, 99, 195, 167, 100

Ali
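
Applied to the original problem, the suggestion might look like the sketch below: the removal runs on the ubyte[] view, so the offsets are byte offsets, and the result is cast back to char[]. The function dropLeadingBytes is a hypothetical wrapper, not something from the thread, and it assumes the caller only cuts at a code point boundary so that the remaining bytes are still valid UTF-8.

```d
import std.algorithm.mutation : remove;
import std.string : representation;
import std.typecons : tuple;

// Hypothetical helper: removes the first `count` bytes of `buffer`.
// Assumes count <= buffer.length and that the cut does not split a
// multibyte character.
char[] dropLeadingBytes(char[] buffer, size_t count)
{
    // Same memory, viewed as raw bytes; no decoding takes place here.
    ubyte[] bytes = buffer.representation;

    // tuple(a, b) removes the elements at indices a .. b (b excluded).
    bytes = bytes.remove(tuple(size_t(0), count));

    // Back to char[]; the data is still the caller's UTF-8 text.
    return cast(char[]) bytes;
}

void main()
{
    char[] buffer = "çabc".dup;            // 5 bytes, 'ç' takes two
    buffer = dropLeadingBytes(buffer, 2);  // drop exactly the two bytes of 'ç'
    assert(buffer == "abc");
}
```

remove shifts the remaining elements down in place and returns the shortened slice, so the return value has to be kept, as main does here.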
Oct 25 2020