digitalmars.D - String convention

reply Niklas Ulvinge <Niklas_member pathlink.com> writes:
I couldn't find any info about it, so I'm asking here...

I just looked at D and it sounds rather interesting.

Now to my Q:
Strings in D start with some data that defines the length of the string.
Why did they decide to use this approach?
What is this 'data' at the beginning of the string?

This has a limitation, strings can't be longer than 'data' allows.
Is there a way around this?


An idea I got when I wrote a dynamic array (in C) was to use s[-1] as the size
for array s (and s[-2] for capacity, but that isn't necessary here...).

Couldn't this be used with strings?
Then this would work:
string s = "IDK\0";
printf("%s",s);

Niklas Ulvinge
aka IDK wishes
everyone happy
programming!!!
Jul 01 2006
next sibling parent reply Kirk McDonald <kirklin.mcdonald gmail.com> writes:
Niklas Ulvinge wrote:
 I couldn't find any info about it, so I'm asking here...
 
 I just looked at D and it sounds rather interesting.
 
 Now to my Q:
 Strings in D start with some data that defines the length of the string.
 Why did they decide to use this approach?
 What is this 'data' at the beginning of the string?
 
This is an implementation detail, and shouldn't matter to your code.
 This has a limitation, strings can't be longer than 'data' allows.
 Is there a way around this?
 
I think you are somewhat confused. Strings in D are dynamic arrays of type char. They may be of any length, so long as you have enough RAM. http://www.digitalmars.com/d/arrays.html
 
 An idea I got when I wrote a dynamic array (in C) was to use s[-1] as the size
 for array s (and s[-2] for capacity, but that isn't necessary here...).
 
 Couldn't this be used with strings?
 Then this would work:
 string s = "IDK\0";
The D syntax is: char[] s = "IDK"; The \0 is not needed as strings in D are not null-terminated. The length of the string may be retrieved with "s.length".
 printf("%s",s);
 
 Niklas Ulvinge
 aka IDK wishes
 everyone happy
 programming!!!
-- Kirk McDonald Pyd: Wrapping Python with D http://dsource.org/projects/pyd/wiki
Jul 01 2006
next sibling parent reply Niklas Ulvinge <Niklas_member pathlink.com> writes:
Thanks for clearing things up a bit.

First, if the data doesn't have a terminator, then it can't have a big (infinite) size. This is because s.length needs to be a variable, and that variable has to have a size, which makes it impossible to make arbitrarily big arrays.

I only need to rephrase my questions...
What type is s.length (or, for that matter, any dynamic array's size)?
Or are dynamic arrays implemented in a different way?

Niklas Ulvinge
aka IDK wishes
everyone happy
programming!!!
Jul 01 2006
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
See also my other post, made just before I saw this one. This is a summary.

Niklas Ulvinge wrote:
 In article <e86l86$1364$1 digitaldaemon.com>, Kirk McDonald says...
 Thanks for clearing things up a bit.
 
 First, if the data doesn't have a terminator, then it can't have a big (infinite)
 size.
Nor can it *with* a terminator. Your computer's memory has a finite size, deal with it :).
 This is because s.length needs to be a variable. And that variable has got to
 have a size, which makes it impossible to make big arrays.
 
 I only need to rewrite my Q's...
 What type is s.length(or for that matter any dynamic array's size)?
It should be a size_t, and

  size_t.max >= (addressable memory).sizeof - 1

So the theoretical maximum string length is as large as it could be with a terminator, assuming your OS lets you use that much, which it typically won't.
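The claim about size_t can be checked at compile time. The following is only a minimal sketch, assuming a current D2 compiler (where string literals are immutable, hence the .dup); the variable name is just for illustration.

```d
import std.stdio : writeln;

void main()
{
    char[] s = "IDK".dup;   // .dup because D2 string literals are immutable

    // The length property of a dynamic array has type size_t.
    static assert(is(typeof(s.length) == size_t));

    // size_t is as wide as a pointer, so it can count every addressable byte.
    static assert(size_t.sizeof == (void*).sizeof);

    writeln("s.length   = ", s.length);
    writeln("size_t.max = ", size_t.max);   // platform dependent
}
```

On a 32-bit target size_t.max is 2^32 - 1, on a 64-bit target 2^64 - 1, so the length field never becomes the limiting factor before memory does.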
Jul 01 2006
prev sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Kirk McDonald wrote:
 Niklas Ulvinge wrote:
 I just looked at D and it sounds rather interesting.
Always good to hear.
 Now to my Q:
 Strings in D start with some data that defines the length of the 
 string.
Actually, the /reference/ to the (dynamic) string begins with that data (i.e. what in C would be the pointer to it is in D twice as long, the first half containing the length). With static strings, the length is encoded in the type (i.e. a char[3] has length 3) and doesn't need to be stored separately.
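The difference between the two kinds of array shows up in their sizes. This is a small sketch, assuming a current D2 compiler; it only checks the sizes and does not rely on which half of the reference holds the length.

```d
void main()
{
    // A dynamic array reference is a (length, pointer) pair,
    // so it is twice the size of a bare C pointer.
    static assert((char[]).sizeof == size_t.sizeof + (char*).sizeof);
    static assert((char[]).sizeof == 2 * (char*).sizeof);

    // A static array carries its length in the type:
    // the variable is just the three elements, nothing more.
    char[3] fixed;
    static assert(fixed.sizeof == 3);
    static assert(fixed.length == 3);
}
```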
 Why did they decide to use this approach?
Having constant-time access to the length of a string makes a lot of operations more efficient. Having it be separate from the string data enables you to do cool things like efficient string slicing. (see http://www.digitalmars.com/d/arrays.html#slicing ) Oh, and everything I'm saying about strings goes for *any* array type.
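To illustrate why keeping the length outside the data makes slicing cheap, here is a short sketch (assuming a current D2 compiler; the strings are made up for the example). A slice is just a new length and pointer into the same memory, so nothing is copied.

```d
void main()
{
    char[] s = "hello world".dup;   // mutable copy of the literal
    char[] w = s[6 .. 11];          // slice: no characters are copied

    assert(w == "world");
    assert(w.length == 5);
    assert(w.ptr == s.ptr + 6);     // the slice points into s's memory

    w[0] = 'W';                     // writing through the slice...
    assert(s == "hello World");     // ...changes the original array
}
```

With a length prefix stored in front of the characters (the s[-1] idea), a substring in the middle of a string could not be described without copying it somewhere else first.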
 What is this 'data' at the beginning of the string?
An integer value: the length of the string.
 This is an implementation detail, and shouldn't matter to your code.
 
 This has a limitation, strings can't be longer than 'data' allows.
'data' is a size_t. The 'limit' you refer to is therefore /at least/ (the maximum size of addressable memory) - 1 (more on machines with weird pointer types or for array of elements with size > 1).
 Is there a way around this?
Buy a computer that can address more memory at the same time (like a 64-bit one, if you're coming from a 32-bit machine) and make sure your compiler generates appropriate code (i.e. use a 64-bit-aware compiler for a 64-bit platform).
 Couldn't this be used with strings?
 Then this would work:
 string s = "IDK\0";
The D syntax is: char[] s = "IDK"; The \0 is not needed as strings in D are not null-terminated. The length of the string may be retrieved with "s.length".
Well, he wants to use it with printf("%s", ...), so then adding a null terminator at the end would probably be a good idea:
 printf("%s",s);
Though, of course, writef is a better alternative.
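For anyone who does need to hand a D string to printf, the two usual options can be sketched as follows. This assumes a current D2 compiler with the standard library; toStringz is the Phobos helper from std.string, and the "%.*s" form is plain C printf.

```d
import core.stdc.stdio : printf;
import std.string : toStringz;

void main()
{
    char[] s = "IDK".dup;

    // Option 1: let toStringz produce a NUL-terminated C string.
    printf("%s\n", toStringz(s));

    // Option 2: pass the length explicitly; no terminator is needed.
    printf("%.*s\n", cast(int) s.length, s.ptr);

    // The pointer returned by toStringz is terminated right after the text.
    assert(toStringz(s)[s.length] == '\0');
}
```

Passing s itself to "%s" is the thing to avoid, since printf expects a bare pointer to NUL-terminated data and a D array is a (length, pointer) pair.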
Jul 01 2006
prev sibling next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Niklas Ulvinge" <Niklas_member pathlink.com> wrote in message 
news:e86km2$12ar$1 digitaldaemon.com...

 This has a limitation, strings can't be longer than 'data' allows.
 Is there a way around this?
Keep in mind that the "length" member of an array is the word size of the machine, so that the longest array possible would take up the entire memory space :S
 printf("%s",s);
Never ever ever ever use printf() in D. Please. The spec uses it profusely, but that's because a lot of the examples were written before std.stdio.writefln() was written. Use that instead. You don't even have to have a format string with writefln, i.e.

  import std.stdio;
  ...
  char[] s = "hi";
  writefln(s);
  writefln(4, ", ", 5);
  writefln("hello");
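A note for readers on newer compilers: the snippet in this post is written for the D of 2006. The sketch below shows roughly the same thing under the assumption of a current D2 compiler, where writefln takes a format string as its first argument and writeln is the function that prints its arguments one after another.

```d
import std.conv : text;
import std.stdio : writefln, writeln;

void main()
{
    string s = "hi";

    writeln(s);                     // prints: hi
    writeln(4, ", ", 5);            // prints: 4, 5
    writefln("%s and %s", s, 42);   // prints: hi and 42

    // text() builds the same string that writeln would print.
    assert(text(4, ", ", 5) == "4, 5");
}
```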
Jul 01 2006
prev sibling next sibling parent reply "Derek Parnell" <derek psych.ward> writes:
On Sun, 02 Jul 2006 06:07:30 +1000, Niklas Ulvinge <Niklas_member pathlink.com> wrote:

 I couldn't find any info about it, so I'm asking here...

 I just looked at D and it sounds rather interesting.

 Now to my Q:
 Strings in D start with some data that defines the length of the
 string.
No, that is a misunderstanding. Strings in D are variable-length arrays of characters, and variable-length arrays consist of two data items. The first is the array reference, a pseudo-struct with two members: the length (a 32-bit uint) and a pointer to the first array element (void *). The second is the array data itself, a contiguous block of RAM that will hold at least the number of elements specified in the 'length' member.

But you as a coder don't need to worry about this because the compiler handles all the manipulation for you.
 Why did they decide to use this approach?
Because it makes for very fast and flexible dynamic arrays. Slices become easy to implement and fast.
 What is this 'data' at the beginning of the string?
There is no data at the beginning of the string data. There is a separate array reference though.
 This has a limitation, strings can't be longer than 'data' allows.
Currently, UTF-8 strings are limited to 4 gigabytes. This might change on 64-bit architectures. But if you are dealing with strings that big you probably need to rethink your algorithms anyhow ;-)
 Is there a way around this?
Solve the problem when you get to it. Are you actually running into limitations already?
 An idea I got when I wrote a dynamic array (in C) was to use s[-1] as
 the size
 for array s (and s[-2] for capacity, but that isn't necessary here...).
That's right, it isn't.
 Couldn't this be used with strings?
 Then this would work:
 string s = "IDK\0";
 printf("%s",s);
Do not use the C function 'printf'. Use the D function 'writef' and your formatting issues will disappear.

  alias string char[];
  string s = "IDK";
  writef("%s", s);

-- 
Derek Parnell
Melbourne, Australia
Jul 01 2006
next sibling parent "Derek Parnell" <derek psych.ward> writes:
On Sun, 02 Jul 2006 07:13:30 +1000, Derek Parnell <derek psych.ward> wrote:

Oops! I just woke up and haven't had my morning coffee yet ;)

The alias should be

  alias char[] string;


-- 
Derek Parnell
Melbourne, Australia
Jul 01 2006
prev sibling parent reply Niklas Ulvinge <Niklas_member pathlink.com> writes:
Thanks for all replies, now I understand most of what I wanted to know.
(although the Q about the internal structure of dynamic arrays still remains...)


In article <op.tb03wsb06b8z09 ginger.vic.bigpond.net.au>, Derek Parnell says...
 But you as a coder don't need to worry about this because the compiler
 handles all the manipulation for you.
I think like the 'real programmers' ;) : "Real programmers can write assembly language in any language". This is very hard to do in D, but really easy in C.

Take the foreach statement as an example. In D, the compiler handles the implementation. I want to know how it is implemented.

In languages where "a" + "b" = "ab" works, there could be programmers who don't see that concatenating is much more complex than adding a couple of numbers. In D, this is a little better, because it's hard to find the concatenation character (I don't have it now, because of an odd bug in Firefox). In C/C++ this is better, because it was a function, which indicated how hard it was to do.

Some programmers may, instead of using
  writef(a,b,c)
concatenate them, which would be very bad.
Jul 01 2006
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Niklas Ulvinge" <Niklas_member pathlink.com> wrote in message 
news:e86qp9$1e3t$1 digitaldaemon.com...
 Thanks for all replies, now I understand most of what I wanted to know.
 (although the Q about the internal structure of dynamic arrays still 
 remains...)
OK. This is the definition for a char[]. This is the same for ALL array types in D, not just strings; just replace "char" with any other type, and it's the same.

  import core.stdc.stdlib : malloc, realloc;  // C allocator used by this model

  struct CharArray
  {
      private size_t _length = 0;
      private char* _ptr = null;

      // Setter: allocates on first use, reallocates only when growing.
      public size_t length(size_t l)
      {
          if(_ptr is null && l > 0)
              _ptr = cast(char*) malloc(char.sizeof * l);
          else
          {
              if(l > _length)
                  _ptr = cast(char*) realloc(_ptr, char.sizeof * l);
          }

          _length = l;
          return _length;
      }

      // Getters
      public size_t length() { return _length; }
      public char* ptr() { return _ptr; }
  }

There are other methods, such as .dup and .sort, but I won't list them.

When you write

  char[] s;

You get

  CharArray s;

It has length 0 and pointer null. So you set its length:

  s.length = 5;

This actually means "s.length(5)". This is because of the property syntax in D. So it allocates enough space for 5 characters.

Is that satisfactory?
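The behaviour described here can be observed on a real built-in array. The following is a minimal sketch, assuming a current D2 compiler; it only uses the built-in .length and .ptr properties and makes no claim about how the runtime allocates the memory.

```d
void main()
{
    char[] s;                        // default-initialised dynamic array
    assert(s.length == 0);
    assert(s.ptr is null);

    s.length = 5;                    // grow: storage for 5 chars is allocated
    assert(s.length == 5);
    assert(s.ptr !is null);
    assert(s[0] == char.init);       // new elements get char's default (0xFF)

    s.length = 2;                    // shrink: only the length field changes
    assert(s.length == 2);
}
```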
 The foreach statemente as an example.
 In D, the compiler handles the implementation.
 I want to know how it is implemented.
You worry too much about things that you shouldn't care about. But if you really must know, foreach is implemented as a nested function for the actual foreach body. You can see kind of how it works by looking up how to overload opApply.
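The opApply route mentioned here can be sketched like this. Countdown and sumCountdown are hypothetical names invented for the illustration, and the code assumes a current D2 compiler. The point is that the compiler turns the body of the foreach into a delegate and passes it to opApply.

```d
import std.stdio : writeln;

// Hypothetical type, made up for this example.
struct Countdown
{
    int start;

    // foreach hands the loop body to opApply as the delegate dg.
    int opApply(scope int delegate(ref int) dg)
    {
        for (int i = start; i > 0; i--)
        {
            int n = i;                  // copy, so the body cannot disturb the counter
            int result = dg(n);         // run one iteration of the loop body
            if (result) return result;  // non-zero: the body did break/return
        }
        return 0;                       // loop ran to completion
    }
}

int sumCountdown(int start)
{
    int sum = 0;
    foreach (n; Countdown(start))       // becomes a call to opApply
        sum += n;
    return sum;
}

void main()
{
    writeln(sumCountdown(3));           // 3 + 2 + 1
    assert(sumCountdown(3) == 6);
}
```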
 In languages where "a" + "b" = "ab" works, there could be programmers who
 don't see that concatenating is much more complex than adding a couple of
 numbers.
 In D, this is a little better, because it's hard to find the concatenation
 character (I don't have it now, because of an odd bug in Firefox).
 In C/C++ this is better, because it was a function, which indicated how hard it
 was to do.
Not for std::string; that used + for string concatenation. Sure hides the implementation details.
 Some programmers may, instead of using:
 writef(a,b,c)
 concatenate them, which would be very bad.
Unless you really _need_ to concatenate strings, such as to store in a new string.
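The cost being discussed can be made visible with a short sketch (assuming a current D2 compiler, where writeln plays the role of the writef(a,b,c) call from the thread). Concatenating with ~ builds a new array, while printing the pieces one after another needs no intermediate string.

```d
import std.stdio : writeln;

void main()
{
    string a = "foo";
    string b = "bar";

    // Concatenation creates a new array and copies both operands into it.
    string c = a ~ b;
    assert(c == "foobar");
    assert(c.ptr !is a.ptr);    // c lives in freshly allocated memory
    assert(a == "foo");         // the operands are left untouched

    writeln(a, b);              // prints the pieces without joining them first
    writeln(c);                 // worth it only if the joined string is kept
}
```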
Jul 01 2006
prev sibling parent Don Clugston <dac nospam.com.au> writes:
Niklas Ulvinge wrote:
 Thanks for all replies, now I understand most of what I wanted to know.
 (although the Q about the internal structure of dynamic arrays still remains...)
 
 In article <op.tb03wsb06b8z09 ginger.vic.bigpond.net.au>, Derek Parnell says...
 But you as a coder don't need to worry about this because the compiler
 handles all the manipulation for you.
 I think like the 'real programmers' ;) : "Real programmers can write assembly language in any language". This is very hard to do in D, but really easy in C.
 Take the foreach statement as an example. In D, the compiler handles the implementation. I want to know how it is implemented.
 In languages where "a" + "b" = "ab" works, there could be programmers who don't see that concatenating is much more complex than adding a couple of numbers. In D, this is a little better, because it's hard to find the concatenation character (I don't have it now, because of an odd bug in Firefox). In C/C++ this is better, because it was a function, which indicated how hard it was to do.
A crucial difference between C++ and D is that the compiler understands the concept of concatenation. This means that you can concatenate at compile time.

  const char [] str1 = "a";
  const char [] str2 = str1 ~ "b";

creates str2 as a compile-time constant. In this case the concatenation has zero run-time cost. You can use it anywhere that a compile-time string literal can be used. You couldn't do this with function calls.
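Compile-time concatenation can be verified with static assert. This is a sketch assuming a current D2 compiler, where enum is the usual way to declare a manifest constant; the names follow the post, and buf is added only to show the result being used where a compile-time constant is required.

```d
enum str1 = "a";
enum str2 = str1 ~ "b";         // concatenated by the compiler

// Checked during compilation; no code runs for this.
static assert(str2 == "ab");

// The result is usable wherever a constant is required,
// for example as the size of a static array.
char[str2.length] buf;

void main()
{
    static assert(buf.length == 2);
}
```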
 Some programmers may, instead of using:
 writef(a,b,c)
 concatenate them, which would be very bad.
Jul 03 2006
prev sibling parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
D strings are dynamic arrays.
The "length" does not limit anything. You can change it at any time.

char[] s = "....."; //some string

.
.
.

s.length = s.length + XYZ; //change the length of the array to whatever you like
s = "exxxaa"; //or like this

I can't think of any limitation.

P.S. Don't use printf with D. Use writef instead, and don't forget to 
import std.stdio

Jul 01 2006