D Programming Language 2.0

Last update Wed Apr 11 21:24:35 2012

std.string

String handling functions. Objects of types string, wstring, and dstring are value types and cannot be mutated element-by-element. To mutate characters while building a string, use char[], wchar[], or dchar[]. The *string types are preferable because they don't exhibit undesired aliasing, thus making code more robust.

License:
Boost License 1.0.

Authors:
Walter Bright, Andrei Alexandrescu, and Jonathan M Davis

Source:
std/string.d

IMPORTANT NOTE: Beginning with version 2.052, the following symbols have been generalized beyond strings and moved to different modules. This action was prompted by the fact that generalized routines belong better in other places, although they still work for strings as expected. In order to use moved symbols, you will need to import the respective modules as follows:

Symbol Comment
cmp Moved to std.algorithm.cmp and generalized to work for all input ranges and accept a custom predicate.
count Moved to std.algorithm.count and generalized to accept a custom predicate.
ByCodeUnit Removed.
insert Use std.array.insertInPlace instead.
join Use std.array.join instead.
repeat Use std.array.replicate instead.
replace Use std.array.replace instead.
replaceSlice Use std.array.replace instead.
split Use std.array.split instead.

class StringException: object.Exception;
Exception thrown on errors in std.string functions.

this(string msg, string file = __FILE__, uint line = cast(uint)__LINE__, Throwable next = null);
Parameters:
string msg The message for the exception.
string file The file where the exception occurred.
uint line The line number where the exception occurred.
Throwable next The previous exception in the chain of exceptions, if any.

deprecated immutable char[16u] hexdigits;
Deprecated. It will be removed in August 2012. Please use std.ascii.hexDigits instead.

0..9A..F

deprecated immutable string digits;
Deprecated. It will be removed in August 2012. Please use std.ascii.digits instead.

0..9

deprecated immutable char[8u] octdigits;
Deprecated. It will be removed in August 2012. Please use std.ascii.octDigits instead.

0..7

deprecated immutable char[26u] lowercase;
Deprecated. It will be removed in August 2012. Please use std.ascii.lowercase instead.

a..z

deprecated immutable char[52u] letters;
Deprecated. It will be removed in August 2012. Please use std.ascii.letters instead.

A..Za..z

deprecated immutable char[26u] uppercase;
Deprecated. It will be removed in August 2012. Please use std.ascii.uppercase instead.

A..Z

deprecated alias whitespace;
Deprecated. It will be removed in August 2012. Please use std.ascii.whitespace instead.

ASCII whitespace.

deprecated dchar LS;
Deprecated. It will be removed in August 2012. Please use std.uni.lineSep instead.

UTF line separator.

deprecated dchar PS;
Deprecated. It will be removed in August 2012. Please use std.uni.paraSep instead.

UTF paragraph separator.

deprecated alias newline;
Deprecated. It will be removed in August 2012. Please use std.ascii.newline instead.

Newline sequence for this system.

deprecated bool iswhite(dchar c);
Deprecated. It will be removed in August 2012. Please use std.ascii.isWhite or std.uni.isWhite instead.

Returns true if c is ASCII whitespace or unicode LS or PS.

int icmp(alias pred = "a < b", S1, S2)(S1 s1, S2 s2);
Compares two ranges of characters lexicographically. The comparison is case insensitive. Use std.algorithm.cmp for a case sensitive comparison. icmp works like std.algorithm.cmp except that it converts characters to lowercase prior to applying pred. Technically, icmp(r1, r2) is equivalent to cmp!"std.uni.toLower(a) < std.uni.toLower(b)"(r1, r2).

Returns:
< 0 s1 < s2
= 0 s1 == s2
> 0 s1 > s2
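
A minimal sketch of the three possible outcomes, using the default predicate. The sample strings are illustrative only.

```d
import std.string : icmp;

void main()
{
    // Equal once case is ignored.
    assert(icmp("Hello", "hELLO") == 0);
    // Negative: "apple" orders before "Banana" when both are lowercased.
    assert(icmp("apple", "Banana") < 0);
    // Positive: "Zebra" orders after "apple" when both are lowercased.
    assert(icmp("Zebra", "apple") > 0);
}
```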

pure nothrow immutable(char)* toStringz(const(char)[] s);
pure nothrow immutable(char)* toStringz(string s);
Returns a C-style 0-terminated string equivalent to s. s must not contain embedded 0's, as any C function will treat the first 0 that it sees as the end of the string. If s is null or empty, then a string containing only '\0' is returned.

Important Note: When passing a char* to a C function, and the C function keeps it around for any reason, make sure that you keep a reference to it in your D code. Otherwise, it may go away during a garbage collection cycle and cause a nasty bug when the C code tries to use it.
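
As a rough illustration, the sketch below passes the result to the C function strlen (taken from core.stdc.string, an assumption about which C binding you use). The pointer is held in a D variable for as long as the C side needs it, in line with the note above.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    string s = "hello";

    // Hold the pointer in a D variable so a reference exists while C code uses it.
    auto cstr = toStringz(s);

    // The C side sees five characters followed by the terminating 0.
    assert(strlen(cstr) == 5);
    assert(cstr[5] == '\0');

    // An empty input yields a string containing only '\0'.
    assert(*toStringz("") == '\0');
}
```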

enum CaseSensitive;
Flag indicating whether a search is case-sensitive.

pure sizediff_t indexOf(Char)(in Char[] s, dchar c, CaseSensitive cs = CaseSensitive.yes);
Returns the index of the first occurrence of c in s. If c is not found, then -1 is returned.

cs indicates whether the comparisons are case sensitive.

sizediff_t indexOf(Char1, Char2)(const(Char1)[] s, const(Char2)[] sub, CaseSensitive cs = CaseSensitive.yes);
Returns the index of the first occurrence of sub in s. If sub is not found, then -1 is returned.

cs indicates whether the comparisons are case sensitive.

sizediff_t lastIndexOf(Char)(const(Char)[] s, dchar c, CaseSensitive cs = CaseSensitive.yes);
Returns the index of the last occurrence of c in s. If c is not found, then -1 is returned.

cs indicates whether the comparisons are case sensitive.

sizediff_t lastIndexOf(Char1, Char2)(const(Char1)[] s, const(Char2)[] sub, CaseSensitive cs = CaseSensitive.yes);
Returns the index of the last occurrence of sub in s. If sub is not found, then -1 is returned.

cs indicates whether the comparisons are case sensitive.
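
The following sketch exercises the character and substring overloads of indexOf and lastIndexOf, including a case-insensitive search. The sample string is illustrative only.

```d
import std.string : indexOf, lastIndexOf, CaseSensitive;

void main()
{
    string s = "Hello World";

    // Character searches: first and last 'o'.
    assert(indexOf(s, 'o') == 4);
    assert(lastIndexOf(s, 'o') == 7);

    // Substring searches are case sensitive by default.
    assert(indexOf(s, "World") == 6);
    assert(indexOf(s, "world") == -1);

    // Passing CaseSensitive.no ignores case.
    assert(indexOf(s, "world", CaseSensitive.no) == 6);
    assert(lastIndexOf(s, "L", CaseSensitive.no) == 9);
}
```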

pure nothrow auto representation(Char)(Char[] s);
Returns the representation type of a string, which is the same type as the string except the character type is replaced by ubyte, ushort, or uint depending on the character width.

Example:
string s = "hello";
static assert(is(typeof(representation(s)) == immutable(ubyte)[]));

S tolower(S)(S s);
Deprecated. It will be removed in August 2012. Please use toLower instead.

Convert string s[] to lower case.

pure @trusted S toLower(S)(S s);
Returns a string which is identical to s except that all of its characters are lowercase (in unicode, not just ASCII). If s does not have any uppercase characters, then s is returned.

void tolowerInPlace(C)(ref C[] s);
Deprecated. It will be removed in August 2012. Please use toLowerInPlace instead.

Converts s to lowercase in place.

void toLowerInPlace(C)(ref C[] s);
Converts s to lowercase (in unicode, not just ASCII) in place. If s does not have any uppercase characters, then s is unaltered.

S toupper(S)(S s);
Deprecated. It will be removed in August 2012. Please use toUpper instead.

Convert string s[] to upper case.

pure @trusted S toUpper(S)(S s);
Returns a string which is identical to s except that all of its characters are uppercase (in unicode, not just ASCII). If s does not have any lowercase characters, then s is returned.

void toupperInPlace(C)(ref C[] s);
Deprecated. It will be removed in August 2012. Please use toUpperInPlace instead.

Converts s to uppercase in place.

void toUpperInPlace(C)(ref C[] s);
Converts s to uppercase (in unicode, not just ASCII) in place. If s does not have any lowercase characters, then s is unaltered.

pure @trusted S capitalize(S)(S s);
Capitalizes the first character of s and converts the rest of s to lowercase.
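
A short sketch of the case conversion functions above (toLower, toUpper, toUpperInPlace, and capitalize). The inputs are illustrative only; the in-place variant needs a mutable array, so a copy is made with .dup.

```d
import std.string : capitalize, toLower, toUpper, toUpperInPlace;

void main()
{
    // Copying conversions return a new string when a change is needed.
    assert(toLower("Hello WORLD") == "hello world");
    assert(toUpper("Hello world") == "HELLO WORLD");

    // Conversion is Unicode aware, not limited to ASCII.
    assert(toLower("ÉCOLE") == "école");

    // capitalize uppercases the first character and lowercases the rest.
    assert(capitalize("hELLO wORLD") == "Hello world");

    // The in-place variant mutates a char[] passed by reference.
    char[] buf = "MiXeD".dup;
    toUpperInPlace(buf);
    assert(buf == "MIXED");
}
```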

S capwords(S)(S s);
Deprecated. It will be removed in August 2012.

Capitalize all words in string s[]. Remove leading and trailing whitespace. Replace all sequences of whitespace with a single space.

S repeat(S)(S s, size_t n);
Deprecated. It will be removed in March 2012. Please use std.array.replicate instead.

Repeat s for n times.

S[] splitlines(S)(S s);
Deprecated. It will be removed in August 2012. Please use splitLines instead.

Split s[] into an array of lines, using CR, LF, or CR-LF as the delimiter. The delimiter is not included in the line.

enum KeepTerminator;
S[] splitLines(S)(S s, KeepTerminator keepTerm = KeepTerminator.no);
Split s into an array of lines using '\r', '\n', "\r\n", std.uni.lineSep, and std.uni.paraSep as delimiters. If keepTerm is set to KeepTerminator.yes, then the delimiter is included in the strings returned.
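
A minimal sketch showing the effect of keepTerm on a string that mixes the three ASCII line terminators. The input is illustrative only.

```d
import std.string : splitLines, KeepTerminator;

void main()
{
    string text = "one\ntwo\r\nthree\rfour";

    // By default the terminators are dropped.
    assert(splitLines(text) == ["one", "two", "three", "four"]);

    // With KeepTerminator.yes each line keeps its own terminator.
    assert(splitLines(text, KeepTerminator.yes)
        == ["one\n", "two\r\n", "three\r", "four"]);
}
```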

String stripl(String)(String s);
Deprecated. It will be removed in August 2012. Please use stripLeft instead.

Strips leading whitespace.

pure @safe S stripLeft(S)(S s);
Strips leading whitespace.

String stripr(String)(String s);
Deprecated. It will be removed in August 2012. Please use stripRight instead.

Strips trailing whitespace.

S stripRight(S)(S s);
Strips trailing whitespace.

S strip(S)(S s);
Strips both leading and trailing whitespace.
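
The three stripping functions can be compared side by side, as in this small sketch (inputs are illustrative only):

```d
import std.string : strip, stripLeft, stripRight;

void main()
{
    // Only leading whitespace is removed.
    assert(stripLeft("  \t hello  ") == "hello  ");

    // Only trailing whitespace is removed.
    assert(stripRight("  hello \n") == "  hello");

    // Both ends are trimmed; interior whitespace is kept.
    assert(strip(" \r\n hello world \t") == "hello world");
}
```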

S chomp(S)(S s);
S chomp(S, C)(S s, const(C)[] delimiter);
Returns s sans the trailing delimiter, if any. If no delimiter is given, then a trailing '\r', '\n', "\r\n", std.uni.lineSep, or std.uni.paraSep is removed.

C1[] chompPrefix(C1, C2)(C1[] longer, C2[] shorter);
If longer.startsWith(shorter), returns longer[shorter.length .. $]. Otherwise, returns longer.

S chop(S)(S s);
Returns s sans its last character, if there is one. If s ends in "\r\n", then both are removed.
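
A brief sketch contrasting chomp, chompPrefix, and chop. The file name and strings are illustrative only.

```d
import std.string : chomp, chompPrefix, chop;

void main()
{
    // chomp without a delimiter removes a trailing line terminator.
    assert(chomp("hello\n") == "hello");
    assert(chomp("hello\r\n") == "hello");
    assert(chomp("hello") == "hello");

    // chomp with a delimiter removes it only if s ends with it.
    assert(chomp("hello.txt", ".txt") == "hello");
    assert(chomp("hello.txt", ".d") == "hello.txt");

    // chompPrefix works on the front of the string instead.
    assert(chompPrefix("hello world", "hello ") == "world");
    assert(chompPrefix("hello world", "bye") == "hello world");

    // chop always removes the last character; "\r\n" counts as one unit.
    assert(chop("hello") == "hell");
    assert(chop("hello\r\n") == "hello");
}
```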

S ljustify(S)(S s, size_t width);
Deprecated. It will be removed in August 2012. Please use leftJustify instead.

Left justify string s[] in field width chars wide.

@trusted S leftJustify(S)(S s, size_t width, dchar fillChar = ' ');
Left justify s in a field width characters wide. fillChar is the character that will be used to fill up the space in the field that s doesn't fill.

S rjustify(S)(S s, size_t width);
Deprecated. It will be removed in August 2012. Please use rightJustify instead.

Right justify string s[] in field width chars wide.

@trusted S rightJustify(S)(S s, size_t width, dchar fillChar = ' ');
Right justify s in a field width characters wide. fillChar is the character that will be used to fill up the space in the field that s doesn't fill.

@trusted S center(S)(S s, size_t width, dchar fillChar = ' ');
Center s in a field width characters wide. fillChar is the character that will be used to fill up the space in the field that s doesn't fill.
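
The padding functions can be sketched as follows. The field widths and fill characters are illustrative, and the centered examples use widths that leave an even amount of padding so the split between the two sides is unambiguous.

```d
import std.string : center, leftJustify, rightJustify;

void main()
{
    // Pad on the right or on the left with the default fill character ' '.
    assert(leftJustify("hello", 8) == "hello   ");
    assert(rightJustify("hello", 8) == "   hello");

    // Pad on both sides.
    assert(center("hello", 9) == "  hello  ");
    assert(center("hello", 7, '*') == "*hello*");

    // A custom fill character, e.g. zero padding for numbers.
    assert(rightJustify("42", 5, '0') == "00042");

    // A string that is already wider than the field is returned unchanged.
    assert(leftJustify("hello", 3) == "hello");
}
```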

S zfill(S)(S s, int width);
Deprecated. It will be removed in August 2012. Please use rightJustify with a fill character of '0' instead.

Same as rjustify(), but fill with '0's.

S insert(S)(S s, size_t index, S sub);
Deprecated. It will be removed in March 2012. Please use std.array.insertInPlace instead.

Insert sub[] into s[] at location index.

S expandtabs(S)(S str, size_t tabsize = 8);
Deprecated. It will be removed in August 2012. Please use detab instead.

Replace tabs with the appropriate number of spaces. tabsize is the distance between tab stops.

pure @trusted S detab(S)(S s, size_t tabSize = 8);
Replace each tab character in s with the number of spaces necessary to align the following character at the next tab stop where tabSize is the distance between tab stops.

pure @trusted S entab(S)(S s, size_t tabSize = 8);
Replaces spaces in s with the optimal number of tabs. All spaces and tabs at the end of a line are removed.

Parameters:
s String to convert.
tabSize Tab columns are tabSize spaces apart.
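
A small sketch of detab and entab working in opposite directions. The inputs are illustrative only; note that entab also drops trailing whitespace, as described above.

```d
import std.string : detab, entab;

void main()
{
    // 'a' occupies column 0, so the tab expands to 3 spaces to reach column 4.
    assert(detab("a\tb", 4) == "a   b");

    // With the default tab size of 8, a leading tab becomes 8 spaces.
    assert(detab("\tx") == "        x");

    // entab turns a run of spaces that reaches a tab stop into a tab.
    assert(entab("        x") == "\tx");

    // Spaces that do not reach a tab stop are kept; trailing ones are removed.
    assert(entab("  ab    asdf ") == "  ab\tasdf");
}
```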

@safe C1[] translate(C1, C2 = immutable(char))(C1[] str, dchar[dchar] transTable, const(C2)[] toRemove = null);
@safe C1[] translate(C1, S, C2 = immutable(char))(C1[] str, S[dchar] transTable, const(C2)[] toRemove = null);
Replaces the characters in str which are keys in transTable with their corresponding values in transTable. transTable is an AA where its keys are dchar and its values are either dchar or some type of string. Also, if toRemove is given, the characters in it are removed from str prior to translation. str itself is unaltered. A copy with the changes is returned.

See Also:
tr std.array.replace

Parameters:
str The original string.
transTable The AA indicating which characters to replace and what to replace them with.
toRemove The characters to remove from the string.

Examples:
dchar[dchar] transTable1 = ['e' : '5', 'o' : '7', '5': 'q'];
assert(translate("hello world", transTable1) == "h5ll7 w7rld");

dchar[dchar] transTable2 = ['e' : '5', 'o' : '7', '5': 'q'];
assert(translate("hello world", transTable2, "low") == "h5 rd");

string[dchar] transTable3 = ['e' : "5", 'o' : "orange"];
assert(translate("hello world", transTable3) == "h5llorange worangerld");

string maketrans(in char[] from, in char[] to);
Scheduled for deprecation in March 2012.

Construct translation table for translate().

BUGS:
only works with ASCII

string translate()(in char[] s, in char[] transtab, in char[] delchars);
Scheduled for deprecation in March 2012. Please use the version of translate which takes an AA instead.

Translate characters in s[] using table created by maketrans(). Delete chars in delchars[].

BUGS:
only works with ASCII

string format(...);
Format arguments into a string.

char[] sformat(char[] s,...);
Format arguments into string s which must be large enough to hold the result. Throws RangeError if it is not.

Returns:
s
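
A minimal sketch of format and sformat, assuming the first variadic argument is a printf-style format string as used elsewhere in Phobos. The buffer size of 32 is an arbitrary choice for the example.

```d
import std.string : format, sformat;

void main()
{
    // format allocates and returns a new string.
    string msg = format("%s has %d items", "cart", 3);
    assert(msg == "cart has 3 items");

    // sformat writes into a caller-supplied buffer and returns the used slice.
    char[32] buf;
    char[] written = sformat(buf[], "%d-%d", 1, 2);
    assert(written == "1-2");

    // The returned slice refers to the buffer itself, not to a copy.
    assert(written.ptr == buf.ptr);
}
```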

bool inPattern(S)(dchar c, in S pattern);
See if character c is in the pattern.

Patterns:
A pattern is an array of characters much like a character class in regular expressions. A sequence of characters can be given, such as "abcde". The '-' can represent a range of characters, as "a-e" represents the same pattern as "abcde". "a-fA-F0-9" represents all the hex characters. If the first character of a pattern is '^', then the pattern is negated, i.e. "^0-9" means any character except a digit. The functions inPattern, countchars, removechars, and squeeze use patterns.

Note:
In the future, the pattern syntax may be improved to be more like regular expression character classes.

bool inPattern(S)(dchar c, S[] patterns);
See if character c is in the intersection of the patterns.

size_t countchars(S, S1)(S s, in S1 pattern);
Count characters in s that match pattern.

S removechars(S)(S s, in S pattern);
Return string that is s with all characters removed that match pattern.

S squeeze(S)(S s, in S pattern = null);
Return string where sequences of a character in s[] from pattern[] are replaced with a single instance of that character. If pattern is null, it defaults to all characters.

S1 munch(S1, S2)(ref S1 s, S2 pattern);
Finds the position pos of the first character in s that does not match pattern (in the terminology used by inPattern). Updates s = s[pos..$]. Returns the slice from the beginning of the original (before update) string up to, and excluding, pos.

Example:
string s = "123abc";
string t = munch(s, "0123456789");
assert(t == "123" && s == "abc");
t = munch(s, "0123456789");
assert(t == "" && s == "abc");
The munch function is mostly convenient for skipping a certain category of characters (e.g. whitespace) when parsing strings. (In such cases, the return value is not used.)

S succ(S)(S s);
Return string that is the 'successor' to s[]. If the rightmost character is a-zA-Z0-9, it is incremented within its case or digits. If it generates a carry, the process is repeated with the one to its immediate left.
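
The carry behavior described above can be seen in a short sketch (inputs are illustrative only):

```d
import std.string : succ;

void main()
{
    // No carry: only the rightmost character is incremented.
    assert(succ("1") == "2");

    // 'z' wraps to 'a' and carries into the character to its left.
    assert(succ("az") == "ba");

    // A carry out of the leftmost character lengthens the string.
    assert(succ("9") == "10");
    assert(succ("999") == "1000");

    // Letters and digits each wrap within their own class.
    assert(succ("zz99") == "aaa00");
}
```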

C1[] tr(C1, C2, C3, C4 = immutable(char))(C1[] str, const(C2)[] from, const(C3)[] to, const(C4)[] modifiers = null);
Replaces the characters in str which are in from with the corresponding characters in to and returns the resulting string.

tr is based on Posix's tr, though it doesn't do everything that the Posix utility does.

Parameters:
str The original string.
from The characters to replace.
to The characters to replace with.
modifiers String containing modifiers.

Modifiers:
Modifier Description
'c' Complement the list of characters in from
'd' Removes matching characters with no corresponding replacement in to
's' Removes adjacent duplicates in the replaced characters

If the modifier 'd' is present, then the number of characters in to may be only 0 or 1.

If the modifier 'd' is not present, and to is empty, then to is taken to be the same as from.

If the modifier 'd' is not present, and to is shorter than from, then to is extended by replicating the last character in to.

Both from and to may contain ranges using the '-' character (e.g. "a-d" is synonymous with "abcd"). Neither accepts a leading '^' as meaning the complement of the string (use the 'c' modifier for that).
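
A brief sketch of tr with plain character lists, ranges, and each of the three modifiers. The inputs are illustrative only.

```d
import std.string : tr;

void main()
{
    // One-to-one replacement, with and without ranges.
    assert(tr("abcdef", "cd", "CD") == "abCDef");
    assert(tr("abcdef", "b-d", "B-D") == "aBCDef");

    // 'c' complements from: every character except 'e' and 'f' is replaced.
    assert(tr("abcdef", "ef", "*", "c") == "****ef");

    // 'd' deletes matching characters that have no replacement in to.
    assert(tr("abcdef", "ef", "", "d") == "abcd");

    // 's' squeezes adjacent duplicates among the replaced characters.
    assert(tr("hello goodbye", "lo", "x", "s") == "hex gxdbye");
}
```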

bool isNumeric(const(char)[] s, in bool bAllowSep = false);
[in] string s can be formatted in the following ways:

Integer Whole Number: (for byte, ubyte, short, ushort, int, uint, long, and ulong) ['+'|'-']digit(s)[U|L|UL]

Examples:
123, 123UL, 123L, +123U, -123L

Floating-Point Number: (for float, double, real, ifloat, idouble, and ireal) ['+'|'-']digit(s)[.][digit(s)][[e-|e+]digit(s)][i|f|L|Li|fi]] or [nan|nani|inf|-inf]

Examples:
+123., -123.01, 123.3e-10f, 123.3e-10fi, 123.3e-10L

(for cfloat, cdouble, and creal) ['+'|'-']digit(s)[.][digit(s)][[e-|e+]digit(s)][+] [digit(s)[.][digit(s)][[e-|e+]digit(s)][i|f|L|Li|fi]] or [nan|nani|nan+nani|inf|-inf]

Examples:
nan, -123e-1+456.9e-10Li, +123e+10+456i, 123+456

[in] bool bAllowSep False by default, but when set to true it will accept the separator characters "," and "_" within the string. These characters should be stripped from the string before using any of the conversion functions such as toInt() and toFloat(), or else an error will occur.

Also please note, that no spaces are allowed within the string anywhere whether it's a leading, trailing, or embedded space(s), thus they too must be stripped from the string before using this function, or any of the conversion functions.
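
A minimal sketch of isNumeric on a few of the forms listed above, plus some inputs that are rejected. The separator example assumes bAllowSep behaves as described in the parameter text.

```d
import std.string : isNumeric;

void main()
{
    // Integer forms, with optional sign and suffix.
    assert(isNumeric("123"));
    assert(isNumeric("123UL"));
    assert(isNumeric("-123L"));

    // Floating-point forms.
    assert(isNumeric("+123."));
    assert(isNumeric("123.3e-10f"));
    assert(isNumeric("nan"));

    // Rejected: empty input, letters, and any embedded space.
    assert(!isNumeric(""));
    assert(!isNumeric("abc"));
    assert(!isNumeric("12 34"));

    // Separators are accepted only when bAllowSep is true.
    assert(!isNumeric("1,000"));
    assert(isNumeric("1,000", true));
}
```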

deprecated bool isNumeric(...);
Deprecated. It will be removed in August 2012.

Allow any object as a parameter

deprecated bool isNumeric(TypeInfo[] _arguments, va_list _argptr);
Deprecated. It will be removed in August 2012.

Check only the first parameter, all others will be ignored.

char[] soundex(const(char)[] string, char[] buffer = null);
Soundex algorithm.

The Soundex algorithm converts a word into 4 characters based on how the word sounds phonetically. The idea is that two spellings that sound alike will have the same Soundex value, which means that Soundex can be used for fuzzy matching of names.

Parameters:
const(char)[] string String to convert to Soundex representation.
char[] buffer Optional 4 char array to put the resulting Soundex characters into. If null, the return value buffer will be allocated on the heap.

Returns:
The four character array with the Soundex result in it. Returns null if there is no Soundex representation for the string.

See Also:
Wikipedia, The Soundex Indexing System

BUGS:
Only works well with English names. There are other arguably better Soundex algorithms, but this one is the standard one.
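
The sketch below shows two spellings that share a Soundex code and the optional caller-supplied buffer. The names are common Soundex illustrations, not taken from this page.

```d
import std.string : soundex;

void main()
{
    // Names that sound alike map to the same four-character code.
    assert(soundex("Robert") == "R163");
    assert(soundex("Rupert") == "R163");

    // Short names are padded with zeros.
    assert(soundex("Lee") == "L000");

    // A 4-char buffer can be supplied to avoid a heap allocation.
    char[4] buf;
    char[] code = soundex("Washington", buf[]);
    assert(code == "W252");
    assert(code.ptr == buf.ptr);
}
```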

string[string] abbrev(string[] values);
Construct an associative array consisting of all abbreviations that uniquely map to the strings in values.

This is useful in cases where the user is expected to type in one of a known set of strings, and the program will helpfully autocomplete the string once sufficient characters have been entered that uniquely identify it.

Example:
 import std.stdio;
 import std.string;

 void main()
 {
    static string[] list = [ "food", "foxy" ];

    auto abbrevs = std.string.abbrev(list);

    foreach (key, value; abbrevs)
    {
       writefln("%s => %s", key, value);
    }
 }
produces the output:
 fox => foxy
 food => food
 foxy => foxy
 foo => food
 

size_t column(S)(S str, size_t tabsize = 8);
Computes the column number at the end of str, assuming that str starts in the leftmost column, which is numbered starting from 0.
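
A small sketch of how tabs affect the result. The inputs are illustrative only.

```d
import std.string : column;

void main()
{
    // Without tabs the column equals the number of characters.
    assert(column("abc") == 3);

    // A tab advances to the next tab stop (every 8 columns by default).
    assert(column("\t") == 8);
    assert(column("abc\t") == 8);
    assert(column("12345678\t") == 16);

    // With tabsize 4: "ab" ends at column 2, the tab moves to 4, "cd" ends at 6.
    assert(column("ab\tcd", 4) == 6);
}
```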

S wrap(S)(S s, size_t columns = 80, S firstindent = null, S indent = null, size_t tabsize = 8);
Wrap text into a paragraph.

The input text string s is formed into a paragraph by breaking it up into a sequence of lines, delineated by \n, such that the number of columns is not exceeded on each line. The last line is terminated with a \n.

Parameters:
s text string to be wrapped
columns maximum number of columns in the paragraph
firstindent string used to indent first line of the paragraph
indent string to use to indent following lines of the paragraph
tabsize column spacing of tabs

Returns:
The resulting paragraph.
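
A minimal sketch of wrap with small column limits, leaving the indent parameters at their defaults. The input text is illustrative only.

```d
import std.string : wrap;

void main()
{
    // Lines are broken at spaces so that no line exceeds 7 columns;
    // the last line is also terminated with '\n'.
    assert(wrap("a short string", 7) == "a short\nstring\n");

    // Words are never split: a word longer than the limit gets its own line.
    assert(wrap("a short string", 4) == "a\nshort\nstring\n");
}
```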

S outdent(S)(S str);
S[] outdent(S)(S[] lines);
Removes indentation from a multi-line string or an array of single-line strings.

This uniformly outdents the text as much as possible. Whitespace-only lines are always converted to blank lines.

A StringException will be thrown if inconsistent indentation prevents the input from being outdented.

Works at compile-time.

Example:
 writeln(q{
     import std.stdio;
     void main() {
         writeln("Hello");
     }
 }.outdent());

Output:
 import std.stdio;
 void main() {
     writeln("Hello");
 }