digitalmars.D - string splitting funcs
spir, Jan 22 2011
While we're at tweaking std.string:

When writing string libs or types (like Text recently), I implement 3 string-splitting methods. This may (or may not) be useful for D's string module.

The core question is: what to do with empty parts? They are generated when:
* the separator is present at either end of the source string
* successive separators occur in the source string

Thus, split("--abc----def----", "--") returns ["", "abc", "", "def", "", ""]. This may or may not be what we expect. But why?

I ended up considering that there are 2 distinct use cases for splitting a string:
1. it is like a record (fields)
2. it is like a list (elements)

In the first case, we want to keep empty fields so that each field has a constant index; besides, empty fields are sometimes meaningful. For instance, in name--phone--email, when phone is absent (name----email), we still want email as the third field.

In the case of a list, instead, empty elements are most commonly irrelevant; they often come from the flexibility of the grammar (which is not always formal). For instance, lists of words / numbers / tokens, or more simply lines: we will rarely keep blank ones for further processing.

This leads to 2 different string-splitting funcs, e.g.:
    string[] recordFields (string sep)
    string[] listElements (string sep)
(names discussable ;-)

The first func is symmetric to join: joining its result with sep gives back the source string. The second one may simply filter the first one's results, or instead drop empty elements on the fly.

Finally, there is a third, different use case, which may well be the most common one and requires yet another func:
    string[] split (string whitespace=" \t\n")
which splits on any whitespace. Usually, the expected behaviour is that any combination or repetition of whitespace chars counts as a single separator; but whitespace at the start or end still generates an empty part.

Makes sense?

Denis
_________________
vita es estrany
spir.wikidot.com
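The record/list distinction can be sketched in D as two free functions, callable method-style through UFCS so they read like the proposed signatures. This is a minimal sketch, not Phobos API: `recordFields` and `listElements` are the hypothetical names from the proposal, and `listElementsLazy` is an extra hypothetical helper. Only `std.array.split`, `std.array.join`, `std.algorithm.iteration.splitter` and `filter` are real Phobos calls. The unittest checks the empty parts produced at both ends and between successive separators.

```d
// fields.d: record-style vs list-style splitting (hypothetical helpers).
import std.algorithm.iteration : filter, splitter;
import std.array : array, split;

/// Record-style split: keeps every empty field, so each field keeps
/// a constant index and joining with `sep` restores the source string.
string[] recordFields(string s, string sep)
{
    // std.array.split keeps the empty parts produced by separators
    // at either end and by successive separators.
    return s.split(sep);
}

/// List-style split: splits first, then drops the empty elements.
string[] listElements(string s, string sep)
{
    return s.split(sep).filter!(e => e.length > 0).array;
}

/// Lazy variant: empty elements are dropped on the fly, as `splitter`
/// yields each part, without building an intermediate array.
auto listElementsLazy(string s, string sep)
{
    return s.splitter(sep).filter!(e => e.length > 0);
}

unittest
{
    import std.algorithm.comparison : equal;
    import std.array : join;

    enum src = "--abc----def----";

    auto rec = src.recordFields("--");
    assert(rec == ["", "abc", "", "def", "", ""]);
    assert(rec.join("--") == src);   // symmetric to join

    // A missing phone still leaves email as the third field.
    assert("name----email".recordFields("--") == ["name", "", "email"]);

    assert(src.listElements("--") == ["abc", "def"]);
    assert(src.listElementsLazy("--").equal(["abc", "def"]));
}
```

The unittest can be run with `dmd -unittest -main -run fields.d` (the file name is just an example). The lazy variant is one way to read "drop empty elements on the fly": nothing is allocated until the caller iterates or calls `.array`.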
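The third, whitespace-based split can be sketched the same way. The name `splitWhite` is hypothetical (chosen so it does not clash with `std.array.split`), and the sketch assumes the whitespace set contains only ASCII characters, so indexing the UTF-8 string byte by byte and slicing at those positions stays safe. It follows the behaviour described in the post: a run of whitespace counts as one separator, while whitespace at either end still yields an empty part. Note that Phobos's one-argument `std.array.split(s)` is documented to merge whitespace runs and produce no empty words at all, so it differs at the ends.

```d
// words.d: whitespace split as described in the post (hypothetical helper).
import std.algorithm.searching : canFind;

/// Splits `s` on any character of `whitespace` (assumed ASCII).
/// A run of whitespace characters counts as a single separator, but
/// whitespace at the start or end still yields an empty part.
string[] splitWhite(string s, string whitespace = " \t\n")
{
    string[] parts;
    size_t start = 0;   // index where the current part begins
    size_t i = 0;
    while (i < s.length)
    {
        if (whitespace.canFind(s[i]))
        {
            parts ~= s[start .. i];   // close the current part
            // Skip the whole run: it is one separator.
            while (i < s.length && whitespace.canFind(s[i]))
                ++i;
            start = i;
        }
        else
        {
            ++i;
        }
    }
    parts ~= s[start .. $];   // last part, empty if s ends with whitespace
    return parts;
}

unittest
{
    assert(splitWhite("  a \t b\n") == ["", "a", "b", ""]);
    assert(splitWhite("name\t\tphone email") == ["name", "phone", "email"]);
    assert(splitWhite("a,;b", ",;") == ["a", "b"]);
}
```

Keeping the end parts means a caller can still tell that the input had leading or trailing whitespace; filtering out empty strings afterwards gives the list-style result.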