digitalmars.D.learn - Using regular expressions when reading a file
- Alexander Zhirov (36/36) May 05 2022 I want to use a configuration file with external settings. I'm
- H. S. Teoh (22/41) May 05 2022 Your regex already matches the `Property = Value` pattern; why not just
- Alexander Zhirov (10/11) May 05 2022 Yes, it looks more attractive. Thanks! I just don't quite
- H. S. Teoh (7/20) May 05 2022 You don't have to. Just add a `$` to the end of your regex, and it
- Alexander Zhirov (6/9) May 05 2022 In fact, it turned out to be much easier. It was just necessary
- Ali Çehreli (21/33) May 05 2022 Couldn't help myself from improving. :) The following regex works in my
- Alexander Zhirov (2/22) May 05 2022 It will need to be sorted out with a fresh head. 😀 Thanks!
- forkit (24/29) May 05 2022 regex never looks right ;-)
- Alexander Zhirov (2/3) May 06 2022 Well, only if as a strict form :)
I want to use a configuration file with external settings. I'm trying to use regular expressions to read the `Property = Value` settings. I would like to do it all more beautifully. Is there any way to get rid of the line break character? How much does everything look "right"?

**settings.conf:**

```sh
host = 127.0.0.1
port = 5432
dbname = database
user = postgres
```

**code:**

```d
auto file = File("settings.conf", "r");
string[string] properties;
auto p_property = regex(r"^\w+ *= *.+", "s");
while (!file.eof())
{
    string line = file.readln();
    auto m = matchAll(line, p_property);
    if (!m.empty())
    {
        string property = matchAll(line, regex(r"^\w+", "m")).hit;
        string value = replaceAll(line, regex(r"^\w+ *= *", "m"), "");
        properties[property] = value;
    }
}
file.close();
writeln(properties);
```

**output:**

```sh
["host":"127.0.0.1\n", "dbname":"mydb\n", "user":"postgres", "port":"5432\n"]
```
May 05 2022
On Thu, May 05, 2022 at 05:53:57PM +0000, Alexander Zhirov via Digitalmars-d-learn wrote:

> I want to use a configuration file with external settings. I'm trying to use regular expressions to read the `Property = Value` settings. I would like to do it all more beautifully. Is there any way to get rid of the line break character? How much does everything look "right"?
>
> [...]
>
> ```d
> auto file = File("settings.conf", "r");
> string[string] properties;
> auto p_property = regex(r"^\w+ *= *.+", "s");
> while (!file.eof())
> {
>     string line = file.readln();
>     auto m = matchAll(line, p_property);
>     if (!m.empty())
>     {
>         string property = matchAll(line, regex(r"^\w+", "m")).hit;
>         string value = replaceAll(line, regex(r"^\w+ *= *", "m"), "");
>         properties[property] = value;
>     }
> }
> ```

Your regex already matches the `Property = Value` pattern; why not just use captures to extract the relevant parts of the match, instead of doing it all over again inside the if-statement?

    // I added captures (parentheses) to extract the property name
    // and value directly from the pattern.
    auto p_property = regex(r"^(\w+) *= *(.+)", "s");

    // I assume you only want one `Property = Value` pair per input
    // line, so you really don't need matchAll; matchFirst will do
    // the job.
    auto m = matchFirst(line, p_property);

    if (m)
    {
        // No need to run a match again, just extract the
        // captures
        string property = m[1];
        string value = m[2];
        properties[property] = value;
    }

T

--
"You are a very disagreeable person." "NO."
May 05 2022
On Thursday, 5 May 2022 at 18:15:28 UTC, H. S. Teoh wrote:

> auto m = matchFirst(line, p_property);

Yes, it looks more attractive. Thanks! I just don't quite understand how `matchFirst` works. I seem to have read the [description](https://dlang.org/phobos/std_regex.html#Captures), but I can't understand something.

And yet I have to manually remove the line break:

```sh
["host":"192.168.100.236\n", "dbname":"belpig\n", "user":"postgres", "port":"5432\n"]
```
May 05 2022
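For readers with the same question about `matchFirst`: as the `std.regex` documentation describes it, the call returns a `Captures` object for the first match only, and that object is empty when nothing matched. Index 0 (also available as `hit`) is the whole match, and indexes 1, 2, ... are the parenthesized groups. A minimal sketch, assuming a single-line input and a `\S+` value group chosen here only for illustration:

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Two capture groups: the property name and the value.
    // The `\S+` value group is an assumption made for this sketch.
    auto p_property = regex(r"^(\w+) *= *(\S+)");

    auto m = matchFirst("port = 5432", p_property);
    assert(!m.empty);               // a match was found
    assert(m.hit == "port = 5432"); // the whole match, same as m[0]
    assert(m[1] == "port");         // first group
    assert(m[2] == "5432");         // second group

    // A line that does not fit the pattern yields empty captures.
    auto none = matchFirst("# just a comment", p_property);
    assert(none.empty);

    writeln(m[1], " -> ", m[2]);
}
```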
On Thu, May 05, 2022 at 06:50:17PM +0000, Alexander Zhirov via Digitalmars-d-learn wrote:

> On Thursday, 5 May 2022 at 18:15:28 UTC, H. S. Teoh wrote:
>
>> auto m = matchFirst(line, p_property);
>
> Yes, it looks more attractive. Thanks! I just don't quite understand how `matchFirst` works. I seem to have read the [description](https://dlang.org/phobos/std_regex.html#Captures), but I can't understand something.
>
> And yet I have to manually remove the line break:
>
> ```sh
> ["host":"192.168.100.236\n", "dbname":"belpig\n", "user":"postgres", "port":"5432\n"]
> ```

You don't have to. Just add a `$` to the end of your regex, and it should match the newline. If you put it outside the capture parentheses, it will not be included in the value.

T

--
In a world without fences, who needs Windows and Gates? -- Christian Surchi
May 05 2022
On Thursday, 5 May 2022 at 18:58:41 UTC, H. S. Teoh wrote:

> You don't have to. Just add a `$` to the end of your regex, and it should match the newline. If you put it outside the capture parentheses, it will not be included in the value.

In fact, it turned out to be much easier. It was just necessary to use the `m` flag instead of the `s` flag:

```d
auto p_property = regex(r"^(\w+) *= *(.+)", "m");
```
May 05 2022
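A likely explanation for why swapping the flags helps, going by the flag descriptions in the `std.regex` documentation: `s` makes `.` match line-break characters as well, so the greedy `.+` runs straight through the trailing `\n`, while `m` only changes where `^` and `$` may match and leaves `.` stopping at the line break. The sketch below compares the two; `capturedValue` is a hypothetical helper written for this illustration, not something from the thread.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: returns the value group captured from `line`
// when the pattern is compiled with the given flags.
string capturedValue(string line, string flags)
{
    auto m = matchFirst(line, regex(r"^(\w+) *= *(.+)", flags));
    return m.empty ? null : m[2];
}

void main()
{
    // readln() keeps the line terminator, as in the original loop.
    string line = "host = 127.0.0.1\n";

    // "s": the dot also matches '\n', so the line break is captured.
    assert(capturedValue(line, "s") == "127.0.0.1\n");

    // "m": the dot stops before '\n', so the value comes out clean.
    assert(capturedValue(line, "m") == "127.0.0.1");

    writeln([capturedValue(line, "s"), capturedValue(line, "m")]);
}
```

By the same reasoning, passing no flags at all should also leave the line break out of the capture, since it is the `s` flag that lets `.` consume it.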
On 5/5/22 12:05, Alexander Zhirov wrote:

> On Thursday, 5 May 2022 at 18:58:41 UTC, H. S. Teoh wrote:
>
>> You don't have to. Just add a `$` to the end of your regex, and it should match the newline. If you put it outside the capture parentheses, it will not be included in the value.
>
> In fact, it turned out to be much easier. It was just necessary to use the `m` flag instead of the `s` flag:
>
> ```d
> auto p_property = regex(r"^(\w+) *= *(.+)", "m");
> ```

Couldn't help myself from improving. :) The following regex works in my Linux console. No issues with '\n'. (?) It also allows for leading and trailing spaces:

    import std.regex;
    import std.stdio;
    import std.algorithm;
    import std.array;
    import std.typecons;
    import std.functional;

    void main() {
      auto p_property = regex(r"^ *(\w+) *= *(\w+) *$");
      const properties = File("settings.conf")
                         .byLineCopy
                         .map!(line => matchFirst(line, p_property))
                         .filter!(not!empty) // OR: .filter!(m => !m.empty)
                         .map!(m => tuple(m[1], m[2]))
                         .assocArray;

      writeln(properties);
    }

Ali
May 05 2022
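One thing to watch with the regex above: `\w` does not match `.`, so a value such as `127.0.0.1` from the original `settings.conf` would not satisfy `(\w+) *$`, and that line would be filtered out. The variant below is a sketch under the assumption that values may contain arbitrary characters; it swaps in a lazy `(.+?)` for the value group so trailing spaces are still left out. `parseSettings` and the in-memory lines are hypothetical stand-ins for `File("settings.conf").byLineCopy`.

```d
import std.algorithm : filter, map;
import std.array : assocArray;
import std.regex : matchFirst, regex;
import std.stdio : writeln;
import std.typecons : tuple;

// Hypothetical helper: builds the key/value table from lines that
// carry no line terminator (as byLineCopy yields by default).
string[string] parseSettings(string[] lines)
{
    // Lazy `.+?` followed by ` *$` keeps trailing spaces out of the value.
    auto p_property = regex(r"^ *(\w+) *= *(.+?) *$");
    return lines
        .map!(line => matchFirst(line, p_property))
        .filter!(m => !m.empty)          // drop lines that are not settings
        .map!(m => tuple(m[1], m[2]))
        .assocArray;
}

void main()
{
    auto settings = parseSettings([
        "host = 127.0.0.1",
        "  port=5432  ",
        "# not a setting",
        "user = postgres",
    ]);
    assert(settings["host"] == "127.0.0.1");
    assert(settings["port"] == "5432");
    assert("#" !in settings && settings.length == 3);
    writeln(settings);
}
```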
On Thursday, 5 May 2022 at 19:19:26 UTC, Ali Çehreli wrote:

> Couldn't help myself from improving. :) The following regex works in my Linux console. No issues with '\n'. (?) It also allows for leading and trailing spaces:
>
>     import std.regex;
>     import std.stdio;
>     import std.algorithm;
>     import std.array;
>     import std.typecons;
>     import std.functional;
>
>     void main() {
>       auto p_property = regex(r"^ *(\w+) *= *(\w+) *$");
>       const properties = File("settings.conf")
>                          .byLineCopy
>                          .map!(line => matchFirst(line, p_property))
>                          .filter!(not!empty) // OR: .filter!(m => !m.empty)
>                          .map!(m => tuple(m[1], m[2]))
>                          .assocArray;
>
>       writeln(properties);
>     }

It will need to be sorted out with a fresh head. 😀 Thanks!
May 05 2022
On Thursday, 5 May 2022 at 17:53:57 UTC, Alexander Zhirov wrote:

> I want to use a configuration file with external settings. I'm trying to use regular expressions to read the `Property = Value` settings. I would like to do it all more beautifully. Is there any way to get rid of the line break character? How much does everything look "right"?

regex never looks right ;-)

try something else perhaps??

    // ------------
    module test;

    import std;

    void main()
    {
        auto file = File("d:\\settings.conf", "r");

        string[string] aa;

        // create an associative array of settings -> [key:value]
        foreach (line; file.byLine().filter!(a => !a.empty))
        {
            auto myTuple = line.split(" = ");
            aa[myTuple[0].to!string] = myTuple[1].to!string;
        }

        // write out all the settings.
        foreach (key, value; aa.byPair)
            writefln("%s:%s", key, value);

        writeln;

        // write just the host value
        writeln(aa["host"]);
    }
    // ------------
May 05 2022
On Friday, 6 May 2022 at 05:40:52 UTC, forkit wrote:

> auto myTuple = line.split(" = ");

Well, only if as a strict form :)
May 06 2022
On Friday, 6 May 2022 at 07:51:01 UTC, Alexander Zhirov wrote:

> On Friday, 6 May 2022 at 05:40:52 UTC, forkit wrote:
>
>> auto myTuple = line.split(" = ");
>
> Well, only if as a strict form :)

well.. a settings file should be following a strict format.

..otherwise...anything goes... and good luck with that...

regex won't help you either in that case...

e.g: user =som=eu=ser (how you going to deal with this ?)
May 06 2022
imho, regex is overkill here. As for me, I usually just split the line at the first '=', then trim spaces from the left and right parts.
May 06 2022
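The split-at-the-first-`=` approach from the last post could look roughly like this. It is only a sketch: `parseSetting` is a hypothetical helper name, and treating a line without `=` or with an empty key as "not a setting" is an assumption made here, not something the thread specifies.

```d
import std.algorithm.searching : findSplit;
import std.stdio : writeln;
import std.string : strip;

// Hypothetical helper: splits `line` at the first '=' and trims both halves.
// Returns false when the line has no '=' or the key is empty.
bool parseSetting(string line, out string key, out string value)
{
    auto parts = line.findSplit("=");   // (before, separator, after)
    if (parts[1].length == 0)
        return false;                   // no '=' on this line
    key = parts[0].strip;
    value = parts[2].strip;             // strip also removes a trailing '\n'
    return key.length != 0;
}

void main()
{
    string key, value;

    // Only the first '=' separates key and value; later ones stay in the value.
    if (parseSetting("user =som=eu=ser\n", key, value))
        writeln(key, " -> ", value);

    assert(key == "user" && value == "som=eu=ser");
    assert(!parseSetting("no separator here", key, value));
}
```

With this rule the `user =som=eu=ser` line from the earlier post has one well-defined reading: everything after the first `=` belongs to the value.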