digitalmars.D.announce - Breaking news: std.uni changes!
- Richard (Rikki) Andrew Cattermole (46/46) Dec 24 2022 Hello one and all on this merry of all days!
- Dom Disc (7/9) Dec 25 2022 Hurray!
- Robert Schadek (1/1) Dec 26 2022 Awesome work, thank you
- Walter Bright (1/1) Dec 26 2022 A big thank you!
- Dukc (10/17) Dec 27 2022 This is a big service for us at Symmetry. Getting Unicode support
- Richard (Rikki) Andrew Cattermole (9/17) Dec 27 2022 I had no idea that this was becoming an issue for you guys. It wasn't in...
- Dukc (7/26) Jan 02 2023 (Sorry for the late answer)
- Richard (Rikki) Andrew Cattermole (8/11) Jan 02 2023 I've done word break, "lazy" normalization (so can stop at any point),
- H. S. Teoh (7/21) Jan 02 2023 Is there a way to make these tables pay-as-you-go? As in, if you never
- Richard (Rikki) Andrew Cattermole (5/8) Jan 02 2023 This should already be the case. I saw some stuff involving Rainer 10
- Adam D Ruppe (4/6) Jan 03 2023 I said this on the discord chat but you should really just
- Richard (Rikki) Andrew Cattermole (3/10) Jan 03 2023 Ideally. We still need an implementation for CTFE though. Its just a lot...
- Dukc (8/15) Jan 03 2023 Can't wait to see them in master!
- Richard (Rikki) Andrew Cattermole (5/24) Jan 03 2023 I probably won't be adding any new features to std.uni. Only finishing
Hello one and all on this merry of all days!

Today unfortunately I bring all but joy. For std.uni has had a bout of work!

- Unicode tables have been updated to 15 from 6.2 (and with that the generator is now in Phobos!).
- Unicode categories C aka Other have been brought in line with TR44 specification. E.g. ``unicode.C``.

In both cases, if you use std.uni directly or indirectly (say, via std.regex), you may find yourself with code breakage on the next release.

If you do find yourself with problems, first check that you are not referencing the C category. If you are, here is some code to mitigate your circumstance, although it would be better to avoid needing it.

```d
// Returns the set that "C"/"Other" resolved to before this change;
// every other property name is forwarded to std.uni unchanged.
@property auto loadPropertyOriginal(string name)() pure
{
    import std.uni : unicode;

    static if (name == "C" || name == "c" || name == "other" || name == "Other")
    {
        // Union of the "other" sub-category of each major category.
        auto target = unicode.Co;
        target |= unicode.Lo;
        target |= unicode.No;
        target |= unicode.So;
        target |= unicode.Po;
        return target;
    }
    else
        return unicode.opDispatch!name;
}
```

Lastly, the table update has already brought much joy to MIR, with a broken test. A character that was being tested wasn't allocated in 6.2 but was in 7, therefore the results were different. If your test suite is not part of the Phobos runners, please be aware that once you update you may experience failed tests. These are not avoidable, because of the external specification std.uni is based upon.

However, in even worse news, the table generator was not kept in working condition over the last 10 years, so there is a chance that something may have been missed.

In all cases, please do contact me if you need assistance. I'm available on Discord, OFTC #d and of course N.G. or even email if you really need it (firstname lastname.co.nz).

---

Happy holidays to those that are currently enjoying them or about to!
Dec 24 2022
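
For readers checking whether their code relied on the old meaning of `unicode.C`: UAX #44 (TR44) defines C as Cc | Cf | Cs | Co | Cn, whereas the mitigation code in the announcement rebuilds the earlier set from Co, Lo, No, So and Po. The following is a minimal sketch of what that difference means in practice. The helper name `legacyOther` is hypothetical, and only the sub-category properties are queried, so the assertions should not depend on which meaning of `unicode.C` the installed Phobos uses.

```d
import std.uni : unicode;

// Hypothetical helper: the pre-change "Other" set, built the same way
// as the mitigation code in the announcement.
auto legacyOther()
{
    auto target = unicode.Co;
    target |= unicode.Lo;
    target |= unicode.No;
    target |= unicode.So;
    target |= unicode.Po;
    return target;
}

void main()
{
    auto legacy = legacyOther();
    auto control = unicode.Cc;

    assert(legacy['\u4E2D']);  // U+4E2D is a CJK ideograph (Lo)
    assert(legacy['\uE000']);  // U+E000 is private use (Co)
    assert(!legacy['\0']);     // U+0000 is Cc, absent from the legacy set
    assert(control['\0']);     // but it is part of Cc, hence of TR44's C
}
```

Code that matched letters or symbols through "C" was therefore depending on the old definition and is the kind of code most likely to change behaviour.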
On Saturday, 24 December 2022 at 21:26:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> - Unicode tables have been updated to 15 from 6.2 (and with that the generator is now in Phobos!).

Hurray! Whatever problems this may cause, they are problems in very, very outdated code that would already need an overhaul, so what. But it's super to finally have tables that are (at least now) up to date!
Dec 25 2022
On Saturday, 24 December 2022 at 21:26:40 UTC, Richard (Rikki) Andrew Cattermole wrote:
> Hello one and all on this merry of all days!
>
> Today unfortunately I bring all but joy. For std.uni has had a bout of work!
>
> - Unicode tables have been updated to 15 from 6.2 (and with that the generator is now in Phobos!).
> - Unicode categories C aka Other have been brought in line with TR44 specification. E.g. ``unicode.C``.

This is a big service for us at Symmetry. Getting Unicode support up to date was needed, we would have had to switch libraries at some point or update it ourselves. But now, nothing to do except perhaps dealing with a bit of breakage. Thank you!

I see it's not quite Unicode 15 though. `graphemeStride` does not take Emoji sequences and prepend characters into account. I'm going to contribute a bit now since it's holiday, and this is a good task for me. PR coming soon unless I run into issues!
Dec 27 2022
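
To make the `graphemeStride` remark concrete, here is a small sketch of what the function reports. The first two assertions cover a base letter followed by a combining mark, which std.uni handles. The emoji line is only printed rather than asserted, because its result depends on whether the installed Phobos already implements the emoji sequence rules; the sample strings are illustrative choices, not taken from the thread.

```d
import std.stdio : writeln;
import std.uni : graphemeStride;

void main()
{
    // 'A' followed by U+030A COMBINING RING ABOVE forms one grapheme.
    string city = "A\u030Arhus";
    assert(graphemeStride(city, 0) == 3); // 1 + 2 UTF-8 code units
    assert(graphemeStride(city, 3) == 1); // plain 'r'

    // MAN + ZERO WIDTH JOINER + WOMAN: a full implementation of the
    // Unicode 15 rules treats this as one cluster (11 code units).
    string pair = "\U0001F468\u200D\U0001F469";
    writeln("stride of emoji sequence: ", graphemeStride(pair, 0),
            " of ", pair.length, " code units");
}
```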
On 28/12/2022 12:13 AM, Dukc wrote:
> This is a big service for us at Symmetry. Getting Unicode support up to date was needed, we would have had to switch libraries at some point or update it ourselves. But now, nothing to do except perhaps dealing with a bit of breakage. Thank you!

I had no idea that this was becoming an issue for you guys. It wasn't in any of the meeting notes and I haven't seen it brought up anywhere. So if there is anything more like this, please talk about it!

> I see it's not quite Unicode 15 though. `graphemeStride` does not take Emoji sequences and prepend characters into account. I'm going to contribute a bit now since it's holiday, and this is a good task for me. PR coming soon unless I run into issues!

Yeah, there will be tons of small stuff currently missed out due to such a big jump and of course ping me rikkimax, when you have something to review.

Loads of other work available such as culling all the version specific information out of the docs :)
Dec 27 2022
(Sorry for the late answer)

On Wednesday, 28 December 2022 at 00:10:36 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 28/12/2022 12:13 AM, Dukc wrote:
>> This is a big service for us at Symmetry. Getting Unicode support up to date was needed, we would have had to switch libraries at some point or update it ourselves. But now, nothing to do except perhaps dealing with a bit of breakage. Thank you!
>
> I had no idea that this was becoming an issue for you guys. It wasn't in any of the meeting notes and I haven't seen it brought up anywhere. So if there is anything more like this, please talk about it!

Yes, I should have done that.

>> I see it's not quite Unicode 15 though. `graphemeStride` does not take Emoji sequences and prepend characters into account. I'm going to contribute a bit now since it's holiday, and this is a good task for me. PR coming soon unless I run into issues!
>
> Yeah, there will be tons of small stuff currently missed out due to such a big jump and of course ping me rikkimax, when you have something to review.
>
> Loads of other work available such as culling all the version specific information out of the docs :)

Other things coming to mind: Bidirectional grapheme iteration, Word break and line break algorithms, lazy normalisation. Indeed, lots of improvement potential.
Jan 02 2023
On 03/01/2023 10:24 AM, Dukc wrote:
> Other things coming to mind: Bidirectional grapheme iteration, Word break and line break algorithms, lazy normalisation. Indeed, lots of improvement potential.

I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization.

But: Bidirectional grapheme iteration makes my eye twitch lol.

My main concern for adding new features is increasing the size of Phobos binary for the tables. Most people don't need a lot of these optional algorithms, but they do need things like casing to work correctly (which makes increased size worth it).
Jan 02 2023
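
For comparison with the lazy variants described above, this is a sketch of the eager counterparts that std.uni already ships: `normalize` converts a whole string to the requested form at once, and `icmp` compares with full case folding. The lazy implementations mentioned in the post are not shown here, since their API does not appear in the thread.

```d
import std.uni : icmp, normalize, NFC, NFD;

void main()
{
    // Eager normalization: the whole string is converted in one call.
    assert(normalize!NFC("A\u0308ffin") == "\u00C4ffin"); // compose
    assert(normalize!NFD("\u00C4ffin") == "A\u0308ffin"); // decompose

    // Case-insensitive comparison with full case folding:
    // the sharp s folds to "ss".
    assert(icmp("Rußland", "Russland") == 0);
}
```

Both calls rely on the Unicode tables, which is why adding further algorithms raises the binary size concern.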
On Tue, Jan 03, 2023 at 05:13:53PM +1300, Richard (Rikki) Andrew Cattermole via Digitalmars-d-announce wrote:
> On 03/01/2023 10:24 AM, Dukc wrote:
>> Other things coming to mind: Bidirectional grapheme iteration, Word break and line break algorithms, lazy normalisation. Indeed, lots of improvement potential.
>
> I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization.
>
> But: Bidirectional grapheme iteration makes my eye twitch lol.
>
> My main concern for adding new features is increasing the size of Phobos binary for the tables. Most people don't need a lot of these optional algorithms, but they do need things like casing to work correctly (which makes increased size worth it).

Is there a way to make these tables pay-as-you-go? As in, if you never call a function that depends on a table, it would not be pulled into the binary?

T

--
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Jan 02 2023
On 03/01/2023 6:13 PM, H. S. Teoh wrote:
> Is there a way to make these tables pay-as-you-go? As in, if you never call a function that depends on a table, it would not be pulled into the binary?

This should already be the case. I saw some stuff involving Rainer 10 years ago who helped improve it along these lines.

The main concern would be shared libraries, which Phobos should be able to be distributed as on all platforms by all compilers.
Jan 02 2023
On Tuesday, 3 January 2023 at 05:23:55 UTC, Richard (Rikki) Andrew Cattermole wrote:
> The main concern would be shared libraries, which Phobos should be able to be distributed as on all platforms by all compilers.

I said this on the Discord chat, but you should really just dynamically load the system ICU if it is available.
Jan 03 2023
On 04/01/2023 2:58 AM, Adam D Ruppe wrote:
> On Tuesday, 3 January 2023 at 05:23:55 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> The main concern would be shared libraries, which Phobos should be able to be distributed as on all platforms by all compilers.
>
> I said this on the discord chat but you should really just dynamic load the system icu if it is available.

Ideally. We still need an implementation for CTFE though. It's just a lot of work to shoehorn it in now.
Jan 03 2023
On Tuesday, 3 January 2023 at 04:13:53 UTC, Richard (Rikki) Andrew Cattermole wrote:
> On 03/01/2023 10:24 AM, Dukc wrote:
>> Other things coming to mind: Bidirectional grapheme iteration, Word break and line break algorithms, lazy normalisation. Indeed, lots of improvement potential.
>
> I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization.

Can't wait to see them in master!

> But: Bidirectional grapheme iteration makes my eye twitch lol.

I did write a reverse grapheme iterator for Symmetry. It isn't fit for Phobos as-is since it only accepts UTF-8 strings (not other ranges) and is modeled after the Phobos grapheme walker, not the 15.0 standard. But I could ask for permission to give it to you if it'd help.
Jan 03 2023
On 04/01/2023 2:51 AM, Dukc wrote:
> On Tuesday, 3 January 2023 at 04:13:53 UTC, Richard (Rikki) Andrew Cattermole wrote:
>> On 03/01/2023 10:24 AM, Dukc wrote:
>>> Other things coming to mind: Bidirectional grapheme iteration, Word break and line break algorithms, lazy normalisation. Indeed, lots of improvement potential.
>>
>> I've done word break, "lazy" normalization (so can stop at any point), and lazy case insensitive comparison with normalization.
>
> Can't wait to see them in master!
>
>> But: Bidirectional grapheme iteration makes my eye twitch lol.
>
> I did write a reverse grapheme iterator for Symmetry. It isn't fit for Phobos as-is since it only accepts UTF-8 strings (not other ranges) and is modeled after the Phobos grapheme walker, not the 15.0 standard. But I could ask for permission to give it to you if it'd help.

I probably won't be adding any new features to std.uni. Only finishing off the things that annoy me and reviewing other people's work.

I've got enough on my plate just building my own "standard library" https://github.com/Project-Sidero/basic_memory :)
Jan 03 2023