digitalmars.D - Why Bloat Is Still Software’s Biggest Vulnerability
- tim (2/2) Feb 12 2024 I thought I would get a discussion started on software bloat.
- tim (3/5) Feb 12 2024 oops forgot link to article,
- Paolo Invernizzi (4/10) Feb 12 2024 Agreed .. two days ago I needed to pull a 13GB docker image from
- H. S. Teoh (147/156) Feb 12 2024 No amount of D innovation is going to stop programmers infected with the
- Paolo Invernizzi (10/23) Feb 12 2024 Hey, at the end the title of the post is: Why Bloat Is Still
- H. S. Teoh (48/61) Feb 12 2024 I'm skeptical whether it's the biggest. There are many holes in a cheese
- M. M. (4/13) Feb 12 2024 I enjoyed reading this. I largely agree with what you said. I
- deadalnix (10/19) Feb 12 2024 "Funny" example of that.
- H. S. Teoh (34/56) Feb 12 2024 Recently while working on my minimal druntime for wasm (one primary
- monkyyy (5/22) Feb 12 2024 Nothing can stop irresponsibility completely, but you can go a
- Richard (Rikki) Andrew Cattermole (8/22) Feb 12 2024 What? Dub doesn't upgrade dependencies for you without you asking for it...
- H. S. Teoh (21/42) Feb 12 2024 And that's the point, *by default* you get the bad behaviour, you have
- Walter Bright (1/2) Feb 12 2024 Well, that'll fix it!
- deadalnix (2/5) Feb 12 2024 This software has dependencies, do you agree?
- Kagamin (15/16) Feb 13 2024 I hate bloated software too. Someone said here phobos reaches
I thought I would get a discussion started on software bloat. Maybe D can be part of the solution to this problem?
Feb 12 2024
On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
> I thought I would get a discussion started on software bloat. Maybe D can be part of the solution to this problem?

oops forgot link to article, https://spectrum.ieee.org/lean-software-development
Feb 12 2024
On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
> On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
>> I thought I would get a discussion started on software bloat. Maybe D can be part of the solution to this problem?
>
> oops forgot link to article, https://spectrum.ieee.org/lean-software-development

Agreed .. two days ago I needed to pull a 13GB docker image from Nvidia repository ... a totally out of control mess.

/P
Feb 12 2024
On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
> On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
>> On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
>>> I thought I would get a discussion started on software bloat. Maybe D can be part of the solution to this problem?

No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development.

Which is utterly crazy, if you think about it. Unless you pin every dependency to exact versions (who even does that?!), every time you build your code you're potentially getting a (subtly) different set of dependencies. That means the program you've been trying to debug 5 mins ago may not even be the same program you're debugging now. Now of course it's possible to turn off this behaviour while debugging, but still, the fact that that's the default behaviour is just nuts.

Over the long term, this means that you cannot reliably reproduce older versions of your software -- because the versions of dependencies version 1.0 depended on may not even exist anymore, now that your program is at version 2.0. If your customer reports a problem, you have no way of debugging it; you can't even reproduce the exact image your customer is running anymore, let alone make any fixes to it. The only thing left to do is to tell them "just upgrade to the latest version". Which is the kind of insanity that's familiar to every one of us these days.

Never mind the fallacy that "newer == better". Especially not in the current atmosphere of software development, where so-called "patch" releases are not patch releases at all, but full-featured new releases complete with full-fledged new, untested features (because why waste resources making a patch release + a separate new feature release, when you can just bundle the two together, save development costs, and give Marketing all the more excuse to push new features onto customers and thereby make more money). The number of bugs introduced with each "patch" release may well exceed the number of bugs fixed.

All this not even to mention the insanity that sometimes specifying just *one* dependency will pull in tens or even hundreds of recursive dependencies. A hello world program depends on a standard I/O package, which in turn depends on a date-formatting package, which in turn depends on the locales package, which in turn depends on the internet timeserver client package, which depends on the cryptography package, ad nauseam. And so it takes a totally insane number of packages just to print Hello World on the screen.

Not to mention the whole concept of depending on some 3rd party code that exists on some remote server somewhere out there on the wild wild west (www) of the 'net is just crazy. The article linked below alludes to obsolete NPM / Node packages being taken over by malicious actors in order to inject malicious code into unwitting software. There's also the problem that your code is not compilable if for whatever reason you lost network connectivity. Which means if you suddenly find yourself in an emergency and have to make a small fix to your program, you won't be able to recompile it. Good luck.

[...]
>> https://spectrum.ieee.org/lean-software-development

Reducing code size is, to paraphrase Walter, to plug one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.

Today's web app scene is exemplary of the insanity in software development. It takes GBs of memory and multicore GHz CPUs to run a ridiculously complex web browser in order to be able to run some bloated web app with tons of external dependencies at the *same speed* as an equivalent lean native program in the 80's used to run on 64 KB of memory and a 16 kHz single-core CPU. What's wrong with the picture here?

And don't even get me started on the IoT scene, which is a mind-bogglingly insane concept in and of itself. Why does my toaster need to run a million LoC operating system sporting an *internet connection*?! Or indeed, a *stuffed animal toy* that some well-meaning parent gave my son as a "gift", that has a built-in internet interface that can be used for downloading audio clips (it's cute, it downloaded a clip of my son's name so that the toy could address him by name -- WHY OH WHY... argh). I betcha said OS running on this thing has not been updated (and isn't ever going to be) for at least 5 years, and carries who knows how many unpatched security vulnerabilities. I wouldn't be surprised if a good chunk of today's botnets consist of exploited household appliances running far more software than they actually require for their primary operations. Perhaps this internet-"enabled" stuffed animal is among the esteemed members of such a botnet. (Thankfully the battery has run out since -- and I'm not planning to replace it, ever. Sorry, botnet.)

These are just milder examples of the IoT madness. Don't get me started on internet-enabled webcams that can be (and have been) used for far more nefarious purposes than running some script kiddie's botnet. Years ago, if somebody had told me that some random car driving by the house could hack into my babycam and make it emit a scary noise to scare the baby, I'd have laughed them out of the house as some delusional paranoid. Unfortunately, today this is actual reality, no thanks to insecure misconfigured WiFi routers whose OSes haven't been updated in eons and household appliances having internet access that they have no business to.

> Agreed .. two days ago I needed to pull a 13GB docker image from Nvidia repository ... a totally out of control mess.

In principle, the same thing applies to Docker images that contain far more stuff than they rightly should. No thanks to these non-solutions to security issues, nowadays it's no longer enough to keep up with your OS's security patches, because patching the host OS does not patch the OSes bundled with each Docker image. And for many applications, nobody's gonna patch their Docker images (the whole reason they went the route of Docker is because they can't be bothered with actual, proper integration with their host OS, they just want to target a static known OS that works for their broken code, and therefore have zero incentive to make any changes at all now that their code works). So your host OS may very well be completely patched, but thanks to these needlessly bloated Docker images your PC still has as many security holes as a cheese grater.

//

And there's the totally insane concept of running arbitrary code from unknown, untrusted online sources. Javascript, ActiveX, scripting in emails, in documents, etc.. Eye-candy for the customer, completely unnecessary functionally-speaking, and an absolute catastrophe security-wise. The entire concept is flawed to begin with, and things like sandboxing, etc., are merely afterthoughts, bandages that don't actually fix the festering wound underneath. Sooner or later something will give. And the past 20 or so years of internet history prove this over and over again, to this very day.

But in spite of the countless arbitrary-code execution vulnerabilities, nobody is ready to tackle the root of the problem: 3rd party code from unknown, untrusted online sources has NO BUSINESS running on my PC. But almost every major application these days is literally dying in its eagerness to run such code -- by default. Your browser, your email reader, your word processor, your spreadsheet app, just about everything, really, just can't wait to get their hands on some fresh unknown 3rd party code in order to run it at the user's expense. And the usual anemic response when a major exploit happens shows that what the security community is doing -- all they can do given the circumstances, really -- is, to quote Walter again, merely plugging individual holes in a cheese grater.

//

The underlying problem is that the incentives in software development are all wrong these days. Instead of incentivising code quality, security, and conservation of resources, the primary incentive is money. I.e., ship software as early as possible in order to beat your competitors, which in practice means do as little work as you can possibly get away with in order to get the product out the door. Code quality is a secondary concern (we're gonna throw it all out by next release anyway), conservation of resources is a non-issue (resources are cheap, just tell the customer to buy the latest and greatest hardware, our hardware partners will give us a kick-back for the free promotion), and security isn't even on the list.

Developing software the "right" way is not profitable; questionable practices like importing millions of LoC from dynamic remote dependencies get the job done faster and lead to more profit, therefore that's what people will do. And of course, this state of incentives is good for big companies that are making huge profits off it, so they're not going to let things change for the better as long as they have a say in it. And they're the ones that are employing and paying programmers to produce this trash, so anyone who doesn't agree with them won't last very long in this career. Therefore guess what kind of code the majority of programmers are producing every day. Definitely not lean, security-conscious code.

As someone once joked, the most profitable software venture is a business of two departments: virus writers and anti-virus development.

Welcome to software development hell.

T

--
Life is complex. It consists of real and imaginary parts. -- YHL
Feb 12 2024
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
>> On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
>>> On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
>
> [...]
>>> https://spectrum.ieee.org/lean-software-development
>
> Reducing code size is, to paraphrase Walter, to plug one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.
>
>> Agreed .. two days ago I needed to pull a 13GB docker image from Nvidia repository ... a totally out of control mess.

<snips>

Hey, in the end the title of the post is: Why Bloat Is Still Software’s __Biggest__ Vulnerability

Let's start by plugging the biggest! :-P

Long story short, the docker image was the last resort after having lost a three-hour battle against PIP and conflicting dependencies, trying to run 2-year-old code (Python ML environments are sometimes just crazy). Note that using PIP also involved GBs of downloads: tensorflow, keras, etc.
Feb 12 2024
On Mon, Feb 12, 2024 at 05:48:32PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
> On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:

[...]
>> Reducing code size is, to paraphrase Walter, to plug one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.
>
> Hey, at the end the title of the post is: Why Bloat Is Still Software’s __Biggest__ Vulnerability
>
> Let's start to plug the biggest! :-P

I'm skeptical whether it's the biggest. There are many holes in a cheese grater; plugging each one individually will always leave you with more holes afterwards. And they are all more-or-less the same size. :-D

However nobody seems willing to entertain the possibility of removing the cheese grater altogether, which would be a much better solution.

> Long story short, the docker images was the last resource after having lost a three hours battle against PIP and conflicting dependencies, trying run 2 years old code (python ML environments sometimes is just crazy). Note that also using PIP involved GB of download, tensorflow, keras, etc

Which is why I said that these are all just holes in a cheese grater. Conflicting dependencies and the inability to compile old code are well-known (to me) symptoms of today's model of software development.

I won't go so far as to say that anything requiring GBs of downloads is inherently broken -- perhaps for some applications, large amounts of code / data *is* unavoidable. But I can't believe that the *majority* of dependencies would require such incommensurate amounts of resources. At the most I'd expect one or two specialised dependencies that might need this, not every other package in your typical online code repo.

//

When I was in college in the 90's code reuse was a big topic. Everyone was talking about coding for libraries so that you don't have to reinvent the wheel. Eventually that led to DLL hell in the Windows world and .so hell in the Posix world. After 30 years, people are moving away from OS-level dependencies (DLLs and shared libs) to the likes of cargo, npm, dub, and the like. However, the underlying problem of dependency hell has not been solved.

I'm at the point where I'm ready to call BS on the whole concept of code reuse. So I've gradually come to the conclusion that code reuse, i.e., dependencies, is inherently evil, and should be avoided like the plague unless you absolutely have no other choice. And where it can't be avoided, it should be as shallow as possible.

The best dependencies are single-file dependencies like Adam's arsd libs, where you can literally copy the file into your workspace and just compile. The second best dependency is the single package, where you copy/clone the files into some subdir in your workspace and off you go. The worst kind of dependency is the one that recursively depends on other packages. These should be avoided as much as possible, because it's here that NP-completeness and dependency hell begin, and it's here where madness like multi-GB docker images is born.

Copy-pasta is oft-maligned, and I agree that it's evil when it happens within a project. But I'm at the point where I'm almost ready to declare that copy-pasta is actually good and beneficial when it happens across projects. Much better to just copy the darned code into your local repo and modify it to whatever you need it to do, than to declare a dreaded dependency that's the beginning of the slippery slope into dependency hell and the inclusion of millions of lines of code bloat into your project.

T

--
What are you when you run out of Monet? Baroque.
Feb 12 2024
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
>> [...]
>
> No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development. [...]

I enjoyed reading this. I largely agree with what you said. I also agree with your later post about ideal dependencies (like single files from arsd or single packages).
Feb 12 2024
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> All this not even to mention the insanity that sometimes specifying just *one* dependency will pull in tens or even hundreds of recursive dependencies. A hello world program depends on a standard I/O package, which in turn depends on a date-formatting package, which in turn depends on the locales package, which in turn depends on the internet timeserver client package, which depends on the cryptography package, ad nauseam. And so it takes a totally insane number of packages just to print Hello World on the screen.

"Funny" example of that.

I wanted to learn how to do a React project from scratch. Not using a framework or anything, just piecing the stuff together to make it work myself. So babel, webpack, react, jest for testing and stylex for CSS. That's it. Arguably a lot by some standards, but by no means something wild: the JS equivalent of a build system and a test framework.

The project currently has 1103 dependencies.

Voila. Pure madness.
Feb 12 2024
On Mon, Feb 12, 2024 at 11:49:31PM +0000, deadalnix via Digitalmars-d wrote:
> On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
>> All this not even to mention the insanity that sometimes specifying just *one* dependency will pull in tens or even hundreds of recursive dependencies. A hello world program depends on a standard I/O package, which in turn depends on a date-formatting package, which in turn depends on the locales package, which in turn depends on the internet timeserver client package, which depends on the cryptography package, ad nauseam. And so it takes a totally insane number of packages just to print Hello World on the screen.
>
> "Funny" example of that.
>
> I wanted to learn of to do a react project from scratch. Not using a framework or anything, just pieces the stuff together to make it work myself. So babel, webpack, react, jest for testing and stylex for CSS. That's it. Arguably a lot by some standard, but by no means something wild, the JS equivalent of a build system and a test framework.
>
> The project currently has 1103 dependencies.
>
> Voila. Pure madness.

Recently while working on my minimal druntime for wasm (one primary motivation for which is to let me write D code when I absolutely can't escape the tyranny of the browser instead of having to deal with the madness that is the JS ecosystem), I noticed a lot of cruft in druntime and Phobos, stuff that got piled on because we added this or that new feature / type modifier / etc.. Code that used to be straightforward acquired layers of complexity because now we have to deal with this or that case that we didn't need to worry about before. Also, past mistakes that we're still paying for, like the ubiquity of TypeInfos in internal APIs dating from when D didn't have templates.

The recent push to templatize druntime has actually been a great saver, though: I got things like array operations working without incurring the bloat of TypeInfo's thanks to the current compiler emitting template calls instead of TypeInfo-dependent static calls for said operations. I think this is a very important step to move Phobos/druntime toward a pay-as-you-use model instead of the upfront cost of TypeInfo's.

If only more projects were built with the pay-as-you-use model instead of the blanket "I need this dependency, let's pull in the whole hairball of recursive dependencies too". In an ideal world, things like std.stdio would import things like floating-point formatting code only if you actually use %f and pass a float/double to format(). Otherwise it won't even import the module and you won't pull in anything that you don't actually need.

(I actually have an incomplete replica of std.format written according to this philosophy -- it doesn't even instantiate floating-point formatting code unless you actually passed a float to format() at some point. In an ideal world the whole of druntime / phobos would be built this way, tiny standalone pieces that only get pulled in with actual use. Not like the last time I checked, where a Hello World program for some inscrutable reason pulled BigInt code into the executable.)

T

--
Democracy: The triumph of popularity over principle. -- C.Bond
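[Editor's sketch] The pay-as-you-use formatting idea from this post could look roughly like the following in D. This is not the std.format replica mentioned in the post, whose code is not shown in the thread; `formatArg`, `formatInteger` and `formatFloat` are hypothetical names made up for illustration. The sketch only relies on D templates being compiled when they are instantiated, so the floating-point branch costs nothing unless a caller actually passes a float or double.

```d
import core.stdc.stdio : printf;

/// Hypothetical helper: integer to decimal text. Being a template, it is
/// only compiled for the types some caller actually passes in.
string formatInteger(T)(T value)
{
    char[20] buf;                      // 20 digits cover any 64-bit magnitude
    size_t pos = buf.length;
    const bool negative = value < 0;
    // Magnitude as ulong; the wrap-around also handles long.min correctly.
    ulong magnitude = negative ? 0UL - cast(ulong) value : cast(ulong) value;
    do
    {
        buf[--pos] = cast(char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    return (negative ? "-" : "") ~ buf[pos .. $].idup;
}

/// Hypothetical helper: deliberately naive fixed-point output with three
/// decimals. It ignores NaN, infinity and values too large for ulong.
string formatFloat(T)(T value)
{
    // Printed at compile time, and only if this template gets instantiated.
    pragma(msg, "float formatting instantiated for ", T.stringof);
    const bool negative = value < 0;
    const T magnitude = negative ? -value : value;
    ulong whole = cast(ulong) magnitude;
    ulong frac = cast(ulong)((magnitude - whole) * 1000 + 0.5);
    if (frac >= 1000) { frac -= 1000; ++whole; }   // rounding carried over
    char[3] digits;
    foreach_reverse (i; 0 .. 3)
    {
        digits[i] = cast(char)('0' + frac % 10);
        frac /= 10;
    }
    return (negative ? "-" : "") ~ formatInteger(whole) ~ "." ~ digits[].idup;
}

/// Dispatches on the argument type at compile time. Branches that are not
/// selected are never compiled, so their helpers are never instantiated.
string formatArg(T)(T value)
{
    static if (__traits(isFloating, T))
        return formatFloat(value);
    else static if (__traits(isIntegral, T))
        return formatInteger(value);
    else static if (is(T : const(char)[]))
        return value.idup;
    else
        static assert(0, "formatArg: unsupported type " ~ T.stringof);
}

void main()
{
    // Only the integer path is instantiated here. Adding formatArg(2.5)
    // would trigger the pragma message above and emit formatFloat as well.
    string s = formatArg(42) ~ " " ~ formatArg(-7L);
    printf("%.*s\n", cast(int) s.length, s.ptr);   // prints "42 -7"
}
```

The sketch still leans on the GC for `idup` and concatenation, so it is not a druntime-free implementation. It only illustrates how template instantiation can keep unused formatting code out of the build.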
Feb 12 2024
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
> On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d wrote:
>> On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
>>> On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
>>>> I thought I would get a discussion started on software bloat. Maybe D can be part of the solution to this problem?
>
> No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development.

Nothing can stop irresponsibility completely, but you can go a long way to simplify and encourage responsibility and sanity. A C program will have fewer dependencies than a JS one.

> Reducing code size is, to paraphrase Walter, to plug one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.

I don't get this metaphor, surely fewer grates make clogging easier
Feb 12 2024
On 13/02/2024 6:30 AM, H. S. Teoh wrote:
> No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development.
>
> Which is utterly crazy, if you think about it. Unless you pin every dependency to exact versions (who even does that?!), every time you build your code you're potentially getting a (subtly) different set of dependencies. That means the program you've been trying to debug 5 mins ago may not even be the same program you're debugging now. Now of course it's possible to turn off this behaviour while debugging, but still, the fact that that's the default behaviour is just nuts.

What? Dub doesn't upgrade dependencies for you without you asking for it.

It either has to be missing, or you ran ``dub upgrade``.

To prevent that being an issue long term, you can vendor your cache into your repository.

``dub build --cache=local``.

Unfortunately you have to provide that on cli every time.

There are solutions here for those who care about it. If you don't care about it, of course it isn't a solved problem.
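[Editor's sketch] For readers who want the pinning discussed here spelled out: as far as I know, dub accepts an exact-version specifier (`==`) in a package recipe, next to ranges such as `~>`, and records the versions it resolved in `dub.selections.json`. That file can be committed alongside the recipe. The package names and version numbers below are hypothetical placeholders, not real recommendations.

```sdl
// dub.sdl: hypothetical package names and versions, for illustration only
name "myapp"

// "==" requests exactly this version instead of a range such as "~>1.2.0"
dependency "some-package" version="==1.2.3"
dependency "another-package" version="==0.4.0"
```

Pinning the direct dependencies this way does not pin what those packages depend on in turn, which is the transitive case raised earlier in the thread.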
Feb 12 2024
On Tue, Feb 13, 2024 at 01:56:08PM +1300, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
> On 13/02/2024 6:30 AM, H. S. Teoh wrote:

[...]
>> Which is utterly crazy, if you think about it. Unless you pin every dependency to exact versions (who even does that?!), every time you build your code you're potentially getting a (subtly) different set of dependencies. That means the program you've been trying to debug 5 mins ago may not even be the same program you're debugging now. Now of course it's possible to turn off this behaviour while debugging, but still, the fact that that's the default behaviour is just nuts.
>
> What? Dub doesn't upgrade dependencies for you without you asking for it.
>
> It either has to be missing, or you ran ``dub upgrade``.
>
> To prevent that being an issue long term, you can vendor your cache into your repository.
>
> ``dub build --cache=local``.
>
> Unfortunately you have to provide that on cli every time.
>
> There are solutions here for those who care about it. If you don't care about it, of course it isn't a solved problem.

And that's the point, *by default* you get the bad behaviour, you have to work to make it do the right thing. You have to put in the effort to learn to use `--cache=local` (and you have to know enough to even be aware that you might need to use it in the first place -- most people wouldn't even care 'cos they don't even know this is an issue).

I'm not singling out dub here, I'm talking about the entire philosophy behind dub and similar tools. The defaults are very much designed such that you would just pull in hairball dependencies automatically without needing to give it so much as a thought. There is little, if any at all, incentive to make do with as little as possible to get your job done. On the contrary, the whole idea is very much to "buy the package deal", so to speak, download (and build and link) the entire bundle of stuff which gives you everything, bells and whistles and all, even if you actually only need to use less than 10% of it.

A million LoC OS just to open the garage door, as the linked article puts it.

T

--
Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
Feb 12 2024
> The European Union has launched three pieces of legislation to this effect

Well, that'll fix it!
Feb 12 2024
On Monday, 12 February 2024 at 18:45:26 UTC, Walter Bright wrote:
>> The European Union has launched three pieces of legislation to this effect
>
> Well, that'll fix it!

This software has dependencies, do you agree?
Feb 12 2024
On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
> https://spectrum.ieee.org/lean-software-development

I hate bloated software too. Someone here said Phobos reaches the 64000 symbol limit, so it can't grow larger. Wait, but what functionality does Phobos have? It's mostly templated on top of that.

I wrote an experimental library. It has an allocator, bitmanip, PRNG, constant-time hex (for lulz), collections, logging, some crypto, hashes, file I/O, JSON, XML, networking (TCP, SSL, async), processes and threads, runtime, abstract stream, time and stopwatch: 319KB of text in total, including unittests. I wonder if I'm doing it wrong.

My encrypted shadow file exchange tool compiles to 24KB; I wrote it because I wanted to send files between machines running different OSes. The HTTP client is 400 lines of code. It's small enough that I didn't bother to extract it into a library and just copy it on demand, mostly because it's not clear what to do with pipelining across several requests.
Feb 13 2024