Blog: On the OS and package management

On the OS and package management
Date: 2018-Jan-28 01:25:50 EST

One pattern I've seen in infrastructure recently is a firm decision, both from software vendors and inside big tech companies, to break from their OS vendor in how software is packaged. In the past I was bothered by this, but I've come to see an undesirable tension between existing Linux distro maintainers and users, and I've also seen systems that manage this pretty well.

The theory is that if you can bundle up most userland stuff close to the app, you no longer need to care about OS upgrades. Of course, you still need to understand (and do security audits on) bundles under this regime; it's not an excuse to avoid hiring some systems programmers to make sure you're doing things sanely.

Many programming languages have had their own library managers for a while. Perl has CPAN, Python has pip, and most recently created languages have their own. You can still find rpms or debs for some of these libraries (the bits used by the OS itself), but a lot of people use these extra managers to pull in things that are not in their OS's package repo. This carries some risk, in that any change to something the OS depends on could create havoc unless the package manager provides isolation.
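
To make "isolation" concrete, here is a minimal sketch using Python's standard-library venv module. It creates an environment directory with its own interpreter path and site-packages, so installs made inside it do not touch the copy of Python the OS depends on. The function name create_isolated_env is made up for this illustration; only venv.create comes from the standard library.

```python
import os
import venv


def create_isolated_env(env_dir):
    """Create a virtual environment at env_dir and return its interpreter path.

    create_isolated_env is a hypothetical helper name for this sketch.
    """
    # with_pip=False skips bootstrapping pip, which keeps creation quick and offline.
    venv.create(env_dir, with_pip=False)
    # Virtual environments keep their executables in "Scripts" on Windows
    # and in "bin" elsewhere.
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")
```

Calling create_isolated_env("/tmp/demo-env") (the path is arbitrary) leaves a pyvenv.cfg file in that directory, which is what marks it as a self-contained environment. Packages installed with that environment's interpreter land under the environment directory rather than in the system site-packages.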

Far on one side of the equation are containers. Thick containers provide what's essentially their own OS userland, isolated from the host operating system; if you want a new package in there, you can install it because the paths and binaries are kept separate. You still need to do this work yourself, though.

If you're not using containers, you might decide to package up a bunch of interpreters and essential libraries and stick them in a path alongside some system linker magic and environment variable wrappers. Then your application is a directory bundle with a wrapper script, your "binary", and any libraries it needs. Google does this with its GRTE. It can work, although it can be awkward if you want to attach a debugger to any of this, and you need to buy into a lot of developer effort to initially set up a project and get Blaze to make those bundles. Not that bad if you're writing new Google code, but not fun if you're trying to port something complicated.
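
As a rough illustration of the "wrapper script plus environment variables" idea, here is a generic Python sketch. It is not how GRTE or Blaze actually work; the bundle layout (a bin directory and a lib directory under one root), the function names, and the program name are all assumptions made up for this example. It relies on LD_LIBRARY_PATH, which the dynamic linker on Linux consults when looking for shared libraries.

```python
import os


def bundle_environment(bundle_dir, base_env):
    """Return a copy of base_env in which the bundle's own files come first.

    Assumes a hypothetical layout: <bundle_dir>/bin and <bundle_dir>/lib.
    """
    env = dict(base_env)
    lib_dir = os.path.join(bundle_dir, "lib")
    bin_dir = os.path.join(bundle_dir, "bin")
    # Prepend the bundle's lib dir so bundled shared libraries are searched
    # before the system ones; skip empty entries to avoid stray separators.
    env["LD_LIBRARY_PATH"] = os.pathsep.join(
        p for p in (lib_dir, env.get("LD_LIBRARY_PATH", "")) if p
    )
    # Same idea for executables such as a bundled interpreter.
    env["PATH"] = os.pathsep.join(
        p for p in (bin_dir, env.get("PATH", "")) if p
    )
    return env


def launch(bundle_dir, program, args):
    """Replace the current process with <bundle_dir>/bin/<program>."""
    exe = os.path.join(bundle_dir, "bin", program)
    os.execve(exe, [exe] + list(args), bundle_environment(bundle_dir, os.environ))
```

A wrapper shipped at the root of the bundle would call something like launch(bundle_root, "app", sys.argv[1:]). Because os.execve replaces the wrapper process rather than spawning a child, the process you end up attaching a debugger to is the real program, although the debugger still has to be told about the non-standard library paths.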

One system I have recently come to like started out as a Python environment manager but grew to manage arbitrary packages (including Go) in such environments; it's called Conda. You define the dependencies for the environment you want in a YAML file (give me tensorflow, python 3.5, tiffile, and 5 other things) and it will fetch those into an environment dir. You can then enter and leave environments using the environment manager (which tweaks things in your shell). It all works, it's smooth, and if you want to extend it (I have some extra env vars for this environment), it's documented and easy. Unless there's a good reason not to, if I needed to design something like this for someplace I worked, I'd use it. We're using it for some research projects in my current workplace (that we want to distribute to other research groups in other institutions).
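
For a sense of what such an environment file looks like, here is a small Python sketch that renders one. The name, channels, and dependencies keys are the ones Conda's environment files use; the helper name render_environment_yaml, the environment name "research-env", and the "defaults" channel are assumptions for illustration, and the dependency list is a shortened version of the example above.

```python
def render_environment_yaml(name, dependencies, channels=("defaults",)):
    """Render the text of a minimal Conda environment file.

    render_environment_yaml is a hypothetical helper; the output is plain
    text, so no YAML library is needed.
    """
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + dep for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; a real project would list all of its packages.
    print(render_environment_yaml("research-env", ["python=3.5", "tensorflow"]), end="")
```

Saving that output as environment.yml and running conda env create -f environment.yml should build the environment, after which conda activate research-env (or source activate research-env on older Conda releases) enters it.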