Ah, the grand cycle of layering and integration. You build a machine that does a thing, modify it to do another, and another, and another, until eventually someone says
We could have a layer that did all of this!
The layer is introduced, applications are rewritten to target the layer, and people slowly lose touch with what the world looks like beneath The Interface.
And some things that should not have been forgotten were lost. History became legend. Legend became myth.
As people target The Interface, it grows into a more and more general purpose machine. Layers of indirection build up. Once simple tasks must propagate up and down a big stack.
Until one day, someone stumbles across the layer beneath. Gosh, the underlying system does 99% of what we need, out of the box. Why do we need this layer at all? We can do most of what we need with a couple single purpose tools. And the rest of the complexity can be taken up by the application layer. I don't mind doing a little more configuration there if I can get a huge performance and complexity win.
And the developers, frustrated with how big and bloated their layers have been feeling, flock to this new simple tool, and they port their applications, with a little extra boilerplate and big complexity wins. And then they port another, and another. Until someone realizes
We could have a layer that did all of this!
And it might seem pointless, when I write it up in this snarky way, but it absolutely is not. This is the process by which we discover the fundamental building blocks of software. Each time we add a layer, and each time we take one away, we learn something new about what information is. I love it.
I don't think that's really the issue with the Android/iOS operating systems. The issue is not that they can't be general-purpose operating systems; it's that they aren't used that way. People seem to want applications that do one or two things at most, cost $1, and have to be re-bought every 2 months.
That does not encourage anyone to write software that is broadly useful to a particular field but takes 2-3 years to write.
But the fault is in the app store and the lemon market it creates, not in the operating system.
In the Vernor Vinge books, where there is a multi-thousand-year spacefaring civilisation, there is the profession of "software archaeologist" for this digging through the layers.
Hiding something behind a layer of abstraction is, essentially, removing something from view. So, from one point of view, you're adding another layer (of abstraction), and from another point of view you're removing unnecessary detail. Same thing.
This makes sense. The server OS underneath ought to be a lot smaller than Linux or Windows. No interactive logon support, no display support, no printer support, no hot plugging support, few drivers, and no battery management. And, most importantly, few changes. The OS underneath ought to be simple enough to be installed for the life of the hardware.
The real "operating system" is the container orchestration system.
technically doesn't come by default with Linux (the kernel) anyways
> no display support
can be configured out. I'm pretty sure you can even configure out the whole TTY subsystem if you want, so you can't even use a serial port to debug your madness.
> no printer support
who still uses parallel ports? for anything else (except usblp which basically never works) you need CUPS anyways
> no hot plugging support
last I checked, hotplug is mandatory on x86 (but only for certain components, so you could configure out say USB support if you wanted)
> few drivers
sure, you can compile your own kernel if you want
> no battery management
I don't really know what "battery management" means.
> The OS underneath ought to be simple enough to be installed for the life of the hardware.
> I don't really know what "battery management" means
I assume he means "power management" (i.e. ACPI on x86). You can run the system without taking control of the ACPI hardware, but this is a terrible idea. It'll affect your ability to run the CPU properly, even assuming you don't care about power usage (and large server deployments sure do!). Not to mention all the nasty bugs and problems you'll run into because the hardware wasn't designed to run for long periods of time without ACPI active.
But would that matter, given more and more of our machines are running on a hypervisor of some sort?
I know it's almost mandatory for the base machine running on actual hardware, but given that most of the servers we use now are just virtualized, how bad would removing ACPI support be? Also, would it be useful to remove that?
Not really. You'd have to come up with your own standard for conveying configuration information and shutdown/restart at least (for this thin guest I'm guessing you wouldn't mind losing suspend). I believe you'd also lose memory hotplug (although I don't think you'd lose memory ballooning) unless you implemented that too. That's a new standard, and new code for your guest and hypervisor.
Also I remember reading something about some hypervisors using the guest's decisions with regards to CPU power states to inform the hypervisor's setting of the same on the real hardware. I'm not sure if this is implemented in mainstream hypervisors, or if it's really useful, but that's another thing you'd lose.
There are benefits to using the same components both inside and outside of the VM, just as there are benefits to using the same codebase both on the server side and on the client side. There's less code to maintain, and you don't have to spend time worrying about compatibility.
I wouldn't mind the small inefficiency of carrying extra drivers and subsystems inside of my VM if it helps make my stack as a whole more uniform and predictable. We really don't need yet another Linux distribution.
The article is not talking about the kernel, it is talking about the whole installed system. It is saying people are making minimal distros with custom kernels for single functions.
Then you put all your dependencies on top and realise how large that is, and that by bundling dependencies with dependencies, keeping everything up to date becomes a security problem. So someone comes up with the idea of sharing some of those dependencies, and since they are common across multiple things, they need a snappy name... hmm, sharing and re-usable, how about... libraries. Shared libraries, in fact.
I jest, but as an industry we do seem to have an almost metronome-like tendency to lurch from one thing to another and back.
This was already invented, it's called z/VM. You have CP, which is a hypervisor (for any operating system), and CMS, which is a single-user OS that handles talking to the user.
This seems like another point on the divergence from the traditional security model. In the 70s, the software on a computer was entirely controlled by the system administrator; the software was presumed secure and the threat was from the users. Users needed to be partitioned. In the present day, there's only one user who is also the system administrator, but the software is the threat to itself and others.
Yes! This is an argument I've been making for several years now: the user-centric security model is obsolete and unhelpful, because most computers have either one or zero users in the traditional sense. The whole unix-style reduced-privileges-plus-sudo approach that Windows and MacOS have copied is a nuisance which doesn't really solve the problem; the permissions systems in Android and iOS are a little closer to the mark. Qubes is a good step forward. The real problem is exactly what you said: we can't fully trust software, even software which is not explicitly malicious, because software can be exploited, and because software authors sometimes want to be "helpful" in ways we'd really rather they weren't.
Every piece of software should run within a sandbox, and the human user should have complete control over which resources are or are not exposed to each sandbox; that's the future operating system I want to see. I did some exploration around the idea of doing this with hypervisors and unikernels (http://www.github.com/marssaxman/fleet) but it got to look too much like rewriting all the software in the world. Containers are less elegant, but seem to be a more practical way of moving in the right direction.
Windows NT didn't copy the Unix security model. Indeed, people have remarked for the past 30 years that the Windows NT model has nothing like set-UID, for starters. People who think that the Windows NT model is like the Unix one, do not know Windows NT. There are some very interesting differences, including:
* Variable-length security IDs, with a much larger namespace.
* The ability of a process to employ multiple tokens during the course of its existence, including kernel-enforced restrictions on how it can employ tokens borrowed from service clients.
* Nonce security IDs, employed (for example) to partition a single user from xyrself when it comes to granting access to window stations and desktops.
* Universal security IDs for truly universal things, such as "Everyone" and "Creator Group", that are the same everywhere. "nobody" is one UID on FreeBSD and Debian, and a different UID on OpenBSD.
* Local security IDs for local things, such as all security IDs for domain user accounts incorporating the unique ID of the domain (controllers) making the domain user account SIDs globally unique. Two BSD/Linux systems can have two different users with the same UID 1001.
* Universal RIDs for things that every domain has, like "this domain's printer operator", "this domain's policy administrator", and "this domain's backup operator". FreeBSD's/OpenBSD's "operator" has the UID of Debian's "bin"; FreeBSD's/OpenBSD's "bin" has the UID of Debian's "sys"; FreeBSD's/OpenBSD's "operator" group has the GID of Debian's "tty" group; FreeBSD's/OpenBSD's "tty" group has the GID of Debian's "adm" group.
* Standard security IDs for granting permissions to log on locally, via a network, in batch, as a service, or via dialup.
* A Hurd-like mechanism for obtaining process tokens: an RPC transaction to a distinguished server process inside the TCB -- /hurd/passwd meet Local Security Authority Subsystem Service
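For concreteness, a SID is written S-&lt;revision&gt;-&lt;authority&gt;-&lt;subauthorities...&gt;, and a few well-known values illustrate the universal/local distinction described above:

```
S-1-1-0                 Everyone                (universal, same everywhere)
S-1-3-1                 Creator Group           (universal, same everywhere)
S-1-5-2                 Network logon           (standard logon-type SID)
S-1-5-4                 Interactive logon       (standard logon-type SID)
S-1-5-32-544            BUILTIN\Administrators  (local to each machine)
S-1-5-21-<domain>-500   a domain's Administrator (RID 500 is standard;
                        the <domain> part makes the full SID globally unique)
```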
I have very limited experience with Windows and don't claim to know much of anything about its security. I was referring to the User Account Control system introduced in Windows Vista. Whatever might be happening underneath, the system appears to end up doing the same thing: the user operates in a reduced-privileges mode until specifically authorizing a local, temporary elevation.
You exaggerate, but only slightly. Whatever the theoretical properties of the windows security model, it's a failure of usability, and that means most of the time none of it gets used.
Windows is not alone in this, SELinux suffers from exactly the same problem.
The user-centric model is actually still relevant if you use them as the means to software isolation. This is a major aspect to the security models of OpenBSD and (last I checked) Android, and is generally effective (unless you're deliberately subverting it, as is common on "rooted" Android devices).
However, said model still has connotations of a specific actual user, which is no longer entirely accurate in such applications of that model. It'd be nice to have a sort of "subuser" system where - within, say, the user for my own desktop account - I could further divide the software I run into "users" for things like Firefox or Spotify or what have you.
Basically, I'm in agreement that everything should be sandboxed. We have the technology to do it, and in fact have had the technology to do it for decades (maybe not quite as well as we can do now, but confining daemons to their own users has been possible for a long time).
My own dream system would be one where every "package" for my operating system is a filesystem image with `/bin`, `/lib`, `/etc`, and possibly `/var`. One of these packages would provide the root filesystem with a microkernel and the minimum supporting libraries and executables required to get a container-oriented `init` equivalent running; then, `init` would spin up each service in complete isolation by spinning up `chroot`s or something with various packages union-mounted on top of one another. One of these services could be for a graphical login, in which case said service would spin up another isolated container of sorts for my login session, and inside that I could run applications built up from the same union-mount approach with the same sort of isolation.
Plan 9 From Bell Labs is probably the closest thing to that ideal world.
> My own dream system would be one where every "package" for my operating system is a filesystem image with `/bin`, `/lib`, `/etc`, and possibly `/var`.
Take a look at NixOS and the Nix package manager. It works along these lines [0].
> then, `init` would spin up each service in complete isolation by spinning up `chroot`s
Systemd employs container features, some automatically, like cgroups, and some via configuration. There was an article on HN just the other day about some of its new sandbox features [1]. Parameters such as PrivateNetwork=, ProtectSystem=strict, PrivateTmp=yes, and SystemCallFilter= provide chroot-like capabilities in a number of different namespace dimensions. It has support for chroot as well through the RootDirectory= parameter. See [2] for a good overview of these features and [3] for the reference.
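As a sketch of how far those parameters already go, a hypothetical hardened unit file might look like this (the service name and binary path are made up for illustration; the directive names are real systemd ones):

```ini
# /etc/systemd/system/example-daemon.service  (hypothetical service)
[Unit]
Description=Example sandboxed daemon

[Service]
ExecStart=/usr/bin/example-daemon
# Run under a transient dedicated user instead of root
DynamicUser=yes
# Mount /usr, /boot and /etc read-only for this service
ProtectSystem=strict
# Give the service its own private /tmp
PrivateTmp=yes
# Put the service in its own network namespace with only loopback
PrivateNetwork=yes
# Allow only the baseline system-call set for typical services
SystemCallFilter=@system-service

[Install]
WantedBy=multi-user.target
```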
Hopefully as these features mature and gain adoption, system software will operate more like what you're describing over time.
> However, said model still has connotations of a specific actual user, which is no longer entirely accurate in such applications of that model. It'd be nice to have a sort of "subuser" system where - within, say, the user for my own desktop account - I could further divide the software I run into "users" for things like Firefox or Spotify or what have you.
This way Firefox is run as a separate user within the current user's session, with all that entails. Of course, that would be a serious pain for your average user to set up.
> Every piece of software should run within a sandbox
I sort of agree with this but I think more in terms of projects/activities (e.g. casual media consumption, banking, working on different projects) rather than applications. To do different activities I need different combinations of tools but often the tools overlap. For example I want to sandbox the browser I use for watching cat videos on youtube from the browser I use for banking from the various browsers I use for work. So that some drive by on a casual browsing site can't attack my banking. But I don't want two browsers installed I want the same program in the same version with some common and some different config - but I don't want to manage the common config in 5 different places.
The multi-user model allows for some attempt at this if one sets up multiple user accounts for the different activities. It certainly isn't sufficient but I'm not sure that sandboxing each application will actually get us there either.
I've thought about this many times before, and while I want to like the idea, I always come to the conclusion that multi-user setups sharing "some" data easily become too complex for users to handle.
Take the browser example: how do you know what is common config? What if changing a common config setting in cat-mode exposes a security vulnerability in bank-mode? And are you really going to log out of cat mode into common mode just to change a small setting?
Sharing files is another one of my favorites. Say, for example, that my brother and I like the same music and share one computer with different logins. We want to save space on the hard drive, so we share all our mp3s with each other. But I might have some music or recordings I don't want to share; how should I handle that?
Windows XP tried very hard to get this working. For example, your desktop and start menu were a blend of what was installed as a shared application (remember "install for all users/only for me"?) and what was installed for one user. Later, if someone renamed things on what they thought was their own desktop, it would be renamed on everybody else's desktop too; it was very confusing even for me as a developer who knows how it works. Windows also has the concept of a dedicated shared folder and the possibility to share additional folders. It sounds so simple, but I've never seen anyone set it up smoothly. Everything looks so good and simple on paper, but it requires just a tiny bit of user education, and that's where the whole concept falls apart.
> Every piece of software should run within a sandbox
What about shared libraries? How do you define the access rights or set of processes that glibc is allowed to run with? (you might have to break up glibc into dozens of smaller libraries for that - and then manage their dependencies, the resulting web will not be too different from monolithic glibc)
Also it does not solve the problem completely (not possible to do that with security anyway) - see rowhammer exploits.
There's a faction trying to kill shared libraries. Nobody outside mobile really cares anymore. Go and Rust compile to giant static blobs. The GTK people don't care about binary compatibility anymore. Musl doesn't support dlclose.
I don't think Musl's lack of dlclose should cause any issues in real world use. The rationale is:
Either behavior conforms to POSIX, but only the musl behavior can satisfy the robustness conditions musl aims to provide. In particular:
Under glibc's approach, libraries not designed with dlclose in mind (which may not be the libraries directly loaded with dlopen, but rather their dependencies) may leave around references to themselves in such a way that removing them from the address space results in a crash (or worse) later on. This cannot happen under musl.
Managing storage for thread-local objects is much more difficult if dlclose unloads libraries. If space is to be reserved in advance (needed to guarantee no late/unrecoverable failures), supporting unloading in dlclose seems to necessitate leaking TLS memory, which largely defeats the purpose of doing the unloading in the first place.
More people complain that statically linked binaries in Musl cannot call dlopen, which some things rely on.
What really bugs me about Musl's refusal to support dlclose is that the rationale is that _other_ libraries don't support safe unloading, so Musl shouldn't let you try. I don't like my libc to be paternalistic.
If Windows can do leak-free TLS with FreeLibrary, Musl can too. I'm sick of hearing "X is impossible" when some other system has been doing X for decades.
OK, in that case you need to recompile all services when a vulnerability is discovered in one of the dependent libraries; I suspect this means the system will not get updated in time - and that's not good at all. At least with shared libraries you can swap the .so file - if the external interface has not been modified by the fix.
A counter argument to this is recompilation should not be painful: it should be standard.
And this is already somewhat true anyway: binary package distributions are a little like this. Someone else handles the compiling.
So take it further: distribute binary diffs for packages to control file size, kill shared libraries dead. Have the compiler write dedupe-friendly executables so your filesystem removes duplicate blocks in binaries efficiently and let the kernel page sharing system sort out runtime.
That essentially means a universal I/O interface between all programs ever built, but at a high level. It's all up to the user's discretion how they want to wire up their "programs".
I dunno, seems more like the issue is that the outside attacker's path has gone from "dumb" to "smart".
Where before access was done via a limited terminal protocol, these days it is done via a much richer protocol (or stack of protocols, what with tcp carrying http carrying json or some such).
I don't think it was ever that local software was presumed secure. After all you wanted as little as possible to be suid, and if it was it should be as small a piece of code as possible to limit the chance of exploitable bugs.
There is ample evidence that we cannot write software that can be installed, run and updated in a simple and secure manner. Software is too complex, with too many dependencies. Hackers are better than us almost every time -- and once in, they can bounce around at will.
I just want to run my own containers or those written by others on commodity services from Amazon, Google or Microsoft. Or even spin up my own environment if it suits me.
I want to be able to install and remove software quickly, completely and whenever I want. I especially want to be able to install software I don't fully trust with no fear of consequences. I want to be able to install multiple versions and even multiple instances.
I want my PC (Mac/Linux/Windows) to be a thin hypervisor with everything in a container, including IO and UI.
I still want open source projects on GitHub that I can fork and contribute changes to at will.
I want containers to be small and ultra simple.
All of this means that my "OS" only needs to manage containers and resources. All traditional elements should be in their own containers, such as logging, authentication, and window managers.
Replace "containers" with "virtual machines" (because really, the containers are secure only until the next kernel exploit, which happens fairly often), and you get Qubes OS:
https://www.qubes-os.org/doc/architecture/
It tries to move everything into containers, including hardware drivers, and if you read their architecture paper you will see their design ends up being nontrivial.
You seem to be implying that containers are somehow immune to hackers...
Truth is, it is relatively easy to keep a machine secure. Just not if you want it to be a convenient machine.
You think you want containers all isolated from each other. Except for the uncountable times and ways you want to exchange data between them. This, at the least, includes the data of who you are, as the user.
> I want to be able to install and remove software quickly, completely and whenever I want. I want to be able to install multiple versions and even multiple instances.
sounds like nixos might have some of the properties you want, without containerizing everything
I don't really agree (I'm one of the developers of ocid / cri-o). Even with a container-manager-as-init you still need to administer your control plane (updates, configuration, so on). Now, you could go the CoreOS way of creating an "administration 'container'" which bindmounts / to somewhere inside the container -- that works but I would still consider it to not be any different from ssh-ing into the machine as root (it removes the need for sshd and means that you can manage the server through the same tooling you manage containers with, but the purpose is the same -- give you a general purpose OS environment you can use to administer things).
Not to mention that your actual containers will still contain general purpose OS images.
Also, as an aside, cri-o would be a very bad choice for PID 1. It's specifically designed so that you don't need it to be a long-running daemon (which is something that Docker cannot do and is actually a very useful feature that we hope to keep). And you can't have your PID 1 just exit whenever it wants to.
Even rancher no longer uses container manager as pid 1; these systems being discussed are minimal not single binary. (Only systemd is putting some "container management" features into pid 1).
My point was that even with everything on a system running in a container, you still need a general-purpose OS inside the container to do management -- not that a single-binary container-manager-as-init is the only case where that's true.
machinectl and systemd-nspawn. Namely, systemd now manages container processes if you ask it to. My experience with the systemd cgroup handling doesn't fill me with much faith on this topic.
How do you think Docker adds unneeded complexity? As far as I can see it simplifies a non-trivial task (automated deployment).
A while ago I discovered that I can build my app (Haskell) as a Docker image, using the "stack" build tool. As far as I can see, Docker adds value by providing a layer of abstraction that makes building deployment images as easy as building the executable.
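As an illustration of how small that abstraction can be, a minimal multi-stage Dockerfile for a stack-built Haskell app might look like this (the `myapp` executable name is made up; the right base-image tags depend on your resolver):

```dockerfile
# Build stage: compile the project with stack inside an official Haskell image
FROM haskell:9.4 AS build
WORKDIR /src
COPY . .
RUN stack build --system-ghc --copy-bins --local-bin-path /out

# Runtime stage: ship only the binary on a slim base
FROM debian:bookworm-slim
COPY --from=build /out/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
```

The resulting image is the deployment artifact: no .so versions or distro variants to reason about on the target host.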
I feel that the author is very enthusiastic, but there are some pretty serious errors:
"... fight between Docker and systemd is inevitable" -- this fight has been going for a while (at least a year): https://lwn.net/Articles/676831/
"... The reason why you'll do this, rather than compose everything yourself, is compatability. Whether it's kernel versions, file system drivers, operating system variants or a hundred variations that make your OS build different from mine. Building and testing software that runs everywhere is a sisyphean task. " -- kernel versions, file system drivers, operating system variants (in form of docker daemon version) are still going to be around and would still affect containers. So you traded "test on Fedora, RHEL, Ubuntu" with "test on AWS docker, Tectonic, OpenShift". Yes, you no longer have to worry about .so versions, but there are still plenty of reasons to test in multiple environment.
"... the operating system is an implementation detail of the higher level software. It's not intended to be directly managed ..." -- yeah, so I have this proprietary SAN.. or a dual 10G cards which need to be tuned and bonded... or a mix of fast and slow disks... or even any non-trivial RAID setup... Those things are the most annoying parts of managing your own servers and unfortunately they are not going anywhere.
There is a proliferation of OS-native hypervisors (Windows 10, macOS, FreeBSD, OpenBSD, Linux) which use hardware virtualization to isolate workloads which may be VMs, containers, unikernels, apps or even processes.
There is a monumental amount of "faster horse" in modern OS deployment. Hypervisors, and their ugly cousin - secure monitors, are an example of Conway's Law. They exist because there are different groups involved in writing the kernel, user land, and server deployment.
The author seems to feel that everything will concentrate at the hypervisor level - kernel and user land just a "detail" as they are single-purpose. However, you should spot that really the kernel and userland can be compressed into one layer. So why are there 3 layers at all? Then we have kernel and user land again and hey we just came full circle.
Again, it's just a reflection of how the organizations are arranged. Given a single organization with authority over the entire stack, it would be a terrible waste to have that 3 layer stack.
A wise man once said the only thing new is the history you don't already know. The process is a perfectly good unit of distribution: it does not require orchestration, integrates nicely with PID 1, leverages OS services for getting work done, can talk over the network, start other processes, etc. The reason containers have taken a foothold is not that the process is a bad unit of abstraction and distribution. The reason is that dependencies have gotten out of hand.
Ask anyone that deploys fat jars to run on the JVM. They've been leveraging containers for ages now. Ask the golang folks how they like deploying single binaries. The OS is not going away. Better process isolation is the future so I'm gonna say the author has a slightly too futuristic view of things. The current container ecosystem and the churn around it is a half-measure for better process management.
The history of virtualization has shown us that it is far easier to containerize existing software than to rewrite it to support strict isolation. Most OSes (sans macOS) aren't really set up for sandbox-by-default either.
The mainframe industry already went through this years ago. They invented virtualization and leveraged it to transition through major new CPU architectures, complete re-designs of IO stacks, and switching out entire operating systems. None of these are new arguments. Virtualization won then and it will win again.
You could say the entire history of the industry is increasing levels of abstraction; containers and virtualization are just another step along that path.
The day I moved from C++ back to Java and other languages with richer runtimes at work, was when I stopped caring about the underlying OS.
A JEE container is already a full OS, adding the remaining services that a plain JSE installation might still lack; it doesn't need yet another layer to waste resources and add more administration work in maintenance and security.
I write in C++ and I do not care about the underlying OS 98% of the time anyway. That 2% of the time I might write something that needs a kernel call, but I will make a class or some other abstraction that represents it and put all my different implementations in there.
For code that cares about word size or something similarly pervasive but platform-dependent, there are templates and constexpr to have the compiler evaluate things. I won't be putting constants like 32 or 64 in my code; I will do things like "align_to(system_details::cache_line_size)" and let my compiler handle the details.
Which is kind of true if you can stick with the standard library and control which compiler gets used.
Back in those days, we were deploying code across heterogeneous OSes (not all POSIX), using the OS vendor provided compilers, which were still catching up with C++98, let alone C++03.
So this really restricts how much you can make use of the standard library and which third party libraries to use in a portable way.
I covered exactly this. I write code for Windows/Linux/Mac OS X. Write a class to contain those ifdefs. I did this before C++11 as well.
This isn't a new technique, but for some reason people adopting C++11 and C++14 seem more willing to do it than people who wish C++ was really just C with classes.
Also, write unit tests to test the class. Run the tests in CI on every platform and a variety of configs. It is not hard, and there are great free or cheap tools like Jenkins, TravisCI and Appveyor.
> This isn't a new technique, but for some reason people adopting C++11 and C++14 seem more willing to do it than people who wish C++ was really just C with classes.
This was exactly part of the problem.
Most of the enterprise code I used to deal with was not even that, rather compiling C with C++ compiler.
However my point with rich runtimes was that you shouldn't bother to even do that, the runtime is the OS, kind of.
You can treat your minimum version of C++ as its own language if you want. Then a platform not supporting it is the same thing as a platform that doesn't support your chosen rich runtime.
Nothing forces you to start using non-portable code to reach more devices. It's strictly an option, and you can emulate a rich runtime by always answering "no".
Processes? Seriously if there's only a single one, why would you waste resources on process management and process information? Just get over it ;) http://unikernel.org https://mirage.io :D
The end of an era, yes. But not what you think. Linux, *BSD and Android (Linux) are in so many devices. And new general-purpose operating systems like Google Fuchsia are in the pipeline. It's the end of the old closed OSes; their time is over.