Hacker News

I'm sorry, but shell scripts suck as a language for booting the system. You need to fork() and exec() for almost anything non-trivial, wasting precious CPU cycles in the process. It looks like every Linux distro has its own way to manage boot scripts. And when they fail, you have no idea what happened.

More importantly, init scripts only handle starting and stopping of services. They don't manage them, e.g. restarting them when they crash. Systemd can do that. The socket activation stuff also allows one to potentially save resources by not starting services until they're really needed.
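For the unfamiliar, socket activation amounts to a pair of unit files along these lines (a hypothetical sketch: the name "foo", the port, and the binary path are all invented, and the daemon is assumed to accept an inherited listening socket):

```
# foo.socket -- systemd opens and listens on the port itself
[Unit]
Description=Foo activation socket

[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

# foo.service -- only started on the first connection to port 8080
[Unit]
Description=Foo daemon

[Service]
ExecStart=/usr/bin/foo-daemon
```

Until that first connection arrives, nothing but systemd itself is running.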

The best way to reduce boot times is to not boot? Have you ever heard of "laptops" and "average users"? Even on my servers, a shorter boot time is welcome.



I'm a daily laptop linux user, specifically Chakra Linux, http://chakra-linux.org/.

My wish list goes something like this: better wireless drivers, improved sleep/suspend/hibernate/resume, better power management, a better package manager, more up-to-date applications ...

At the very, very bottom of that list -- the very last item, so far at the bottom of the list that it's in danger of falling off entirely -- is "faster boot times".

Dredmorbius is spot on, at least for me and my daily usage and the couple dozen or so servers that I'm responsible for. If things are so pooched that I have to reboot it, then it doesn't really matter to me anymore whether it takes 30 seconds or a minute to start up. I would much prefer not having to reboot it in the first place.

Since Chakra was (I think) forked from Arch Linux, I'll have to check and see if they're gonna do this too.

I hope not.

(edit: none of this is intended as a criticism of Chakra's development team, who have been doing an amazing job of putting together a system that, despite its warts, I genuinely enjoy using every day.)


Note that a better way of saying things is that systemd deals with state changes a lot better. Booting is one big state change, but when you have a laptop you go through a heck of a lot of other state changes (suspend, hibernate, resume), docking, connectivity changes (eg wifi coming and going), storage added and removed etc. You may let other people use your system (more state changes).

systemd can also ensure that only services you actually use get started. For example, printing is handled by a server on Linux (cups), so systemd can ensure it doesn't start until you need it. This reduces power consumption.

Because of the way systemd manages services, it can also do a better job of isolating them and dealing with unexpected issues. For example, if the print server crashes, or someone attacks it while you're in a Starbucks, you'll be better off. (Its chrooting is easier to use, as is the way things are put into control groups.)
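As a sketch, those isolation knobs are plain unit-file directives; RootDirectory (systemd's chroot) and PrivateTmp are real options, though the paths here are illustrative, and every unit lands in its own control group without any extra configuration:

```
[Service]
ExecStart=/usr/sbin/cupsd -f
# chroot the print server into its own directory tree
RootDirectory=/srv/cups-jail
# give it a private /tmp, invisible to other services
PrivateTmp=yes
```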

All the things you list require developer time and attention. If systemd lets developers spend less time on startup scripts, then they will have more time to devote to the things on your list. (If you've ever had to write startup scripts you'll know how long it takes to develop and debug them.)


You can add state-change management to your system without mucking with really solid, stable, low-level, critical code like init.

There's already hotplug support, xinetd, ifupdown's pre/post up/down stanzas, and the like (though networkmanager's screwing that bit up wonderfully). Chroot jails too. I'm not saying that these are perfect (and some are a very pale shadow of perfect indeed), but they're independent of init.

Systemd mashes a whole bunch of crap in one place. Most of which I really don't want to have to worry about.

Now, if Arch and Fedora want to serve as test beds for this stuff -- and either perfect them or reject them as nonviable, well. Yeah, I suppose I can live with that. Though I'm definitely not a fan.

You see, there's a few things here.

For my own systems, I really like not having to fuck with useless shit. Currently I'm managing networking manually on my laptop as NetworkMangler has gone to crap again. So I run "ifconfig" and "route" from a root shell (yay for shell history and recursive reverse search).
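(For context, that manual routine amounts to something like the following; device name and addresses are examples, and all of it needs root:)

```
ifconfig wlan0 192.168.1.50 netmask 255.255.255.0 up
route add default gw 192.168.1.1
echo "nameserver 192.168.1.1" > /etc/resolv.conf
```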

For servers, part of my performance evaluation is based on how many nines I can deliver. Not having shit get fucked up does really nice things to my nines. Having shit change does crap things to my nines. I like my nines. I really hate change. It's an ops thing. Where I've got to have change, I like to have it compartmentalized, modularized, with loosely-linked parts and well-defined interfaces.

Startup scripts are very much a mostly-solved problem. Debian gives you a nice template in /etc/init.d/skeleton. Play with it. Yes, I've written startup scripts.
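The skeleton pattern can be boiled down to a toy script like this one; the "daemon" is just a background sleep standing in for a real service, and the pidfile path is made up for the demo:

```shell
#!/bin/sh
# Toy sketch in the spirit of Debian's /etc/init.d/skeleton.
PIDFILE=${TMPDIR:-/tmp}/toydaemon.pid

do_start() {
    sleep 300 &                    # stand-in for the real daemon
    echo $! > "$PIDFILE"
}

do_stop() {
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}

do_status() {
    # kill -0 only checks that the process exists
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo running
    else
        echo stopped
    fi
}

# Demo run: start, query, stop, query.
do_start; do_status                # -> running
do_stop;  do_status                # -> stopped
```

A real skeleton-derived script adds start-stop-daemon calls, LSB headers, and a case statement dispatching on $1, but the start/stop/status shape is the same.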


No one is stopping you from making your own distro that meets your own needs. And servers generally don't go through state changes, and when one does come up, it's usually acceptable to just reboot.

You may enjoy micro-managing your networking etc - good for you. Some of us don't like doing that. To give one example of stuff that certainly doesn't just work, I was trying to run Squid on my (Ubuntu) laptop and it certainly can't handle state changes well, and neither can Ubuntu's ifup/down and init system. I often ended up having to manually do stuff that the system should have been able to handle well.

I'm personally delighted with systemd's functionality - the way it captures output from services would have saved me hours in the past, dealing with services that wouldn't start up cleanly and gave no useful information as to why.

(Separately: my kingdom for a simple caching web proxy server)


Systemd on openSUSE as well as Mageia is pretty damn reliable. It is not a 'test bed'; various distributions have been using it for a while.

As a result, there are way less differences now between distributions. Which means configuration becomes easier.

In any case, if you really care about things not changing, then I assume you're using a distribution which doesn't change this suddenly. So I don't see why you're so awfully negative.


I'm a daily laptop linux user, and like you I know how to go a long time between reboots, so I can wait for my computer to start up.

But forget about us, we're already converts, we don't matter. My grandfather (93 years old) is also a daily laptop linux user. When he presses that power button, that laptop better be booted and ready /yesterday/. And when he pushes it again, it better be off before he closes the lid. Slow startup and shutdown times are simply not an acceptable user experience; they are literally the difference between enjoying and wanting to use the computer, and not wanting to bother with it.

And don't think for a minute he's going to learn about suspend, hibernate, power savings, battery life, or whatever. It's just not going to happen. His laptop lives in the closet, so it's going to be off (either by his doing, or the battery running out). When he sees something on tv and wants to read about it, he takes the laptop out, plugs it in, and turns it on. If it's not ready for him when he's ready for it (i.e. now) then he just won't use it.

However, since I've got that sucker booting from power button to firefox home page load complete in under 7 seconds, he uses it all the time. And it's amazing how it enriches his life. You simply can't get computer use to penetrate into lives like his without fast booting and an easy user experience.


The sane thing is to tie power management to the power button.

Light press: hybrid suspend -- suspends to RAM while also saving state to disk. The system spins down quickly and, so long as it hasn't been suspended long enough to drain the battery, restores in a second or so. Any longer and it does a boot/restore from disk.

Long press: powerdown.

Many devices have separate "suspend" and "poweroff" hardware (or soft controls) as well.

The OS and tools do all the magic bits.
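One way to wire that policy up, assuming systemd-logind is in the picture (HandlePowerKey and HandleLidSwitch are real logind.conf options):

```
# /etc/systemd/logind.conf
HandlePowerKey=hybrid-sleep
HandleLidSwitch=suspend
```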


That's lovely, but you're not paying attention. It doesn't matter how it's set up. It matters how it performs.

To the non-enthusiast / casual user, closing the lid, pressing the power button, doing a system shutdown, inactivity sleep timeout, and the battery running out are all the same thing: the computer was "on", now it's "off". Asking someone like this to think about how the reason it came to be "off" affects how fast it will be ready for them later is a fool's errand. It needs to be fast in every circumstance.

Normal people just want to get something done. They judge their computer by how easy it is to use and how fast it responds to what they do. That includes cold boots, launching programs, and loading webpages. Even if they're doing something "the wrong way", they will still judge it by the same criteria and with the same harshness. I want my grandfather to use linux because I can quickly help him and fix things from afar, and because there are very few ways for him to mess it up. He uses it because he really thinks it's better than windows, and that's purely because it's fast and easy, every way he uses it.

For the record, I set it up so the power button does a shutdown, and everything else results in a hybrid sleep. What he understands is that he can shut it down if he wants; otherwise, no matter what happens (lid closed or not), everything will be the way he left it, even if he forgets about it for a few days or doesn't charge it.

That kind of simplicity is what allows people to think of linux as something they can use, not just some super complicated tool for "hackers" and "computer geniuses". I'm not saying it should be dumbed down or have options removed, but I am saying that making it enjoyable for everyone results in more people using it, and that benefits us all.


That is really cool, both your grandfather using a Linux laptop and the boot time. Could you name a few components you used?


Thinkpad T61, Ubuntu 12.04 LTS, SSD. Trim down the services you don't need. Cold boot and hibernate restore take about the same time.

Honestly, I think the SSD has the most to do with it.
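For anyone doing that service trimming on 12.04, upstart skips any job that has a "manual" override file; "bluetooth" here is just an example job name:

```
echo manual | sudo tee /etc/init/bluetooth.override
```

Deleting the override file re-enables the job, so nothing is lost permanently.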


Your message seems to have the hidden assumption that development resources are now being redirected from wireless drivers/suspend/power management to improving boot times. This is false. Different components are handled by different people, and axing a project does not mean that the people responsible will automatically work in one of the other fields that you prioritize.


I am not sure I understand how systemd's development inhibits progress of any of the things on your wish-list.


> My wish list goes something like this: better wireless drivers, improved sleep/suspend/hibernate/resume, better power management, a better package manager, more up-to-date applications ...

Use a different distro. None of these is a problem on a modern distro with reasonably modern hardware.


Exactly how precious are those CPU cycles? I mean, really. Can you put a dollar figure on them?

And then contrast that with the dollar figure for consultant / employee / remote hands time to figure out WTF went wrong?

There are numerous systems for managing services: monit is the best known; mon and several proprietary systems also exist. Nagios can tell you whether a service is running (though it doesn't handle the start/stop logic).

These are small details and extensions on top of the existing SysV init foundation.

Ubuntu's boot time is already down to 8.6 seconds -- a restore from suspend is barely less than that (and restore from disk is considerably longer), though both restores preserve user state. You know, what applications / files you had open, and what was in them when you left off, positions of windows on your desktop. All that jazz. http://www.jamesward.com/2010/09/08/ubuntu-10-10-boots-in-8-...

The socket management is kind of nifty, but doesn't add a whole lot that xinetd didn't already offer (systemd does allow multi-socket services and d-bus-initiated services). I'm not convinced these couldn't be hacked into xinetd while preserving the simplicity and stability of init.
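For comparison, xinetd's per-service socket handling is a stanza like this one (the stock daytime internal service, in the usual /etc/xinetd.d format):

```
service daytime
{
        type            = INTERNAL
        id              = daytime-stream
        socket_type     = stream
        protocol        = tcp
        user            = root
        wait            = no
}
```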

My desktop state (and its preservation) is worth a lot more than fast boot.

Yes. I've heard of inane gratuitous questions. As I said: if you're forcing average users to reboot with any frequency, you're Doing It Wrong.


No, monit doesn't manage services. Monit tries to follow clues you've given it about what's running, it polls them once in a while, and if something appears to be not running (as measured by the instructions you've given it), it runs the one-liner you've given it that should start the thing up again.

Monit does a thing that approximates managing a process, for certain values of "approximates", "managing", and "process". Supervisory process management is one of Linux's absolute weakest points. I cut my teeth on fault-tolerant HA minicomputers, and it pains me to think that 30 years later, we still don't have a way to say "make sure apache is always running. period."
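For reference, the monit approximation of "keep apache running" is a stanza like this; the pidfile path and start/stop commands are distro-dependent guesses:

```
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
```

Note everything here is exactly the "clues" described above: a pidfile to poll and one-liners to run when the poll fails.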

As a great blog pointed out, there is exactly one process that KNOWS when a service has stopped running, and it doesn't need .pid files or polling or anything else to tell it: process 1.

I'm not a systemd advocate - I don't know enough about it, and we're using Ubuntu so I'll end up learning upstart anyway - but read this, it's way more eloquent than I can be:

http://dustin.github.com/2010/02/28/running-processes.html


Fair points. And thanks, by the way, for actually advancing the discussion.

Init can and does manage processes. Somewhat crudely, mostly via the 'respawn' directive. One thing it isn't particularly good at is telling if a process is doing something useful (say, serving out web pages successfully), but it will let you know that it's running. There was a semi-popular hack some years back to run sshd out of init (via respawn) to ensure you always had an SSH daemon on your box (Dustin mentions this). The downside is that while it will ensure sshd is running, it doesn't give you much flexibility over the process (you've got to edit inittab and 'init q' to make changes).

What monit and kin can do, above and beyond process-level monitoring, is check that the service attributes of a process are sane. That a webserver, say, kicks out a 200 OK response rather than a 4## or 5## error, and restart the service if this isn't the case. Checking for correct operation can be more useful than simply verifying a process is running (though going too far overboard in defining "correctness" can also cause problems).
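In monit, that service-level check is a stanza along these lines (paths are illustrative; monit's http protocol test treats HTTP error statuses as a failure):

```
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start"
    # restart if port 80 stops answering HTTP sanely
    if failed port 80 protocol http then restart
```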

For realtime/HA tools, attacking things on the single-system level is probably the wrong way to roll. You want a load balancer in front of multiple hosts with response detection -- is host A still up or not? Whether or not this ties into mitigation (restart) or alerting (notifications to staff) is another matter.

There are also places other than init you can watch things from. /proc contains within it multitudes, including a lot of interesting/useful process state. Daemons can be written with control/monitoring sockets instrumented directly into themselves. Debuggers, strace, ltrace, dtrace, and systemtap all provide resolution inside a running process/thread. Creating something sane, effective, efficient, and sufficient out of all these tools ... interesting problem.
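A trivial taste of that /proc state, assuming Linux: field 3 of /proc/&lt;pid&gt;/stat is the process's state letter (R running, S sleeping, Z zombie, and so on).

```shell
# Read our own shell's state straight from the kernel, no init involved.
state=$(awk '{print $3}' "/proc/$$/stat")
echo "shell PID $$ is in state: $state"
```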


>Ubuntu's boot time is already down to 8.6 seconds

Well, Ubuntu doesn't use sysvinit either.


How long does it take your servers to finish POST? Shaving CPU cycles on boot is not something I ever worry about, because just getting to the boot loader takes minutes.

Also, shell scripts rock for an init system language. It's a language that almost everyone knows and can debug without being a CS major. The only reason you 'have no idea what happened' is because the scripts are written poorly, and code in any language would be hard to debug if it's written poorly.

Fork and exec, seriously? You're worried about functions that take microseconds to finish? Look again - the huge sleep cycles to wait for drivers to finish initializing takes up a lot more time.

I have written my own init systems three times in three languages, and examined countless distros' versions. Trust me, shell is the best compromise.


I have a Debian laptop.

ls -alh | wc -l returns 89. I can subtract "..", ".", and the "totals" lines, so that's 86 init scripts.

Big O for 86 scripts is 86 * n, which simplifies to "n". I'm not concerned.


'ls -A | wc -l' will spare you having to account for the '.' and '..' lines. Omitting the '-l' (redundant for your case) also spares the "totals" line.
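The difference is easy to see in a scratch directory (the mktemp path is whatever your system hands out):

```shell
# Three files, counted both ways.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
with_l=$(( $(ls -alh "$dir" | wc -l) ))   # "total" + "." + ".." + 3 entries = 6
without=$(( $(ls -A "$dir" | wc -l) ))    # just the 3 entries
echo "$with_l vs $without"                # -> 6 vs 3
rm -r "$dir"
```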


"Oh, and because each script runs in less than an hour, we can just say n = 1 hour, and that's a constant, so it simplifies down to instant!"

Yeah, that's not how computational complexity works.



