Building FreeBSD for multiple machines

With the advent of topaz, I had a reasonably fast machine to complement the slow machines crimson and chrome. This let me use the technique described in the FreeBSD Handbook section on Tracking for Multiple Machines.

My first attempt, using the upgrade to FreeBSD 6.2 as the target, was not entirely successful.

First, I built the world and kernel (GENERIC) for topaz and installed them without problem. Next, I copied the kernel config files for crimson and chrome to topaz, then built the kernels to match these configs. However, I noticed it was building everything, not the pared-down kernels I needed. Ah, I had forgotten to use the /etc/make.conf files that go with those kernel configs. I copied these to topaz and used them to make the kernels, e.g.:

  make buildkernel KERNCONF=CRIMSON __MAKE_CONF=/path/to/CRIMSON/make.conf

[Sharp-eyed readers may have already noticed my error at this point.]

I then proceeded to NFS mount topaz:/usr/src and topaz:/usr/obj over the chrome equivalents. I dropped chrome to single user mode, via shutdown now, and ran mergemaster -p. I noticed the following message:

  Unable to find mtree database. Skipping auto-upgrade.

I'd never seen this message from mergemaster before, but decided to ignore it and proceed with the install of the kernel and world. On running the post-install mergemaster, I noticed the same error message, but everything appeared to proceed as normal.
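For reference, the target-side steps described above can be sketched roughly as follows. This is only an outline of what I did on chrome, run as root after dropping to single user mode with shutdown now. The names BUILD_HOST, DRY_RUN, run and install_steps are my own illustrative choices, not part of FreeBSD, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the install on a target machine that NFS-mounts the build
# machine's /usr/src and /usr/obj. Assumes single user mode and root.
# BUILD_HOST, KERNCONF and DRY_RUN are illustrative variable names.
BUILD_HOST=${BUILD_HOST:-topaz}
KERNCONF=${KERNCONF:-CHROME}
DRY_RUN=${DRY_RUN:-1}    # 1 = print the commands only, 0 = really run them

# Print a command in dry-run mode, otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@" || exit 1
    fi
}

install_steps() {
    run mount -t nfs "$BUILD_HOST:/usr/src" /usr/src
    run mount -t nfs "$BUILD_HOST:/usr/obj" /usr/obj
    run cd /usr/src
    run mergemaster -p                    # pre-install merge
    run make installkernel KERNCONF="$KERNCONF"
    run make installworld
    run mergemaster                       # post-install merge
}

install_steps
```

With DRY_RUN left at 1 this is harmless and just lists the seven steps in order. I would only set DRY_RUN=0 once the output looks right for the machine in question.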

I rebooted into multiuser mode. Everything seemed fine, until I tried to log in. Login told me that pam_unix.so could not be found. The first signs of panic began to set in. This occurred for every login ID I tried. This was not good. Panic level increased. I rebooted into single user mode and took a look at /etc/pam.d and /usr/lib where the pam_ files are located. Nope, everything looked fine. Panic level was now dangerously high...

I thought for a time that the problem had something to do with the mtree message from mergemaster, and some config file was mangled. However, I then remembered that I had built the kernel and world with different make.conf settings. I rebuilt the world on topaz using the same make.conf as the kernel, and tried another installworld. Following this, to my relief, on reboot into multi-user mode login worked properly. Lesson learned.
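To make the lesson concrete, here is a minimal sketch of a helper that always hands the same make.conf to both the world and the kernel build for a given target. The function name build_cmds is hypothetical, and the path is the same placeholder used in the buildkernel example earlier. It only prints the two commands, so they can be checked before being run from /usr/src on the build machine.

```shell
#!/bin/sh
# Print the build commands for one target host so that buildworld and
# buildkernel are always given the same make.conf.
# build_cmds is an illustrative helper, not a FreeBSD tool.
build_cmds() {
    kernconf=$1    # kernel config name, e.g. CRIMSON
    makeconf=$2    # that host's make.conf, copied to the build machine
    echo "make buildworld __MAKE_CONF=$makeconf"
    echo "make buildkernel KERNCONF=$kernconf __MAKE_CONF=$makeconf"
}

build_cmds CRIMSON /path/to/CRIMSON/make.conf
```

The point of the wrapper is simply that the make.conf path is typed once, so the world and kernel cannot drift apart the way mine did.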

FreeBSD 6.2 Mergemaster

Afterwards, I discovered that the mtree message from mergemaster is new in 6.2 (so I must have seen it when I installed topaz, but had memory loss soon after). There is a new auto-upgrade feature (invoked via the -U flag), which uses an mtree database stored in /var/db/mergemaster.mtree. When moving from 6.1, the mtree file does not exist, but the 6.2 mergemaster creates it at the end of the first run. The auto-upgrade feature looks useful, so I'll give it a try when I next upgrade. Almost worth the red herring it provided!

Addendum - 28th May, 2011

I was bitten by this again today. Since crimson's transubstantiation into a much faster body, it has been the FreeBSD build machine, with the kernel and world then installed on topaz over NFS. Some months ago, I changed the /etc/src.conf file (which is where the WITHOUT_ knobs were moved in 7.0) to include NIS in the build, as I didn't want to bother with having to change the default settings in /etc/nsswitch.conf (to remove NIS) when upgrading. If NIS is not built and you don't make this change in nsswitch.conf, various NSSWITCH error messages are emitted, e.g.

Mar  1 18:00:00 chrome cron[2184]: NSSWITCH(nss_method_lookup): nis, group_compat, endgrent, not found
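A rough way to spot the combination that produces these messages is sketched below. It assumes the knob in question is WITHOUT_NIS in src.conf, and the helper name nis_mismatch is my own invention. It succeeds when src.conf disables NIS while nsswitch.conf still has an uncommented nis entry.

```shell
#!/bin/sh
# nis_mismatch SRC_CONF NSSWITCH_CONF
# Succeeds (exit 0) when SRC_CONF sets WITHOUT_NIS but NSSWITCH_CONF
# still refers to nis on a non-comment line. Illustrative helper only.
nis_mismatch() {
    src_conf=$1
    nsswitch=$2
    grep -q '^[[:space:]]*WITHOUT_NIS' "$src_conf" 2>/dev/null &&
        grep -v '^[[:space:]]*#' "$nsswitch" 2>/dev/null | grep -qw 'nis'
}
```

On a real system it would be called as nis_mismatch /etc/src.conf /etc/nsswitch.conf, printing a warning of your choice when it succeeds.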

Naturally, I forgot to propagate this change to the topaz version of /etc/src.conf. What should have been a simple upgrade to 8.1-p2 became slightly panicky, as I witnessed the following message when attempting to log in:

topaz login: in openpam_load_module(): no pam_unix.so found
topaz login: pam_start(): system error

After some poking around and re-reading this article, I finally remembered/realised what I'd done. Changing /etc/src.conf on topaz to match the build machine and re-performing the make installworld step in single-user mode was sufficient to get login working again.
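To avoid a third bite, a check along these lines could be run on the target before installworld. It is a minimal sketch that assumes a copy of the build machine's src.conf has been made available somewhere the target can read. Where that copy lives is up to you, and the names same_conf and preinstall_check are hypothetical.

```shell
#!/bin/sh
# same_conf A B: succeed only if both files exist and are byte-for-byte
# identical. Note that two missing files count as a failure here.
same_conf() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# preinstall_check BUILD_CONF LOCAL_CONF
# BUILD_CONF: copy of the build machine's src.conf (location is an assumption)
# LOCAL_CONF: normally /etc/src.conf on the machine being upgraded
preinstall_check() {
    build_conf=$1
    local_conf=$2
    if same_conf "$build_conf" "$local_conf"; then
        echo "OK: $local_conf matches the build machine"
    else
        echo "MISMATCH: $local_conf differs from $build_conf; fix before installworld" >&2
        return 1
    fi
}
```

Usage would be something like preinstall_check /path/to/build-host/src.conf /etc/src.conf && make installworld, so the install never starts while the two files disagree. The same check applies equally to make.conf.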