SSH and DNS

A short while ago, I noticed that logging into crimson via ssh from other machines on my home network was occasionally very slow. I knew that the most common cause of ssh delays is DNS trouble, but (irrationally) I had a hard time believing this was the cause. I thought it more likely that my recent upgrade of crimson to FreeBSD 5.3 was the root of the problem. Well, that opinion was complete bollocks...

I suspected the upgrade because of changes I had made to the build configuration. I had stripped everything I did not need out of the kernel and world by setting the following make variables in /etc/make.conf:

  # make.conf for CRIMSON
  #
  # Modification History:
  # Date      Who
  # 20050311  mpw
  #   Added options to prevent unused functionality being built
  #   Added BATCH=true

  # build for i586
  CPUTYPE=i586

  # standard CFLAGS
  CFLAGS= -O -pipe

  # no profiled libraries
  NOPROFILE=true

  # just need a kernel - crimson uses no modules
  NO_MODULES=true

  # omit these elements from buildkernel/buildworld
  NO_BIND=true
  NO_IPFILTER=true
  NO_PF=true
  NO_AUTHPF=true
  NOATM=true
  NO_USB=true
  NO_LPR=true
  NO_ACPI=true
  NO_VINUM=true
  NO_BLUETOOTH=true
  NO_I4B=true
  NO_OBJC=true
  NO_FORTRAN=true
  NO_GPIB=true
  NO_SHAREDOCS=true
  NO_NIS=true
  # can't use AAAA queries against the Alcatel DNS
  NOINET6=true

  # Build ports in unattended mode
  BATCH=true

  # added by use.perl 2005-03-09 12:06:02
  PERL_VER=5.8.6
  PERL_VERSION=5.8.6

After the upgrade, I found the following sshd error in the system log for each ssh login:

  sshd[n]: login_getclass: unknown class 'root'

It didn't stop me logging in, but it was annoying just the same. After a little digging, I found someone else had reported the problem, which turned out to be due to the removal of NIS from the build (NO_NIS in a custom /etc/make.conf). The default /etc/nsswitch.conf settings assume that NIS is present. Once I changed /etc/nsswitch.conf to the contents shown below, the sshd errors no longer occurred.

  # Modified by mpw
  # 18/05/2005 - remove nis from settings, as NO_NIS is specified in
  #              /etc/make.conf
  group: files
  hosts: files dns
  networks: files
  passwd: files
  shells: files
  # original settings
  #group: compat
  #group_compat: nis
  #hosts: files dns
  #networks: files
  #passwd: compat
  #passwd_compat: nis
  #shells: files
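A quick way to check whether an nsswitch.conf still refers to NIS is to parse it and list the databases that name nis as a source. The following is only a minimal sketch: the function names are made up for illustration, and the sample text is the original default settings quoted above.

```python
def sources_by_database(text):
    """Map each nsswitch database to its list of sources, ignoring comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or ":" not in line:
            continue
        database, _, sources = line.partition(":")
        result[database.strip()] = sources.split()
    return result


def databases_using(text, source="nis"):
    """Return the sorted names of databases whose source list names `source`."""
    parsed = sources_by_database(text)
    return sorted(db for db, srcs in parsed.items() if source in srcs)


# Sample input: the original default settings from the listing above.
ORIGINAL_DEFAULTS = """\
group: compat
group_compat: nis
hosts: files dns
networks: files
passwd: compat
passwd_compat: nis
shells: files
"""

print(databases_using(ORIGINAL_DEFAULTS))
```

To check a live system, read /etc/nsswitch.conf into a string and pass it to databases_using() instead of the sample text.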

Back to the topic in hand... One evening, the ssh delays were really significant, and stayed reproducible for a long period of time. At last I had a chance to perform some diagnostics. First, to convince myself that the problems were not DNS-related, I put the IP addresses of the two machines I was testing into the /etc/hosts file, so that DNS would not be used for them. Hmm, no delays when connecting via ssh. OK, I was convinced it must be a DNS issue.
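The same before-and-after comparison can be made without ssh by timing a lookup through the system resolver, which consults /etc/hosts and DNS in the order given in nsswitch.conf. This is a rough sketch, assuming the delay comes from the server resolving the connecting machine's address; the helper names are invented for illustration.

```python
import socket
import time


def timed(lookup, argument):
    """Run lookup(argument) and return (seconds_elapsed, result_or_error)."""
    start = time.monotonic()
    try:
        outcome = lookup(argument)
    except OSError as error:  # socket.herror and socket.gaierror are OSError subclasses
        outcome = error
    return time.monotonic() - start, outcome


def report(address):
    """Print how long a reverse lookup of `address` takes via the system resolver."""
    seconds, outcome = timed(socket.gethostbyaddr, address)
    print(f"{address}: {seconds:.2f}s -> {outcome!r}")
```

Calling report() with a client's IP address on the server, once with and once without the /etc/hosts entries, should show whether name resolution accounts for the wait.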

What puzzled me was that using nslookup to check the DNS servers seemed to work fine - no delays or errors were discernible. What was going on?

I thought I ought to check that the DNS addresses provided by my ISP were correct. What do you know? They'd changed the addresses and omitted to tell me. Talk about impolite... To check that this was the problem, I hacked the new addresses into /etc/resolv.conf and, as if by magic, the delays disappeared. Evidently the old DNS servers still worked, but were a lot slower to respond to the queries from sshd.
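To compare an old and a new DNS server directly, a query can be sent to each one and the response timed. The sketch below hand-builds a minimal A-record query using only the standard library. It is an illustration rather than a full resolver: it does not parse or validate the reply, and the function names and the default query id are arbitrary choices.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label is prefixed with its length
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    # Question: encoded name, terminating zero, QTYPE=A (1), QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def time_query(server, hostname, timeout=5.0):
    """Return seconds until `server` sends any reply, or None on timeout or error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(hostname), (server, 53))
            sock.recvfrom(512)  # the reply is not checked, only timed
        except OSError:  # includes socket.timeout
            return None
        return time.monotonic() - start
```

For example, time_query("192.0.2.1", "example.com") would time one query; the address here is a documentation placeholder, to be replaced with the old and new server addresses in turn. Repeating the query several times per server gives a fairer comparison than a single run.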

All I had to do was change the DHCP server (an Alcatel Speedtouch 510) to send the new DNS addresses when giving out a DHCP address to the local network machines. See this article for details.