SSH no more

Ok, panic! I was trying to SSH into the RPi 3 of my drone (as usual), but a worrying

Connection closed by remote host

was the only message I was getting. Why panic? Because the HDMI port of the Pi is buried deep inside the drone, covered by cables, and there is no way to reach it without opening up the frame and moving a bunch of them…

Ok, no panic, mount the USB pen on the desktop PC (it mounts), and take a look at auth.log and syslog.

Ah-ah! auth.log is filled with these weird messages!

Jul 18 14:34:38 localhost sshd[1948]: PAM unable to dlopen(pam_unix.so): /lib/security/pam_unix.so: cannot open shared object file: No such file or directory
Jul 18 14:34:38 localhost sshd[1948]: PAM adding faulty module: pam_unix.so
Jul 18 14:34:38 localhost sshd[1948]: fatal: Access denied for user pelan by PAM account configuration [preauth]

Strange: on the Pi there is no /lib/security directory at all, and I don’t even have /lib/security/pam_unix.so on my amd64 desktop PC…

#> ll /lib/security/
total 36K
-rw-r--r-- 1 root root 19K Jul 13  2016 pam_ecryptfs.so
-rw-r--r-- 1 root root 15K Nov 12  2014 pam_freerdp.so

Ok, a guy on Unix & Linux on StackExchange says that editing /etc/ssh/sshd_config and setting UsePAM no will let me log in. And it was true, but it seems that PAM rules over a lot of other stuff, sudo for example. 😢

The message this time was even more cryptic:

sudo: policy plugin failed session initialization

I tried to google for it for some time, but then gave up and decided to reconnect brain. And that was the winning choice this time, because… I totally forgot to check syslog!

And in fact it was full of:

Jul 18 15:32:44 localhost kernel: [  949.206136] BTRFS warning (device sda2): csum failed root 5 ino 322973 off 36864 csum 0x46e98bcc expected csum 0xa9f61a90 mirror 1
Jul 18 15:32:57 localhost kernel: [  962.173692] BTRFS warning (device sda2): csum failed root 5 ino 322973 off 36864 csum 0x46e98bcc expected csum 0xa9f61a90 mirror 1
Jul 18 15:34:23 localhost rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jul 18 15:35:23 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Jul 18 15:34:23 localhost kernel: [ 1048.132095] BTRFS warning (device sda2): csum failed root 5 ino 322973 off 36864 csum 0x46e98bcc expected csum 0xa9f61a90 mirror 1
Jul 18 15:34:44 localhost kernel: [ 1068.852398] BTRFS warning (device sda2): csum failed root 5 ino 322973 off 36864 csum 0x46e98bcc expected csum 0xa9f61a90 mirror 1
Jul 18 15:34:46 localhost kernel: [ 1071.078087] BTRFS warning (device sda2): csum failed root 5 ino 322973 off 36864 csum 0x46e98bcc expected csum 0xa9f61a90 mirror 1
Jul 18 15:34:52 localhost kernel: [ 1076.894146] BTRFS warning (device sda2): csum failed root 5 ino 322973 off 36864 csum 0x46e98bcc expected csum 0xa9f61a90 mirror 1
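
Scrolling through pages of identical warnings is not much fun, so here is a minimal sketch of how they could be condensed to one line per device and inode. The function name csum_summary is made up for this post, and the pattern assumes the warnings look exactly like the ones above.

```shell
# Count btrfs "csum failed" warnings per device and inode.
# Reads the files given as arguments, or stdin when there are none.
# csum_summary is a hypothetical helper name, not a btrfs tool.
csum_summary() {
    sed -n '/BTRFS warning.*csum failed/s/.*(device \([^)]*\)).* ino \([0-9][0-9]*\) .*/\1 \2/p' "$@" \
        | sort | uniq -c | sort -rn
}
```

Something like csum_summary /mnt/test2/var/log/syslog should then print a count followed by the device and the inode number, which makes it obvious when every warning points at the same file.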

So move the USB pen back to the PC and start a BTRFS scrub with sudo btrfs scrub start /mnt/test2/. Then with:

> sudo btrfs scrub status /mnt/test2 -R
scrub status for 956245f3-b968-4a69-a1b0-54665461792e
	scrub started at Wed Jul 18 17:48:46 2018 and finished after 00:03:51
	data_extents_scrubbed: 132946
	tree_extents_scrubbed: 20132
	data_bytes_scrubbed: 4227706880
	tree_bytes_scrubbed: 329842688
	read_errors: 0
	csum_errors: 2
	verify_errors: 33
	no_csum: 272
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 2
	unverified_errors: 0
	corrected_errors: 33
	last_physical: 7927234560
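
Most of those counters are zero, and the one that really hurts is uncorrectable_errors. As a rough sketch (the function names are mine, and it assumes the raw "name: value" layout shown above), the interesting lines could be filtered like this:

```shell
# Print "name value" for every *_errors counter that is non-zero in the
# raw output of `btrfs scrub status -R`, read from stdin.
scrub_nonzero_errors() {
    awk -F': *' '/_errors:/ {
        name = $1
        gsub(/^[ \t]+/, "", name)      # drop the leading indentation
        if ($2 + 0 > 0) print name, $2
    }'
}

# Succeed only if uncorrectable_errors is present and equal to 0.
scrub_is_clean() {
    awk -F': *' '/uncorrectable_errors:/ { found = 1; bad = ($2 + 0 > 0) }
                 END { if (found && !bad) exit 0; exit 1 }'
}
```

Usage would be along the lines of sudo btrfs scrub status -R /mnt/test2 | scrub_nonzero_errors.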

I realized that I had a couple of uncorrectable errors. Is there a way to see which file has been corrupted (like with debugfs -R "ncheck $inode" $partition + ecryptfs-find)?

Fortunately it’s even easier!

dmesg | grep "checksum error at" | tail -4 | cut -d\  -f24- | sed 's/.$//'

Found, guess where? On Unix & Linux!

Also guess who’s the culprit?

offset 32768, length 4096, links 1 (path: lib/arm-linux-gnueabihf/security/pam_unix.so
offset 36864, length 1876, links 1 (path: lib/arm-linux-gnueabihf/security/pam_unix.so

Yes, exactly him!
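
One caveat on that one-liner: the field number passed to cut depends on how many spaces the kernel puts in each message, so it may need adjusting on another machine. A pattern-based variant, sketched here under the assumption that every message ends with "(path: …)", avoids counting fields. The name corrupted_paths is hypothetical.

```shell
# Print the unique file paths named in btrfs "checksum error at" kernel
# messages read from stdin. Assumes each message ends with "(path: ...)".
corrupted_paths() {
    sed -n '/checksum error at/s/.*(path: \(.*\))[[:space:]]*$/\1/p' | sort -u
}
```

It would be used as dmesg | corrupted_paths. Starting from the ino number in the syslog warnings, btrfs inspect-internal inode-resolve 322973 /mnt/test2 should lead to the same file.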

So now? Well, now that the problem has shown itself, it should be easy to fix. First: check which package this dynamic library belongs to (is this correct English, yes? 😆)

#> dlocate /lib/x86_64-linux-gnu/security/pam_unix.so 
libpam-modules:amd64: /lib/x86_64-linux-gnu/security/pam_unix.so

Fine. Second: which version?

#> dpkg -l '*libpam-modules*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                            Version                      Architecture                 Description
+++-===============================================-============================-============================-===================================================================================================
ii  libpam-modules:amd64                            1.1.8-3.2ubuntu2.1           amd64                        Pluggable Authentication Modules for PAM
ii  libpam-modules-bin                              1.1.8-3.2ubuntu2.1           amd64                        Pluggable Authentication Modules for PAM - helper binaries

Excellent. Third: we need it for armhf, let’s try to find it… here it is! Actually, Google showed me the i386 version, but it’s enough to replace i386 in the URL with armhf.

libpam-modules-1.1.8-3.2ubuntu2.1

Ok, copy link location, then

#> mkdir -p /tmp/libpam
#> cd /tmp/libpam
#> wget 'http://launchpadlibrarian.net/364006221/libpam-modules_1.1.8-3.2ubuntu2.1_armhf.deb'
#> dpkg-deb --extract libpam-modules_1.1.8-3.2ubuntu2.1_armhf.deb .
#> cd lib/arm-linux-gnueabihf/security
#> for i in *.so ; do echo "$i" ; md5sum "$i" "/mnt/test2/lib/arm-linux-gnueabihf/security/$i" ; done

Let’s take a look at the results. Ok, everything is identical, except… him.

9ff6356da6e8be0b0a4ea76a67dc9068  pam_umask.so
9ff6356da6e8be0b0a4ea76a67dc9068  /mnt/test2/lib/arm-linux-gnueabihf/security/pam_umask.so
pam_unix.so
6ccaaa1c7abf863e7df915cc7b31362d  pam_unix.so
md5sum: /mnt/test2/lib/arm-linux-gnueabihf/security/pam_unix.so: Input/output error
pam_userdb.so
bf55dfb6db4adebfe77ea60fa11c5711  pam_userdb.so
bf55dfb6db4adebfe77ea60fa11c5711  /mnt/test2/lib/arm-linux-gnueabihf/security/pam_userdb.so
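
Eyeballing pairs of md5 sums works for a handful of files, but a one-word verdict per file is easier to scan. Here is a small sketch of that idea using cmp instead of md5sum. The function name compare_modules and the OK/DIFF/ERROR labels are my own invention.

```shell
# Compare every *.so in a reference directory with the file of the same
# name in a target directory and print one verdict per file.
compare_modules() {
    ref_dir=$1
    target_dir=$2
    for ref in "$ref_dir"/*.so; do
        [ -e "$ref" ] || continue          # no *.so files at all
        name=${ref##*/}
        cmp -s "$ref" "$target_dir/$name" 2>/dev/null
        case $? in
            0) echo "OK $name" ;;
            1) echo "DIFF $name" ;;
            *) echo "ERROR $name" ;;       # cmp exits >1 on trouble (missing file, read error)
        esac
    done
}
```

Run from the extracted package as compare_modules . /mnt/test2/lib/arm-linux-gnueabihf/security, an unreadable file like the one above should land in the ERROR bucket instead of hiding between matching checksums.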

Let’s go on and replace it with a fresh copy.

#> sudo rm /mnt/test2/lib/arm-linux-gnueabihf/security/pam_unix.so
#> sudo cp pam_unix.so /mnt/test2/lib/arm-linux-gnueabihf/security/
#> sync
#> sudo umount /mnt/test2

D’oh, forgot to uncomment UsePAM yes in /mnt/test2/etc/ssh/sshd_config.
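
Since I flipped that setting by hand twice, here is a hedged sketch of a helper that would do it for me. The name set_use_pam is hypothetical, and it assumes the directive sits on its own line, possibly commented out with a leading #.

```shell
# Set UsePAM to "yes" or "no" in the given sshd_config file.
# Rewrites an existing (possibly commented-out) UsePAM line, or appends
# one if the file has none. A backup is kept as FILE.bak when rewriting.
set_use_pam() {
    value=$1
    config=$2
    case $value in
        yes|no) ;;
        *) echo "usage: set_use_pam yes|no FILE" >&2; return 2 ;;
    esac
    if grep -Eq '^[[:space:]]*#?[[:space:]]*UsePAM[[:space:]]' "$config"; then
        sed -i.bak -E "s/^[[:space:]]*#?[[:space:]]*UsePAM[[:space:]].*/UsePAM $value/" "$config"
    else
        printf 'UsePAM %s\n' "$value" >> "$config"
    fi
}
```

With the pen still mounted that would be sudo sh -c '. ./helpers.sh; set_use_pam yes /mnt/test2/etc/ssh/sshd_config', where helpers.sh is whatever file the function was saved in.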

But, eventually…

#> pelan-drone
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.14.18-v7+ armv7l)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

77 packages can be updated.
35 updates are security updates.


Last login: Wed Jul 18 15:46:31 2018 from 10.17.0.155

The hooman wins again! 😆

But it’s also true that I need to find a slower but safer way to shut down the distro than simply unplugging the batteries.