Linux Follies

2017-10-21

Getting tilde in a Linux VirtualBox guest on Linux via X11 on Mac OS

I only just discovered this. On a Linux server at work (call it "workserver"), I run a VirtualBox VM. When I work on my Mac at home, I launch the XQuartz X11 server, and use VirtualBox on workserver displaying to my Mac.

This is not something I do often, so I had never encountered what I am about to describe until now. In a terminal on the guest, typing the ` key on my Mac gives <. And typing Shift-` (which is usually ~) gives >.

Oddly, this does not happen with a guest that is running directly on the Mac as the host.

The kluge, which I found at this UK-based blog for surgeons, is to remap the key using xmodmap. (I thought I was done with xmodmap about 10 years ago.) NB they have a typo: it should be “tilde” rather than “tilda”. Create the file ~/.xmodmaprc with the following line:

keycode 94 = grave asciitilda

And then:

$ xmodmap ~/.xmodmaprc

What I don’t quite understand is the value of the keycode, 94. On the guest, I do:

$ sudo showkey -s

and then type the key I want to see. It emits the code 0x56 on press, and 0xd6 on release. I thought that would be the keycode (after conversion to decimal), but it is not.

My guess is that it is an issue with X11. In XQuartz, I tried the 4 combinations of setting and unsetting these two:

Follow system keyboard layout
Enable key equivalents under X11

but they did not change the way the grave/tilde key worked, i.e. it still emitted </>.

I also tried setting the keyboard locale:

$ localectl status System Locale: LANG=en_US.UTF-8 VC Keymap: us X11 Layout: us X11 Model: pc105+inet X11 Options: terminate: ctrl_alt_bksp$ localectl set-keymap us-mac$ localectl status System Locale: LANG=en_US.UTF-8 VC Keymap: us-mac X11 Layout: us X11 Model: pc105+inet X11 Options: terminate: ctrl_alt_bksp

but that did nothing, either.

2017-10-19

Setting up a software RAID with sgdisk and mdadm

I wanted to set up a RAID0 (striped array) on two HDDs to servce as cache for Duplicity backups. And I wanted to use GPT and only command line tools: sgdisk(8) for partitioning, and mdadm(8) for creating the software RAID. (I have usually just used Gparted, a GUI partitioning tool.)

All of this was done on Red Hat Enterprise Linux 6.5.

So, I have two (spinning disc) HDDs, each 931 GB, mounted as

/dev/sda
/dev/sdb

First, zap any partitioning information they may have. (In all the examples below, "X" should be replaced by "a" or "b".)

# sgdisk -Z /dev/sdX

Next, partition them. The partitions have to be of type 0xFD00 "Linux RAID". You can do "sgdisk -L" to see a list of all available types. These type codes are not the same as the type codes used by fdisk(8).

The partitions will be 512 GB, leaving some for other uses.

# sgdisk -n 0:0:+512G -c 0:"cache" -t 0:0xFD00 /dev/sdX
# sgdisk -n 0:0:0 -c 0:"misc" /dev/sdX

The "0" first digit of the argument to "-n", "-c", and "-t" is shorthand for the first available partition number. In this case, the first line would be "1" and the second line would be "2". (N.B. this is automatically set; "1" and "2" do not need to be used in the commandline.)

In the second line, note that "-n 0:0:0" uses the default of starting at the first unallocated sector, and ending at the last allocateable sector on the drive, thereby using up the rest of the HDD for the "misc" partition 2. Leaving out the type specification, "-t", gives the default 0x8300 "Linux filesystem."

Print out the partition info to check:

# sgdisk -p /dev/sdX
Disk /dev/sdX: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): xxxxxxxxxxxxxxx-xxx-xxxxxxxxxxxxxxxx
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number Start (sector)    End (sector) Size       Code Name
   1            2048      1073743871   512.0 GiB   FD00 cache
   2      1073743872      1953525134   419.5 GiB   8300 misc

And we see that it has done what we expected.

Next, we create the RAID0 from /dev/sda1 and /dev/sdb1:

# mdadm -v -C /dev/md/mycomputer:cache -l stripe -n 2 /dev/sda1 /dev/sdb1

This creates a device /dev/md127 with a symbolic link for the readable name:

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:cache -> ../md127

In mdadm v3.3.4 in RHEL6, I found that the name given to the "-C/--create" option would always end up being "mycomputer:something", where "something" was determined by what you actually give it. This name comes up after reboot.

The "-v" is for verbose output, "-l" is for the RAID level (which can be specified by integer, or string), "-n" is the number of devices, and the positional arguments are a list of the devices to be used.

Also, the integer N in /dev/mdN is determined by the system. It seems to start with 127.

For instance, doing

# mdadm -C /dev/md0

after rebooting gave this:

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:0 -> ../md127

And doing

# mdadm -C /dev/md/cache

gave

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:cache -> ../md127

I wised up on my third time through, and named it what it was going to pick, anyway.

The RAID needs to be "assembled" and activated at boot time. This is not done by default. To do this, a file /etc/mdadm.conf must be created. (Other distros may have a different location for this file.)

Assuming there is no such file, start by using mdadm(8) to output the array specification to the file:

# mdadm -Ds /dev/md/mycomputer:cache > /etc/mdadm.conf
# cat /etc/mdadm.conf
ARRAY /dev/md/mycomputer:cache metadata=1.2 name=mycomputer:cache UUID=xxxxxxxx

Very important: this UUID will not be the same as the UUID of the filesystem we will create later.

Add DEVICE, MAILADDR, and AUTO lines to /etc/mdadm.conf, resulting in:

DEVICE /dev/sda1 /dev/sdb1
MAILADDR myname@myemail.net
AUTO +all
ARRAY /dev/md/mycomputer:cache metadata=1.2 name=mycomputer:cache UUID=xxxxxxxx

I did the next bit in single-user mode as I wanted this to be mounted as /var/cache, which is also used by several other things. Also, since it gets tiresome writing out the whole device name, I used the short name /dev/md127.

# telinit 1
# mkfs.ext4 /dev/md127

Next, I mounted the device in a temporary location to transfer the existing contents:

# mkdir /mnt/tmpmnt
# mount /dev/md127 /mnt/tmpmnt
# cd /var/cache
# tar cf - * | ( cd /mnt/tmpmnt ; tar xvf - )

Get the UUID of this new filesystem for use in /etc/fstab:

# blkid /dev/md127
yyyyyyyyy-yyyyyyyy-yyyyyyyyyy-yyyyyyyyy

And create an entry in /etc/fstab:

UUID=yyyyyyyyy-yyyyyyyy-yyyyyyyyyy-yyyyyyyyy /var/cache ext4   defaults    0 2

And reboot!

2017-10-04

Apache Spark integration with Grid Engine (update for Spark 2.2.0)

Apache Spark is a popular (because it is fast) big data engine. The speed comes from keeping data in memory. This is an update to my older post: it is still Spark in standalone mode, using the nodes assigned by GE as the worker nodes. I have an update for using Spark 2.2.0, with Java 1.8.0.

It is mostly the same, except only one file needs to be modified: sbin/slaves.sh The Parallel Environment (PE) startup script update only adds an environment variable for defining where the worker logs go. (Into a Grid Engine job-specific directory under the job directory.) And it now specifies Java 1.8.0.

As before, the modifications to sbin/slaves.sh handle using the proper spark-env script based on the user's shell. Since that spark-env script is set up by the PE script to generate job-specific conf and log directories, everything job-specific is separated.

2017-09-30

Why kernel development still uses email

Good post on why email is ideal for kernel development process:

As Rusty Russell once said, if you want to get smarter, the thing to do is to hang out with smart people. An email-based workflow lets developers hang out with a project's smart people, making them all smarter. Greg wants Linux to last a long time, so wants to see the kernel project use tools that help to bring in new developers. Email, for all its flaws, is still better than anything else in that regard.

2017-09-22

Shorewall setup for VirtualBox host-only interface

VirtualBox has a networking mode called "host-only" which allows guests to communicate with each other, and the host to communicate with the guests.

To do this, a host-only network (interface) must be defined on the host. It can be done via GUI:

or via the commandline (needs sudo because this creates a new network interface on the host):

$ sudo vboxmanage hostonlyif create

This creates a host-only virtual interface on the host, named vboxnetN (N starts at 0 and increments for each new one):

$ ip addr list
...
12: vboxnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether ...
inet 192.168.56.1/24 brd 192.168.56.255 scope global vboxnet0
inet6 fe80::800:27ff:fe00:0/64 scope link
valid_lft forever preferred_lft forever

There are three things to do in Shorewall: define a zone, place the host-only interface into that zone, and write a rule.

In /etc/shorewall/zones define the new zone:

# /etc/shorewall/zones
#ZONE TYPE OPTIONS IN OUT
# OPTIONS OPTIONS
vh ipv4

In /etc/shorewall/interfaces put the host-only interface vboxnet0 in that zone:

# /etc/shorewall/interfaces

#ZONE INTERFACE BROADCAST OPTIONS

vh vboxnet0 detect dhcp

And finally, in /etc/shorewall/rules allow all traffic in the vh zone:

# /etc/shorewall/rules

ACCEPT vh:192.168.56.0/24 fw all

On the guest, create a new adapter, and either use DHCP or assign it a static IP in 192.168.56.0/24 (excluding 192.168.56.1, which is the host's IP address). Attach the adapter to the Host-only Adapter:

Or use the command line:

$ vboxmanage modifyvm myguest --nic2 hostonly

Restart the shorewall service, and that should do it. Test it out by ssh'ing into the guest from the host.

2017-09-09

Proposed fix for duplicity Azure backend breakage

At work, I just got a Microsoft Azure Cool Blob Storage allocation for doing off-site backups. The Python-based duplicity software is supposed to be able to use Azure Blob storage as a backend. It does this by using the azure-storage Python module provided by Microsoft.

Unfortunately, a recent update of azure-storage broke duplicity. The fix was not to hard to implement; mostly minor changes in class names, and one simplification in querying blob properties. It took me a few hours to make a fix, and I just submitted my changes as a merge request to duplicity. ~~The proposed merge can be found at Launchpad.~~

UPDATE Unfortunately, I made a mistake and made my changes against the 0.7.14 release rather than trunk. It looks like there is already a lot of work in trunk to deal with the current azure-storage version. So, I withdrew the merge request. I'll work from the 0.8 series branch, instead. Currently, it looks like 0.8 all works as is.

2016-11-10

Linux daemon using Python daemon with PID file and logging

The python-daemon package (PyPI listing, Pagure repo) is very useful. However, I feel it has suffered a bit from sparse documentation, and the inclusion of a "runner" example, which is in the process of being deprecated as of 2 weeks ago (2016-10-26).

There are several questions about it on StackOverflow, going back a few years: 2009, 2011, 2012, and 2015. Some refer to the included runner.py as an example, which is being deprecated.

So, I decided to figure it out myself. I wanted to use the PID lockfile mechanism provided by python-daemon, and also the Python logging module. The inline documentation for python-daemon mention the files_preserve parameter, a list of file handles which should be held open when the daemon process is forked off. However, there wasn't an explicit example, and one StackOverflow solution for logging under python-daemon mentions that the file handle for logging objects may not be obvious:

for a StreamHandler, it's logging.root.handlers[0].stream.fileno()
for a SyslogHandler, it's logging.root.handlers[1].socket.fileno()

After a bunch of experiments, I think I have sorted it out to my own satisfaction. My example code is in GitHub: prehensilecode/python-daemon-example. It also has a SysV init script.

The daemon itself is straigtforward, doing nothing but logging timestamps to the logfile. The full code is pasted here:

#!/usr/bin/env python3.5
import sys
import os
import time
import argparse
import logging
import daemon
from daemon import pidfile

debug_p = False

def do_something(logf):
    ### This does the "work" of the daemon

    logger = logging.getLogger('eg_daemon')
    logger.setLevel(logging.INFO)

    fh = logging.FileHandler(logf)
    fh.setLevel(logging.INFO)

    formatstr = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(formatstr)

    fh.setFormatter(formatter)

    logger.addHandler(fh)

    while True:
        logger.debug("this is an DEBUG message")
        logger.info("this is an INFO message")
        logger.error("this is an ERROR message")
        time.sleep(5)


def start_daemon(pidf, logf):
    ### This launches the daemon in its context

    global debug_p

    if debug_p:
        print("eg_daemon: entered run()")
        print("eg_daemon: pidf = {}    logf = {}".format(pidf, logf))
        print("eg_daemon: about to start daemonization")

    ### XXX pidfile is a context
    with daemon.DaemonContext(
        working_directory='/var/lib/eg_daemon',
        umask=0o002,
        pidfile=pidfile.TimeoutPIDLockFile(pidf),
        ) as context:
        do_something(logf)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Example daemon in Python")
    parser.add_argument('-p', '--pid-file', default='/var/run/eg_daemon.pid')
    parser.add_argument('-l', '--log-file', default='/var/log/eg_daemon.log')

    args = parser.parse_args()
    
    start_daemon(pidf=args.pid_file, logf=args.log_file)