Linux Follies

2018-06-15

The usefulness of awk in the present day

This anecdote illustrates exactly why scientists should invest some time in learning some of the standard tools of Unix/Linux.

https://news.ycombinator.com/item?id=17324122

2018-04-10

How to install the autoconf archive

This is a quick and dirty way to install the autoconf archive, “a collection of more than 500 macros for GNU Autoconf that have been contributed as free software.”

Either clone the git repository or download a tarball. Decide on a place for the macros: they will all reside in a single directory, say /usr/local/share/aclocal. Then copy all the .m4 files from the m4 subdirectory of the repo into /usr/local/share/aclocal. Next, in either a system-wide or user-specific login script, set the environment variable ACLOCAL_PATH=/usr/local/share/aclocal. Like other PATH-like environment variables, the value is a colon-delimited list of directories.
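The copy step can also be scripted. The following is only a sketch: the function names install_macros and aclocal_path_value are made up for illustration, and the demo runs against a throwaway directory tree with a placeholder macro file rather than a real checkout of the archive.

```python
import shutil
import tempfile
from pathlib import Path


def install_macros(repo_dir, dest_dir):
    """Copy every .m4 file from <repo_dir>/m4 into dest_dir; return the count."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for macro in sorted((Path(repo_dir) / "m4").glob("*.m4")):
        shutil.copy2(macro, dest / macro.name)
        copied += 1
    return copied


def aclocal_path_value(*dirs):
    """Join directories into a colon-delimited ACLOCAL_PATH value."""
    return ":".join(str(d) for d in dirs)


# Demo on a temporary tree (hypothetical file names), not a real system path.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "autoconf-archive"
    (repo / "m4").mkdir(parents=True)
    (repo / "m4" / "ax_example.m4").write_text("# placeholder macro\n")
    dest = Path(tmp) / "aclocal"
    print(install_macros(repo, dest), "macro file(s) copied")
    print("export ACLOCAL_PATH=" + aclocal_path_value(dest))
```

For a real install, the arguments would be the path of the cloned repository and /usr/local/share/aclocal, and the script would need write permission to the destination.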

2018-03-29

Deficiency in Tumblr's two-factor authentication (2FA) implementation

This blog is mirrored, using an IFTTT applet (f.k.a. recipe), to http://linuxfollies.tumblr.com/.

Two-factor authentication on a Tumblr account supports two methods: an app code generator (e.g. Google Authenticator, Authy, Duo Mobile), and SMS. Notably, it does not generate a list of one-time backup codes like most services do.

Backup codes are necessary in case the device is not accessible, e.g. lost or stolen, particularly if you are abroad without your usual SIM (perhaps it is also stolen) which means that SMS would not reach you.

SMS is not recommended for routine two-factor use because SMS can be hijacked. The National Institute of Standards and Technology (NIST) does not recommend SMS for two-factor authentication. See also: The Verge, and Schneier. As such, I normally do not enable SMS as a second factor.

Getting to the point, I got a new phone yesterday. I spent a couple of hours the night before making sure I had backup codes and/or a secondary method for the 2nd factor. All went well, but I had no Tumblr backup codes. Nor did I set SMS as an auth method.

Tumblr's recovery process requires that you have a photo of your face on your Tumblr account (avatar, etc.). Then, you take a picture of yourself holding a piece of paper with something particular written on it, and send it to them together with the URL of the picture already on Tumblr.

So, linuxfollies.tumblr.com is now no longer under my control. It will, however, keep getting mirrors of posts here as long as IFTTT remains up.

2018-01-16

Python multiprocessing ignores cgroups

At work, I noticed a Python job that was causing a large overload condition. The job requested 16 CPU cores, and was running gensim.models.ldamulticore with 15 workers. However, the load on that server was many times what 16 cores could account for, despite a cgroups cpuset restricting that process to 16 cores.

It turns out that gensim.models.ldamulticore uses Python multiprocessing. That module decides how many worker processes to run based on multiprocessing.cpu_count(), which reports the total number of CPU cores in the machine rather than the cores the process is allowed to use. This completely bypasses the limits imposed by cgroups.

There is currently an open enhancement request to add a new function to multiprocessing for requesting the number of usable CPU cores rather than the total number of CPU cores.
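In the meantime, a process on Linux can ask for its own CPU affinity mask, which does reflect a cpuset restriction. The sketch below assumes Python 3 on Linux; usable_cpu_count is a name invented here, not part of multiprocessing.

```python
import multiprocessing
import os


def usable_cpu_count():
    """Number of CPUs this process may run on, falling back to the total."""
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return multiprocessing.cpu_count()


print("total CPUs: ", multiprocessing.cpu_count())
print("usable CPUs:", usable_cpu_count())
```

Inside a 16-core cpuset on a 64-core node, the first number would be expected to stay at 64 while the second drops to 16, so the second is the safer value to pass as a worker count.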

2018-01-11

Update to Grid Engine Job Submission Verifier (JSV) Using Go

A while ago, I posted about a job submission verifier (JSV) for Univa Grid Engine to try to handle job submissions which had less than ideal resource requests by leveraging cgroups. It was based on Daniel Gruber's JSV Go API.

In the 3+ years since that post, we had stopped using the JSV for one reason or another (including a Univa issue with cgroups and interaction with a specific kernel version), and just manually dealt with issues that came up by communicating with the users. Since then, as well, Daniel has updated the API to be more Go-like. And we have had a fairly bad round of multithreaded programs submitted as serial jobs using up to 64 threads on our 64-core nodes.

So, I dusted off the old code, refreshed it, and reduced its scope to just deal with two cases: serial jobs, and multithreaded jobs. These types of jobs are defined either by a lack of PE (serial jobs), or a finite set of PEs (multithreaded).

There still is a deficiency in that the JSV cannot really deal with slot ranges. In Grid Engine, it is possible to request a range of slots for jobs, e.g. “-pe multithread 4-12” which would allow a job to be assigned any number of slots from 4 to 12. This is useful for busy clusters and users who would rather their jobs run slower than wait for the full 12 slots to open up.
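To make the limitation concrete, here is a small sketch of how a range such as "4-12" breaks down, and what "take the max value of the range" means for the core binding. It is an illustration only: parse_slot_range and binding_amount are hypothetical helpers, not part of Grid Engine or the JSV API, and only the simple forms "N" and "N-M" are handled.

```python
def parse_slot_range(spec):
    """Return (min_slots, max_slots) for a spec such as "4-12" or "8"."""
    low, sep, high = spec.strip().partition("-")
    min_slots = int(low)
    # A bare number like "8" is treated as the range 8-8.
    max_slots = int(high) if sep else min_slots
    if min_slots < 1 or max_slots < min_slots:
        raise ValueError(f"invalid slot range: {spec!r}")
    return min_slots, max_slots


def binding_amount(spec):
    """Mimic the simple policy of binding to the top of the range."""
    return parse_slot_range(spec)[1]


for spec in ("4-12", "8"):
    print(spec, "->", parse_slot_range(spec), "bind", binding_amount(spec))
```

With this policy, a job submitted with "4-12" is bound as if it had 12 slots even when the scheduler grants it only 4, which is exactly the gap described above.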

Anyway, the JSV code is pretty straightforward. Find it here: https://github.com/prehensilecode/pecheck_simple

Together with this, UGE must be configured to have cgroups enabled (see your documentation). Here is the setup on our cluster -- the freezer functionality is disabled as there may be an issue in the interaction with RHEL 6 kernels:

cgroups_params cgroup_path=/cgroup cpuset=true mount=true \
killing=true freezer=false freeze_pe_tasks=false \
forced_numa=true h_vmem_limit=true \
m_mem_free_hard=true m_mem_free_soft=true \
min_memory_limit=250M

The JSV code is short enough that I include it here:

/*
 * Requires https://github.com/dgruber/jsv
 */

package main

import (
    "strings"

    "github.com/dgruber/jsv"
)

func jsv_on_start_function() {
    //jsv_send_env()
}

func job_verification_function() {
    //
    // Set binding on serial jobs (i.e. no PE) to "linear:1"
    //
    var modified_p bool = false
    if !jsv.IsParam("pe_name") {
        jsv.SetParam("binding_strategy", "linear_automatic")
        jsv.SetParam("binding_type", "set")
        jsv.SetParam("binding_amount", "1")
        jsv.SetParam("binding_exp_n", "0")
        modified_p = true
    } else {
        pe_name, _ := jsv.GetParam("pe_name")

        /* XXX the "shm" PE is the single-node multicore PE
         * change this to the equivalent for your site;
         * the "matlab" PE is identically defined to the "shm" PE
         * XXX note that this does not properly deal with a range of number of slots;
         * it just takes the max value of the range
         */
        if strings.EqualFold("shm", pe_name) || strings.EqualFold("matlab", pe_name) {
            pe_max, _ := jsv.GetParam("pe_max")
            jsv.SetParam("binding_strategy", "linear_automatic")
            jsv.SetParam("binding_type", "set")
            jsv.SetParam("binding_amount", pe_max)
            jsv.SetParam("binding_exp_n", "0")
            modified_p = true
        }
    }

    if modified_p {
        jsv.Correct("Job was modified")
    } else {
        jsv.Correct("Job was not modified")
    }

    return
}

func main() {
    jsv.Run(true, job_verification_function, jsv_on_start_function)
}

2017-10-21

Getting tilde in a Linux VirtualBox guest on Linux via X11 on Mac OS

I only just discovered this. On a Linux server at work (call it "workserver"), I run a VirtualBox VM. When I work on my Mac at home, I launch the XQuartz X11 server, and use VirtualBox on workserver displaying to my Mac.

This is not something I do often, so I had never encountered what I am about to describe until now. In a terminal on the guest, typing the ` key on my Mac gives <. And typing Shift-` (which is usually ~) gives >.

Oddly, this does not happen with a guest that is running directly on the Mac as the host.

The kluge, which I found at this UK-based blog for surgeons, is to remap the key using xmodmap. (I thought I was done with xmodmap about 10 years ago.) NB they have a typo: it should be “tilde” rather than “tilda”. Create the file ~/.xmodmaprc with the following line:

keycode 94 = grave asciitilde

And then:

$ xmodmap ~/.xmodmaprc

What I don’t quite understand is the value of the keycode, 94. On the guest, I do:

$ sudo showkey -s

and then type the key I want to see. It emits the code 0x56 on press, and 0xd6 on release. I thought that would be the keycode (after conversion to decimal), but it is not.
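A possible explanation, offered as an assumption rather than something verified on this setup: X servers on Linux conventionally number their keycodes 8 higher than the kernel's keycodes, and for this particular key the scancode and the kernel keycode are both 86 (0x56). Under that assumption the arithmetic works out, as this short sketch shows (the function name is made up for illustration):

```python
# Assumption: the guest's X server follows the usual Linux convention
# that an X11 keycode is the kernel keycode plus 8.
X11_KEYCODE_OFFSET = 8


def x11_keycode_from_kernel(kernel_keycode):
    """Translate a kernel keycode into the keycode xmodmap expects."""
    return kernel_keycode + X11_KEYCODE_OFFSET


# Value reported by "showkey -s" on key press; assumed here to equal
# the kernel keycode for this key.
scancode_on_press = 0x56

print(scancode_on_press)                           # 86 in decimal
print(x11_keycode_from_kernel(scancode_on_press))  # 94
```

Running xev inside the guest's X session and pressing the key would be a more direct way to read off the X keycode.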

My guess is that it is an issue with X11. In XQuartz, I tried the 4 combinations of setting and unsetting these two:

Follow system keyboard layout
Enable key equivalents under X11

but they did not change the way the grave/tilde key worked, i.e. it still emitted </>.

I also tried setting the keyboard locale:

$ localectl status
   System Locale: LANG=en_US.UTF-8
       VC Keymap: us
      X11 Layout: us
       X11 Model: pc105+inet
     X11 Options: terminate:ctrl_alt_bksp
$ localectl set-keymap us-mac
$ localectl status
   System Locale: LANG=en_US.UTF-8
       VC Keymap: us-mac
      X11 Layout: us
       X11 Model: pc105+inet
     X11 Options: terminate:ctrl_alt_bksp

but that did nothing, either.

2017-10-19

Setting up a software RAID with sgdisk and mdadm

I wanted to set up a RAID0 (striped array) on two HDDs to serve as cache for Duplicity backups. And I wanted to use GPT and only command line tools: sgdisk(8) for partitioning, and mdadm(8) for creating the software RAID. (I have usually just used Gparted, a GUI partitioning tool.)

All of this was done on Red Hat Enterprise Linux 6.5.

So, I have two (spinning disc) HDDs, each 931.5 GiB, which appear as

/dev/sda
/dev/sdb

First, zap any partitioning information they may have. (In all the examples below, "X" should be replaced by "a" or "b".)

# sgdisk -Z /dev/sdX

Next, partition them. The partitions have to be of type 0xFD00 "Linux RAID". You can do "sgdisk -L" to see a list of all available types. These type codes are not the same as the type codes used by fdisk(8).

The partitions will be 512 GiB, leaving some space for other uses.

# sgdisk -n 0:0:+512G -c 0:"cache" -t 0:0xFD00 /dev/sdX
# sgdisk -n 0:0:0 -c 0:"misc" /dev/sdX

The leading "0" in the arguments to "-n", "-c", and "-t" is shorthand for the first available partition number. In this case, that is "1" in the first command and "2" in the second. (N.B. this is set automatically; "1" and "2" do not need to be given on the command line.)

In the second line, note that "-n 0:0:0" uses the defaults of starting at the first unallocated sector and ending at the last allocatable sector on the drive, thereby using up the rest of the HDD for the "misc" partition 2. Leaving out the type specification, "-t", gives the default 0x8300 "Linux filesystem".

Print out the partition info to check:

# sgdisk -p /dev/sdX
Disk /dev/sdX: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): xxxxxxxxxxxxxxx-xxx-xxxxxxxxxxxxxxxx
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number Start (sector)    End (sector) Size       Code Name
   1            2048      1073743871   512.0 GiB   FD00 cache
   2      1073743872      1953525134   419.5 GiB   8300 misc

And we see that it has done what we expected.
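The sector numbers in that listing can be checked by hand. This sketch redoes the arithmetic using only values printed above (512-byte logical sectors, the start and end sectors of each partition); the helper names are illustrative.

```python
SECTOR_BYTES = 512  # "Logical sector size" from the sgdisk output
GIB = 2 ** 30


def end_sector(start, size_gib):
    """Last sector of a partition that starts at `start` and spans size_gib GiB."""
    return start + size_gib * GIB // SECTOR_BYTES - 1


def size_in_gib(start, end):
    """Size in GiB of a partition covering sectors start..end inclusive."""
    return (end - start + 1) * SECTOR_BYTES / GIB


# Partition 1: starts at sector 2048 and was requested as +512G.
print(end_sector(2048, 512))                           # 1073743871

# Partition 2: runs from the next sector to the last usable sector.
print(round(size_in_gib(1073743872, 1953525134), 1))   # 419.5
```

Both results match the "End (sector)" and "Size" columns reported by sgdisk.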

Next, we create the RAID0 from /dev/sda1 and /dev/sdb1:

# mdadm -v -C /dev/md/mycomputer:cache -l stripe -n 2 /dev/sda1 /dev/sdb1

This creates a device /dev/md127 with a symbolic link for the readable name:

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:cache -> ../md127

In mdadm v3.3.4 in RHEL6, I found that the name given to the "-C/--create" option would always end up being "mycomputer:something", where "something" was determined by what you actually give it. This name comes up after reboot.

The "-v" is for verbose output, "-l" is for the RAID level (which can be specified by integer, or string), "-n" is the number of devices, and the positional arguments are a list of the devices to be used.

Also, the integer N in /dev/mdN is determined by the system. It seems to start with 127.

For instance, doing

# mdadm -C /dev/md0

after rebooting gave this:

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:0 -> ../md127

And doing

# mdadm -C /dev/md/cache

gave

lrwxrwxrwx 1 root root 8 2017-10-18 18:11:31 -0400 /dev/md/mycomputer:cache -> ../md127

I wised up on my third time through, and named it what it was going to pick, anyway.

The RAID needs to be "assembled" and activated at boot time. This is not done by default. To do this, a file /etc/mdadm.conf must be created. (Other distros may have a different location for this file.)

Assuming there is no such file, start by using mdadm(8) to output the array specification to the file:

# mdadm -Ds /dev/md/mycomputer:cache > /etc/mdadm.conf
# cat /etc/mdadm.conf
ARRAY /dev/md/mycomputer:cache metadata=1.2 name=mycomputer:cache UUID=xxxxxxxx

Very important: this UUID will not be the same as the UUID of the filesystem we will create later.

Add DEVICE, MAILADDR, and AUTO lines to /etc/mdadm.conf, resulting in:

DEVICE /dev/sda1 /dev/sdb1
MAILADDR myname@myemail.net
AUTO +all
ARRAY /dev/md/mycomputer:cache metadata=1.2 name=mycomputer:cache UUID=xxxxxxxx

I did the next bit in single-user mode as I wanted this to be mounted as /var/cache, which is also used by several other things. Also, since it gets tiresome writing out the whole device name, I used the short name /dev/md127.

# telinit 1
# mkfs.ext4 /dev/md127

Next, I mounted the device in a temporary location to transfer the existing contents:

# mkdir /mnt/tmpmnt
# mount /dev/md127 /mnt/tmpmnt
# cd /var/cache
# tar cf - . | ( cd /mnt/tmpmnt && tar xpvf - )

Get the UUID of this new filesystem for use in /etc/fstab:

# blkid /dev/md127
yyyyyyyyy-yyyyyyyy-yyyyyyyyyy-yyyyyyyyy

And create an entry in /etc/fstab:

UUID=yyyyyyyyy-yyyyyyyy-yyyyyyyyyy-yyyyyyyyy /var/cache ext4   defaults    0 2

And reboot!