2015-11-12

docbook2x-texi

Building Git 2.6.3 from source, I ran into this error:
    DB2TEXI user-manual.texi
/bin/sh: line 1: docbook2x-texi: command not found
docbook2x-texi is the DocBook-to-Texinfo converter from the docbook2X package (which has not been updated since 2007).

For Red Hat and Fedora, the EPEL package repo provides docbook2X. However, the script is installed under a different name, to avoid a clash with a newer docbook package. So, in the git source directory, edit the file Documentation/Makefile and change one line:

DOCBOOK2X_TEXI = db2x_docbook2texi
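Alternatively, it should be possible to override the variable on the make command line instead of editing the Makefile (untested), e.g. from the Documentation directory:

    make DOCBOOK2X_TEXI=db2x_docbook2texi info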

2015-11-11

SWIG is great - Python DRMAA2 interface in less than an hour

I have never used SWIG before, surprisingly. I figured creating a Python interface to DRMAA2 would be a good self-tutorial. Turns out to be almost trivial, following the directions here.

My Python (3.5) DRMAA2 interface code is on GitHub - https://github.com/prehensilecode/pydrmaa2. The hardest part, really, was writing the Makefile.
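Under the hood, the build boils down to running SWIG on an interface file and compiling the generated wrapper. A rough sketch of the commands involved (the interface file name, and the -ldrmaa2 library name, are assumptions here):

    swig -python -py3 drmaa2.i
    gcc -fPIC -shared drmaa2_wrap.c -o _drmaa2.so $(python3-config --includes) -ldrmaa2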

NOTE: no testing at all has been done. This is just a quick-and-dirty exercise to try out SWIG.

2015-10-13

Autotools mythbuster

This is a really useful site by Diego Elio “Flameeyes” Pettenò that clarifies and provides good examples of autotools usage: https://autotools.io/index.html

2015-08-21

Modernized makefiles for VASP 5.3

One of the research groups on the cluster I manage requested help in compiling VASP 5.3 (the Vienna Ab-initio Simulation Package), a code for electronic structure calculations and quantum-mechanical molecular dynamics from first principles. As with many research codes, the build procedure is a bit difficult owing to legacy stuff.

I updated the makefiles for my build, to more current GNU Make standards. They worked for me; however, I make no guarantee that they will work for you. In fact, they will likely not work for you without modification. In any case, the repo is on GitHub. Fork away!

    https://github.com/prehensilecode/vasp_makefiles_proteus

2015-08-06

Apache Spark integration with Grid Engine

Apache Spark is a fast, general-purpose engine for large-scale data processing. It can use Hadoop infrastructure (like HDFS), and provides its own map-reduce implementation. It can also be run in standalone mode, without Hadoop or the YARN resource manager.

I have been able to get Spark 1.4.1 running, with some integration into an existing Univa Grid Engine cluster. The integration is not "tight" in that the slave processes are still independently launched with ssh. I was unable to get Spark to work with qrsh. So, without tight integration, usage accounting is not exact.

I also had to make some modifications to the Spark standalone shell scripts in order to have job-specific configuration and log directories. Out of the box, Spark's shell scripts do not completely propagate the environment to the slaves. Job-specific configuration and log directories are needed because multiple users may want to run Spark jobs at the same time.

Additionally, I was not able to figure out a way to constrain Spark slave instances to subsets of the available processor cores. So, Spark jobs require exclusive use of compute nodes.

So, let's start there. Your GE installation needs to have the "exclusive" complex defined:

#name               shortcut   type        relop   requestable consumable default  urgency 
#------------------------------------------------------------------------------------------
exclusive           excl       BOOL        EXCL    YES         YES        0        1000
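If your installation does not have it, the complex can be added by opening the complex configuration with qconf and inserting the line above:

    qconf -mc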

The OS on Drexel's Proteus cluster is RHEL 6.4-ish. I use Red Hat's packaging of Oracle Java 1.7.0_85 by default. Running Spark requires the JAVA_HOME environment variable to be set, which I do in the global login script location /etc/profile.d/. I found that using /usr/lib/jvm/java did not work. It needed to be:

    JAVA_HOME=/usr/lib/jvm/java-1.7.0-oracle.x86_64
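The profile script itself is a one-liner; a minimal sketch (the filename is arbitrary):

    # /etc/profile.d/java_home.sh
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-oracle.x86_64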

Building Spark 1.4.1 was painless. I used the bundled script to generate a binary distribution tarball:

     ./make-distribution.sh --name myname --tgz

Untar it into some convenient location. 
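The tarball name should follow make-distribution.sh's spark-<version>-bin-<name>.tgz convention, so something like this (the destination path is just an example):

    tar xzf spark-1.4.1-bin-myname.tgz -C /cm/shared/apps/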

Next, the sbin/start-slaves.sh and sbin/stop-slaves.sh scripts need to be modified. You can look at my fork at GitHub. As they are, these two scripts just ssh to all the slave nodes to start the slave processes. However, ssh does not pass environment variables, so all the slave processes launch with the default SPARK_HOME. That means all the slave processes read the global Spark config and environment, and log to the global Spark installation log directory.

Because the remote command runs under the user's login shell, we have to determine which shell that is in order to build the command to be executed on the slave hosts. Here is the snippet from sbin/start-slaves.sh:

# Launch the slaves
USERSHELL=$( getent passwd $USER | cut -f7 -d: )
if [ $USERSHELL = "/bin/bash" -o $USERSHELL = "/bin/zsh" -o $USERSHELL = "/bin/ksh" ] ; then
    "$sbin/slaves.sh" cd "$SPARK_HOME" \&\& "." "$SPARK_CONF_DIR/spark-env.sh" \&\& "$sbin/start-slave.sh" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"
elif [ $USERSHELL = "/bin/tcsh" -o $USERSHELL = "/bin/csh" ] ; then
    "$sbin/slaves.sh" cd "$SPARK_HOME" \&\& "source" "$SPARK_CONF_DIR/spark-env.csh" \&\& "$sbin/start-slave.sh" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"
fi

The cluster here has two types of compute nodes: Dell C6145s with 64-core AMD CPUs, and Dell C6220s with 16-core Intel CPUs. So, I created a job class (JC) with two subclasses, and also separate parallel environments (PEs).

The job class is as follows -- all missing lines have the default "{+}UNSPECIFIED" config:

jcname          spark
variant_list    default intel amd
owner           NONE
user_lists      NONE
xuser_lists     NONE
...
l_hard          {+}exclusive=TRUE,h_vmem=4g,m_mem_free=3g, \
                [{+}intel=vendor=intel,h_vmem=4g,m_mem_free=3g], \
                [{+}amd=vendor=amd,h_vmem=4g,m_mem_free=3g]
...
pe_name         {~}spark.intel,[intel=spark.intel],[amd=spark.amd]
...

The spark.intel PE is defined as follows (with the spark.amd PE defined similarly):

pe_name                spark.intel
slots                  99999
user_lists             NONE
xuser_lists            NONE
start_proc_args        /cm/shared/apps/sge/var/default/common/pescripts/sparkstart.sh
stop_proc_args         NONE
allocation_rule        16
control_slaves         FALSE
job_is_first_task      FALSE
urgency_slots          min
accounting_summary     FALSE
daemon_forks_slaves    FALSE
master_forks_slaves    FALSE
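For reference, both the job class and the PEs are managed with qconf; a sketch of the commands (job classes are a UGE 8.1+ feature):

    qconf -ajc spark         # add the job class (-mjc to modify)
    qconf -ap spark.intel    # add the PEs (-mp to modify)
    qconf -ap spark.amd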

The PE start script writes the job-specific environment files, a spark-defaults.conf, and a log4j properties file:

#!/bin/bash

spark_conf_dir=${SGE_O_WORKDIR}/conf.${JOB_ID}
/bin/mkdir -p ${spark_conf_dir}

### for bash-like
sparkenvfile=${spark_conf_dir}/spark-env.sh

echo "#!/usr/bin/env bash" > $sparkenvfile
echo "export JAVA_HOME=/usr/lib/jvm/java-1.7.0-oracle.x86_64" >> $sparkenvfile
echo "export SPARK_CONF_DIR=${spark_conf_dir}" >> $sparkenvfile
echo "export SPARK_MASTER_WEBUI_PORT=8880" >> $sparkenvfile
echo "export SPARK_WORKER_WEBUI_PORT=8881" >> $sparkenvfile
echo "export SPARK_WORKER_INSTANCES=1" >> $sparkenvfile

spark_master_ip=$( cat ${PE_HOSTFILE} | head -1 | cut -f1 -d\  )
echo "export SPARK_MASTER_IP=${spark_master_ip}" >> $sparkenvfile

echo "export SPARK_MASTER_PORT=7077" >> $sparkenvfile
echo "export MASTER_URL=spark://${spark_master_ip}:7077" >> $sparkenvfile

spark_slaves=${SGE_O_WORKDIR}/slaves.${JOB_ID}
echo "export SPARK_SLAVES=${spark_slaves}" >> $sparkenvfile

spark_worker_cores=$( expr ${NSLOTS} / ${NHOSTS} )
echo "export SPARK_WORKER_CORES=${spark_worker_cores}" >> $sparkenvfile

spark_worker_dir=/lustre/scratch/${SGE_O_LOGNAME}/spark/work.${JOB_ID}
echo "export SPARK_WORKER_DIR=${spark_worker_dir}" >> $sparkenvfile

spark_log_dir=${SGE_O_WORKDIR}/logs.${JOB_ID}
echo "export SPARK_LOG_DIR=${spark_log_dir}" >> $sparkenvfile

echo "export SPARK_LOCAL_DIRS=${TMP}" >> $sparkenvfile

chmod +x $sparkenvfile

### for csh-like
sparkenvfile=${spark_conf_dir}/spark-env.csh

echo "#!/usr/bin/env tcsh" > $sparkenvfile
echo "setenv JAVA_HOME /usr/lib/jvm/java-1.7.0-oracle.x86_64" >> $sparkenvfile
echo "setenv SPARK_CONF_DIR ${spark_conf_dir}" >> $sparkenvfile
echo "setenv SPARK_MASTER_WEBUI_PORT 8880" >> $sparkenvfile
echo "setenv SPARK_WORKER_WEBUI_PORT 8881" >> $sparkenvfile
echo "setenv SPARK_WORKER_INSTANCES 1" >> $sparkenvfile

spark_master_ip=$( cat ${PE_HOSTFILE} | head -1 | cut -f1 -d\  )
echo "setenv SPARK_MASTER_IP ${spark_master_ip}" >> $sparkenvfile

echo "setenv SPARK_MASTER_PORT 7077" >> $sparkenvfile
echo "setenv MASTER_URL spark://${spark_master_ip}:7077" >> $sparkenvfile

spark_slaves=${SGE_O_WORKDIR}/slaves.${JOB_ID}
echo "setenv SPARK_SLAVES ${spark_slaves}" >> $sparkenvfile

spark_worker_cores=$( expr ${NSLOTS} / ${NHOSTS} )
echo "setenv SPARK_WORKER_CORES ${spark_worker_cores}" >> $sparkenvfile

spark_worker_dir=/lustre/scratch/${SGE_O_LOGNAME}/spark/work.${JOB_ID}
echo "setenv SPARK_WORKER_DIR ${spark_worker_dir}" >> $sparkenvfile

spark_log_dir=${SGE_O_WORKDIR}/logs.${JOB_ID}
echo "setenv SPARK_LOG_DIR ${spark_log_dir}" >> $sparkenvfile

echo "setenv SPARK_LOCAL_DIRS ${TMP}" >> $sparkenvfile

chmod +x $sparkenvfile

/bin/mkdir -p ${spark_log_dir}
/bin/mkdir -p ${spark_worker_dir}
cat ${PE_HOSTFILE} | cut -f1 -d \  > ${spark_slaves}

### defaults, sp. log directory
echo "spark.eventLog.dir    ${spark_log_dir}" > ${spark_conf_dir}/spark-defaults.conf

### log4j defaults
log4j_props=${spark_conf_dir}/log4j.properties
echo '### Suggestion: use "WARN" or "ERROR"; use "INFO" when debugging' > $log4j_props
echo "# Set everything to be logged to the console" >> $log4j_props
echo "log4j.rootCategory=WARN, console" >> $log4j_props
echo "log4j.appender.console=org.apache.log4j.ConsoleAppender" >> $log4j_props
echo "log4j.appender.console.target=System.err" >> $log4j_props
echo "log4j.appender.console.layout=org.apache.log4j.PatternLayout" >> $log4j_props
echo "log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n" >> $log4j_props
echo "# Settings to quiet third party logs that are too verbose" >> $log4j_props
echo "log4j.logger.org.spark-project.jetty=WARN" >> $log4j_props
echo "log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR" >> $log4j_props
echo 'log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO' >> $log4j_props
echo 'log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO' >> $log4j_props

And then an example job script looks like:

#!/bin/bash
#$ -S /bin/bash
#$ -P myprj
#$ -M myemail@example.com
#$ -m ea
#$ -j y
#$ -cwd
#$ -jc spark.intel
#$ -l exclusive
#$ -pe spark.intel 32
#$ -l vendor=intel
#$ -l h_rt=0:30:00
#$ -l h_vmem=4g
#$ -l m_mem_free=3g
 
. /etc/profile.d/modules.sh
module load shared
module load proteus
module load gcc
module load sge/univa
module load python/2.7-current
module load apache/spark/1.4.1
 
###
### Set up environment for Spark
###
export SPARK_CONF_DIR=${SGE_O_WORKDIR}/conf.${JOB_ID}
. ${SPARK_CONF_DIR}/spark-env.sh
 
###
### The actual work is done below
###
 
### Start the cluster: master first, then slaves
echo "Starting master on ${SPARK_MASTER_IP} ..."
start-master.sh
echo "Done starting master."
 
echo "Starting slaves..."
start-slaves.sh
echo "Done starting slaves."
 
### the script which does the actual computation is submitted to the 
### standalone Spark cluster
echo "Submitting job..."
spark-submit --master $MASTER_URL wordcount.py
echo "Done job."
 
### Stop the cluster: slaves first, then master
echo "Stopping slaves..."
stop-slaves.sh
echo "Done stopping slaves"
 
echo "Stopping master..."
stop-master.sh
echo "Done stopping master."

And, that's it. I have not done extensive testing or benchmarking, so I don't know what the performance is like relative to an installation that runs on Hadoop with HDFS. 

2015-08-03

Abaqus integration for Univa Grid Engine (update)

My last post about Abaqus integration with Univa Grid Engine (UGE) had one disadvantage: it did not use qrsh to launch the slave MPI processes. As a result, job resource usage accounting was inaccurate. To fix this, certain parallel environment (PE) settings need to be corrected, and the rsh command that Abaqus uses for launching MPI slaves needs to be set to the wrapper rsh script.

The PE settings which worked for me are below -- see also sge_pe(5):

pe_name                abaqus
slots                  99999
user_lists             NONE
xuser_lists            NONE
start_proc_args        /cm/shared/apps/sge/var/default/common/pescripts/abaqus.py
stop_proc_args         NONE
allocation_rule        $round_robin
control_slaves         TRUE
job_is_first_task      FALSE
urgency_slots          min
accounting_summary     TRUE
daemon_forks_slaves    TRUE
master_forks_slaves    FALSE

And the updated PE script (again, setting the mp_host_list is optional). The rsh command is actually the rsh wrapper shell script, which then calls qrsh.

#!/usr/bin/env python
import sys, os

### PE startup script aka prologue to set up Abaqus MPI "hostfile"
### Based on documented env file format
###     http://www.simulia.com/support/v67/books/sgb67EF/default.htm?startat=ch04s01.html

machinefile = os.environ['PE_HOSTFILE']
abaqenvfile = "abaqus_v6.env"

machinelines = []
with open(machinefile, "r") as mf:
    for l in mf:
        lsplit = l.split()
        machinelines.append( [lsplit[0], int(lsplit[1])] )

with open(abaqenvfile, "w") as envfile:
    envfile.write("mp_mode=MPI\n")
    envfile.write("mp_rsh_command='/cm/shared/apps/sge/univa/mpi/rsh -n -l %U %H %C'\n")
    envfile.write("mp_host_list=%s\n" % (str(machinelines)))
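For a hypothetical two-node allocation with 16 slots per node, the resulting abaqus_v6.env would look something like:

    mp_mode=MPI
    mp_rsh_command='/cm/shared/apps/sge/univa/mpi/rsh -n -l %U %H %C'
    mp_host_list=[['node01', 16], ['node02', 16]]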



2015-07-18

Jupyter (FKA IPython) project gets $6m funding

The Helmsley Charitable Trust, the Alfred P. Sloan Foundation, and the Gordon and Betty Moore Foundation just announced a $6m grant to UC Berkeley and Cal Poly to fund Project Jupyter. Jupyter evolved from the IPython project, abstracting out the language-agnostic parts. It also serves as an interactive shell for Python 3, Julia, R, Haskell, and Ruby. I think the most notable thing it provides is a web-based GUI “notebook” similar to what has been available in Maple and Mathematica for a while. (Maybe Matlab, too: I have not used Matlab much.)

Correction: Jupyter serves as an interactive shell for a lot more than what I listed. Here is the full list.

2015-06-30

Leap second added

The leap second has been added, and my systems seem to not have barfed. In particular, I was wary of what the NFS and Lustre file servers and clients would do.

Jun 30 20:00:00 myserver ntpd[5973]: 0.0.0.0 061b 0b leap_event

2015-05-28

Ganglia module (kludge) to monitor temperature via IPMI

Since I don't have environmental monitoring in my server room, I used ipmitool to read my cluster nodes' on-board sensors to sort of get at the cold aisle ambient temperature. One should be able to see a list of available sensor readings with "ipmitool sdr" or "ipmitool sensor", the latter giving other associated parameters for the metrics, like alarm thresholds.

Since access to /dev/ipmi0 is restricted to root, my kludge was to create a root cron job which runs every N minutes, writing the appropriate value to a log file:

ipmitool -c sdr get "Inlet Temp" | cut -f2 -d, > $LOGFILE
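In my setup this is an /etc/cron.d entry; a sketch, with the interval and log file path being illustrative:

    */5 * * * * root /usr/bin/ipmitool -c sdr get "Inlet Temp" | cut -f2 -d, > /var/log/inlet_temp.log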

Then, the Ganglia python module reads that file. I followed the example.py module which is distributed with Ganglia, and the associated example.pyconf.

The code is in my GitHub repo.

2015-05-20

Logjam: another day, another https vulnerability

A vulnerability has just been discovered in TLS (and hence https), specifically in the Diffie-Hellman key exchange: connections can be downgraded to weak, export-grade parameters. This arose from the old export restrictions set by the US, so that its law enforcement and security agencies could break encryption used by foreign entities. Ars Technica, as usual, has a good write-up.

The researchers who discovered the flaw have a dedicated website which gives pointers on what to do if you run a web server, or just a browser. They have a server scanner, or you can use the one at Qualys SSL Labs.

Ivan Ristić has some more detail on increasing the strength of DH on Apache. Unfortunately, it may not be supported by the version of Apache you happen to have running.
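The gist of the fix, if your Apache is new enough (2.4.8 or later, built against OpenSSL 1.0.2 or later), is to generate a fresh, larger DH group and point mod_ssl at it; a sketch (paths are illustrative):

    openssl dhparam -out /etc/httpd/dhparams.pem 2048
    # then, in the SSL vhost config:
    #   SSLOpenSSLConfCmd DHParameters "/etc/httpd/dhparams.pem"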

Ubuntu swap partitions across upgrades

I recently upgraded from 14.10 Utopic to 15.04 Vivid, and only just realized that the swap partition was not being used. I had to re-initialize it with mkswap and update /etc/fstab, since the UUID had changed, too.
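For the record, the recovery is just a few commands; a sketch, assuming the swap partition is /dev/sda5:

    mkswap /dev/sda5        # re-initialize; prints the new UUID
    blkid /dev/sda5         # or look the UUID up again
    # update the swap line in /etc/fstab with the new UUID, then:
    swapon -a
    swapon -s               # confirm it is in use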

2015-05-15

pylab confusions

There are three pylabs that one may encounter in using Python. Two have been around for a while, and the third just showed up less than a month ago.

The “real” pylab is the procedural interface to matplotlib, i.e. a MATLAB-like command line interface. It imports matplotlib.pyplot and numpy into a single namespace. You can use it from ipython’s prompt by calling the magic function “%pylab”. It is no longer recommended by the matplotlib people. The recommended way is to import with abbreviated namespace names, and use the qualified functions. For example:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')
plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
plt.show()

Then there is the idea/proposal by Keir Mierle to improve on the pylab concept: a single package one might use for interactive analysis in Python. This is written up in the SciPy wiki, but does not seem to have been updated since 2012.

And finally, if you are like me and, without thinking too hard, typed “pip install pylab”, you get this new package from PyPI, first added on 2015-04-23. It does nothing but pull in several other Python packages, i.e. it serves as a metapackage. You can see the source is basically a dummy, with all the action in the requirements defined in setup.py.

2015-03-17

Ganglia procstat.py fix to handle process names containing underscores

UPDATE: Accepted. Current version here.

This has bugged me for a while: Ganglia's Python module procstat.py which monitors process CPU and memory usage did not show any data for Grid Engine's qmaster, which has a process name of "sge_qmaster". Turns out, this is because it tries to parse out the process name by assuming it does not have underscores in it. This snippet is from the get_stat(name) function in procstat.py:

if name.startswith('procstat_'):
    fir = name.find('_')
    sec = name.find('_', fir + 1)
    proc = name[fir + 1:sec]
    label = name[sec + 1:]

I just submitted a pull request to change this to handle process names containing any number of underscores. The snippet to replace the above:

if name.startswith('procstat_'):
    nsp = name.split('_')
    proc = '_'.join(nsp[1:-1])
    label = nsp[-1]

2015-02-11

FlexLM and host names

If you ever get an error with lmstat like:

lmgrd is not running: License server machine is down or not responding. (-96,7:2 "No such file or directory")

but only from some machines outside your domain, check that the SERVER line in your license file specifies the FQDN of the license server. The default is to use just the hostname.

SERVER myserver.mydom.com XXXXXXXXXXXX NNNNN

Most search hits on that error message say things about firewalls.

2015-02-09

Grid Engine PE script (prologue) for Abaqus

UPDATE: Well, a closer look at some of the files Abaqus generates during its run indicates that Abaqus (or, technically, Platform MPI) is aware of Grid Engine and can figure out the host list by itself.

Abaqus 6.13 uses Platform MPI, but also uses its own "environment file" for the MPI hostfile. (Search for "mp_host_list" at the official documentation.) So, I cooked up this PE script (aka prologue) to write the abaqus_v6.env file in the job directory:

#!/usr/bin/env python
import sys, os

### PE startup script to set up Abaqus MPI "hostfile"
### Based on documented env file format

machinefile = os.environ['PE_HOSTFILE']
abaqenvfile = "abaqus_v6.env"

machinelines = []
with open(machinefile, "r") as mf:
    for l in mf:
        lsplit = l.split()
        machinelines.append( [lsplit[0], int(lsplit[1])] )

with open(abaqenvfile, "w") as envfile:
    envfile.write("mp_mode=MPI\n")
    envfile.write("mp_host_list=%s\n" % (str(machinelines)))