2014-07-01

Using the NVIDIA Python plugin for Ganglia monitoring under Bright Cluster Manager

The GitHub repo for Ganglia gmond Python plugins contains a plugin for monitoring NVIDIA GPUs. It presumes that the NVIDIA Deployment Kit, which contains NVML (the management library), is installed via the normal means into the usual places. If you are using Bright Cluster Manager, you would instead have used Bright's cuda60/tdk package to do the installation, which means the libnvidia-ml.so library is not in one of the standard library directories. To fix that, modify the /etc/init.d/gmond init script: near the top, set LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64
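
A slightly more defensive variant preserves anything already in LD_LIBRARY_PATH, and you can sanity-check by hand that NVML loads before restarting gmond (the library path is the one Bright's cuda60/tdk installs; adjust if yours differs):

export LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# run by hand: dlopen the library the plugin needs
LD_LIBRARY_PATH=/cm/local/apps/cuda/libs/current/lib64 \
    python -c 'import ctypes; ctypes.CDLL("libnvidia-ml.so.1")' && echo "NVML loads OK"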
The suggested modifications to Ganglia Web, however, are out of date. I will make another post once I figure out how to modify Ganglia Web to display the NVIDIA metrics.

UPDATE: Well, turns out there seems to be no need to modify the Ganglia Web installation. Under the host view, there is a tab for "gpu metrics" which shows 22 available metrics.

2014-06-11

root cron jobs and /etc/security/access.conf

On RHEL 6, if your root cron jobs do not run, check /var/log/secure for lines that look like:
crontab: pam_access(crond:account): access denied for user `root' from `cron'
You may also see the following message when, as root, you type "crontab -e":
Permission denied
You (root) are not allowed to access to (crontab) because of pam configuration.

If there are any like that, check /etc/security/access.conf -- you need to allow root access via cron and crond by adding the following line:
+ : root : cron crond 
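
Note that pam_access uses the first matching rule, so if your access.conf ends with a catch-all deny, the permit line must come before it:

+ : root : cron crond
- : ALL : ALL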

2014-06-09

Keepass, Google Drive, and multi-OS access

UPDATE 2014-06-11: The kdbx file seems to work fine on Dropbox. Luckily, I had a backup there.

If you store your KeePass DB file on Google Drive plus Insync in order to access it from Linux, Windows, and Android, you might want to stop doing so: my KeePass file seems to be irretrievably corrupted.

2014-06-06

More on SSSD - getting finger(1) and command completion of ~ to work

I noticed after getting SSSD up and running that the finger(1) command no longer worked, and neither did command completion of ~username. In the first case, finger(1) never found any users, even when given the exact username. In the second case, typing cd ~d<tab> at the command line would not expand to a list of possibilities; however, cd ~david worked just fine.

Turns out, there needs to be one setting in /etc/sssd/sssd.conf:


[domain/default]
    ...
    enumerate = True

That setting makes SSSD download and cache the full set of user entries, so that finger(1) can iterate over user info to find a matching record (and the shell can enumerate usernames for ~ completion).

Then, restart the sssd service.
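
A minimal check after the restart (RHEL 6 init scripts assumed; the username is the one from the example above):

    service sssd restart
    # enumeration populates the local cache in the background; once it has,
    # these should both return results again
    getent passwd | wc -l
    finger david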


2014-05-14

SSSD setup in a Bright Cluster

UPDATE 2015-05-28: Also, on the master node, the User Portal web application needs a setting in /etc/openldap/ldap.conf:

    TLS_REQCERT      never


At my current job, we use Bright Cluster Manager and Univa Grid Engine on RHEL 6.5. We were seeing issues where submitted jobs ended up in an "Error" state, especially if many jobs were submitted in a short period, either as an array job or via a shell-script loop calling qsub repeatedly. The error reason was:

    can't get password entry for user "juser". Either the user does not exist or NIS error!

However, logging into the assigned compute node and running "id", or even some C code that did the same user lookups, worked fine.

By default, our installation used nslcd for LDAP lookups. Univa suggested switching to SSSD (System Security Services Daemon), as Red Hat was phasing out nslcd. The Fedora site has a good overview.

The switch to SSSD turned out to be fairly easy, with a few hidden hiccups. Running authconfig-tui, keeping the existing settings, and hitting "OK" immediately turned off nslcd and started up sssd instead. All the attendant changes were made, too: chkconfig settings and /etc/nsswitch.conf. However, I found that users could not change their passwords on the login nodes: they could log in, but the passwd command failed with "system offline".
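
For reference, the non-interactive equivalent of that authconfig-tui run is roughly:

    authconfig --enablesssd --enablesssdauth --update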

Turns out, SSSD requires an encrypted connection to the LDAP server for password changes. This is a security requirement so that the new password is not sent in the clear from the client node to the LDAP server. (See this forum post by sgallagh.) This means an SSL certificate needs to be created. A self-signed certificate will work if the following line is added to /etc/sssd/sssd.conf:

     [domain/default]
     ...
     ldap_tls_reqcert = never

To create the self-signed cert:
     root # cd /etc/pki/tls/certs
     certs # make slapd.pem
     certs # chown ldap:ldap slapd.pem

Then, edit /cm/local/apps/openldap/etc/slapd.conf to add the following lines:
    TLSCACertificateFile /etc/pki/tls/certs/ca-bundle.crt
    TLSCertificateFile /etc/pki/tls/certs/slapd.pem
    TLSCertificateKeyFile /etc/pki/tls/certs/slapd.pem

Also, make sure there is an access section like the following; my config did not originally grant access to shadowLastChange:
    access to attrs=loginShell,shadowLastChange
        by group.exact="cn=rogroup,dc=cm,dc=cluster" read
        by self write
        by * read

Then, restart the ldap service.
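
To confirm from a client node that the server now negotiates TLS (the hostname below is a placeholder; the base DN is the one from the ACL above, and the TLS_REQCERT setting from the update at the top lets the self-signed certificate pass):

    # -ZZ makes the search fail unless STARTTLS succeeds
    ldapsearch -x -ZZ -H ldap://master.cm.cluster -b "dc=cm,dc=cluster" -s base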

UPDATE Adding some links to official Red Hat documentation: https://access.redhat.com/solutions/42746

2014-04-23

Grid Engine Job Submission Verifier (JSV) using Go

UPDATE 2018-01-11 Please see the updated post here.

UPDATE Well, this does not quite work the way I want, due to my sketchy understanding of core binding. It looks like core binding only works for jobs confined to a single execute host. If a job spans more than one, the "-binding striding:32:1" option will prevent the job from running on two nodes with 16 slots each. The correct option should be "-binding striding:16:1".

I have a job which wants 32 slots, which can only be satisfied by using 2 hosts with 16 slots each. If I set "-binding striding:32:1", the job fails to be scheduled because "cannot run on host ... because it offers only 16 core(s), but 32 needed".

What seems to work is to specify only the number of cores available per host, i.e. "-binding striding:16:1". Or, perhaps, "-binding pe striding:16:1".
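
As a sketch (the PE name is illustrative), a 32-slot job spanning two 16-core hosts would then be submitted along these lines:

    # request 32 slots, but bind only as many cores as a single host offers
    qsub -pe mvapich 32 -binding striding:16:1 myjob.sh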

Daniel Gruber at Univa wrote a Go API for Univa Grid Engine job submission verifiers (JSV). His testing indicated that it was a fair bit faster than Tcl or Perl, the recommended JSV languages for a production environment.

I decided it was as good a time as any to dabble a little in Go, seeing as I had a simple problem to solve. Users occasionally make mistakes and submit parallel jobs without requesting a parallel environment (PE) with the appropriate number of slots. It could be that they missed the PE line, so a job is assigned only one slot but ends up actually using 8 (or 16, or whatever), which means the execute host(s) are oversubscribed when other jobs are also scheduled on those same hosts.

I also wanted to take advantage of Univa's new support for cgroups in order to make sure jobs are restricted in terms of their CPU and memory usage. It also helps with process cleanup when the jobs complete (cleanly or not).

This is pretty straightforward to do: check the job's qsub parameters/options and set core binding appropriately. The source is at my GitHub.

/*
 * Requires https://github.com/dgruber/jsv
 */

package main

import (
    "github.com/dgruber/jsv"
    "strings"
    "strconv"
)

func jsv_on_start_function() {
    //jsv_send_env()
}

func job_verification_function() {

    //
    // Prevent jobs from accidental oversubscription
    //
    const intel_slots, amd_slots = 16, 64

    var modified_p bool = false
    if !jsv.JSV_is_param("pe_name") {
        jsv.JSV_set_param("binding_strategy", "linear_automatic")
        jsv.JSV_set_param("binding_type", "set")
        jsv.JSV_set_param("binding_amount", "1")
        jsv.JSV_set_param("binding_exp_n", "0")
        modified_p = true
    } else {
        if !jsv.JSV_is_param("binding_strategy") {
            var pe_max int
            var v string
            v, _ = jsv.JSV_get_param("pe_max")
            pe_max, _ = strconv.Atoi(v)

            // hard queue request looks like "*@@hostgroup"; keep the "@hostgroup" part
            var hostlist string
            hostlist, _ = jsv.JSV_get_param("q_hard")
            if strings.Contains(hostlist, "@") {
                hostlist = strings.SplitAfterN(hostlist, "@", 2)[1]
            }

            jsv.JSV_set_param("binding_strategy", "striding_automatic")
            jsv.JSV_set_param("binding_type", "pe")

            if strings.EqualFold("@intelhosts", hostlist) {
                if pe_max < intel_slots {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(pe_max))
                } else {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(intel_slots))
                }
            } else if strings.EqualFold("@amdhosts", hostlist) {
                if pe_max < amd_slots {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(pe_max))
                } else {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(amd_slots))
                }
            }

            jsv.JSV_set_param("binding_step", "1")
            modified_p = true
        }
    }

    if modified_p {
        jsv.JSV_correct("Job was modified")

        // show qsub params
        jsv.JSV_show_params()
    }

    return
}

/* example JSV 'script' */
func main() {
    jsv.Run(true, job_verification_function, jsv_on_start_function)
}
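
To put the compiled JSV into service, register it server-side in the global cluster configuration, or pass it per submission for testing; the paths here are illustrative:

    # server-side: qconf -mconf, then set
    jsv_url script:/opt/uge/jsv/jsv_binding

    # client-side, for testing a single submission
    qsub -jsv /opt/uge/jsv/jsv_binding ...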

2014-03-25

Exclusive host access in Grid Engine

UPDATE - 2014-03-28: Univa Grid Engine 8.1.7 (and possibly earlier) has a simpler way to set this up. One just needs to define the "exclusive" complex, without setting up a separate exclusive queue with a forced complex:

#name       shortcut   type  relop   requestable consumable default  urgency 
exclusive   excl       BOOL  EXCL    YES         YES        0        1000

Unfortunately, I just discovered a slight deficiency with this approach: the complex must be attached to specific hosts, which means modifying each exec host using "qconf -me hostname".
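
Each host's configuration then gets (or extends) a complex_values entry, something like:

    complex_values    exclusive=true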

ORIGINAL POST BELOW:
Here at Drexel's URCF, we use Univa Grid Engine (Wikipedia article). One of the requirements that frequently comes up is for jobs to have exclusive access to the compute hosts that the jobs occupy. A common reason is that a job may need more memory per process than is available on any single host.

Some resource managers and schedulers like Torque allow one to reserve nodes exclusively. In Grid Engine (GE), it is not a built-in feature. However, there is a way to accomplish the same thing. This post expands a little on Dan Gruber's blog post, for people like me who are new to GE.

Here, we assume there is only one queue, named all.q, and two host groups: @intelhosts and @amdhosts.

One can create a Boolean resource, a.k.a. complex, named "exclusive", which can be requested. That resource is forced to have the value TRUE in a new queue called exclusive.q, so that only jobs that request "exclusive" will be sent to that queue.

#name       shortcut   type  relop   requestable consumable default  urgency
exclusive   excl       BOOL  ==      FORCED      NO         0        0

Once the complex is created, create a new queue named exclusive.q that spans all hosts, with a single slot per host. Add all.q to its subordinate_list -- this means that if there is a job in exclusive.q on a host, all.q on that host is suspended. Also set the "exclusive" Boolean complex to TRUE.


    qname     exclusive.q
    ...
    slots     1
    subordinate_list  all.q=1
    complex_values exclusive=TRUE
    ...

Modify all.q and add exclusive.q to its subordinate_list -- this ensures that if there are any jobs in all.q on a host, exclusive.q on that host is suspended:

    subordinate_list    exclusive.q=1
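
The corresponding qconf invocations, each of which opens an editor, would be something like:

    qconf -mc                 # add the "exclusive" complex line
    qconf -aq exclusive.q     # create the new queue spanning all hosts
    qconf -mq all.q           # add exclusive.q=1 to its subordinate_list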

And now, to get a parallel job on Intel hosts, the job script would have something like this:

    #$ -l excl 
    #$ -q *@@intelhosts
    #$ -pe mvapich 128