2014-04-23

Grid Engine Job Submission Verifier (JSV) using Go

UPDATE 2018-01-11 Please see the updated post here.

UPDATE Well, this does not quite work the way I want, due to my sketchy understanding of core binding. It looks like core binding only works for jobs confined to a single execute host. If a job spans more than one, the "-binding striding:32:1" option will prevent the job from running on two nodes with 16 slots each. The correct option should be "-binding striding:16:1".

I have a job which wants 32 slots, which can only be satisfied by using 2 hosts with 16 slots each. If I set "-binding striding:32:1", the job fails to be scheduled because "cannot run on host ... because it offers only 16 core(s), but 32 needed".

What seems to work is to specify only the number available per host, i.e. "-binding striding:16:1", or perhaps "-binding pe striding:16:1".
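
For example (the PE name and job script are placeholders), a 32-slot job meant to span two 16-core hosts would be submitted with something like:

    qsub -pe mvapich 32 -binding pe striding:16:1 myjob.sh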

Daniel Gruber at Univa wrote a Go API for Univa Grid Engine job submission verifiers (JSVs). His testing indicated that it was a fair bit faster than Tcl or Perl, the recommended JSV languages for a production environment.

I decided it was as good a time as any to dabble a little in Go, since I had a simple problem to solve. Users occasionally make mistakes and submit parallel jobs without requesting a parallel environment with the appropriate number of slots. It could be that they missed the PE line, so a job is assigned only one slot but actually ends up using 8 (or 16, or whatever). This means the execute host(s) become over-subscribed when other jobs are also scheduled on those same hosts.
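
To illustrate (the program and PE names here are hypothetical): an 8-thread program submitted as a serial job,

    qsub -b y ./my_threaded_app

gets one slot from the scheduler but uses 8 cores at run time, whereas it should have been submitted with a parallel environment requesting 8 slots:

    qsub -pe shm 8 -b y ./my_threaded_app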

I also wanted to take advantage of Univa's new support for cgroups in order to make sure jobs are restricted in terms of their CPU and memory usage. It also helps with process cleanup when the jobs complete (cleanly or not).

This is pretty straightforward to do: check the job's qsub parameters/options, and set the binding appropriately. The source is at my GitHub.

/*
 * Requires https://github.com/dgruber/jsv
 */

package main

import (
    "strconv"
    "strings"

    "github.com/dgruber/jsv"
)

func jsv_on_start_function() {
    //jsv_send_env()
}

func job_verification_function() {

    //
    // Prevent jobs from accidental oversubscription
    //
    // Cores available per host in the @intelhosts and @amdhosts host groups
    const intel_slots, amd_slots = 16, 64

    var modified_p bool = false
    if !jsv.JSV_is_param("pe_name") {
        // No parallel environment requested: treat as a serial job and
        // bind it to a single core
        jsv.JSV_set_param("binding_strategy", "linear_automatic")
        jsv.JSV_set_param("binding_type", "set")
        jsv.JSV_set_param("binding_amount", "1")
        jsv.JSV_set_param("binding_exp_n", "0")
        modified_p = true
    } else {
        if !jsv.JSV_is_param("binding_strategy") {
            // Largest slot count requested for the parallel environment
            var pe_max int
            var v string
            v, _ = jsv.JSV_get_param("pe_max")
            pe_max, _ = strconv.Atoi(v)

            // The hard queue request is expected to look like "*@@intelhosts";
            // keep only the "@@hostgroup" portion (and avoid a panic when
            // there is no "@" in the request)
            var hostlist string
            hostlist, _ = jsv.JSV_get_param("q_hard")
            if parts := strings.SplitAfterN(hostlist, "@", 2); len(parts) == 2 {
                hostlist = parts[1]
            }

            jsv.JSV_set_param("binding_strategy", "striding_automatic")
            jsv.JSV_set_param("binding_type", "pe")

            // Bind at most the number of cores physically available on a
            // single host of the requested host group
            if strings.EqualFold("@intelhosts", hostlist) {
                if pe_max < intel_slots {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(pe_max))
                } else {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(intel_slots))
                }
            } else if strings.EqualFold("@amdhosts", hostlist) {
                if pe_max < amd_slots {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(pe_max))
                } else {
                    jsv.JSV_set_param("binding_amount", strconv.Itoa(amd_slots))
                }
            }

            jsv.JSV_set_param("binding_step", "1")
            modified_p = true
        }
    }

    if modified_p {
        jsv.JSV_correct("Job was modified")

        // show qsub params
        jsv.JSV_show_params()
    }

    return
}

/* example JSV 'script' */
func main() {
    jsv.Run(true, job_verification_function, jsv_on_start_function)
}
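
To put the JSV into service (the paths and file names below are just examples), build the binary and point the cluster's jsv_url at it:

    go build -o /opt/uge/jsv/jsv_binding jsv_binding.go
    qconf -mconf    # set: jsv_url script:/opt/uge/jsv/jsv_binding

It can also be tried out client-side first with "qsub -jsv /opt/uge/jsv/jsv_binding ..." before enabling it globally.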

2014-03-25

Exclusive host access in Grid Engine

UPDATE - 2014-03-28: Univa Grid Engine 8.1.7 (and possibly earlier) has a simpler way to set this up. One just needs to define the "exclusive" complex, without setting up a separate exclusive queue with a forced complex:

#name       shortcut   type  relop   requestable consumable default  urgency 
exclusive   excl       BOOL  EXCL    YES         YES        0        1000

Unfortunately, I just discovered a slight deficiency with this approach: the complex must be attached to specific hosts, which means modifying each exec host using "qconf -me hostname".
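
For example (the host name is a placeholder), "qconf -me node001" gets a line like this added to the host's configuration:

    complex_values        exclusive=true

For many hosts, something like "qconf -aattr exechost complex_values exclusive=true node001" in a loop should do the same without an editor.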

ORIGINAL POST BELOW:
Here at Drexel's URCF, we use Univa Grid Engine (Wikipedia article). One of the requirements that frequently comes up is for jobs to have exclusive access to the compute hosts that the jobs occupy. A common reason is that a job may need more memory per process than is available on any single host.

Some resource managers and schedulers like Torque allow one to reserve nodes exclusively. In Grid Engine (GE), it is not a built-in feature. However, there is a way to accomplish the same thing. This post expands a little on Dan Gruber's blog post, for people like me who are new to GE.

Here, we assume there is only one queue, named all.q, and two host groups: @intelhosts and @amdhosts.

One can create a Boolean resource, a.k.a. complex, named "exclusive", which can be requested. That resource is forced to have the value TRUE in a new queue called exclusive.q, so that only jobs that request "exclusive" will be sent to that queue.

#name       shortcut   type  relop   requestable consumable default  urgency
exclusive   excl       BOOL  ==      FORCED      NO         0        0
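
These lines go into the complex configuration, which is edited with:

    qconf -mc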

Once the complex is created, create a new queue named exclusive.q which spans all hosts, with a single slot per host. Set it to be subordinate to all.q -- this means that if there are any jobs in all.q on a host, exclusive.q on that host is suspended. And set the "exclusive" Boolean complex to be TRUE.


    qname     exclusive.q
    ...
    slots     1
    subordinate_list  all.q=1
    complex_values exclusive=TRUE
    ...
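
The queue itself is created with the following (the comment notes what to set in the editor that comes up):

    qconf -aq    # set qname to exclusive.q, plus the values above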

Modify all.q and set it subordinate to exclusive.q -- this ensures that if there is a job in exclusive.q on a host, all.q on that host is suspended:

    subordinate_list    exclusive.q=1
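
That line is added to the existing all.q configuration, e.g.:

    qconf -mq all.q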

And now, to get exclusive access for a parallel job on Intel hosts, the job script would include something like this:

    #$ -l excl 
    #$ -q *@@intelhosts
    #$ -pe mvapich 128
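
The same request from the command line (the job script name is a placeholder) would be:

    qsub -l excl -q '*@@intelhosts' -pe mvapich 128 myjob.sh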

2014-02-19

Missing Adobe fonts

If you try to start up a remote X11 application to display on your Ubuntu machine and get errors complaining about missing fonts like:
-adobe-helvetica-medium-r-*-*-10-*-*-*-*-*-iso8859-1
here's how to fix it.

You have to install not just fonts, but also the font server. First, the font server:
sudo apt-get install xfs xfstt 
and the fonts:
sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi
And then, restart X11 on your Ubuntu machine, or just reboot.

How to get "normal" scrollbars in Ubuntu Unity

I hate the little scrollbar in Ubuntu because, on a laptop, it's harder to get precise control of the cursor to hit the few-pixel-wide area (the orange bit in the first image below).

To get "normal" scrollbars, type this command:
gsettings set com.canonical.desktop.interface scrollbar-mode normal



Should you want to get the Ubuntu-style scrollbar back, reset the parameter:
gsettings reset com.canonical.desktop.interface scrollbar-mode 

2014-01-27

Nice online resource plus ebook for MPI

Wes Kendall from the University of Tennessee, Knoxville has produced a website showing how to code in MPI, including how to set up an MPI cluster on Amazon EC2. For portability, the site content is also available as a Kindle ebook for only US$5.

2014-01-15

Scripting Bright Cluster Manager

At my new position as Sr. SysAdmin at Drexel's University Research Computing Facility (URCF), we use Bright Cluster Manager. I am new to Bright, and I am finding it very nice, indeed. One of its best features is programmatic access via a Python API. In about half an hour, I figured out enough to modify the node categories of all the nodes in the cluster.

Node categories group nodes which have similar configurations and roles. An example configuration item may be a list of remote filesystem mounts, and an example role may be a Grid Engine compute node with 64 job slots. The cluster at URCF has 64-core AMD nodes and 16-core Intel nodes, so I created a category for each. Then, I needed to change the node categories from the default to the architecture-specific categories. The script below did it for the Intel nodes.

Google Blogger/Blogspot is a second-class citizen

I just browsed through old posts and realized that all the embedded images are now gone. This happened, I am guessing, due to the integration of Google+ with Blogger. Some of my old posts are now fairly useless without the images. Grrr.