
2021-07-16

Podman and Shorewall

Bright Cluster Manager uses Shorewall to manage the various firewall rules on the head/management node. By default, this seems to prevent Podman and Docker from working right.

I was working through a simple example of running a pod with PostgreSQL and pgAdmin, but connections to the host port that forwards to the pgAdmin container were blocked: connection attempts using both curl and web browsers would hang.

Some additional configuration is needed for Shorewall to work with Podman. Shorewall has instructions on making it work with Docker, and those instructions work for Podman with minor modifications.

First, modify the systemd service to not clear firewall rules on service stop. Do:

sudo systemctl edit shorewall.service

which gives a blank file. Add these contents:

[Service]
# reset ExecStop
ExecStop=
# set ExecStop to "stop" instead of "clear"
ExecStop=/sbin/shorewall $OPTIONS stop

Then activate the changes with

sudo systemctl daemon-reload
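
To confirm that the override took effect, you can view the merged unit definition:

sudo systemctl cat shorewall.service

The output should end with the [Service] override containing the new ExecStop line.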

Next, we need to know the name of the Podman network interface; use "ip link list" to see it. On my RHEL 8 system, the interface is:

10: cni-podman0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000

    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

Then, make the following modifications to the appropriate config files.

Enable Docker mode in /etc/shorewall/shorewall.conf:

DOCKER=Yes

Define a zone for Podman in /etc/shorewall/zones:

#ZONE    TYPE    OPTIONS

pod     ipv4    # 'pod' is just an example -- call it anything you like

Define policies for this zone in /etc/shorewall/policy:

#SOURCE        DEST        POLICY        LEVEL 

pod            $FW          REJECT

pod            all          ACCEPT

And match the zone to the interface in /etc/shorewall/interfaces:

# Need to specify "?FORMAT 2" 

?FORMAT 2

#ZONE  INTERFACE    OPTIONS

pod    cni-podman0  bridge   # Allow ICC (inter-container communication); bridge implies routeback=1

Then restart Shorewall, and start the pod (or restart it, if it was already running).
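
For example (assuming a pod named "mypod" -- substitute the name of your own pod):

sudo systemctl restart shorewall
podman pod restart mypod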

You may need additional rules to allow an external host to connect into the pod. For example, consider a pod containing a pgAdmin container and a PostgreSQL container, where the pgAdmin container serves on port 80, and say your administrative hosts are in the address block 10.0.10.0/23. Then, add the following to /etc/shorewall/rules:

# Accept connections from admin hosts to the pgadmin container

# ACTION  SOURCE              DEST   PROTO   DEST

#                                            PORT(S)

ACCEPT    net:10.0.10.0/23    pod    tcp     80
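
After editing the rules file, it is worth validating the configuration before applying it; Shorewall has a built-in checker:

sudo shorewall check
sudo shorewall restart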

2021-01-28

MediaWiki with PostgreSQL using Buildah and Podman on RHEL7

NOTE I started working on this some months ago, and then had to stop due to other things coming up, so this is an incomplete example. Most of it is tested and should work, but I suggest working through it in a throwaway VM as an instructional exercise. I am posting it as-is, mainly for my own reference; I may update it as I get time to work through it in more detail.

This is a “port” of the Examining container performance on RHEL 8 with PCP and pmda-podman example at the Red Hat Blog to RHEL7. Except that the focus here will be more on getting PostgreSQL, Apache, and MediaWiki running, rather than the performance analysis.

Performance Co-Pilot (PCP) on RHEL7 does not seem to provide a Podman monitoring feature (pmda-podman), so we will not be doing that part of the example.

The Red Hat example uses RHEL8, and there are enough differences with RHEL7 that the Red Hat example cannot be used directly.

We also update to MediaWiki 1.34.2 since the 1.32 series is no longer supported.

Red Hat has a podman command line reference. (It's part of their RHEL8 documentation.) For an overview of Podman and Buildah, this post at the Red Hat Developers Blog is good.

What we will run:
  • RHEL 7.8
  • PostgreSQL 9.2.24-4.el7_8
  • Apache 2.4 (via Red Hat Software Collections)
  • PHP 7.3 (required by MediaWiki; via Red Hat Software Collections)
  • MediaWiki 1.34.2
There will be two containers:
  • one with PostgreSQL
  • another with Apache, PHP, and MediaWiki
NOTE do not use tmux on your host machine to work through this example since we will need to use tmux in one of the containers. But if you know how to handle nested tmux sessions, go for it.

CAUTION there are official PostgreSQL container images from Red Hat. They should be already set up such that the kluges below (modifying the postgresql-setup script, and PostgreSQL config files) should not be necessary. See this one for PostgreSQL 10 on RHEL8. Do "podman search postgresql" to see what is available.

In the following, the prompts will indicate which machine or container we are on: the host machine will have a prompt "[root@host ~]#" The containers will have some arbitrary string of hexadecimal digits as the hostname. However, for clarity, this example will use the container names, instead.

OUTLINE

  • Build two local images with buildah: one for PostgreSQL, one for Apache + PHP-FPM +  MediaWiki
  • Run containers using local images
  • Cleanup

BEFORE WE BEGIN

Here is a quick list of some of the commands that will be run, in the order of a typical workflow: get an image and create a working container, show all working containers, remove a container, and remove an image:
  • container=$( buildah from image_url )
  • buildah containers
  • buildah rm $container
  • buildah rmi image_id

BUILD CONTAINERS

First of all, install buildah to manage container images, and podman to run them.

[root@host ~]# yum install buildah podman

Login to the Red Hat container registry -- you must have an existing Red Hat account:

[root@host ~]# buildah login registry.redhat.io

Logging in to the container registry allows us to download base images which our local images will be based on.

Our containers will use the RHEL7 image registry.access.redhat.com/rhel7 as a starting point.

PostgreSQL

We create a container based on the rhel7 image. Then, copy the repo file from the host to the image, and install postgresql-server (plus a few other packages).

[root@host ~]# container=$(buildah from registry.access.redhat.com/rhel7)
[root@host ~]# echo $container
rhel7-working-container
[root@host ~]# buildah copy $container /etc/yum.repos.d/redhat.repo \
    /etc/yum.repos.d/redhat.repo
1f302312276b6f60ca1189181159d8c8eba378d3ff76a6aff651220c8f8250f2


Run a shell in the container to install PostgreSQL and some other packages:
 
[root@host ~]# buildah run $container /bin/bash
[root@psql /]# yum -y install postgresql-server tmux psmisc nc vim
...  
Complete!
[root@psql /]# yum -y update
Loaded plugins: ovl, product-id, search-disabled-repos, subscription-manager
No packages marked for update
[root@psql /]# yum clean all
Loaded plugins: ovl, product-id, search-disabled-repos, subscription-manager
Cleaning repos: rhel-7-server-extras-rpms rhel-7-server-optional-rpms rhel-7-server-rpms rhel-server-rhscl-7-rpms


Next, make a copy of the postgresql-setup script and modify the copy, because the container will not be using systemd. (In general, systemd cannot run in containers.)

[root@psql /]# cp /usr/bin/postgresql-setup \
    /usr/bin/postgresql-setup2

Edit /usr/bin/postgresql-setup2: comment out (or delete) lines 111-113, which define the PGDATA variable. In their place, add:

PGDATA=/var/lib/pgsql/data

This defines the location of the PostgreSQL config and data files.

Next, comment out (or delete) lines 119-121, which define the PGPORT variable. Replace them with this at line 122:

PGPORT=5432

This defines the port number that PostgreSQL will respond on.
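
As a quick sanity check of both edits (this assumes the replacement assignments start at column 0):

[root@psql /]# grep -E '^PG(DATA|PORT)=' /usr/bin/postgresql-setup2
PGDATA=/var/lib/pgsql/data
PGPORT=5432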

Then, as the "postgres" user, do the PostgreSQL setup:

[root@psql /]# su - postgres
-bash-4.2$ /usr/bin/postgresql-setup2 initdb  
Initializing database ... OK
-bash-4.2$ exit

Fix up the PostgreSQL server config: modify the authentication method, and the network addresses on which to listen:

[root@psql /]# sed -i 's/^host/#host/' /var/lib/pgsql/data/pg_hba.conf
[root@psql /]# echo "host all all all md5" >> /var/lib/pgsql/data/pg_hba.conf
[root@psql /]# echo "listen_addresses = '*'" >> /var/lib/pgsql/data/postgresql.conf
[root@psql /]# exit    # exit container

On the host, configure the PostgreSQL container to run postmaster as the postgres user on startup:

[root@host ~]# buildah config --cmd "su - postgres -c \
        '/usr/bin/postmaster -D /var/lib/pgsql/data'" $container

Commit the image to the local repository, as “localhost/postgres-test”:

[root@host ~]# buildah commit $container localhost/postgres-test
Getting image source signatures
Copying blob cacea99e9a8c skipped: already exists
Copying blob f15a9d9f7ab3 skipped: already exists
Copying blob d3e8e97ad524 done
Copying config 7614d3233c done
Writing manifest to image destination
Storing signatures
7614d3233c71651cfba0ba4aa149424dd349db55bee18cf762aef7b37e691a31

See a list of images -- the one just created should appear:

[root@host ~]# buildah images
REPOSITORY                         TAG      IMAGE ID       CREATED              SIZE
localhost/postgres-test            latest   8d75ec494b55   About a minute ago   340 MB
registry.access.redhat.com/rhel7   latest   1a9b6d0a58f8   6 weeks ago          215 MB

Run the newly-created container detached, i.e. in the background:

[root@host ~]# podman run -p 5432:5432 --name psql \
    --hostname psql --detach postgres-test
...outputs container id...

Check that it is running:
 
[root@host ~]# podman ps
CONTAINER ID  IMAGE                           COMMAND               CREATED        STATUS            PORTS                   NAMES
8651efee175f  localhost/postgres-test:latest  su - postgres -c ...  4 seconds ago  Up 4 seconds ago  0.0.0.0:5432->5432/tcp  psql

“Login” to the running psql container and set up PostgreSQL account and db for the wiki:

[root@host ~]# podman exec --interactive --tty psql bash
[root@psql ~]# su - postgres
[postgres@psql ~]$ createuser -S -D -R -P -E wikiuser # remember the password you use here
[postgres@psql ~]$ createdb -O wikiuser wikidb
[postgres@psql ~]$ exit # exit user postgres
[root@psql ~]# exit # exit container

Now, from the host system, connect to the running container’s PostgreSQL server to check the setup. The PostgreSQL server is in a container, with the host’s port 5432 mapped to it, so connect to 127.0.0.1. Remember the db name (wikidb), the db user name (wikiuser), and the password that you used.

[root@host ~]# psql -h 127.0.0.1 -W wikidb wikiuser
Password for user wikiuser: 
psql (9.2.24)
Type "help" for help.

wikidb=> 

That is all for the PostgreSQL setup.

Apache HTTPD, PHP, and MediaWiki

Next, make another container for Apache + PHP + MediaWiki. This runs httpd and php-fpm in the same container; it should also be possible to run php-fpm in a separate container.

[root@host ~]# container=$( buildah from \
        registry.access.redhat.com/rhel7)
[root@host ~]# echo $container
rhel7-working-container-1

MediaWiki requires PHP >= 7.2.9. However, it is NOT compatible with PHP 7.4.0 to 7.4.2 due to an upstream issue.

Because we need PHP 7, we could get it from EPEL.  You can copy the epel.repo file just as you did with the redhat.repo file in the PostgreSQL container, above.

Alternatively, install from Red Hat Software Collections. This makes things a little more complicated than using EPEL, but not terribly so. Some guidance here. To do this, we also need to use httpd24 from the Software Collections.

[root@host ~]# buildah copy $container /etc/yum.repos.d/redhat.repo \
    /etc/yum.repos.d/redhat.repo
1f302312276b6f60ca1189181159d8c8eba378d3ff76a6aff651220c8f8250f2

Do this if you want to use EPEL:

[root@host ~]# buildah copy $container /etc/yum.repos.d/epel.repo \
    /etc/yum.repos.d/epel.repo
15a7fc2ebe4c5260256294d2c890bc1ccb5f8097b1a25aa0c38f9b996fa5fc5b

Run bash inside the container, and install Apache plus some utilities (PHP and MediaWiki follow below); httpd24 is the Apache httpd 2.4 from the Software Collections:

[root@host ~]# buildah run $container -- /usr/bin/bash
[root@apache /]# yum install -y  wget less procps-ng lsof psmisc \
    tmux openssl httpd24 httpd24-httpd httpd24-mod_ssl

Install PHP-7.3 from Software Collections: 

[root@apache /]# yum install -y rh-php73 rh-php73-php \
    rh-php73-php-gd rh-php73-php-gmp rh-php73-php-intl \
    rh-php73-php-mbstring rh-php73-php-pgsql rh-php73-php-opcache \
    rh-php73-php-fpm

Check PHP version:

[root@apache tmp]# scl enable rh-php73 /bin/bash
[root@apache tmp]# which php
/opt/rh/rh-php73/root/usr/bin/php
[root@apache tmp]# php --version
PHP 7.3.11 (cli) (built: Oct 31 2019 08:30:29) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.11, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.11, Copyright (c) 1999-2018, by Zend Technologies


Update the tzdata package to address a possible bug:

[root@apache tmp]# yum update -y tzdata

Download and install MediaWiki into /opt/rh/httpd24/root/var/www/html/testwiki:

[root@apache tmp]# wget https://releases.wikimedia.org/mediawiki/1.34/mediawiki-1.34.2.tar.gz
[root@apache tmp]# cd /opt/rh/httpd24/root/var/www/html
[root@apache tmp]# tar xvf /tmp/mediawiki-1.34.2.tar.gz
[root@apache tmp]# mv mediawiki-1.34.2 testwiki
[root@apache tmp]# exit  # exits the rh-php73 environment
[root@apache tmp]# exit  # exits the container

Commit the container image as apache-test:

[root@host ~]# buildah commit $container localhost/apache-test

Here, we will break from the Red Hat Blog example. That example runs httpd and php-fpm in the foreground; here, we will run them in the background.

But first, SSL setup. As with the PostgreSQL container, systemctl cannot be used. Normally, the first time systemd starts up Apache, it also generates self-signed SSL certs; we need to do this manually. Enter appropriate information when prompted:

[root@host ~]# buildah run $container -- /usr/bin/bash
[root@apache ~]# openssl req -new -newkey rsa:4096 > new.cert.csr
[root@apache ~]# openssl rsa -in privkey.pem -out new.cert.key
[root@apache ~]# openssl x509 -in new.cert.csr -out /etc/pki/tls/certs/localhost.crt \
-req -signkey new.cert.key -days 730
[root@apache ~]# cp new.cert.key /etc/pki/tls/private/localhost.key

The same commands again, this time with prompts and output:

[root@apache ~]# openssl req -new -newkey rsa:4096 > new.cert.csr
Generating a 4096 bit RSA private key
.............................++
......................................................................................................................................................................................................................................................++
writing new private key to 'privkey.pem'
Enter PEM pass phrase: ***
Verifying - Enter PEM pass phrase: ***
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:California
Locality Name (eg, city) [Default City]:Riverside
Organization Name (eg, company) [Default Company Ltd]:ACME Corp.
Organizational Unit Name (eg, section) []:IT
Common Name (eg, your name or your server's hostname) []:myservername
Email Address []:web@acmecorp.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
[root@apache /]# openssl rsa -in privkey.pem -out new.cert.key
Enter pass phrase for privkey.pem:
writing RSA key
[root@apache /]# openssl x509 -in new.cert.csr -out /etc/pki/tls/certs/localhost.crt \
> -req -signkey new.cert.key -days 730
Signature ok
subject=/C=US/ST=California/L=Riverside/O=ACME Corp./OU=IT/CN=myservername/emailAddress=web@acmecorp.com
Getting Private key
[root@apache /]# cp new.cert.key /etc/pki/tls/private/localhost.key
cp: overwrite '/etc/pki/tls/private/localhost.key'? y
[root@apache /]# exit

Commit changes to the image:

[root@host /]# buildah commit $container localhost/apache-test

Next, start up httpd without daemonizing, and php-fpm (FastCGI Process Manager). Run a shell on apache-test, mapping the http and https ports. And, in that shell, use tmux to manage the two terminal sessions, one for each process.

[root@host /]# podman run -p 80:80 -p 443:443 -it --name apache --hostname apache apache-test /usr/bin/bash
[root@apache /]# tmux
[root@apache /]# scl enable httpd24 /bin/bash
[root@apache /]# which httpd
/opt/rh/httpd24/root/usr/sbin/httpd
[root@apache /]# httpd -DFOREGROUND
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.88.0.9. Set the 'ServerName' directive globally to suppress this message

Make a note of that IP address 10.88.0.9. (We may or may not need this.)

Create a new terminal window to deal with php-fpm: type "Ctrl-b" then "c". Then, run php-fpm

[root@apache /]# scl enable rh-php73 /bin/bash
[root@apache /]# mkdir /run/php-fpm
[root@apache /]# php-fpm --nodaemonize
[22-Jun-2020 20:08:10] NOTICE: fpm is running, pid 120
[22-Jun-2020 20:08:10] NOTICE: ready to handle connections
[22-Jun-2020 20:08:10] NOTICE: systemd monitor interval set to 10000ms


(What follows is the unfinished part, as noted at the top of this post.)

To configure an entrypoint which runs more than one executable, we need to write a wrapper script; in our case, it must run both httpd and php-fpm. (Docker's documentation has an example of such a wrapper script.) Note that this is not the recommended way of doing things, which would be to have separate containers for httpd and php-fpm. The two executables are:

/opt/rh/httpd24/root/usr/sbin/httpd

/opt/rh/rh-php73/root/usr/sbin/php-fpm
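
A minimal sketch of such a wrapper, assuming we save it as /usr/local/bin/start.sh (a hypothetical path) inside the image:

#!/bin/bash
# start.sh -- run php-fpm in the background, then run httpd in the
# foreground so that httpd keeps the container alive
mkdir -p /run/php-fpm
scl enable rh-php73 -- php-fpm --daemonize
exec scl enable httpd24 -- httpd -DFOREGROUND

Copy it into the container, make it executable, and point the entrypoint at it:

[root@host ~]# buildah copy $container start.sh /usr/local/bin/start.sh
[root@host ~]# buildah run $container -- chmod +x /usr/local/bin/start.sh
[root@host ~]# buildah config --entrypoint '["/usr/local/bin/start.sh"]' $container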

Configure the entrypoint to run httpd24 and php-fpm. Note that an entrypoint is a single command, so the two-command list below does not actually work (the container exits immediately, as the "podman ps -a" output further down shows); the wrapper script sketched above is the way to do it:

[root@host tmp]# buildah config --entrypoint '["scl enable httpd24 /usr/sbin/httpd -DFOREGROUND", "scl enable rh-php73 php-fpm --nodaemonize"]' $container

Run: 

[root@host tmp]# podman run -p 80:80 -p 443:443 --name apache --hostname apache --detach apache-test
708bece1d46225f8628a680516df00ce66921673df4e1cc50f2953053f04af70
[root@host tmp]# podman ps -a
CONTAINER ID  IMAGE                           COMMAND               CREATED        STATUS                    PORTS                   NAMES
708bece1d462  localhost/apache-test:latest    /bin/bash             4 seconds ago  Exited (0) 3 seconds ago  0.0.0.0:80->80/tcp      apache
8651efee175f  localhost/postgres-test:latest  su - postgres -c ...  2 weeks ago    Up 2 weeks ago            0.0.0.0:5432->5432/tcp  psql


Open a new terminal on the host machine to examine the running containers:


[root@host ~]# podman ps
CONTAINER ID  IMAGE                           COMMAND               CREATED            STATUS                PORTS                   NAMES
e1b1270c4745  localhost/apache-test:latest    /usr/bin/bash         25 minutes ago     Up 25 minutes ago     0.0.0.0:80->80/tcp      apache
8651efee175f  localhost/postgres-test:latest  su - postgres -c ...  About an hour ago  Up About an hour ago  0.0.0.0:5432->5432/tcp  psql

Try to connect to the web server. Launch a web browser on another machine (your PC, or anything other than the host machine), and connect to the host machine, ignoring the self-signed certificate warnings:

    https://host.acmecorp.com

podman will have automatically opened ports in the firewall.

For the MediaWiki container to connect to the PostgreSQL container, the PostgreSQL container's IP address needs to be known. Find it by doing:

[root@host]# podman inspect psql | egrep "10\."
            "Gateway": "10.88.0.1",
            "IPAddress": "10.88.0.8",

So, the "psql" container's IP is 10.88.0.8. We will need this address for the Mediawiki setup in the next step. Also, leave the port (5432) the same.

Now, fire up a web browser on the host machine itself, and browse to it. The httpd running in the container will respond, since we mapped the appropriate http/https container ports to the host ports:

     https://host.acmecorp.com/testwiki/

Follow the prompts to set up the wiki, using the psql container's IP address (10.88.0.8) as the database host. Recall the wiki db name, db user name, and the password set up above.

At the end of that, you will be able to download the LocalSettings.php file, which you will then copy to the "apache" container.



Next, we mount the "apache" container, and copy MediaWiki's LocalSettings.php file to it:

[root@host]# apachemnt=$(podman mount apache)
[root@host]# cp /location/of/LocalSettings.php $apachemnt/opt/rh/httpd24/root/var/www/html/testwiki
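
When the copy is done, unmount the container:

[root@host]# podman umount apache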

Then, in your browser, click on that "enter your wiki" link; you should see the wiki's main page.

Test that you can create a new article.

PERFORMANCE CO-PILOT
Unfortunately, RHEL7 does not seem to provide PCP for Podman. 

DELETE CONTAINERS
When containers are not running, they may be deleted. First, get their container IDs, and then delete them:

[root@host ~]# podman ps --all
CONTAINER ID  IMAGE                           COMMAND               CREATED            STATUS                PORTS                   NAMES
b67d98b97ebd  localhost/apache-test:latest    /usr/bin/bash         About an hour ago  Up About an hour ago  0.0.0.0:80->80/tcp      apache
8651efee175f  localhost/postgres-test:latest  su - postgres -c ...  3 days ago         Up 3 days ago         0.0.0.0:5432->5432/tcp  psql
[root@host ~]# podman rm CONTAINER_ID
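
For example, using the names instead of the IDs (podman rm accepts either), and stopping the containers first since they are still up:

[root@host ~]# podman stop apache psql
[root@host ~]# podman rm apache psql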

If you don't want the images that you built to hang around in your local storage, you can remove them. The "-f" option will also remove containers which use those images. (Use "buildah images" to see what images are in local storage.)

[root@host ~]# buildah images
REPOSITORY                         TAG      IMAGE ID       CREATED       SIZE
localhost/apache-test              latest   c4f284291b58   3 days ago    1.14 GB
localhost/postgres-test            latest   8d75ec494b55   3 days ago    340 MB
registry.access.redhat.com/rhel7   latest   1a9b6d0a58f8   6 weeks ago   215 MB

[root@host ~]# buildah rmi -f IMAGE_ID




2019-11-20

Even more about SSSD + PAM + LDAP -- password is still expired even right after being changed by user

This keeps coming back to haunt me, partly because of patchy and disparate documentation, and partly because I do not have a rock-solid understanding of all the details of SSSD + PAM + LDAP. (Previous post.)

This is for RHEL6.

Here is the issue: upon logging in, my users kept being shown:

WARNING: Your password has expired.
You must change your password now and login again!
Changing password for foouser.
Current password:
And then it automatically logs them out, which is expected behavior.

However, when they login again (with the password that they just set), they are again presented with the same password expiration warning. This repeats ad infinitum.

When I check the OpenLDAP server, and ldapsearch for the user record, it does show that the password was changed by that user on the correct date.

The key bit that I seem to have missed: a setting in /etc/pam_ldap.conf. You have to set the secure LDAP URI, since SSSD password transmissions must be encrypted:

uri ldaps://10.9.8.7/

This should match the URI specified in /etc/openldap/ldap.conf:

URI ldaps://10.9.8.7/

And the setting in /etc/sssd/sssd.conf:

[domain/default]
...
ldap_uri = ldaps://10.9.8.7/
...
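
After making these changes, restart SSSD so they take effect (RHEL6-style init script):

# service sssd restart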

And that fixed it.

While you are at it, you might think to also specify SHA512 for the hash in /etc/pam_ldap.conf:

pam_password sha512

But I RTFMed: "sha512" is not an option for pam_password. This setting controls hashing the password locally, before passing it on to the LDAP server. The default is "clear", i.e., transmit the password in the clear to the LDAP server, and assume the LDAP server will hash it if necessary. Another option is "crypt", which uses crypt(3):

pam_password crypt

However, there does not seem to be a way to specify which hash algorithm crypt(3) should use.

I do not think this is a big issue, because the connection to the LDAP server is encrypted anyway.

Why was this a surprise? Well, because in /etc/nsswitch.conf we specified sss as a source for the passwd, shadow, and group name services:

passwd:     files sss
shadow:     files sss
group:      files sss


2014-12-29

Mellanox Infiniband network cards on Linux

Sometimes, when one updates the firmware for Mellanox Infiniband cards, the MAC/hardware address gets changed. This usually happens if the IB card is OEM, i.e. made by Mellanox but stamped with a different company's name.

When the MAC gets changed, the network interface will not come up. The fix is to update the HWADDR field in /etc/sysconfig/network-scripts/ifcfg-ib0 and /etc/sysconfig/network-scripts/ifcfg-ib1. Use "ip link list" to display the new MAC.
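
For reference, a minimal sketch of what ifcfg-ib0 might look like (the xx's are placeholders -- copy the new 20-octet IPoIB address exactly as printed by "ip link list"):

DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
HWADDR=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx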

2014-12-16

RHEL 6.4 kernel 2.6.32-358.23.2, Mellanox OFED 2.1-1.0.6, and Lustre client 2.5.0

I am planning some upgrades for the cluster that I manage. As part of the updates, it would be good to have MVAPICH2 with GDR (GPU-Direct RDMA -- yes, that's an acronym of an acronym). MVAPICH2-GDR, which is provided only as binary RPMs, only supports Mellanox OFED 2.1.

Now, our cluster runs RHEL6.4, but with most non-kernel and non-glibc packages updated to whatever is in RHEL6.5. The plan is to update everything to whatever is in RHEL6.6, except for the kernel, leaving that at 2.6.32-358.23.2 which is the last RHEL6.4 kernel update. The reason for staying with that version of the kernel is because of Lustre.

We have a Terascala Lustre filesystem appliance. The latest release of TeraOS uses Lustre 2.5.0. Upgrading the server is pretty straightforward, according to the Terascala engineers. Updating the client is a bit trickier. Currently, the Lustre support matrix says that Lustre 2.5.0 is supported only on RHEL6.4.

The plan of attack is this:

  1. Update a base node with all RHEL packages, leaving the kernel at 2.6.32-358.23.2
  2. Upgrade Mellanox OFED from 1.9 to 2.1
  3. Build lustre-client-2.5.0 and upgrade the Lustre client packages

Updating the base node is straightforward. Just use "yum update", after commenting out the exclusions in /etc/yum.conf. If you had updated the redhat-release-server-6Server package, which defines which RHEL release you have, you can downgrade it. (See the RHEL Knowledgebase; subscription required.) First, install the last (as of 2014-12-15) RHEL6.4 kernel, and then do the downgrade:
# yum install kernel-2.6.32-358.23.2.el6
# reboot
# yum downgrade redhat-release-server-6Server

Check with "cat /etc/redhat-release".

Next, install Mellanox OFED 2.1-1.0.6. You can install it directly using the provided installation script, or if you are paranoid like me, you can use the provided script to build RPMs against the exact kernel update you have installed.

Get the tarball directly from Mellanox. Extract, and make new RPMs:
# tar xf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64.tgz
# cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64
# ./mlnx_add_kernel_support.sh -m .
...
# cp /tmp/MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext.tgz .
# tar xf MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext.tgz
# cd MLNX_OFED_LINUX-2.1-1.0.6-rhel6.4-x86_64-ext
# ./mlnxofedinstall
# reboot

Strictly speaking, the reboot is unnecessary: you can stop and restart a couple of services and the new OFED will load.
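
If you want to skip the reboot, restarting the OFED driver stack should be enough -- something like this, assuming nothing is still holding the IB modules (openibd is the init script that MLNX OFED installs):

# /etc/init.d/openibd restart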

Next, for Lustre. Get the SRPM from Intel (who bought WhamCloud). You will notice that it is for kernel 2.6.32-358.18.1. Not mentioned is the fact that, by default, it builds against the generic OFED that Red Hat rolls into its distribution. To use the Mellanox OFED, a slightly different build method must be used.

# rpm -ivh lustre-client-2.5.0-2.6.32_358.18.1.el6.x86_64.src.rpm
# cd ~/rpmbuild/SOURCES
# cp lustre-2.5.0.tar.gz ~/tmp
# cd ~/tmp
# tar xf lustre-2.5.0.tar.gz
# cd lustre-2.5.0
# ./configure --disable-server --with-o2ib=/usr/src/ofa_kernel/default
# make rpms
# cd ~/rpmbuild/RPMS/x86_64
# yum install lustre-client-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-client-modules-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-client-tests-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm \
lustre-iokit-2.5.0-2.6.32_358.23.2.el6.x86_64.x86_64.rpm
To make the lustre module load at boot, I have a kludge: in /etc/init.d/netfs, right after the line

STRING=$"Checking network-attached filesystems"

add

modprobe lustre
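
A less kludgy alternative (untested here) is the standard RHEL6 mechanism of dropping an executable script into /etc/sysconfig/modules/, which rc.sysinit runs at boot:

# cat > /etc/sysconfig/modules/lustre.modules << 'EOF'
#!/bin/sh
/sbin/modprobe lustre
EOF
# chmod +x /etc/sysconfig/modules/lustre.modules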
Reboot, and then check:
# lsmod | grep lustre
lustre                921744  0
lov                   516461  1 lustre
mdc                   199005  1 lustre
ptlrpc               1295397  6 mgc,lustre,lov,osc,mdc,fid
obdclass             1128062  41 mgc,lustre,lov,osc,mdc,fid,ptlrpc
lnet                  343705  4 lustre,ko2iblnd,ptlrpc,obdclass
lvfs                   16582  8 mgc,lustre,lov,osc,mdc,fid,ptlrpc,obdclass
libcfs                491320  11 mgc,lustre,lov,osc,mdc,fid,ko2iblnd,ptlrpc,obdclass,lnet,lvfs