Virtualization Blog

Discussions and observations on virtualization.
Originally an astronomer by vocation and one of the early users of CCD technology for imaging space objects, Tobias has long counted image processing and data analysis among his major interests, along with programming and systems and application administration. Since 1994, he has been a full-time employee in the IT department of Northern Arizona University (30k students, 4.5k faculty/staff), primarily involved in the academic area, a Team Lead / Senior Programmer since 1998, and adjunct faculty in astronomy since 1988. His interests include computer security, all kinds of student services (lab technology, distance education, authentication/authorization, providing applications as services, etc.), assisting faculty with special Web hosting projects (in particular involving LAMP), open source applications, server/desktop virtualization (since 2004), ubiquitous computing, storage technologies, IoT (especially Octoblu), and GPU/vGPU technologies. His team has run multiple XenServer/XenDesktop/XenApp configurations since 2009. Tobias holds a PhD in astronomy from the University of Vienna, Austria, as well as CCA certification in XenServer 6. He can be encountered regularly on the Citrix Discussion Forum and lurks as well on LinkedIn and, more recently, CUGC, for which he serves as a Steering Committee member and Vice President and is on the Content Working Group. He received the Citrix Technology Professional (CTP) award in 2016 and 2017 and is currently also a member of the NVIDIA GRID Community Advocate (NGCA) group.

Changes to Hotfixes and Maintenance for XenServer Releases

XenServer (XS) is changing as a product, with more aggressive release cycles as well as the recent introduction of both Current Release (CR) and Long Term Service Release (LTSR) options. Citrix recently announced changes to the way XenServer updates (hotfixes) will be regulated, and this article summarizes how these will be handled going forward.

To first define a few terms and clarify some points: the CR model is intended for those who always want to run the latest and greatest version, and these releases are targeted for roughly quarterly delivery. The LTSR model is primarily intended for installations that are required to run a fixed release. LTSR deployments are by definition required to be under a service contract in order to gain access to periodic updates, known as Cumulative Updates (CUs); the first of these will be CU1. Since Citrix is targeting support for LTSR versions for ten years, much longer than a general CR cycle is maintained, this incurs quite a bit of cost over time, hence the requirement for active support.

Support for those running XS as part of licensed versions of XenApp/XenDesktop will remain the same, as these CR licenses are provided as part of the XenApp/XenDesktop purchase. There will still be a free version of XenServer; the only change at this point is that on-demand hotfixes for a given CR will no longer be available once the next CR is released. Instead, updating to each quarterly release will be needed to stay on top of hotfixes; more on this below. End-of-life timelines should help customers plan for these changes. Licensed versions of XenServer need to fall under active Customer Success Services (CSS) contracts to be eligible for access to the corresponding types of updates and releases that fall outside of current product cycles.

 

At a Glance

The following table summarizes at a glance how the various XS versions will be treated going forward.

XenServer Version | Active Hotfix Cut-Off  | Action Required
------------------|------------------------|----------------
6.2 & 6.5 SP1     | EOL June 2018          | None. Hotfixes will continue to be available as before until EOL.
7.0               | December 2017          | Upgrade or buy CSS.
7.1               | December 2017          | Upgrade or buy CSS (to stay on the LTSR stream by deploying CU1).
7.2               | At release of next CR  | Upgrade to latest CR, or buy CSS to obtain 4 more months of hotfixes (and thus skip one CR).

 

In the future, the only way to get hotfixes for XS 7.2 without paying for CSS will be to upgrade to the most current CR once it is released. Customers who do pay for CSS will also be able to access hotfixes for 7.2 for a further four months after the next CR is released, which would in principle allow you to skip a release and transition to the next one beyond it.

 

What Stays the Same

While the changes are fairly significant, especially for users of the free edition of XS, some aspects will not change. These include:

·        There will still be free versions of XenServer released as part of the quarterly CR cycle.

·        The free version of XS can still be updated with all released hotfixes until the next CR becomes generally available.

·        XS code still remains open source.

·        Any hotfixes that have already been released will remain publicly available.

·        There will be no change for XS 6.2 through 6.5 SP1. They are only going to receive security hotfixes at this point anyway, until being EOLed in June 2018.

 

In Summary 

First off, stay calm and keep on maintaining XS pretty much as before, except for the cut-offs on how long hotfixes will be available for non-paid-for versions. The large base of free XS users should not fear that all access to hotfixes is being cut off. You will still be able to maintain your installation exactly as before, up to the point of a new CR, which will appear roughly every quarter. Keeping up requires a fairly small amount of extra effort, especially given how much easier updating has become, in particular for pools. Most users would do well to keep up with current releases anyway, both to benefit from improvements and to avoid applying huge numbers of hotfixes whenever installing a base version; each CR will include all hotfixes accumulated up to that point, which is more convenient.

 

What To Do 

Any customers who already have active CSS contracts will be unaffected by any of these changes. Included are servers that support actively licensed XenApp/XenDesktop instances, which, by the way, get all the XS Enterprise features incorporated into the latest XS releases, for all editions of XA/XD. Customers on XS 7.0 and 7.1 LTSR who have allowed their software maintenance to lapse and have not renewed/upgraded to CSS subscriptions should seriously consider that route.

Citrix will provide additional information as these changes are put into place.


From the Field: XenServer 7.X Pool Upgrades, Part 2: Updates and Aftermath

The XenServer upgrade process with XenServer 7 can be a bit more involved than with past upgrades, particularly if upgrading from a version prior to 7.0. In this series I’ll discuss various approaches to upgrading XS pools to 7.X and some of the issues you might encounter during the process based on my experiences in the field.

Part 1 dealt with some of the preliminary actions needed to be taken into consideration and the planning process, as well as the case for a clean installation. You can go back and review it here. Part 2 deals with the alternative, namely a true upgrade procedure, and what issues may arise during and after the procedure.

In-Place Upgrades to XS 7

An in-place upgrade to XS 7.X is a whole different beast compared to a clean installation, where the OS is overwritten from scratch. Much depends on whether you wish to retain the original partition scheme or go with the newer, larger, and more flexible disk partition layout. The bottom line is that despite the added work and potential issues, you may as well go through with the upgrade, as it will make life easier down the road.

Your choices here are to retain the current disk partitioning layout or to switch to the newer partition layout. If you want to know why the new XS disk layout changed, review the “Why was the XenServer 7 Disk Layout Changed?” section in Part 1 of this blog series.

In my experience, issues with in-place upgrades fall into five areas, which I'll cover in turn:

  1. Pre-XS 7 Upgrade, Staying the Course with the Original Partition Layout
  2. The 72% Solution
  3. The (In)famous Dell Utility Partition
  4. XS 7.X to XS 7.X Maintaining the Current Partition Layout
  5. XS 6.X or 7.X to XS 7.X With a Change to the New Partition Layout (plus possible issues on Dell servers with iDRAC modules)

Pre-XS 7 Upgrade, Staying the Course with the Original Partition Layout

If you choose to stick with the original partition layout, the rest of your installation experience – whether you go with the rolling pool upgrade, or conventional upgrade – will be pretty much the same as before.

As with any such upgrade, make sure to check ahead of time that the pool is ready for this undertaking: 

  • XenCenter has been upgraded 
  • Proper backups of VMs, metadata, etc. have been performed 
  • All VMs are agile or, in the case of a rolling pool upgrade, at least shut down on all local SRs 
  • All hosts are not only running the same version of XenServer, but also have the identical list of hotfixes applied 

Depending on the various ages of the hosts and how hotfixes were applied, you could run into the situation where there is a discrepancy. This can be particularly frustrating if an older hotfix has since been superseded and the older version is no longer available, yet still shows up in the applied hotfix list. Though I'd only recommend this in dire circumstances (such as this one, where the XenServer version is going to be replaced by a new one anyway), there are ways to make XenServers believe they have had the same specific patches applied by manipulating the contents of the /var/update/applied directory (see, for example, this forum thread).
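As a quick way to spot such discrepancies before the installer does, the applied-hotfix lists can be compared across pool members from any host. A minimal sketch (on XS 7.x the host field is named "updates" rather than "patches"):

# xe host-list params=name-label,patches

Any host whose list differs from the others is the one to investigate further.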

One thing you might run into is the following issue, covered in the next section, which incidentally appears to be independent of whichever partition layout you use.

The 72% Solution

This is not really so much a solution as a warning about what you may encounter: what to do beforehand to reduce the chance of running into it, and what to do afterwards should you encounter it anyway. 

This issue apparently crops up frequently during the latter part of the installation process and is characterized by a long wait with a screen showing 72% completion of the installation that can last anywhere from about five minutes to over an hour in extreme cases. 

One apparent cause of this is having an exceedingly large number of message files on your current version of XS. If you have the chance, it is worth taking the time to examine the areas where these are stored, under /var/lib/xcp/blobs/ in particular, and manually clean out any ancient files; most of these reside under the “messages” and “refs” subdirectories. Pay attention also to the symbolic links.
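If you want to prune these ahead of the upgrade, something along the following lines can be used (a sketch only; the one-year retention window is illustrative, and taking a copy of the directory first is strongly advised):

# cp -a /var/lib/xcp/blobs /root/blobs-backup
# find /var/lib/xcp/blobs/messages -type f -mtime +365 -delete
# find -L /var/lib/xcp/blobs/refs -type l -delete

The last command relies on the fact that with -L, a symbolic link whose target has been deleted still tests as type “l”, so it removes only dangling links.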

If you do get stuck in the midst of a long-lasting installation, you can press ALT+F2 to get into a shell and check the installer log under /tmp/install-log to see if it has thrown any errors (thanks, Nekta -- @nektasingh -- for that tip!). If all looks OK, continue to wait until the process completes.

The (In)famous Dell Utility Partition

Using the rolling pool upgrade for the first time, I ran into a terrible situation in which the upgrade proceeded all the way to the very end and, just as the final reboot was about to happen, the following error popped up on the console:

An unrecoverable error has occurred. The error was:

Failed to install bootloader: installing for i386-pc platform.

/usr/sbin/grub-install: warning: your embedding area is unusually small. core.img won’t fit in it.

/usr/sbin/grub-install: warning: Embedding is not possible. GRUB can only be installed in this setup by using blocklists. However, blocklists are UNRELIABLE and their use is discouraged.

/usr/sbin/grub-install: error: will not proceed with blocklists.

You can imagine the reaction.

What caused this and what can you do about it? After all, this was something I’d never encountered before in any previous XS upgrades.
 
 

It turns out that the culprit is the Dell utility partition (see, for example, this link): a small, five-block partition that Dell places for its own purposes at the beginning of the disk on many of its servers as they ship. It did not interfere with any installs up to and including XS 6.5 SP1, hence it came as a total surprise when I first encountered it during an XS 7 upgrade. 

And while this wasn't an issue in one particular upgrade, which for whatever reason managed to squeak by without any errors being reported whatsoever, under most circumstances that partition is too small to hold the initial configuration needed to install XS 7.

What's bad is that the XenServer installer apparently doesn't perform any pre-check to see whether that first partition on the disk is big enough. This is the case for both UEFI and Legacy/BIOS boots.

The solution was simply to delete that sda1 partition altogether using fdisk and re-install. Deleting the partition can be performed live on the host prior to the installation process.
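For reference, the sequence looks roughly like this (illustrative only; triple-check the device and partition number first, as this destroys whatever is on that partition):

# fdisk /dev/sda      (at the fdisk prompt: p to print and confirm sda1 is the small
                       utility partition, d to delete partition 1, w to write and exit)
# partprobe /dev/sda

If partprobe is not available, a reboot accomplishes the same re-read of the partition table.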

You can then successfully bypass this issue. I have performed this surgical procedure on a number of hosts with success and have not experienced any adverse effects; you do not even require a vorpal sword to accomplish this task.

Possibly, future upgrade processes will be more accommodating and perform a pre-check for this, along with checks for other potential inconsistencies such as a lack of VM agility or non-uniform sets of applied hotfixes.

XS 6.X or 7.X to XS 7.X With a Change to the New Partition Layout 

This is the trickiest area where most, including me, seem to encounter issues.

One point that should be clarified right away is: 

Any local partition on or partly contained on the system disk to be upgraded is going to get destroyed in the process. 

There are no two ways about this. Hence, plan ahead: either 

  • Storage XenMotion any local VMs to pooled storage, or
  • Export them. 

If you want or need to preserve your existing local storage SRs, you’ll have to stay with the original partition scheme.

Should you decide to switch to the new partition scheme, you will need to perform the following action before the upgrade:

# touch /var/preserve/safe2upgrade
The upgrade process (to preserve the pool) will need to be performed using the rolling pool upgrade. The caveat is that if anything goes wrong during any part of the upgrade process, you will have to exit and try to start over. Depending on the circumstances, the end point can vary from being simple to resume from where you left off, to having things in a broken state. The pre-checks performed are supposed to catch issues beforehand and allow you to take care of them before launching into the upgrade process, but they do not always trap everything! Above all, make sure that:
  • All VMs are agile, i.e. that there are no VMs running or even resident on any local storage
  • High Availability (HA) and Workload Balancing (WLB) are disabled
  • You have created metadata backups to at least one pooled storage device or exported them to an external location 
  • You have plenty of space to hold VMs on whatever hosts remain in your pool, figuring you'll have one host out at any given point and that initially only the master will be able to take on VMs from whichever host is in the process of being upgraded 
Preferably, also have recent backups of all your VMs. It is generally also a good idea to keep good notes about your various host configurations, including all physical and virtual networks, VLANs, iSCSI and NFS connections, and the like. I'd recommend doing an “export resource data” for the pool, which has been available as a utility since XS 6.5 and can be run from XenCenter. To export resource data:
  1. In the XenCenter Navigation pane, click Infrastructure and then click on the pool. 
  2. From the XenCenter menu, click Pool and then select Export Resource Data. 
  3. Browse to a location where you would like to save the report and then click Save.
You can choose between XLS or CSV output. Note that this feature is only available in XenCenter with a paid-for licensed version of XenServer. However, it can also be done via the CLI for any (including free) version of XS 6.5 or newer using:
# xe pool-dump-database file-name=target-output-file-name
Being prepared now for the upgrade, you may still run into issues along the way, in particular if you run out of space to move VMs to a different server or if the upgrade process hangs. If the master manages to make it through the upgrade but the process fails at some later point, one of the first consequences is that you will no longer be able to add any external hosts to the pool, because the pool will be in a mixed state with hosts running different XS versions. This is not a good situation and can be very difficult to recover from under some circumstances.
 

Recommended Process for Changing XenServer Partition Layout

Through various discussions on the XenServer discussion forum as well as some of my own experimentation, this is what I recommend as the most solid and reliable way of doing an upgrade to XS 7.X with the new partition layout coming from a version that still has the old layout. It will take quite a bit longer, but is more reliable.  

In addition to all the preparatory steps listed above:

  • You will need to eject all hosts from the pool at one point or another, so make sure you have carefully recorded the various network and other settings, in particular those that are not retained as such in the metadata exports, such as the individual iSCSI or NFS network connections. 
  • Copy your /etc/fstab file and also be sure to check for any other customizations you may have made, such as cron jobs or additions to /etc/rc.local (note that rc.local does not run on its own under XS 7.X, so you will need to manually enable it; see the section “Enabling rc.local to Run On Its Own” below). A sketch of such a pre-eject capture follows.
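As an illustration, a pre-eject capture might look like this (all paths and the destination host are placeholders):

# mkdir -p /root/host-backup
# cp -a /etc/fstab /etc/rc.d/rc.local /root/host-backup/
# crontab -l > /root/host-backup/root-crontab.txt
# xe pif-list params=uuid,device,IP,netmask,VLAN,host-name-label > /root/host-backup/pifs.txt
# scp -r /root/host-backup admin@backup-host:/backups/$(hostname -s)/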

Once your enhanced preparation is complete: 

  1. Start with your pool master and do a rolling pool upgrade to it. Do not attempt to switch to the new partition layout at this point! After it reboots, it should still have retained the role of pool master. Note that the rolling pool upgrade will not allow you to pick which host it upgrades next beyond the master, so be certain that any of these hosts can have all their VMs fit on the pool master. If necessary, move some or all of the VMs on the pool master onto other hosts within the pool before you commence the upgrade procedure.
  2. Pick another host and migrate all its VMs to the pool master.
  3. Follow this procedure for all hosts until the pool is completely upgraded to the identical version of XS 7.X on all pool members. You can either continue to use the rolling pool upgrade or if desired, switch to manual upgrade mode.
  4. Making sure you’ve carefully recorded all important storage and network information on that host, migrate all VMs off a non-master host and eject it from the pool.
  5. Touch the file /var/preserve/safe2upgrade and plan on the local SR being wiped and re-created. Then shut down the host and perform a standalone installation on this just-ejected host. It will end up with the new partition layout.
  6. Reduce the network settings on this host to just a single primary management interface NIC, one that matches the same one it had initially. Rejoin this just upgraded host back into the pool. Many of the pool metadata settings should automatically get recreated, in particular any of the pooled resources. Update any host-specific network settings as well as other customizations.
  7. Continue this process for the remainder of the non-master hosts.
  8. When these have all been completed, designate a new pool master using the command “xe pool-designate-new-master host-uuid=new-master-uuid”. Make sure the transition to the new pool master is properly completed. 
  9. Eject what was the original pool master and perform the standalone upgrade and rejoin the host to the pool as before.
  10. Celebrate your hard work!
While this process will easily take twice as long as a standard upgrade, at least you know that things should not break so badly that you have to spend hours of additional time making things right. As a side benefit (if you want to consider it as such), it will also force you to take stock of how your pool is configured and require you to take careful inventory. In the event of a disaster, you will be grateful to have gone through this process, as it may be very close to what you would have to go through under less controlled circumstances! I have done this myself several times and had it work correctly each time.
 

An Additional Item: If the Console Shows No Network Configured on Dell Servers with iDRAC modules

This condition can show up unexpectedly. Normally something like this can be handled, if the host is found to be in emergency mode, with an “xe pool-recover-slaves” command, but that's not always the case. Even more oddly, if you instead ssh in and run xsconsole from the CLI, all looks perfectly normal: the network settings appear present and correct, and also match the settings visible in XenCenter. This condition, as far as I know, seems unique to Dell servers and was seen with iDRAC 7, 8 and 9 modules.

The issue here appears to have been a change in behavior that kicked in starting with XenServer 7.1 and hence may not even be evident in XS 7.0.

The fix turned out to be upgrading all the BIOS/firmware and iDRAC configurations. In this particular case, I made use of the method I described in this blog entry, and that took care of it. Note that this still does not seem to consistently address the issue, in particular on some older hardware (e.g., an R715 with iDRAC 6, even after updating to DSU 1.4.2 and BIOS 3.2.1, but being stuck at iDRAC version 1.97, which is apparently not upgradeable).

Enabling rc.local To Run On Its Own

The rc.local file is not automatically executed by default under RHEL/CentOS 7 installations, and since XS 7 is based on CentOS 7, it is no exception. If you wish to customize your environment and ensure this file is executed at boot time, you will have to manually enable /etc/rc.local to run automatically on reboot under XenServer 7.X. To do so, run these two commands (making the file executable is what enables it at boot; the start command runs it immediately):

# chmod u+x /etc/rc.d/rc.local
# systemctl start rc-local

You can verify that rc.local is now running with the following command, which in turn should produce output similar to what is shown below:

# systemctl status rc-local

   rc-local.service - /etc/rc.d/rc.local Compatibility
   Loaded: loaded (/usr/lib/systemd/system/rc-local.service; static; vendor preset: disabled)
   Active: active (exited) since Sun 2017-06-18 09:09:50 MST; 1 weeks 6 days ago
  Process: 4649 ExecStart=/etc/rc.d/rc.local start (code=exited, status=0/SUCCESS)

A Final Word

Once again, I will reiterate that feedback is always appreciated and the XenServer forum is one good option. Errors, on the other hand, should be reported to the XenServer bugs site with as much information as possible to help the engineers understand and reproduce the issues.

I sincerely hope some of the information in this series will have been useful to some degree.

I would like to most gratefully acknowledge Andrew Wood, fellow Citrix Technology Professional, for the review of and constructive additions to this blog.

 


From the Field: XenServer 7.X Pool Upgrades, Part 1: Background Information and Clean Install

The XenServer upgrade process with XenServer 7 can be a bit more involved than with past upgrades, particularly if upgrading from a version prior to 7.0. In this series I’ll discuss various approaches to upgrading XenServer (XS) pools to 7.X and some of the issues you might encounter during the process based on my experiences in the field.

Part 1 deals with some of the preliminary actions that need to be taken into consideration and the planning process, as well as the case for a clean installation. Part 2 deals with the alternative, namely a true in-place upgrade procedure, and what issues may arise during and after the procedure.

 

Why Upgrade to XenServer 7?

XenServer (XS) 7 has been out for a little over a year now and is currently up to release 7.2. A number of improvements in scaling, functionality and features are available in XS 7, which should encourage users to upgrade to take advantage of them.

 

First Step in Any XS Upgrade

The first step in any XenServer upgrade is to upgrade your XenCenter to at least the minimum version required to support the newest installation of XenServer you are or will be accessing.

This step is crucial. Failure to do this causes puppies to die. Older versions of XenCenter won't connect to newer versions of XS, and in many cases new features are built into the newer versions of XenCenter that are not available in older versions yet support important components in the newer XS releases, including hotfix and rolling pool upgrade options, changes in licensing, and supported pooled storage options. In short, your upgrade process will rapidly come to a screeching halt, and there may be damage done, unless you are working with the proper version of XenCenter.

It is also highly recommended to make backups of anything and everything prior to upgrading. More on that will be discussed later.

 

Summary of Pre-Upgrade Checks

From experience, the following are key activities to perform before upgrading any XenServer pool:

  • XenCenter has been upgraded to at least the version that does or will match your newest pool
  • Assure the proper backups of virtual machines (VMs), metadata, etc. have been performed
  • Make sure that all VMs are agile (can be migrated to other hosts within the pool) or, in the case of a rolling pool upgrade, at least shut down on all local SRs (see the sketch below)
  • All hosts must not only be running the same version of XenServer, but must have the identical list of hotfixes applied
  • Clean up old messages in /var/lib/xcp/blobs/
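For the agility check, one quick CLI approach is to list VDIs that live on non-shared (local) SRs, since any VM using one of those is not agile. A sketch, run on the pool master:

for SR in $(xe sr-list shared=false content-type=user --minimal | tr ',' ' '); do
  echo "== local SR $SR =="
  xe vdi-list sr-uuid=$SR params=name-label --minimal
done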

I’d also recommend doing an “export resource data” for the pool, which has been available as a utility since XS 6.5 and can run from XenCenter.

To export resource data:

  1. In the XenCenter Navigation pane, click Infrastructure and then click on the pool.
  2. From the XenCenter menu, click Pool and then select Export Resource Data.
  3. Browse to a location where you would like to save the report and then click Save.

You can choose between XLS or CSV output. Note that this feature is only available in XenCenter with paid-for licensed versions of XenServer. However, it can also be done via the CLI for any (including free) version of XS 6.5 or newer using:

# xe pool-dump-database file-name=target-file-name
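Should you ever need the dump, the counterpart command restores it; a dry run can be requested first to validate the file (the actual restore restarts the host):

# xe pool-restore-database file-name=target-file-name dry-run=true
# xe pool-restore-database file-name=target-file-name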

 

Benefits of a XenServer 7 Upgrade

XenServer 7 is a significant jump from XS 6.5 SP1 in many ways, including improvements in scale and performance. It also has added a number of very nice new features (see what's new in XS 7; what's new in XS 7.1; XS 7.2 ).

While any upgrade can be a bit intimidating, ones that jump to a whole new edition can be more complicated, and given the changes involved here, this case is no exception. I would also recommend verifying that your hardware is present on the HCL, though the list is often slow to be populated. If in doubt and you have the extra resources, take a spare server as similar as possible to your production hardware and try an installation on it first to make sure it is compatible. If you don't have such a machine but do have spare capacity, eject one of the hosts from your pool and use it to test the upgrade. Before performing an upgrade on a production server or pool, it is always recommended to try it out first on a test configuration.

I do not recommend at this time the rolling pool or standalone upgrade that involves switching at the same time from the old to the new partition layout. Overall experiences within the user community have not been consistently positive.

It is hoped that the preparatory steps recommended before undertaking an upgrade do not prove too daunting. This series is intended to help with those preparations, as well as to give guidelines and tips on how to go about the process and what to do if issues arise. There are clearly a number of ways to tackle upgrades and undoubtedly numerous other possible approaches. One might, for example, Storage XenMotion VMs to other pools or make use of temporary pooled storage as part of the process. Detaching and re-attaching pooled storage to servers already running a newer version of XenServer is another possible route to explore.

Experimentation is always something that should be encouraged, albeit not on production systems!

Feedback is always appreciated and the XenServer forum https://discussions.citrix.com/forum/101-xenserver/ is one good option. Errors, on the other hand, should be reported to the XenServer bugs site with preferably sufficient detail to allow the engineers to be able to understand and reproduce the issues.

 

XS 7.X Leveraging a Clean Installation as an Upgrade Component

A clean install can become part of an upgrade procedure if, for example, an upgraded pool already exists and individual hosts, or hosts from a whole different pool, are to be merged into a pool running a newer version of XenServer; hence talking about this option is relevant to upgrading. You can potentially Storage XenMotion VMs over to the new pool, then eject and dissolve individual pool members, perform a clean installation on them, and join the hosts to the new pool. Of course, the target pool must have adequate resources to hold and run the VMs being moved over until the new hosts can be configured and added. This process might involve leveraging temporary external storage, for example by creating an NFS SR.

The easiest installation scenario is a clean install, which has thankfully always been the case, and this approach may work well in some instances. It is obviously best suited for a standalone server or, as mentioned above, for creating a server that will be joined to an existing pool.

Regardless of which upgrade process you undertake, be sure to review your VMs afterwards to see if XenTools need to be updated.

 

A Few Words About the XenServer 7 New Disk Layout

The main item to be aware of is that a clean installation will always create the new GPT partition layout as follows:

 

 

Size    | Purpose
--------|--------
4 GB    | log partition (syslogd and other logs)
18 GB   | backup partition for dom0 (for upgrades)
18 GB   | XenServer host control domain (dom0) partition
0.5 GB  | UEFI boot partition
1 GB    | swap partition

 

This adds up to 41.5 GB. There can be a sixth partition, which is optionally used for creating a local storage repository (SR), or a portion of a local SR if spanning it across other storage devices. See also James Bulpin's XS 7 blog for more details about the XS 7 architecture.

Here is what a real-life partition table looks like after performing an installation. Note that the space for a local SR ends up in partition 3 (in this case, a default LVM partition 94.6 GB in size):

# fdisk -l /dev/sda

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 146.2 GB, 146163105792 bytes, 285474816 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: gpt

 #        Start          End    Size  Type             Name
 1     46139392     83888127     18G  Microsoft basic
 2      8390656     46139391     18G  Microsoft basic
 3     87033856    285474782   94.6G  Linux LVM
 4     83888128     84936703    512M  BIOS boot parti
 5         2048      8390655      4G  Microsoft basic
 6     84936704     87033855      1G  Linux swap

 

Why was the XenServer 7 Disk Layout Changed?

The XS 6.5 SP1 and earlier partition scheme was much smaller and simpler:

4 GB XenServer host control domain (dom0) partition

4 GB backup partition for dom0 (for upgrades)

The rest of the space was again available for use as a local SR in a third partition.

The default partition layout was changed under XS 7 for a number of reasons. For one, most storage devices these days start at a capacity of a few hundred GB, so it makes sense to allocate more space to the XenServer installation itself than just 8 GB. Running out of space has been a common problem, caused either by a lot of hotfixes being stored or by the accumulation of log files. Isolating /var/log in its own partition now keeps XS from crashing if it fills up; before, /var/log was just another directory on the system “/” partition, and a 100% full partition inevitably resulted in a system crash. A separate UEFI boot area was also added, and the Linux swap space was assigned its own partition. In short, there was no reason not to make use of more space and provide an improved partitioning scheme that also scales better for larger pools containing a lot of VMs.

Note: One idiosyncrasy of XS 6.5 SP1 is that it switched to a GUID partition table (GPT) in order to be able to use more than 2 TB of local storage if available; this limitation has evidently been removed in 7.0, since GPT is no longer limited as such when running a 64-bit OS based on CentOS 7.

Note: There are changes in the minimum host physical system requirements, including a recommended minimum of 46 GB of disk space. This will impact environments that used smaller partitions or SSDs for the XS OS.

 

Stage Complete: Level Up!

Well done – you’ve completed Part 1. In Part 2, I’ll deal with the alternative to leveraging a clean install -- a true in-place upgrade procedure, and what issues may arise during and after the procedure.

Once again, I will reiterate that feedback is always appreciated and the XenServer forum is one good option. Errors, on the other hand, should be reported to the XenServer bugs site with as much information as possible to help the engineers understand and reproduce them.

 

Continue reading
2036 Hits
0 Comments

XenServer High-Availability Alternative HA-Lizard

WHY HA AND WHAT IT DOES

XenServer (XS) contains a native high-availability (HA) option which allows quite a bit of flexibility in determining the state of a pool of hosts and under what circumstances Virtual Machines (VMs) are to be restarted on alternative hosts, should a host lose the ability to serve its VMs. HA is a very useful feature that keeps VMs from staying down in the event of a server crash or other incident that makes them inaccessible. Allowing an XS pool to help itself maintain the functionality of VMs plays a large role in sustaining as much uptime as possible, and permitting the servers to deal with fail-overs automatically makes system administration easier and allows for more rapid reaction times to incidents, leading to increased uptime for servers and the applications they run.

XS allows for the designation of three different treatments of Virtual Machines: (1) always restart, (2) restart if possible, and (3) do not restart. The VMs designated with the highest restart priority will be attempted first, and all will be handled provided adequate resources (primarily host memory) are available. A specific start order, allowing some VMs to be confirmed running before others, can also be established. VMs will be automatically distributed among whatever remaining XS hosts are considered active. Where necessary, VMs configured with expandable (dynamic) memory will be shrunk down to make room for additional VMs, and those designated to be restarted will be run with reduced memory if need be. If additional capacity exists to run more VMs, those designated as “restart if possible” will be brought online. VMs not considered essential will typically be marked “do not restart” and hence left off if they had been running before; restarting any of these must be done manually, resources permitting.
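These three treatments map directly onto the per-VM ha-restart-priority parameter, which can be combined with a start order. A sketch (the UUIDs are placeholders):

# xe vm-param-set uuid=<vm-uuid> ha-restart-priority=restart order=1
# xe vm-param-set uuid=<vm-uuid> ha-restart-priority=best-effort
# xe vm-param-set uuid=<vm-uuid> ha-restart-priority=""

An empty value corresponds to “do not restart.”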

XS also allows for specifying the minimum number of active hosts to remain to accommodate failures; larger pools that are not overly populated with VMs can readily accommodate even two or more host failures.
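This is governed by the pool-level ha-host-failures-to-tolerate parameter; the pool can also be asked what the computed maximum currently is (a sketch; the pool UUID is a placeholder):

# xe pool-ha-compute-max-host-failures-to-tolerate
# xe pool-param-set uuid=<pool-uuid> ha-host-failures-to-tolerate=2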

The election of which hosts are “live” and should be considered active members of the pool follows a rather involved process combining network accessibility with access to an independently designated pooled Storage Repository (SR) that serves as an additional heartbeat metric. The pooled SR can also be a Fibre Channel device, making it independent of Ethernet connections. A quorum-based algorithm is applied to establish which servers are up and active as members of the pool and which, in the event of a pool master failure, should be elected the new pool master.

 

WHEN HA WORKS, IT WORKS GREAT

Without going into more detail, suffice it to say that this methodology works very well; however, it requires a few prerequisite conditions. First of all, the mandate that a pooled storage device be available clearly precludes a pool consisting of hosts that only make use of local storage. Second, for a quorum to be possible, a minimum of three hosts is required in the pool, or HA results will be unpredictable because the election of a pool master can become ambiguous. This comes about because of the so-called “split brain” issue (http://linux-ha.org/wiki/Split_Brain), which is endemic in many operating system environments that employ a quorum as the means of making such a decision. Furthermore, while fencing (the process of isolating the host; see for example http://linux-ha.org/wiki/Fencing) is the typical recourse, the lack of intercommunication can result in a wrong decision being made and hence loss of access to VMs. Having experimented with two-host pools and native XenServer HA, I would estimate it works about half the time, which from a statistical viewpoint is pretty much what you would expect.

This limitation is, however, still of immediate concern to those with either no pooled storage and/or only two hosts in a pool. With a little bit of extra network connectivity, a relatively simple and inexpensive substitute for the external SR can be provided by making a very small NFS-based SR available (see the sketch below). The second condition, however, is not readily rectified without the expense of at least one additional host and all the connectivity associated with it. In some cases, this may simply not be an affordable option.
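For illustration, creating such a small NFS SR and pointing native HA at it can be as simple as the following (server name and export path are placeholders):

# xe sr-create name-label="HA-heartbeat" type=nfs shared=true content-type=user \
    device-config:server=nfs-server.example.edu device-config:serverpath=/exports/ha-sr
# xe pool-ha-enable heartbeat-sr-uuids=<sr-uuid-returned-by-sr-create>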

 

ENTER HA-LIZARD

For a number of years now, an alternative method of providing HA has been available through the program package provided by HA-Lizard (http://www.halizard.com/), a community project offering a free alternative that neither depends on an external SR nor requires a minimum of three hosts within a pool. In this blog, the focus will be on the standard HA-Lizard version and, because the two-node pool is the particularly hard case to handle, it will be the configuration under discussion.

I had been experimenting for some time with HA-Lizard and found in particular that I was able to create failure scenarios that needed some improvement. HA-Lizard’s Salvatore Costantino was more than willing to lend an ear to the cases I had found and this led further to a very productive collaboration on investigating and implementing means to deal with a number of specific cases involving two-host pools. The result of these several months of efforts is a new HA-Lizard release that manages to address a number of additional scenarios above and beyond its earlier capabilities.

It is worthwhile mentioning that there are two ways of deploying HA-Lizard:

1) Most use cases combine HA-Lizard and iSCSI-HA, which creates a two-node pool using local storage while maintaining full VM agility, with VMs able to run on either host. DRBD (http://www.drbd.org/) is implemented in this type of deployment and works very well, making use of real-time storage replication.

2) HA-Lizard, only, is used with an external Storage Repository (as in this particular case).

Before going into the details of the investigation, a few words should go towards a brief explanation of how this works. Note that there is only Internet connectivity (the use of a heuristic network node) and no external SR, so how is a split-brain situation then avoidable?

This is how I'd describe the course of action in this two-node situation:

If a node sees the gateway, assume it is alive. If it cannot, assume it is a good candidate for fencing. If the node that cannot see the gateway is the master, it should internally kill any running VMs, surrender its ability to be the master, and fence itself; the slave node should then promote itself to master and attempt to restart any missing VMs. Any VMs that were on the previous master will probably fail to restart at first, because there is no communication with the old master; if so, the new master will eventually be able to restart them regardless, after a toolstack restart. If the slave node fails by not being able to communicate with the network, then as long as the master still sees the network (and not the slave's network), it can assume the slave will fence itself and kill off its VMs, which will then be restarted on the current master. The slave needs to realize it cannot communicate out, and therefore should kill off any of its VMs and fence itself.

Naturally, the trickier part comes with the timing of the various actions, since each node has to blindly assume the other is going to conduct a sequence of events. The key here is that these are all agreed on ahead of time and as long as each follows its own specific instructions, it should not matter that each of the two nodes cannot see the other node. In essence, the lack of communication in this case allows for creating a very specific course of action! If both nodes fail, obviously the case is hopeless, but that would be true of any HA configuration in which no node is left standing.

Various test plans were worked out for various cases and the table below elucidates the different test scenarios, what was expected and what was actually observed. It is very encouraging that the vast majority of these cases can now be properly handled.

 

Particularly tricky here was the case of rebooting the master server from the shell, without first disabling HA-Lizard (something one could readily forget to do). Since the fail-over process takes a while, a large number of VMs cannot be handled before the communication breakdown takes place, hence one is left with a bit of a mess to clean up in the end. Nevertheless, it’s still good to know what happens if something takes place that rightfully shouldn’t!

The other cases, whether intentional or not, are handled predictably and reliably, which is of course the intent. Typically, a two-node pool isn’t going to have a lot of complex VM dependencies, so the lack of a start order of VMs should not be perceived as a big shortcoming. Support for this feature may even be added in a future release.

 

CONCLUSIONS

HA-Lizard is a viable alternative to the native Citrix HA configuration. It is straightforward to set up and can handle standard failover cases, with a selective “restart/do not restart” setting for each VM or a global configuration. There are quite a number of configuration parameters, which the reader is encouraged to research in the extensive HA-Lizard documentation. There is also an on-line forum which serves as a source of information and prompt assistance with issues. The most recent release, 2.1.3, is supported on both XenServer 6.5 and 7.0.

Above all, HA-Lizard shines when it comes to handling a non-pooled storage environment and, in particular, all configurations of the dreaded two-node pool. From my direct experience, HA-Lizard now handles the vast majority of issues involved in a two-node pool and does so more reliably than the unsupported two-node configuration using Citrix's own HA application. It has been possible to conduct a lot of tests across various cases and, importantly, to do so multiple times to ensure the actions are predictable and repeatable.

I would encourage taking a look at HA-Lizard and giving it a good test run. The software is free (contributions are accepted) and it is in extensive use and has a proven track record.  For a two-host pool, I can frankly not think of a better alternative, especially with these latest improvements and enhancements.

I would also like to thank Salvatore Costantino for the opportunity to participate in this investigation and am very pleased to see the fruits of this collaboration. It has been one way of contributing to the Citrix XenServer user community that many can immediately benefit from.



NAU VMbackup 3.0 for XenServer

By Tobias Kreidl and Duane Booher

Northern Arizona University, Information Technology Services

Over eight years ago, back in the days of XenServer 5, not many backup and restore options were available, either as commercial products or as freeware, and we quickly came to the realization that data recovery was a vital component of a production environment; hence we needed an affordable and flexible solution. The conclusion at the time was that we might as well build our own, and though the availability of options has grown significantly over the years, we have stuck with our home-grown solution, which leverages the Citrix XenServer SDK and XenAPI (http://xenserver.org/partners/developing-products-for-xenserver.html). Early versions were created from the contributions of Douglas Pace, Tobias Kreidl and David McArthur. During the last several years, the lion's share of development has been performed by Duane Booher. This article discusses the latest 3.0 release.

A Bit of History

With the many alternatives now available, one might ask why we have stuck with this rather un-flashy script and CLI-based mechanism. There are numerous reasons. For one, in-house products allow total control over all aspects of development and support. The financial outlay is entirely people's time, and since there are no contracts or support fees, costs are very controllable and predictable. We also found from time to time that various features we wanted were not readily available in the other options we looked at. Further, we felt early on that, as an educational institution, we could give back to the community by freely providing the product along with its source code; the most recent version is available via GitHub at https://github.com/NAUbackup/VmBackup for free under the terms of the GNU General Public License. There was a write-up at https://www.citrix.com/blogs/2014/06/03/another-successful-community-xenserver-sdk-project-free-backup-tools-and-scripts-naubackup-restore-v2-0-released/ when the first GitHub version was published. Earlier versions were made available via the Citrix community site (Citrix Developer Network), sometimes referred to as the Citrix Code Share, where community contributions were published for a number of products; when that site was discontinued in 2013, we relocated the distribution to GitHub.

Because we “eat our own dog food,” VMbackup gets extensive and constant testing: we rely on it ourselves as the means to create backups and to provide restores in cases of accidental deletion, unexpected data corruption, or disaster recovery. The mechanisms are carefully tested before going into production, and we perform frequent tests to ensure the integrity of the backups and that restores really do work. A number of times we have had to recover from our backups, and it has been very reassuring that these recoveries have been successful.

What VMbackup Does

Very simply, VMbackup provides a framework for backing up virtual machines (VMs) hosted on XenServer to an external storage device, as well as the means to recover such VMs for whatever reason that might have resulted in loss, be it disaster recovery, restoring an accidentally deleted VM, recovering from data corruption, etc.

The VMbackup distribution consists of a script written in Python and a configuration file. Other than a README document and the XenServer SDK components, which need to be downloaded separately (see http://xenserver.org/partners/developing-products-for-xenserver.html for details), that's it. There is no fancy GUI to become familiar with; instead, just a few simple things need to be configured, plus a destination for the backups needs to be made accessible (generally an NFS share, though SMB/CIFS works as well). Using cron job entries, a single host or an entire pool can be set up to perform periodic backups. Entries on individual hosts in a pool are needed because the pool master performs the majority of the work and that role can readily move to a different XenServer; host-based instances are also needed wherever local storage is in use, since access to local SRs can only be performed from each individual XenServer. A cron entry and numerous configuration examples are given in the documentation.

To avoid interrupting any running VMs, the process of backing up a VM follows these basic steps (a command-level sketch follows the list):

  1. A snapshot of the VM and its storage is made
  2. Using the xe utility vm-export, that snapshot is exported to the target external storage
  3. The snapshot is deleted, freeing up that space
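At the xe level, this sequence looks roughly like the following (a sketch only, with an illustrative VM name and target path):

VM_UUID=$(xe vm-list name-label="PROD-example" --minimal)
SNAP_UUID=$(xe vm-snapshot uuid=$VM_UUID new-name-label="PROD-example-backup")
# a snapshot is flagged as a template, so clear that flag before exporting
xe template-param-set is-a-template=false uuid=$SNAP_UUID
xe vm-export vm=$SNAP_UUID filename=/snapshots/BACKUPS/PROD-example.xva
xe vm-uninstall uuid=$SNAP_UUID force=true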

In addition, some VM metadata are collected and saved, which can be very useful in the event a VM needs to be restored. The metadata include:

  • vm.cfg - includes name_label, name_description, memory_dynamic_max, VCPUs_max, VCPUs_at_startup, os_version, orig_uuid
  • DISK-xvda (for each attached disk)
    • vbd.cfg - includes userdevice, bootable, mode, type, unplugable, empty, orig_uuid
    • vdi.cfg - includes name_label, name_description, virtual_size, type, sharable, read_only, orig_uuid, orig_sr_uuid
  • VIFs (for each attached VIF)
    • vif-0.cfg - includes device, network_name_label, MTU, MAC, other_config, orig_uuid

An additional option is to create a backup of the entire XenServer pool metadata, which is essential in dealing with the aftermath of a major disaster that affects the entire pool. This is accomplished via the “xe pool-dump-database” command.
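For example (the path and file name are illustrative):

# xe pool-dump-database file-name=/snapshots/BACKUPS/pool-db-20160401.dump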

In the event of errors, there are automatic clean-up procedures in place that will remove any remnants plus make sure that earlier successful backups are not purged beyond the specified number of “good” copies to retain.

There are numerous configuration options that allow you to specify which VMs get backed up, how many backup versions are retained, and whether the backups should be compressed (1) as part of the process, as well as optional report generation and notification setups.

New Features in VMbackup 3.0

A number of additional features have been added in this latest 3.0 release, adding flexibility and functionality. Some of these came about because of the sheer number of VMs that needed to be dealt with, some because of SR space issues, and some because of changes coming in the next XenServer release. These additions include:

  • VM “preview” option: To be able to look for syntax errors and ensure parameters are being defined properly, a VM can have a syntax check performed on it and if necessary, adjustments can then be made based on the diagnosis to achieve the desired configuration.
  • Support for VMs containing spaces: By surrounding VM names in the configuration file with double quotes, VM names containing spaces can now be processed. 
  • Wildcard suffixes: This very versatile option permits groups of VMs to be configured to be handled similarly, eliminating the need to create individual settings for every desired VM. Examples include “PRD-*”, “SQL*” and in fact, if all VMs in the pool should be backed up, even “*”. There are, however, a number of restrictions on wildcard usage (2).
  • Exclude VMs: Along with the wildcard option to select which VMs to back up, a need clearly arises for a means to exclude certain VMs (in addition to the other alternative, which is simply to rename them so that they do not match a certain backup set). Currently, each excluded VM must be named separately, and any such VMs should be defined at the end of the configuration file. 
  • Export the OS disk VDI only: In some cases, a VM may contain multiple storage devices (VDIs) that are so large that it is impractical or impossible to take a snapshot of the entire VM and its storage. Hence, we have introduced the means to back up and restore only the operating system device (assumed to be Disk 0). In addition to the space limitations, some storage, such as DB data, may not be reliably backed up using a full VM snapshot. Furthermore, the next XenServer release (Dundee) will likely support as many as perhaps 255 storage devices per VM, making a vm-export even more involved under such circumstances. Another big advantage is that currently this process is faster and more efficient than a VM export by a factor of three or more!
  • Root password obfuscation: So that clear-text passwords associated with the XenServer pool are not embedded in the scripts themselves, the password can be basically encoded into a file.

The mechanism for backing up a running VM from which only the primary disk is wanted is similar to the full VM backup. The process follows these basic steps (again, a sketch follows the list):

  1. A snapshot of just the VM's Disk 0 storage is made
  2. Using the xe utility vdi-export, that snapshot is exported to the target external storage
  3. The snapshot is deleted, freeing up that space
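A rough equivalent at the xe level would be (names and paths are illustrative; the snapshot of the single VDI is exported and then discarded):

VM_UUID=$(xe vm-list name-label="PROD-CentOS7-large-user-disks" --minimal)
VDI_UUID=$(xe vbd-list vm-uuid=$VM_UUID userdevice=0 params=vdi-uuid --minimal)
SNAP_UUID=$(xe vdi-snapshot uuid=$VDI_UUID)
xe vdi-export uuid=$SNAP_UUID filename=/snapshots/BACKUPS/PROD-CentOS7-xvda.raw format=raw
xe vdi-destroy uuid=$SNAP_UUID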

As with the full VM export, some metadata for the VM are also collected and saved for this VDI export option.

These added features are of course subject to change in future releases, though typically later editions generally encompass the support of previous versions to preserve backwards compatibility.

Examples

Let’s look at the configuration file weekend.cfg:

# Weekend VMs
max_backups=4
backup_dir=/snapshots/BACKUPS
#
vdi-export=PROD-CentOS7-large-user-disks
vm-export=PROD*
vm-export=DEV-RH*:3
exclude=PROD-ubuntu12-benchmark
exclude=PRODtestVM

Comment lines start with a hash mark and may appear anywhere in the file; the hash mark must be the first character of the line. Note that the default number of retained backups is set here to four, and the destination directory where backups will be written is set next. We then see a case where only the OS disk is backed up for the specific VM "PROD-CentOS7-large-user-disks"; below that, all VMs whose names begin with “PROD” are backed up using the default settings. Just below that, a definition is created for all VMs starting with "DEV-RH", reducing the number of retained backups for these from the global default of four down to three. Finally, we see two excludes for specific VMs that fall into the “PROD*” group and should not be backed up at all.

To launch the script manually, you would issue from the command line:

./VmBackup.py password weekend.cfg

To launch the script via a cron job, you would create a single-line entry like this (shown wrapped here for readability):

10 0 * * 6 /usr/bin/python /snapshots/NAUbackup/VmBackup.py password
/snapshots/NAUbackup/weekend.cfg >> /snapshots/NAUbackup/logs/VmBackup.log 2>&1

This would run the task at ten minutes past midnight on Saturday and create a log entry called VmBackup.log. This cron entry would need to be installed on each host of a XenServer pool.

Additional Notes

It can be helpful to stagger when backups are run so that they do not all have to be done at once: running everything together may be impractical, may take so long that performance is impacted during the day, or may need to be coordinated with what is best for specific VMs (such as before or after patches are applied). These situations are best dealt with by creating separate cron jobs for each subset.
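
For example, a pair of crontab entries along these lines would back up one set of VMs early Saturday and another early Sunday (the prod.cfg and dev.cfg configuration names are hypothetical):

10 0 * * 6 /usr/bin/python /snapshots/NAUbackup/VmBackup.py password /snapshots/NAUbackup/prod.cfg >> /snapshots/NAUbackup/logs/VmBackup-prod.log 2>&1
10 0 * * 0 /usr/bin/python /snapshots/NAUbackup/VmBackup.py password /snapshots/NAUbackup/dev.cfg >> /snapshots/NAUbackup/logs/VmBackup-dev.log 2>&1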

The backup places a fair load on the server, comparable to any vm-export, and hence the queue is processed linearly, with only one active snapshot-and-export sequence for a VM being run at a time. This is also why we suggest you perform the backups first and then asynchronously perform any compression on the files on the external storage host itself, to alleviate the CPU load on the XenServer host end.
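
As a rough sketch of what that might look like on the storage host itself (not on the XenServer), assuming the exports land as .xva files in the directory used in the earlier example:

# Compress any exports older than a day; run on the storage host via its own cron job
find /snapshots/BACKUPS -name '*.xva' -mtime +0 -exec gzip {} \;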

For even more redundancy, you can readily duplicate or mirror the backup area to another storage location, perhaps in another building or even somewhere off-site. This can readily be accomplished using various copy or mirroring utilities, such as rcp, sftp, wget, nsync, rsync, etc.
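
For instance, a nightly rsync job on the storage host could mirror the backup area to a second host (the host name and destination path here are placeholders):

rsync -av --delete /snapshots/BACKUPS/ mirrorhost:/snapshots/BACKUPS-mirror/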

This latest release has been tested on XenServer 6.5 (SP1) and various beta and technical preview versions of the Dundee release. In particular, note that the vdi-export utility, although it has been around for some time, is not well documented, and we strongly recommend not using it on any XenServer release before XS 6.5. Doing so is clearly at your own risk.

The NAU VMbackup distribution can be found at: https://github.com/NAUbackup/VmBackup

In Conclusion

This is a misleading heading, as there is no real conclusion: the project remains active, and as long as there is a perceived need for it, we plan to keep it running on future XenServer releases and to add functionality as needs and resources dictate. Our hope is naturally that the community can make at least as good use of it as we have ourselves.

Footnotes:

  1. Alternatively, to save time and resources, the compression can potentially be handled asynchronously by the host onto which the backups are written, hence reducing overhead and resource utilization on the XenServer hosts themselves.
  2. Certain limitations exist currently with how wildcards can be utilized. Leading wildcards are not allowed, nor are multiple wildcards within a string. This may be enhanced at a later date to provide even more flexibility.

This article was written by Tobias Kreidl and Duane Booher, both of Northern Arizona University, Information Technology Services. Tobias' biography is available at this site, and Duane's LinkedIn profile is at https://www.linkedin.com/in/duane-booher-a068a03 while both can also be found on http://discussions.citrix.com primarily in the XenServer forum.     


Review: XenServer 6.5 SP1 Training CXS-300

A few weeks ago, I received an invitation to participate in the first new XenServer class to be rolled out in over three years, namely CXS-300: Citrix XenServer 6.5 SP1 Administration. Those of you with good memories may recall that XenServer 6.0, on which the previous course was based, was officially released on September 30, 2011. Being an invited guest in what was only the third time the class had ever been held was something that just couldn’t be passed up, so I hastily agreed. After all, the evolution of the product since 6.0 has been enormous. Plus, I have been a huge fan of XenServer since first working with version 5.0 back in 2008. I still recall the warnings of brash naysayers, shortly before the open-sourcing of XenServer in 2013, that XenServer was all but dead. However, things took a very different turn in the summer of 2013 with the open-source release and subsequent major efforts to improve and augment product features. While certain elements were pulled and restored and there was a bit of confusion about changes in the licensing models, things have stabilized, and all told, the power and versatility of XenServer with the 6.5 SP1 release is now at a level some thought it would never reach.

FROM 6.0 TO 6.5 – AND BEYOND

XenServer (XS for short) 6.5 SP1 made its debut on May 12, 2015. The feature set and changes are – as always – incorporated within the release notes. Notable changes include an improved hotfix application mechanism, a whole new XenCenter layout (since 6.5), increased VM density, more guest OS support, a 64-bit kernel, the return of workload balancing (WLB) and the distributed virtual switch controller (DVSC) appliance, in-memory read caching, and many others. Significant improvements have been made to storage and network I/O performance and overall efficiency. XS 6.5 was also a release that benefited significantly from community participation in the Creedence project, and the SP1 update builds upon this.

One notable point is that XenServer has been found to now host more XenDesktop/XenApp (XD/XA) instances than any other hypervisor (see this reference). And, indeed, when XenServer 6.0 was released, a lot of the associated training and testing on it was in conjunction with Provisioning Services (PVS). Some users, however, discovered XenServer long before this as a perfectly viable hypervisor capable of hosting a variety of Linux and Windows virtual machines, without having given any thought to XenDesktop or XenApp hosting. For those who first became familiar with XS in that context, the added course material covering Provisioning Services had relatively little to do with XenServer functionality as such, and some viewed PVS as an overly emphasized component of the course and exam. In this new course, I am pleased to say that XS’s original role as a versatile hypervisor is where the emphasis now lies. XD/XA is of course discussed, but the many features fundamental to XS itself are what the course focuses on, and it does that well.

COURSE MATERIALS: WHAT’S INCLUDED

The new “mission” of the course from my perspective is to focus on the core product itself and not only convey its concepts, but leave students with practical working knowledge. Citrix describes the intent as making the course “engaging and immersive”. To that effect, the instructor-led course CXS-300 can be taken in a physical classroom or via remote GoToMeeting (I did the latter) and incorporates a lecture presentation, a parallel eCourseware manual plus a student exercise workbook (lab guide), and access to a personal live lab during the entire course. The eCourseware manual serves multiple purposes, providing the means to follow along with the instructor and later enabling an independent review of the presented material. It adds a very nice feature of providing an in-line notepad for each individual topic (hence, there are often many of these on a page) that can be used for note taking; the notes can be saved and later edited. In fact, a great takeaway of this training is that you are given permanent access to your personalized eCourseware manual, including all your notes.

The course itself is well organized; there are so many components to XenServer that five days works out, in my opinion, to be about right – partly because question-and-answer sessions with the instructor often take up more time than one might guess, and also because in some cases all participants may already have some familiarity with XS or another hypervisor, which makes it possible to go into added depth in some areas. There will always need to be some flexibility depending on the level of students in any particular class.

A very strong point of the course is the set of diagrams and illustrations that are incorporated, some of which are animated. These complement the written material very well, and the visual reinforcement of the subject matter is very beneficial. Below is an example, illustrating a high availability (HA) scenario:

XS6.5SP1_course_image.jpg

The course itself is divided into a number of chapters that cover the whole range of XS features, reinforced by in-line Q&A examples in the eCourseware manual and by related lab exercises. Included as part of the course are not only important standard components, such as HA and XenMotion, but also some that require plugins or advanced licenses, such as workload balancing (WLB), the distributed virtual switch controller (DVSC) appliance, and in-memory read caching. The immediate hands-on lab exercises in each chapter covering the just-discussed topics are a very strong point of the course, and the majority of exercises are well designed to allow putting the material directly to practical use. For those who already have some familiarity with XS and are able to complete the assignments quickly, the lab environment itself offers a great sandbox in which to experiment. Most components can readily be re-created if need be, so one can afford to be somewhat adventurous.

The lab, while relying heavily on the XenCenter GUI for most of the operations, does make a fair amount of use of the command line interface (CLI). This is a very good thing for several reasons. First off, one may not always have access to XenCenter, and knowing some essential commands is definitely a good thing in such an event. The CLI is also necessary in a few cases where there is no equivalent available in XenCenter. Some CLI commands offer added parameters or advanced functionality that are again not available in the management GUI. Furthermore, many operations can benefit from being scripted, and this introduction to the CLI is a good starting point. For Windows aficionados, there are even some PowerShell exercises to whet their appetites, plus connecting to an Active Directory server to provide role-based access control (RBAC) is covered.

THE INSTRUCTOR

So far, the materials and content have been the primary points of discussion. However, what truly can make or break a class is the instructor. The class happened to be quite small, with most individuals attending remotely. Attendees were in fact from four different countries in different time zones, making it a very early start for some and very late in the day for others. Roughly half of those participating were not native English speakers, though all had admirable skills in both English and some form of hypervisor administration. Because everyone was able to keep up a common general pace, the class flowed exceptionally well. I was impressed with the overall abilities and astuteness of each and every participant.

The instructor, Jesse Wilson, was first class in many ways. Knowing the material and being able to present it well are the primary prerequisites, but above and beyond that was his ability to field questions related to the topic at hand, go off onto relevant tangential material, and still make sure the class stayed on schedule. Keeping the flow going while remaining entertaining enough to hold students’ attention is key to a successful class. When elements of a topic became debatable, he was quick not only to tackle the material in discussion, but to try it out right away in the lab environment to resolve it. The same pertained to demonstrating themes that benefited from a live demo as opposed to a purely verbal explanation. Another strong point was his adding his own drawings to the material to further clarify certain illustrations, where additional examples and explanations were helpful.

SUMMARY

All told, I found the course well structured and very relevant to the product, and the working materials to be top notch. The course is attuned to the core product itself and all of its features, so all variations of the product editions are covered.

Positive points:

  • Good breadth of material
  • High-quality eCourseware materials
  • Well-presented illustrations and examples in the class material
  • Q&A incorporated into the eCourseware book
  • Ability to save course notes and permanent access to them
  • Relevant lab exercises matching the presented material
  • Real-life troubleshooting (nothing ever runs perfectly!)
  • Excellent instructor

Desiderata:

  • More “bonus” lab materials for those who want to dive deeper into topics
  • More time spent on networking and storage
  • A more responsive lab environment (which was slow at times)
  • More coverage of more complex storage XenMotion cases in the lecture and lab

In short, this is a class that fulfills the needs of anyone from just learning about XenServer to even experienced administrators who want to dive more deeply into some of the additional features and differences that have been introduced in this latest XS 6.5 SP1 release. CXS-300: Citrix XenServer 6.5 SP1 Administration represents a makeover in every sense of the word, and I would say the end result is truly admirable.


Configuring XenApp to use two NVIDIA GRID engines

SUMMARY

The configuration of a XenApp virtual machine (VM) hosted on XenServer that supports two concurrent graphics processing engines in passthrough mode is shown to work reliably, providing more flexibility to a single XenApp VM than spreading access to the engines over two separate XenApp VMs. This in turn can save operating system licensing costs and, ostensibly, could be extended to incorporate additional GPU engines.

INTRODUCTION

A XenApp virtual machine (VM) that supports two or more concurrent graphics processing units (GPUs) has a number of advantages over running separate VM instances, each with its own GPU engine. For one, if users happen to be unevenly distributed among particular XenApp instances, some XenApp VMs may idle while other instances are overloaded, to the detriment of users associated with the busy instances. It is also simpler to add capacity to such a VM than to build and license yet another Windows Server VM. This study made use of an NVIDIA GRID K2 (driver release 340.66), comprised of two Kepler GK104 engines and 8 GB of GDDR5 RAM (4 GB per GPU). It is hosted in a base system consisting of a Dell R720 with dual Intel Xeon E5-2680 v2 CPUs (40 VCPUs total, hyperthreaded) running XenServer 6.2 SP1, which hosts XenApp 7.6 as a VM with 16 VCPUs and 16 GB of memory on Windows 2012 R2 Datacenter.

PROCEDURE

It is important to note that these steps constitute changes that are not officially supported by Citrix or NVIDIA and are to be regarded as purely experimental at this stage.

Registry changes to XenApp were made according to these instructions provided in the Citrix Product Documentation.

On the XenServer, first list devices and look for GRID instances:
# lspci|grep -i nvid
44:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
45:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)

Next, get the UUID of the VM:
# xe vm-list
uuid ( RO)           : 0c8a22cf-461f-0030-44df-2e56e9ac00a4
     name-label ( RW): TST-Win7-vmtst1
    power-state ( RO): running
uuid ( RO)           : 934c889e-ebe9-b85f-175c-9aab0628667c
     name-label ( RW): DEV-xapp
    power-state ( RO): running

Get the address of the existing GPU engine, if one is currently associated:
# xe vm-param-get param-name=other-config uuid=934c889e-ebe9-b85f-175c-9aab0628667c
vgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)
(Note: ignore any vgpu_pci parameters; they are irrelevant to this process and may be left over from earlier procedures and experiments.)

Dissociate the GPU via XenCenter or via the CLI by setting the GPU type to “none”.
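If you prefer the CLI for this step, one way to do it is sketched below; the vGPU object UUID comes from the list command, and XenCenter accomplishes the same thing when the GPU type is set to “none”:
# xe vgpu-list vm-uuid=934c889e-ebe9-b85f-175c-9aab0628667c
# xe vgpu-destroy uuid=<uuid-from-the-list-output>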
Then, add both GPU engines following the recommendations in assigning multiple GPUs to a VM in XenServer using the other-config:pci parameter:
# xe vm-param-set uuid=934c889e-ebe9-b85f-175c-9aab0628667c
   other-config:pci=0/0000:44:0.0,0/0000:45:0.0
In other words, do not use the vgpu_pci parameter at all.

Check if the new parameters took hold:
# xe vm-param-get param-name=other-config uuid=934c889e-ebe9-b85f-175c-9aab0628667c params=all
vgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0,0/0000:45:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)
Next, turn GPU passthrough back on for the VM in XenCenter or via the CLI and start up the VM.

On the XenServer you should now see no GPUs available:
# nvidia-smi
Failed to initialize NVML: Unknown Error
This is good, as both K2 engines now have been allocated to the XenApp server.
On the XenServer you can also run "xn -v pci-list 934c889e-ebe9-b85f-175c-9aab0628667c" (the UUID of the VM) and should see the same two PCI devices allocated:
# xn -v pci-list 934c889e-ebe9-b85f-175c-9aab0628667c
id         pos bdf
0000:44:00.0 2   0000:44:00.0
0000:45:00.0 1   0000:45:00.0
More information can be gleaned from the “xn diagnostics” command.

Next, log onto the XenApp VM and check settings using nvidia-smi.exe. The output will resemble that of the image in Figure 1.

 

 GRID-Fig-1.jpg
Figure 1. Output from the nvidia-smi utility, showing the allocation of both K2 engines.


Note the output correctly shows that 4096 MiB of memory are allocated for each of the two engines in the K2, totaling its full capacity of 8192 MiB. XenCenter will still show only one GPU engine allocated (see Figure 2) since it is not aware that both are allocated to the XenApp VM and currently has no way of making that distinction.

 

GRID-Fig-2.jpg
Figure 2. XenCenter GPU allocation, showing just one engine (all that XenServer is currently capable of displaying).

 

So, how can you tell if it is really using both GRID engines? If you run the nvidia-smi.exe program on the XenApp VM itself, you will see it has two GPUs configured in passthrough mode (see the earlier screenshot in Figure 1). Depending on how apps are launched, you will see one or the other or both of them active. As a test, we ran two concurrent Unigine "Heaven" benchmark instances: both came out with the same metrics, within 1% of each other and of the result when just one instance was run, and both engines showed as being active. Displayed in Figure 3 is a sample screenshot of the Unigine “Heaven” benchmark running with one active instance; note that it sees both K2 engines present, even though the process is making use of just one.


GRID-Fig-3.jpg
Figure 3. A sample Unigine “Heaven” benchmark frame. Note the two sets of K2 engine metrics displayed in the upper right corner.


The display in the upper right-hand corner makes it evident that one engine has allocated memory and is working, as evidenced by the correspondingly higher temperature reading and memory frequency. The result of a benchmark using OpenGL and a 1024x768 pixel resolution is seen in Figure 4. Note again the difference between what is shown for the two engines, in particular the memory and temperature parameters.

 GRID-Fig-4.jpg

Figure 4. Outcome of the benchmark. Note the higher memory and temperature on the second K2 engine.

 

When another instance is running concurrently, you see its memory and temperature also rise accordingly in addition to the load evident on the first engine, as well as activity on both engines in the output from the nvidia-smi.exe utility (Figure 5).


CaptureDualK2c.JPG
Figure 5. Two simultaneous benchmarks running, using both GRID K2 engines, and the nvidia-smi output.

You can also see with two instances running concurrently how the load is affected. Note in the performance graphs from XenCenter shown in Figure 6 how one copy of the “Heaven” benchmark impacts the server and then about halfway across the graphs, a second instance is launched.

 GRID-Fig-6.jpg

Figure 6. XenCenter performance metrics of first one, then a second concurrent Unigine “Heaven” benchmark.


CONCLUSIONS

The combination of two GRID K2 engines associated with a single, hefty XenApp VM works well for providing adequate capacity to support a number of concurrent users in GPU passthrough mode without the need to host additional XenApp instances. As there is a fair amount of leeway in the allocation of CPUs and memory to a virtualized instance under XenServer (up to 16 vCPUs and 128 GB of memory under XenServer 6.2 when these tests were run), one XenApp VM should be able to handle a reasonably large number of tasks. As many as six concurrent sessions of this high-demand benchmark at 800x600 high-resolution settings have been tested without saturating the GPUs. A more typical application, such as Google Earth, consumes around 3 to 5% of the cycles of a GRID K2 engine per instance during active use, depending on the activity and size of the window, so the load is fairly minimal. In other words, twenty or more sessions could be handled by each engine, or potentially 40 or more for the entire GRID K2 with a single XenApp VM, provided of course that the XenApp VM's memory and its own CPU resources are not overly taxed.

XenServer 6.2 already supports as many as eight physical GPUs per host, so as servers expand, one could envision even more available engines being associated with a particular VM. Under some circumstances, passthrough mode affords more flexibility and makes better use of resources than creating specific vGPU assignments. Windows Server 2012 R2 Datacenter supports up to 64 sockets and 4 TB of memory, and hence should be able to support a significantly larger number of associated GPUs. XenServer 6.2 SP1 has a limit of 16 VCPUs and 128 GB of virtual memory per VM. XenServer 6.5, officially released in January 2015, supports up to four K2 GRID cards in some physical servers and up to 192 GB of RAM per VM for some guest operating systems, as does the newer release documented in the XenServer 6.5 SP1 User's Guide, so there is a lot of potential processing capacity available. Hence, a very large XenApp VM could be created that delivers a lot of raw power with substantial Microsoft server licensing savings. The performance meter shown above clearly indicates that VCPUs are the primary limiting factor in this XenApp configuration: with just two concurrent “Heaven” sessions running, about a fourth of the available CPU capacity is consumed, compared to less than 3 GB of RAM, which is only a small additional amount of memory above that allocated by the first session.

These same tests were run after upgrading to XenServer 6.5, with newer versions of the NVIDIA GRID drivers, and continued to work as before. At various times, this configuration was run for many weeks on end with no stability issues or errors detected during the entire time.

ACKNOWLEDGEMENTS

I would like to thank my co-worker at NAU, Timothy Cochran, for assistance with the configuration of the Windows VMs used in this study. I am also indebted to Rachel Berry, Product Manager of HDX Graphics at Citrix and her team, as well as Thomas Poppelgaard and also Jason Southern of the NVIDIA Corporation for a number of stimulating discussions. Finally, I would like to greatly thank Will Wade of NVIDIA for making available the GRID K2 used in this study.


XenServer 6.5 and Asymmetric Logical Unit Access (ALUA) for iSCSI Devices

INTRODUCTION

There are a number of ways to connect storage devices to XenServer hosts and pools, including local storage, HBA SAS and fiber channel, NFS, and iSCSI. With iSCSI, there are a number of implementation variations, including support for multipathing with both active/active and active/passive configurations, plus the ability to support so-called “jumbo frames”, where the MTU is increased from 1500 to typically 9000 to optimize frame transmissions. One of the lesser-known and somewhat esoteric iSCSI options available on many modern iSCSI-based storage devices is Asymmetric Logical Unit Access (ALUA), a protocol that has been around for a decade and is all the more intriguing because it can be used not only with iSCSI, but also with fiber channel storage. The purpose of this article is to clarify and outline how ALUA can now be used more flexibly with iSCSI on XenServer 6.5.

HISTORY

ALUA support on XenServer goes back to XenServer 5.6, initially only for fiber channel devices. Support for iSCSI ALUA connectivity started with XenServer 6.0 and was initially limited to specific ALUA-capable devices, which included the EMC Clariion and NetApp FAS as well as the EMC VMAX and VNX series. Each device required specific multipath.conf file configurations to properly integrate with the server used to access them, XenServer being no exception. The upstream XenServer code also required customizations. The "How to Configure ALUA Multipathing on XenServer 6.x for Enterprise Arrays" article CTX132976 (March 2014, revised March 2015) currently only discusses ALUA support through XenServer 6.2 and only for specific devices, stating: “Most significant is the usability enhancement for ALUA; for EMC™ VNX™ and NetApp™ FAS™, XenServer will automatically configure for ALUA if an ALUA-capable LUN is attached”.

The XenServer 6.5 Release Notes announced that XenServer will automatically connect to one of these aforementioned documented devices and that it now runs an updated device mapper multipath (DMMP) version, 0.4.9-72. This rekindled my interest in ALUA connectivity, and after some research and discussions with Citrix and Dell about support, it appeared this might now be possible specifically for the Dell MD3600i units we have used on XenServer pools for some time. What is not stated in the release notes is that XenServer 6.5 now has the ability to connect generically to a large number of ALUA-capable storage arrays; this is gone into in detail later. It is also of note that MPP-RDAC support is no longer available in XenServer 6.5 and DMMP is the exclusive multipath mechanism supported, in part because of support and vendor-specific issues (see, for example, the XenServer 6.5 Release Notes or this document from Dell, Inc.).

But first, how are ALUA connections even established? And perhaps of greater interest, what are the benefits of ALUA in the first place?

ALUA DEFINITIONS AND SETTINGS

As the name suggests, ALUA is intended to optimize storage traffic by making use of optimized paths. With multipathing and multiple controllers, there are a number of paths a packet can take to reach its destination. With two controllers on a storage array and two NICs dedicated to iSCSI traffic on a host, there are four possible paths to a storage Logical Unit Number (LUN). On the XenServer side, LUNs are then associated with storage repositories (SRs). ALUA recognizes that once an initial path is established to a LUN, any multipathing activity destined for that same LUN is better served if routed through the same storage array controller, and it attempts to do so as much as possible, unless of course a failure forces the connection to take an alternative path. ALUA connections fall into five self-explanatory categories (listed along with their associated hex codes):

  • Active/Optimized : x0
  • Active/Non-Optimized : x1
  • Standby : x2
  • Unavailable : x3
  • Transitioning : xf

For ALUA to work, it is understood that an active/active storage path is required, and furthermore that an asymmetrical active/active mechanism is involved. The advantage of ALUA comes from less fragmentation of packet traffic, by routing both paths of the multipath connection via the same storage array controller if at all possible, as the extra path through a different controller is less efficient. It is very difficult to locate specific metrics on the overall gains, but hints of up to 20% can be found in on-line articles (e.g., this openBench Labs report on Nexsan), hence this is not an insignificant amount and potentially more significant than gains reached by implementing jumbo frames. It should be noted that the debate continues to this day regarding the benefits of jumbo frames and to what degree, if any, they are beneficial. Among numerous articles to be found are: The Great Jumbo Frames Debate from Michael Webster, Jumbo Frames or Not - Purdue University Research, Jumbo Frames Comparison Testing, and MTU Issues from ESNet. Each installation environment will have its idiosyncrasies, and it is best to conduct tests within one's unique configuration to evaluate such options.
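
Should you wish to experiment with jumbo frames on a XenServer storage network, the MTU can be set from the CLI. A brief sketch follows; the network UUID is a placeholder, and note that the host NICs, switches, and storage array ports must all agree on the MTU for this to work:

# xe network-list params=uuid,name-label,MTU
# xe network-param-set uuid=<network-uuid> MTU=9000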

The SCSI Primary Commands (SPC-3) standard defines the commands used to determine paths. The mechanism by which this is accomplished is target port group support (TPGS). The characteristics of a path can be read via an RTPG command or set with an STPG command. With ALUA, non-preferred controller paths are used only for fail-over purposes. This is illustrated in Figure 1, where an optimized network connection is shown in red, taking advantage of routing all the storage network traffic via Node A (e.g., storage controller module 0) to LUN A (e.g., 2).

 

b2ap3_thumbnail_ALUAfig1.jpg

Figure 1.  ALUA connections, with the active/optimized paths to Node A shown as red lines and the active/non-optimized paths shown as dotted black lines.

 

Various SPC commands are provided as utilities within the sg3_utils (SCSI generic) Linux package.

There are other ways to make such queries, for example, VMware has a “esxcli nmp device list” command and NetApp appliances support “igroup” commands that will provide direct information about ALUA-related connections.

Let us first examine a generic Linux server containing ALUA support connected to an ALUA-capable device. In general, note that this will entail a specific configuration in the /etc/multipath.conf file; typical entries, especially for some older arrays or XenServer versions, will use one or more explicit configuration parameters such as:

  • hardware_handler ”1 alua”
  • prio “alua”
  • path_checker “alua”

Consulting the Citrix knowledge base article CTX132976, we see for example the EMC Corporation DGC Clariion device makes use of an entry configured as:

        device{
                vendor "DGC"
                product "*"
                path_grouping_policy group_by_prio
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout "/sbin/mpath_prio_emc /dev/%n"
                hardware_handler "1 alua"
                no_path_retry 300
                path_checker emc_clariion
                failback immediate
        }

To investigate the multipath configuration in more detail, we can make use of the TPGS setting. The TPGS setting can be read using the sg_rtpg command. By using multiple “v” flags to increase verbosity and “d” to specify the decoding of the status code descriptor returned for the asymmetric access state, we might see something like the following for one of the paths:

# sg_rtpg -vvd /dev/sde
open /dev/sde with flags=0x802
    report target port groups cdb: a3 0a 00 00 00 00 00 00 04 00 00 00
    report target port group: requested 1024 bytes but got 116 bytes
Report list length = 116
Report target port groups:
  target port group id : 0x1 , Pref=0
    target port group asymmetric access state : 0x01 (active/non optimized)
    T_SUP : 0, O_SUP : 0, U_SUP : 1, S_SUP : 0, AN_SUP : 1, AO_SUP : 1
    status code : 0x01 (target port asym. state changed by SET TARGET PORT GROUPS command)
    vendor unique status : 0x00
    target port count : 02
    Relative target port ids:
      0x01
      0x02
(--snip--)

From this output we see specifically that target port group 1 is in the active/non-optimized ALUA state, as shown both on the asymmetric access state line and by the “status code”. We also see there are two paths identified, with relative target port IDs 0x01 and 0x02.

There is a slew of additional “sg” commands, such as sg_inq (often used with the flag “-p 0x83” to get the VPD, or vital product data, page of interest), sg_rdac, and others. The sg_inq command will in general return TPGS > 0 for devices that support ALUA; more on that later in this article. One additional command of particular interest, because not all storage arrays in fact support target port group queries (more on this important point later as well!), is sg_vpd (sg vital product data fetcher), as it does not require TPG access. The base syntax of interest here is:

sg_vpd -p 0xc9 --hex /dev/…

where “/dev/…” should be the full path to the device in question. Looking at an example of the output of a real such device, we get:

# sg_vpd -p 0xc9 --hex /dev/mapper/mpathb1
Volume access control (RDAC) VPD Page:
00     00 c9 00 2c 76 61 63 31  f1 01 00 01 01 01 00 00    ...,vac1........
10     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00    ................
20     00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00    ................

If one reads the source code for various device handlers (see the multipath tools hardware table for an extensive list of hardware profiles, as well as the Linux SCSI device handler regarding how the data are interpreted through the device handler), one can determine that the value of interest here is avte_cvp (part of the RDAC c9_inquiry structure), which is the sixth hex value in the output. If that value, shifted right five bits and ANDed with 0x1, equals 1, the connected device is using ALUA (known in the RDAC world as IOSHIP mode); if instead the value shifted right seven bits and ANDed with 0x1 equals 1, the device is in AVT, or Automatic Volume Transfer mode; otherwise, it defaults in general to basic RDAC (legacy) mode. In the case above we see “61” returned, so (0x61 >> 5) & 0x1 equals 1, and hence the above connection is indeed an ALUA RDAC-based connection.
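
To make the bit arithmetic concrete, you can check the decoding right from a bash shell using its built-in arithmetic expansion:

# echo $(( (0x61 >> 5) & 0x1 ))    # IOSHIP (ALUA) bit: 1 means ALUA is in use
1
# echo $(( (0x61 >> 7) & 0x1 ))    # AVT bit: 0 means AVT mode is not in use
0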

I will revisit sg commands later on. Do note that the sg3_utils package is not installed on stock XenServer distributions, and as with any external package, its installation may void official Citrix support.

MULTIPATH CONFIGURATIONS AND REPORTS

In addition to all the information that various sg commands provide, there is also an abundance of information available from the standard multipath command. We saw a sample multipath.conf file earlier, and at least with many standard Linux OS versions and ALUA-capable arrays, information on the multipath status can be more readily obtained using stock multipath commands.

For example, on an ALUA-enabled connection we might see output similar to the following from a “multipath –ll” command (there will be a number of variations in output, depending on the version, verbosity and implementation of the multipath utility):

mpath2 (3600601602df02d00abe0159e5c21e111) dm-4 DGC,VRAID
[size=100G][features=1 queue_if_no_path][hwhandler=1 alua][rw]
_ round-robin 0 [prio=50][active]
 _ 1:0:3:20  sds   70:724   [active][ready]
 _ 0:0:1:20  sdk   67:262   [active][ready]
_ round-robin 0 [prio=10][enabled]
 _ 0:0:2:20  sde   8:592    [active][ready]
 _ 1:0:2:20  sdx   128:592  [active][ready]

Recalling the device sde from the section above, note that it falls under a path with a lower priority of 10, indicating it is part of an active, non-optimized network connection, vs. 50, which indicates membership in an active, optimized group; a priority of “1” would indicate the device is in the standby group. Depending on what mechanism is used to generate the priority values, be aware that these values will vary considerably; the most important point is that whatever path has a higher “prio” value will be the optimized path. In some newer versions of the multipath utility, the string “hwhandler=1 alua” shows clearly that the controller is configured to allow the hardware handler to help establish the multipathing policy, as well as that ALUA is established for this device. I have read that the path priority will be elevated to typically a value of between 50 and 80 for optimized ALUA-based connections (cf. mpath_prio_alua in this Suse article), but have not seen this consistently.

The multipath.conf file itself has traditionally needed tailoring to each specific device. It is particularly convenient, however, that a generic configuration is now possible for a device that makes use of the internal hardware handler, is rdac-based, and can auto-negotiate an ALUA connection. The vendor and product entries below identify the specific device itself, but others should now work using this generic sort of configuration:

device {
                vendor                  "DELL"
                product                 "MD36xx(i|f)"
                features                "2 pg_init_retries 50"
                hardware_handler        "1 rdac"
                path_selector           "round-robin 0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_min_io               100
                path_checker            rdac
                prio                    rdac
                no_path_retry           30
                detect_prio             yes
                retain_attached_hw_handler yes
        }

Note how this differs from the “stock” version (in XenServer 6.5) of the MD36xx multipath configuration below; the additions are the detect_prio and retain_attached_hw_handler entries:

device {
                vendor                  "DELL"
                product                 "MD36xx(i|f)"
                features                "2 pg_init_retries 50"
                hardware_handler        "1 rdac"
                path_selector           "round-robin 0"
                path_grouping_policy    group_by_prio
                failback                immediate
                rr_min_io               100
                path_checker            rdac
                prio                    rdac
                no_path_retry           30
        }

THE CURIOUS CASE OF DELL MD32XX/36XX ARRAY CONTROLLERS

The LSI controllers incorporated into Dell’s MD32xx and MD36xx series of iSCSI storage arrays represent an unusual and interesting case. As promised earlier, we will get back to looking at the sg_inq command, which queries a storage device for several pieces of information, including TPGS. Typically, an array that supports ALUA will return a value of TPGS > 0, for example:

# sg_inq /dev/sda
standard INQUIRY:
PQual=0 Device_type=0 RMB=0 version=0x04 [SPC-2]
[AERC=0] [TrmTsk=0] NormACA=1 HiSUP=1 Resp_data_format=2
SCCS=0 ACC=0 TPGS=1 3PC=1 Protect=0 BQue=0
EncServ=0 MultiP=1 (VS=0) [MChngr=0] [ACKREQQ=0] Addr16=0
[RelAdr=0] WBus16=0 Sync=0 Linked=0 [TranDis=0] CmdQue=1
[SPI: Clocking=0x0 QAS=0 IUS=0]
length=117 (0x75) Peripheral device type: disk
Vendor identification: NETAPP
Product identification: LUN
Product revision level: 811a

We see in the case above that TPGS is reported to have a value of 1. The MD36xx has supported ALUA since RAID controller firmware 07.84.00.64 and NVSRAM N26X0-784890-904; however, even with that (or a newer) revision level, sg_inq returns the following for this particular storage array:

# sg_inq /dev/mapper/36782bcb0002c039d00005f7851dd65de
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=1  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=1  Protect=0  BQue=0
  EncServ=1  MultiP=1 (VS=0)  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=74 (0x4a)   Peripheral device type: disk
 Vendor identification: DELL
 Product identification: MD36xxi
 Product revision level: 0784
 Unit serial number: 142002I

Various attempts to modify the multipath.conf file to try to force TPGS to appear with any value greater than zero all failed. It seemed that without access to the TPGS command, there was no way to query the device for ALUA-related information. Furthermore, the mpath_prio_alua command and similar commands appear to have been deprecated in newer versions of the device-mapper-multipath package, and so offer no help.

This proved to be a major roadblock in making any progress. Ultimately, it turned out that the key to looking for ALUA connectivity in this particular case comes, oddly, from ignoring what TPGS reports and instead focusing on what the MD36xx controller is doing. What is going on here is that the hardware handler is taking over control, and the clue comes from the sg_vpd output shown above. To see how a LUN is mapped for these particular devices, one needs to hunt back through the /var/log/messages file for entries that appeared when the LUN was first attached. Since the MD36xx array uses the internal “rdac” mechanism for the hardware handler, a grep for “rdac” in /var/log/messages around the time the connection to a LUN was established should reveal how it was established.
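
For example, on the XenServer host:

# grep -i rdac /var/log/messages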

Sure enough, in a case where the connection is known not to be making use of ALUA, you might see entries such as these:

[   98.790309] rdac: device handler registered
[   98.796762] sd 4:0:0:0: rdac: AVT mode detected
[   98.796981] sd 4:0:0:0: rdac: LUN 0 (owned (AVT mode))
[   98.797672] sd 5:0:0:0: rdac: AVT mode detected
[   98.797883] sd 5:0:0:0: rdac: LUN 0 (owned (AVT mode))
[   98.798590] sd 6:0:0:0: rdac: AVT mode detected
[   98.798811] sd 6:0:0:0: rdac: LUN 0 (owned (AVT mode))
[   98.799475] sd 7:0:0:0: rdac: AVT mode detected
[   98.799691] sd 7:0:0:0: rdac: LUN 0 (owned (AVT mode))

In contrast, an ALUA-based connection to LUNs, shown below on an MD3600i with firmware new enough to support ALUA, made from a client that also supports ALUA and has a properly configured entry in the /etc/multipath.conf file, will instead show the IOSHIP connection mechanism (see p. 124 of this IBM System Storage manual for more on I/O Shipping):

Mar 11 09:45:45 xs65test kernel: [   70.823257] scsi 8:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
Mar 11 09:45:46 xs65test kernel: [   71.385835] scsi 9:0:0:0: rdac: LUN 0 (IOSHIP) (unowned)
Mar 11 09:45:46 xs65test kernel: [   71.389345] scsi 9:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
Mar 11 09:45:46 xs65test kernel: [   71.957649] scsi 10:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
Mar 11 09:45:46 xs65test kernel: [   71.961788] scsi 10:0:0:1: rdac: LUN 1 (IOSHIP) (unowned)
Mar 11 09:45:47 xs65test kernel: [   72.531325] scsi 11:0:0:0: rdac: LUN 0 (IOSHIP) (owned)

Hence, we happily recognize that indeed, ALUA is working.

The even better news is that not only is ALUA now functional in XenServer 6.5, but it should in fact work with a large number of ALUA-capable storage arrays: both those with custom configuration needs and potentially many that work generically. Another surprising find was that for the MD3600i arrays tested, even the “stock” version of the MD36xxi multipath configuration entry provided with XenServer 6.5 creates ALUA connections. The reason for this is that the hardware handler is used consistently, provided no specific profile overrides are intercepted, so it is primarily the storage device that negotiates the connection itself rather than being driven by the file-based configuration. This is what made the determination of ALUA connectivity more difficult: the TPGS setting was never changed from zero and consequently could not be used to query for the group settings.

CONCLUSIONS

First off, it is really nice to know that many modern storage devices support ALUA and that XenServer 6.5 now provides an easier means to leverage this protocol. It is also a lesson that documentation can be hard to find and, in some cases, in need of updating to reflect the current state. Individual vendors will generally provide specific instructions regarding iSCSI connectivity, and these should of course be followed. Experimentation is best carried out on non-production servers, where a major faux pas will not have catastrophic consequences.

To me, this was also a lesson in persistence as well as an opportunity to share the curiosity and knowledge among a number of individuals who were helpful throughout this process. Above all, among many who deserve thanks, I would like to thank in particular Justin Bovee from Dell and Robert Breker of Citrix for numerous valuable conversations and information exchanges.

