In this article I will go through some of the exciting news that NVIDIA announced last week at its GPU conference. One of the big announcements was that VMware DaaS supports NVIDIA GRID technology with vSGA and vDGA. NVIDIA GRID vGPU technology will be generally available (GA) with VMware vSphere in 2015. It is great news that VMware and NVIDIA are working closely together, and a beta will be available later this year so customers can start evaluating. For now, customers who want to use vGPU with NVIDIA GRID have only one hypervisor to choose from: Citrix XenServer.

Let's look at the new technologies NVIDIA CEO Jen-Hsun Huang unveiled on 25th March 2014 at NVIDIA GTC.

Upcoming GPU chip: Pascal

Pascal is the new GPU family that will follow this year’s Maxwell GPUs.


Named for 17th century French mathematician Blaise Pascal, our next-generation family of GPUs will include three key new features: stacked DRAM, unified memory, and NVLink.

  • 3D Memory: Stacks DRAM chips into dense modules with wide interfaces, and brings them inside the same package as the GPU. This lets GPUs get data from memory more quickly – boosting throughput and efficiency – allowing us to build more compact GPUs that put more power into smaller devices. The result: several times greater bandwidth, more than twice the memory capacity and quadrupled energy efficiency.
  • Unified Memory: This will make building applications that take advantage of what both GPUs and CPUs can do quicker and easier by allowing the CPU to access the GPU’s memory, and the GPU to access the CPU’s memory, so developers don’t have to allocate resources between the two.
  • NVLink: Today’s computers are constrained by the speed at which data can move between the CPU and GPU. NVLink puts a fatter pipe between the CPU and GPU, allowing data to flow at 80 to 200GB per second, compared to the 16GB per second available now.
  • Pascal Module: NVIDIA has designed a module to house Pascal GPUs with NVLink. At one-third the size of the standard boards used today, they’ll put the power of GPUs into more compact form factors than ever before.

Pascal is due in 2016.


NVIDIA announced a new interconnect called NVLink which enables the next step in harnessing the full potential of the accelerator, and the Pascal GPU architecture with stacked memory, slated for 2016.

Stacked Memory

Pascal will support stacked memory, a technology which enables multiple layers of DRAM components to be integrated vertically on the package along with the GPU. Stacked memory provides several times greater bandwidth, more than twice the capacity, and quadrupled energy efficiency, compared to current off-package GDDR5. Stacked memory lets us combine large, high-bandwidth memory in the same package with the GPU, allowing us to place the voltage regulators close to the chip for efficient power delivery. Stacked memory, combined with a new Pascal module that is one-third the size of current PCIe boards, will enable us to build denser solutions than ever before.

Outpacing PCI Express

Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 Giga-transfers per second per lane) and with the widest supported links (16 lanes) the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory. In a multi-GPU system, the problem is compounded if a PCIe switch is used. With a switch, the limited PCIe bandwidth to the CPU memory is shared between the GPUs. The resource contention gets even worse when peer-to-peer GPU traffic is factored in.

NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80 and 200 GB/s of bandwidth, allowing the GPU full-bandwidth access to the CPU’s memory system.
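To make the numbers above concrete, here is a small back-of-the-envelope sketch. The 8 GT/s rate, the 16 lanes, the 16GB per second PCIe figure and the 80 to 200 GB/s NVLink range come from the text; the 128b/130b line encoding is the standard PCIe 3.0 encoding and is my own addition, so treat the result as an approximation rather than an NVIDIA figure.

```python
# Rough PCIe 3.0 vs NVLink bandwidth comparison.
# Assumption (not from the article): PCIe 3.0 uses 128b/130b encoding,
# so 128 of every 130 transferred bits carry payload.

PCIE3_TRANSFERS_PER_SEC = 8e9   # 8 giga-transfers per second per lane
PCIE3_LANES = 16                # widest supported link
PCIE3_ENCODING = 128 / 130      # payload fraction after line encoding


def pcie3_bandwidth_gb_per_s(lanes=PCIE3_LANES):
    """Approximate usable PCIe 3.0 bandwidth in GB/s, one direction."""
    bits_per_sec = PCIE3_TRANSFERS_PER_SEC * lanes * PCIE3_ENCODING
    return bits_per_sec / 8 / 1e9


def speedup_over_pcie(nvlink_gb_per_s, pcie_gb_per_s=16.0):
    """How many times faster a given NVLink rate is than the quoted 16 GB/s."""
    return nvlink_gb_per_s / pcie_gb_per_s


if __name__ == "__main__":
    print("PCIe 3.0 x16: %.2f GB/s" % pcie3_bandwidth_gb_per_s())
    for rate in (80, 200):
        print("NVLink at %d GB/s: %.1fx PCIe" % (rate, speedup_over_pcie(rate)))
```

The x16 link works out to about 15.75 GB/s, which is where the rounded 16GB per second comes from, and 80 and 200 GB/s divided by 16 give 5 and 12.5, matching the quoted "5 to 12 times".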

A Flexible and Energy-Efficient Interconnect

The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Our Pascal GPUs will support a number of these links, providing configuration flexibility. The links can be ganged together to form a single GPU↔CPU connection or used individually to create a network of GPU↔CPU and GPU↔GPU connections allowing for fast, efficient data sharing between the compute elements.

When connected to a CPU that does not support NVLink, the interconnect can be wholly devoted to peer GPU-to-GPU connections enabling previously unavailable opportunities for GPU clustering.

Moving data takes energy, which is why we are focusing on making NVLink a very energy efficient interconnect. NVLink is more than twice as efficient as a PCIe 3.0 connection, balancing connectivity and energy efficiency.

Understanding the value of the current ecosystem, in an NVLink-enabled system, CPU-initiated transactions such as control and configuration are still directed over a PCIe connection, while any GPU-initiated transactions use NVLink. This allows us to preserve the PCIe programming model while presenting a huge upside in connection bandwidth.

What NVLink and Stacked Memory Mean for Developers

Today, developers devote a lot of effort to optimizing and avoiding PCIe transfer bottlenecks. Current applications that have devoted time to maximizing concurrency of computation and communication will enjoy a boost from the enhanced connection.

NVLink and stacked memory enable acceleration of a whole new class of applications. The large increase in GPU memory size and bandwidth provided by stacked memory will enable GPU applications to access a much larger working set of data at higher bandwidth, improving efficiency and computational throughput, and reducing the frequency of off-GPU transfers. Crafting and optimizing applications that can exploit the massive GPU memory bandwidth as well as the CPU↔GPU and GPU↔GPU bandwidth provided by NVLink will allow you to take the next steps towards exascale computing.

Starting with CUDA 6, Unified Memory simplifies memory management by giving you a single pointer to your data, and automatically migrating pages on access to the processor that needs them. On Pascal GPUs, Unified Memory and NVLink will provide the ultimate combination of simplicity and performance. The full-bandwidth access to the CPU’s memory system enabled by NVLink means that NVIDIA’s GPU can access data in the CPU’s memory at the same rate as the CPU can. With the GPU’s superior streaming ability, the GPU will sometimes be able to stream data out of the CPU’s memory system even faster than the CPU.



The new TITAN Z GPU is for the consumer market.
It is built around two Kepler GPUs with 12GB of dedicated frame buffer memory and 5760 CUDA cores in total, or 2880 CUDA cores per GPU.


NVIDIA has announced a new physical VCA appliance named VCA IRAY. The price will be $50,000 for one VCA IRAY appliance, which includes an Iray license and the first year of maintenance and updates. General availability is summer 2014.

  • The big question is which GPU is in this appliance. The existing VCA has 8x GRID K2; does the new VCA IRAY use the new TITAN Z or a new GRID GPU?
  • Will the VCA IRAY replace the existing VCA appliance?


VMware embraces NVIDIA GRID


The CTO of VMware was on stage with the CEO of NVIDIA and talked about their new partnership, and how VMware will embrace NVIDIA for its DaaS strategy and its hypervisor, ESX.

NVIDIA vSGA/vDGA for VMware DaaS

VMware’s Horizon DaaS (a.k.a. Desktone) platform now supports both vSGA and vDGA GPU virtualization options with NVIDIA GRID GPUs.

The Horizon DaaS solution is available today. Navisite will be the first service provider to deliver this.

NVIDIA vGPU for VMware ESX/vSphere

NVIDIA and VMware are working on integrating NVIDIA vGPU with ESX/vSphere. It will be available as a beta in Q3 2014, with general availability (final release) in 2015.






Hi all

Next week, on 11th February 2014 at 14:00 EST (GMT-5), I am doing a free live webinar with fellow CTP Trond Eirik Håvarstein from XenAppBlog.com.


This is my favorite topic, and I travel to different parts of the world talking about it at Citrix events, NVIDIA GTC, Citrix User Groups, VMware User Groups and other partner events. Now this is your chance to see my webinar free and live at XenAppBlog.

FYI – there are limited seats, so hurry up and sign up here https://xenapptraining.leadpages.net/gpu-in-virtualization-learn-why-its-important/

My topic is “GPU in virtualization, learn why it’s important”

  • Evolution of Virtualized Graphics (Citrix vs VMware)
  • Business drivers for virtualizing applications that require a GPU
  • User Experience – VDI with a GPU vs Shared Desktop with a GPU
  • NVIDIA GRID vGPU, Buzz, How to use it, Sizing, Limitations – Q&A


Join the Free Webinar here *Limited Seats

Citrix 3D Graphics Pack is the new name for the “Citrix Virtual GPU solution” that was introduced as a Tech Preview in October 2013, and which consists of NVIDIA vGPU plus XenServer and XenDesktop components. The cool thing about this release is that NVIDIA vGPU is now generally available, no longer a technical preview or beta, and the product has been fully built into XenServer. Citrix has also created a great GUI for it in XenCenter for XenServer 6.2 and in XenDesktop 7.1, and there are new SDK commands to fully automate GPU configuration if you prefer the CLI to the GUI. On December 16th, Citrix released support for GPU virtualization using XenDesktop 7.1 HDX 3D Pro with XenServer 6.2 SP1 in the Citrix 3D Graphics Pack (see http://www.citrix.com/go/vgpu). This means that multiple users can share a single GPU, overcoming the 1:1 ratio to achieve higher user densities and create a more cost-effective remote 2D/3D virtualization solution.

  • Kudos to NVIDIA for building the vGPU
  • Kudos to the Citrix XenServer team for integrating the vGPU into Xen
  • Kudos to the Citrix XenDesktop team for integrating the vGPU into XenDesktop
  • Kudos to all the Citrix HDX 3D crew


What is Citrix 3D Graphics Pack

The Citrix 3D Graphics Pack enables true hardware GPU sharing of NVIDIA GRID Graphics cards providing the industry’s highest performance virtualized professional graphics app acceleration. This technology was first unveiled at Citrix Synergy 2013 and allows GPU sharing for Virtual Desktop Infrastructure (VDI) for XenServer, XenDesktop and NVIDIA GRID GPUs.


Citrix/NVIDIA XenServer w. vGPU Architecture


Sizing NVIDIA vGPU profile

pGPU vs vGPU

NVIDIA vGPU profiles differ in the amount of frame buffer memory, the CUDA cores, the number of displays and the display resolution per GRID GPU type.

I have added more information than Citrix and NVIDIA do in their branding.
vGPU OS support is one of the important things, where you can clearly see which OS is supported on a pass-through profile vs the vGPU profiles.
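As a rough sizing aid, the sketch below computes how many vGPU-enabled VMs fit on a host for a given profile. The per-profile figures (vGPUs per physical GPU) and the number of physical GPUs per board are my assumptions, taken from what NVIDIA published for the first GRID vGPU release; verify them against the vGPU pack you actually install. The function name `max_vms` is made up for this example.

```python
# Sizing sketch: how many VMs can share the GRID boards in one host.
# Assumed values, from NVIDIA's published figures for the first vGPU release.

GPUS_PER_BOARD = {"GRID K1": 4, "GRID K2": 2}

# profile name -> (board type, max vGPUs per physical GPU)
PROFILES = {
    "K100": ("GRID K1", 8),
    "K140Q": ("GRID K1", 4),
    "K200": ("GRID K2", 8),
    "K240Q": ("GRID K2", 4),
    "K260Q": ("GRID K2", 2),
}


def max_vms(profile, boards):
    """Maximum number of VMs using `profile` on `boards` identical boards."""
    board_type, vgpus_per_gpu = PROFILES[profile]
    return boards * GPUS_PER_BOARD[board_type] * vgpus_per_gpu


if __name__ == "__main__":
    for name in sorted(PROFILES):
        print("%s: %d VMs on 2 boards" % (name, max_vms(name, 2)))
```

This only covers GPU density. CPU, RAM and storage will usually limit the number of users per host before the biggest profiles do.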

vGPU profiles

Which GRID to choose


What's new in XenServer 6.2 SP1

  • The 3D Graphics Pack supporting NVIDIA GRID GPUs
  • Support for Windows 8.1 and Windows Server 2012 R2
  • Improvements to the Site Recovery wizard for large deployments
  • GPU pass-through improvements including XenCenter configuration.
  • New SR wizard allows up to 50 new fibre-channel HBA SRs to be created in a single step.
  • Security Hotfix and functional Hotfix roll-up.
  • The new SDK for XenServer 6.2.0 Service Pack 1 is ideal for developers who want programmatic access to XenServer’s new management features for GPU virtualization (including the new vGPU and GPU pass-through). The five available XenServer SDKs, one for each of C, C#, Java, PowerShell and Python, expose the new XenAPI commands for working with physical GPUs (pGPUs), GPU groups, virtual GPUs (vGPUs) and virtual GPU types. The GPU technologies for XenDesktop and XenServer do of course also come with rich GUI configuration operations and provisioning via XenCenter, XenDesktop and MCS. Over the last few development cycles, Citrix has invested in rewriting XenServer’s PowerShell API to provide developers and administrators with a PowerShell alternative to the XenServer command line interface (CLI). In particular, this interface is proving popular with those looking to automate bespoke vGPU and GPU pass-through configuration, benchmarking or auto-test frameworks. The PowerShell API is also a popular choice for XenDesktop and Windows administrators working with XenServer. Read more about the SDK here
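To give an idea of what the SDK access looks like, here is a minimal sketch using the Python SDK to list the vGPU types a host knows about. It assumes the `XenAPI` module from the XenServer SDK is installed, and that the `VGPU_type` class exposes the fields `vendor_name`, `model_name`, `framebuffer_size` (in bytes) and `max_heads`, as I recall them from the XenAPI reference; check the SDK documentation before relying on it. The helper names `describe_vgpu_type` and `list_vgpu_types` are made up for this example.

```python
# Sketch: list vGPU types through the XenServer Python SDK.
# Assumptions: the XenAPI module is installed; the field names below match
# the VGPU_type class of XenServer 6.2 SP1. Helper names are hypothetical.


def describe_vgpu_type(record):
    """Turn one VGPU_type record (a dict) into a readable line."""
    # XML-RPC returns 64-bit integers as strings, so convert explicitly.
    fb_mib = int(record["framebuffer_size"]) // (1024 * 1024)
    return "{} {}: {} MiB frame buffer, up to {} displays".format(
        record["vendor_name"], record["model_name"], fb_mib, record["max_heads"]
    )


def list_vgpu_types(url, user, password):
    """Connect to a XenServer host and describe every vGPU type it offers."""
    import XenAPI  # imported here so the formatter works without the SDK

    session = XenAPI.Session(url)
    session.xenapi.login_with_password(user, password)
    try:
        records = session.xenapi.VGPU_type.get_all_records()
        return [describe_vgpu_type(r) for r in records.values()]
    finally:
        session.xenapi.session.logout()
```

Calling `list_vgpu_types("https://your-xenserver", "root", "password")` with your own host and credentials should return one line per vGPU type. The formatting is kept in its own function so it can be tried without a host.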

What's new in NVIDIA GRID vGPU Pack

GA of NVIDIA GRID vGPU Manager + Windows Display Driver

  • Latest NVIDIA GRID vGPU Manager is version 331.30
  • Latest NVIDIA GRID vGPU Windows Display Driver (332.07) for Windows 7, Windows 8, Server 2008R2, Server 2012.

Important if you implemented XenServer vGPU tech preview

  • Customers who have previously installed the vGPU Tech Preview (XS62ETP001) on a host cannot subsequently install Service Pack 1 on it. Customers wishing to install Service Pack 1 will need to do a fresh installation of XenServer 6.2.0 before installing Service Pack 1.

How to implement Citrix 3D Graphics Pack

Download Citrix XenServer 6.2 + SP1
Download NVIDIA GRID vGPU Pack for GRID K1 or GRID K2
Download Citrix XenDesktop 7.1 99 user trial or licensed software here (requires a MyCitrix ID)


1. Start with a fresh XenServer 6.2 installation on GRID-supported hardware
2. Install Service Pack 1 on XenServer 6.2
3. Download the NVIDIA GRID vGPU Pack and unzip the contents; install NVIDIA GRID manager in XenServer from the CLI
4. Create a Windows 7 VM (this will be the base image)
5. From the XenCenter GUI, assign a vGPU type to the base image
6. In the Windows 7 VM:
a. Install NVIDIA GPU guest OS driver (available in the NVIDIA GRID vGPU Pack)
b. Install the XenServer Tools
c. Install the latest version of Citrix HDX 3D Pro VDA 7.1
7. Create a Machine Catalog using MCS to provision new VMs based on the base image or you can also use Citrix Provisioning Services (PVS)
8. Create Delivery Group, assign users, and publish the desktops
9. Access virtual desktops using Citrix Receiver. No GPU is required on the end-point devices
10. Validate GPU sharing by multiple desktops, using monitoring tools like Process Explorer from Microsoft

GUI enhancements in XenCenter for XenServer 6.2 SP1

In Citrix XenCenter, there is a new tab called “GPU” at the host level. The appropriate vGPU types attached to the host are defined in this GUI, and made available to the virtual machines (VMs). Depending on the requirements, one can also define the GPU placement policy here. This tab also makes it very convenient to visualize how many vGPUs are already attached, and the physical GPUs where they get placed.
This makes later troubleshooting simpler.



At the VM level in XenCenter, the vGPU can be selected as part of the VM properties or during New VM creation on GPU-enabled hosts. In the tech preview, this was a laborious step in the CLI. Now, simply determine the suitable vGPU profile for your use case and select it from the drop-down list. Once the VM is created, it boots into the Windows standard 800×600 VGA resolution. The vGPU features are available once the guest driver is installed in the Windows VM.


GPU performance graphs are available under the Performance tab of XenServer host. On first-run, these graphs have to be added to the view. Subsequently, they can be moved up or down and can show one or more of the installed GPUs.



GUI enhancements in XenDesktop 7.1 with XenServer 6.2 SP1

There are a few GPU-related enhancements in the XenDesktop consoles, and automated provisioning of vGPU-enabled VMs using Machine Creation Services (MCS) is the one we’ve been waiting for. Simply attach a vGPU to the base VM, install the virtual delivery agent (VDA for HDX 3D Pro), and install the required graphics apps. Then head over to XenDesktop Studio to create the machine catalog. The one thing to be careful about is not to run Sysprep after creating a vGPU-enabled base image, or it wipes out the vGPU information.

In Studio, the vGPU Type must be defined while creating the host settings to be used as a platform for the MCS machines.


Subsequently, proceed to creation of a machine catalog as usual. The exact steps are outlined in the Reviewer’s Guide. At the step where MCS base image is chosen, hovering over the image name shows information to confirm if you have a valid vGPU-enabled master image.


The remaining process to create machine catalog, create delivery group, and assign users is no different than the usual way of delivering desktops and apps. Use the latest Citrix Receiver to access 3D apps.

Tweak XenServer 6.2.x for GPU intense applications/performance

The articles below are critical to follow, whether you use GPU pass-through or vGPU profiles with your virtual machines.
I have seen many GPU-intensive applications that rely on Turbo mode or the maximum CPU clock frequency, and if you think Turbo mode works out of the box, think again. Anyone who virtualizes 3D applications will see a performance impact if this is not configured.

Follow this article How to use host-cpu-tune to fine tune XenServer 6.2.0 performance

Follow this article How to investigate and use Turbo mode, C-States and P-States in XenServer

Tweak Citrix XenDesktop 7.1 HDX 3D Pro.

I have seen multiple performance issues, and now that Citrix has officially shared the information, I can help you tweak XenDesktop 7.1 HDX 3D Pro. The following tweaks are for XenDesktop 7.0 and 7.1 VDAs.

  • With high screen resolutions (such as 2560×1600), a lower than expected Frames per Second (FPS) may be apparent, impacting user experience.
    To work around this issue, set the EncodeSpeed registry value as follows:
    [HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\Graphics] "EncodeSpeed"=dword:00000001
  • While connecting to high resolution displays (for example: 2560×1600) artifacts of previously opened windows can remain. To ensure that the screen is refreshed, users can add the following registry key:
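For the EncodeSpeed change in the first bullet, a small script can generate a .reg file that you then import on each VDA. This is only a sketch: the key path, value name and value come from the text above, while the function name `render_reg_file` and the output file name are made up for the example. Test it on a single VDA before rolling it out.

```python
# Sketch: render a .reg file that sets a DWORD value, here the EncodeSpeed
# tweak. The function and file names are hypothetical.


def render_reg_file(key_path, value_name, dword_value):
    """Return the text of a .reg file that sets one DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[{}]".format(key_path),
        '"{}"=dword:{:08x}'.format(value_name, dword_value),
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    text = render_reg_file(
        r"HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\Graphics", "EncodeSpeed", 1
    )
    # newline="" keeps the explicit CRLF line endings untouched.
    with open("encodespeed.reg", "w", newline="") as handle:
        handle.write(text)
```

Import the generated file on the VDA with regedit, or push the same value with your usual policy tooling.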


Software – Download vGPU (GRID Manager + GRID drivers) + XenServer 6.2 SP1 here

Ctx article – How to Resolve GPU Memory Mapping Issues in XenServer *important to check if you want to use vGPU

Citrite Mayunk Jain blogpost Super Easy GPU Sharing with XenDesktop 7.1: Introducing 3D Graphics Pack

Citrite Mayunk Jain Reviewer’s Guide for Delivering 3D Graphics Apps: Part 3 (vGPU)

Citrite Konstantina Chremmou blogpost – Configuring vGPU and GPU pass-through using the PowerShell SDK for XenServer 6.2.0 Service Pack 1

Citrite Rachel Berry blogpost – Configuring XenServer to use Turbo mode – including for 3-D graphically intense applications

Xen Team Advice for developers and partners working with GPUs

Alexander Ervik (CTP)  Shows how to enable NVIDIA vGPU support in XenServer 6.2 SP1 with Dell R720

Citrix blogpost – True hardware GPU sharing with XenDesktop and NVIDIA GRID arrives!

Citrix FlexCast Services: Virtualize 3D professional graphics

Citrix Technical and Training Materials about vGPU & HDX 3D Pro

NVIDIA GRID certified OEM servers

NVIDIA GRID certified applications

At Citrix Synergy in May 2013, I was so excited to see the CEOs of Citrix and NVIDIA showing how people can virtualize high performance, graphically intense applications with Citrix XenDesktop 7 HDX 3D Pro, XenServer with vGPU support and NVIDIA GRID K2.

vGPU isn’t released yet, but it will soon be time for true hardware virtualization of the GPU. NVIDIA GRID GPUs are the “only” GPUs on the market that are built for cloud computing. By that I mean that each board carries multiple GPUs, and that a software component from NVIDIA, “vGPU”, will soon be released for XenServer. It enables hardware virtualization of the GPU, which reduces cost and increases density for high performance workloads. Each virtual machine gets its own reserved share of the GPU’s resources. Before, the ratio was 2:1 with the K2; soon it will be 8:1, maybe more. Time will tell 😉

If you have an NVIDIA GRID K1 or K2 running in your environment and you would like to virtualize the GPU and get access to the vGPU, please follow the next steps:
Advice: You need XenServer as your chosen hypervisor; vSphere and Hyper-V are not supported with vGPU.

How do you get access to the vGPU:

  1. XenDesktop 7 deployed with XenServer 6.2. Both of these pieces you can obtain today, and on September 27th Citrix will release a tech preview add-on pack with the technologies to fully enable vGPU access for Windows desktop VDI workloads, extending our high performance GPU sharing capabilities beyond Windows Server RDS workloads. When you look at the combined solution you’ll find there are some immediate benefits. The most notable is that with the combined solution applications interact directly with NVIDIA drivers, not hypervisor drivers. This means greater application compatibility, and greater performance with large 3D models. Plus it doesn’t hurt that we’re able to natively support the latest versions of both DirectX and OpenGL out of the box. This will be true hardware vGPU with professional graphics performance benefits, differentiating it from software vGPU and API intercept technologies such as RemoteFX and vSGA, which address less demanding 3D use cases like Aero effects and PowerPoint slide transitions.
  2. Servers capable of running the NVIDIA GRID K1 or K2 cards. We’ve been working with the major hardware vendors and NVIDIA to bring a new generation of servers optimised for these technologies to market, multi-slot servers capable of supporting NVIDIA’s best GPUs. Only a year ago Boeing’s engineers were dedicating a single server to one user; today the possibilities have got a whole lot wider. Here are some server options to start with, and others do exist.

These new servers will support up to 12 GPUs, and with multiple users sharing each GPU, we expect these servers to support tens if not hundreds of users, bringing the cost of ownership of a GPU-enabled desktop within the range of entirely new markets and applications.

Using XenServer vGPU capabilities, applications interact with an NVIDIA driver directly, not a XenServer one. That means application vendors don’t need to recertify their applications to run with both NVIDIA and Citrix display adapters to leverage the power of XenDesktop 7 with high performance graphics. Once certified with an NVIDIA driver, users can have every confidence that their applications will also work when accessed remotely via XenDesktop.


Hi all

If you missed the webinar I did for NVIDIA GTC express, then the recorded session is now available plus my presentation.



GPU Accelerated XenDesktop for Designers and Engineers

You will learn the following:

The history of virtualized graphics from Citrix and NVIDIA.

Which business drivers lead to virtualizing 2D/3D applications

Citrix Solutions that are available for virtualizing Applications & Desktop

Customer cases, that have been using the technology since it was available and case studies from 2013

New technologies in Virtualization

– XenDesktop 7


Q & A



Watch the streamed presentation here

Download the presentation here (PDF format)

Listen to the presentation in MP4 format here.