NVIDIA has released the next generation of GRID, GRID 2.0. GRID 2.0 is based on the Maxwell architecture, whereas GRID 1.0 (K1/K2) was based on the Kepler architecture. I have been working with GRID 1.0 technology since 2012, and it has matured a lot in its 2 years of history. When the K1/K2 were released they first worked with GPU pass-through; then vGPU was introduced, so you could virtualize the GPUs and increase density, which is what people wanted. Citrix, with its hypervisor, was the first company to support NVIDIA GRID 1.0, and it was also the first to integrate vGPU into Citrix Studio, so companies could more easily provision machines with either MCS or PVS technology. VMware added support for GRID 1.0 vGPU technology in 2015 in its hypervisor, VMware vSphere 6.0, fully integrated with its EUC stack, VMware View, so companies can fully provision machines.
The great thing about GRID 2.0 is that there is no longer a need for a conversation about whether to choose a K1 or a K2 depending on whether you require GPU compute or GPU framebuffer. The M60 is being added at the top end of the range, bringing 2x the performance, and if you have blade servers, you can add the powerful vGPU technology to them with the M6.
Please note that the M6 will only support vendors' newer architectures, not old platforms.
Maxwell is the new GPU architecture, and a powerful Maxwell GPU you might know is the Titan X.
New GRID 2.0 GPUs and specifications
In GRID 2.0 NVIDIA now has a GPU for blade servers: an MXM, single-socket, high-end GPU called the M6.
In GRID 2.0 NVIDIA replaces the K1/K2 with a new PCIe 3.0, dual-socket, dual high-end GPU board called the M60.
The M60 delivers 4096 CUDA (compute) cores and 16 GB of GDDR5 memory/framebuffer.
The M60 has 6x the H.264 encoders of the K2, and Maxwell also supports 4:4:4 chroma subsampling, which is great news for encoders.
NVIDIA GRID 2.0 software is available in three editions that deliver accelerated virtual desktops to support the needs of your users. These editions include Virtual PC, Virtual Workstation, and Virtual Workstation Extended. GRID perpetual licenses are sold by Concurrent User (CCU).
In NVIDIA GRID 2.0, CCU stands for Concurrent User. In practice this means one license per running VM: regardless of whether a user is connected to the VM or not, the VM is attached to the GPU and so consumes a license.
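To make the CCU rule concrete, here is a minimal Python sketch that counts consumed licenses from a list of VMs. The VM record, its field names and the sample data are hypothetical and only illustrate the rule as described here (one license per running vGPU VM, whether or not a user is connected); this is not an NVIDIA license server API.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical record; field names are made up for this illustration.
    name: str
    running: bool         # powered on, so attached to a vGPU
    user_connected: bool  # whether a user session is currently active


def licenses_consumed(vms):
    """Count CCU licenses: one per running VM, connected user or not."""
    return sum(1 for vm in vms if vm.running)


vms = [
    VM("vm-01", running=True, user_connected=True),
    VM("vm-02", running=True, user_connected=False),   # idle, still licensed
    VM("vm-03", running=False, user_connected=False),  # powered off, no license
]

print(licenses_consumed(vms))
```

Note that user_connected is deliberately ignored by the count, which is the whole point of the per-running-VM rule.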
NVIDIA GRID 2.0 software is much more than a “driver”. While the software package does include a guest driver for Windows and Linux, it also includes the NVIDIA GRID vGPU manager for VMware vSphere and Citrix XenServer, as well as the license server and M6/M60 mode switching utility.
NVIDIA Tesla M6 and M60 profiles are specific to the M6 and M60. There will be profiles similar to those NVIDIA had on the K1 and K2 (512 MB through 4 GB), all with twice the number of users on the M6/M60 compared to the K1/K2. In addition, there is an 8 GB profile on the M6/M60, which also adds support for CUDA, something that wasn't available on the K1/K2.
NVIDIA GRID 2.0 is Maxwell only. If you are an existing customer, K1/K2 are unchanged and will remain as a parallel option.
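As a back-of-the-envelope illustration of how profile size relates to density, the sketch below divides the 16 GB M60 framebuffer quoted earlier by each profile size. The intermediate profile sizes (1 GB and 2 GB) are an assumption, since only the 512 MB to 4 GB range and the 8 GB profile are mentioned here, and the calculation assumes framebuffer is the only limit, which may not match the maximums NVIDIA actually enforces per profile.

```python
M60_FRAMEBUFFER_MB = 16 * 1024  # 16 GB per M60 board, as quoted in this post

# Assumed profile sizes in MB. Only the 512 MB..4 GB range and the 8 GB
# profile come from the text; the 1 GB and 2 GB steps are an assumption.
PROFILE_SIZES_MB = [512, 1024, 2048, 4096, 8192]


def max_vms_per_board(profile_mb, board_mb=M60_FRAMEBUFFER_MB):
    """Upper bound on VMs per board if framebuffer were the only limit."""
    return board_mb // profile_mb


for size in PROFILE_SIZES_MB:
    print(f"{size} MB profile: up to {max_vms_per_board(size)} VMs per M60 board")
```

Treat the numbers as an upper bound for planning discussions, and check the official profile tables before sizing a real deployment.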
The NVIDIA GRID 2.0 solution
GA of NVIDIA GRID 2.0 (M60 and M6) will be 15 September 2015.
To get NVIDIA GRID 2.0 if you are a Citrix customer you need:
Server hardware that supports NVIDIA GRID 2.0 + NVIDIA GPU M60 or M6 + NVIDIA vGPU Software license + Citrix XenDesktop or XenApp License (XenServer is included in XD/XA licenses)
To get NVIDIA GRID 2.0 if you are a VMware customer you need:
Server hardware that supports NVIDIA GRID 2.0 + NVIDIA GPU M60 or M6 + NVIDIA vGPU Software license + VMware Horizon license (Horizon includes vSphere for Desktop)
If you are a Citrix customer that wants to run on VMware vSphere you need:
Server hardware that supports NVIDIA GRID 2.0 + NVIDIA GPU M60 or M6 + NVIDIA vGPU Software license + Citrix XenDesktop or XenApp License + VMware vSphere Enterprise Plus license or vSphere for Desktop license
NVIDIA has released a new GRID Virtual GPU Manager 346.68 for Citrix XenServer 6.5 and VMware vSphere 6.
In this release NVIDIA has also released Windows drivers for vGPU 348.27.
The GRID Virtual GPU Manager 346.68 is not updated in this release; it's only the Windows drivers for vGPU 348.27. If you have GRID Virtual GPU Manager 346.68 installed in either XenServer or VMware, you only need to update your VMs.
The GRID vGPU Manager and Windows guest VM drivers must be installed together.
Older VM drivers will not function correctly with this release of GRID vGPU Manager. Similarly, older GRID vGPU Managers will not function correctly with this release of Windows guest drivers.
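The pairing rule above can be expressed as a small pre-flight check. This is only a sketch: the compatibility table is hypothetical and built solely from the two version numbers named in this post (vGPU Manager 346.68 and Windows guest driver 348.27); it is not an official NVIDIA support matrix.

```python
# Hypothetical compatibility table, built only from the versions named in
# this post. A real deployment should follow NVIDIA's release notes.
SUPPORTED_GUEST_DRIVERS = {
    "346.68": {"348.27"},  # vGPU Manager version -> supported guest drivers
}


def is_supported_pair(manager_version, guest_driver_version):
    """Return True if the guest driver is listed for this vGPU Manager."""
    supported = SUPPORTED_GUEST_DRIVERS.get(manager_version, set())
    return guest_driver_version in supported


print(is_supported_pair("346.68", "348.27"))
print(is_supported_pair("346.68", "340.00"))  # made-up older driver version
```

A check like this could run before a VM is powered on, so a mismatched pair is flagged before users notice a broken desktop.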
What is fixed in Windows driver for vGPU 348.27 VM using Citrix XenServer 6.5
What is fixed in Windows drivers for vGPU 348.27 VM using VMware vSphere 6
Over the last several years, many of us in the industry have discussed the need for community driven End User Computing podcasts focusing on virtualization topics for people designing, deploying, and using Citrix, Microsoft, VMware and surrounding technologies. I am excited to share that this month, two new Podcasts are being launched! First, a warm congratulations to Jarian Gibson and Andy Morgan on the successful launch of their Podcast, Frontline Chatter. Here’s to many years of continued success! Next, allow me to introduce the End User Computing Podcast!
In this article I will recap some of the exciting things NVIDIA announced last week at its GPU conference. Some of the big news was that VMware DaaS supports NVIDIA GRID technology with vSGA and vDGA. NVIDIA GRID vGPU technology will be GA with VMware vSphere in 2015. It is great news that VMware and NVIDIA are working closely together, and a beta will be available later this year so customers can start evaluating. For now, if customers want to use vGPU with NVIDIA GRID, the only hypervisor is Citrix XenServer.
Let's look at which new technologies NVIDIA CEO Jen-Hsun Huang unveiled on 25 March 2014 at NVIDIA GTC.
Upcoming GPU chip: Pascal
Pascal is the new GPU family that will follow this year’s Maxwell GPUs.
Named for 17th century French mathematician Blaise Pascal, our next-generation family of GPUs will include three key new features: stacked DRAM, unified memory, and NVLink.
3D Memory: Stacks DRAM chips into dense modules with wide interfaces, and brings them inside the same package as the GPU. This lets GPUs get data from memory more quickly – boosting throughput and efficiency – allowing us to build more compact GPUs that put more power into smaller devices. The result: several times greater bandwidth, more than twice the memory capacity and quadrupled energy efficiency.
Unified Memory: This will make building applications that take advantage of what both GPUs and CPUs can do quicker and easier by allowing the CPU to access the GPU’s memory, and the GPU to access the CPU’s memory, so developers don’t have to allocate resources between the two.
NVLink: Today’s computers are constrained by the speed at which data can move between the CPU and GPU. NVLink puts a fatter pipe between the CPU and GPU, allowing data to flow at 80 to 200 GB per second, compared to the 16 GB per second available now.
Pascal Module: NVIDIA has designed a module to house Pascal GPUs with NVLink. At one-third the size of the standard boards used today, they’ll put the power of GPUs into more compact form factors than ever before.
Pascal is due in 2016.
NVIDIA announced a new interconnect called NVLink which enables the next step in harnessing the full potential of the accelerator, and the Pascal GPU architecture with stacked memory, slated for 2016.
Pascal will support stacked memory, a technology which enables multiple layers of DRAM components to be integrated vertically on the package along with the GPU. Stacked memory provides several times greater bandwidth, more than twice the capacity, and quadrupled energy efficiency, compared to current off-package GDDR5. Stacked memory lets us combine large, high-bandwidth memory in the same package with the GPU, allowing us to place the voltage regulators close to the chip for efficient power delivery. Stacked Memory, combined with a new Pascal module that is one-third the size of current PCIe boards, will enable us to build denser solutions than ever before.
Outpacing PCI Express
Today a typical system has one or more GPUs connected to a CPU using PCI Express. Even at the fastest PCIe 3.0 speeds (8 Giga-transfers per second per lane) and with the widest supported links (16 lanes) the bandwidth provided over this link pales in comparison to the bandwidth available between the CPU and its system memory. In a multi-GPU system, the problem is compounded if a PCIe switch is used. With a switch, the limited PCIe bandwidth to the CPU memory is shared between the GPUs. The resource contention gets even worse when peer-to-peer GPU traffic is factored in.
NVLink addresses this problem by providing a more energy-efficient, high-bandwidth path between the GPU and the CPU at data rates 5 to 12 times that of the current PCIe Gen3. NVLink will provide between 80 and 200 GB/s of bandwidth, allowing the GPU full-bandwidth access to the CPU’s memory system.
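The bandwidth figures above can be checked with a little arithmetic. The sketch below derives the per-direction PCIe 3.0 x16 bandwidth from the 8 GT/s and 16 lanes quoted here plus the 128b/130b line encoding that PCIe 3.0 uses, then compares how long a transfer would take at that rate and at the 80 and 200 GB/s NVLink figures. The 12 GB payload is an arbitrary example size, and the NVLink numbers are the announced targets, not measurements.

```python
def pcie3_bandwidth_gb_s(lanes=16, gt_per_s=8.0):
    """Per-direction PCIe 3.0 bandwidth in GB/s.

    Each transfer carries one bit per lane, and 128b/130b encoding means
    128 of every 130 bits are payload. Divide by 8 to go from bits to bytes.
    """
    return gt_per_s * lanes * (128 / 130) / 8


def transfer_seconds(size_gb, bandwidth_gb_s):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_gb / bandwidth_gb_s


pcie = pcie3_bandwidth_gb_s()
payload_gb = 12.0  # arbitrary example payload

links = [
    ("PCIe 3.0 x16", pcie),
    ("NVLink (low end, announced)", 80.0),
    ("NVLink (high end, announced)", 200.0),
]
for label, bandwidth in links:
    seconds = transfer_seconds(payload_gb, bandwidth)
    print(f"{label}: {bandwidth:.2f} GB/s, {seconds:.3f} s for {payload_gb:.0f} GB")
```

The computed PCIe figure comes out slightly under the rounded 16 GB/s usually quoted, which is why 5x and 12x of it land close to, but not exactly on, 80 and 200 GB/s.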
A Flexible and Energy-Efficient Interconnect
The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Our Pascal GPUs will support a number of these links, providing configuration flexibility. The links can be ganged together to form a single GPU↔CPU connection or used individually to create a network of GPU↔CPU and GPU↔GPU connections allowing for fast, efficient data sharing between the compute elements.
When connected to a CPU that does not support NVLink, the interconnect can be wholly devoted to peer GPU-to-GPU connections enabling previously unavailable opportunities for GPU clustering.
Moving data takes energy, which is why we are focusing on making NVLink a very energy efficient interconnect. NVLink is more than twice as efficient as a PCIe 3.0 connection, balancing connectivity and energy efficiency.
Understanding the value of the current ecosystem, in an NVLink-enabled system, CPU-initiated transactions such as control and configuration are still directed over a PCIe connection, while any GPU-initiated transactions use NVLink. This allows us to preserve the PCIe programming model while presenting a huge upside in connection bandwidth.
What NVLink and Stacked Memory Mean for Developers
Today, developers devote a lot of effort to optimizing and avoiding PCIe transfer bottlenecks. Current applications that have devoted time to maximizing concurrency of computation and communication will enjoy a boost from the enhanced connection.
NVLink and stacked memory enable acceleration of a whole new class of applications. The large increase in GPU memory size and bandwidth provided by stacked memory will enable GPU applications to access a much larger working set of data at higher bandwidth, improving efficiency and computational throughput, and reducing the frequency of off-GPU transfers. Crafting and optimizing applications that can exploit the massive GPU memory bandwidth as well as the CPU↔GPU and GPU↔GPU bandwidth provided by NVLink will allow you to take the next steps towards exascale computing.
Starting with CUDA 6, Unified Memory simplifies memory management by giving you a single pointer to your data, and automatically migrating pages on access to the processor that needs them. On Pascal GPUs, Unified Memory and NVLink will provide the ultimate combination of simplicity and performance. The full-bandwidth access to the CPU’s memory system enabled by NVLink means that NVIDIA’s GPU can access data in the CPU’s memory at the same rate as the CPU can. With the GPU’s superior streaming ability, the GPU will sometimes be able to stream data out of the CPU’s memory system even faster than the CPU.
GeForce GTX TITAN Z
This GPU is for the consumer market.
It is built around two Kepler GPUs and 12 GB of dedicated frame buffer memory, with 5760 CUDA “processing cores” in total, or 2880 CUDA cores per GPU.
NVIDIA has released a new physical VCA appliance named VCA IRAY. The price will be only $50,000 for one VCA IRAY appliance, which includes an Iray license and the first year of maintenance and updates. GA is summer 2014.
The big question is which GPU is in this appliance. The existing VCA has 8x GRID K2; does the new VCA IRAY use the new TITAN Z or a new GRID GPU?
Will the VCA IRAY replace the existing VCA appliance?
VMware embraces NVIDIA GRID
The CTO of VMware was on stage with the CEO of NVIDIA and talked about their new partnership and how VMware will embrace NVIDIA for its DaaS strategy and its hypervisor, ESX.
NVIDIA vSGA/vDGA for VMware DaaS
VMware’s Horizon DaaS (a.k.a. Desktone) platform now supports both vSGA and vDGA GPU virtualization options with NVIDIA GRID GPUs.
The Horizon DaaS solution is available today. Navisite will be the first service provider to deliver this.
NVIDIA vGPU for VMware ESX/vSphere
NVIDIA and VMware are working on integrating NVIDIA vGPU with ESX/vSphere. It will be available as a beta in Q3 2014, with general availability (final) in 2015.
I am very excited to share this great news with you all. I did a webinar with fellow CTP Trond Eirik Håvarstein from XenAppBlog.com, and we had a surprise special guest, Jeroen Van De Kamp, CTP and CTO of LoginVSI, announcing groundbreaking stuff in the webinar. We had over 700 people signed up for the webinar. If you were among those who missed the opportunity to see it, here is your chance: the webinar is now available to everyone for free. There was a lot of Q/A, and over the next couple of days I will reply to all of it and make the answers available in this article.
The webinar has been re-mastered, and the audio and graphical demo videos are even better now than in the actual webinar, so make sure to check it out now:
Summary of webinar product announcements from LoginVSI, Lakeside Software, and uberAgent for Splunk.
LoginVSI's upcoming new version supports GPU benchmarking…
LoginVSI is working on its next version, which will support benchmarking, capacity planning and stress testing of the “missing component in virtualization”, the GPU. If you are interested, you can write to get access to the beta version of LoginVSI.
Here are some screenshots from the session. Watch it to hear what Jeroen says about the upcoming version.
Note: if you want more info on the next version of LoginVSI that supports GPU, write to email@example.com with the subject GFX.
Another groundbreaking product announcement came from Lakeside Software. They are about to release version 7 of SysTrack, which will support NVIDIA GPU monitoring/assessing.
Application Graphics Benchmarking
The transformation of an existing software portfolio first begins with the identification of all of the actively used software packages in the environment. The added complication in the case of a project to begin advanced application delivery is the need to understand multiple facets of usage: resource consumption, graphics utilization, frequency of use, user access habits, and mobility needs. Because the state of IT is already so complex it only becomes possible to fully understand and plan with a complete set of descriptive information that really characterizes the unique aspects of every environment. Of particular interest is the ability to first identify applications that have GPU demands, and then begin to segment them into tiers of utilization. SysTrack continually collects information about software packages as they’re used and normalizes all data points for cross platform comparison. One of the key performance parameters that’s identified in this process is a graphical intensity measure (Graphics Index) that provides a way to identify those applications in the portfolio that have higher GPU demands than others. With this critical information it becomes possible to segment the portfolio into groupings based on their requirements for specific resources. By tying a general sense of which applications have peak demand to total length of usage it becomes easier to start developing a portfolio made up of different combinations of usage styles. This includes separating applications that may be used by a small set of the population with intense requirements versus widely used applications with a smaller footprint. Of course, this also allows for much deeper analytics centering on the behaviors of users that is quite important in planning the GPU profiles in use in provisioning. 
Figure 1 displays this relationship in a bubble chart format. This format groups applications based on their similar characteristics, presenting clusters of similar applications in larger bubbles. The vast majority of applications exist in the “low graphics demand – Low Time Active” area in the bottom left, while only a select few have either high graphics demand or high time active.
SysTrack tracks graphics usage frequency across physical clients and allows you to group users based on graphics usage and frequency.
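The segmentation idea behind the bubble chart can be sketched in a few lines of Python: each application is placed in a quadrant based on its graphics demand and its time active. The thresholds, application names and sample values below are all made up for illustration, and this is not how SysTrack computes its Graphics Index.

```python
def classify(graphics_index, hours_active, gfx_threshold=50.0, time_threshold=4.0):
    """Place an application in one of four quadrants.

    Both thresholds are hypothetical cut-offs chosen for this example.
    """
    gfx = "high graphics demand" if graphics_index >= gfx_threshold else "low graphics demand"
    time = "high time active" if hours_active >= time_threshold else "low time active"
    return f"{gfx} / {time}"


# Made-up sample data: (application, graphics index, hours active per day)
apps = [
    ("app-a", 5.0, 0.5),
    ("app-b", 10.0, 6.0),
    ("app-c", 85.0, 1.0),
    ("app-d", 90.0, 7.5),
]

for name, graphics_index, hours_active in apps:
    print(f"{name}: {classify(graphics_index, hours_active)}")
```

In a real assessment most applications would fall into the low/low quadrant, and the few in the high graphics demand quadrants are the ones that drive the choice of GPU profile.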
A natural expansion of this is grouping users into distinct workload types to understand how best to configure the profile types and GPU assignments for users. Once the target applications and users have been characterized and a plan has been developed it’s critical to begin the process of sizing the environment. This includes determining the architecture, sizing the desktops and servers that will be worked with, and identifying resources that will be required to support the needs of the planned deployment.
Resource Modeling & Capacity Planning
The NVIDIA MarketPlace report from SysTrack’s Virtual Machine Planner (VMP) outlines the number of users that fall into different use cases, making it easier to forecast how many users per board can be allocated.
With a complete portfolio plan it now becomes possible to move into the next phase and start creating a model for what resources will be required for a complete environment. Because each of the users has been fully characterized throughout the assessment data collection interval it’s possible to use SysTrack’s Virtual Machine Planner (VMP) for powerful mathematical analysis to provide deep insight into infrastructure provisioning. The first component of this involves using the profile information above to help develop a plan for what kind of solution will be provided to the end-users. By segmenting the population into different delivery strategies using Citrix FlexCast options as a guideline, a more complete and accurate picture of how the net new environment will operate can be created. An additional benefit of segmentation is the ability to take advantage of grouping by general graphics consumption to identify the number of GPUs required for the environment based on the user density information for each profile type.
The NVIDIA MarketPlace report from VMP outlines the number of users that fall into the various use cases (e.g. “high” for a designer or higher end power user), making it much easier to forecast how many users per board can be allocated and in turn how many total boards may be needed.
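The step from users per use case to total boards is simple arithmetic, sketched below. The tier names, user counts and users-per-board densities are hypothetical placeholders rather than output from VMP, and the sketch assumes each board serves a single tier, so boards are rounded up per tier.

```python
import math


def boards_needed(user_counts, users_per_board):
    """Sum the boards required per tier, rounding up within each tier.

    Assumes a board is dedicated to one tier (no mixing of tiers on a board).
    """
    total = 0
    for tier, users in user_counts.items():
        total += math.ceil(users / users_per_board[tier])
    return total


# Hypothetical assessment result and densities, for illustration only.
user_counts = {"low": 100, "medium": 40, "high": 10}
users_per_board = {"low": 32, "medium": 16, "high": 4}

print(boards_needed(user_counts, users_per_board))
```

Rounding up per tier is the conservative choice; if tiers can share a board in your design, the total may come out lower.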
This information creates an easy to use design for a set of user profiles, both for the actual desktop delivery and for the vGPU assignment. By ensuring the best possible analysis of the environment prior to the actual deployment the end-user experience is much simpler to forecast and control. This results in higher end-user satisfaction and a shorter transition time.
User Experience Optimization
After the successful implementation of the solution the environment still requires observation to prevent interruption of service and the potential for productivity impact. The best way to ensure optimal end-user service quality is to have a real-time alerting and analytical engine to collect and report instantly on degradation of any aspect of the systems the users interact with. SysTrack provides this in the form of proactive alerting, detailed system analysis in Resolve, and aggregate trending through Enterprise and Site Visualizer. An even more interesting feature is vScape, a tool designed to examine utilization across multiple virtual machines and correlate resource consumption to concurrency of application utilization. vScape provides real-time updates of all of the application usage across all virtual platforms in an enterprise, including information about what applications are currently demanding GPU resources. It also provides insight into other resource demands as well, such as CPU, memory, and I/O. This can help automate the discovery of co-scheduled or highly concurrent applications to pinpoint the root cause of oversubscription issues much more quickly. It also provides key insight into guest health characteristics with trending to correlate precisely which events may lead to service degradation
Another key feature introduced in SysTrack version 7.0 is the result of close collaboration with NVIDIA to leverage APIs presented in the guest operating system. This allows the capture of detailed GPU performance metrics to correlate vGPU consumption to end-user service quality. Specifically, with NVIDIA drivers present in the guest OS or on a physical system, the GPU utilization and key metrics (see table 2 for a sample of selected metrics) from the graphics card can be captured and analyzed in the same way as CPU or other system metrics are currently in SysTrack.
In SysTrack 7, after provisioning users in a VDI environment, IT admins can monitor performance, which enables them to optimize density over time.
This completes the set of KPIs used in SysTrack to calculate the end-user experience score, including categories like resource limitation, network configuration, latency, guest configuration, protocol specific data for ICA, and virtual infrastructure. With a complete set of relevant information the proactive and trending health analysis provided in SysTrack yields a thorough analysis in an easy to understand, quantitative score that summarizes performance on an environmental, group based, or individual system level.
NVIDIA GPU monitoring/assessing (works with all NVIDIA GPUs: Quadro, Kepler, GRID)
You will be able to look at the following parameters:
Frame Buffer Usage
Memory Usage (Bytes and Percent)
# of Apps
Temperatures and fan RPMs
Use this data to accurately plan and size GRID and HDX 3D Pro deployments based on actually observed usage and utilization.
Monitor users post-deployment to provide the best user experience
uberAgent 1.8 for Splunk adds GPU performance monitoring
Helge Klein has developed a new version of uberAgent for Splunk that now supports GPU monitoring. This was a feature request I discussed with Helge Klein in 2013, and I am so happy to see what he has done with uberAgent for Splunk. Let's dig into what it can do.
GPU compute usage per machine
GPU memory usage per machine
GPU compute usage per process
GPU memory usage per process
uberAgent shows memory usage separately for shared and dedicated memory (dedicated = on the GPU, shared = main system RAM)
uberAgent shows compute usage per GPU engine. The various GPU engines serve different functions, e.g. 2D acceleration, 3D acceleration, video decoding, etc.
You will see more upcoming blog posts from me covering this topic: end-user experience, assessments of GPU workloads, scaling/sizing, benchmarking, supported hardware, GPU side-by-side experience, and hypervisor vs. bare metal with a GPU. Watch out for cool things…