Lakeside Case Study – sizing NVIDIA GRID vGPU

Hi all

I am very proud to share the results of the work I did with my friend and coworker Magnar Johnsen from Firstpoint. For a long time we have been developing a smart new way of analyzing data (IOPS, CPU, memory, GPU, latency and so on). We evaluated different pieces of software and decided to go with Lakeside Software. Both our companies, Poppelgaard.com and Firstpoint, are Lakeside partners, so we are qualified to make assessments for clients using Lakeside Software. The article below will help you if you are about to size an NVIDIA GRID vGPU solution on either Citrix XenServer or VMware vSphere. If you have been at NVIDIA GTC, Citrix Synergy, BriForum or E2EVC, or have seen me talk at a Citrix User Group, you might have seen the results of this work, which is changing how we think about the impact of applications that require a GPU.

Magnar and I go way back.

Magnar and I have a long history with remote graphics, and when we put our minds together we create something beautiful. Last year at Citrix Synergy the audience could see how 3D remote graphics for a Virtual Reality solution using Oculus Rift (“Facebook”) could be virtualized and delivered from the cloud. This was shown to an audience for the first time, and Citrix loved the idea and redesigned their exhibit about HDX 3D Pro next to FrameHawk. Sorry, FrameHawk, we stole all the attention because everybody wanted to try the VR solution. The VR was cool, but Magnar and I also worked hard for months to analyze the data we found at Aibel. We did a very successful assessment and shared the findings at Brian Madden’s BriForum in May 2014, just after Citrix Synergy. There the audience could learn for the first time the lessons we experienced, and see that application behavior is not “just” what you might think or expect it to be. Analytics is the key to understanding your app. Let’s talk about why…

Let’s dig into the case study we did at Aibel

Aibel is an industrial pioneer with a history dating back more than a century. With around 5,500 employees worldwide the company is a leading supplier of engineering services related to oil, gas and renewable energy.

Aibel is a huge user of desktop virtualisation technology and is currently delivering virtual desktops to around 4,500 users. The challenge was to work out what to do with the remaining 1500 CAD/CAM workstations where graphics intensive modelling of 3D designs tied them to powerful physical workstations.

With projects occurring at the eight Aibel locations in Norway as well as at the Aibel offices in Thailand and Singapore the amount of data flowing across the WAN was increasing and becoming unwieldy. Large engineering models needed to be accessed on location and shared with engineers and designers in all Aibel office locations.

Virtualisation was seen as the solution to these issues, but the challenge was how to virtualise graphics intensive workloads without impacting user experience or driving uneconomic datacentre specifications.

Partner Expertise

Firstpoint AS, a trusted Citrix Gold partner and virtualisation specialists, were brought in to advise Aibel on how to best virtualise this tricky user group.

They teamed up with Thomas Poppelgaard, an independent expert in virtualisation and GPU technologies and together they started an initial survey of Aibel’s situation.

GPU Acceleration Approach

It quickly became clear that the only way to successfully virtualise the 3D graphics workstations in an economically viable fashion was to deploy GPU acceleration technology in the datacentre. NVIDIA’s GRID cards would allow dedicated and shared GPU accelerators to be placed in the datacentre and used by virtual desktops, offloading server CPUs and removing the impact on user experience for other virtual desktop users.

The Challenge

Having identified the technology approach the challenge now was to work out how to size the workload of these 1500 users and design a solution that would deliver user experience at least as good as their current physical desktop experience. In order to do that the team needed to work out:

  • What graphics applications are in use today
  • Who is using which applications and workstations
  • What GPU processor power is being consumed today by application, user and workstation

Poppelgaard.com and FirstPoint brought in Lakeside Software’s SysTrack to do this job. Using SysTrack’s granular data collection model they could model a complete picture of application and GPU workload across all the users and workstations in use at Aibel today.
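The three questions above amount to grouping the same usage samples by application, by user and by workstation. The following is a minimal sketch of that idea in Python. The record layout, names and numbers are hypothetical and only for illustration; they are not SysTrack’s actual data model or export format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records: (workstation, user, application, gpu_percent).
# Layout and values are illustrative assumptions, not SysTrack's schema.
SAMPLES = [
    ("ws-001", "anna", "cad_app", 62.0),
    ("ws-001", "anna", "browser", 35.0),
    ("ws-002", "bjorn", "cad_app", 48.0),
    ("ws-002", "bjorn", "browser", 41.0),
    ("ws-002", "bjorn", "mail", 3.0),
]


def average_gpu_by(samples, key_index):
    """Average GPU percent grouped by one column of the sample tuple."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key_index]].append(sample[3])
    return {key: mean(values) for key, values in groups.items()}


by_workstation = average_gpu_by(SAMPLES, 0)
by_user = average_gpu_by(SAMPLES, 1)
by_application = average_gpu_by(SAMPLES, 2)

# Print applications from highest to lowest average GPU usage.
for app, avg in sorted(by_application.items(), key=lambda kv: -kv[1]):
    print(f"{app}: {avg:.1f}% average GPU usage")
```

Sorting the per-application averages is what makes an unexpected heavy GPU consumer stand out in an inventory like this.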

The Outcome

Using SysTrack the team has been able to build a complete inventory of the existing application and workstation estate and user behaviour. Using these analytics and SysTrack technologies a detailed specification and design for the future NVIDIA GRID based desktop virtualisation infrastructure has been created. Aibel is able to trust this design as it is completely based on observed data captured from their existing estate and modelled using industry leading technologies from Lakeside and NVIDIA to create the optimal solution for Aibel’s situation. Key in any design is to not over provision the solution and thereby inflate the cost of the solution. On the other hand under provisioning will lead to a poor user experience and potential project failure. SysTrack ensures the right data is used to make the right decision for the future estate.

The assessment was primarily executed to understand Aibel engineers’ CAD applications such as Aveva PDMS, Bentley MicroStation, etc. A summary of average GPU usage across the 1500 physical machines revealed, surprisingly, that internet browsers are very GPU intensive compared to other applications.

Another benefit of leveraging SysTrack was the SysTrack MarketPlace report that had been co-authored by Lakeside and NVIDIA which allowed the team to convert all the data collected into accurate sizing of the number of NVIDIA GRID cards by model number required to offload the GPU workload.

The output from SysTrack also showed how GPUs were being used across the existing estate and how hard they were being utilised. Using jointly authored Lakeside/NVIDIA reports this data was then used to calculate estimated vGPU profiles.

 

 

 

 

 

Thomas Poppelgaard, CTP & MVP

Citrix Technology Professional (CTP) and Microsoft Most Valuable Professional (MVP) Thomas Poppelgaard provides professional services. Write to me at thomas@poppelgaard.com or call my cell on +45 53540356.

 

Introducing The End User Computing Podcast

Over the last several years, many of us in the industry have discussed the need for community driven End User Computing podcasts focusing on virtualization topics for people designing, deploying, and using Citrix, Microsoft, VMware and surrounding technologies. I am excited to share that this month, two new podcasts are being launched! First, a warm congratulations to Jarian Gibson and Andy Morgan on the successful launch of their podcast, Frontline Chatter. Here’s to many years of continued success! Next, allow me to introduce the End User Computing Podcast!

Citrix XenServer 6.5

Citrix has released a major new version of its hypervisor, XenServer 6.5.

In this blog post I have gathered all the publicly available information and describe what I think is new in Citrix XenServer 6.5, why it is great, and how you can use it.

Webinar I did with XenAppblog – “GPU in virtualization, learn why it’s important” now available

Hi All

I am very excited to share this great news with you all. I did a webinar with fellow CTP Trond Eirik Håvarstein from XenAppBlog.com, and we had a surprise special guest, Jeroen Van De Kamp, CTP and CTO of LoginVSI, announcing groundbreaking stuff in the webinar. Over 700 people signed up for the webinar. If you were among those who missed the opportunity to see it, here is your chance: the webinar is now available to everyone for free. There were a lot of questions, and over the next couple of days I will reply to all of them and make the answers available in this article.

The webinar has been remastered, and the audio and graphical demo videos are even better now than in the actual webinar. Make sure to check it out now:

Download the presentation here (PDF format)

Summary of webinar product announcements from LoginVSI, Lakeside Software, Uberagent for Splunk.

LoginVSI’s upcoming new version supports GPU benchmarking…

LoginVSI is working on a next version that will support benchmarking, capacity planning and stress testing of the “missing component in virtualization”: the GPU. If you are interested, you can write to LoginVSI to get access to the beta version.

Here are some screenshots from the session. Watch it to hear what Jeroen says about the upcoming version.

Note: if you want more info on the next version of LoginVSI that supports GPU, write to info@loginvsi.com with the subject GFX.

[Screenshots: LoginVSI GPU support, images 1 to 3]

Lakeside Software Monitoring/Assessing NVIDIA GRID

Another groundbreaking product announcement came from Lakeside Software: they are about to release version 7 of SysTrack, which will support NVIDIA GPU monitoring and assessment.

Application Graphics Benchmarking

The transformation of an existing software portfolio first begins with the identification of all of the actively used software packages in the environment. The added complication in the case of a project to begin advanced application delivery is the need to understand multiple facets of usage: resource consumption, graphics utilization, frequency of use, user access habits, and mobility needs. Because the state of IT is already so complex it only becomes possible to fully understand and plan with a complete set of descriptive information that really characterizes the unique aspects of every environment. Of particular interest is the ability to first identify applications that have GPU demands, and then begin to segment them into tiers of utilization. SysTrack continually collects information about software packages as they’re used and normalizes all data points for cross platform comparison. One of the key performance parameters that’s identified in this process is a graphical intensity measure (Graphics Index) that provides a way to identify those applications in the portfolio that have higher GPU demands than others. With this critical information it becomes possible to segment the portfolio into groupings based on their requirements for specific resources. By tying a general sense of which applications have peak demand to total length of usage it becomes easier to start developing a portfolio made up of different combinations of usage styles. This includes separating applications that may be used by a small set of the population with intense requirements versus widely used applications with a smaller footprint. Of course, this also allows for much deeper analytics centering on the behaviors of users that is quite important in planning the GPU profiles in use in provisioning. 
Figure 1 displays this relationship in a bubble chart format. This format groups applications based on their similar characteristics, presenting clusters of similar applications in larger bubbles. The vast majority of applications exist in the “low graphics demand – low time active” area in the bottom left, while only a select few have either high graphics demand or high time active.

[Figure 1: bubble chart of graphics demand versus time active]

SysTrack tracks graphics usage frequency across physical clients and allows you to group users based on graphics usage and frequency.
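The segmentation described above, graphics demand on one axis and time active on the other, can be sketched as a simple quadrant classification. This is only an illustration: the thresholds, the application names and the numbers are hypothetical, and the way SysTrack actually computes its Graphics Index is not covered here.

```python
from collections import defaultdict


def classify_application(graphics_index, hours_active,
                         graphics_threshold=50.0, hours_threshold=4.0):
    """Place an application in one of four quadrants.

    Both thresholds are assumed values chosen for this example.
    """
    demand = "high" if graphics_index >= graphics_threshold else "low"
    activity = "high" if hours_active >= hours_threshold else "low"
    return f"{demand} graphics demand / {activity} time active"


# Hypothetical portfolio: name -> (graphics index, average hours active per day).
PORTFOLIO = {
    "cad_app": (80.0, 6.5),
    "browser": (55.0, 7.0),
    "render_tool": (90.0, 0.5),
    "mail": (5.0, 1.0),
}

quadrants = defaultdict(list)
for name, (index, hours) in PORTFOLIO.items():
    quadrants[classify_application(index, hours)].append(name)

for quadrant, apps in sorted(quadrants.items()):
    print(f"{quadrant}: {', '.join(sorted(apps))}")
```

Applications that are intense but rarely used end up in a different quadrant from those that are widely used with a small footprint, which is the separation the text describes.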

A natural expansion of this is grouping users into distinct workload types to understand how best to configure the profile types and GPU assignments for users. Once the target applications and users have been characterized and a plan has been developed it’s critical to begin the process of sizing the environment. This includes determining the architecture, sizing the desktops and servers that will be worked with, and identifying resources that will be required to support the needs of the planned deployment.
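As a rough sketch of grouping users into workload types, the example below classifies each user by the 95th percentile of their observed GPU usage. The choice of the 95th percentile, the thresholds, the type names and the sample data are all assumptions made for this example; they are not taken from SysTrack.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def workload_type(gpu_samples, medium_at=25.0, high_at=60.0):
    """Classify one user by the 95th percentile of their GPU usage (percent).

    The thresholds are hypothetical and would need tuning per environment.
    """
    p95 = percentile(gpu_samples, 95)
    if p95 >= high_at:
        return "high"
    if p95 >= medium_at:
        return "medium"
    return "light"


# Hypothetical per-user GPU usage samples in percent.
USER_SAMPLES = {
    "anna": [5, 10, 70, 85, 90],
    "bjorn": [2, 3, 30, 35, 40],
    "cecilie": [1, 1, 2, 4, 8],
}

types = {user: workload_type(samples) for user, samples in USER_SAMPLES.items()}
users_per_type = Counter(types.values())
print(types)
print(dict(users_per_type))
```

Using a high percentile rather than the average keeps short bursts of heavy 3D work from being hidden by long idle periods.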

Resource Modeling & Capacity Planning

The NVIDIA MarketPlace report from SysTrack’s Virtual Machine Planner (VMP) outlines the number of users that fall into different use cases, making it easier to forecast how many users per board can be allocated.

With a complete portfolio plan it now becomes possible to move into the next phase and start creating a model for what resources will be required for a complete environment. Because each of the users has been fully characterized throughout the assessment data collection interval it’s possible to use SysTrack’s Virtual Machine Planner (VMP) for powerful mathematical analysis to provide deep insight into infrastructure provisioning. The first component of this involves using the profile information above to help develop a plan for what kind of solution will be provided to the end-users. By segmenting the population into different delivery strategies using Citrix FlexCast options as a guideline, a more complete and accurate picture of how the net new environment will operate can be created. An additional benefit of segmentation is the ability to take advantage of grouping by general graphics consumption to identify the number of GPUs required for the environment based on the user density information for each profile type.

The NVIDIA MarketPlace report from VMP outlines the number of users that fall into the various use cases (e.g. “high” for a designer or higher end power user), making it much easier to forecast how many users per board can be allocated and in turn how many total boards may be needed.
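The step from users per use case to a board count is simple arithmetic: divide the number of users in each use case by the users-per-board density and round up. The sketch below shows that calculation. The densities and the user split are hypothetical placeholders, not figures from the MarketPlace report; real densities depend on the GRID board model and vGPU profile and should be taken from NVIDIA’s documentation. It also assumes, as a simplification, that a board hosts only one use case.

```python
import math

# Hypothetical users-per-board densities per use case (assumed values).
USERS_PER_BOARD = {"light": 32, "medium": 16, "high": 4}


def boards_needed(users_by_use_case, users_per_board=USERS_PER_BOARD):
    """Boards per use case, rounding up because a board cannot be split."""
    return {
        use_case: math.ceil(count / users_per_board[use_case])
        for use_case, count in users_by_use_case.items()
    }


# Hypothetical split of 1500 users across use cases.
estimate = boards_needed({"light": 900, "medium": 450, "high": 150})
total_boards = sum(estimate.values())
print(estimate, total_boards)
```

Because each use case is rounded up separately, the total is a conservative estimate; mixing use cases on a board could lower it.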

lakesidesoftware_systrack7-gpu0

This information creates an easy to use design for a set of user profiles, both for the actual desktop delivery and for the vGPU assignment. By ensuring the best possible analysis of the environment prior to the actual deployment the end-user experience is much simpler to forecast and control. This results in higher end-user satisfaction and a shorter transition time.

User Experience Optimization

After the successful implementation of the solution the environment still requires observation to prevent interruption of service and the potential for productivity impact. The best way to ensure optimal end-user service quality is to have a real-time alerting and analytical engine to collect and report instantly on degradation of any aspect of the systems the users interact with. SysTrack provides this in the form of proactive alerting, detailed system analysis in Resolve, and aggregate trending through Enterprise and Site Visualizer. An even more interesting feature is vScape, a tool designed to examine utilization across multiple virtual machines and correlate resource consumption to concurrency of application utilization. vScape provides real-time updates of all of the application usage across all virtual platforms in an enterprise, including information about what applications are currently demanding GPU resources. It also provides insight into other resource demands as well, such as CPU, memory, and I/O. This can help automate the discovery of co-scheduled or highly concurrent applications to pinpoint the root cause of oversubscription issues much more quickly. It also provides key insight into guest health characteristics with trending to correlate precisely which events may lead to service degradation
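To illustrate why concurrency matters for oversubscription, the sketch below computes the peak number of sessions that demand GPU resources at the same time on one host, using a sweep over start and end events. This is a generic technique shown with made-up intervals; it is not a description of how vScape works internally.

```python
def peak_concurrency(intervals):
    """Return the maximum number of intervals active at the same time.

    Each interval is (start, end) in any consistent time unit. An interval
    that ends at time t is not counted as overlapping one that starts at t.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time; at equal times the -1 (end) sorts before the +1 (start).
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical GPU-active periods (minutes since 08:00) for four sessions.
gpu_active = [(0, 10), (5, 15), (10, 20), (12, 14)]
print(peak_concurrency(gpu_active))
```

If the peak exceeds what the assigned GPU can serve comfortably, the applications active during that window are the first place to look.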

Another key feature introduced in SysTrack version 7.0 is the result of close collaboration with NVIDIA to leverage APIs presented in the guest operating system. This allows the capture of detailed GPU performance metrics to correlate vGPU consumption to end-user service quality. Specifically, with NVIDIA drivers present in the guest OS or on a physical system, the GPU utilization and key metrics (see table 2 for a sample of selected metrics) from the graphics card can be captured and analyzed in the same way as CPU or other system metrics are currently in SysTrack.

In SysTrack 7, after provisioning users in a VDI environment, IT admins can monitor performance, which makes it possible to optimize density over time.

This completes the set of KPIs used in SysTrack to calculate the end-user experience score, including categories like resource limitation, network configuration, latency, guest configuration, protocol specific data for ICA, and virtual infrastructure. With a complete set of relevant information the proactive and trending health analysis provided in SysTrack yields a thorough analysis in an easy to understand, quantitative score that summarizes performance on an environmental, group based, or individual system level.

NVIDIA GPU monitoring/assessing (works with all NVIDIA GPUs: Quadro, Kepler, GRID)

 

You will be able to look at the following parameters:

  • Device ID
  • Power State
  • GPU Usage
  • Frame Buffer Usage
  • Video Usage
  • Bus Usage
  • Memory Usage (Bytes and Percent)
  • # of Apps
  • Temperatures and fan RPMs

Use this data to accurately plan and size GRID and HDX 3D Pro deployments based on actually observed usage and utilization.
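As one example of sizing from observed data, the sketch below models a sample with the parameters listed above and picks the smallest frame buffer size that covers a user’s peak memory usage plus some headroom. The field names, the profile names and sizes, and the 20% headroom are all assumptions for illustration; they are not SysTrack’s schema or NVIDIA’s actual vGPU profile list.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass
class GpuSample:
    """One observation of the listed parameters (hypothetical field names)."""
    device_id: str
    power_state: str
    gpu_usage_pct: float
    frame_buffer_usage_pct: float
    video_usage_pct: float
    bus_usage_pct: float
    memory_used_bytes: int
    memory_used_pct: float
    app_count: int
    temperature_c: float
    fan_rpm: int


# Hypothetical profile names and frame buffer sizes (assumed values).
PROFILE_FRAME_BUFFER = {"small": 512 * MIB, "medium": 1024 * MIB, "large": 2048 * MIB}


def smallest_fitting_profile(samples, headroom=1.2,
                             profiles=PROFILE_FRAME_BUFFER) -> Optional[str]:
    """Smallest profile whose frame buffer covers peak usage times headroom.

    Returns None when no profile is large enough.
    """
    peak = max(sample.memory_used_bytes for sample in samples)
    required = peak * headroom
    for name, size in sorted(profiles.items(), key=lambda item: item[1]):
        if size >= required:
            return name
    return None
```

A result of `None` signals a user whose observed usage does not fit any shared profile in the table and who may need a dedicated GPU instead.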

Monitor users post-deployment to provide the best user experience.

I recommend reading the whitepaper Lakeside Software have created:
White Paper: SysTrack Delivery Optimization and Planning for NVIDIA GRID and Citrix HDX

 

UberAgent 1.8 for Splunk adds GPU performance monitoring

Helge Klein has developed a new version of uberAgent for Splunk that now supports GPU monitoring. This was a feature request I talked with Helge Klein about in 2013, and I am so happy to see what he has done with uberAgent for Splunk. Let’s dig into what it can do.

uberAgent measures:

  • GPU compute usage per machine
  • GPU memory usage per machine
  • GPU compute usage per process
  • GPU memory usage per process
  • uberAgent shows memory usage separately for shared and dedicated memory (dedicated = on the GPU, shared = main system RAM)
  • uberAgent shows compute usage per GPU engine. The various GPU engines serve different functions, e.g. 2D acceleration, 3D acceleration, video decoding, etc.

[Screenshots: uberAgent process GPU usage; single machine GPU usage over time; single process GPU usage over time; machine GPU usage]

For more information visit uberAgent’s website.

My 5 cents

I am very excited to share my findings of some of the things I do in poppelgaard professional services. Feel welcome to contact me at thomas@poppelgaard.com if you are interested in using my professional services and you need help with GPU solutions.

You will see more upcoming blog posts from me covering this topic: end-user experience, assessments of GPU workload, scaling/sizing, benchmarking, supported hardware, GPU side-by-side experience, and hypervisor vs bare metal with a GPU. Watch out for cool things…

Source

Watch the webinar here (YouTube)
Download the presentation here (PDF format)

Lakeside Software
LoginVSI
White Paper: SysTrack Delivery Optimization and Planning for NVIDIA GRID and Citrix HDX
UberAgent for Splunk

Citrix XenDesktop HDX3D Pro
Citrix XenApp with GPU Sharing
Citrix XenServer vGPU
NVIDIA GRID
AMD FirePro
VMware vSphere vDGA
VMware vSphere vSGA with NVIDIA GRID