Transforming BIM to Remote Workspace with HPE Edgeline, NVIDIA and Citrix

HPE Edgeline BIM/Engineering Workstation – Overview

The new offerings add enhanced GPU capability: the HPE ProLiant m750 is paired with a dedicated NVIDIA GPU, specifically the NVIDIA P1000 or the NVIDIA T4. The solution requires no hypervisor, which lowers complexity while enabling user mobility and enhancing the user experience. It is designed for power users such as designers, architects and visualizers who need high performance with a reduced physical footprint, and it targets deployments of 1-499 users.

Challenges with CAD workstations today

  • Engineers are chained to their desks
  • Compute cannot be scaled flexibly as engineers need it
  • No scalable solution
  • Data has to be copied down from network locations, which can take a long time
  • Data has to be modified quickly, but synchronising takes a long time
  • Engineers can’t work on any device

Transforming high end workspace to current working style

Benefits of replacing physical BIM workstations with HPE remote workstations:

  • Move the physical workstation to a central location and unchain engineers from their desks
  • Work from home
  • Centralized compute
  • Benefit from a scalable GPU solution
  • Benefit from central datacenter performance when accessing the CAD/BIM files
  • IT security and IP protection
  • Mobility to work from any device, such as a tablet
  • A great, dedicated and guaranteed user experience for engineers/designers

Expanding the bare metal VDI family

HPE now has 3 hardware solutions in 3 different form factors.

  • HPE Moonshot 1500
  • HPE Edgeline EL4000
  • HPE Edgeline EL8000 / EL8000t

HPE now supports Citrix CVAD on all Moonshot/Edgeline solutions, both on bare metal and virtualised with Citrix Hypervisor and Citrix Virtual Apps & Desktops.

I am proud to share that I was interviewed as part of HPE’s announcement of the new Bare Metal VDI for Engineering & BIM workstations. The video above shows the new HPE Moonshot/Edgeline offerings and a customer sharing their experience.

Citrix supporting Bare Metal

Citrix has been supporting bare metal setups for a while. Let’s dig into the facts.

  • Citrix Virtual Apps & Desktops (XenDesktop) has supported bare metal since 2009, when HDX 3D Pro became available, which is 11 years. The technology was developed in 2006 but first went GA, with the deep compression codec, in 2009.

HPE Edgeline and Citrix Cloud Architecture

Let’s dig into the architecture.

The picture below shows HPE Edgeline with Citrix Cloud Virtual Apps & Desktops. You can use Citrix Cloud, but an on-prem Citrix CVAD infrastructure is also fully supported if you have a requirement for that. On the endpoint you can use the HP T740 thin client, which in my view is the best hardware thin client on the market and can decode multiple 4K monitors without impacting the user experience. The great thing about the architecture below is that you can have 4 physical workstations in 1U, centralised in your datacenter or edge location, without the user experience compromises that come with virtualising. The power of bare metal is that you get exactly what the hardware can deliver. That said, if you want to virtualise the hardware below, you can do that with no issues, and it is fully supported as well.

HPE Edgeline offerings with user segmentation – one size does not fit all

Solutions vary with your requirements, and HPE now offers a very flexible solution in the EL4000 platform. Let’s dig into what you can design.

  • Solution 1 is for calculation user applications
  • Solution 2 is for entry level CAD user with up to 4GB frame buffer requirement
  • Solution 3 is for Scalable workstation 1-n, which is from entry level CAD to high end CAD.
  • Solution 4 is for high end CAD user

Example of how BIM users can be mapped to each solution.

We will dig further into the solution offerings later. First, let’s look at what the hardware is and how it can benefit you.

What is HPE Edgeline EL4000v2

HPE Edgeline is incredible: it does not require a datacenter with high end cooling, so you can use the EL4000 chassis in small business offices or rooms where there is not much space. The EL4000 platform can be used at the edge or in the datacenter. Maximum input power is 800 W, which is remarkably low for 4 fully configured physical servers in 1U.
Some customers also mount the EL4000 on a wall, as you can see in the picture above.

What is HPE ProLiant m750

The HPE m750 is the cartridge many customers have been waiting on for years, and I can share why. It has a 5 GHz CPU. Yes, 5 GHz, which has not been possible in a datacenter before; today only workstation CPUs are capable of this. The CPU also has 8 cores and 16 threads with hyper-threading, which is double the core count of the cartridges released over the last 7 years. This is great news for users who need multithreading, but it also helps if you build hyperconverged systems or do virtualisation. The cartridge can be configured with up to 16 TB of NVMe storage, which is incredible given the form factor and how little power it consumes, a maximum of 90 W. The HPE m750 also runs any OS and any hypervisor, which is awesome. Oh, and did I forget: it uses iLO 5, so you can manage the cartridge better.

What is NVIDIA P1000

The NVIDIA P1000 is a single slot GPU that uses NVIDIA’s Pascal architecture and is part of the Quadro series. This GPU has a 4 GB dedicated frame buffer, which vendors of demanding applications require for CAD/CAM/CAE work. The idea of the concept below is to give the user the dedicated power of the m750 (5 GHz CPU with 8 cores/16 threads, up to 16 TB NVMe, 64 GB fast memory) plus a dedicated NVIDIA P1000 GPU, so that the user’s existing physical workstation can be moved from the desk into the datacenter.

What is NVIDIA T4

The NVIDIA T4 is a single slot GPU that uses NVIDIA’s Turing architecture and is part of the Tesla series. This GPU has RTX capabilities, specifically realtime raytracing, and a 16 GB GDDR6 dedicated frame buffer, which high end CAD applications demand. The NVIDIA T4 can be used in bare metal setups, but it can also be virtualised: combined with NVIDIA vGPU technology, the GPU frame buffer can be segmented into profiles from 2 GB all the way up to 16 GB, which means you can run from 1 user up to 8 users on a virtualised T4. The idea of the concept below is to give the user the dedicated power of the m750 (5 GHz CPU with 8 cores/16 threads, up to 16 TB NVMe, 64 GB fast memory) plus a dedicated NVIDIA T4 GPU, so that the user’s existing physical workstation can be moved from the desk into the datacenter. The NVIDIA T4 requires vGPU licenses both in a bare metal setup and in a virtualised environment; I cover this at the end of the blog post in the FAQ / datasheet references section.
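The relationship between vGPU profile size and user density described above can be sketched in a few lines of Python. This is only an illustration: it assumes every user on a given T4 gets the same profile size, and the function name and the list of profile sizes are made up for the example rather than taken from NVIDIA tooling.

```python
# Illustrative sketch: how many users fit on one 16 GB NVIDIA T4 when its
# frame buffer is split into equally sized vGPU profiles.
# Assumption: all users on the GPU share the same profile size.

T4_FRAME_BUFFER_GB = 16  # from the article: 16 GB GDDR6 frame buffer


def users_per_gpu(profile_gb: int, frame_buffer_gb: int = T4_FRAME_BUFFER_GB) -> int:
    """Return how many equally sized vGPU profiles fit in the frame buffer."""
    if profile_gb <= 0 or profile_gb > frame_buffer_gb:
        raise ValueError("profile size must be between 1 GB and the frame buffer size")
    return frame_buffer_gb // profile_gb


if __name__ == "__main__":
    # Hypothetical profile sizes within the 2 GB to 16 GB range from the text.
    for profile_gb in (2, 4, 8, 16):
        print(f"{profile_gb:>2} GB profile -> {users_per_gpu(profile_gb)} user(s) per T4")
```

With a 2 GB profile this gives the 8 users per T4 mentioned above, and with the full 16 GB profile a single user.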

Which HPE products can host the m750 + NVIDIA P1000 or T4

HPE now has a new server appliance and a new solution that targets SMB and medium-sized customers. Previously you needed to buy a Moonshot 1500 solution, a 4.3U chassis that holds up to 45 server blades. This has changed, and new offerings are now available from HPE, including Edgeline: the HPE Edgeline EL1000 supports 1 HPE ProLiant server blade, and the HPE Edgeline EL4000 supports up to 4 HPE ProLiant server blades. This is great for businesses that want CPU, GPU, fast storage and I/O in small capacity but still require fast performance. These appliances are also built for IoT systems, which makes HPE Edgeline a more accessible solution for companies regardless of their size or demand. The use cases are pretty much endless.

Reference architectures

HPE Edgeline Engineering Workstation
with NVIDIA and Citrix CVAD 7 1912

  • Solution 1 is for calculation user applications
  • Solution 2 is for entry level CAD user with up to 4GB frame buffer requirement
  • Solution 3 is for Scalable workstation 1-n, which is from entry level CAD to high end CAD.
  • Solution 4 is for high end CAD user

Solution 1: Remote Workstations (calculation user)

Solution 2: Remote Workstations (Entry level CAD)

Solution 3: Remote Workstations (Scalable workstation 1-N)

Solution 4: Remote Workstations (High End CAD)

 

Benchmarking 2x 4K monitors with HPE Workspace and Citrix

The Citrix HDX protocol delivers a great user experience on solutions 2-4, where it uses hardware encoding with NVIDIA NVENC H.264 and can easily run CAD applications and a 4K video stream in full screen.

Citrix HDX leverages both CPU and GPU, and even the Intel P630 does an OK job of running 2x 4K monitors as long as no intensive 3D applications are used.

Below, a 2D application is running with YouTube streaming on the secondary 4K monitor, and the Citrix HDX protocol is decoding using adaptive

 

Testing Methodology for HPE Edgeline m750 + NVIDIA GPUs with Citrix CVAD

I tested the HPE m750 with an Intel Xeon CPU, with the onboard Intel P630 GPU disabled and an NVIDIA P1000 attached, running Windows 10 1909 and Citrix CVAD 7 1912. I tested media and entertainment applications such as Adobe Photoshop CC 2020, higher end CAD such as Autodesk Inventor Professional 2020, and a range of industry standard GPU benchmark tools such as Basemark GPU, Unigine Superposition, FurMark, SPECviewperf and SPECworkstation 3.

To sum up

Solution 2 vs Solution 4 – Test Setup

HPE EL4000v2 is the chassis
On the left is the HPE m750 with NVIDIA P1000, and on the right the HPE m750 with NVIDIA T4.

Hardware component overview

Software component overview

Use case and benchmarking applications

These applications were tested and benchmarked:

  • Autodesk Inventor Professional 2020 benchmark
  • Adobe Photoshop CC 2020 with PugetBench benchmark
  • FurMark
  • SPECviewperf
  • SPECworkstation 3
  • Basemark GPU
  • Unigine Superposition

Results with application: Autodesk Inventor Professional 2020

I used this application because it’s a very popular CAD application in the engineer/designer BIM world. To automate it I used a tool Micrographics have developed which

Source: Autodesk Inventor Professional 2020 with Inventor benchmark tool

Benchmark Results with Autodesk Inventor Professional 2020:

The NVIDIA T4 is 25% faster than the NVIDIA P1000 in the Graphics Intensive Test of the Autodesk Inventor 2020 benchmark.

The benchmark was run at 4K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Results with application: Adobe Photoshop CC 2020

Adobe Photoshop CC 2020 is very popular with creators, engineers and designers, and Adobe keeps improving the user experience by adding more GPU capabilities to the product. This is why I ran the test with the latest version. To automate a benchmark simulation, I ran the Puget Systems benchmark that is made for Adobe Photoshop CC 2020.

It would be nearly impossible to test everything in Photoshop, but the Puget Systems benchmark is designed to test a broad range of tasks in order to give an accurate overall picture of how a system performs in Photoshop. The benchmark tests how long it takes to complete general tasks, apply filters, and generate a panorama (photomerge). The tested actions are:

Adobe Photoshop CC 2020 with Puget Systems benchmark

Benchmark Results with Adobe Photoshop CC 2020:

The NVIDIA T4 is 25% faster in “Adobe Photoshop CC 2020” than the NVIDIA P1000.

The benchmark was run at 1K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Results with application: Furmark

FurMark is a very intensive OpenGL benchmark that uses fur rendering algorithms to measure the performance of the graphics card. Fur rendering is especially adapted to overheat the GPU and that’s why FurMark is also a good stability and stress test tool (also called GPU burner) for the graphics card.

Source: Furmark (OpenGL)

Benchmark Results with Furmark:

The NVIDIA T4 delivers 3x more FPS and a score more than 3x higher than the NVIDIA P1000.

The benchmark was run at 1K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

 

Results with application: SPECworkstation 3

SPECworkstation 3 is a very popular benchmark tool.

The SPECworkstation 3 benchmark measures all key aspects of workstation performance based on diverse professional applications.

Source: SPECworkstation 3

Benchmark Results with SPECworkstation 3 running Autodesk 3DSMax:

The NVIDIA T4 is 3x faster in “Autodesk 3DSMax” than the NVIDIA P1000.

The benchmark was run at 4K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Benchmark Results with SPECworkstation 3 running Dassault CATIA & PTC Creo:

The NVIDIA T4 is 3x faster in “Dassault CATIA & PTC Creo” than the NVIDIA P1000.

The benchmark was run at 4K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Benchmark Results with SPECworkstation 3 running Siemens NX, Dassault Solidworks and Autodesk Showcase:

The NVIDIA T4 is 2-3x faster in “Siemens NX, Dassault Solidworks, Autodesk Showcase” than the NVIDIA P1000.

The benchmark was run at 4K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Results with application: Basemark GPU

Basemark, the company behind Basemark GPU, is a newer vendor in the benchmarking world and is based in Finland.

Basemark GPU directly supports OpenGL 4.5, OpenGL ES 3.1 and Vulkan graphics APIs across Windows

Source: Basemark GPU

Benchmark Results with Basemark GPU:

The NVIDIA T4 is 3x faster in Basemark GPU than the NVIDIA P1000.

The benchmark was run at 4K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Results with application: Unigine Superposition

Unigine has very popular benchmark tools that the EUC community, partners and customers use, which is why I have included this great tool.

Source: Unigine Superposition

Benchmark Results with Unigine Superposition:

The NVIDIA T4 delivers 3x more FPS, both average and max, and a 3x higher score in Unigine Superposition than the NVIDIA P1000.

The benchmark was run at 1K and 4K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Benchmark is run in 1K resolution

Benchmark is run in 4K resolution

Solution 2 vs Solution 3 – Test setup

Hardware component overview

Software component overview

Solution 2 vs Solution 3 – Furmark Benchmark run in 1K resolution

FurMark is a very intensive OpenGL benchmark that uses fur rendering algorithms to measure the performance of the graphics card. Fur rendering is especially adapted to overheat the GPU and that’s why FurMark is also a good stability and stress test tool (also called GPU burner) for the graphics card.

Source: Furmark (OpenGL)

Benchmark Results with Furmark:

With the T4 4Q profile and 4 simultaneous users, the NVIDIA T4 delivers 3x more max FPS than the NVIDIA P1000. Average FPS with 4 vGPU users on the T4, however, is lower than on the P1000, by less than 6 FPS.

The benchmark was run at 1K resolution with Citrix CVAD to an HP T740 thin client with 2x 4K 55″ monitors.

Solution 2 vs Solution 3 – comparison of price/performance

Bare metal or virtualized

  • The flexibility of deploying either a baremetal (m750/P1000) or virtualized solution (m750/T4) allows for efficient persona matching based on application requirements by changing the compute to GPU ratio.
  • Virtualizing on the EL4000/m750/T4 proves to be a cost-efficient method to centralize and remotely deliver workstations to CAD users for small and medium organizations. 
  • You can either have 1-2 units of EL4000 wall-mounted for small numbers of users (up to 32) or stack up to 20 units in a rack for mixed deployments of up to 300 users

My Overall conclusions

  • The HP T740 delivers a great user experience on 2x 4K monitors using Citrix HDX to an m750 with NVIDIA P1000 when working in Autodesk Revit 2020, Autodesk Navisworks 2020, Autodesk Inventor 2020 and Autodesk 3DSMax 2020. Users work on HP ThinPro with the Citrix Workspace App and get a local-like experience. The NVIDIA P1000 delivers a great experience with normal BIM samples; if customers need larger datasets, the NVIDIA T4 combined with the HPE m750 is the preferred choice. NVIDIA vGPU is the preferred choice if customers want to scale their solution and increase density for the most performance per dollar, and it delivers great performance combined with Citrix/HPE.
  • We typically see 1 Mbit average bandwidth usage with 2x 4K, and you should plan for peaks of up to 30 Mbit per user when zooming in/out or rotating large visual objects in CAD applications. The Citrix HDX protocol is adaptive and, in the latest release, is optimized out of the box for 2D/3D applications, requiring only a few tweaks to get hardware acceleration.
  • When going from dual HD to dual 4K, expect bandwidth usage to be 4x higher. A thin client such as the HP T740 is recommended as a minimum to decode 2x 4K without any performance impact.
  • Office 365, Microsoft Teams and Microsoft Edge work great on 2x 4K, and streaming 4K video from YouTube is possible thanks to Citrix HDX and the encoding capability of the m750 onboard Intel P630 GPU, the NVIDIA P1000 or the T4. The stream is then decoded on the HP T740 client, which is attached to 2x 4K monitors and delivers a great user experience.
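The bandwidth figures above can be turned into a rough planning estimate. The following Python sketch uses the 1 Mbit average and 30 Mbit peak per user quoted in the conclusions; the share of users peaking at the same time is a hypothetical planning parameter, not something that was measured, and the function names are invented for this example.

```python
# Rough bandwidth planning sketch for 2x 4K Citrix HDX sessions.
# Per-user figures come from the article; peak_share is an assumed input.

AVG_MBIT_PER_USER = 1.0    # typical average with 2x 4K monitors
PEAK_MBIT_PER_USER = 30.0  # peak while zooming or rotating large models


def pixel_ratio(width_a: int, height_a: int, width_b: int, height_b: int) -> float:
    """Return how many times more pixels resolution B has than resolution A."""
    return (width_b * height_b) / (width_a * height_a)


def site_bandwidth_mbit(users: int, peak_share: float) -> float:
    """Estimate link usage when peak_share of users peak and the rest idle at average."""
    if users < 0 or not 0.0 <= peak_share <= 1.0:
        raise ValueError("users must be >= 0 and peak_share between 0 and 1")
    peaking = users * peak_share
    return peaking * PEAK_MBIT_PER_USER + (users - peaking) * AVG_MBIT_PER_USER


if __name__ == "__main__":
    # Full HD to 4K per monitor: same factor as dual HD to dual 4K.
    print("HD -> 4K pixel ratio:", pixel_ratio(1920, 1080, 3840, 2160))
    # Hypothetical office: 32 users, a quarter of them rotating models at once.
    print("Estimated Mbit:", site_bandwidth_mbit(32, 0.25))
```

The pixel ratio shows where the 4x figure for dual HD versus dual 4K comes from: a 4K monitor has four times the pixels of a Full HD monitor.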

Overall conclusions and recommended solutions

  • Solution 1 is for customers who want to run calculation applications on the 5 GHz CPU and use the onboard GPU for 2D applications such as AutoCAD, 4K video and Office 365. Together with Citrix CVAD, the solution delivers great performance even on 2x 4K monitors, where Citrix HDX delivers hardware decoding capability with both CPU/GPU rendering.
  • Solution 2 is for customers using entry level CAD applications such as Autodesk AutoCAD, Revit, Navisworks and Inventor. The NVIDIA P1000 delivers excellent performance with Citrix CVAD and GPU decoding out of the box on 2x 4K monitors.
  • Solution 3 is for customers using entry level CAD applications such as Autodesk AutoCAD, Revit, Navisworks and Inventor who would like to serve more users by virtualising the m750 cartridge and combining the NVIDIA T4 with vGPU technology running on Citrix Hypervisor. The solution scales from 4 users to 32 users in a fully configured EL4000, whereas Solution 2 can run a maximum of 4 users. Customers get the best performance for the money because the hardware is used efficiently. Realtime raytracing is also supported with NVIDIA vGPU, but for higher end workloads using large frame buffers, the solution below is the recommended path.
  • Solution 4 is for customers using high end CAD applications such as Autodesk VRED, Maya, Arnold, 3DSMax or Dassault CATIA. The solution fully supports realtime raytracing, as the NVIDIA T4 GPU is capable of delivering it. The solution is also preferred if customers would like to use VR solutions such as NVIDIA CloudXR, which they can combine with Citrix CVAD.
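To make the density difference between the bare metal and virtualised solutions concrete, here is a small Python sketch. It assumes one GPU per m750 cartridge and 4 cartridges per EL4000, as described in this article; the function names are hypothetical and the user counts in the example are only illustrative.

```python
# Illustrative chassis sizing sketch for the EL4000 solutions.
# Assumption: one GPU per m750 cartridge, 4 cartridges per 1U chassis.
import math

CARTRIDGES_PER_EL4000 = 4


def users_per_chassis(users_per_cartridge: int,
                      cartridges: int = CARTRIDGES_PER_EL4000) -> int:
    """Return how many users one chassis serves at a given density per cartridge."""
    if users_per_cartridge < 1 or cartridges < 1:
        raise ValueError("users_per_cartridge and cartridges must be at least 1")
    return users_per_cartridge * cartridges


def chassis_needed(total_users: int, users_per_cartridge: int) -> int:
    """Return the number of fully configured chassis needed for total_users."""
    if total_users < 0:
        raise ValueError("total_users must not be negative")
    return math.ceil(total_users / users_per_chassis(users_per_cartridge))


if __name__ == "__main__":
    print("Bare metal, 1 user per cartridge:", users_per_chassis(1))
    print("vGPU, 8 users per T4:", users_per_chassis(8))
    print("Chassis for 100 vGPU users:", chassis_needed(100, 8))
```

At 1 user per cartridge this reproduces the 4 users per chassis of the bare metal solutions, and at 8 users per T4 the 32 users per chassis quoted for Solution 3.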

Poppelgaard.com – Professional Services

The article above was written by me as an independent blogger/consultant. The testing of the CAD applications was carried out by Thomas Poppelgaard and sponsored by HPE. The methodology is provided as is, and samples can be requested if readers would like the material to reproduce the same results.

If you are interested in testing your applications on the HPE Edgeline EL4000v2 with the HPE ProLiant m750 server blade and NVIDIA P1000 or NVIDIA T4, please don’t hesitate to contact me at thomas@poppelgaard.com

I can also help with above value added tasks plus much more 🙂

Other Sources

This video shows the user experience with Bare Metal VDI. Star Wars: Squadrons is streamed from a physical server (m750) to an endpoint using the Citrix HDX protocol.

FAQ / datasheet references

  • NVIDIA supporting bare metal Tesla T4 GPU with vGPU technology for below operating systems

Bare Metal VDI – Engineering & BIM workstations
