Findings: Video conferencing on Azure Virtual Desktop using Google Meet

Introduction
Over the last couple of years, there has been an impressive shift, with many businesses and institutions adopting and relying on large-scale remote working and remote learning environments to maintain workforce and learning continuity. During this time, this type of remote working/learning has generally been recognised as quite successful, with many businesses and institutions continuing remote working/learning practices or introducing hybrid models that combine remote and office work for their staff.
One of the reasons remote working/learning has been successful is the availability of supporting technologies that deliver a high standard of human communication and engagement across large numbers of workers or students/faculty in remote environments. Video conferencing applications, which combine video calls, screen sharing, instant messaging and more, are among the technologies that have made viable remote working/learning environments possible.
But to use these applications to their fullest potential, a robust IT infrastructure is also a must. Many large enterprise companies, as well as SMBs and other institutions, have centralised their IT environments into virtual desktop infrastructure (VDI), either as an on-premises solution or as a managed service from cloud service providers (CSPs). Centralising resources, applications and data into a single infrastructure allows for better IT management and security of vital resources and data, which can help improve workforce productivity, data security and IT efficiency.
Investigation overview
This blog details a recent technical investigation in which a popular video conferencing application, Google Meet, is deployed on AMD-based Azure instances to determine its performance, the number of deployable users in a multi-session environment, and the user experience each person receives. The AMD-based instances include both CPU-only instances and CPU+GPU instances, to understand the impact of GPU-enabled resources on user density and experience.
So next let’s look at the various parameters for the investigation.
The Lab:
For the investigation, we had three areas of consideration:
1) Azure session host
2) Application
3) End-point devices
Azure Session Host
For the host, we used Microsoft Azure. The system ran Windows 10 Enterprise multi-session, version 1909, over Microsoft Remote Desktop Protocol (RDP) in Azure Virtual Desktop. The instances used were NV32as_v4 (32 vCPU / 112 GB / 1x GPU), D32as_v4 (32 vCPU / 128 GB) and D32s_v4 (32 vCPU / 128 GB). The host was located in the West Europe region (Amsterdam), and the tests were conducted in Hinnerup, Denmark, roughly 500 miles from the datacenter.
Applications
The application we used as part of this investigation is listed below. It is one of the most widely used video conferencing applications for remote working/learning.
1) Meet
Google Meet runs in the Google Chrome Enterprise browser, which is installed on the multi-session machines in AVD.
Endpoint devices
The investigation used physical end-point devices as opposed to virtual ones, giving a truer representation of the environment and experience. As a result, lab space limited the sample size to 15+1 concurrent users.
Within the sample, we used 11x Windows PCs, 5x Chromebooks and 1x MacBook Air laptop, each running the latest OS. For the workload, 15 users were connected as guests and 1 additional user acted as host of the video call.
The workload:
For the investigation we looked at three types of workloads typically seen with video conferencing, each run over a 30-minute period:
Workload 1 (length of time: 30 mins)
1x host and 14x guest video conference (video and audio sharing) + screen sharing of static (PDF) content
Workload 2 (length of time: 30 mins)
1x host and 14x guest video conference (video and audio sharing) + screen sharing of dynamic (video) content
Workload 3 (length of time: 30 mins)
1x host and 14x guest video conference (video and audio sharing) + screen sharing of dynamic (video) content + guests multi-tasking, taking notes in the Office 365 suite
Each workload is more demanding than the last, moving from video and audio sharing, to dynamic screen sharing, to multi-tasking. Together they represent a set of common use cases seen in remote working/learning environments.
Methodology
There are two areas in which we collected data for this investigation.
How are we collecting the data?
| Session host | End-point device |
| --- | --- |
| CPU utilization, memory utilization, GPU utilization, GPU memory utilization | Input/output frames per second (FPS), encode time, input bandwidth per user, output bandwidth per user, latency per user |
The data was captured every 3 seconds using the Windows OS and Sepago's Azure Monitor for AVD monitoring tools. This allowed the delivered data to be cross-checked.
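To illustrate what capturing every 3 seconds looks like in practice, here is a minimal host-side sampling sketch. It is not the Sepago tooling or the exact collector used in the investigation; it is an assumed illustration that polls CPU and memory at the same 3-second interval using the third-party psutil library, and the file name and function are hypothetical.

```python
# Minimal sketch of 3-second host sampling (illustrative only; the actual
# investigation used Windows OS metrics and Sepago's Azure Monitor for AVD).
import csv
import time

import psutil  # third-party library: pip install psutil

SAMPLE_INTERVAL_S = 3  # same interval as used in the investigation


def sample_host_metrics(duration_s: int = 1800, out_path: str = "host_metrics.csv") -> None:
    """Write CPU and memory utilization samples to a CSV every 3 seconds."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_util_pct", "ram_used_gb"])
        end = time.time() + duration_s
        while time.time() < end:
            cpu = psutil.cpu_percent(interval=SAMPLE_INTERVAL_S)  # blocks for 3 s
            ram_gb = psutil.virtual_memory().used / (1024 ** 3)
            writer.writerow([time.strftime("%H:%M:%S"), cpu, round(ram_gb, 1)])


if __name__ == "__main__":
    sample_host_metrics(duration_s=30)  # short run for testing
```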
Considerations
Not all applications and devices are built the same
Before we look at the results of this investigation, we also need to reflect on the parameters and the support the application has for the different end-point devices, as well as how it is viewed/installed.
Meet
Meet is the video conferencing system from Google.
Meet runs from the Google Chrome Enterprise browser; at the time this research project was carried out, there was no standalone Google Meet application available.
There is no optimization client for Meet on Azure Virtual Desktop, or for VDI in general. This means no offload to the endpoint is possible: everything is processed on the multi-session hosts in Azure, and video/sound is then synchronised with the Google Meet video services.
Azure Virtual Desktop redirection support
Another area to consider is Azure Virtual Desktop support for audio and camera redirection with end-point devices. Redirection helps to improve latency for camera and audio, as it is essentially a pass-through to the host.

Speakers: with AVD, speaker redirection is supported across all platforms, whether Windows, ChromeOS, macOS or HTML5.

Camera: camera redirection for AVD is supported on Windows devices and macOS, which means there is no redirection support for ChromeOS and HTML5 (web client). So we expect to see more latency with Chromebooks and devices connected via the web client.
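For reference, audio and camera redirection behaviour in AVD is controlled through the host pool's custom RDP properties. The sketch below only assembles the documented RDP property string for speaker, microphone and camera redirection; it does not apply it to a host pool (how you apply it, via portal, PowerShell or CLI, is up to you), and camera redirection still only takes effect on clients that support it (Windows and macOS).

```python
# Sketch: assembling the custom RDP properties string that controls audio and
# camera redirection for an AVD host pool. Property names follow Microsoft's
# documented RDP file settings; applying the string is out of scope here.
rdp_properties = {
    "audiomode": "i:0",          # play audio (speakers) on the local endpoint
    "audiocapturemode": "i:1",   # redirect the endpoint microphone into the session
    "camerastoredirect": "s:*",  # redirect all endpoint cameras (Windows/macOS clients)
}

custom_rdp_property_string = ";".join(f"{k}:{v}" for k, v in rdp_properties.items()) + ";"
print(custom_rdp_property_string)
# -> audiomode:i:0;audiocapturemode:i:1;camerastoredirect:s:*;
```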
Investigation findings
In this section we are going to review the findings from the investigation. Just to reiterate, there were 3x session hosts and 1x application tested against 3x workloads, giving a total of 9 findings. Let's begin.
Microsoft Azure NV32as_v4
Metrics captured every 3 seconds with 15 concurrent users using AVD
| Session host | CPU utilization | RAM memory | GPU utilization | GPU memory |
| --- | --- | --- | --- | --- |
| Workload 1 | 47% | 13.7 GB | 98% | 1.8 GB |
| Workload 2 | 67% | 13.3 GB | 99% | 1.9 GB |
| Workload 3 | 50% | 13.5 GB | 99% | 2.5 GB |
| End-point device | Input FPS | Output FPS | Encode time | Input bandwidth | Output bandwidth | Latency |
| --- | --- | --- | --- | --- | --- | --- |
| Workload 1 | 17 FPS | 18 FPS | 5 ms | 36 MB/s | 13 MB/s | 64 ms |
| Workload 2 | 19 FPS | 25 FPS | 16 ms | 52 MB/s | 4 MB/s | 65 ms |
| Workload 3 | 18 FPS | 21 FPS | 8 ms | 21 MB/s | 4 MB/s | 54 ms |
Observations:
- All 3 workloads work great across devices, with video and audio in sync.
- Delivered a great user experience.
- With 4x camera redirections, video encode uses 100% of the GPU; with 3x, around 80%.
Microsoft Azure D32as_v4
Metrics captured every 3 seconds with 15 concurrent users using AVD
| Session host | CPU utilization | RAM memory |
| --- | --- | --- |
| Workload 1 | 74% | 11.4 GB |
| Workload 2 | 97% | 11.3 GB |
| Workload 3 | 73% | 12 GB |
| End-point device | Input FPS | Output FPS | Encode time | Input bandwidth | Output bandwidth | Latency |
| --- | --- | --- | --- | --- | --- | --- |
| Workload 1 | 31 FPS | 33 FPS | 10 ms | 25 MB/s | 19 MB/s | 83 ms |
| Workload 2 | 29 FPS | 29 FPS | 13 ms | 38 MB/s | 13 MB/s | 89 ms |
| Workload 3 | 31 FPS | 33 FPS | 13 ms | 40 MB/s | 12 MB/s | 77 ms |
Observations:
- All 3 workloads work great across devices, with video and audio in sync.
- Delivered an OK user experience.
Microsoft Azure D32s_v4
Metrics captured every 3 seconds with 15 concurrent users using AVD
| Session host | CPU utilization | RAM memory |
| --- | --- | --- |
| Workload 1 | 69% | 11.4 GB |
| Workload 2 | 97% | 11.3 GB |
| Workload 3 | 72% | 12 GB |
| End-point device | Input FPS | Output FPS | Encode time | Input bandwidth | Output bandwidth | Latency |
| --- | --- | --- | --- | --- | --- | --- |
| Workload 1 | 31 FPS | 32 FPS | 11 ms | 25 MB/s | 12 MB/s | 83 ms |
| Workload 2 | 28 FPS | 29 FPS | 11 ms | 34 MB/s | 12 MB/s | 131 ms |
| Workload 3 | 31 FPS | 33 FPS | 13 ms | 33 MB/s | 12 MB/s | 77 ms |
Observations:
- All 3 workloads work great across devices, with video and audio in sync.
- Delivered an OK user experience.
Summary, scalability, user experience and recommendations
In my findings I will cover what the data means when it comes to scalability and user experience.
Scalability
Let's look at some raw data. Each instance was benchmarked with 15 concurrent users (CCU); for the raw estimate, I divided 100% by the measured CPU utilization and multiplied by 15.
Please keep in mind this is a raw estimate, so treat the numbers with caution; a sketch of the calculation is shown below.
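As a worked example of that arithmetic, the sketch below reproduces the estimate using the average CPU utilization figures from the findings tables above. This linear extrapolation lands within about one user of the estimates listed below, which are slightly more conservative.

```python
# Raw CCU estimate: estimated_ccu = 15 benchmarked users * (100% / measured CPU utilization).
# CPU utilization values are the averages reported in the findings tables above.
BENCHMARKED_USERS = 15

measured_cpu_util_pct = {
    "NV32as_v4": {"Workload 1": 47, "Workload 2": 67, "Workload 3": 50},
    "D32as_v4":  {"Workload 1": 74, "Workload 2": 97, "Workload 3": 73},
    "D32s_v4":   {"Workload 1": 69, "Workload 2": 97, "Workload 3": 72},
}

for instance, workloads in measured_cpu_util_pct.items():
    for workload, cpu_pct in workloads.items():
        estimated_ccu = int(BENCHMARKED_USERS * 100 / cpu_pct)  # round down
        print(f"{instance} {workload}: ~{estimated_ccu} CCU at 100% CPU")
```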
Microsoft Azure NV32as_v4
- Workload 1 could potentially be scaled up to 31 CCU, based on reaching 100% CPU utilization per instance
- Workload 2 could potentially be scaled up to 21 CCU, based on reaching 100% CPU utilization per instance
- Workload 3 could potentially be scaled up to 30 CCU, based on reaching 100% CPU utilization per instance
Microsoft Azure D32as_v4
- Workload 1 could potentially be scaled up to 19 CCU, based on reaching 100% CPU utilization per instance
- Workload 2 could potentially be scaled up to 15 CCU, based on reaching 100% CPU utilization per instance
- Workload 3 could potentially be scaled up to 19 CCU, based on reaching 100% CPU utilization per instance
Microsoft Azure D32s_v4
- Workload 1 could potentially be scaled up to 21 CCU, based on reaching 100% CPU utilization per instance
- Workload 2 could potentially be scaled up to 15 CCU, based on reaching 100% CPU utilization per instance
- Workload 3 could potentially be scaled up to 19 CCU, based on reaching 100% CPU utilization per instance
Summary scalability
The AMD-based NV32as_v4 instance achieves the highest user density.
The NV32as_v4 (GPU) instance gets approximately 2x more users on Workloads 1, 2 and 3 compared to the non-GPU instances D32as_v4 and D32s_v4.
Meet utilises the GPU; when the GPU is maxed out, it reverts to software rasterisation on the CPU.
D32s_v4 delivers more users than D32as_v4 with Meet on Workload 1, but the two are equally user-dense on Workloads 2 and 3.
CPU utilization with Meet is lower on the GPU-enabled instance than on the instances without a GPU.
Summary User Experience
The user experience is great across all 3 instance types.
The GPU instance delivers lower latency, which means user input feels faster than on the non-GPU instances.
More GPU encode is used when 1-2 macOS endpoint devices are connected.
NV32as_v4 delivers lower latency compared to D32s_v4 and D32as_v4.
NV32as_v4 delivers lower encode time compared to D32s_v4 and D32as_v4.
Recommendations
Google Meet works best with a GPU-enabled instance in Azure Virtual Desktop.
Google Meet is a demanding application that drives high utilization, so make sure to size instances accordingly. You can gain around 2x the users with a GPU-enabled NV32as_v4 instance in Azure compared to D32s_v4 / D32as_v4.
Camera redirection is not yet supported on Android or HTML5, so ChromeOS endpoints are limited if users want their camera redirected into Meet in Azure Virtual Desktop.