
Delivering Softphones with Citrix XenDesktop 5.5 / 5.6

Citrix have released this great article on how to deliver softphones with Citrix XenDesktop 5.5 / 5.6.


This article describes a generic approach to delivering softphones and voice chat applications with XenDesktop 5.5 and 5.6.

Alternatives for delivering softphones

XenDesktop supports several alternatives for delivering softphones:

  • Control mode, where the hosted softphone simply controls a physical telephone set
  • Generic softphone support (VoIP-over-ICA)
  • Optimized softphone support (media engine runs on user device, and VoIP traffic flows peer-to-peer)

This article focuses on generic softphone support, where an unmodified softphone is hosted on XenDesktop in the data center and the audio traffic goes over the Citrix ICA protocol (UDP or TCP) to the user device running the Citrix Receiver. Generic softphone support is a feature of HDX RealTime.

Generic softphone support

There are two aspects to softphone delivery using XenDesktop:

  • How the softphone application is delivered to the virtual desktop
  • How the audio is delivered to and from the user’s headset, microphone and speakers, or USB telephone set

XenDesktop 5.5 and the Citrix Receiver 3.0 for Windows introduced several valuable enhancements to generic softphone delivery:

  • Low latency audio path
  • Client-side jitter buffer – Ensures smooth audio even when network latency fluctuates
  • Audio plug-n-play – Audio devices do not need to be plugged in before starting a XenDesktop session; they can be plugged in at any time during a session
  • Improved echo cancellation – Allows for greater variation in the distance between microphone and speakers for workers who do not use a headset
  • Audio device routing – Users can direct the ringtone to their speakers and the voice path to their headset
  • Multi-stream ICA including UDP/RTP – Enables flexible Quality of Service (QoS)-based routing over the network
  • Packet tagging (DSCP and WMM) for QoS
    DSCP tagging for RTP packets (Layer 3)
    WMM tagging for WiFi
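The client-side jitter buffer listed above is described only at the feature level, so the following is a generic sketch of the idea, not Citrix's implementation. It assumes each audio packet carries a sender sequence number and uses a hypothetical fixed depth of three packets: playout starts only once a few packets have queued, frames are released in sequence order even if they arrived out of order, and a missing frame is played as silence instead of stalling the stream.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    seq: int                                # sender sequence number; defines playout order
    payload: bytes = field(compare=False)   # encoded audio frame


class JitterBuffer:
    """Reorder audio packets and release them at a steady pace (generic sketch)."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                  # hypothetical: packets queued before playout starts
        self._heap: list[Packet] = []
        self._next_seq: int | None = None

    def push(self, packet: Packet) -> None:
        # A packet whose playout slot has already passed is useless: drop it.
        if self._next_seq is not None and packet.seq < self._next_seq:
            return
        heapq.heappush(self._heap, packet)

    def pop(self) -> bytes | None:
        """Called once per playout interval; None means the buffer is still filling."""
        if self._next_seq is None:
            if len(self._heap) < self.depth:
                return None
            self._next_seq = self._heap[0].seq
        # Discard duplicates of frames that have already been played.
        while self._heap and self._heap[0].seq < self._next_seq:
            heapq.heappop(self._heap)
        if self._heap and self._heap[0].seq == self._next_seq:
            frame = heapq.heappop(self._heap).payload
        else:
            frame = b""                     # lost or very late: play silence
        self._next_seq += 1
        return frame
```

A fixed depth trades a little added latency for smoothness. Adaptive buffers that grow and shrink with measured jitter are also common in VoIP software; whether the Citrix Receiver uses one is not stated in this article.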

How the softphone application is delivered to the virtual desktop

There are two methods by which a softphone can be delivered to the XenDesktop virtual desktop:

  • It can, of course, be installed in the virtual desktop image.
  • Alternatively, as a best practice, it can be streamed to the virtual desktop using On-Demand Applications by XenApp, a feature of the XenDesktop Enterprise and Platinum editions. This second approach has manageability advantages because it keeps the virtual desktop image uncluttered. Once streamed to the virtual desktop, the application executes in that environment just as if it had been installed in the traditional manner.

How the audio is delivered to and from the user device

XenDesktop supports two methods of delivering audio to and from the user device: Generic USB redirection (for LAN-connected users only) and the Citrix Audio Virtual Channel.

  • Isochronous USB Redirection
    Citrix’s Generic USB Redirection technology (CTXGUSB virtual channel) provides a generic means of remoting USB devices, including isochronous USB devices such as headsets and webcams. This approach is generally limited to LAN-connected users because the USB protocol tends to be sensitive to network latency and requires considerable network bandwidth. Isochronous USB redirection has been found to work very well with some softphones, providing excellent voice quality and low latency, but the Citrix Audio Virtual Channel, which is optimized for audio traffic, is generally preferred. An exception is a USB telephone attached to a user device that is LAN-connected to the data center; in this case, Generic USB Redirection has the advantage of supporting buttons on the phone set that control features by sending a signal back to the softphone.
  • Citrix Audio Virtual Channel
    The Citrix Audio Virtual Channel (CTXCAM) and the Bidirectional Audio feature of XenDesktop enable audio to be delivered very efficiently. XenDesktop takes the audio from the user’s headset/microphone, compresses it, and sends it over ICA to the softphone application on the virtual desktop using the audio virtual channel. Likewise, the softphone’s audio output is compressed and sent in the other direction to the user’s headset or speakers. This compression is independent of the compression used by the softphone itself (such as G.729 or G.711). It is done using the Optimized-for-Speech codec, which is in fact the Speex codec, whose characteristics are ideal for voice-over-IP (VoIP).

Citrix generally recommends using Bidirectional Audio (leveraging the XenDesktop audio driver) rather than raw isochronous USB redirection because this consumes less bandwidth and puts less of a load on the server. However, if using a USB telephone on the LAN, then Generic USB Redirection (CTXGUSB virtual channel) is recommended because both signaling and audio are involved.
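The recommendation above reduces to a simple rule. The sketch below encodes it as a hypothetical helper (the function and enum names are illustrative, not part of any Citrix tool): the audio virtual channel is the default, and Generic USB Redirection is chosen only for a USB telephone on a LAN-connected user device.

```python
from enum import Enum


class AudioPath(Enum):
    AUDIO_VIRTUAL_CHANNEL = "CTXCAM"   # Bidirectional Audio with the Optimized-for-Speech codec
    GENERIC_USB = "CTXGUSB"            # isochronous Generic USB Redirection


def choose_audio_path(device_is_usb_telephone: bool, lan_connected: bool) -> AudioPath:
    """Hypothetical helper encoding the recommendation in the text."""
    if device_is_usb_telephone and lan_connected:
        # The phone's buttons send signals back to the softphone, so both
        # signaling and audio must cross the redirected USB connection.
        return AudioPath.GENERIC_USB
    # Headsets, speakers/microphones, and all WAN users: the audio virtual
    # channel uses less bandwidth and puts less load on the server.
    return AudioPath.AUDIO_VIRTUAL_CHANNEL
```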

To use either isochronous USB redirection or the Optimized-for-Speech audio codec, the user device must be equipped with either the Citrix online plug-in for Windows version 11.2 or later, or the Citrix Receiver for Linux version 11.100 or later. Citrix recommends using the latest versions of the Citrix Receiver to get the benefit of ongoing HDX enhancements. For example, significant improvements to audio quality were introduced in the Citrix Receiver 3.0 for Windows (13.0 online plug-in) and the Citrix Receiver 12.0 for Linux.
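The minimum client versions in the paragraph above can be checked mechanically, for example in an inventory script. The sketch below is illustrative only: the dictionary keys are hypothetical labels, and version strings are assumed to be dotted integers. Versions are compared as integer tuples because a plain string comparison would wrongly rank 11.100 below 11.2.

```python
from __future__ import annotations

# Minimum versions named in the text; the keys are hypothetical labels.
MINIMUM_CLIENT_VERSION: dict[str, tuple[int, ...]] = {
    "windows-online-plugin": (11, 2),
    "linux-receiver": (11, 100),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '11.100' into (11, 100) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_softphone_minimum(client: str, version: str) -> bool:
    """True if this client version supports isochronous USB and Optimized-for-Speech."""
    return parse_version(version) >= MINIMUM_CLIENT_VERSION[client]
```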

System Configuration Recommendations

Client hardware and software

For optimal audio quality, Citrix recommends the Citrix Receiver 3.x for Windows and a good quality headset with echo cancellation.

The 12.0 online plug-in for Windows introduced echo cancellation into the client software, allowing the use of speakers and a microphone as an alternative to using a headset. This has been further enhanced in the Citrix Receiver 3.0 with version 13.0 of the online plug-in.

On each user device, install the latest Citrix Receiver for Windows or Linux. These versions include the Optimized-for-Speech audio codec technology required for softphone use.

CPU considerations

Monitor CPU utilization on the VDA to determine whether it is necessary to assign two virtual CPUs to each virtual machine. Real-time voice and video are data-intensive, and configuring two virtual CPUs reduces thread-switching latency. Note that having two virtual CPUs does not necessarily mean doubling the number of physical CPUs, because physical CPUs can be shared across sessions.

Session Reliability, which uses the Citrix Gateway Protocol (CGP), also increases CPU consumption. Improvements in XenDesktop 5.5 have greatly reduced this impact. Nevertheless, on high-quality network connections, Session Reliability could be disabled to further reduce CPU consumption on the VDA.

On a powerful server, neither of the above steps may be necessary.

Settings for use on WAN connections

Voice chat can be used over both LAN and WAN connections. On a WAN connection, audio quality depends on the latency, packet loss, and jitter on the connection. If delivering softphones to users on a Wide Area Network (WAN) connection, the following additional configuration settings are recommended:

  • Use XenDesktop 5.5 or above for best results.
  • Use Citrix Repeater and Branch Repeater between the data center and the remote office for Quality of Service (QoS). The Citrix Branch Repeater supports Multi-Stream ICA, including UDP. Also, in the case of a single TCP stream, it is able to distinguish the priorities of the various ICA virtual channels to ensure that high-priority, real-time audio data gets preferential treatment.
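The channel prioritization described above, where several ICA virtual channels share one TCP stream and real-time audio must not queue behind bulk traffic, can be illustrated with a strict-priority queue. This is a simplified, hypothetical model and not how Branch Repeater is implemented: lower numbers mean higher priority, following the convention in which 0 is real-time priority, and the channel labels other than CTXCAM are placeholders.

```python
from __future__ import annotations

import heapq
import itertools


class VirtualChannelScheduler:
    """Hypothetical strict-priority scheduler: 0 is highest, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, bytes]] = []
        self._counter = itertools.count()   # tie-breaker that preserves arrival order

    def enqueue(self, channel: str, priority: int, data: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), channel, data))

    def next_to_send(self) -> tuple[str, bytes] | None:
        """Return the highest-priority queued chunk, or None when idle."""
        if not self._heap:
            return None
        _, _, channel, data = heapq.heappop(self._heap)
        return channel, data
```

A pure strict-priority scheduler can starve low-priority channels while high-priority traffic is continuous; production QoS schemes typically add weighting or rate limits to prevent that.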

Use the HDX Monitor to validate your HDX configuration.

Audio service priority

If you are running XenDesktop 5.5 or above, there should be no need to adjust the priority of the Audio service. On earlier versions of XenDesktop, however, check the priority of the Citrix Audio Service (CtxAudioService) on the Virtual Desktop Agent. If it is set to Normal, increase the priority to Above Normal. (See Microsoft article; this topic is also discussed in CTX124516 – How to Optimize HDX MediaStream Server-Rendered Video).

Audio virtual channel priority

If you are running XenDesktop 5.5 or later, there should be no need to adjust the priority of the Audio virtual channel. On earlier versions of XenDesktop, Citrix recommends setting the priority of the Audio virtual channel (Client Audio Mapping) to 0 (real-time priority).

If using XenDesktop 5.0, see CTX128190 – How to Change Virtual Channel Priority in XenDesktop 5. To do this on XenDesktop 4 and earlier, see CTX118836 – How to Optimize Audio for XenDesktop.

Note: If the clients are not configured to use the Optimized-for-Speech audio codec, then this setting is not recommended on WAN connections.

UDP audio

Audio over UDP provides excellent tolerance of network congestion and packet loss, and is preferred over TCP when available. XenDesktop 5.5 offers an Audio over UDP Real-time Transport user policy setting.

UDP audio requires XenDesktop 5.5 or later, and the Citrix Receiver 3.x for Windows, which includes the 13.0 online plug-in. The feature can only be used with medium quality audio (Optimized-for-Speech).
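The feature list earlier mentions UDP/RTP transport and DSCP tagging of RTP packets. As a generic illustration (not Citrix's wire format, which this article does not document), the sketch below packs the 12-byte fixed RTP header defined in RFC 3550 and opens a UDP socket whose outgoing packets carry the Expedited Forwarding DSCP value (46) conventionally used for voice. The payload type 96 is an assumption: Speex has no static RTP payload type, so a dynamic value would be negotiated in practice.

```python
from __future__ import annotations

import socket
import struct

RTP_VERSION = 2
DSCP_EF = 46   # Expedited Forwarding, the DSCP class conventionally used for voice


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96, marker: bool = False) -> bytes:
    """Pack the RFC 3550 fixed header (no padding, extension, or CSRCs)."""
    first = RTP_VERSION << 6                            # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", first, second, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS / traffic-class byte."""
    return (dscp & 0x3F) << 2


def open_voice_socket(dscp: int = DSCP_EF) -> socket.socket:
    """UDP socket whose outgoing IPv4 packets request the given DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Linux and macOS apply IP_TOS to outgoing packets; Windows typically
    # ignores it and applies DSCP through QoS policy instead. Routers must
    # also be configured to honor the marking for it to have any effect.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

For example, rtp_header(seq=1, timestamp=160, ssrc=0x1234) returns a 12-byte header that starts with the bytes 0x80 0x60; a timestamp step of 160 corresponds to a 20 ms frame at an 8 kHz sampling rate.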

To enable UDP Audio, refer to the following links:

Codec Selection and Bandwidth Consumption

Between the user device and the XenDesktop VDA platform in the data center, Citrix recommends using XenDesktop’s Optimized-for-Speech codec setting, also known as Medium quality audio. Choosing the Medium quality setting, rather than the default High Definition setting, minimizes bandwidth consumption and latency (encoding time). The Medium quality codec is specially optimized for voice-over-IP. At peak, it consumes approximately 56 kilobits per second of network bandwidth (28 kilobits per second in each direction).

Between the VDA platform and the IP-PBX, the softphone uses whatever codec is configured or negotiated:

• G.711 provides the best voice quality but has the highest bandwidth requirement: 80 to 100 kilobits per second per call, depending on network Layer 2 overhead.

• G.729 provides good voice quality and has a lower bandwidth requirement: 30 to 40 kilobits per second per call, depending on network Layer 2 overhead.
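The per-call figures above follow from simple packet arithmetic. The sketch below assumes 20 ms packetization, IPv4 with uncompressed UDP/RTP headers (20 + 8 + 12 = 40 bytes per packet), and one direction of media; the Layer 2 overhead is a parameter because it depends on the link type. The codec rates used are the standard nominal rates of 64 kbit/s for G.711 and 8 kbit/s for G.729.

```python
# Nominal codec bit rates, in bits per second.
CODEC_BITRATE_BPS = {"G.711": 64_000, "G.729": 8_000}

# IPv4 (20 bytes) + UDP (8 bytes) + RTP fixed header (12 bytes), uncompressed.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def call_bandwidth_kbps(codec: str, layer2_overhead_bytes: int = 0,
                        packet_ms: int = 20) -> float:
    """Bandwidth of one direction of a call, in kilobits per second."""
    # Codec payload per packet: bits/s * seconds per packet / 8 bits per byte.
    payload_bytes = CODEC_BITRATE_BPS[codec] * packet_ms // 8000
    packets_per_second = 1000 / packet_ms
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + layer2_overhead_bytes
    return packet_bytes * 8 * packets_per_second / 1000
```

With no Layer 2 overhead this gives 80 kbit/s for G.711 and 24 kbit/s for G.729. Adding 18 bytes for an Ethernet header and frame check sequence gives 87.2 and 31.2 kbit/s, and also counting the preamble and inter-frame gap (38 bytes in total) gives 95.2 and 39.2 kbit/s, in line with the ranges quoted above.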

More information

CTX118216 – Microsoft Office Communications Server 2007 Application Delivery Best Practices

CTX124655 – Best Practice: How to Configure XenApp 6 Voice and Video Chat Features for Microsoft Office Communicator 2007 R2

CTX130912 – Testing and Using Audio and Video on Microsoft Lync 2010 with XenDesktop 5.5 or 5.6

CTX124634 – XenDesktop Support for Avaya Softphones

CTX124438 – Delivering Cisco IP Communicator from Citrix XenDesktop

CTX111309 – Voice over IP Support with the Citrix Access Gateway Standard Edition
