How to manage audio flow through WASAPI
[This tutorial applies to Windows Vista and later versions only]
Starting with Windows Vista, Microsoft rewrote the multimedia sub-system of the Windows operating system from the ground up; at the same time it introduced a new set of APIs, known as the Core Audio APIs, which allow applications to interact with the multimedia sub-system and with audio endpoint devices (sound cards).
The Core Audio APIs implemented in Windows Vista and higher versions are the following:
• Multimedia Device (MMDevice) API. Clients use this API to enumerate the audio endpoint devices in the system.
• DeviceTopology API. Clients use this API to directly access the topological features (for example, volume controls and multiplexers) that lie along the data paths inside hardware devices in audio adapters.
• EndpointVolume API. Clients use this API to directly access the volume controls on audio endpoint devices. This API is primarily used by applications that manage exclusive-mode audio streams.
• Windows Audio Session API (WASAPI). Clients use this API to create and manage audio streams to and from audio endpoint devices.
All of the APIs listed above, with the exception of WASAPI, are exposed by the CoreAudioDevicesMan class, which is reached through the CoreAudioDevices property as described in the tutorial How to access settings of audio devices in Windows Vista and later versions.
In general, WASAPI operates in two modes:
• In exclusive mode (also called DMA mode), unmixed audio streams are rendered directly to the audio adapter: no other application's audio will play and signal processing has no effect. Exclusive mode is useful for applications that demand the least amount of intermediate processing of the audio data or that want to output compressed audio data such as Dolby Digital, DTS or WMA Pro over S/PDIF.
• In shared mode, audio streams are rendered by the application and can optionally have per-stream audio effects, known as Local Effects (LFX), applied to them (such as per-session volume control). The streams are then mixed by the global audio engine, where a set of global audio effects (GFX) may be applied, and finally rendered on the audio device. Unlike Windows XP and older versions, there is no longer a direct path from DirectSound to the audio drivers: DirectSound and MME are fully emulated through WASAPI working in shared mode, which results in pre-mixed PCM audio being sent to the driver in a single format (in terms of sample rate, bit depth and channel count). This format can be configured by the end user through the "Advanced" tab of the Sounds applet of the Control Panel, as seen in the picture below:
To enable the usage of WASAPI you must call the InitDriversType method with the nDriverType parameter set to DRIVER_TYPE_WASAPI. The call to the InitDriversType method must be made before any call to the InitSoundSystem, GetOutputDevicesCount and GetOutputDeviceDesc methods: if the InitDriversType method is called at a later time, it reports an error. If for any reason you need to call it at a later time, you must perform the following sequence of calls:
1. ResetEngine method
2. InitDriversType method
3. ResetControl method
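The ordering rule and the recovery sequence listed above can be illustrated with a toy model. This is only a sketch in Python, not the component's API: the `DriverSetupModel` class, its `engine_running` flag and the `switch_driver_type` helper are hypothetical, the real methods take parameters that are omitted here, and the assumption that ResetControl brings the control back into service is made purely for illustration.

```python
class DriverSetupModel:
    """Toy model of the call-order rule; NOT the real control."""

    def __init__(self):
        self.driver_type = None
        self.engine_running = False
        self.calls = []  # records the order of calls for inspection

    def InitDriversType(self, driver_type):
        # Assumption: the call is rejected once the engine is running.
        if self.engine_running:
            raise RuntimeError("InitDriversType called too late")
        self.driver_type = driver_type
        self.calls.append("InitDriversType")

    def InitSoundSystem(self):
        # Real parameters omitted; the model only tracks the state change.
        self.engine_running = True
        self.calls.append("InitSoundSystem")

    def ResetEngine(self):
        self.engine_running = False
        self.calls.append("ResetEngine")

    def ResetControl(self):
        # Assumption: brings the control back into service after a reset.
        self.engine_running = True
        self.calls.append("ResetControl")


def switch_driver_type(control, driver_type):
    """Apply the listed sequence of calls, in order."""
    control.ResetEngine()
    control.InitDriversType(driver_type)
    control.ResetControl()


control = DriverSetupModel()
control.InitDriversType("OTHER_DRIVER_TYPE")  # placeholder value
control.InitSoundSystem()

try:
    control.InitDriversType("DRIVER_TYPE_WASAPI")  # too late: rejected
    late_call_failed = False
except RuntimeError:
    late_call_failed = True

switch_driver_type(control, "DRIVER_TYPE_WASAPI")
print(late_call_failed, control.driver_type)
```

In the model, the late call fails while the same call wrapped in the reset sequence succeeds, which is the behaviour the paragraph above describes.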
Unlike with DirectSound and ASIO drivers, when using WASAPI drivers there is no need to reset the engine when an audio device is added to or removed from the system: new calls to the WASAPI.DeviceGetCount and WASAPI.DeviceGetDesc methods will report the change.
WASAPI can manage three different types of devices:
• Render devices are playback devices where audio data flows from the application to the audio endpoint device, which renders the audio stream.
• Capture devices are recording devices where audio data flows from the audio endpoint device, which captures the audio stream, to the application.
• Loopback devices are recording devices that capture the mix of all of the audio streams being rendered by a specific render device, even when the audio streams are being played by third-party multimedia applications like Windows Media Player: each render device always has a corresponding loopback device.
Available WASAPI devices can be enumerated through the WASAPI.DeviceGetCount and WASAPI.DeviceGetDesc methods: if you only need to enumerate output devices you can also use the GetOutputDevicesCount and GetOutputDeviceDesc methods. In both cases, only devices reported as "Enabled" by the system will be listed: unplugged or disabled devices will not be enumerated.
As seen for DirectSound and for ASIO, the index of the output device used for playback by each player can be set through the InitSoundSystem method. Before an output device can be used for playback, the device itself must be started: for exclusive mode you can use the WASAPI.DeviceStartExclusive method, while for shared mode you can use the WASAPI.DeviceStartShared method. In both cases the started device can be stopped through the WASAPI.DeviceStop method. You can check whether a device is already started through the WASAPI.DeviceIsStarted method.
For exclusive mode you need to start the device by specifying, in the call to the WASAPI.DeviceStartExclusive method, the playback format, which is represented by the frequency and the number of channels: you can check whether a WASAPI device supports a specific format through the WASAPI.DeviceIsFormatSupported method.
For shared mode you rely directly on the playback format chosen in the Sounds applet of the Windows Control Panel: you can retrieve the current format through the WASAPI.DeviceSharedFormatGet method.
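The "check the format, then start the device" step for exclusive mode can be sketched as a simple probe over a list of preferred formats. This is a minimal, language-neutral sketch in Python under stated assumptions: `pick_exclusive_format` and the `is_supported` callback are hypothetical names, and the set of supported formats is invented test data. In a real application the callback would wrap WASAPI.DeviceIsFormatSupported, whose exact signature is not shown in this tutorial.

```python
from typing import Callable, Iterable, Optional, Tuple

# A playback format as described above: (frequency in Hz, number of channels).
Format = Tuple[int, int]


def pick_exclusive_format(
    candidates: Iterable[Format],
    is_supported: Callable[[int, int], bool],
) -> Optional[Format]:
    """Return the first candidate the device accepts, or None if none is."""
    for frequency, channels in candidates:
        if is_supported(frequency, channels):
            return (frequency, channels)
    return None


# Invented data standing in for what a real device would report.
fake_device_formats = {(48000, 2), (44100, 2)}


def fake_is_supported(frequency: int, channels: int) -> bool:
    # Hypothetical stand-in for a wrapper around DeviceIsFormatSupported.
    return (frequency, channels) in fake_device_formats


preferred = [(96000, 2), (48000, 2), (44100, 2)]  # best first
chosen = pick_exclusive_format(preferred, fake_is_supported)
nothing = pick_exclusive_format([(192000, 8)], fake_is_supported)
print(chosen, nothing)
```

When no candidate is accepted, the function returns `None`, which the caller could treat as a signal to start the device in shared mode instead.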
WASAPI devices can support multi-channel configurations: this specific setting is not available through the Core Audio APIs, so it cannot be modified programmatically but only through the Sounds applet of the Windows Control Panel; how to configure speakers is briefly described in the section Speakers management under Windows Vista and newer versions of Windows of the tutorial How to work with multiple output devices and speakers. After configuring the speakers, you can find out how many channels are assigned to a given WASAPI device through the WASAPI.DeviceChannelsGet method.
WASAPI clients can individually control the volume level of each audio session. WASAPI applies the volume setting for a session uniformly to all of the streams in the session; you can modify the session volume through the WASAPI.DeviceVolumeSet method and retrieve the current volume through the WASAPI.DeviceVolumeGet method. If you need to get or set the master volume for the given WASAPI device, which is shared by all running processes, you should use the CoreAudioDevices.MasterVolumeGet / CoreAudioDevices.MasterVolumeSet methods.
This last topic brings to mind an important issue: while the list of CoreAudio devices, depending upon the value of the nStateMask parameter of the CoreAudioDevices.Enum method, may not contain unplugged or disabled devices, the list of WASAPI devices will always contain all of the devices installed in the system, even if currently unplugged or disabled. As a consequence, the same physical device may be listed at different positions in the two lists. To find the one-to-one correspondence between a specific WASAPI device and a specific CoreAudio device you should use the WASAPI.DeviceCoreAudioIndexGet method: given the zero-based index of the device in the list of WASAPI devices, it reports the corresponding zero-based index of the same physical device in the list of CoreAudio devices.
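A toy model shows why the same index can point at different devices in the two lists. This is a sketch in Python with invented device identifiers: `build_index_map` is a hypothetical helper, not part of the component. The real lookup is done by WASAPI.DeviceCoreAudioIndexGet, and what that method reports for a device missing from the CoreAudio list is not described here, so the sketch uses `None` as an assumption.

```python
from typing import Dict, List, Optional


def build_index_map(
    wasapi_ids: List[str],
    coreaudio_ids: List[str],
) -> Dict[int, Optional[int]]:
    """Map each WASAPI list index to the index of the same device in the
    CoreAudio list, or to None when the device is not listed there."""
    position = {device: index for index, device in enumerate(coreaudio_ids)}
    return {
        index: position.get(device)
        for index, device in enumerate(wasapi_ids)
    }


# Invented identifiers: the headset is unplugged, so a state mask that
# filters out unplugged devices leaves it off the CoreAudio list.
wasapi_list = ["speakers", "usb-headset", "hdmi"]
coreaudio_list = ["speakers", "hdmi"]

index_map = build_index_map(wasapi_list, coreaudio_list)
print(index_map)
```

In this example the device at WASAPI index 2 sits at CoreAudio index 1, so reusing the WASAPI index directly on the CoreAudio list would address the wrong device or fall outside the list.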
As a final feature, WASAPI makes it possible to play back directly on a render device the audio data currently being received through a capture device or a loopback device: this can be done through the WASAPI.AttachInputDeviceToPlayer method.
The configuration of audio devices is not static: for example, with USB audio devices the configuration may change when a USB device is plugged in or unplugged, or when speakers or headphones are physically inserted into or removed from their connectors. When the configuration of audio devices changes, it may be necessary to enumerate the available output devices again and to reassign them to the instanced players by resetting the multimedia engine and the control as described at the beginning of this tutorial.
When a USB audio device is added to the system, for example when a USB sound card is installed for the first time, the container application can be informed in real time by catching the CoreAudioDeviceAdded event.
When a device has been added to the system, it may still be reported as "Disabled" because of the jack-sensing feature, which keeps a device disabled until the speakers are physically plugged into the sound card; when the audio device is set to "Enabled" or "Disabled", one or more CoreAudioDeviceStateChange events may be generated.
If the USB audio device is uninstalled from the system, by physically removing the audio device and uninstalling its driver, the CoreAudioDeviceRemoved event is generated.
As mentioned, you may receive more than one CoreAudioDeviceStateChange event at the same time, so if you start the reset procedure in response, you should ignore the CoreAudioDeviceStateChange events that follow the first one.
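One simple way to ignore the events that follow the first one is a guard flag that stays set while the reset is pending. The sketch below is in Python and only models the logic: the `ResetGuard` class and its method names are hypothetical, and in a real application `on_state_change` would be called from the handler of the CoreAudioDeviceStateChange event.

```python
class ResetGuard:
    """Lets only the first event of a burst trigger the reset procedure."""

    def __init__(self):
        self.reset_in_progress = False
        self.resets_started = 0

    def on_state_change(self) -> bool:
        """Return True if this event should start a reset, False to ignore."""
        if self.reset_in_progress:
            return False  # a reset is already pending: ignore this event
        self.reset_in_progress = True
        self.resets_started += 1
        return True

    def reset_finished(self) -> None:
        """Call once the reset procedure has completed."""
        self.reset_in_progress = False


guard = ResetGuard()

# A burst of three events: only the first one should trigger the reset.
burst = [guard.on_state_change() for _ in range(3)]

guard.reset_finished()

# A later, unrelated change is handled again.
later = guard.on_state_change()
print(burst, later, guard.resets_started)
```

This guard only covers events that arrive before the reset completes; if events from the same burst can arrive afterwards, a time-based delay before re-arming the guard may be a better fit.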
In order to allow applying further effects to the audio stream being processed by a specific render device, the control internally creates an embedded "Stream Mixer" for each output device; this approach is quite similar to what is described in the How to use custom Stream Mixers tutorial, but in this case you don't need to allocate a new custom Stream Mixer in your own code because the control automatically allocates it for you. Below you can see the architecture applied:
If the output of Player 0 and/or Player 1 is redirected to a different output device, the StreamOutputDeviceSet method automatically disconnects the PCM stream from the internal Stream Mixer of the current output device and reconnects it to the internal Stream Mixer of the new output device.
As with the custom Stream Mixers described in the How to use custom Stream Mixers tutorial, the internal Stream Mixer owns a unique identifier that can be obtained through the StreamMixerGetIdFromOutput method. Through this unique identifier you can apply further effects to the mixed streams; for example, you could apply a custom DSP or modify the Preamplifier volume. For this purpose the unique identifier can be used in place of the nPlayer parameter in all of the methods that modify the output stream, such as volume-related methods and all of the methods described in the How to apply special effects to a playing sound tutorial. Unlike with custom Stream Mixers, however, you cannot redirect the mixed stream output by the Stream Mixer to a different output device.
As seen in the graphic above, and as with custom Stream Mixers, the mixed stream being sent to the output device can be redirected to one of the following destinations:
• Through the use of an external encoder (Lame.exe for MP3, Fdkaac.exe for AAC+, OggEnc.exe for Ogg Vorbis), to a Shoutcast or Icecast server: in this case the control behaves as a Shoutcast/Icecast source.
• In combination with our Audio Sound Recorder component, directly to an output file whose format can be configured inside Audio Sound Recorder itself.
Examples of WASAPI usage in Visual C#.NET and Visual Basic.NET can be found in the following samples installed with the product's setup package:
If the Audio Sound Suite for .NET package has been installed, a further sample demonstrating how to record and cast the output of the stream mixer is available: