Audio Programming 


There is a hidden pointer somewhere in this text to a page containing deeper information about using audio. You should have a perfect understanding of the features described on this page before jumping into that more complicated material. Just make sure you read this text carefully enough to find the link. 


Digital audio is the most commonly used method to represent sound inside a computer. In this method sound is stored as a sequence of samples taken from the audio signal at constant time intervals. A sample represents the volume of the signal at the moment it was measured. In uncompressed digital audio each sample requires one or more bytes of storage. The number of bytes required depends on the number of channels (mono, stereo) and the sample format (8 or 16 bits, mu-Law, etc.). The length of the interval between samples determines the sampling rate. Commonly used sampling rates are between 8 kHz (telephone quality) and 48 kHz (DAT tapes). 

The physical devices used in digital audio are the ADC (Analog to Digital Converter) and the DAC (Digital to Analog Converter). A device containing both an ADC and a DAC is commonly known as a codec. The codec device used in Sound Blaster cards is called a DSP, which is somewhat misleading since DSP also stands for Digital Signal Processor (the SB DSP chip is very limited compared to "true" DSP chips). 

Sampling parameters affect the quality of sound that can be reproduced from the recorded signal. The most fundamental parameter is the sampling rate, which limits the highest frequency that can be stored. It is well known (Nyquist's sampling theorem) that the highest frequency that can be stored in a sampled signal is at most 1/2 of the sampling frequency. For example an 8 kHz sampling rate permits recording of a signal in which the highest frequency is less than 4 kHz. Higher frequency components must be filtered out before the signal is fed to the ADC. 

Sample encoding limits the dynamic range of the recorded signal (the difference between the faintest and the loudest signal that can be represented). In theory the maximum dynamic range is number_of_bits * 6 dB. This means that 8 bit sampling resolution gives a dynamic range of 48 dB while 16 bit resolution gives 96 dB. 
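Both rules of thumb are simple arithmetic. The helpers below are purely illustrative (the names are made up, not part of the OSS API):

```c
/* Highest frequency representable at a given sampling rate
 * (Nyquist's sampling theorem: at most half the sampling rate). */
static long nyquist_limit_hz(long sampling_rate_hz)
{
    return sampling_rate_hz / 2;
}

/* Theoretical dynamic range of linear PCM: about 6 dB per bit. */
static int dynamic_range_db(int bits_per_sample)
{
    return bits_per_sample * 6;
}
```

For example, nyquist_limit_hz(8000) gives 4000 and dynamic_range_db(16) gives 96, matching the figures above.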

Quality has a price. The number of bytes required to store an audio sequence depends on the sampling rate, the number of channels and the sampling resolution. For example just 8000 bytes of memory are required to store one second of sound using 8 kHz/8 bit/mono, but 48 kHz/16 bit/stereo takes 192 kilobytes. A 64 kbps ISDN channel is enough to transfer an 8 kHz/8 bit/mono audio stream, while about 1.5 Mbps is required for DAT quality (48 kHz/16 bit/stereo). Put another way, one megabyte of memory holds just 5.46 seconds of 48 kHz/16 bit/stereo sound, but 131 seconds of 8 kHz/8 bit/mono. It is possible to reduce memory and communication costs by compressing the recorded signal, but that is outside the scope of this document. 
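The figures above all follow from one multiplication. As a sketch (the function name is illustrative, not an OSS API call):

```c
/* Bytes needed to store uncompressed audio:
 *   rate             - sampling rate in Hz (e.g. 8000, 48000)
 *   channels         - 1 for mono, 2 for stereo
 *   bytes_per_sample - 1 for 8 bit formats, 2 for 16 bit formats
 *   seconds          - length of the recording
 */
static long audio_storage_bytes(long rate, int channels,
                                int bytes_per_sample, long seconds)
{
    return rate * channels * bytes_per_sample * seconds;
}
```

One second of 8 kHz/8 bit/mono is 8000 bytes while one second of 48 kHz/16 bit/stereo is 192000 bytes, as stated above.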

OSS has three kinds of device files for audio programming. The only difference between them is the default sample encoding used after opening the device: /dev/dsp uses 8 bit unsigned encoding, /dev/dspW uses 16 bit signed little endian (Intel) encoding and /dev/audio uses logarithmic mu-Law encoding. There are no other differences between the devices; all of them start in 8 kHz mono mode after being opened. The sample encoding can be changed through the ioctl interface, after which all of these device files behave in a similar way. However it is recommended that the device file be selected based on the encoding to be used, since this gives the user more flexibility in establishing symbolic links for these devices. 

In short, it is possible to record from and play to these devices using the normal open(), close(), read() and write() system calls. The default parameters of the device files (see above) have been selected so that it is possible to record and play back speech and other signals with relatively low quality requirements. Many parameters of the devices can be changed by calling the ioctl() functions described below. Codec devices can record or play back audio; however, there are devices which have no recording capability at all. Most audio devices work in half duplex mode, which means that they can record and play back but not at the same time. Devices capable of simultaneous recording and playback are called full duplex devices. 

The simplest way to record audio data is to use normal UNIX commands such as cat or dd. For example, cat /dev/dsp > xyz records data from the audio device to a disk file called xyz until the command is killed (Ctrl-C). The command cat xyz > /dev/dsp plays back the recorded sound file. (Note that you may need to change the recording source and level using a mixer program before recording to disk works properly.) 

Audio devices are always opened exclusively. If another program tries to open the device when it is already open, the driver immediately returns an error (EBUSY). 

General programming guidelines

It is highly recommended that you carefully read the following notes as well as the programming guidelines chapter of the introduction page. These notes are likely to prevent you from making the most common mistakes with the OSS API. At the very least you should read them if you have problems getting your program to work. 

The following is a list of things that must be taken into account before starting to program digital audio. The features referred to in these notes will be explained in detail later in this document. 
  • Avoid extra features and tricks. They don't necessarily make your program better but may make it incompatible with some (future) devices.
  • Open the device files using the O_RDONLY or O_WRONLY flag whenever possible. The driver uses this information when making many optimizing decisions. Use O_RDWR only when writing a program which will both record and play back digital audio. Even then, try to find out whether it is possible to close and reopen the device when switching between recording and playback.
  • Beware of little endian and big endian encoding of 16 bit data. This is not a problem when using 8 bit data or normal 16 bit sound cards on little endian (Intel) machines. However endianness is likely to cause problems on big endian machines (68k, PowerPC, Sparc, etc.). You should not blindly access 16 bit samples as signed short.
  • The default recording source and recording level are undefined when an audio device is opened. You should inform the user about this and instruct him/her to use a mixer program to change these settings. It is possible to include mixer features in a program using digital audio, but it is not recommended since it is likely to make your program more hardware dependent (mixers differ and are not always present).
  • Explicitly set all parameters your program depends on. There are default values for all parameters but it is possible that some (future) devices may not support them. For example the default sampling speed (8 kHz) or sampling resolution (8 bits) may not be supported by some high end professional devices.
  • Always check whether an error (-1) is returned from a system call such as ioctl(). This indicates that the driver was not able to execute the request made by your program.
  • In most cases ioctl() modifies the value passed in as an argument. It is important to check this value since it indicates the value that was actually accepted by the device. For example, if the program requests a higher sampling rate than the device supports, the driver automatically uses the highest possible speed; the value actually used is returned as the new value of the argument. Similarly, a device may support only a few of the possible sampling rates, in which case the driver uses the supported rate closest to the requested one.
  • Set sampling parameters always so that number of channels (mono/stereo) is set before selecting sampling rate (speed). Failing to do this will make your program incompatible with SB Pro (44.1 kHz speed in mono but just 22.05 kHz in stereo). Program which selects 44.1 kHz speed and then sets the device to stereo mode will incorrectly believe that the device is still in 44.1 kHz mode (actually the speed is decreased to 22.05 kHz).
  • Don't use an older program as an example before checking that it doesn't break these rules and that it actually works. Many of the oldest programs were written for early prototype versions of the driver and are not compatible with later driver versions (2.0 or later).
  • Avoid writing programs which work only in 16 bit mode, since some audio devices don't support anything other than 8 bit mode. It is relatively easy to write programs so that they can output in both 8 and 16 bit modes, which makes them usable to people whose sound cards lack 16 bit support. At the very least, check that the device supports 16 bit mode before trying to output 16 bit data to it. 16 bit data played in 8 bit mode (and vice versa) is just annoyingly loud noise.
  • Don't try to use full duplex audio before checking that the device actually supports full duplex mode.
  • Always read and write full samples. For example, in 16 bit stereo mode each sample frame is 4 bytes long (two 16 bit sub-samples), so the program must always read and write N*4 bytes (where N is an integer). Failing to do so will sooner or later cause the program and the device to lose sync, after which the output/input will be just noise or the left and right channels will be swapped.
  • Avoid writing programs which keep audio devices open when they are not needed, since this prevents other programs from using the device. Implement interactive programs so that the device is opened only when the user activates recording and/or playback, or when the program needs to validate sampling parameters (in which case it should handle EBUSY situations intelligently). However the device can be kept open when it is necessary to prevent other programs from accessing it.
  • Always report the error codes returned by system calls to the driver. This can be done using perror(), strerror() or some other standard method which interprets the error code returned in errno. Omitting this information may make it impossible to solve problems with your program. 
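The "full samples" rule in the list above can be enforced with a small helper that rounds a transfer size down to a whole number of sample frames (a sketch with made-up names, not part of the OSS API):

```c
/* Size of one sample frame: one sample for every channel.
 * For example 16 bit stereo frames are 2 * 2 = 4 bytes. */
static int frame_size(int channels, int bytes_per_sample)
{
    return channels * bytes_per_sample;
}

/* Round a byte count down to a whole number of frames so that
 * read()/write() never transfer a partial sample. */
static int align_to_frames(int nbytes, int channels, int bytes_per_sample)
{
    return nbytes - (nbytes % frame_size(channels, bytes_per_sample));
}
```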

Simple audio

For simplicity, recording and playback are described separately. It is possible to write programs which both record and play back audio data, but writing that kind of application is not simple; they are covered in later sections. 

Declarations for an audio program

In general, all programs using the OSS API should include soundcard.h, a C language header file containing definitions for the API. The other header files to be included are sys/ioctl.h, unistd.h and fcntl.h. Other mandatory declarations for an audio application are a file descriptor for the device file and a program buffer used to store the audio data while the program processes it. The following is an example of the declarations for a simple audio program: 
/* Standard includes */
#include <sys/ioctl.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/soundcard.h>

/* Mandatory variables */
#define BUF_SIZE 4096

int audio_fd;
unsigned char audio_buffer[BUF_SIZE];
In the above, the BUF_SIZE macro defines the size of the buffer allocated for audio data. It is possible to reduce system call overhead by passing more data in each read() and write() call. However shorter buffers give better results when recording. The effect of buffer size will be covered in detail in the "Improving real time performance" section. Buffer sizes between 1024 and 4096 bytes are good choices for normal use. 

Selecting and opening the device

An audio device must be opened before it can be used. As mentioned earlier, there are three possible device files which differ only in the default sample encoding they use (/dev/dsp = 8 bit unsigned, /dev/dspW = 16 bit signed little endian and /dev/audio = mu-Law). It is important to open the right device if the program doesn't set the encoding explicitly. 

The device files mentioned above are actually just symbolic links to the actual device files. For example /dev/dsp normally points to /dev/dsp0, which is the first audio device detected on the system. The user is free to point the symbolic links at other devices if that gives better results. It is good practice to always use the symbolic link (/dev/dsp) and not the actual device file (/dev/dsp0). Programs should access the actual device files only if the device name can be easily configured. 

It is recommended that the device file is opened in read only (O_RDONLY) or write only (O_WRONLY) mode. Read write mode (O_RDWR) should be used only when it is necessary to record and play back at the same time (duplex mode). 

The following code fragment can be used to open the selected device (DEVICE_NAME). open_mode should be O_WRONLY, O_RDONLY or O_RDWR. Other flags are undefined and must not be used with audio devices. 
if ((audio_fd = open(DEVICE_NAME, open_mode, 0)) == -1)
{
    /* Opening device failed */
    perror(DEVICE_NAME);
    exit(1);
}
It is recommended that programs display the error message returned by open() using a standard method such as perror() or strerror(). This information is likely to be very important to the user or support personnel trying to figure out why the device cannot be opened. There is no need to handle the various error codes differently. Only EBUSY (Device busy) can be handled by the program, by trying to open the device again after some time (though it is not guaranteed that the device will ever become available). 

Simple recording application

Writing an application which reads from an audio device is very easy as long as the recording speed is relatively low, the program doesn't perform time consuming computations and there are no strict real time response requirements. Solutions for more demanding situations will be presented later in this document. All the program needs to do is read data from the device and process or store it in some way. The following code fragment can be used to read data from the device: 
int len;

if ((len = read(audio_fd, audio_buffer, count)) == -1)
{
    perror("audio read");
    exit(1);
}
In the above example, count is the number of bytes the program wants to read from the device. It must be less than or equal to the size of audio_buffer. In addition it must always be an integer multiple of the sample size. Using an integer power of 2 (4, 8, 16, 32, 64, 128, 256, 512, ...) is recommended since it works best with the internal buffering used by the driver. 

The number of bytes recorded from the device can be used to measure time precisely. The audio data rate (bytes per second) depends on the sampling speed, sample size and number of channels. For example, when using 8 kHz/16 bit/stereo sampling the data rate is 8000*2*2 = 32000 bytes/second. This is actually the only way to know when to stop recording; it is important to note that there is no end of file condition defined for audio devices. 
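The byte counting described above can be wrapped in a helper (an illustrative sketch, not an OSS call):

```c
/* Milliseconds of audio represented by a byte count at the given
 * sampling parameters. Since audio devices have no end of file,
 * counting transferred bytes is the way to track elapsed time. */
static long audio_bytes_to_ms(long nbytes, long rate, int channels,
                              int bytes_per_sample)
{
    long bytes_per_second = rate * channels * bytes_per_sample;

    return nbytes * 1000 / bytes_per_second;
}
```

With the 8 kHz/16 bit/stereo example above, 32000 bytes corresponds to exactly one second of audio.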

An error returned by read() usually means that there is a (permanent) hardware error or that the program has tried to do something which is not possible. In general it is not possible to recover from errors by trying again, although closing and reopening the device may help in some cases. 

Simple playback application

A simple playback program works exactly like a recording program. The only difference is that a playback program calls write() instead of read(). 

Setting sampling parameters

There are three parameters which affect the quality (and memory/bandwidth requirements) of sampled audio data. These parameters are the following: 
  • sample format (sometimes called the number of bits)
  • number of channels (mono/stereo)
  • sampling rate (speed)
It is important to always set these parameters in the above order. Setting speed before the number of channels doesn't work with all devices. 

It is possible to change the sampling parameters only between open() and the first read(), write() or other ioctl() call made to the device. The effect of changing sampling parameters while the device is active is undefined. The device must be reset using ioctl(SNDCTL_DSP_RESET) before it can accept new sampling parameters. 

Selecting audio format

Sample format is an important parameter which affects the quality of audio data. The OSS API supports several different sample formats but most devices support just a few of them. soundcard.h defines the following sample format identifiers: 
  • AFMT_QUERY is not an audio format but an identifier used when querying current audio format.
  • AFMT_MU_LAW is logarithmic mu-Law audio encoding.
  • AFMT_A_LAW is logarithmic A-Law audio encoding.
  • AFMT_IMA_ADPCM is a 4:1 compressed format where a 16 bit audio sequence is represented using an average of 4 bits per sample. There are several different ADPCM formats; this one is defined by the Interactive Multimedia Association (IMA). The Creative ADPCM format used in the SB16 is not compatible with this one.
  • AFMT_U8 is the standard unsigned 8 bit audio encoding used in PC soundcards.
  • AFMT_S16_LE is the standard 16 bit signed little endian (Intel) sample format used in PC soundcards.
  • AFMT_S16_BE is a big endian (M68k, PPC, Sparc, etc) variant of the 16 bit signed format.
  • AFMT_S8 is signed 8 bit audio format.
  • AFMT_U16_LE is unsigned little endian 16 bit format.
  • AFMT_U16_BE is unsigned big endian 16 bit format.
  • AFMT_MPEG is the MPEG audio format (currently not supported).
It is important to know that only the 8 bit unsigned format (AFMT_U8) is supported in hardware by most devices (however, there are "high end" devices which support only 16 bit formats). Other commonly supported formats are AFMT_S16_LE and AFMT_MU_LAW. With many devices AFMT_MU_LAW is emulated using a software based translation (lookup table) between mu-Law and 8 bit linear encoding, which causes poor quality compared with straight 8 bit linear. 

Applications should check that the sample format they require is supported by the device. Unsupported formats should be handled by converting the data to another format (usually AFMT_U8); alternatively the program should abort if it cannot do the conversion. Trying to play data in an unsupported format is a fatal error: the result is usually just LOUD noise which may damage ears, headphones, speakers, amplifiers, concrete walls and other unprotected objects. 

The above format identifiers have been selected so that AFMT_U8 = 8 and AFMT_S16_LE = 16. This makes them compatible with older ioctl() calls which were used to select the number of bits. This holds only for these two formats, so format identifiers should not be used as sample sizes in programs. 

AFMT_S16_NE is a macro provided for convenience. It is defined as AFMT_S16_LE or AFMT_S16_BE depending on the endianness of the processor the program is running on. 
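AFMT_S16_NE resolves this choice at compile time. The same decision can be made at run time by inspecting memory layout; the probe below is just an illustration, not something the OSS API provides:

```c
/* Return 1 on a little endian host (where AFMT_S16_NE would be
 * AFMT_S16_LE) and 0 on a big endian host. Inspecting the bytes of
 * the value through unsigned char is portable C. */
static int host_is_little_endian(void)
{
    unsigned short probe = 0x0102;

    return ((unsigned char *)&probe)[0] == 0x02;
}
```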

The number of bits required to store a sample is: 
  • 4 bits for the IMA ADPCM format.
  • 8 bits for 8 bit formats, mu-Law and A-Law.
  • 16 bits for the 16 bit formats
  • Undefined for the MPEG audio format
Sample format can be set using ioctl call SNDCTL_DSP_SETFMT. The following code fragment sets audio format to AFMT_S16_LE. It can be easily modified for other formats too: 
int format;
format = AFMT_S16_LE; 
if (ioctl(audio_fd, SNDCTL_DSP_SETFMT, &format)==-1) 
{ /* Fatal error */ 
exit(Error code); 

if (format != AFMT_S16_LE) 
The device doesn't support the requested audio format. The program 
should use another format (for example the one returned in "format") 
or alternatively it must display an error message and to abort. 
The above ioctl() call returns the currently used format if AFMT_QUERY is passed in the argument. 

It is very important to check that the value returned in the argument after the ioctl() call matches the requested format. If the device doesn't support the particular format, it rejects the call and returns another format which is supported by the hardware. 

A program can check which formats are supported by the device by calling ioctl SNDCTL_DSP_GETFMTS as in the following: 
    int mask;

    if (ioctl(audio_fd, SNDCTL_DSP_GETFMTS, &mask) == -1)
    {
        /* Handle fatal error */
    }

    if (mask & AFMT_MPEG)
    {
        /* The device supports AFMT_MPEG */
    }

SNDCTL_DSP_GETFMTS returns only the sample formats that are actually supported by the hardware. It is possible that the driver supports more formats using some kind of software conversion (signed <-> unsigned, big endian <-> little endian or 8 bits <-> 16 bits). These emulated formats are not reported by this ioctl(), but SNDCTL_DSP_SETFMT accepts them. The software conversions consume a significant amount of CPU time, so they should be avoided; use this feature only if it is not possible to modify the application to produce the supported data format directly. 
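A program that can produce several encodings can combine the SNDCTL_DSP_GETFMTS mask with a preference list. The helper below is a sketch (pick_format is a made-up name, not part of the OSS API); the AFMT_* values would come from soundcard.h:

```c
/* Return the first format from a preference list (best first) whose
 * bit is set in the mask reported by SNDCTL_DSP_GETFMTS, or 0 when
 * none of the preferred formats is supported in hardware. */
static int pick_format(int mask, const int *prefs, int nprefs)
{
    int i;

    for (i = 0; i < nprefs; i++)
        if (mask & prefs[i])
            return prefs[i];
    return 0;
}
```

For example a program could prefer AFMT_S16_LE over AFMT_U8, falling back to converting its data (or aborting) when 0 is returned.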

AFMT_MU_LAW is a data format which is supported by all devices. OSS versions prior to 3.6 always reported this format in SNDCTL_DSP_GETFMTS; versions 3.6 and later report it only if the device supports mu-Law in hardware. This encoding should be used only with applications and audio files ported from systems using mu-Law encoding (such as SunOS). 

Selecting number of channels

Most modern audio devices support stereo mode (the default mode is mono). An application can select stereo mode by calling ioctl SNDCTL_DSP_STEREO as below. It is important to note that only the values 0 and 1 are allowed; the result of using any other value is undefined. 
    int stereo = 1;     /* 0=mono, 1=stereo */

    if (ioctl(audio_fd, SNDCTL_DSP_STEREO, &stereo) == -1)
    {
        /* Fatal error */
        perror("SNDCTL_DSP_STEREO");
        exit(1);
    }

    if (stereo != 1)
    {
        /* The device doesn't support stereo mode */
    }
Alternatively you can use ioctl(SNDCTL_DSP_CHANNELS), which accepts the number of channels (currently only 1 or 2) as its argument. 
NOTE! Applications must select the number of channels and number of bits before selecting speed. There are devices which have different maximum speeds for mono and stereo modes. The program will behave incorrectly if the number of channels is changed after setting the card to high speed mode. The speed must be selected before the first read or write call to the device. 
An application should check the value returned in the variable pointed to by the argument. Many older (SB 1.x and SB 2.x compatible) devices don't support stereo, and there are also high end devices which support only stereo modes. 

Selecting sampling rate (speed)

Sampling rate is the parameter that determines much of the quality of an audio sample. The OSS API permits selecting any frequency between 1 Hz and 2 GHz, but in practice there are limits set by the audio device being used. The minimum frequency is usually 5 kHz while the maximum frequency varies widely. The oldest sound cards supported at most 22.05 kHz (playback) or 11.025 kHz (recording). The next generation supported 44.1 kHz (mono) or 22.05 kHz (stereo). With modern sound devices the limit is 48 kHz (DAT quality), but there are still a few popular cards that support just 44.1 kHz (audio CD quality). 

The default sampling rate is 8 kHz. However an application should not depend on the default, since there are devices that support only higher sampling rates; the default rate could be as high as 48 kHz on such devices. 

Codec devices usually generate the sampling clock by dividing the frequency of a high speed crystal oscillator, so it is not possible to generate every frequency in the valid range. For this reason the driver always computes the valid frequency closest to the requested one and returns it to the calling program. The application should check the returned frequency and compare it with the requested one. Differences of a few percent should be ignored since they are usually not audible. 
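The following model shows why the returned rate can differ from the requested one. The crystal frequency is made up for illustration; real cards use various oscillators and divider schemes:

```c
/* Model of a sampling clock derived by integer division of a
 * crystal oscillator: only rates of the form crystal/divisor are
 * achievable, so a driver returns the closest one. */
static long closest_rate(long crystal_hz, long requested_hz)
{
    long divisor = (crystal_hz + requested_hz / 2) / requested_hz;

    if (divisor < 1)
        divisor = 1;
    return crystal_hz / divisor;
}
```

With a hypothetical 1 MHz clock, requesting 44100 Hz yields 1000000/23 = 43478 Hz, a difference of about 1.4 percent that would not be audible.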

The following code fragment can be used to select the sampling speed: 
int speed = 11025;

if (ioctl(audio_fd, SNDCTL_DSP_SPEED, &speed) == -1)
{
    /* Fatal error */
    perror("SNDCTL_DSP_SPEED");
    exit(1);
}

if (speed != 11025)
{
    /* The device doesn't support the requested speed. Differences
     * of a few percent can be ignored; larger ones should be
     * handled. */
}
NOTE! As explained above, applications must select the number of channels and number of bits before selecting speed, since devices may have different maximum speeds in mono and stereo modes. The speed must be selected before the first read or write call to the device. 

Other commonly used ioctl calls

It is possible to implement most audio processing programs without using any ioctl calls other than the three described above. This works if the application just opens the device, sets the parameters, calls read or write continuously (without noticeable delays or pauses) and finally closes the device. Such applications can be described as "stream" or "batch" applications. 

There are three additional calls which may be required in slightly more complicated programs. These calls are the following: 
  • ioctl(audio_fd, SNDCTL_DSP_SYNC, 0) can be used when an application wants to wait until the last byte written to the device has been played (it doesn't wait in recording mode). After that, the call resets (stops) the device and returns to the calling program. Note that this call may take several seconds to execute depending on the amount of data in the buffers. close() calls SNDCTL_DSP_SYNC automatically.
  • ioctl(audio_fd, SNDCTL_DSP_RESET, 0) stops the device immediately and returns it to the state where it can accept new parameters.
  • ioctl(audio_fd, SNDCTL_DSP_POST, 0) is a lightweight version of SNDCTL_DSP_SYNC. It just tells the driver that there is likely to be a pause in the output, which makes it possible for the device to handle the pause more intelligently.
Note! All of these calls are likely to cause clicks or unnecessary pauses in the output. You should use them only when they are required (see below). 

There are a few situations where these calls should be used: 
  • You should call SNDCTL_DSP_POST when your program is going to pause continuous output of audio data for a relatively long time. Situations of this kind include the following:
    • After playing a sound effect, when a new one is not started immediately (another way is to output silence until the next effect starts).
    • Before the application starts waiting for user input.
    • Before starting a lengthy operation such as loading a large file into memory.
  • SNDCTL_DSP_RESET or SNDCTL_DSP_SYNC should be called when the application wants to change sampling parameters (speed, number of channels or number of bits).
  • The application must call SNDCTL_DSP_SYNC or SNDCTL_DSP_RESET before switching between recording and playback modes (or alternatively it should close and reopen the audio device).
  • You can use SNDCTL_DSP_RESET when playback should be stopped (cancelled) immediately.
  • Call SNDCTL_DSP_RESET after recording (after the last read from the device) if you are not going to close the device immediately. This stops the device and prevents the driver from printing an unnecessary error message about a recording overrun.
  • Call SNDCTL_DSP_SYNC when you want to wait until all data has been played.

Interpreting audio data

The encoding of audio data depends on the sample format. There are several possible formats; only the most common ones are described here. 
  • mu-Law (logarithmic encoding)

  • This format originated in digital telephone technology. Each sample is represented as an 8 bit value compressed from the original 16 bit value. Due to the logarithmic encoding, the value must be converted to linear format before being used in computations (summing two mu-Law encoded values gives nothing useful). The actual conversion procedure is beyond the scope of this text. Avoid mu-Law if possible and use the 8 or 16 bit linear formats instead. 
  • 8 bit unsigned

  • This is the normal PC sound card ("Sound Blaster") format, supported by practically all hardware. Each sample is stored in an 8 bit byte. The value 0 represents the minimum level and 255 the maximum; the neutral level is 128 (0x80). In most cases there is some noise in recorded "silent" files, so the byte values may vary between 127 (0x7f) and 129 (0x81). 
    The C data type to be used is unsigned char. If there is a need to convert between unsigned and signed 8 bit formats, add or subtract 128 from the value to be converted (depending on the direction). In practice, XORing the value with 0x80 does the same thing (value ^= 0x80). 
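The conversion described above can be written as a pair of helpers (the names are illustrative):

```c
/* Convert between signed and unsigned 8 bit samples. Flipping the
 * top bit with XOR 0x80 is equivalent to adding/subtracting 128. */
static unsigned char s8_to_u8(signed char s)
{
    return (unsigned char)s ^ 0x80;
}

static signed char u8_to_s8(unsigned char u)
{
    return (signed char)(u ^ 0x80);
}
```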
  • 16 bit signed

  • Caution! Great care must be taken when working with 16 bit formats. 16 bit data is not portable and depends on the design of both the CPU and the audio device. The situation is simple when using a little endian x86 CPU with a "normal" soundcard: both the CPU and the soundcard use the same encoding for 16 bit data. However the same is not true in big endian environments such as Sparc, PowerPC or HP-PA. 

    The 16 bit encoding normally used by sound hardware is little endian (AFMT_S16_LE). However there are machines with built-in audio chips which support only big endian encoding. 
    When using signed 16 bit data, the C data type best matching this encoding is usually signed short. However this is true only on little endian machines. In addition, the C standards don't define the sizes of particular data types, so there is no guarantee that short will be 16 bits long on all machines. For this reason, using an array of signed short as an audio buffer should be considered a programming error even though it is common in audio applications. 
    The proper way is to use an array of unsigned char and to assemble/disassemble the buffer passed to the driver manually. For example: 
    unsigned char devbuf[4096];
    int applicbuf[2048];
    int i, p = 0;

    /* Place 2048 samples (16 bit) into applicbuf[] here */

    for (i = 0; i < 2048; i++)
    {
        /* first send the low byte, then the high byte */
        devbuf[p++] = (unsigned char)(applicbuf[i] & 0xff);
        devbuf[p++] = (unsigned char)((applicbuf[i] >> 8) & 0xff);
    }

    /* Write the data to the device */
    write(audio_fd, devbuf, 4096);
    Disassembling data read from the device should be performed in a similar way (exercise). 
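One possible solution to that exercise (a sketch; the function name is made up) reassembles little endian byte pairs into signed samples without relying on the size of short:

```c
/* Reverse of the assembly loop above: combine low/high byte pairs
 * read from the device into signed 16 bit sample values stored in
 * ints. The unsigned value is built first and then folded into the
 * signed range, which works regardless of host byte order. */
static void le16_to_samples(const unsigned char *devbuf, int *applicbuf,
                            int nsamples)
{
    int i;

    for (i = 0; i < nsamples; i++) {
        int v = devbuf[2 * i] | (devbuf[2 * i + 1] << 8);

        applicbuf[i] = (v < 32768) ? v : v - 65536;
    }
}
```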

Encoding stereo data

When using stereo data there are two samples for each time slot, with the left channel data always stored before the right channel data. The samples for both channels are encoded as described above. 
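For example, two separately produced channel buffers would be combined for the device like this (a sketch, not an OSS facility):

```c
/* Interleave separate left and right sample streams into the frame
 * order the device expects: L0 R0 L1 R1 ... */
static void interleave_stereo(const int *left, const int *right,
                              int *out, int nframes)
{
    int i;

    for (i = 0; i < nframes; i++) {
        out[2 * i]     = left[i];
        out[2 * i + 1] = right[i];
    }
}
```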

The representation of samples for more than 2 channels will be defined in the future. 


The above is all you need to implement "basic" audio applications. There are many other ioctl calls, but they are usually not required. However, there are "real time" audio applications such as games, voice conferencing systems, sound analysis tools, effect processors and many others for which the above is not enough. More information about those audio programming features can be found in the Making audio complicated section. Be sure you understand everything described above before jumping into that page. 
