Speakfreely Speex Codec
Speakfreely is an internet telephone program written by John Walker in the early 1990s. Since 2004 it has been maintained collaboratively at sourceforge.net. It offers some features that can't be found elsewhere, particularly interoperability between Linux, MS Windows, and various Unix variants. Here I extend its capability by providing a patch that incorporates the Speex codec.
The Speex project aims to provide an open-source low-bitrate speech codec to complement the Vorbis music encoder. It offers good voice reproduction down to 8 kbps, and even lower with its variable bit-rate options. In addition, it has a number of attractive qualities for telephony applications:
- Designed for handling lost voice packets, and includes extrapolation code.
- Decoder can handle frame bit-rate changes on a frame-by-frame basis. This allows easy incorporation of low bit-rate redundancy information into the packets.
These patches are still in development, and this page is aimed at developers or anyone who wants to find bugs in what I've done. The main additions are:
- The Speex codec calls themselves.
- A receiver queue to queue and sort incoming packets according to their sequence number. This also allows lost packets to be handled intelligently.
- Adaptive jitter compensation to minimise audio delay while avoiding unnecessary loss of data packets.
- NAT firewall traversal for the MS Windows versions via an option to enable sharing of the listening and sending sockets.
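The receiver queue described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in the patch: the names `ReceiverQueue` and `seq_delta` are hypothetical, and it assumes 16-bit sequence numbers that wrap around, as in RTP.

```python
SEQ_MOD = 1 << 16  # assumption: 16-bit sequence numbers, as in RTP


def seq_delta(a, b):
    """Signed distance from b to a, allowing for 16-bit wraparound."""
    half = SEQ_MOD // 2
    return ((a - b + half) % SEQ_MOD) - half


class ReceiverQueue:
    """Hypothetical reorder queue: packets wait here until their slot is played."""

    def __init__(self, first_seq):
        self.next_seq = first_seq  # sequence number expected next
        self.pending = {}          # seq -> payload

    def push(self, seq, payload):
        """Store a packet. Returns False if it arrived too late to be used."""
        if seq_delta(seq, self.next_seq) < 0:
            return False           # playback has already moved past this slot
        self.pending[seq] = payload
        return True

    def pop(self):
        """Return (seq, payload) for the next slot; payload is None if lost."""
        seq = self.next_seq
        self.next_seq = (seq + 1) % SEQ_MOD
        return seq, self.pending.pop(seq, None)
```

A `None` payload from `pop()` is the point at which a caller would fall back to redundancy or extrapolation, and a `False` from `push()` corresponds to a packet dropped for arriving too late.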
Speex is primarily intended to be transmitted over RTP rather than the Speakfreely protocol, because RTP provides the necessary sequence numbers and timestamps. However, it should also work using the native Speakfreely protocol.
I'm not using the proposed Speex RTP standard. This is because I want to use packets longer than 20 ms, and also to incorporate redundancy without additional packet header overhead. I don't see the loss of compatibility as a great problem, because I don't know of any other Speex-over-RTP application that doesn't use SIP or H.323 session management.
The RTP payload consists of eight packed 20 ms Speex frames, spanning 160 ms in total. The first four frames come from a 2150 bps fixed-rate encoder. The second four come from a separate variable bit-rate encoder, which generally runs at a higher bit-rate. Packets are sent at 80 ms intervals, so the first four (low bit-rate) frames of each packet cover the same 80 ms as the main frames of the previous packet, and under normal operation they are discarded. However, when a packet is lost and the following packet is available, its four redundant frames are used to bridge across the lost packet.
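As a rough sketch of how a receiver might choose frames under this layout (hypothetical names throughout, not the patch's actual code), assume each packet is already unpacked into a list of eight frames and indexed by sequence number. Sequence wraparound is ignored here for brevity.

```python
FRAMES_PER_HALF = 4  # four 20 ms frames = 80 ms


def split_payload(frames):
    """Split an eight-frame payload into (redundant, main) halves."""
    if len(frames) != 2 * FRAMES_PER_HALF:
        raise ValueError("expected eight frames per packet")
    return frames[:FRAMES_PER_HALF], frames[FRAMES_PER_HALF:]


def frames_for_interval(n, packets):
    """Pick the frames to decode for the 80 ms interval carried by packet n.

    packets maps sequence number -> list of eight frames.
    Returns (source, frames), where source is "main", "redundant",
    or "extrapolate" (frames is None in the last case).
    """
    if n in packets:
        # Normal case: use the high-rate half, ignore the redundant half.
        return "main", split_payload(packets[n])[1]
    if n + 1 in packets:
        # Packet n was lost: bridge with the low-rate copy in packet n + 1.
        return "redundant", split_payload(packets[n + 1])[0]
    # Nothing available: the decoder has to extrapolate.
    return "extrapolate", None
```

The "redundant" and "extrapolate" outcomes correspond to the cases the status display reports as interpolated from redundancy and extrapolated.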
Now that I know how to do it, I may swap the order of the main and the redundant bits before a final release. That would allow for good compatibility between redundant and non-redundant implementations.
The adaptive jitter delay requires timestamps in the incoming packets. RTP provides these, but standard Speakfreely does not implement them correctly. The patch from 20040314 provides proper timestamps, so you need at least that version at BOTH ends for adaptive jitter to work. With the Speakfreely protocol, timestamps are faked from the sequence number. This works adequately with the Speex codec for discontinuous transmission with gaps of less than 10 seconds. Doing better than that will require changes to that protocol.
Speex 1.1.1 or later is now used. Because it is incompatible with earlier versions, it is now statically linked. To build, unpack the Speex source into a subdirectory called speex, and run ./configure and make in the speex directory. Then make Speakfreely. Alternatively, just adjust the path to the libspeex.a file in the makefile to suit your system.
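To illustrate why timestamps matter here, the following is a minimal sketch of one possible way to derive a playout delay from them. It is an assumption for illustration, not the algorithm used in the patch: the class `JitterEstimator`, its window size, and its minimum delay are all hypothetical. It uses the spread of (arrival time minus timestamp) over a recent window, so any constant clock offset between sender and receiver cancels out.

```python
from collections import deque


class JitterEstimator:
    """Hypothetical playout-delay estimator based on packet timestamps."""

    def __init__(self, window=50, min_delay_ms=20):
        self.transits = deque(maxlen=window)  # recent (arrival - timestamp) values
        self.min_delay_ms = min_delay_ms

    def observe(self, timestamp_ms, arrival_ms):
        """Record one packet: sender timestamp and local arrival time, in ms."""
        self.transits.append(arrival_ms - timestamp_ms)

    def delay_ms(self):
        """Delay that would have been enough for every packet in the window."""
        if not self.transits:
            return self.min_delay_ms
        spread = max(self.transits) - min(self.transits)
        return max(self.min_delay_ms, spread)
```

Because old observations fall out of the window, the suggested delay shrinks again once the network settles, which is the trade-off between audio delay and dropped packets that adaptive jitter compensation aims for.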
- speakfs76_win_speex_20040314.patch.gz (25kB)
- speakfs76_win_speex_20040314.update.zip (137kB) - A zip archive of the modified files (for those with trouble using patch)
- speakf76_20040314.exe.zip (323kB) - a pre-built executable.
- speakf76_plain.exe.zip (323kB) - an executable without the console status window.
The Windows version has been tested with the Cygwin gcc compiler in native mode, and as a cross-compile using mingw32 as packaged under Debian Linux. The patch includes a makefile and a minimal float.h to allow it to compile.
The Makefile expects the Speex source tarball to be unpacked in the Speakfreely source directory, under a subdirectory called speex (without versioning in the name).
Once you have the original source zip and the patch in the same directory, apply the patch as follows:
- mkdir speakfs76
- cd speakfs76
- unzip ../speakfs76.zip
- gunzip -c ../speakfs76_win_speex_20040314.patch.gz | patch -p1
Under Cygwin, you need the following packages installed: gcc, mingw (I think), gzip, zip, patch, make.
The pre-built version here was built with gcc, and if you run it from the Windows GUI it opens an additional console window to display stdout. That build includes status info, which shows a period (.) for each incoming packet, and some codes for special cases:
- X: packet lost and extrapolated.
- I: packet lost and interpolated from redundancy.
- D: packet dropped because it arrived too late (increase the jitter delay if you see too many of these).
- : start of spurt
- : end of spurt
Versions prior to 2003-10-25 use an obsolete format.
- speak_freely-7.6a_unix_speex_031025.patch.gz (16kB)
- speakfs76_win_speex_20031030.patch.gz (20kB)
- speakf76_20031030.exe.zip (252kB) - a pre-built executable.
- On the Windows version, adaptive jitter needs to be enabled from the Jitter Compensation delay menu. On Unix, it is selected via the -j option (negative values are adaptive, positive values are fixed).
- The speex codec is best used in conjunction with RTP.
- Please report bugs to me - especially if you have a fix :-).
- 2003-10-16: Web page created
- 2003-10-18: (unix version) Added missing getoutputbufferusage function for non-ALSA compilation. Note: this may also need to be added for other platforms. Also increased the default jitter interval to 250 ms.
- 2003-10-20: (unix version) Better handling of inconsistent ioctl return values in getoutputbufferusage. Adjusted default jitter from 250 us (oops) to 250 ms. More sensible handling of missing packets in non-RTP mode. (The SF protocol still needs sequence numbers though.)
- 2003-10-25: Protocol changed: an extra 4 bits are now inserted to pad the redundant bits to a whole number of bytes. Packets are therefore 4 bits larger on average, but handling them is more efficient. Note that as a result this version will not interoperate with previous ones. Sequence numbers are now implemented for the Speakfreely protocol, so packet re-ordering and bridging work without having to use RTP. Note that RTP is still more efficient, and I'm only worrying about the Speakfreely protocol because it still offers better crypto options. Speex preprocessing is implemented to allow for software AGC (turned on in Windows) and noise reduction (turned on in both).
- 2003-10-31: NAT socket sharing implemented for the Windows version (already done for Unix). There is an option in workarounds->network->socket sharing that shares the sockets between listening and sending. Selecting this (it will now be set by default) will also select the “Don't use connect()” option. The remote sending port is now displayed in the connection window to aid NAT debugging. Options added to the Windows version to enable/disable AGC and noise reduction for Speex.
- 2004-03-14: Adaptive jitter compensation has been added to both the Unix and Windows versions. This uses statistics from received packets to calculate the optimum delay to wait before playing the first packet in a sequence. This is mainly useful when used in conjunction with the Voice Activation mode. Crypto will now compile with gcc using the new makefile. The Unix version can now compile a single application, sfapp, that combines the functionality of both sfmike and sfspeaker into one multi-threaded application. It supports echo cancellation and also NAT traversal implicitly. It's alpha-quality only, however, and mainly intended to test the Speex echo-cancellation code. Comments welcome.
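The 4-bit padding from the 2003-10-25 protocol change can be checked with some quick arithmetic. The sketch below assumes the redundant half is four 20 ms frames at the fixed 2150 bps rate mentioned earlier; the function name is hypothetical.

```python
def redundant_block_bits(bitrate_bps=2150, frame_ms=20, frames=4):
    """Return (raw_bits, padding_bits, total_bytes) for the redundant half."""
    bits_per_frame = bitrate_bps * frame_ms // 1000  # 2150 bps * 20 ms = 43 bits
    raw_bits = bits_per_frame * frames               # 4 * 43 = 172 bits
    padding_bits = -raw_bits % 8                     # bits needed to reach a byte boundary
    total_bytes = (raw_bits + padding_bits) // 8
    return raw_bits, padding_bits, total_bytes


print(redundant_block_bits())  # (172, 4, 22)
```

With these figures the redundant frames occupy 172 bits, so 4 padding bits bring them to exactly 22 bytes and the main frames then start on a byte boundary.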
- Output of OGG-Speex audio recording files for Unix.
- Controls to set bitrate and encoding/decoding options for Speex.
- Alter Speex packet format to allow better compatibility with alternate implementations without redundancy.
- Settle on a choice for the RTP packet type number for speex. As we don't have session management, it can't really be dynamically allocated.
- Improve the NAT implementation for unix, giving more options for the spawning of the sfmike process.