Real-time Steganography with RTP
September, 2007
I)ruid, C²ISSP
druid@caughq.org
http://druid.caughq.org

Abstract: Real-time Transport Protocol (RTP) is used by nearly all Voice-over-IP systems to provide the audio channel for calls. As such, it provides ample opportunity for the creation of a covert communication channel due to its very nature. While the use of steganographic techniques with various audio cover-medium has been extensively researched, most applications of such techniques have been limited to audio cover-medium of a static nature such as WAV or MP3 file audio data. This paper details a common technique for the use of steganography with audio data cover-medium, outlines the problems that arise when attempting to use such techniques to establish a full-duplex communications channel within audio data transmitted via an unreliable streaming protocol, and documents solutions to these problems. An implementation of the ideas discussed, entitled SteganRTP, is included in the reference materials.

1) Introduction

This paper describes a research effort within the disciplines of steganography, Internet telephony, and data communications.

1.1) Overview

This paper is structured in the following order: The first chapter provides an introduction, describes the motivation for this research, and covers some basic concepts and terminology for the subjects of Voice over IP (VoIP), Real-time Transport Protocol (RTP), Steganography, and, more specifically, the use of steganography with an audio cover-medium. The second chapter defines the concept of real-time steganography, discusses using steganography with RTP, and describes some of the identified problems and challenges. The third chapter details the reference implementation entitled SteganRTP, including a description of the project's goals, the implementation's operational architecture, process flow, message data structure, and functional sub-systems.
The fourth chapter addresses the identified problems and challenges that were met and describes how they were solved. The fifth and final chapter concludes the paper with observations made as a result of this research effort.

1.2) Voice over IP

The term Voice over IP (VoIP) is nearly synonymous with Internet Telephony. The majority of VoIP systems are designed to utilize separate signaling and media channels to provide calling services to users. The signaling channel is generally used to set up, manage, and tear down calls between two or more parties, whereas the media channel is used to transmit the audio, video, or other media that may be associated with the call. A number of competing protocol standards exist for use as the VoIP system's signaling channel, including Session Initiation Protocol (SIP)[1], H.323[2], Skinny[3], and many others. Real-time Transport Protocol (RTP)[4], however, is used almost ubiquitously to provide VoIP systems with the required media channel.

1.3) Real-time Transport Protocol

Real-time Transport Protocol (RTP) is described by the protocol authors as ``a transport protocol for real-time applications.'' RTP provides an end-to-end network transport suitable for applications transmitting real-time data such as audio, video, or any other type of streamed data. RTP generally utilizes the User Datagram Protocol (UDP)[5] for its transport and can do so in both multicast and unicast network environments. When employed by a VoIP system, RTP generally handles the media channel of a call. The call's media channel is generally handled independently of the VoIP signaling channel. However, per the RTP specification, there are no default network ports defined. As such, the RTP endpoint network ports must be negotiated between the endpoints via the signaling channel.
Other events in the signaling channel may also influence the operation of the media channel as handled by RTP, such as requests to change audio encoding, add or remove parties from the call, or tear down the call.

One of RTP's current deficiencies is that it is entirely clear-text while traversing the network. An RTP profile called Secure Real-time Transport Protocol (SRTP)[6] has been defined for encrypting parts of the RTP data packet. However, the specification defines no mechanism for negotiating or securely exchanging keying information to be used for the encryption and decryption processes. At the time of this writing, a number of keying mechanisms have been defined, but no standard has either been agreed upon by the standards bodies or determined by the free market. As such, most implementations of RTP do not currently use the SRTP profile and instead continue to transmit call media data in the clear.

As will be detailed in full in Section 3.2, this property of the media channel provides ample opportunity for multiple types of operational scenarios in which third parties unknown to the legitimate callers may hijack all or part of the call's media traffic for transmission of covert communications. Making use of this blatantly insecure property of RTP is the primary motivation for this research effort.

1.4) Steganography

The term steganography originates from the Greek root words ``steganos'' and ``graphein'' which literally mean ``covered writing''. As a sub-discipline of the academic discipline of information hiding, the primary goal of steganography is to hide the fact that communication is taking place[7, 8, 9] by concealing a message within a cover-medium in such a way that an observer cannot discern the presence of the hidden message. Conversely, steganalysis is the act of attempting to detect a concealed message which was hidden via the use of steganographic techniques[8], thus preventing a steganographer from achieving their primary goal.
Common steganalysis techniques include statistical analysis of the properties of potential stego-medium, statistical analysis of extracted potential message data for properties of language, and many others such as specific techniques that target known steganographic embedding methods.

1.4.1) Terminology

The following terminology as used in the discipline of steganography and steganalysis has been set forth over many years of compounding research[7, 8, 9]. As such, the following terminology will be used consistently within this research paper:

1. Cover-medium - Data within which a message is to be hidden.
2. Stego-medium - Data within which a message has been hidden.
3. Message - Data that is or will be hidden within a stego-medium or cover-medium, respectively.
4. Redundant Bits - Bits of data in a cover-medium that can be modified without compromising that medium's integrity.

1.4.2) Digitally Embedding

Digitally embedding a message into a cover-medium usually involves three basic steps. First, the redundant bits of the target cover-medium must be identified. Second, it must be decided which of the identified redundant bits are to be utilized. Finally, the bits selected for use must be modified to store the message data. In many cases, a cover-medium's redundant bits are likely to be the least-significant bit or bits of each of the encoded data's word values.

1.5) Steganography With Audio

Media formats in general, and audio formats specifically, tend to be very inaccurate data formats simply because they do not need to be accurate; the human ear is not very adept at differentiating sounds. As an example, an orchestra performance which is recorded with two separate recording devices will produce vastly different recordings when viewed digitally, but will generally sound the same when played back if they were recorded in a similar manner.
Due to this inherent inaccuracy, changes to an audio bit-stream can be made so slightly that when played back the human ear won't be able to distinguish the difference between the cover-medium audio and the stego-medium audio.

With many audio formats, the least-significant bit from each audio sample can be used as the medium's redundant bits for the embedding of message data. To illustrate, assume that an audio file encoded with an 8-bit sample encoding has the following 8 bytes of data in it, which will be used as cover-data:

    0xb4 0xe5 0x8b 0xac 0xd1 0x97 0x15 0x68

In binary this would result in the following bit-stream:

    10110100 11100101 10001011 10101100 11010001 10010111 00010101 01101000

In order to hide the message byte value 0xd6, or 11010110 in binary, each sample word's least-significant bit would be modified to represent one of the 8 bits of the message byte, in order:

    10110101 11100101 10001010 10101101 11010000 10010111 00010101 01101000

The modifications result in the following 8 bytes of stego-data:

    0xb5 0xe5 0x8a 0xad 0xd0 0x97 0x15 0x68

When compared to the original 8 bytes of cover-data, it is noticeable that on average only half of the bytes of data have actually changed value; however, the resulting stego-data's least-significant bits contain the entire message byte. It is also noticeable that when utilizing this embedding method with a cover-medium with these word size properties, the cover-medium must be at least eight times the size of the message in order to successfully embed the entire message.

1.5.1) Previous Research

Audio Steganography

Much research has been done in the field of steganography utilizing an audio cover-medium. Techniques such as using audio to convey messages in both the human audible and inaudible spectrum, as well as various methods for the digital embedding of information into the audio data itself, have all been explored; so much in fact that many methods are now considered standard.
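
As a minimal sketch of the least-significant-bit embedding walked through in Section 1.5, the following Python reproduces the 8-byte example. The function names embed_lsb and extract_lsb are hypothetical and chosen only for illustration, and the most-significant-bit-first ordering of message bits is an assumption taken from the worked example rather than from any particular tool.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Store each message bit (MSB first) in the LSB of one cover byte."""
    if len(cover) < 8 * len(message):
        raise ValueError("cover-medium must be at least 8x the message size")
    stego = bytearray(cover)
    for i in range(8 * len(message)):
        byte_index, bit_index = divmod(i, 8)
        bit = (message[byte_index] >> (7 - bit_index)) & 1
        # Clear the redundant bit, then set it to the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Rebuild `length` message bytes from the LSBs of the stego bytes."""
    message = bytearray(length)
    for i in range(8 * length):
        byte_index, bit_index = divmod(i, 8)
        message[byte_index] |= (stego[i] & 1) << (7 - bit_index)
    return bytes(message)


cover = bytes([0xB4, 0xE5, 0x8B, 0xAC, 0xD1, 0x97, 0x15, 0x68])
stego = embed_lsb(cover, b"\xd6")
print(stego.hex(" "))  # b5 e5 8a ad d0 97 15 68
```

Extraction requires no knowledge of the original cover-data; the receiver only needs to know how many message bytes to read back.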
Many of the most recent implementations cannot be considered to advance the state of research in the area as they generally only implement the standard methods.

It is important to note, however, that the significant majority of previous research in the sub-discipline of audio steganography has focused on static, unchanging audio data files. Tools such as S-Tools[10], MP3Stego[11], Hide 4 PGP[12], and many others are just such implementations, employing standard embedding methods with WAV, MP3, and VOC audio file cover-mediums, respectively. Very few practical implementations have been developed that utilize audio steganography with a cover-medium that is in a state of flux or within streaming or real-time media sessions.

VoIP Steganography

A few previous research efforts have been made to employ steganography with various VoIP technologies. A complete analysis of such efforts identified prior to embarking upon the research presented in this paper has previously been provided[13]. In summary, most identified research efforts utilized steganographic techniques without achieving the primary goal of steganography, or employed steganographic techniques to accomplish an otherwise overt goal.

2) Real-time Steganography

This paper defines ``real-time'' use of steganography as the utilization of steganographic techniques to embed message data within an active, or real-time, media stream. The research and reference implementation presented herein focus on VoIP call audio as the active media stream being targeted as cover-medium.

Nearly all uses of steganography targeting audio cover-medium in general, or VoIP cover-medium specifically, that were evaluated prior to performing this research were found to operate on a target cover-medium as a storage channel and provided separate ``hide'' and ``retrieve'' modes.
In addition, most cover-medium that were targeted by such implementations were of a static nature such as WAV or MP3 files or were unidirectional such as streaming stego-audio to a recipient.

A few weeks prior to the research contained herein being initially presented[14] at the DEFCON 15[15] hacker conference on August 3rd through 5th 2007, another use of steganography in a real-time fashion was made public via a research effort entitled Vo(2)IP[16]. An analysis of this research effort and its deficiencies has been included in an updated version of the previously mentioned analysis paper[13].

2.1) Context Terminology

The disciplines of steganography and data networking share some common terminology which has different meanings relative to each discipline. This paper discusses research that lies within the realm of both disciplines, and as such will use terms that may be confusing when taken out of context. The following terms are defined here and used consistently throughout to prevent confusion when interpreting the content of this paper.

1. Packet - Used in the data networking sense; a data packet which is routed through a network, such as an IP/UDP/RTP packet.
2. Message - Used in the steganography sense; data to be hidden or retrieved.

2.2) RTP Payload Redundant Bits

RTP packet payloads are essentially encoded multimedia data. RTP payloads may contain any type of multimedia data; however, this research effort focused entirely on audio, specifically audio encoded with the G.711 Codec[17]. Any number of audio Codecs can be used to encode the RTP payload, the identifier of which is included in the RTP packet's header as the payload type (PT) field. The frequency, locations, and number of redundant bits found within the RTP packet's encoded payload are determined by the Codec that is used to encode the audio transmitted by an individual packet.
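
As a rough sketch of how the payload type (PT) field mentioned above might be read from a packet, the following Python parses the fixed 12-byte RTP header, skips any CSRC entries and header extension, and returns the encoded payload. The function names and the G711_PAYLOAD_TYPES table are hypothetical helpers introduced for illustration; the static payload type numbers 0 (PCMU) and 8 (PCMA) are those assigned to G.711 by the RTP audio/video profile. Trailing padding is deliberately ignored to keep the example short.

```python
# Static payload types for G.711 from the RTP audio/video profile.
G711_PAYLOAD_TYPES = {0: "PCMU (G.711 ulaw)", 8: "PCMA (G.711 alaw)"}


def rtp_payload_type(packet: bytes) -> int:
    """Return the 7-bit payload type from an RTP version 2 packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    # Second header byte: 1 marker bit followed by 7 bits of payload type.
    return packet[1] & 0x7F


def rtp_payload(packet: bytes) -> bytes:
    """Return the payload following the header (padding is not stripped)."""
    csrc_count = packet[0] & 0x0F
    offset = 12 + 4 * csrc_count
    if packet[0] & 0x10:
        # Header extension: 16-bit profile id, 16-bit length in 32-bit words.
        ext_words = int.from_bytes(packet[offset + 2:offset + 4], "big")
        offset += 4 + 4 * ext_words
    return packet[offset:]


def is_g711(packet: bytes) -> bool:
    """True when the packet's payload type indicates G.711 audio."""
    return rtp_payload_type(packet) in G711_PAYLOAD_TYPES
```

A steganographic system could use a check such as is_g711 on each packet to decide whether its embedding method applies before touching the payload.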
The Codec focused on during this research, G.711, uses a 1-byte sample encoding and is generally resilient to modifications to the least significant bit (LSB)[18] of each sample. Codecs with larger samples may allow one or more bits per sample to be modified without any discernible audible change in the encoded audio; the preservation of this property is defined as the audio's audible integrity.

2.2.1) Audio Word Size

The data value word size, or sample size in audio terminology, used by various audio encoding formats is one factor in determining the amount of available space within the cover-medium for embedding a message. Generally only the least significant bit of each word value can be expected to be modifiable without any perceptible impact to audible integrity. Thus, for the same amount of audio data, a cover-medium encoded in a format with a 16-bit word size will provide only half the available space of a cover-medium with an 8-bit word size.

2.2.2) Common VoIP Audio Codecs

For reference, some common VoIP audio Codecs and their encoding and sample properties[19] are listed in the table below.

+------------+----------+-------------+-------------+------------+
| Codec      | Standard | Bit Rate    | Sample Rate | Frame Size |
|            | by       | (kb/s)      | (kHz)       | (ms)       |
+------------+----------+-------------+-------------+------------+
| G.711      | ITU-T    | 64          | 8           | Sampling   |
| G.721      | ITU-T    | 32          | 8           | Sampling   |
| G.722      | ITU-T    | 64          | 16          | Sampling   |
| G.722.1    | ITU-T    | 24/32       | 16          | 20         |
| G.723      | ITU-T    | 24/40       | 8           | Sampling   |
| G.723.1    | ITU-T    | 5.6/6.3     | 8           | 30         |
| G.726      | ITU-T    | 16/24/32/40 | 8           | Sampling   |
| G.727      | ITU-T    | variable    |             | Sampling   |
| G.728      | ITU-T    | 16          | 8           | 2.5        |
| G.729      | ITU-T    | 8           | 8           | 10         |
| GSM 06.10  | ETSI     | 13          | 8           | 22.5       |
| LPC10      | U.S. Gov | 2.4         | 8           | 22.5       |
| Speex (NB) |          | 2.15 - 24.6 | 8, 16, 32   | 30         |
| Speex (WB) |          | 4 - 44.2    | 8, 16, 32   | 34         |
| iLBC       |          | 13.3        | 8           | 30         |
| DoD CELP   | U.S. DoD | 4.8         |             | 30         |
| EVRC       | 3GPP2    | 9.6/4.8/1.2 | 8           | 20         |
| DVI        | IMA      | 32          | Variable    | Sampling   |
| L16        |          | 128         | Variable    | Sampling   |
+------------+----------+-------------+-------------+------------+

Common VoIP Audio Codecs

2.2.3) G.711 (alaw/ulaw)

The G.711 audio Codec is a fairly straight-forward sample-based encoding. It encodes audio as a linear grouping of 8-bit audio samples arranged in the order in which they were sampled.

Throughput

Utilizing the LSB of every sample in a G.711 encoded RTP payload, which is commonly 160 bytes in size, a total of 20 bytes of message data can be embedded per packet. Given an average of 50 packets per second in each direction, this results in approximately 1,000 bytes per second of full-duplex throughput of message data within the established covert channel.

2.3) Identified Problems and Challenges

Many problems and challenges that arise when considering the use of steganography with RTP stem from properties of the underlying transport mechanism, the nature of real-time audio, or the RTP protocol itself. The following sections outline various problems and challenges that were identified when attempting to use steganography with RTP.

2.3.1) Unreliable Transport

One of the most significant challenges to utilizing RTP packet payloads as cover-medium is that RTP generally employs UDP as its underlying transport protocol. This is appropriate for a streaming multimedia protocol; however, it is less than ideal for a reliable covert communications channel. UDP is a datagram messaging protocol which is considered connectionless and unreliable[5]. As such, neither a packet's successful delivery nor its order of arrival is guaranteed. Any message data which is split across multiple RTP cover-packets may arrive out of order or not arrive at all.

2.3.2) Cover-Medium Size Limitations

The RTP protocol, being designed for ``real-time'' transport of media, behaves like a streaming protocol should.
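
To make the size limitation concrete, the sketch below reproduces the per-packet capacity and throughput figures given in Section 2.2.3. The function names and parameters are hypothetical and introduced only for this illustration; the default of one modifiable bit per sample is the assumption used throughout this paper.

```python
def lsb_capacity_bytes(payload_bytes: int, sample_bits: int = 8,
                       bits_per_sample: int = 1) -> int:
    """Whole message bytes that fit in one payload's redundant bits."""
    samples = (payload_bytes * 8) // sample_bits
    return (samples * bits_per_sample) // 8


def throughput_bytes_per_second(payload_bytes: int, packets_per_second: int,
                                sample_bits: int = 8) -> int:
    """Message bytes per second carried in one direction of the session."""
    return lsb_capacity_bytes(payload_bytes, sample_bits) * packets_per_second


print(lsb_capacity_bytes(160))                   # 20 bytes per G.711 packet
print(throughput_bytes_per_second(160, 50))      # 1000 bytes per second
print(lsb_capacity_bytes(160, sample_bits=16))   # 10 bytes with 16-bit samples
```

These figures are upper bounds on raw embedded data; any message header carried inside the embedded data reduces the space left for user data.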
RTP datagram packets are relatively small and there are usually tens to hundreds of packets sent per second in the process of relaying audio between two peers. Additionally, different audio Codecs provide for different encoded audio sample sizes, resulting in a variable amount of available space for embedding which is dependent upon which Codec the audio for any individual RTP packet is encoded with.

Due to the small size of these packets and the common constraint among many steganographic embedding methods which limits the amount of data that is able to be embedded to a fraction of the size of the cover-medium, a very limited amount of space is actually available for the embedding of message data. As such, large message data will inevitably be required to be split across multiple cover-packets and thus must be reassembled at its destination.

2.3.3) Latency

RTP is, by design, extremely susceptible to media degradation due to packet latency. As such, any processing overhead from the embedding of message data into the cover-medium or delay due to inspection of potential cover-medium packets may have a noticeable impact on the end-user's quality of experience. When manipulating an RTP stream between two endpoints that are expecting packet delivery in a timely manner, a steganographic system cannot be overly invasive when packets are not needed for embedding and must be efficient at its task when they are.

2.3.4) Tracking of RTP Streams

In normal operation, RTP establishes two packet streams to form a session between two endpoints. Each endpoint uses one stream to send multimedia data to the other, thus achieving full-duplex communication via two unidirectional packet streams. When identifying an RTP session to be utilized as cover-medium for a full-duplex covert communications channel, the two paired streams must be correctly identified and tracked.

2.3.5) Raw vs. Compressed Audio

It is important to consider that audio being transported via RTP may be compressed.
To successfully embed message data into a cover-medium, the embedding must generally be performed against the raw data so as to properly identify and utilize the cover-medium's redundant bits. As such, identification of compressed cover-medium, decompression, modification of the raw data, and then re-compression may be required.

Lossy vs. Lossless Compression

When considering the potential use of compression within the cover-medium, it is also important to consider the type of compression used. Most compression methods can be categorized into two types: lossy compression and lossless compression. If the compression method used is of the lossy type, the integrity of any message data embedded into the cover-medium prior to compression may be compromised when the stego-medium is uncompressed, as some of the original audio data may be lost. Due to this property of lossy compression types, audio data compressed in this manner may not be appropriate for use as cover-medium without additional safeguards against this loss.

2.3.6) Media Gateway Audio Modifications

RTP, as a protocol being potentially routed across multiple networks by its underlying transport, network, and data-link protocols, may also be routed or gatewayed along its path by other intermediary telephony devices like Media Gateways or Back-to-Back User Agent (B2BUA) devices. At such transition points, the media being transported may undergo modification. Some of these modifications include translation from one audio Codec to another, down-sampling, normalization, or mixing with other audio streams. Invasive changes such as these can potentially impact the integrity of any message data embedded within the stego-medium.

Audio Codec Conversion

Codec conversion takes place when an intermediary device such as a Media Gateway is providing translation services for two endpoints that support disparate sets of Codecs.
For example, one endpoint may support GSM encoding of audio and the other only G.711 or Speex encoding. Unless an intermediary translator is involved, these two devices cannot directly establish an RTP audio channel. The intermediary device essentially translates audio from the Codec being used by one endpoint to a Codec that can be understood by the other. Audio Codec conversion may also take place if the inherent latency or Quality-of-Service (QoS) properties of the transport network on either side of the intermediary device require a lighter-weight Codec.

Down-sampling and Normalization

Down-sampling and normalization may be performed on an audio payload to bring the properties of the audio, such as volume and background white-noise, more in line with the other party's audio stream. Occasionally this task is handled by the endpoint devices when playing the media for the user. In that scenario the integrity of the stego-medium will likely remain intact as the audio payload isn't actually modified in transit. However, there are scenarios where an intermediary media device may actually re-sample or otherwise modify the payload of the media stream specifically to alter its audible properties. In these cases, the integrity of the stego-medium may become compromised.

Audio Stream Mixing

When performing conferencing or other types of multi-party calls, it is possible that multiple parties' audio streams may be mixed together. Such invasive modification of the audio will almost certainly compromise the integrity of the stego-medium.

2.3.7) Mid-session Audio Codec Change

Most VoIP signaling protocols provide methods for VoIP endpoints to change the audio encoding method on the fly. Due to this functionality an RTP session may begin using one Codec and then switch to a completely different Codec mid-session.
This functionality may be used for a variety of reasons including QoS metrics not being met, inclusion of a new endpoint in the call that does not support the original Codec, or any number of other reasons. Due to this dynamic nature, any steganographic system attempting to embed data into an RTP stream's packets must be able to dynamically adjust its message embedding algorithm to accommodate different Codecs' various sample sizes and layout within the RTP packet payload.

3) Reference Implementation: SteganRTP

3.1) Design Goals

The goals set forth for the SteganRTP reference implementation[20] are described in the following subsections.

3.1.1) Achieve Steganography

As stated in Section 1.4, the primary goal of steganography is to hide the fact that communication is taking place. Therefore, it is the primary goal of this reference implementation to prevent indication to a third-party observer of the RTP audio stream that anything other than the overt communication between the two RTP endpoints is taking place.

3.1.2) Full-Duplex Communications Channel

This reference implementation intends to achieve a full-duplex covert communication channel between the two RTP endpoints, mirroring the utility of RTP itself. This will be accomplished through the use of both RTP streams that comprise an RTP session. By utilizing both RTP streams within the session, either application will be able to both send and receive data simultaneously.

3.1.3) Compensate for Unreliable Transport

This reference implementation intends to compensate for the unreliable transport inherent to RTP. This will be accomplished by providing a data sequencing, tracking, and resending mechanism.

3.1.4) Identical User Experience Regardless of Mode of Operation

This reference implementation intends to provide two distinct modes of operation. The first mode of operation is described as the SteganRTP application running locally on the same host as the RTP endpoint.
The second mode of operation is described as the SteganRTP application running on an intermediary host along the route from one RTP endpoint to another. This intermediary host must be forwarding or bridging the RTP traffic as an active man-in-the-middle (MITM).

The reference implementation intends for the user experience of running the SteganRTP application to be identical regardless of the mode of operation. This will be accomplished by interfacing directly with the host operating system's network stack in order to hook the desired packet streams.

3.1.5) Multi-type Data Transfer

The reference implementation intends to provide simultaneous transfer of multiple types of data, such as text chat, file transfer, and remote shell access. This will be accomplished by providing type indication and formatting for each type of supported data being transferred.

3.2) Operational Architecture

As mentioned in Section 3.1.4 above, the application will operate in one of two distinct modes: the application running locally on the same host as the RTP endpoint or the application running as an active MITM. The two communicating SteganRTP applications are not required to operate in the same mode. Thus, a mixed-mode operation such as is described below is entirely possible.

It is important to note that the SteganRTP application is only required to be bridging or forwarding the RTP stream considered outbound from the closer RTP endpoint destined for the more remote RTP endpoint. Conversely, the application is only required to be able to observe the inbound RTP stream flowing in the other direction as it does not need to invasively modify any packets from the inbound stream.

3.3) Application Flow

When the SteganRTP application begins it performs an initialization phase by setting up internal memory structures and configuration information from the command-line.
Next, it observes network traffic until it identifies an RTP session which falls within the constraints specified by the user on the command-line. These constraints are how the user controls selection of the RTP sessions between specific RTP endpoints to utilize as cover-medium and, by extension, which remote SteganRTP application to communicate with.

After identifying an RTP session, SteganRTP inserts hooks into the host's network stack in order to receive the desired packets upon transmission or arrival, or both if the SteganRTP application is operating in the active MITM scenario. From these hooks a packet queue is created from which the application then reads individual packets. Whether a packet is considered inbound or outbound determines the further course of the application, and is itself determined by which RTP endpoint network address and port is defined as ``local'' or ``remote'', which in the case of the active MITM operation can be inferred as ``near'' or ``far'', respectively.

When an inbound RTP packet is read from the queue, it is copied for the application's use and the original packet is immediately sent, as the SteganRTP application does not need to invasively modify it. All received inbound packets are assumed to be potential cover-medium for the covert channel, so potential message data is then extracted from each inbound packet. The potential message data is then decrypted, and the result is checked for a valid checksum value in the potential message's header. If the checksum is valid, the message data is sent to the message handler component for processing.

When an outbound RTP packet is read from the queue, the SteganRTP application immediately polls its outbound data queues for any message data waiting to be sent. If there is no data waiting to be sent, the packet is immediately sent unmodified.
If there is message data waiting to be sent, as much of that data as will fit into the cover-medium packet's payload is read from its file descriptor, packaged as a formatted message, encrypted, and then steganographically embedded into the RTP packet's payload. The modified RTP packet is then sent in place of the original RTP packet.

3.3.1) Initialization

Upon start-up, SteganRTP first initializes various memory structures such as message caches, configuration settings, and an RTP session context structure. The most notable task performed during the initialization phase is the computation of keying information used by various components. The method chosen for creation of this keying information is to create a 20-byte SHA-1[21] hash of a user-supplied shared secret text string. Because the result of this operation is used as keying information by various components of the overall SteganRTP system, this shared secret must be provided to both SteganRTP applications that wish to communicate with each other. The 20-byte result of the SHA-1 hash function against the user-supplied shared secret is defined here as the keyhash and described by the equation below, where f represents the SHA-1 hash function.

    keyhash = f( sharedsecret )

SHA-1 Collision Irrelevance

In February of 2005, a group of Chinese researchers developed an algorithm for finding SHA-1 hash collisions faster than brute force. They proved it possible to find collisions in the full 80-step SHA-1 in less than 2^69 hash operations, about 2,000 times faster than brute force of the 2^80 hash operation theoretical bound. The paper also includes search attacks for finding collisions in the 58-step SHA-1 in 2^33 hash operations and SHA-0 in 2^39 hash operations. The biggest impact that this discovery has pertains to use of SHA-1 hashes in digital signatures and technologies where one of the pre-images is known.
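
For reference, the keyhash derivation defined earlier in this section can be sketched in a few lines of Python. The function name compute_keyhash is hypothetical, and encoding the shared secret as UTF-8 before hashing is an assumption; the text above specifies only that the SHA-1 hash of the user-supplied string is used.

```python
import hashlib


def compute_keyhash(shared_secret: str) -> bytes:
    """Return the 20-byte SHA-1 digest of the shared secret."""
    # Assumption: the secret text is encoded as UTF-8 before hashing.
    return hashlib.sha1(shared_secret.encode("utf-8")).digest()


keyhash = compute_keyhash("example shared secret")
print(len(keyhash))  # 20
```

Both communicating applications must derive the keyhash from the same secret and the same byte encoding, or their keying information will not match.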
By searching for a second pre-image which hashes to the same value as the original, a digital signature for the original may theoretically be used to authenticate a forgery.

The use of SHA-1 by the SteganRTP reference implementation is solely to compute a bit-pad of keying information with a longer, seemingly more random bit distribution than what is likely provided directly by user input as the shared secret. The result of the SHA-1 hash of the user's shared secret is used directly as keying information. In order to launch a collision attack against the hash used as the bit-pad, the attacker would have to either obtain the original user-supplied shared secret or the hash itself. Due to the hash being used directly as keying information, the possession of it by an attacker has already compromised the security of the data being obfuscated with it; computing one or more additional pre-images which hash to a collision provides no additional value for the attacker.

3.3.2) RTP Session Identification

RTP session identification is performed using libfindrtp. libfindrtp is a C library that identifies sessions between two endpoints by observing VoIP signaling traffic and watching for call set-up. Constraints can be passed to the library to limit session identification to a single endpoint, specific multiple endpoints, or even specific multiple endpoints using specific UDP ports. These constraints are passed through to libfindrtp from the input provided to the SteganRTP application via the command-line. At the time of this writing, libfindrtp supports session identification via the Session Initiation Protocol (SIP)[1] and Cisco Skinny Call Control Protocol (SCCP)[3] VoIP signaling protocols.

3.3.3) Hooking Packets

The SteganRTP application makes use of NetFilter[23] hook points in order to receive both inbound and outbound RTP session packets.
The Linux kernel is instructed to pass specific packets to an application by inserting an iptables rule describing the packets with a target of QUEUE. Packets which match a rule with a target of QUEUE are queued to be read by a registered NetFilter user-space queuing agent. Access to this queue is provided to the SteganRTP application via an API provided by the NetFilter C library libipq. An iptables rule used to hook packets via this interface may be inserted at any of the NetFilter hook points. For the most beneficial use by the SteganRTP application, packets must be hooked at points where their integrity as stego-medium is maintained. Thus, inbound packets are hooked at the PRE-ROUTING hook point and outbound packets are hooked at the POST-ROUTING hook point. In this manner, incoming packets are able to be processed by the SteganRTP application prior to any potential modification by the local system and outbound packets are able to be modified by SteganRTP after the local system is essentially finished with them. SteganRTP registers itself as a user-space queuing agent for NetFilter via libipq. SteganRTP then creates two iptables rules in the NetFilter engine with targets of QUEUE. The first rule matches the inbound RTP stream at the PRE-ROUTING hook point. The second rule matches the outbound RTP stream at the POST-ROUTING hook point. 3.3.4) Reading Packets Using the packet hooks described in the previous section, SteganRTP is then able to read packets from the provided packet queue, determine if they are considered inbound or outbound packets, and pass them to the appropriate processing functions. The processing functions may then analyze them, modify them if needed, place modified versions back into the queue in place of the original, and instruct the queue to accept the packet for further routing. 3.3.5) Inbound Processing As outlined above, the basic steps for inbound packet processing are as follows: 1. Immediately accept the packet for routing. 2. 
Extract potential message data.
3. Decrypt potential message data.
4. Verify the potential message header's checksum.
5. Send valid messages to the message handler.

3.3.6) Outbound Processing

As outlined above, the basic steps for outbound packet processing are as follows:

1. Poll for message data waiting to be sent.
2. If there is no message data waiting, immediately send the packet and return.
3. Create a new formatted message with header based on the properties of the RTP packet whose payload is being used as cover-medium.
4. Read as much of the waiting data as will fit in the formatted message.
5. Encrypt the message.
6. Embed the message into the RTP payload cover-medium.
7. Send the modified RTP packet in place of the original via the NetFilter user-space queue.

3.3.7) Session Timeout

In the event that no RTP packets are available in the NetFilter queue for a period of time, all session information is dropped and process flow returns to the RTP session identification phase to locate a new session for use. In the event that RTP packets are being received but no valid messages have been received for a period of time, the SteganRTP application attempts to solicit a response from the remote application. If these solicitations have not been answered by the end of the timeout period, all session information is dropped and process flow returns to the RTP session identification phase to locate a new session for use.

3.4) Communication Protocol Specification

The SteganRTP communication protocol makes use of formatted messages which are steganographically embedded into the payloads of individual RTP packets. This steganographic embedding creates the covert channel within which the communication protocol described in the following sections operates.

3.4.1) The cover medium: RTP Packet

The diagram below, reproduced verbatim from the RTP specification[4], describes the RTP packet header.
Of special interest are the payload type (PT), sequence number, and timestamp fields, all of which will become relevant when building, encrypting, and steganographically embedding the message data into the packet's payload. The remainder of the packet contains an optional number of header extensions which are irrelevant to the SteganRTP communication protocol, and finally the encoded media data, otherwise known as the RTP packet's payload, which will be utilized by SteganRTP as cover-medium.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The 7-bit payload type field indicates the audio Codec used to encode the payload. The 16-bit sequence number field is a standard incrementing sequence number. The 32-bit timestamp field describes the sampling instant of the first sample in the payload, and the remaining packet data is the audio payload as encoded by the indicated Codec.

3.4.2) Message Format

The format of the messages that the SteganRTP applications use to communicate with each other is described in the following sections. The diagram below describes the core message format of all types of SteganRTP formatted messages. This format consists of two fields, the Checksum / ID and Sequence fields, followed by a standard Type-Length-Value (TLV)[24] structure. The Checksum / ID, Sequence, Type, and Length fields comprise the message header, while the Value field is considered the message body, or payload.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Checksum / ID                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Sequence            |     Type      |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Value (Type-Defined Body)                   |
!                                                               !
.                                                               .

The 32-bit Checksum / ID field contains a hash value which is used to identify whether or not a potential message that is extracted from the payload of an inbound RTP packet is indeed a valid SteganRTP communication protocol message. The hashword[25] function is used to compute this hash. The function's two primary operands consist of the keying information defined as keyhash in Section 3.3.1 and the sum of the message's Sequence, Type, and Length header fields. This value is defined as checksumid and is described by the equation below.

    checksumid = hashword( keyhash, (Sequence + Type + Length) )

The verification of extracted potential messages is required because some packets in the inbound RTP stream may not contain SteganRTP messages: if the remote application had no outbound data waiting to be sent when a given RTP packet traversed it, that packet was passed along unmodified. The hash function used to compute this checksum value incorporates the keyhash so as not to be computable solely from message data, which would allow an observer to also verify that a message is embedded within the RTP payload.

The 16-bit Sequence field is a standard incrementing sequence number, the 8-bit Type field indicates what type of message it is, and the 8-bit Length field indicates the length, in bytes, of the Value field. The Value field contains the message's payload.

3.4.3) Message Types

The currently defined message types are listed in the table below.
+----+---------------------+
| ID | Type                |
+----+---------------------+
|  0 | Reserved            |
|  1 | Control             |
| 10 | Chat Data           |
| 11 | File Data           |
| 12 | Shell Input Data    |
| 13 | Shell Output Data   |
+----+---------------------+

Control Messages

The diagram below describes the format of SteganRTP control messages. Control messages are used to send non-user data to the remote SteganRTP application to convey operational information such as requesting a message resend or indicating that a file is about to be sent and providing that file's context information. Control messages consist of one or more stacked TLV structures and are not required to be 32-bit aligned.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Control Type  |    Length     |             Value             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
!                                                               !
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Control Type  |    Length     |             Value             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
!                                                               !
.                                                               .

The 8-bit Control Type field indicates the type of control data contained in the TLV structure whereas the 8-bit Length field indicates the size, in bytes, of the Value field. The Value field contains the control data of the indicated type.

Control Message Types

The currently defined control message types are listed in the table below.

+----+---------------+
| ID | Type          |
+----+---------------+
|  0 | Reserved      |
|  1 | Echo Request  |
|  2 | Echo Reply    |
|  3 | Resend        |
|  4 | Start File    |
|  5 | End File      |
+----+---------------+

Type 1: Echo Request

The Echo Request control message is used to prompt the remote SteganRTP application for a response, allowing the local application making the request to determine if the remote application is still present and communicating. This message is sent when a session inactivity timeout limit is approaching.
The format of an Echo Request control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       1       |       2       |      Seq      |    Payload    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Control Type field's value is 1, indicating that it is an Echo Request control message, and the Length field's value is 2, indicating the 2-byte control message payload. The control message payload consists of an 8-bit Seq field which contains a standard incrementing sequence number specific to Echo Requests, and an 8-bit Echo Request Payload, which contains a random bit-string. The Seq value is used to correlate sent Echo Request messages with received Echo Reply messages, and the Payload field received in an Echo Reply message must match the random bit-string sent in its corresponding Echo Request message.

Type 2: Echo Reply

The Echo Reply control message is used to respond to the remote SteganRTP application's Echo Request message. The format of the Echo Reply message is identical to that of the Echo Request message described above, except that the Control Type field's value is 2 rather than 1.

Type 3: Resend

The Resend control message is used to request the resending of a specified message by the remote SteganRTP application, allowing the local application to request missing or corrupted messages. This message is sent when the application begins to receive messages which contain sequence numbers beyond the next sequence number that is expected.

The format of a Resend control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       3       |       2       |     Requested Seq Number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Control Type field's value is 3, indicating that it is a Resend control message, and the Length field's value is 2, indicating the 2-byte control message payload.
The control message payload consists of a 16-bit Requested Seq Number field which indicates the sequence number of the message to be resent.

Type 4: Start File

The Start File control message is used to indicate to the remote application that the local application will begin sending file data for a new file transfer. This message is sent when the user executes the command to transfer a file.

The format of a Start File control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       4       |       #       |    File ID    |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                           Filename                            |
!                                                               !
.                                                               .

The Control Type field's value is 4, indicating that it is a Start File control message, and the Length field's value is 1 plus the string length, in bytes, of the filename of the file being sent, indicating the total size of the control message payload. The control message payload consists of an 8-bit File ID field which indicates the sending application's unique ID value for the file, and a Filename field which contains the name of the file being sent, in ASCII.

Type 5: End File

The End File control message is used to indicate to the remote application that the local application is finished sending file data for a particular file transfer. This message is sent when the local application has finished sending all data related to the open file descriptor being used to send data from a file.

The format of an End File control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       5       |       1       |    File ID    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Control Type field's value is 5, indicating that it is an End File control message, and the Length field's value is 1, indicating the 1-byte control message payload.
The control message payload consists of an 8-bit File ID field which indicates the sending application's unique ID value for the file whose transfer is now complete.

Data Messages

Non-control messages are considered data messages and contain some form of actual data for the user, whether it be text chat data, incoming file data, a command for the local shell service, or a response from the remote shell service. These various types of data are differentiated by the value of the message header's Type field.

Chat Data Messages

The Chat Data Message is used to transmit text chat data between SteganRTP applications. This type of data requires no context information, thus the message payload contains only a single field, Chat Data.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Chat Data                           |
!                                                               !
.                                                               .

File Data Messages

The File Data Message is used to transmit data file contents between SteganRTP applications. Because multiple file transfers may be in progress at any given time, this type of data must be accompanied by context information indicating which file transfer the chunk of data belongs to.

The format of a File Data message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    File ID    |                   File Data                   |
+-+-+-+-+-+-+-+-+                                               |
!                                                               !
.                                                               .

The File ID field's value is a unique file ID number chosen for the particular file transfer taking place and is used to indicate which file transfer the chunk of data contained in the File Data field belongs to. The File Data field is a chunk of data from the file being transferred. The proper order for reconstruction of the file chunks transferred by these messages is ensured by the message header's sequence number.
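To make the framing concrete, the following is a minimal C sketch of how a File Data message could be serialized behind the 8-byte message header and how a received header could be read back. It is an illustration under stated assumptions, not the reference implementation's code: the names steg_header, steg_pack_file_data, and steg_parse_header are hypothetical, the multi-byte fields are assumed to be laid out in network (big-endian) byte order, which the text does not specify, and the Checksum / ID value is assumed to have been computed by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STEG_HDR_LEN        8   /* Checksum/ID (4) + Sequence (2) + Type (1) + Length (1) */
#define STEG_TYPE_FILE_DATA 11  /* "File Data" in the message type table */

/* Hypothetical in-memory view of the message header. */
typedef struct {
    uint32_t checksum_id;
    uint16_t seq;
    uint8_t  type;
    uint8_t  length;            /* length of the Value field, in bytes */
} steg_header;

/*
 * Serialize a File Data message (header, File ID, chunk) into out.
 * checksum_id is supplied by the caller. Big-endian field order is an
 * assumption of this sketch. Returns bytes written, or 0 on error.
 */
size_t steg_pack_file_data(uint8_t *out, size_t out_cap,
                           uint32_t checksum_id, uint16_t seq,
                           uint8_t file_id,
                           const uint8_t *chunk, size_t chunk_len)
{
    size_t value_len = 1 + chunk_len;   /* File ID byte + File Data */

    /* The Length field is 8 bits wide, so the Value is at most 255 bytes. */
    if (value_len > 255 || out_cap < STEG_HDR_LEN + value_len)
        return 0;

    out[0] = (uint8_t)(checksum_id >> 24);
    out[1] = (uint8_t)(checksum_id >> 16);
    out[2] = (uint8_t)(checksum_id >> 8);
    out[3] = (uint8_t)checksum_id;
    out[4] = (uint8_t)(seq >> 8);
    out[5] = (uint8_t)seq;
    out[6] = STEG_TYPE_FILE_DATA;
    out[7] = (uint8_t)value_len;
    out[8] = file_id;
    if (chunk_len > 0)
        memcpy(out + 9, chunk, chunk_len);

    return STEG_HDR_LEN + value_len;
}

/*
 * Read the header fields from buf. Returns 0 on success, or -1 if buf is
 * too short to hold the header plus the Value length it announces.
 * Checksum verification is left to the caller.
 */
int steg_parse_header(const uint8_t *buf, size_t len, steg_header *h)
{
    if (len < STEG_HDR_LEN)
        return -1;

    h->checksum_id = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                     ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    h->seq    = (uint16_t)(((uint16_t)buf[4] << 8) | buf[5]);
    h->type   = buf[6];
    h->length = buf[7];

    return (len < STEG_HDR_LEN + (size_t)h->length) ? -1 : 0;
}
```

A receiver would call steg_parse_header on the extracted and de-obfuscated buffer, verify the Checksum / ID, and then use the File ID byte at offset 8 to select the destination file descriptor.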
Shell Data Messages

The Shell Input Data and Shell Output Data Messages are used to transmit shell input to, and receive shell output from, a remote SteganRTP shell service, respectively. This type of data requires no context information, thus the message payload contains only a single field, Shell Data, as described below.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Shell Data                           |
!                                                               !
.                                                               .

3.5) Functional Components

3.5.1) File Descriptor Lists

Two separate file descriptor lists are maintained: destinations for inbound data and sources of outbound data. The data structure for storage of a file descriptor and its data for inclusion in either list is defined below.

    /* Structure used for file descriptor information */
    typedef struct file_info_t {
        u_int8_t id;
        char *name;
        u_int8_t type;
        int fd;
        struct file_info_t *next;
        struct file_info_t *prev;
    } file_info;

The independence of the file descriptor lists from the outbound data polling and message handler components provides for a flexible and versatile environment within which to expand functionality. In order to include new data types for transfer, all that is required is to define a new data type ID for both applications to correlate messages upon, open a file descriptor to the appropriate place to read or write the data, and include the file descriptor in the appropriate list.

Inbound File Descriptors

Inbound File Descriptors are a list of file descriptors for various destinations that inbound data may be directed to. The order of these file descriptors within the list is irrelevant, because the destination file descriptor for a given message is looked up by matching the message type and properties against the file descriptor's type and properties.

Chat Interface

Inbound chat data is written to this file descriptor. This file descriptor is tied to the chat window of the SteganRTP ncurses interface.
Remote Shell Interface

Inbound shell data from the remote application's shell service is written to this file descriptor. This file descriptor is tied to the shell window of the SteganRTP ncurses interface.

Local Shell Service

Inbound shell data to the local application's shell service is written to this file descriptor. This file descriptor is tied to the local process providing shell access. This file descriptor does not exist in the list if the local shell service is disabled.

File Transfers

Any number of file descriptors for data files being actively received may be appended to or removed from the end of the inbound file descriptors list.

Outbound File Descriptors

Outbound File Descriptors are polled, in order, for data waiting to be sent. Because they are polled in order, they are effectively prioritized in that order: data waiting to be sent from an earlier descriptor in the list takes precedence over data waiting to be sent from a later descriptor. The file descriptors included in the outbound list are as follows:

Raw Message Interface

Entire, unencrypted outbound messages are written to this file descriptor. This file descriptor is used for the replaying of entire messages in response to a Resend control message as described in Section 3.4.3.

Control Message Interface

Outbound control messages as described in Section 3.4.3 are written to this file descriptor after creation.

Chat Interface

Outbound chat data is written to this file descriptor. This file descriptor is tied to the command window of the SteganRTP ncurses interface. All non-command text entered into the command window while in chat mode is considered chat data.

Remote Shell Interface

Outbound shell data is written to this file descriptor. This file descriptor is tied to the command window of the SteganRTP ncurses interface. All non-command text entered into the command window while in shell mode is considered shell data.
Local Shell Service Outbound shell data from the local shell service is written to this file descriptor. This file descriptor is tied to the local process providing shell access. This file descriptor does not exist in the list if the local shell service is disabled. File Transfers Any number of file descriptors for data files being actively sent may be appended or removed from the end of the outbound file descriptors list. 3.5.2) Message Handler The SteganRTP application's message handler receives all valid incoming messages as verified by the RTP packet receiving system for inbound packets. This component performs all internal state changes and administrative tasks in response to control messages. It also handles the routing of inbound data message payloads to the appropriate file descriptor in the inbound file descriptors list. Administrative Tasks Echo Reply If an Echo Request control message is received from the remote application, the message handler constructs an appropriate Echo Reply control message as described in Section 3.4.3 and writes it to the Control Message Interface file descriptor in the outbound file descriptor list. Start File Transfer If a Start File control message is received from the remote application, the message handler opens a new file descriptor using the file's context information contained in the control message and appends the file descriptor to the inbound file descriptors list. End File Transfer If an End File control message is received from the remote application, the message handler closes the file descriptor for the file transfer indicated and removes it from the inbound file descriptors list. Data Routing Chat Data Inbound text chat data is buffered until a complete line of text is received and then is written to the Chat Interface file descriptor in the inbound file descriptors list. A complete line of text is defined as being terminated by a new-line character. 
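The line-buffering behavior described for inbound chat data can be sketched in C as follows. This is only an illustration under stated assumptions: the chat_buf type, the CHAT_BUF_CAP size, and the chat_buf_feed function are hypothetical names, not part of the reference implementation, and the overflow policy (drop excess input, and flush a completely full buffer as if it were a line) is a simplification chosen for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define CHAT_BUF_CAP 1024   /* hypothetical buffer size for this sketch */

/* Accumulates chat bytes across messages until a full line is present. */
typedef struct {
    char   data[CHAT_BUF_CAP];
    size_t len;
} chat_buf;

/*
 * Append in_len bytes of chat data, then copy out the first complete line
 * (terminated by '\n', newline included) if one is buffered.
 * line_out must have room for CHAT_BUF_CAP bytes.
 * Returns the length of the line copied, or 0 if no complete line is
 * buffered yet. Call again with in_len == 0 to drain further lines.
 */
size_t chat_buf_feed(chat_buf *cb, const char *in, size_t in_len,
                     char *line_out)
{
    size_t room = CHAT_BUF_CAP - cb->len;
    size_t line_len;
    char *nl;

    if (in_len > room)
        in_len = room;          /* sketch only: excess input is dropped */
    if (in_len > 0) {
        memcpy(cb->data + cb->len, in, in_len);
        cb->len += in_len;
    }

    nl = memchr(cb->data, '\n', cb->len);
    if (nl != NULL)
        line_len = (size_t)(nl - cb->data) + 1;
    else if (cb->len == CHAT_BUF_CAP)
        line_len = cb->len;     /* buffer full: flush it as one line */
    else
        return 0;

    memcpy(line_out, cb->data, line_len);
    cb->len -= line_len;
    memmove(cb->data, cb->data + line_len, cb->len);
    return line_len;
}
```

In this sketch the message handler would call chat_buf_feed once per Chat Data message and write each returned line to the Chat Interface file descriptor.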
File Data

Inbound file transfer data is written to the appropriate file descriptor in the inbound file descriptors list for the file transfer that the data belongs to.

Shell Input Data

Shell Input Data messages contain input data for the local application's shell service, which is written to the Local Shell Service file descriptor in the inbound file descriptors list.

Shell Output Data

Shell Output Data messages contain response data from the remote application's shell service, which is written to the Remote Shell Interface file descriptor in the inbound file descriptors list.

3.5.3) Encryption System

The encryption method chosen for use in the SteganRTP reference implementation is not really encryption at all. In favor of light weight and speed, a simple bitwise exclusive-or (XOR)[26] obfuscation method was chosen as a symmetric cipher. The choice of encryption method here does not indicate that another, more robust type of encryption could not be used; rather, the modular design of the reference implementation promotes drop-in replacement of the current encryption system entirely, assuming that the replacement encryption method does not have a noticeable impact upon the latency of the overt RTP stream being used as cover-medium.

The author does not claim that the obfuscation method used by the SteganRTP reference implementation is cryptographically secure. On the contrary, it is well documented in the literature that XOR against a repeating keystream is insecure. The obfuscation of message data is merely meant to provide some rudimentary protection against statistical steganalysis which focuses upon perceptible properties of language within the stego-medium.

The XOR obfuscation method employed by the SteganRTP reference implementation consists of the following steps:

1. Create a bit-pad for use as keying information.
2. Choose an offset into the bit pad to begin using the keying information.
3. XOR the message against the bit pad, byte by byte.
Bit-pad Creation

The method chosen for creation of the bit-pad is simply to duplicate the bit-string found in keyhash, the creation of which is described in detail in Section 3.3.1.

Choose a Bit-pad Offset

To help protect against some forms of statistical analysis that have proved effective against XOR obfuscation using repeated static keying information, it was decided not to begin every XOR loop at the same position within keyhash. Instead, a new offset into keyhash is chosen for each message. The method that the SteganRTP reference implementation employs to determine this offset is to use the hashword[25] function to create a 32-bit hash of keyhash and the sum of the Seq and Timestamp header fields of the RTP packet being embedded into. The resultant hash is then interpreted as a 32-bit integer. The integer modulus 20 is the chosen offset into keyhash. The result of this offset-choosing operation, an integer within the range of 0 through 19, is defined here as keyhash_offset and is described by the equation below.

    keyhash_offset = hashword( keyhash, ( RTP_Seq + RTP_TS ) ) mod 20

The keyhash_offset equation incorporates keyhash so as to not be entirely computable from observable information in the RTP packet header.

XOR Loop

When used as a bit-pad for the XOR operation loop, keyhash is used 8 bits, or 1 byte, at a time. The XOR loop begins with the first byte of the message to be obfuscated and the byte located at index keyhash_offset within keyhash. The two bytes are XORed to produce a result byte. This result byte is placed into the obfuscated message buffer at the same byte index as the original message byte. If the end of the bit-pad is reached, the position of the next byte in the bit-pad returns to the beginning of the bit-pad. When the end of the original message is reached, the obfuscated message buffer should be of equal length to the original message and have one corresponding obfuscated byte for each original byte in the message.
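The XOR loop described above can be sketched in C as follows. This is a minimal illustration rather than the reference implementation's code: the function name xor_obfuscate and its parameters are hypothetical, and keyhash_offset is taken as an input because computing it requires the hashword function, which is not reproduced here. Since XOR is its own inverse, the same routine both obfuscates and de-obfuscates a message when given the same keyhash and offset.

```c
#include <stddef.h>
#include <stdint.h>

#define KEYHASH_LEN 20  /* a SHA-1 digest is 20 bytes */

/*
 * XOR len bytes of in against keyhash, starting at index keyhash_offset
 * within keyhash and wrapping to index 0 when the end of the bit-pad is
 * reached. out receives one result byte per input byte, at the same index.
 * in and out may point to the same buffer.
 */
void xor_obfuscate(const uint8_t keyhash[KEYHASH_LEN], size_t keyhash_offset,
                   const uint8_t *in, uint8_t *out, size_t len)
{
    size_t k = keyhash_offset % KEYHASH_LEN;

    for (size_t i = 0; i < len; i++) {
        out[i] = (uint8_t)(in[i] ^ keyhash[k]);
        k = (k + 1) % KEYHASH_LEN;  /* wrap around the 20-byte bit-pad */
    }
}
```

Calling xor_obfuscate a second time on the output, with the same keyhash and keyhash_offset, recovers the original message bytes.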
It is important to note that within the scope of steganography terminology, whether or not message data is obfuscated or encrypted is irrelevant. As such, further reference to the obfuscated message will still be referred to as the message, or message data. 3.5.4) Embedding System The embedding system that was developed for the SteganRTP reference implementation is a generalized least-significant-bit (LSB) steganographic data embedding method. It is generalized such that when provided with a cover-medium buffer, its length, the size of each word value within the cover-medium buffer, and the message buffer to be embedded, it is then able to perform the LSB embedding operation. In this way, any audio Codec which uses a linear grouping of fixed-length audio samples should be able to be utilized as cover-medium by the embedding system. For the purpose of discussion of the SteganRTP embedding system, the term word value used in this context is equivalent to audio sample. The example used here, as well as the only Codec currently supported by the reference implementation, is G.711. G.711 is a Codec which encodes audio as a linear grouping of 8-bit audio samples. This encoded data is transported by RTP packets as their payload and will serve as cover-medium. Using the generalized LSB embedding method, the LSB of each word value in the cover-medium is modified to be equivalent to a single bit from the message data buffer, in order. The properties of the RTP packet, such as its payload length and payload type header value, determine how much message data can be embedded into the packet's payload. The RTP packet's payload size is determined by subtracting the size of the RTP packet's header from the value of the UDP packet header's Length field. The wordsize is equivalent to the sample size used by the RTP packet's Codec, indicated by the RTP packet header's payload type field. 
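The generalized LSB embedding operation described above can be sketched in C as follows, together with its inverse. This is an illustration under stated assumptions, not the reference implementation's code: the names lsb_embed and lsb_extract are hypothetical, wordsize is expressed in bytes, message bits are assumed to be consumed most-significant bit first, and for word values wider than one byte the least-significant bit is assumed to live in the last byte of each word. For G.711, wordsize is 1 and the last assumption does not matter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Embed msg_len bytes of msg into cover, one message bit per word value.
 * Each message byte consumes 8 word values, so the cover must hold at
 * least msg_len * 8 words. Returns 0 on success, -1 if it does not fit.
 */
int lsb_embed(uint8_t *cover, size_t cover_len, size_t wordsize,
              const uint8_t *msg, size_t msg_len)
{
    if (wordsize == 0 || msg_len > cover_len / (wordsize * 8))
        return -1;

    for (size_t bit = 0; bit < msg_len * 8; bit++) {
        /* Next message bit, most-significant bit of each byte first. */
        uint8_t b = (uint8_t)((msg[bit / 8] >> (7 - bit % 8)) & 1u);
        /* Assumed location of the word's LSB: its last byte. */
        uint8_t *lsb_byte = &cover[bit * wordsize + (wordsize - 1)];

        *lsb_byte = (uint8_t)((*lsb_byte & 0xFEu) | b);
    }
    return 0;
}

/*
 * Reverse of lsb_embed: collect the LSB of each word value into msg.
 * Returns 0 on success, -1 if the cover is too small for msg_len bytes.
 */
int lsb_extract(const uint8_t *cover, size_t cover_len, size_t wordsize,
                uint8_t *msg, size_t msg_len)
{
    if (wordsize == 0 || msg_len > cover_len / (wordsize * 8))
        return -1;

    memset(msg, 0, msg_len);
    for (size_t bit = 0; bit < msg_len * 8; bit++) {
        uint8_t b = (uint8_t)(cover[bit * wordsize + (wordsize - 1)] & 1u);

        msg[bit / 8] |= (uint8_t)(b << (7 - bit % 8));
    }
    return 0;
}
```

Only the lowest bit of each sample changes, so each 8-bit G.711 sample differs from its original encoded value by at most one.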
Modifying 1 bit from each word value requires 8 word values to embed a single byte of message data. Thus, the amount of available space within an RTP packet's payload for embedding is found by multiplying the word value size by 8, then dividing the RTP packet payload size by the result. The resultant value is defined here as the RTP packet's available_space for embedding and is described by the equation below.

    available_space = RTP_payload_size / ( wordsize * 8 )

The space available for user data after prepending the SteganRTP communication protocol's message header is defined here as the SteganRTP message's payload_size and is described by the equation below.

    payload_size = available_space - sizeof( message_header )

Thus, payload_size bytes of user data can be packaged as a SteganRTP message and embedded into an RTP packet payload cover-medium of available_space bytes. For example, a 160-byte G.711 payload with a wordsize of 1 byte yields an available_space of 20 bytes, leaving a payload_size of 12 bytes after the 8-byte message header. If an RTP packet is too small to contain a valid message, it is passed along unmodified. If a message being embedded is smaller than the available space in the cover-medium, the message is padded out to the available size with random data. This ensures a more uniform distribution of modified values throughout the cover-medium.

3.5.5) Extraction System

All inbound RTP packets are sent to the extraction system where potential message data is extracted, decrypted, and then verified. The extraction system is essentially a reverse of the embedding system described in Section 3.5.4 followed by a pass through the symmetric encryption system described in Section 3.5.3. This results in a decrypted potential message whose Checksum / ID header field value can be verified to determine whether or not the extracted potential message is valid. If an extracted potential message is found to be valid, it is passed to the message handler component.

3.5.6) Outbound Data Polling System

File descriptors in the outbound file descriptors list are polled, in order, for data waiting to be sent.
When a file descriptor is found to have data, a new formatted message is created if needed and data is read to fill the payload of that message from the file descriptor. The message type is indicated by the file descriptor's record in the outbound file descriptors list. The result of this operation is a formatted SteganRTP message ready for encryption and embedding into the cover-medium. 3.5.7) Message Caching System All inbound and outbound SteganRTP messages are cached. The outbound message cache provides a mechanism for retrieval of any given message in the event that the remote application issues a Resend control message requesting that the message be resent. The inbound message cache provides a mechanism for storage of messages received that are beyond the expected sequence number. Once the expected message is received, the others may be read back from the cache rather than requesting that the remote application resend them. 3.5.8) Shell Service The local application's shell service is essentially a child process executing a shell. This process's standard input and output file descriptors are replaced with file descriptors which are stored in the inbound and outbound file descriptors lists, respectively. The local shell service is disabled by default in the SteganRTP reference implementation and must be enabled via the command-line. 3.6) Use 3.6.1) Command-line The SteganRTP application provides a number of command-line arguments allowing for control and configuration of various components. The following sections describe each in detail. Usage Output Overview The following usage output was copied verbatim from the most recent version of the reference implementation, SteganRTP 0.3b. 
Usage: steganrtp [general options] -t -k
  required options:
    at least one of:
      -a  The "source" of the RTP session, or, host treated as the "close" endpoint (host A)
      -b  The "destination" of the RTP session, or, host treated as the "remote" endpoint (host B)
    -k  Shared secret used as a key to obfuscate communications
  general options:
    -c  Host A's RTP port
    -d  Host B's RTP port
    -i  Interface device (defaults to eth0)
    -s  Enable the shell service (DANGEROUS)
    -v  Increase verbosity (repeat for additional verbosity)
  help and documentation:
    -V  Print version information and exit
    -e  Show usage examples and exit
    -h  Print help message and exit

Command-line Arguments

The following command-line arguments are available from the SteganRTP application's command-line.

-a host
host is the name or IP address of the closest side of the RTP session desired to be utilized as cover-medium (Host A).

-b host
host is the name or IP address of the remote side of the RTP session desired to be utilized as cover-medium (Host B).

-k keyphrase
keyphrase is a shared secret between the users of the two SteganRTP instances which will be communicating. In some cases, a single user may be running both instances. The keyphrase is used to generate a bit-pad via the SHA-1 hash function which will later be used to obfuscate the data being steganographically embedded into the RTP audio cover-data.

-c port
port is the RTP port used by Host A.

-d port
port is the RTP port used by Host B.

-i interface
interface is the interface to use on the local host. This parameter defaults to "eth0".

-s
This argument enables the command shell service. If the command shell service is enabled, the user of the remote instance of SteganRTP will be able to execute commands on the local system as the user running SteganRTP. You likely don't want this unless you are the user running both instances of SteganRTP and intend to use the remote instance as an interface for a remote shell on that host.
This feature can be useful for remote administration of a system without direct access to the system, assuming that RTP is allowed to traverse traffic policy enforcement points.

-v
This argument increases the verbosity level. Repeat for higher levels of verbosity.

-V
This argument prints SteganRTP's version information and exits.

-e
This argument prints a quick examples reference and exits.

-h
This argument prints the usage (help) information and exits.

Usage Examples

You can print a quick reference of the following examples from the SteganRTP command-line by using the -e command-line argument.

The simplest command-line you can execute to successfully run SteganRTP is:

    steganrtp -k <keyphrase> -b <host-b>

This will begin a session utilizing any RTP session involving host-b as the destination endpoint.

    steganrtp -k <keyphrase> -a <host-a> -b <host-b> -i <interface>

This will begin a session utilizing any RTP session between host-a and host-b using interface interface.

    steganrtp -k <keyphrase> -a <host-a> -b <host-b> -i <interface> -s

This is the same as the previous example but will enable the command shell service.

    steganrtp -k <keyphrase> -a <host-a> -b <host-b> -c <a-port> -d <b-port>

This will begin a session utilizing a specific RTP session between host-a on port a-port and host-b on port b-port. Note that this effectively disables RTP session auto-identification and will attempt to use an RTP session as described whether it exists or not. This is useful when a desirable RTP session is already in progress, since the other examples rely on libfindrtp to identify the RTP session as it is being set up by VoIP signaling and thus must already be running and waiting when call setup occurs.

3.6.2) User Interface

SteganRTP provides a curses user interface featuring four windows: the Command window at the bottom of the screen, the large Main window in the middle of the screen, and the Input and Output Status windows at the top of the screen.

Windows

Command Window

All keyboard input, if accepted, is displayed in the Command window.
Lines of input that are not prefixed with a slash ('/') character are treated as chat text and are sent to the remote instance of SteganRTP as such. Lines of input that begin with a slash are considered commands and are processed by the local instance of SteganRTP.

Main Window

When in Chat mode, chat text and general SteganRTP information messages and events are displayed in the Main window. When in Shell mode, this window is instead used for the input to and output of the shell service provided by the remote instance of SteganRTP.

Input Status Window

Events related to incoming RTP packets or SteganRTP communication messages are displayed in the Input Status window.

Output Status Window

Events related to outgoing RTP packets or SteganRTP communication messages are displayed in the Output Status window.

Commands

The following commands can be executed from within the Command window:

/chat
The "chat" command puts the interface into Chat Mode.

/sendfile filename
The "sendfile" command queues a file for transmission to the remote instance of SteganRTP. filename is the path and filename of the local file to be sent.

/shell
The "shell" command puts the interface into Shell Mode.

/quit
/exit
The "quit" and "exit" commands exit the program.

/help
/?
The "help" and "?" commands print a list of available commands.

4) Solutions to Problems and Challenges

The following sections describe this research effort's approach to solving many of the problems and challenges that were identified in Section 2.3, as implemented via the SteganRTP reference implementation.

Most of the solutions devised during this research effort involved the creation of a communications protocol to operate within the covert channel established within the cover-medium. This protocol employs a formatted message header which is prepended to user message data before being embedded in the cover-medium, providing various utility to the application making use of the protocol.
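As a minimal sketch of how a prepended header carrying a sequence number can work together with the inbound message cache described in Section 3.5.7, the Python below packs a small header in front of user data and delivers messages in order on the receiving side. The field layout (16-bit sequence number, 8-bit type, 8-bit length), the type value, and all function and class names are hypothetical and chosen only for illustration; they are not the SteganRTP message format, which is defined in the message data structure description in chapter 3.

```python
import struct

# Hypothetical header layout, for illustration only (not SteganRTP's format):
# 16-bit sequence number, 8-bit message type, 8-bit payload length.
HEADER = struct.Struct("!HBB")
TYPE_CHAT = 1  # assumed type value


def pack_message(seq, msg_type, payload):
    """Prepend the header to the user data."""
    if len(payload) > 255:
        raise ValueError("payload too large for 8-bit length field")
    return HEADER.pack(seq & 0xFFFF, msg_type, len(payload)) + payload


def unpack_message(raw):
    """Split a raw message into (seq, msg_type, payload)."""
    seq, msg_type, length = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return seq, msg_type, payload


class InboundCache:
    """Delivers payloads in sequence order, caching messages that arrive early."""

    def __init__(self, first_seq=0):
        self.expected = first_seq
        self.cache = {}

    def receive(self, raw):
        """Return (deliverable_payloads, sequence_to_request_or_None)."""
        seq, _msg_type, payload = unpack_message(raw)
        # Distance ahead of the expected sequence, modulo 16-bit wraparound.
        ahead = (seq - self.expected) & 0xFFFF
        if ahead >= 0x8000:
            # Behind the expected sequence: a duplicate or replay, so drop it.
            return [], None
        self.cache[seq] = payload
        delivered = []
        while self.expected in self.cache:
            delivered.append(self.cache.pop(self.expected))
            self.expected = (self.expected + 1) & 0xFFFF
        # Anything still cached means the expected message is missing.
        resend = self.expected if self.cache else None
        return delivered, resend
```

With this sketch, receiving sequence 0, then 2, then 1 delivers the first payload immediately, holds the third while reporting that sequence 1 should be requested again, and releases both remaining payloads once sequence 1 arrives. A message whose sequence number is behind the expected one is silently dropped, which is one simple way to ignore replays.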
4.1) Unreliable Transport

To mitigate the unreliable properties of the underlying transport protocols used to transmit the cover-medium, the message header contains a sequence number. This sequence number, coupled with the message caching system, allows the recipient both to identify when an expected message is missing and to request a resend of a particular message via a control message. This property also provides the added benefit of detecting erroneously or maliciously replayed messages.

When considering potential solutions for this problem, various types of Forward Error Correction (FEC) were considered. Because the small size of the available cover-medium limits the space available for message data, the additional space that most of the algorithms considered require for redundant data rendered them unfit for purpose within this research effort's context.

4.2) Cover-Medium Size Limitations

Fortunately, the same property of RTP which restricts the size of available cover-medium in each packet also ensures that there is an abundance of packets being sent between RTP endpoints every second. User data can be spread over multiple messages and cover-packets and then reassembled at its destination. For this research effort's purposes and goals, namely the timely transfer of user text chat, interactive shell access, and transfer of small files, an achieved throughput of 1,000 bytes per second as described in Section 2.2.3 was found to be more than adequate.

4.3) Latency

To prevent unintended impact on RTP packet latency, care was taken to perform a number of operations efficiently:

4.3.1) Inbound Packet Processing

When receiving inbound RTP packets for processing, the receiving system does not need to make any modifications to the received packet.
In the SteganRTP reference implementation, the packet is received and immediately accepted for continued routing by the packet queue prior to extracting, decrypting, and verifying any potential message data found within the payload.

4.3.2) Outbound Packet Processing

When receiving outbound RTP packets for processing, as few operations as possible must be performed in order to decide whether the packet should be immediately accepted for continued routing or must be held for modification. In the SteganRTP reference implementation, the packet is received and then all active outbound file descriptors are polled for data waiting to be sent. If no data is waiting to be sent, the packet is then accepted for continued routing by the packet queue.

4.3.3) Encryption Overhead

When encrypting the raw message prior to embedding into the cover-medium, a low-overhead algorithm was used. The SteganRTP reference implementation employs an XOR against a SHA-1 hash of a user-supplied shared secret.

4.4) Tracking of RTP Streams

Identification and tracking of RTP streams is handled by the libfindrtp C library paired with the NetFilter libipq C library for tracking and hooking packets. Both libraries were evaluated during this research effort's initial requirements phase and were deemed fit for purpose.

4.5) Media Gateway Audio Modifications

4.5.1) Audio Codec Conversion

Due to the nature of VoIP, it is not always possible to detect whether an audio session such as RTP is terminating at the actual recipient of the call audio or at an intermediary. As such, it is not possible to reliably transmit stego-medium from end to end unless the actual network addresses of each endpoint are known. Due to this limitation, the SteganRTP reference implementation assumes that there are no intermediary devices along the media path making changes to the RTP payload.
This assumption in turn rests on a further one: that the sending and receiving applications are either running on the same hosts as the RTP endpoint applications or are located along the network path between the two visible RTP endpoints, which may or may not be intermediaries. The reference implementation requires that these endpoint network addresses are specified by the user or identified by the RTP session identification component.

4.6) Mid-session Audio Codec Change

The SteganRTP reference implementation's embedding component addresses the issue of mid-session audio Codec change by determining the audio sample word size dynamically, based on the Codec value supplied by the RTP packet's header. Thus, the embedding system's parameters are derived from each individual RTP packet that will be used as cover-medium. If the RTP session were to change Codecs mid-session, or even to change Codecs for every other packet, the embedding system would only operate on RTP packets whose payloads are encoded with a Codec that the embedding system recognizes and for which it has parameters defined. If the embedding system does not recognize and support a particular packet's Codec, that packet is passed unmodified.

5) Conclusion

5.1) Design Goals

It is the author's belief that all of the design goals set forth in Section 3.1 for the SteganRTP reference implementation were met. The primary goal of steganography, establishment of a full-duplex communications channel, compensation for the unreliable transport mechanism, identical user experience regardless of mode of operation, and multi-type data transfer were all accomplished.

5.2) Identified Challenges

It is the author's belief that all but two of the problems and challenges identified in Section 2.3 were fully addressed.
The two challenges that were not addressed were the various types of media gateway audio modifications outlined in Section 2.3.6, due to scope, and the issue of compressed audio outlined in Section 2.3.5, due to time limitations of the research effort.

5.3) Secure Real-time Transport Protocol

It is important to note that use of the Secure Real-time Transport Protocol (SRTP) RTP profile may prevent specific operational scenarios such as the active MITM scenario described in Section 3.2.2. Encrypting various parts of the RTP header and RTP payload will prevent invasive modification of the payload by an entity external to the RTP session. SRTP, however, will not protect against steganographic embedding of message data prior to the application of the SRTP encryption methods, such as may be performed within the RTP endpoint application itself.

5.4) Future Research

It is the author's intention to continue this research effort at a later time. The identified areas for continued research include:

1. Replacement of the generalized LSB embedding system with Codec-specific embedding algorithms. By utilizing Codec-specific properties, more intelligent embedding methods such as the inclusion of silence and voice detection can be performed, and a wider variety of Codecs can be supported.
2. Creation of embedding algorithms for video Codecs.
3. Replacement of the XOR obfuscation system with real encryption.
4. Addition of support for fragmentation of larger formatted messages across multiple RTP packet payload cover-mediums.
5. Expansion of the shell service functionality into a more generalized services framework.

References

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session initiation protocol. RFC 3261, Internet Society (IETF), June 2002.
[2] Wikipedia. H.323 - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=H.323&oldid=146577248, 2007. [Online; accessed 2-September-2007].
[3] Wikipedia. Skinny client control protocol - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Skinny_Client_Control_Protocol&oldid=133621770, 2007. [Online; accessed 2-September-2007].
[4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport protocol for real-time applications. RFC 1889, Internet Society (IETF), January 1996.
[5] J. Postel. User datagram protocol. RFC 768, Internet Society (IETF), August 1980.
[6] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The secure real-time transport protocol (SRTP). RFC 3711, Internet Society (IETF), March 2004.
[7] Mehdi Kharrazi, Husrev T. Sencar, and Nasir Memon. Image steganography: Concepts and practice. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, 2004.
[8] Huaiqing Wang and Shuozhong Wang. Cyber warfare: steganography vs. steganalysis. Commun. ACM, 47(10):76-82, 2004.
[9] Tayana Morkel, Jan H P Eloff, and Martin S Olivier. An overview of image steganography. In Proceedings of the Fifth Annual Information Security South Africa Conference (ISSA2005), Sandton, South Africa, June/July 2005. Published electronically.
[10] Unknown. S-tools 4.0. ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/code/s-tools4.zip, August 2006.
[11] Fabien A. P. Petitcolas. mp3stego. http://www.petitcolas.net/fabien/steganography/mp3stego/, June 2006.
[12] Heinz Repp. Hide 4 pgp. http://www.rugeley.demon.co.uk/security/hide4pgp.zip, December 1996.
[13] I)ruid. An analysis of VoIP steganography research efforts. http://druid.caughq.org/papers/An-Analysis-of-VoIP-Steganography-Research-Efforts.pdf, September 2007.
[14] I)ruid. Real-time steganography with RTP. http://druid.caughq.org/presentations/Real-time-Steganography-with-RTP.pdf, August 2007.
[15] Defcon 15. http://www.defcon.org/html/defcon-15/dc-15-schedule.html, August 2007.
[16] T. Takahashi and W. Lee. An assessment of VoIP covert channel threats. http://voipcc.gtisc.gatech.edu/download/securecomm.pdf, July 2007.
[17] Wikipedia. G.711 - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=G.711&oldid=151887535, 2007. [Online; accessed 6-September-2007].
[18] Wikipedia. Least significant bit - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Least_significant_bit&oldid=150766150, 2007. [Online; accessed 6-September-2007].
[19] Voip foro - codecs. http://www.voipforo.com/en/codec/codecs.php, 2007. [Online; accessed 5-September-2007].
[20] I)ruid. SteganRTP. http://sourceforge.net/projects/steganrtp/, August 2007.
[21] D. Eastlake 3rd and P. Jones. US secure hash algorithm 1 (SHA1). RFC 3174, Internet Society (IETF), September 2001.
[22] I)ruid. libfindrtp. http://sourceforge.net/projects/libfindrtp/, February 2007.
[23] Netfilter. http://www.netfilter.org/, 2007. [Online; accessed 6-September-2007].
[24] Wikipedia. Type-length-value - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Type-length-value&oldid=128880452, 2007. [Online; accessed 3-September-2007].
[25] Bob Jenkins. Netfilter. http://www.burtleburtle.net/bob/c/lookup3.c, May 2006. [Online; accessed 6-September-2007].
[26] Wikipedia. Exclusive or - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Exclusive_or&oldid=152332544, 2007. [Online; accessed 5-September-2007].