Real-time Steganography with RTP
September, 2007
I)ruid, C²ISSP
druid@caughq.org 
http://druid.caughq.org
   
Abstract: Real-time Transport Protocol (RTP) is used by nearly all
Voice-over-IP systems to provide the audio channel for calls.  As such, it
provides ample opportunity for the creation of a covert communication channel
due to its very nature.  While use of steganographic techniques with various
audio cover-medium has been extensively researched, most applications of such
have been limited to audio cover-medium of a static nature such as WAV or MP3
file audio data.  This paper details a common technique for the use of
steganography with audio data cover-medium, outlines the problems that
arise when attempting to use such techniques to establish a full-duplex
communications channel within audio data transmitted via an unreliable
streaming protocol, and documents solutions to these problems.  An
implementation of the ideas discussed entitled SteganRTP is included in the
reference materials.

1) Introduction

This paper describes a research effort within the disciplines of
steganography, Internet telephony, and data communications.  

1.1) Overview

This paper is structured in the following order:  The first chapter provides
an introduction, describes the motivation for this research, and covers some
basic concepts and terminology for the subjects of Voice over IP (VoIP),
Real-time Transport Protocol (RTP), Steganography, and, more specifically, the
use of steganography with an audio cover-medium.  The second chapter defines
the concept of real-time steganography, discusses using steganography with
RTP, and describes some of the identified problems and challenges.  The third
chapter details the reference implementation entitled SteganRTP including a
description of the project's goals, the implementation's operational
architecture, process flow, message data structure, and functional
sub-systems.  The fourth chapter addresses the identified problems and
challenges that were met and describes how they were solved.  The fifth and
final chapter concludes the paper with observations made as a result of this
research effort.

1.2) Voice over IP

The term Voice over IP (VoIP) is nearly synonymous with Internet Telephony.
The majority of VoIP systems are designed to utilize separate signaling and
media channels to provide calling services to users.  The signaling channel is
generally used to set up, manage, and tear down calls between two or more
parties whereas the media channel is used to transmit the audio, video, or
other media that may be associated with the call.  A number of competing
protocol standards exist for use as the VoIP system's signaling channel which
include Session Initiation Protocol (SIP)[1], H.323[2], Skinny[3], and many others.
Real-time Transport Protocol (RTP)[4], however, is used almost ubiquitously to
provide VoIP systems with the required media channel.

1.3) Real-time Transport Protocol

Real-time Transport Protocol (RTP) is described by the protocol authors as ``a
transport protocol for real-time applications.''  RTP provides an end-to-end
network transport suitable for applications transmitting real-time data such
as audio, video or any other type of streamed data.  RTP generally utilizes
the User Datagram Protocol (UDP)[5] for its transport and can do so in both
multicast and unicast network environments.  When employed by a VoIP system,
RTP generally handles the media channel of a call.  The call's media channel
is generally handled independent of the VoIP signaling channel.  However, per
the RTP specification, there are no default network ports defined.  As such,
the RTP endpoint network ports must be negotiated between the endpoints via
the signaling channel.  Other events in the signaling channel may also
influence the operation of the media channel as handled by RTP such as
requests to change audio encoding, add or remove parties from the call, or
tear down the call.

One of RTP's current deficiencies is that it is entirely clear-text while
traversing the network.  An RTP profile called Secure Real-time Transport
Protocol (SRTP)[6] has been defined for encrypting parts of the RTP data
packet.  However, the specification defines no mechanism for negotiating or
securely exchanging keying information to be used for the encryption and
decryption processes.  At the time of this writing, a number of keying
mechanisms have been defined, but no standard has been either agreed upon by
the standards bodies or determined by the free market.  As such, most
implementations of RTP
do not currently use the SRTP profile and instead continue to transmit call
media data in the clear.  As will be detailed in full in Section 3.2, this
property of the media channel provides ample opportunity for multiple types
of operational scenarios where unknown third-parties to the legitimate
callers may hijack all or part of the call's media traffic for transmission
of covert communications.  Making use of this blatantly insecure property
of RTP is the primary motivation for this research effort.

1.4) Steganography

The term steganography originates from the Greek root words ``steganos'' and
``graphein'' which literally mean ``covered writing''.  As a sub-discipline of
the academic discipline of information hiding, the primary goal of
steganography is to hide the fact that communication is taking place[7, 8, 9] by
concealing a message within a cover-medium in such a way that an observer
cannot discern the presence of the hidden message.

Conversely, steganalysis is the act of attempting to detect a concealed
message which was hidden via the use of steganographic techniques[8], thus
preventing a steganographer from achieving their primary goal.  Common
steganalysis techniques include statistical analysis of the properties of
potential stego-medium, statistical analysis of extracted potential message
data for properties of language, and many others such as specific techniques
that target known steganographic embedding methods.

1.4.1) Terminology

The following terminology as used in the discipline of steganography and
steganalysis has been set forth over many years of compounding research[7, 8,
9].  As such, the following terminology will be used consistently within this
research paper:

  1. Cover-medium - Data within which a message is to be hidden.
  2. Stego-medium - Data within which a message has been hidden.
  3. Message - Data that is or will be hidden within a stego-medium or
     cover-medium, respectively.
  4. Redundant Bits - Bits of data in a cover-medium that can be modified
     without compromising that medium's integrity.

1.4.2) Digitally Embedding

Digitally embedding a message into a cover-medium usually involves three basic
steps.  First, the redundant bits of the target cover-medium must be
identified.  Second, it must be decided which of the identified redundant bits
are to be utilized.  Finally, the bits selected for use must be modified to
store the message data.  In many cases, a cover-medium's redundant bits are
likely to be the least-significant bit or bits of each of the encoded data's
word values.

1.5) Steganography With Audio

Media formats in general, and audio formats specifically, tend to be very
inaccurate data formats simply because they do not need to be accurate; the
human ear is not very adept at differentiating sounds.  As an example, an
orchestra performance which is recorded with two separate recording devices
will produce vastly different recordings when viewed digitally, but will
generally sound the same when played back if they were recorded in a similar
manner.  Due to this inherent inaccuracy, changes to an audio bit-stream can
be made so slightly that when played back the human ear won't be able to
distinguish the difference between the cover-medium audio and the stego-medium
audio.

With many audio formats, the least-significant bit from each audio sample can
be used as the medium's redundant bits for the embedding of message data.  To
illustrate, assume that an audio file encoded with an 8-bit sample encoding
has the following 8 bytes of data in it, which will be used as cover-data:

0xb4 0xe5 0x8b 0xac 0xd1 0x97 0x15 0x68

In binary this would result in the following bit-stream:

10110100 11100101 10001011 10101100 11010001 10010111 00010101 01101000

In order to hide the message byte value 0xd6, or 11010110 in binary, each
sample word's least-significant bit would be modified to represent all 8 bits
of the message byte:

10110101 11100101 10001010 10101101 11010000 10010111 00010101 01101000

The modifications result in the following 8 bytes of stego-data:

0xb5 0xe5 0x8a 0xad 0xd0 0x97 0x15 0x68

When compared to the original 8 bytes of cover-data, only half of the bytes
have actually changed value, which is also the expected average for arbitrary
message data; nevertheless, the resulting stego-data's least-significant bits
contain the entire message byte.  It is also noticeable that when utilizing
this embedding method with a cover-medium of 8-bit words, the cover-medium
must be at least eight times the size of the message in order to successfully
embed the entire message.
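
The embedding walked through above can be sketched in a few lines of Python.
This is a minimal illustration only, assuming that message bits are consumed
most-significant bit first and that exactly one bit is stored per cover byte;
the function names embed_lsb and extract_lsb are chosen here for illustration
and are not taken from any particular tool.

```python
def embed_lsb(cover, message):
    """Store each message bit (MSB first) in the LSB of successive cover bytes."""
    if len(cover) < 8 * len(message):
        raise ValueError("cover must be at least eight times the message size")
    stego = bytearray(cover)
    for i, byte in enumerate(message):
        for bit_index in range(8):
            bit = (byte >> (7 - bit_index)) & 1
            pos = 8 * i + bit_index
            # Clear the least-significant bit, then set it to the message bit.
            stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego, message_len):
    """Rebuild message_len bytes from the LSBs of the stego bytes."""
    out = bytearray()
    for i in range(message_len):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[8 * i + bit_index] & 1)
        out.append(value)
    return bytes(out)


# The cover-data and message byte used in the worked example above.
cover = bytes([0xB4, 0xE5, 0x8B, 0xAC, 0xD1, 0x97, 0x15, 0x68])
stego = embed_lsb(cover, b"\xd6")
changed = sum(1 for a, b in zip(cover, stego) if a != b)
```

Run against the cover-data above, the sketch produces the same eight bytes of
stego-data listed earlier, changes four of the eight cover bytes, and
extract_lsb(stego, 1) recovers the message byte 0xd6.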

1.5.1) Previous Research

Audio Steganography

Much research has been done in the field of steganography utilizing an audio
cover-medium.  Techniques such as using audio to convey messages in both the
human audible and inaudible spectrum as well as various methods for the
digital embedding of information into the audio data itself have all been
explored; so much in fact that many methods are now considered standard.  Many
of the most recent implementations cannot be considered to advance the state
of research in the area as they generally only implement the standard methods.

It is important to note that the significant majority of previous research in
the sub-discipline of audio steganography, however, has focused on static,
unchanging audio data files.  Tools such as S-Tools[10], MP3Stego[11], Hide 4
PGP[12], and many others, are just such implementations, employing standard
embedding methods with WAV, MP3, and VOC audio file cover-mediums,
respectively.  Very few practical implementations have been developed that
utilize audio steganography with a cover-medium that is in a state of flux or
within streaming or real-time media sessions.

VoIP Steganography

A few previous research efforts have been made to employ steganography with
various VoIP technologies.  A complete analysis of such efforts identified
prior to embarking upon the research presented in this paper has previously
been provided[13].  In summary, most identified research efforts were utilizing
steganographic techniques but not achieving the primary goal of steganography
or otherwise employing steganographic techniques to accomplish an otherwise
overt goal.

2) Real-time Steganography

This paper defines ``real-time'' use of steganography as the utilization of
steganographic techniques to embed message data within an active, or
real-time, media stream.  The research and reference implementation presented
herein focuses on VoIP call audio as the active media stream being targeted as
cover-medium.

Nearly all uses of steganography targeting audio cover-medium in general, or
VoIP cover-medium specifically, that were evaluated prior to performing this
research were found to operate on a target cover-medium as a storage channel
and provided separate ``hide'' and ``retrieve'' modes.  In addition, most
cover-medium that were targeted by such implementations were of a static
nature such as WAV or MP3 files or were unidirectional such as streaming
stego-audio to a recipient.

A few weeks prior to the research contained herein being initially presented[14]
at the DEFCON 15[15] hacker conference on August 3rd through 5th 2007, another use
of steganography in a real-time fashion was made public via a research effort
entitled Vo(2)IP[16].  An analysis of this research effort and its deficiencies has
been included in an updated version of the previously mentioned analysis
paper[13].

2.1) Context Terminology

The disciplines of steganography and data networking share some common
terminology which have different meanings relative to each discipline.  This
paper discusses research that lies within the realm of both disciplines, and
as such will use terms that may be confusing when taken out of context.  The
following terms are defined here and used consistently throughout to prevent
confusion when interpreting the content of this paper.

  1. Packet - Used in the data networking sense; A data packet which is routed
     through a network, such as an IP/UDP/RTP packet.  
  2. Message - Used in the steganography sense; Data to be hidden or retrieved.

2.2) RTP Payload Redundant Bits

RTP packet payloads are essentially encoded multimedia data.  RTP payloads may
contain any type of multimedia data; however, this research effort focused
entirely on audio, specifically audio encoded with the G.711 Codec[17].  Any
number of audio Codecs can be used to encode the RTP payload, the identifier
of which is included in the RTP packet's header as the payload type (PT)
field.
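
To illustrate where the payload type is found, the following sketch unpacks
the 12-byte fixed RTP header as laid out in the RTP specification[4].  The
function name and the keys of the returned dictionary are hypothetical, and
the sketch only reports the CSRC count, extension, and padding flags rather
than processing them.  The static payload type values 0 and 8 identify the
ulaw and alaw variants of G.711, respectively.

```python
import struct

# Static payload types for G.711 in the RTP audio/video profile.
G711_PAYLOAD_TYPES = {0: "PCMU (ulaw)", 8: "PCMA (alaw)"}


def parse_rtp_header(packet):
    """Unpack the fixed RTP header; CSRC list and extensions are not parsed."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# A hand-built example packet: version 2, PT 0, sequence 1, timestamp 160.
example = bytes([0x80, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0xA0,
                 0xDE, 0xAD, 0xBE, 0xEF]) + b"\xff" * 160
header = parse_rtp_header(example)
is_g711 = header["payload_type"] in G711_PAYLOAD_TYPES
```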

The frequency, locations, and number of redundant bits found within the RTP
packet's encoded payload are determined by the Codec that is used to encode
the audio transmitted by an individual packet.  The Codec focused on during
this research, G.711, uses a 1-byte sample encoding and is generally resilient
to modifications to the least significant bit (LSB)[18] of each sample.  Codecs
with larger samples may provide for one or more bits per sample to be modified
without any discernible audible change in the encoded audio, which is defined
as the audio's audible integrity.

2.2.1) Audio Word Size

The data value word size, or sample size in audio terminology, used by various
audio encoding formats is one factor in determining the amount of available
space within the cover-medium for embedding a message.  Generally only the
least significant bit of each word value can be expected to be modifiable
without any perceptible impact to audible integrity.  Thus, for cover-media of
equal size in bytes, a format with a 16-bit word size provides only half the
embedding space of a format with an 8-bit word size: a 160-byte payload holds
160 8-bit words, and therefore 20 bytes of message data, but only 80 16-bit
words, and therefore 10 bytes.

2.2.2) Common VoIP Audio Codecs

For reference, some common VoIP audio Codecs and their encoding and sample
properties[19] are listed in the table below.

    +------------+----------+-------------+-------------+------------+
    | Codec      | Standard | Bit Rate    | Sample Rate | FrameSize  |
    |            | by       | (kb/s)      | (kHz)       | (ms)       |
    +------------+----------+-------------+-------------+------------+
    | G.711      | ITU-T    | 64          | 8           | Sampling   |
    | G.721      | ITU-T    | 32          | 8           | Sampling   |
    | G.722      | ITU-T    | 64          | 16          | Sampling   |
    | G.722.1    | ITU-T    | 24/32       | 16          | 20         |
    | G.723      | ITU-T    | 24/40       | 8           | Sampling   |
    | G.723.1    | ITU-T    | 5.6/6.3     | 8           | 30         |
    | G.726      | ITU-T    | 16/24/32/40 | 8           | Sampling   |
    | G.727      | ITU-T    | variable    |             | Sampling   |
    | G.728      | ITU-T    | 16          | 8           | 2.5        |
    | G.729      | ITU-T    | 8           | 8           | 10         |
    | GSM 06.10  | ETSI     | 13          | 8           | 22.5       |
    | LPC10      | U.S. Gov | 2.4         | 8           | 22.5       |
    | Speex (NB) |          | 2.15 - 24.6 | 8, 16, 32   | 30         |
    | Speex (WB) |          | 4 - 44.2    | 8, 16, 32   | 34         |
    | iLBC       |          | 13.3        | 8           | 30         |
    | DoD CELP   | U.S. DoD | 4.8         |             | 30         |
    | EVRC       | 3GPP2    | 9.6/4.8/1.2 | 8           | 20         |
    | DVI        | IMA      | 32          | Variable    | Sampling   |
    | L16        |          | 128         | Variable    | Sampling   |
    +------------+----------+-------------+-------------+------------+
                   Common VoIP Audio Codecs

2.2.3) G.711 (alaw/ulaw)

The G.711 audio Codec is a fairly straight-forward sample-based encoding.  It
encodes audio as a linear grouping of 8-bit audio samples arranged in the
order in which they were sampled.

Throughput

Utilizing the LSB of every sample in a G.711 encoded RTP payload, which is
commonly of 160 bytes in size, a total of 20 bytes of message data can be
successfully embedded.  Given an average of 50 packets per second in each
direction, this results in approximately 1,000 bytes per second of message
data throughput in each direction of the full-duplex covert channel.
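
The arithmetic behind these figures can be reproduced as follows.  The sketch
assumes the common G.711 packetization of 20 ms of audio per packet at an
8 kHz sample rate, which is where the 160-byte payload and 50 packets per
second originate; the function name and parameters are illustrative only.

```python
def g711_covert_throughput(sample_rate_hz=8000, ptime_ms=20, bits_per_sample=1):
    """Derive per-packet and per-second covert capacity for one direction."""
    samples_per_packet = sample_rate_hz * ptime_ms // 1000
    packets_per_second = 1000 // ptime_ms
    message_bytes_per_packet = samples_per_packet * bits_per_sample // 8
    return {
        "samples_per_packet": samples_per_packet,
        "packets_per_second": packets_per_second,
        "message_bytes_per_packet": message_bytes_per_packet,
        "message_bytes_per_second": message_bytes_per_packet * packets_per_second,
    }


figures = g711_covert_throughput()
```

With the default values the result is 160 samples and 20 message bytes per
packet, 50 packets per second, and 1,000 message bytes per second in each
direction, before any overhead that a message header would introduce.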

2.3) Identified Problems and Challenges

Many problems and challenges that arise when considering the use of
steganography with RTP stem from properties of the underlying transport
mechanism, the nature of real-time audio, or the RTP protocol itself.  The
following sections outline various problems and challenges that were
identified when attempting to use steganography with RTP.

2.3.1) Unreliable Transport

One of the most significant challenges to utilizing RTP packet payloads as
cover-medium is that RTP generally employs UDP as its underlying transport
protocol.  This is appropriate for a streaming multimedia protocol; however,
it is less than ideal for a reliable covert communications channel.  UDP is a
datagram messaging protocol which is considered connectionless and unreliable[5].
As such, each packet's successful delivery and order of arrival is not
guaranteed.  Any message data which is split across multiple RTP cover-packets
may arrive out of order or not arrive at all.

2.3.2) Cover-Medium Size Limitations

The RTP protocol, being designed for ``real-time'' transport of media, behaves
like a streaming protocol should.  RTP datagram packets are relatively small
and there are usually tens to hundreds of packets sent per second in the
process of relaying audio between two peers.  Additionally, different audio
Codecs provide for different encoded audio sample sizes, resulting in a
variable amount of available space for embedding which is dependent upon which
Codec the audio for any individual RTP packet is encoded with.  Due to the
small size of these packets and the common constraint among many
steganographic embedding methods which limits the amount of data that is able
to be embedded to a fraction of the size of the cover-medium, a very limited
amount of space is actually available for the embedding of message data.  As
such, large message data will inevitably be required to be split across
multiple cover-packets and thus must be reassembled at its destination.
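
As a rough sketch of what this splitting implies, the following code cuts a
message into sequence-numbered fragments sized to fit a single cover-packet
and joins them again at the destination.  The 4-byte per-fragment header
overhead is a placeholder assumption, not the message format used by any
particular implementation, and the function names are hypothetical.

```python
def fragment(message, packet_capacity, header_len=4):
    """Split message into (sequence, chunk) pairs that fit one cover-packet."""
    chunk_size = packet_capacity - header_len
    if chunk_size <= 0:
        raise ValueError("cover-packet is too small to carry any message data")
    offsets = range(0, len(message), chunk_size)
    return [(seq, message[off:off + chunk_size]) for seq, off in enumerate(offsets)]


def reassemble(fragments):
    """Join fragments in sequence order, regardless of arrival order."""
    return b"".join(data for _, data in sorted(fragments))


message = bytes(range(50))
fragments = fragment(message, packet_capacity=20)
sizes = [len(data) for _, data in fragments]
```

With 20 bytes of capacity per G.711 packet and the assumed 4-byte header, a
50-byte message requires four cover-packets carrying 16, 16, 16, and 2 bytes.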

2.3.3) Latency

RTP is, by design, extremely susceptible to media degradation due to packet
latency.  As such, any processing overhead from the embedding of message data
into the cover-medium or delay due to inspection of potential cover-medium
packets may have a noticeable impact on the end-user's quality of experience.
When manipulating an RTP stream between two endpoints that are expecting
packet delivery in a timely manner, a steganographic system cannot be overly
invasive when packets are not needed for embedding and must be efficient at
its task when they are.

2.3.4) Tracking of RTP Streams

In normal operation, RTP establishes two packet streams to form a session
between two endpoints.  Each endpoint uses one stream to send multimedia data
to the other, thus achieving full-duplex communication via two unidirectional
packet streams.  When identifying an RTP session to be utilized as
cover-medium for a full-duplex covert communications channel, the two paired
streams must be correctly identified and tracked.
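
One simple way to pair the two streams is to treat each as an address and
port 4-tuple and to recognize the reverse stream as the same tuple with source
and destination swapped.  The sketch below assumes that each endpoint sends
from the same port on which it receives, which is common but not guaranteed;
the class and field names are hypothetical.

```python
from collections import namedtuple

Flow = namedtuple("Flow", ["src_ip", "src_port", "dst_ip", "dst_port"])


def reverse_flow(flow):
    """Return the flow travelling in the opposite direction."""
    return Flow(flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)


class RtpSession:
    """Tracks the two unidirectional streams that make up one RTP session."""

    def __init__(self, local_ip, local_port, remote_ip, remote_port):
        self.outbound = Flow(local_ip, local_port, remote_ip, remote_port)
        self.inbound = reverse_flow(self.outbound)

    def direction(self, flow):
        """Classify a packet's flow, or return None if it is not in the session."""
        if flow == self.outbound:
            return "outbound"
        if flow == self.inbound:
            return "inbound"
        return None


session = RtpSession("192.0.2.10", 16384, "192.0.2.20", 20000)
```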

2.3.5) Raw vs. Compressed Audio

It is important to consider that audio being transported via RTP may be
compressed.  To successfully embed message data into a cover-medium, the
embedding must generally be performed against the raw data so that the
cover-medium's redundant bits can be properly identified and utilized.  As
such, identification of compressed cover-medium, decompression, modification
of the raw data, and then re-compression may be required.

Lossy vs. Lossless Compression

When considering the potential use of compression within the cover-medium, it
is also important to consider the type of compression used.  Most compression
methods can be categorized into two types: lossy compression and lossless
compression.

If the compression method used is of the lossy type, the integrity of any
message data embedded into the cover-medium prior to compression may be
compromised when the stego-medium is uncompressed as some of the original
audio data may be lost.  Due to this property of lossy compression types,
audio data compressed in this manner may not be appropriate for use as
cover-medium without additional safeguards against this loss.
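
The effect can be demonstrated with a deliberately crude stand-in for a lossy
codec.  In the sketch below, "compression" simply discards the low byte of
each 16-bit sample; this is not how any real audio Codec works and is assumed
purely to show that information stored in the least-significant bits does not
survive a step that throws low-order detail away.

```python
def embed_bits(samples, bits):
    """Store one bit in the LSB of each of the first len(bits) samples."""
    embedded = [(s & ~1) | b for s, b in zip(samples, bits)]
    return embedded + samples[len(bits):]


def lossy_round_trip(samples):
    """Stand-in for lossy compression: keep only the high byte of each sample."""
    return [(s >> 8) << 8 for s in samples]


samples = [0x1234, 0x5678, 0x0ABC, 0x7FFF, 0x0001, 0x4321, 0x2222, 0x3333]
message_bits = [1, 1, 0, 1, 0, 1, 1, 0]

stego = embed_bits(samples, message_bits)
bits_before = [s & 1 for s in stego]
bits_after = [s & 1 for s in lossy_round_trip(stego)]
```

Before the lossy step the embedded bits are recovered intact; afterwards
every extracted bit is zero and the message is lost.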

2.3.6) Media Gateway Audio Modifications

RTP, as a protocol being potentially routed across multiple networks by its
underlying transport, network, and data-link protocols, may also be routed or
gatewayed along its path by other intermediary telephony devices like Media
Gateways or Back-to-Back User Agent (B2BUA) devices.  At such transition
points, the media being transported may undergo potential modification.  Some
of these modifications include translation from one audio Codec to another,
down-sampling, normalization, or mixing with other audio streams.  Invasive
changes such as these can potentially impact the integrity of any message data
embedded within the stego-medium.

Audio Codec Conversion

Codec conversion takes place when an intermediary device such as a Media
Gateway is providing translation services for two endpoints that support
disparate sets of Codecs.  For example, one endpoint may support GSM encoding
of audio and the other only G.711 or Speex encoding.  Unless an intermediary
translator is involved, these two devices cannot directly establish an RTP
audio channel.  The intermediary device essentially translates audio from the
Codec being used by one endpoint to a Codec that can be understood by the
other. Audio Codec conversion may also take place if the inherent latency or
Quality-of-Service (QoS) properties of the transport network on either side of
the intermediary device requires a lighter-weight Codec.

Down-sampling and Normalization

Down-sampling and normalization may be performed on an audio payload to bring
the properties of the audio such as volume and background white-noise more in
line with the other party's audio stream.  Occasionally this task is handled
by the endpoint devices when playing the media for the user.  In that scenario
the integrity of the stego-medium will likely remain intact as the audio
payload isn't actually modified in transit.  However, there are scenarios
where an intermediary media device may actually re-sample or otherwise modify
the payload of the media stream specifically to alter its audible properties.
In these cases, the integrity of the stego-medium may become compromised.

Audio Stream Mixing

When performing conferencing or other types of multi-party calls, it is
possible that multiple parties' audio streams may be mixed together.  Such
invasive modification of the audio will almost certainly compromise the
integrity of the stego-medium.

2.3.7) Mid-session Audio Codec Change

Most VoIP signaling protocols provide methods for VoIP endpoints to change the
audio encoding method on the fly.  Due to this functionality an RTP session
may begin using one Codec and then switch to a completely different Codec
mid-session.  This functionality may be used for a variety of reasons
including QoS metrics not being met, inclusion of a new endpoint in the call
that does not support the original Codec, or any number of other reasons.  Due
to this dynamic nature, any steganographic system attempting to embed data
into an RTP stream's packets must be able to dynamically adjust its message
embedding algorithm to accommodate different Codecs' various sample sizes and
layout within the RTP packet payload.
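
One way to accommodate such changes is to select the embedding parameters per
packet from its payload type rather than once per session.  The sketch below
is an assumption about how that dispatch might look; the table only knows the
two G.711 payload types, and any packet with an unrecognized payload type is
treated as offering no embedding capacity so that it passes through
unmodified.

```python
# Hypothetical per-Codec embedding parameters, keyed by RTP payload type.
CODEC_PARAMS = {
    0: {"name": "G.711 ulaw", "word_size_bits": 8, "bits_per_word": 1},
    8: {"name": "G.711 alaw", "word_size_bits": 8, "bits_per_word": 1},
}


def capacity_for_packet(payload_type, payload_len):
    """Return message bytes that fit in this packet, or 0 if unsupported."""
    params = CODEC_PARAMS.get(payload_type)
    if params is None:
        return 0
    words = payload_len * 8 // params["word_size_bits"]
    return words * params["bits_per_word"] // 8


# A session that starts with G.711 and then switches to payload type 18.
packets = [(0, 160), (0, 160), (18, 20), (18, 20)]
capacities = [capacity_for_packet(pt, size) for pt, size in packets]
```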

3) Reference Implementation: SteganRTP

3.1) Design Goals

The goals set forth for the SteganRTP reference implementation[20] are described
in the following subsections.

3.1.1) Achieve Steganography

As stated in Section 1.4, the primary goal of steganography is to hide the fact
that communication is taking place.  Therefore, it is the primary goal of this
reference implementation to prevent indication to a third-party observer of
the RTP audio stream that anything other than the overt communication between
the two RTP endpoints is taking place.

3.1.2) Full-Duplex Communications Channel

This reference implementation intends to achieve a full-duplex covert
communication channel between the two RTP endpoints, mirroring the utility of
RTP itself.  This will be accomplished through the use of both RTP streams
that comprise an RTP session.  By utilizing both RTP streams within the
session, either application will be able to both send and receive data
simultaneously.

3.1.3) Compensate for Unreliable Transport

This reference implementation intends to compensate for the unreliable
transport inherent to RTP.  This will be accomplished by providing a data
sequencing, tracking, and resending mechanism.
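
A minimal sketch of such a mechanism on the receiving side is shown below.
It assumes each message carries a monotonically increasing sequence number
starting at zero, ignores sequence number wrap-around, and uses hypothetical
class and method names; it illustrates the general idea rather than the
mechanism used by the reference implementation.

```python
class SequenceTracker:
    """Delivers messages in order and reports gaps that need resending."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}

    def receive(self, seq, data):
        """Accept one message; return the messages now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate of something already seen
        self.pending[seq] = data
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers skipped so far, i.e. candidates for a resend request."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]


tracker = SequenceTracker()
first = tracker.receive(0, b"a")
early = tracker.receive(2, b"c")      # arrives before message 1
gaps = tracker.missing()
late = tracker.receive(1, b"b")       # fills the gap
```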

3.1.4) Identical User Experience Regardless of Mode of Operation

This reference implementation intends to provide two distinct modes of
operation.  The first mode of operation is described as the SteganRTP
application running locally on the same host as the RTP endpoint.  The second
mode of operation is described as the SteganRTP application running on an
intermediary host along the route from one RTP endpoint to another.  This
intermediary host must be forwarding or bridging the RTP traffic as an active
man-in-the-middle (MITM).  The reference implementation intends for the user
experience of running the SteganRTP application to be identical regardless of
the mode of operation.  This will be accomplished by interfacing directly with
the host operating system's network stack in order to hook the desired packet
streams.

3.1.5) Multi-type Data Transfer

The reference implementation intends to provide simultaneous transfer of
multiple types of data, such as text chat, file transfer, and remote shell
access.  This will be accomplished by providing type indication and formatting
for each type of supported data being transferred.

3.2) Operational Architecture

As mentioned in Section 3.1.4 above, the application will operate in one of two
distinct modes: the application running locally on the same host as the RTP
endpoint or the application running as an active MITM.  The two communicating
SteganRTP applications are not required to operate in the same mode.  Thus, a
mixed-mode operation, in which one application runs locally on an RTP endpoint
while the other runs as an active MITM, is entirely possible.

It is important to note that the SteganRTP application is only required to be
bridging or forwarding the RTP stream considered outbound from the closer RTP
endpoint destined for the more remote RTP endpoint.  Conversely, the
application is only required to be able to observe the inbound RTP stream
flowing in the other direction as it does not need to invasively modify any
packets from the inbound stream.

3.3) Application Flow

When the SteganRTP application begins it performs an initialization phase by
setting up internal memory structures and configuration information from the
command-line.  Next, it observes network traffic until it identifies an RTP
session which falls within the constraints specified by the user on the
command-line.  These constraints are how the user controls selection of the
RTP sessions between specific RTP endpoints to utilize as cover-medium and, by
extension, which remote SteganRTP application to communicate with.  After
identifying an RTP session, SteganRTP inserts hooks into the host's network
stack in order to receive the desired packets upon transmission or arrival, or
both if the SteganRTP application is operating in the active MITM scenario.
From these hooks a packet queue is created which the application then reads
individual packets from.  Whether the packet is considered inbound or outbound
determines the further course of the application.  Whether a packet is
considered inbound or outbound is determined by which RTP endpoint network
address and port is defined as ``local'' or ``remote'', which in the case of
the active MITM operation can be inferred as ``near'' or ``far'',
respectively.

When an inbound RTP packet is read from the queue, it is copied for the
application's use and the original packet is immediately sent as the SteganRTP
application does not need to invasively modify it.  All received inbound
packets are assumed to be potential cover-medium for the covert channel, so
potential message data is then extracted from each inbound packet.  The
potential message data is then decrypted, and the result is checked for a
valid checksum value in the potential message's header.  If the checksum is
valid, the message data is sent to the message handler component for
processing.

When an outbound RTP packet is read from the queue, the SteganRTP application
immediately polls its outbound data queues for any message data waiting to be
sent.  If there is no data waiting to be sent, the packet is immediately sent
unmodified.  If there is message data waiting to be sent, as much of that data
as will fit into the cover-medium packet's payload is read from its file
descriptor, packaged as a formatted message, encrypted, and then
steganographically embedded into the RTP packet's payload.  The modified RTP
packet is then sent in place of the original RTP packet.
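
The inbound and outbound paths described above can be sketched end to end as
follows.  Several details are assumptions made only so that the sketch is
self-contained: the message header is a hypothetical 4-byte CRC32 plus a
1-byte length, the "encryption" is a simple XOR against the SHA-1 keyhash,
embedding uses one LSB per payload byte, and the outbound data queue is a
deque of byte values.  None of these should be read as the reference
implementation's actual formats.

```python
import hashlib
import zlib
from collections import deque

KEYHASH = hashlib.sha1(b"example shared secret").digest()
HEADER_LEN = 5  # hypothetical header: 4-byte CRC32 + 1-byte body length


def xor_keyhash(data):
    """Obfuscate or de-obfuscate data by XOR with the repeating keyhash."""
    return bytes(b ^ KEYHASH[i % len(KEYHASH)] for i, b in enumerate(data))


def embed(payload, message):
    """Write message bits, MSB first, into the LSBs of the payload bytes."""
    out = bytearray(payload)
    for i in range(len(message) * 8):
        bit = (message[i // 8] >> (7 - i % 8)) & 1
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract(payload, n_bytes):
    """Read n_bytes of potential message data from the payload LSBs."""
    out = bytearray(n_bytes)
    for i in range(n_bytes * 8):
        out[i // 8] = ((out[i // 8] << 1) | (payload[i] & 1)) & 0xFF
    return bytes(out)


def process_outbound(payload, outbound_queue):
    """Return the payload to send: unmodified if nothing is waiting."""
    capacity = len(payload) // 8
    if not outbound_queue or capacity <= HEADER_LEN:
        return payload
    count = min(capacity - HEADER_LEN, len(outbound_queue))
    body = bytes(outbound_queue.popleft() for _ in range(count))
    header = zlib.crc32(body).to_bytes(4, "big") + bytes([len(body)])
    padded = body.ljust(capacity - HEADER_LEN, b"\x00")
    return embed(payload, xor_keyhash(header + padded))


def process_inbound(payload):
    """Return the message body if the checksum validates, otherwise None."""
    capacity = len(payload) // 8
    if capacity <= HEADER_LEN:
        return None
    plain = xor_keyhash(extract(payload, capacity))
    checksum, length = int.from_bytes(plain[:4], "big"), plain[4]
    body = plain[HEADER_LEN:HEADER_LEN + length]
    if length > capacity - HEADER_LEN or zlib.crc32(body) != checksum:
        return None
    return body


queue = deque(b"hello")
cover = bytes(range(160))
stego = process_outbound(cover, queue)
received = process_inbound(stego)
```

Because every inbound packet is treated as potential cover-medium, the
checksum is what separates packets that actually carry a message from
ordinary audio whose least-significant bits merely decode to noise.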

3.3.1) Initialization

Upon start-up, SteganRTP first initializes various memory structures such as
message caches, configuration settings, and an RTP session context structure.

The most notable task performed during the initialization phase is the
computation of keying information used by various components.  The method
chosen for creation of this keying information is to create a 20-byte SHA-1[21]
hash of a user-supplied shared secret text string.  Due to the result of this
operation being used as keying information by various components of the
overall SteganRTP system, this shared secret must be provided to both
SteganRTP applications that wish to communicate with each other.

The 20-byte result of the SHA-1 hash function against the user-supplied shared
secret is defined here as the keyhash and described by the equation below,
where f represents the SHA-1 hash function.

keyhash = f( sharedsecret )
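
In Python the same computation can be expressed with the standard hashlib
module.  The function name is illustrative, and encoding the shared secret as
UTF-8 before hashing is an assumption; both sides must of course encode the
secret identically to arrive at the same keyhash.

```python
import hashlib


def compute_keyhash(shared_secret):
    """Return the 20-byte SHA-1 digest of the shared secret text string."""
    return hashlib.sha1(shared_secret.encode("utf-8")).digest()


keyhash = compute_keyhash("abc")
```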

SHA-1 Collision Irrelevance

In February of 2005, a group of Chinese researchers developed an algorithm for
finding SHA-1 hash collisions faster than brute force.  They proved it
possible to find collisions in the full 80-step SHA-1 in less than 2^69 hash
operations, about 2,000 times faster than brute force of the 2^80 hash
operation theoretical bound.  The paper also includes search attacks for
finding collisions in the 58-step SHA-1 in 2^33 hash operations and SHA-0 in
2^39 hash operations.  The biggest impact that this discovery has pertains to
use of SHA-1 hashes in digital signatures and technologies where one of the
pre-images is known.  By searching for a second pre-image which hashes to the
same value as the original, a digital signature for the original may
theoretically be used to authenticate a forgery.

The use of SHA-1 by the SteganRTP reference implementation is solely to
compute a bit-pad of keying information with a longer, seemingly more random
bit distribution than what is likely provided directly by user input as the
shared secret.  The result of the SHA-1 hash of the user's shared secret is
used directly as keying information.  In order to launch a collision attack
against the hash used as the bit-pad, the attacker would have to either obtain
the original user-supplied shared secret or the hash itself.  Due to the hash
being used directly as keying information, the possession of it by an attacker
has already compromised the security of the data being obfuscated with it;
computing one or more additional pre-images which hash to a collision provides
no additional value for the attacker.

3.3.2) RTP Session Identification

RTP session identification is performed using libfindrtp.  libfindrtp is a C
library that identifies sessions between two endpoints by observing VoIP
signaling traffic and watching for call set-up.  Constraints can be passed to
the library to limit session identification to a single endpoint, specific
multiple endpoints, or even specific multiple endpoints using specific UDP
ports.  These constraints are passed through to libfindrtp from the input
provided to the SteganRTP application via the command-line.  At the time of
this writing, libfindrtp supports session identification via the Session
Initiation Protocol (SIP)[1] and Cisco Skinny Call Control Protocol (SCCP)[3]
VoIP signaling protocols.

3.3.3) Hooking Packets

The SteganRTP application makes use of NetFilter[23] hook points in order to
receive both inbound and outbound RTP session packets.  The Linux kernel is
instructed to pass specific packets to an application by inserting an iptables
rule describing the packets with a target of QUEUE.  Packets which match a
rule with a target of QUEUE are queued to be read by a registered NetFilter
user-space queuing agent.  Access to this queue is provided to the SteganRTP
application via an API provided by the NetFilter C library libipq.  An
iptables rule used to hook packets via this interface may be inserted at any
of the NetFilter hook points.

For the most beneficial use by the SteganRTP application, packets must be
hooked at points where their integrity as stego-medium is maintained.  Thus,
inbound packets are hooked at the PRE-ROUTING hook point and outbound packets
are hooked at the POST-ROUTING hook point.  In this manner, incoming packets
are able to be processed by the SteganRTP application prior to any potential
modification by the local system and outbound packets are able to be modified
by SteganRTP after the local system is essentially finished with them.

SteganRTP registers itself as a user-space queuing agent for NetFilter via
libipq.  SteganRTP then creates two iptables rules in the NetFilter engine
with targets of QUEUE.  The first rule matches the inbound RTP stream at the
PRE-ROUTING hook point.  The second rule matches the outbound RTP stream at
the POST-ROUTING hook point.

3.3.4) Reading Packets

Using the packet hooks described in the previous section, SteganRTP is then
able to read packets from the provided packet queue, determine if they are
considered inbound or outbound packets, and pass them to the appropriate
processing functions.  The processing functions may then analyze them, modify
them if needed, place modified versions back into the queue in place of the
original, and instruct the queue to accept the packet for further routing.

3.3.5) Inbound Processing

As outlined above, the basic steps for inbound packet processing are as
follows:

  1. Immediately accept the packet for routing.
  2. Extract potential message data.
  3. Decrypt potential message data.
  4. Verify the potential message header's checksum.
  5. Send valid messages to the message handler.

3.3.6) Outbound Processing

As outlined above, the basic steps for outbound packet processing are as
follows:

  1. Poll for message data waiting to be sent.
  2. If there is no message data waiting, immediately send the packet and return.
  3. Create a new formatted message with header based on the properties of the
     RTP packet whose payload is being used as cover-medium.
  4. Read as much of the waiting data as will fit in the formatted message.
  5. Encrypt the message.
  6. Embed the message into the RTP payload cover-medium.
  7. Send the modified RTP packet in place of the original via the NetFilter
     user-space queue.

3.3.7) Session Timeout

In the event that no RTP packets are available in the NetFilter queue for a
period of time, all session information is dropped and process flow returns to
the RTP session identification phase to locate a new session for use.

In the event that RTP packets are being received but no valid messages have
been received for a period of time, the SteganRTP application attempts to
solicit a response from the remote application.  If these solicitations have
failed by the timeout period, all session information is dropped and process
flow returns to the RTP session identification phase to locate a new session
for use.

3.4) Communication Protocol Specification

The SteganRTP communication protocol makes use of formatted messages which are
steganographically embedded into the payloads of individual RTP packets.  This
steganographic embedding creates the covert channel within which the
communication protocol described in the following sections operates.

3.4.1) The cover medium: RTP Packet

The diagram below, reproduced verbatim from the RTP specification[4], describes
the RTP packet header.  Of special interest are the payload type (PT), sequence
number, and timestamp fields, all of which will become relevant when building,
encrypting, and steganographically embedding the message data into the
packet's payload.  The remainder of the packet contains an optional number of
header extensions which are irrelevant to the SteganRTP communication
protocol, and finally the encoded media data, otherwise known as the RTP
packet's payload, which will be utilized by SteganRTP as cover-medium.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The 7-bit payload type field indicates the audio Codec used to encode the
payload.  The 16-bit sequence number field is a standard incrementing sequence
number.  The 32-bit timestamp field describes the sampling instant of the
first sample in the payload, and the remaining packet data is the audio
payload as encoded by the indicated Codec.
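
As an illustration of how the fields described above might be pulled out of a
raw packet, the following C sketch parses the RTP header.  It is not taken
from the SteganRTP source; the rtp_fields structure and the rtp_parse()
function are hypothetical names, and the sketch assumes an RTP version 2
packet held in a byte buffer in network byte order.  Padding (the P bit) is
ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the header fields relevant to SteganRTP. */
typedef struct {
    uint8_t  payload_type;    /* 7-bit PT field */
    uint16_t sequence;        /* 16-bit sequence number */
    uint32_t timestamp;       /* 32-bit timestamp */
    size_t   payload_offset;  /* byte offset at which the payload begins */
} rtp_fields;

/* Parse an RTP version 2 header.  Returns 0 on success, -1 on failure. */
int rtp_parse(const uint8_t *pkt, size_t len, rtp_fields *out)
{
    assert(pkt != NULL && out != NULL);
    if (len < 12 || (pkt[0] >> 6) != 2)
        return -1;                      /* too short or not version 2 */

    out->payload_type = pkt[1] & 0x7F;  /* drop the marker (M) bit */
    out->sequence     = (uint16_t)((pkt[2] << 8) | pkt[3]);
    out->timestamp    = ((uint32_t)pkt[4] << 24) | ((uint32_t)pkt[5] << 16) |
                        ((uint32_t)pkt[6] << 8)  |  (uint32_t)pkt[7];

    /* Skip the fixed 12-byte header plus CC 32-bit CSRC identifiers. */
    size_t off = 12 + 4 * (size_t)(pkt[0] & 0x0F);

    /* If the X bit is set, skip the header extension as well. */
    if (pkt[0] & 0x10) {
        if (len < off + 4)
            return -1;
        size_t ext_words = ((size_t)pkt[off + 2] << 8) | pkt[off + 3];
        off += 4 + 4 * ext_words;
    }
    if (off > len)
        return -1;
    out->payload_offset = off;
    return 0;
}
```

The payload_offset value is what a caller would subtract from the packet
length to obtain the size of the cover-medium.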

3.4.2) Message Format

The format of the messages that the SteganRTP applications use to communicate
with each other is described in the following sections.  The diagram below
describes the core message format of all types of SteganRTP formatted
messages.  This format consists of two fields, the Checksum / ID and Sequence
fields followed by a standard Type-Length-Value (TLV)[24] structure.  The Checksum
/ ID, Sequence, Type, and Length fields comprise the message header, while the
Value field is considered the message body, or payload.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Checksum / ID                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Sequence             |      Type     |    Length     | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Value (Type-Defined Body)                   |
!                                                               !
.                                                               .

The 32-bit Checksum / ID field contains a hash value which is used to identify
whether or not a potential message that is extracted from the payload of an
inbound RTP packet is indeed a valid SteganRTP communication protocol message.
The hashword[25] function is used to compute this hash.  The function's two
primary operands consist of the keying information defined as keyhash in
Section 3.3.1 and the sum of the message's Sequence, Type, and Length header
fields.  This value is defined as checksumid and is described by the equation
below.


checksumid = hashword( keyhash, (Sequence + Type + Length) )


The verification of extracted potential messages is required due to the fact
that some packets in the inbound RTP stream may not contain SteganRTP messages
if there was no outbound data waiting to be sent by the remote application
when the RTP packet in question traversed it.  The hash function used to
compute this checksum value incorporates the keyhash so as not to be
computable solely from message data, which would allow an observer to also
verify that a message is embedded within the RTP payload.

The 16-bit Sequence field is a standard incrementing sequence number, the
8-bit Type field indicates what type of message it is, and the 8-bit Length
field indicates the length, in bytes, of the Value field.  The Value field
contains the message's payload.
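
The header layout and its verification can be sketched in C as follows.  This
is a minimal illustration rather than the reference implementation's code:
the function names are hypothetical, the multi-byte fields are assumed to be
serialized in network byte order, and placeholder_hash() is a simple stand-in
for the hashword function, used only so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEYHASH_LEN    20   /* SHA-1 digest size in bytes */
#define MSG_HEADER_LEN 8    /* Checksum/ID 4 + Sequence 2 + Type 1 + Length 1 */

/* Placeholder keyed hash in the style of FNV-1a.  This is NOT hashword();
 * it only stands in for a 32-bit hash keyed with keyhash. */
static uint32_t placeholder_hash(const uint8_t key[KEYHASH_LEN], uint32_t value)
{
    uint32_t h = 2166136261u ^ value;
    for (size_t i = 0; i < KEYHASH_LEN; i++) {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

/* checksumid = hash( keyhash, (Sequence + Type + Length) ) */
uint32_t msg_checksum(const uint8_t key[KEYHASH_LEN],
                      uint16_t seq, uint8_t type, uint8_t length)
{
    return placeholder_hash(key, (uint32_t)seq + type + length);
}

/* Serialize the 8-byte message header. */
void msg_write_header(uint8_t hdr[MSG_HEADER_LEN],
                      const uint8_t key[KEYHASH_LEN],
                      uint16_t seq, uint8_t type, uint8_t length)
{
    assert(hdr != NULL && key != NULL);
    uint32_t sum = msg_checksum(key, seq, type, length);
    hdr[0] = (uint8_t)(sum >> 24);
    hdr[1] = (uint8_t)(sum >> 16);
    hdr[2] = (uint8_t)(sum >> 8);
    hdr[3] = (uint8_t)sum;
    hdr[4] = (uint8_t)(seq >> 8);
    hdr[5] = (uint8_t)seq;
    hdr[6] = type;
    hdr[7] = length;
}

/* Returns 1 if the extracted bytes begin with a valid header, else 0. */
int msg_header_valid(const uint8_t *buf, size_t len,
                     const uint8_t key[KEYHASH_LEN])
{
    assert(buf != NULL && key != NULL);
    if (len < MSG_HEADER_LEN)
        return 0;
    uint32_t claimed = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                       ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    uint16_t seq = (uint16_t)((buf[4] << 8) | buf[5]);
    return claimed == msg_checksum(key, seq, buf[6], buf[7]);
}
```

A receiver holding a different keyhash computes a different checksum for the
same header fields and therefore rejects the message.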

3.4.3) Message Types

The currently defined message types are listed in the table below.
  
      +----+---------------------+
      | ID | Type                |
      +----+---------------------+
      | 0  | Reserved            |
      | 1  | Control             |
      | 10 | Chat Data           |
      | 11 | File Data           |
      | 12 | Shell Input Data    |
      | 13 | Shell Output Data   |
      +----+---------------------+

Control Messages

The diagram below describes the format of SteganRTP control messages.  Control
messages are used to send non-user data to the remote SteganRTP application to
convey operational information such as requesting a message resend or
indicating that a file is about to be sent and providing that file's context
information.  Control messages consist of one or more stacked TLV structures
and are not required to be 32-bit aligned.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Control Type |    Length     |            Value              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
!                                                               !
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Control Type |    Length     |            Value              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
!                                                               !
.                                                               .

The 8-bit Control Type field indicates the type of control data contained in
the TLV structure whereas the 8-bit Length field indicates the size, in bytes,
of the Value field.  The Value field contains the control data of the
indicated type.
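
Because control TLV structures are stacked back to back, a receiver has to
walk the message body one structure at a time.  The following C sketch shows
one way this could be done; the control_tlv type and control_next() function
are hypothetical names, and body_len is assumed to be the value of the Length
field from the message header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One control TLV structure as found inside a Control message body. */
typedef struct {
    uint8_t        type;    /* Control Type */
    uint8_t        length;  /* size of value, in bytes */
    const uint8_t *value;   /* points into the message body */
} control_tlv;

/* Read the TLV at offset *pos and advance *pos past it.
 * Returns 1 if a TLV was read, 0 at the end of the body, and -1 if the
 * body is truncated in the middle of a TLV. */
int control_next(const uint8_t *body, size_t body_len, size_t *pos,
                 control_tlv *out)
{
    assert(body != NULL && pos != NULL && out != NULL);
    if (*pos >= body_len)
        return 0;
    size_t remaining = body_len - *pos;
    if (remaining < 2)
        return -1;                       /* no room for Type and Length */
    out->type   = body[*pos];
    out->length = body[*pos + 1];
    if (remaining - 2 < out->length)
        return -1;                       /* Value runs past the body */
    out->value = body + *pos + 2;
    *pos += 2 + (size_t)out->length;
    return 1;
}
```

A caller would typically initialize pos to 0 and invoke control_next() in a
loop until it returns 0 or -1, dispatching on the type of each TLV read.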

Control Message Types

The currently defined control message types are listed in the table below.

         +----+---------------+
         | ID | Type          |
         +----+---------------+
         | 0  | Reserved      |
         | 1  | Echo Request  |
         | 2  | Echo Reply    |
         | 3  | Resend        |
         | 4  | Start File    |
         | 5  | End File      |
         +----+---------------+

Type 1: Echo Request

The Echo Request control message is used to prompt the remote SteganRTP
application for a response, allowing the local application making the request
to determine if the remote application is still present and communicating.
This message is sent when a session inactivity timeout limit is approaching.
The format of an Echo Request control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       1       |       2       |     Seq       |    Payload    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Control Type field's value is 1, indicating that it is an Echo Request
control message, and the Length field's value is 2, indicating the 2-byte
control message payload.  The control message payload consists of an 8-bit Seq
field which contains a standard incrementing sequence number specific to Echo
Requests, and an 8-bit Echo Request Payload, which contains a random
bit-string.  The Seq value is used to correlate sent Echo Request messages
with received Echo Reply messages and the Payload field received in an Echo
Reply message must match the random bit-string sent in its corresponding Echo
Request message.

Type 2: Echo Reply

The Echo Reply control message is used to respond to the remote SteganRTP
application's Echo Request message.  The format of the Echo Reply message is
identical to that of the Echo Request message described above; however, the
Control Type field's value is 2 rather than 1.
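
The Echo Request and Echo Reply exchange can be sketched in C as shown below.
The function names are hypothetical and the sketch only covers construction
and matching of the 4-byte TLV structures; how the reference implementation
generates the random payload or schedules requests is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONTROL_ECHO_REQUEST 1
#define CONTROL_ECHO_REPLY   2
#define ECHO_TLV_LEN         4   /* Control Type + Length + Seq + Payload */

/* Build an Echo Request carrying the given sequence number and payload. */
void echo_build_request(uint8_t out[ECHO_TLV_LEN], uint8_t seq, uint8_t payload)
{
    assert(out != NULL);
    out[0] = CONTROL_ECHO_REQUEST;
    out[1] = 2;                  /* 2-byte control message payload */
    out[2] = seq;
    out[3] = payload;
}

/* Build the Echo Reply for a received request by echoing Seq and Payload.
 * Returns 0 on success, -1 if the input is not a well-formed request. */
int echo_build_reply(const uint8_t request[ECHO_TLV_LEN],
                     uint8_t reply[ECHO_TLV_LEN])
{
    assert(request != NULL && reply != NULL);
    if (request[0] != CONTROL_ECHO_REQUEST || request[1] != 2)
        return -1;
    reply[0] = CONTROL_ECHO_REPLY;
    reply[1] = 2;
    reply[2] = request[2];
    reply[3] = request[3];
    return 0;
}

/* Returns 1 if reply answers request: same Seq and same Payload. */
int echo_reply_matches(const uint8_t request[ECHO_TLV_LEN],
                       const uint8_t reply[ECHO_TLV_LEN])
{
    assert(request != NULL && reply != NULL);
    return reply[0] == CONTROL_ECHO_REPLY && reply[1] == 2 &&
           reply[2] == request[2] && reply[3] == request[3];
}
```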

Type 3: Resend

The Resend control message is used to request the resending of a specified
message by the remote SteganRTP application, allowing the local application to
request missing or corrupted messages.  This message is sent when the
application begins to receive messages which contain sequence numbers beyond
the next sequence number that is expected.  The format
of a Resend control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       3       |       2       |     Requested Seq Number      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Control Type field's value is 3, indicating that it is a Resend control
message, and the Length field's value is 2, indicating the 2-byte control
message payload.  The control message payload consists of a 16-bit Requested
Seq Number field which indicates the sequence number of the message to be
resent.

Type 4: Start File

The Start File control message is used to indicate to the remote application
that the local application will begin sending file data for a new file
transfer.  This message is sent when the user executes the command to transfer
a file.  The format of a Start File control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       4       |       #       |    File ID    |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                            Filename                           |
!                                                               !
.                                                               .

The Control Type field's value is 4, indicating that it is a Start File
control message, and the Length field's value is 1 plus the string length, in
bytes, of the filename of the file being sent, indicating the total size of
the control message payload.  The control message payload consists of an 8-bit
File ID field which indicates the sending application's unique ID value for
the file, and the Filename field is the name of the file being sent in ASCII.
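
As a small worked example of the variable Length field, the C sketch below
builds a Start File TLV structure.  The function name is hypothetical.  The
sketch assumes that the filename is not NUL-terminated on the wire, since its
size is already implied by the Length field, and it limits the filename to
254 bytes so that 1 plus the string length fits in the 8-bit Length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONTROL_START_FILE 4

/* Write a Start File TLV into out.  Returns the number of bytes written,
 * or 0 if the filename is too long or the buffer is too small. */
size_t control_build_start_file(uint8_t *out, size_t out_len,
                                uint8_t file_id, const char *filename)
{
    assert(out != NULL && filename != NULL);
    size_t name_len = strlen(filename);
    if (name_len > 254 || out_len < 3 + name_len)
        return 0;
    out[0] = CONTROL_START_FILE;
    out[1] = (uint8_t)(1 + name_len);   /* File ID byte plus filename */
    out[2] = file_id;
    memcpy(out + 3, filename, name_len);
    return 3 + name_len;
}
```

For a file named "a.txt" with File ID 7, the resulting structure is 8 bytes
long and carries a Length value of 6.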

Type 5: End File

The End File control message is used to indicate to the remote application
that the local application is finished sending file data for a particular
file transfer.  This message is sent when the local application has finished
sending all data related to the open file descriptor being used to send data
from a file.  The format of an End File control message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       5       |       1       |    File ID    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Control Type field's value is 5, indicating that it is an End File control
message, and the Length field's value is 1, indicating the 1-byte control
message payload.  The control message payload consists of an 8-bit File ID
field which indicates the sending application's unique ID value for the file
whose transfer is now complete.

Data Messages

Non-control messages are considered data messages and contain some form of
actual data for the user, whether it be text chat data, incoming file data, a
command for the local shell service, or a response from the remote shell
service.  These various types of data are differentiated by the value of the
message header's Type field.

Chat Data Messages

The Chat Data Message is used to transmit text chat data between SteganRTP
applications.  This type of data requires no context information, thus the
message payload contains only a single field, Chat Data.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Chat Data                            |
!                                                               !
.                                                               .

File Data Messages

The File Data Message is used to transmit data file contents between SteganRTP
applications.  Because multiple file transfers may be in progress at any given
time, this type of data must be accompanied with context information
indicating which file transfer the chunk of data belongs to.  The format of a
File Data message:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     File ID   |                 File Data                     |
+-+-+-+-+-+-+-+-+                                               |
!                                                               !
.                                                               .

The File ID field's value is a unique file ID number chosen for the particular
file transfer taking place and is used to indicate which file transfer the
chunk of data contained in the File Data field belongs to.  The File Data
field is a chunk of data from the file being transferred.  The proper order
for reconstruction of the file chunks transferred by these messages is ensured
by the message header's sequence number.

Shell Data Messages

The Shell Input Data and Shell Output Data Messages are used to transmit shell
input to, and receive shell output from, a remote SteganRTP shell service,
respectively.  This type of data requires no context information, thus the
message payload contains only a single field, Shell Data, as shown below.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Shell Data                           |
!                                                               !
.                                                               .

3.5) Functional Components

3.5.1) File Descriptor Lists

Two separate file descriptor lists are maintained: one of destinations for
inbound data and one of sources of outbound data.  The data structure for
storage of a file descriptor and its data for inclusion in either list is
defined below.

/* Structure used for file descriptor information */
typedef struct file_info_t {
   u_int8_t id;
   char *name;
   u_int8_t type;
   int fd;
   struct file_info_t *next;
   struct file_info_t *prev;
} file_info;

The independence of the file descriptor lists from the outbound data polling
and message handler components provides for a flexible and versatile
environment within which to expand functionality.  In order to include new
data types for transfer, all that is required is to define a new data type ID
for both applications to correlate messages upon, open a file descriptor to
the appropriate place to read or write the data, and include the file
descriptor in the appropriate list.
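
The list operations implied above can be sketched in C as follows.  The
helper names fd_list_append(), fd_list_find(), and fd_list_remove() are
hypothetical and are not taken from the reference implementation.  The
structure mirrors the one defined above, with uint8_t from <stdint.h> used in
place of u_int8_t for portability.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct file_info_t {
    uint8_t id;                 /* file ID, for file transfers */
    char *name;                 /* descriptive name or filename */
    uint8_t type;               /* message type routed to this descriptor */
    int fd;                     /* the file descriptor itself */
    struct file_info_t *next;
    struct file_info_t *prev;
} file_info;

/* Append a new record to the end of the list; returns it, or NULL. */
file_info *fd_list_append(file_info **head, uint8_t type, uint8_t id,
                          const char *name, int fd)
{
    assert(head != NULL && name != NULL);
    file_info *node = calloc(1, sizeof *node);
    if (node == NULL)
        return NULL;
    size_t n = strlen(name) + 1;
    node->name = malloc(n);
    if (node->name == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->name, name, n);
    node->id = id;
    node->type = type;
    node->fd = fd;
    if (*head == NULL) {
        *head = node;
        return node;
    }
    file_info *tail = *head;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = node;
    node->prev = tail;
    return node;
}

/* Find the record matching a message type and ID, or NULL if none. */
file_info *fd_list_find(file_info *head, uint8_t type, uint8_t id)
{
    for (; head != NULL; head = head->next)
        if (head->type == type && head->id == id)
            return head;
    return NULL;
}

/* Unlink and free a record that is currently in the list. */
void fd_list_remove(file_info **head, file_info *node)
{
    assert(head != NULL && node != NULL);
    if (node->prev != NULL)
        node->prev->next = node->next;
    else
        *head = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
    free(node->name);
    free(node);
}
```

Under this sketch, supporting a new data type amounts to one additional
fd_list_append() call on each list with a newly assigned type value.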

Inbound File Descriptors

Inbound File Descriptors are a list of file descriptors for various
destinations that inbound data may be directed to.  The order of these file
descriptors within the list is irrelevant, because the destination for a given
message is looked up by matching the message's type and properties against
each file descriptor's type and properties.

Chat Interface

Inbound chat data is written to this file descriptor.  This file descriptor is
tied to the chat window of the SteganRTP ncurses interface.

Remote Shell Interface

Inbound shell data from the remote application's shell service is written to
this file descriptor.  This file descriptor is tied to the shell window of the
SteganRTP ncurses interface.

Local Shell Service

Inbound shell data to the local application's shell service is written to this
file descriptor.  This file descriptor is tied to the local process providing
shell access.  This file descriptor does not exist in the list if the local
shell service is disabled.

File Transfers

Any number of file descriptors for data files being actively received may be
appended to or removed from the end of the inbound file descriptors list.

Outbound File Descriptors

Outbound File Descriptors are polled, in order, for data waiting to be sent.
Because they are polled in order, they are effectively prioritized in that
order: data waiting to be sent from an earlier descriptor in the list takes
precedence over data waiting to be sent from a later descriptor.  The file
descriptors included in the outbound list are as follows:

Raw Message Interface

Entire, unencrypted outbound messages are written to this file descriptor.
This file descriptor is used for the replaying of entire messages in response
to a Resend control message as described in Section 3.4.3.

Control Message Interface

Outbound control messages as described in Section 3.4.3 are written to this file
descriptor after creation.

Chat Interface

Outbound chat data is written to this file descriptor.  This file descriptor
is tied to the command window of the SteganRTP ncurses interface.  All
non-command text entered into the command window while in chat mode is
considered chat data.

Remote Shell Interface

Outbound shell data is written to this file descriptor.  This file descriptor
is tied to the command window of the SteganRTP ncurses interface.  All
non-command text entered into the command window while in shell mode is
considered shell data.

Local Shell Service

Outbound shell data from the local shell service is written to this file
descriptor.  This file descriptor is tied to the local process providing shell
access.  This file descriptor does not exist in the list if the local shell
service is disabled.

File Transfers

Any number of file descriptors for data files being actively sent may be
appended to or removed from the end of the outbound file descriptors list.

3.5.2) Message Handler

The SteganRTP application's message handler receives all valid incoming
messages as verified by the RTP packet receiving system for inbound packets.
This component performs all internal state changes and administrative tasks in
response to control messages.  It also handles the routing of inbound data
message payloads to the appropriate file descriptor in the inbound file
descriptors list.

Administrative Tasks

Echo Reply

If an Echo Request control message is received from the remote application,
the message handler constructs an appropriate Echo Reply control message as
described in Section 3.4.3 and writes it to the Control Message Interface file
descriptor in the outbound file descriptor list.

Start File Transfer

If a Start File control message is received from the remote application, the
message handler opens a new file descriptor using the file's context
information contained in the control message and appends the file descriptor
to the inbound file descriptors list.

End File Transfer

If an End File control message is received from the remote application, the
message handler closes the file descriptor for the file transfer indicated and
removes it from the inbound file descriptors list.


Data Routing

Chat Data

Inbound text chat data is buffered until a complete line of text is received
and then is written to the Chat Interface file descriptor in the inbound file
descriptors list.  A complete line of text is defined as being terminated by a
new-line character.

File Data

Inbound file transfer data is written to the appropriate file descriptor in
the inbound file descriptors list for the file transfer that the data belongs
to.

Shell Input Data

Shell Input Data messages contain input data for the local application's shell
service, which is written to the Local Shell Service file descriptor in the
inbound file descriptors list.

Shell Output Data

Shell Output Data messages contain response data from the remote application's
shell service, which is written to the Remote Shell Interface file descriptor
in the inbound file descriptors list.

3.5.3) Encryption System

The encryption method chosen for use in the SteganRTP reference implementation
is not really encryption at all.  In favor of low overhead and speed, a simple
bitwise exclusive-or (XOR)[26] obfuscation method was chosen as a symmetric
cipher.  The choice of encryption method here does not indicate that another,
more robust type of encryption could not be used; rather, the modular design
of the reference implementation promotes drop-in replacement of the current
encryption system entirely, assuming that the replacement encryption method
does not have a noticeable impact upon the latency of the overt RTP stream
being used as cover-medium.

The author does not claim that the obfuscation method used by the SteganRTP
reference implementation is cryptographically secure.  On the contrary, it is
well documented in the literature that XOR against a repeating keystream is
insecure.  The obfuscation of message data is merely meant to provide some
rudimentary protections against statistical steganalysis which focuses upon
perceptible properties of language within the stego-medium.

The XOR obfuscation method employed by the SteganRTP reference implementation
consists of the following steps:

  1. Create a bit-pad for use as keying information.
  2. Choose an offset into the bit pad to begin using the keying information.
  3. XOR the message against the bit pad, byte by byte.

Bit-pad Creation

The method chosen for creation of the bit-pad is simply to duplicate the
bit-string found in keyhash, the creation of which is described in detail in
Section 3.3.1.

Choose a Bit-pad Offset

To help protect against some forms of statistical analysis that have proved
effective against XOR obfuscation using repeated static keying information, it
was decided against beginning every XOR loop at the same position within
keyhash.  To avoid this, a new offset into keyhash for each message must be
chosen.  The method that the SteganRTP reference implementation employs to
determine this offset is to use the hashword[25] function to create a 32-bit
hash of keyhash and the sum of the Seq and Timestamp header fields of the RTP
packet into which the message is being embedded.  The resultant hash is then
interpreted as a 32-bit integer.  That integer modulo 20, the length of
keyhash in bytes, is the chosen offset into keyhash.

The integer which is the result of the offset choosing operation and is within
the range of 0 through 19 is defined here as keyhash_offset and is described
by the equation below.

  keyhash_offset = hashword( keyhash, ( RTP_Seq + RTP_TS ) ) mod 20

The keyhash_offset equation incorporates keyhash so as to not be entirely
computable from observable information in the RTP packet header.

XOR Loop

When used as a bit-pad for the XOR operation loop, keyhash is used 8-bits, or
1-byte, at a time.  The XOR loop begins with the first byte of the message to
be obfuscated and the byte located at index keyhash_offset within keyhash.  The
two bytes are XORed to produce a result byte.  This result byte is placed into
the obfuscated message buffer at the same byte index as the original message
byte.  If the end of the bit-pad is reached, the position of the next byte in
the bit-pad returns to the beginning of the bit-pad.  When the end of the
original message is reached, the obfuscated message buffer should be of equal
length to the original message and have one corresponding obfuscated byte for
each original byte in the message.
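
The XOR loop described above can be sketched in C as follows.  The function
name xor_obfuscate() is hypothetical, and the offset is taken as a parameter
on the assumption that the caller has already computed keyhash_offset.
Because XOR is its own inverse, applying the function a second time with the
same keyhash and offset recovers the original message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEYHASH_LEN 20   /* SHA-1 digest size in bytes */

/* XOR len bytes of msg against keyhash, starting at the given offset into
 * keyhash and wrapping to its beginning when its end is reached.  The
 * result is written to out, which may be the same buffer as msg. */
void xor_obfuscate(uint8_t *out, const uint8_t *msg, size_t len,
                   const uint8_t keyhash[KEYHASH_LEN], size_t offset)
{
    assert(out != NULL && msg != NULL && keyhash != NULL);
    assert(offset < KEYHASH_LEN);
    size_t k = offset;
    for (size_t i = 0; i < len; i++) {
        out[i] = msg[i] ^ keyhash[k];
        k = (k + 1) % KEYHASH_LEN;   /* wrap around the bit-pad */
    }
}
```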

It is important to note that within the scope of steganography terminology,
whether or not message data is obfuscated or encrypted is irrelevant.  As
such, the obfuscated message will still be referred to below simply as the
message, or message data.

3.5.4) Embedding System

The embedding system that was developed for the SteganRTP reference
implementation is a generalized least-significant-bit (LSB) steganographic
data embedding method.  It is generalized such that when provided with a
cover-medium buffer, its length, the size of each word value within the
cover-medium buffer, and the message buffer to be embedded, it is then able to
perform the LSB embedding operation.  In this way, any audio Codec which uses
a linear grouping of fixed-length audio samples should be able to be utilized
as cover-medium by the embedding system.

For the purpose of discussion of the SteganRTP embedding system, the term word
value used in this context is equivalent to audio sample.  The example used
here, as well as the only Codec currently supported by the reference
implementation, is G.711.  G.711 is a Codec which encodes audio as a linear
grouping of 8-bit audio samples.  This encoded data is transported by RTP
packets as their payload and will serve as cover-medium.

Using the generalized LSB embedding method, the LSB of each word value in the
cover-medium is modified to be equivalent to a single bit from the message
data buffer, in order.  The properties of the RTP packet, such as its payload
length and payload type header value, determine how much message data can be
embedded into the packet's payload.  The RTP packet's payload size is
determined by subtracting the size of the RTP packet's header from the value
of the UDP packet header's Length field.  The wordsize is equivalent to the
sample size used by the RTP packet's Codec, indicated by the RTP packet
header's payload type field.  Modifying 1 bit from each word value requires 8
word values to embed a single byte of message data.  Thus, the amount of
available space within an RTP packet's payload for embedding is found by
multiplying the word value size by 8, then dividing the RTP packet payload
size by the result.

The resultant value, in bytes, is defined here as the RTP packet's
available_space for embedding and is described by the equation below.

  available_space = RTP_payload_size / (wordsize * 8 )

The space available for user data after prepending the SteganRTP communication
protocol's message header is defined here as the SteganRTP message's
payload_size and is described by the equation below.


  payload_size = available_space - sizeof( message_header )


Thus, payload_size bytes of user data can be packaged as a SteganRTP message
and embedded into an RTP packet payload cover-medium offering available_space
bytes of embedding capacity.
If an RTP packet is too small to contain a valid message, it is passed along
unmodified.

If a message being embedded is smaller than the available space in the
cover-medium, the message is padded out to the available size with random
data.  This ensures a more uniform distribution of modified values throughout
the cover-medium.
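
A minimal sketch of the embedding and padding steps described above follows.
It assumes 8-bit samples (as with G.711) and writes message bits
most-significant-bit first; the bit order, the function name, and the use of
os.urandom for padding are assumptions made for illustration and may differ
from the reference implementation.

```python
import os


def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Return a copy of cover with message bits written into sample LSBs."""
    capacity = len(cover) // 8            # one message byte per 8 samples
    if len(message) > capacity:
        raise ValueError("message does not fit in this cover-medium")
    # Pad short messages with random bytes so every usable sample is rewritten.
    padded = message + os.urandom(capacity - len(message))
    stego = bytearray(cover)
    for i, byte in enumerate(padded):
        for bit_index in range(8):
            bit = (byte >> (7 - bit_index)) & 1    # assumed MSB-first order
            pos = i * 8 + bit_index
            stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)
```

Because only the lowest bit of each sample is replaced, every output sample
differs from its original value by at most one.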

3.5.5) Extraction System

All inbound RTP packets are sent to the extraction system where potential
message data is extracted, decrypted, and then verified.  The extraction system
is essentially a reverse of the embedding system described in Section 3.5.4 and
then a pass through the symmetric encryption system described in Section 3.5.3.
This results in a decrypted potential message where the message's Checksum /
ID header field value can be verified to determine whether or not the extracted
potential message is valid.

If an extracted potential message is found to be valid, it is passed to the
message handler component.
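
The first stage of extraction, reading the LSBs back out of the cover-medium,
might be sketched as follows.  The sketch assumes 8-bit samples and
most-significant-bit-first ordering, both of which are assumptions; the
decryption and Checksum / ID verification stages are omitted because their
details are not given in this section.

```python
def extract_lsb(stego: bytes) -> bytes:
    """Collect sample LSBs, 8 per byte, into a candidate message buffer."""
    out = bytearray()
    for i in range(len(stego) // 8):
        value = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)    # assumed MSB-first order
        out.append(value)
    return bytes(out)
```

The result is only a potential message: every RTP payload yields some byte
string here, which is why verification is needed before anything is passed to
the message handler.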

3.5.6) Outbound Data Polling System

File descriptors in the outbound file descriptors list are polled, in order,
for data waiting to be sent.  When a file descriptor is found to have data, a
new formatted message is created if needed and data is read to fill the
payload of that message from the file descriptor.  The message type is
indicated by the file descriptor's record in the outbound file descriptors
list.  The result of this operation is a formatted SteganRTP message ready
for encryption and embedding into the cover-medium.
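
The polling behaviour might look like the following sketch, which checks each
descriptor in order without blocking and reads at most one payload's worth of
data.  The record layout (a list of descriptor and message type pairs) and
the function name are hypothetical, and message formatting is left out.

```python
import os
import select


def poll_outbound(fd_records, payload_size):
    """Return (message_type, data) from the first descriptor with data, else None.

    fd_records is an ordered list of (fd, message_type) pairs.
    """
    for fd, message_type in fd_records:
        # Zero timeout: check readiness without delaying the RTP packet.
        readable, _, _ = select.select([fd], [], [], 0)
        if readable:
            data = os.read(fd, payload_size)
            if data:                       # an empty read means end of file
                return message_type, data
    return None
```

Reading no more than payload_size bytes per call leaves any remaining data in
the descriptor for the next RTP packet.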

3.5.7) Message Caching System

All inbound and outbound SteganRTP messages are cached.  The outbound message
cache provides a mechanism for retrieval of any given message in the event
that the remote application issues a Resend control message requesting that
the message be resent.  The inbound message cache provides a mechanism for
storage of messages received that are beyond the expected sequence number.
Once the expected message is received, the others may be read back from the
cache rather than requesting that the remote application resend them.
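
The inbound half of the cache can be illustrated with a small sketch.  The
class and method names are hypothetical, and sequence number wrap-around is
ignored for brevity.

```python
class InboundCache:
    """Hold messages that arrive ahead of the expected sequence number."""

    def __init__(self, first_expected=0):
        self.expected = first_expected
        self.pending = {}                  # sequence number -> message bytes

    def receive(self, seq, message):
        """Store a message and return every message now deliverable in order."""
        if seq < self.expected or seq in self.pending:
            return []                      # already delivered or already cached
        self.pending[seq] = message
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready
```

When the missing message finally arrives, the messages cached behind it are
released in a single call, so none of them need to be resent.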

3.5.8) Shell Service

The local application's shell service is essentially a child process executing
a shell.  This process's standard input and output file descriptors are
replaced with file descriptors which are stored in the inbound and outbound
file descriptors lists, respectively.  The local shell service is disabled by
default in the SteganRTP reference implementation and must be enabled via the
command-line.
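
The descriptor arrangement described above can be sketched with pipes.  The
function name and the default shell path are assumptions, as is folding the
child's standard error into its standard output.

```python
import subprocess


def start_shell_service(shell_path="/bin/sh"):
    """Start a child shell whose stdin and stdout are pipes we control."""
    return subprocess.Popen(
        [shell_path],
        stdin=subprocess.PIPE,     # data from inbound messages is written here
        stdout=subprocess.PIPE,    # data read here becomes outbound messages
        stderr=subprocess.STDOUT,  # assumption: merge stderr into the same stream
    )
```

In this sketch the write end of the child's standard input would be recorded
in the inbound file descriptors list and the read end of its standard output
in the outbound list.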

3.6) Use

3.6.1) Command-line

The SteganRTP application provides a number of command-line arguments allowing
for control and configuration of various components.  The following sections
describe each in detail.

Usage Output Overview

The following usage output was copied verbatim from the most recent version of
the reference implementation, SteganRTP 0.3b.

Usage: steganrtp [general options] -t <host> -k <keyphrase>
  required options:
    at least one of:
      -a <host>       The "source" of the RTP session, or, host
                      treated as the "close" endpoint (host A)
      -b <host>       The "destination" of the RTP session, or,
                      host treated as the "remote" endpoint (host B)
    -k <keyphrase>  Shared secret used as a key to obfuscate
                    communications
  general options:
    -c <port>       Host A's RTP port
    -d <port>       Host B's RTP port
    -i <interface>  Interface device (defaults to eth0)
    -s              Enable the shell service (DANGEROUS)
    -v              Increase verbosity (repeat for additional
                    verbosity)
  help and documentation:
    -V              Print version information and exit
    -e              Show usage examples and exit
    -h              Print help message and exit

Command-line Arguments

The following command-line arguments are available from the SteganRTP
application's command-line.

-a host

host is the name or IP address of the closest side of the RTP session desired
to be utilized as cover-medium (Host A).

-b host

host is the name or IP address of the remote side of the RTP session desired
to be utilized as cover-medium (Host B).

-k keyphrase

keyphrase is a shared secret between the users of the two SteganRTP instances
which will be communicating.  In some cases, a single user may be running both
instances.  The keyphrase is used to generate a bit-pad via the SHA-1 hash
function which will later be used to obfuscate the data being
steganographically embedded into the RTP audio cover-data.

-c port

port is the RTP port used by Host A.

-d port

port is the RTP port used by Host B.

-i interface

interface is the interface to use on the local host.  This parameter defaults
to "eth0".

-s

This argument enables the command shell service.  If the command shell service
is enabled, the user of the remote instance of SteganRTP will be able to
execute commands on the local system as the user running SteganRTP.  You
likely don't want this unless you are the user running both instances of
SteganRTP and intend to use the remote instance as an interface for a remote
shell on that host.  This feature can be useful for remote administration of a
system without direct access to the system, assuming that RTP is allowed to
traverse traffic policy enforcement points.

-v

This argument increases the verbosity level.  Repeat for higher levels of
verbosity.

-V

This argument prints SteganRTP's version information and exits.

-e

This argument prints a quick examples reference and exits.

-h

This argument prints the usage (help) information and exits.

Usage Examples

You can print a quick reference of the following examples from the SteganRTP
command-line by using the -e command-line argument.

The simplest command-line you can execute to successfully run SteganRTP is:

  steganrtp -k <keyphrase> -b <host>

This will begin a session utilizing any RTP session involving host-b as the
destination endpoint.

  steganrtp -k <keyphrase> -a <host-a> -b <host-b> -i <interface>

This will begin a session utilizing any RTP session between host-a and host-b
using interface interface.

  steganrtp -k <keyphrase> -a <host-a> -b <host-b> -i <interface> -s

This is the same as the previous example but will enable the command shell
service.

  steganrtp -k <keyphrase> -a <host-a> -b <host-b> -c <a-port> -d <b-port>

This will begin a session utilizing a specific RTP session between host-a on
port a-port and host-b on b-port.  Note, this will effectively disable RTP
session auto-identification and will attempt to use an RTP session as
described whether it exists or not.  This is useful when the desired RTP
session is already in progress: the other examples rely on libfindrtp to
identify the RTP session as it is being set up by VoIP signaling, and thus
must already be waiting when the call-setup takes place.

3.6.2) User Interface

SteganRTP provides a curses user interface featuring four windows: the Command
window at the bottom of the screen, the large Main window in the middle of the
screen, and the Input and Output Status windows at the top of the screen.

Windows

Command Window

All keyboard input, if accepted, is displayed in the Command window.  Lines of
input that are not prefixed with a slash ('/') character are treated as chat
text and are sent to the remote instance of SteganRTP as such.  Lines of input
that begin with a slash are considered commands and are processed by the local
instance of SteganRTP.
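
The slash-prefix rule can be illustrated with a short sketch.  The function
name and the returned tuple layout are hypothetical and are not taken from
the reference implementation.

```python
def parse_input(line: str):
    """Classify one line of Command window input as chat text or a command."""
    if line.startswith("/"):
        parts = line[1:].split(maxsplit=1)
        name = parts[0] if parts else ""
        argument = parts[1] if len(parts) > 1 else ""
        return ("command", name, argument)   # handled by the local instance
    return ("chat", line)                    # sent to the remote instance
```

Splitting only once keeps any spaces inside a command's argument intact.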

Main Window

When in Chat mode, chat text and general SteganRTP information messages and
events are displayed in the Main window.  When in shell mode, this window is
overloaded with the input to and output of the shell service provided by the
remote instance of SteganRTP.

Input Status Window

Events related to incoming RTP packets or SteganRTP communication messages are
displayed in the Input Status window.

Output Status Window

Events related to outgoing RTP packets or SteganRTP communication messages are
displayed in the Output Status window.

Commands

The following commands can be executed from within the Command window:

/chat

The "chat" command puts the interface into Chat Mode.

/sendfile filename

The "sendfile" command queues a file for transmission to the remote instance
of SteganRTP.  filename is the path location and filename of the local file to
be sent.

/shell

The "shell" command puts the interface into Shell Mode.

/quit
/exit

The "quit" and "exit" commands exit the program.

/help
/?

The "help" and "?" commands print an available command list.

4) Solutions to Problems and Challenges

The following sections describe this research effort's approach to solving
many of the problems and challenges that were identified in Section 2.3, as
implemented via the SteganRTP reference implementation.  Most of the solutions
that have been devised during this research effort involved the creation of a
communications protocol to operate within the covert channel established
within the cover-medium.  This protocol employs a formatted message header
which is prepended to user message data before being embedded in the
cover-medium, providing various utility to the application making use of the
protocol.

4.1) Unreliable Transport

To mitigate the unreliable properties of the underlying transport protocols
used to transmit the cover-medium, the message header contains a sequence
number.  This sequence number coupled with the message caching system allows
the recipient to both identify when an expected message is missing as well as
request a resend of a particular message via a control message.  This property
also provides the added benefit of detecting erroneously or maliciously
replayed messages.
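
The decisions a recipient can make from the sequence number alone might be
sketched as follows.  The action names are hypothetical, and sequence number
wrap-around is not handled.

```python
def classify_sequence(expected: int, received: int):
    """Decide what to do with a message given the next expected sequence number.

    Returns (action, missing), where missing lists sequence numbers to re-request.
    """
    if received < expected:
        return ("discard_replay", [])        # already seen: duplicate or replay
    if received == expected:
        return ("deliver", [])
    # A gap was found: cache this message and ask for everything before it.
    return ("cache_and_request_resend", list(range(expected, received)))
```

Each entry in the missing list would correspond to one Resend control message
sent to the remote application.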

When considering potential solutions for this problem, various types of
Forward Error Correction (FEC) were considered.  Due to the limited space
available for message data as a result of the size of cover-medium available,
the additional space required for redundant data by most of the algorithms
considered rendered them unfit for purpose within this research effort's
context.

4.2) Cover-Medium Size Limitations

The same property of RTP which restricts the size of available cover-medium in
each packet also ensures that there is an abundance of packets being sent
between RTP endpoints every second.  User data can be spread over multiple
messages and cover-packets and then reassembled at its destination.  For this
research effort's purposes and goals, namely the timely transfer of user text
chat, interactive shell access, and transfer of small files, an achieved
throughput of 1,000 bytes per second as described in Section 2.2.3 was found
to be more than adequate.
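
Spreading user data across packets and estimating the transfer time can be
sketched as below.  The function names are hypothetical, and the figures used
afterwards (20 bytes per packet at 50 packets per second) are assumed values
chosen to be consistent with the 1,000 bytes per second quoted above; message
header overhead is ignored.

```python
def split_into_payloads(data: bytes, payload_size: int):
    """Cut user data into chunks that each fit one message payload."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [data[i:i + payload_size] for i in range(0, len(data), payload_size)]


def reassemble(chunks):
    """Join chunks that have already been put back in sequence order."""
    return b"".join(chunks)


def transfer_seconds(data_len: int, payload_size: int, packets_per_second: int) -> float:
    """Estimate transfer time when every packet carries one full payload."""
    packets = -(-data_len // payload_size)     # ceiling division
    return packets / packets_per_second
```

Under these assumed figures, 1,000 bytes of user data occupy 50 packets and
take about one second to send.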

4.3) Latency

To guard against unintended impact on RTP packet latency, care was taken to
perform a number of operations efficiently:

4.3.1) Inbound Packet Processing

When receiving inbound RTP packets for processing, the receiving system does
not need to modify the received packet.  In the SteganRTP reference
implementation, the packet is received and immediately accepted for continued
routing by the packet queue prior to extracting, decrypting, and verifying any
potential message data found within the payload.

4.3.2) Outbound Packet Processing

When receiving outbound RTP packets for processing, the fewest number of
operations possible must be performed in order to make a decision on whether
or not the packet should be immediately accepted for continued routing or if
it must be held for modification.  In the SteganRTP reference implementation,
the packet is received and then all active outbound file descriptors are
polled for data waiting to be sent.  If no data is waiting to be sent, the
packet is then accepted for continued routing by the packet queue. 

4.3.3) Encryption Overhead

When encrypting the raw message prior to embedding into the cover-medium, a
low-overhead algorithm was used.  The SteganRTP reference implementation
employs an XOR against a SHA-1 hash of a user-supplied shared-secret.
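
A minimal sketch of this kind of obfuscation follows.  Repeating the 20-byte
SHA-1 digest across the message is an assumption made for illustration, as
are the function names; the reference implementation may derive and apply its
bit-pad differently.

```python
import hashlib


def make_bit_pad(keyphrase: str) -> bytes:
    """Derive a 20-byte pad from the shared secret using SHA-1."""
    return hashlib.sha1(keyphrase.encode("utf-8")).digest()


def xor_obfuscate(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad, repeating the pad as needed (assumed behaviour)."""
    return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))
```

Because XOR is its own inverse, applying xor_obfuscate a second time with the
same pad recovers the original message, so one routine serves both
directions at very low cost.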

4.4) Tracking of RTP Streams

Identification and tracking of RTP streams is handled by the libfindrtp C
library paired with the NetFilter libipq C library for tracking and hooking
packets.  Both libraries were evaluated during this research effort's initial
requirements phase and were deemed fit for purpose.

4.5) Media Gateway Audio Modifications

4.5.1) Audio Codec Conversion

Due to the nature of VoIP, it is not always possible to detect whether or not
an audio session such as RTP is terminating at the actual recipient of the
call audio or at an intermediary.  As such, it is not possible to reliably
transmit stego-medium from end to end unless the actual network addresses of
each endpoint are known.  Due to this limitation, the SteganRTP reference
implementation assumes that there are no intermediary devices along the media
path making changes to the RTP payload.  The reference implementation makes
this assumption by also assuming that the sending and receiving applications
are either running on the same hosts as the RTP endpoint applications or are
along the network path between the two visible RTP endpoints which may or may
not be intermediaries.  The reference implementation requires that these
endpoint network addresses are specified by the user or identified by the RTP
session identification component.

4.6) Mid-session Audio Codec Change

The SteganRTP reference implementation's embedding component addresses the
issue of mid-session audio Codec change by determining the audio sample word
size dynamically based on the Codec value supplied by the RTP packet's header.
Thus, the embedding system's parameters are derived from each individual RTP
packet that will be embedded into as cover-medium.  If the RTP session were to
change Codecs mid-session, or even to change Codecs for every other packet,
the embedding system will only operate on RTP packets whose payloads are
encoded with a Codec that the embedding system recognizes and has parameters
defined for.  If the embedding system does not recognize and support a
particular packet's Codec, that packet is passed unmodified.
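
Per-packet parameter selection might be sketched as below.  The table and
function name are hypothetical; payload types 0 (PCMU) and 8 (PCMA) are the
static RTP assignments for G.711.  For brevity the sketch ignores RTP header
extensions and padding.

```python
# Payload type -> sample size in bytes, for the Codecs this sketch supports.
CODEC_WORDSIZE = {0: 1, 8: 1}      # 0 = PCMU, 8 = PCMA (both G.711)


def embedding_params(rtp_packet: bytes):
    """Return (payload_offset, wordsize), or None to pass the packet unmodified."""
    if len(rtp_packet) < 12 or rtp_packet[0] >> 6 != 2:
        return None                            # not an RTP version 2 packet
    csrc_count = rtp_packet[0] & 0x0F
    payload_type = rtp_packet[1] & 0x7F        # drop the marker bit
    wordsize = CODEC_WORDSIZE.get(payload_type)
    if wordsize is None:
        return None                            # unsupported Codec
    # Fixed 12-byte header plus 4 bytes per CSRC entry.
    offset = 12 + 4 * csrc_count
    if offset > len(rtp_packet):
        return None                            # truncated header
    return offset, wordsize
```

Since the lookup happens for every packet, a Codec change in the middle of a
session needs no special handling: packets with unknown payload types simply
yield None.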

5) Conclusion

5.1) Design Goals

It is the author's belief that all of the design goals set forth in Section 3.1
for the SteganRTP reference implementation were met.  The primary goal of
steganography, establishment of a full-duplex communications channel,
compensation for the unreliable transport mechanism, identical user
experience regardless of mode of operation, and multi-type data transfer
were all accomplished.

5.2) Identified Challenges

It is the author's belief that all but two of the problems and challenges
identified in Section 2.3 were fully addressed.  The two challenges that were
not addressed were the various types of media gateway audio modifications
outlined in Section 2.3.6, due to scope, and the issue of compressed audio
outlined in Section 2.3.5, due to time limitations of the research effort.

5.3) Secure Real-time Transport Protocol

It is important to note that use of the Secure Real-time Transport Protocol
(SRTP) RTP profile may prevent specific operational scenarios such as the
active MITM scenario described in Section 3.2.2.  SRTP's encryption of the RTP
payload and authentication of the RTP header and payload will prevent invasive
modification of the payload by an entity external to the RTP session.  SRTP,
however, won't protect against steganographic embedding of message data prior
to the application of the SRTP encryption methods, such as may be performed
within the RTP endpoint application itself.

5.4) Future Research

It is the author's intention to continue this research effort at a later time.
The identified areas for continued research include:

  1. Replacement of the generalized LSB embedding system with Codec-specific
     embedding algorithms.  Utilizing Codec-specific properties, more intelligent
     embedding methods such as the inclusion of silence and voice detection can
     be performed, and a wider variety of Codecs can be supported.
  2. Creation of embedding algorithms for video Codecs.
  3. Replacement of the XOR obfuscation system with real encryption.
  4. Addition of support for fragmentation of larger formatted messages across
     multiple RTP packet payload cover-mediums.
  5. Expansion of the shell service functionality into a more generalized
     services framework.

References

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson,
    R. Sparks, M. Handley, and E. Schooler. SIP: Session initiation protocol.
    RFC 3261, Internet Society (IETF), June 2002.
[2] Wikipedia. H.323 - Wikipedia, the free encyclopedia.
    http://en.wikipedia.org/w/index.php?title=H.323&oldid=146577248, 2007.
    [Online; accessed 2-September-2007].
[3] Wikipedia. Skinny client control protocol - Wikipedia, the free
    encyclopedia.
    http://en.wikipedia.org/w/index.php?title=Skinny_Client_Control_Protocol&oldid=133621770,
    2007. [Online; accessed 2-September-2007].
[4] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A transport
    protocol for real-time applications. RFC 1889, Internet Society (IETF),
    January 1996.
[5] J. Postel. User datagram protocol. RFC 768, Internet Society (IETF), August
    1980.
[6] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The secure
    real-time transport protocol (SRTP). RFC 3711, Internet Society (IETF),
    March 2004.
[7] Mehdi Kharrazi, Husrev T. Sencar, and Nasir Memon. Image steganography:
    Concepts and practice. Lecture Notes Series, Institute for Mathematical
    Sciences, National University of Singapore, 2004.
[8] Huaiqing Wang and Shuozhong Wang. Cyber warfare: steganography vs.
    steganalysis. Commun. ACM, 47(10):76-82, 2004.
[9] Tayana Morkel, Jan H P Eloff, and Martin S Olivier. An overview of image
    steganography. In Proceedings of the Fifth Annual Information Security
    South Africa Conference (ISSA2005), Sandton, South Africa, June/July 2005.
    Published electronically.
[10] Unknown. S-tools 4.0.
     ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/code/s-tools4.zip,
     August 2006.
[11] Fabien A. P. Petitcolas. mp3stego.
     http://www.petitcolas.net/fabien/steganography/mp3stego/, June 2006.
[12] Heinz Repp. Hide 4 pgp.
     http://www.rugeley.demon.co.uk/security/hide4pgp.zip, December 1996.
[13] I)ruid. An analysis of VoIP steganography research efforts.
     http://druid.caughq.org/papers/An-Analysis-of-VoIP-Steganography-Research-Efforts.pdf,
     September 2007.
[14] I)ruid. Real-time steganography with RTP.
     http://druid.caughq.org/presentations/Real-time-Steganography-with-RTP.pdf,
     August 2007.
[15] Defcon 15. http://www.defcon.org/html/defcon-15/dc-15-schedule.html,
     August 2007.
[16] T. Takahashi and W. Lee. An assessment of VoIP covert channel threats.
     http://voipcc.gtisc.gatech.edu/download/securecomm.pdf, July 2007.
[17] Wikipedia. G.711 - Wikipedia, the free encyclopedia.
     http://en.wikipedia.org/w/index.php?title=G.711&oldid=151887535, 2007.
     [Online; accessed 6-September-2007].
[18] Wikipedia. Least significant bit - Wikipedia, the free encyclopedia.
     http://en.wikipedia.org/w/index.php?title=Least_significant_bit&oldid=150766150,
     2007. [Online; accessed 6-September-2007].
[19] Voip foro - codecs. http://www.voipforo.com/en/codec/codecs.php,
     2007. [Online; accessed 5-September-2007].
[20] I)ruid. SteganRTP. http://sourceforge.net/projects/steganrtp/,
     August 2007.
[21] D. Eastlake 3rd and P. Jones. US secure hash algorithm 1 (SHA1). RFC 3174,
     Internet Society (IETF), September 2001.
[22] I)ruid. libfindrtp. http://sourceforge.net/projects/libfindrtp/,
     February 2007.
[23] Netfilter. http://www.netfilter.org/, 2007.
     [Online; accessed 6-September-2007].
[24] Wikipedia. Type-length-value - Wikipedia, the free encyclopedia.
     http://en.wikipedia.org/w/index.php?title=Type-length-value&oldid=128880452,
     2007. [Online; accessed 3-September-2007].
[25] Bob Jenkins. Netfilter. http://www.burtleburtle.net/bob/c/lookup3.c,
     May 2006. [Online; accessed 6-September-2007].
[26] Wikipedia. Exclusive or - Wikipedia, the free encyclopedia.
     http://en.wikipedia.org/w/index.php?title=Exclusive_or&oldid=152332544,
     2007. [Online; accessed 5-September-2007].