Project 3: Analysis of RTP and RTCP Packets

Note: Examples of student project reports will be made available to course instructors upon request (send email).

The goal of this project is to capture and analyze RTP and RTCP packets during a real-time conference session over a wired and wireless network.

Where to find information about RTP/RTCP:

Textbook, Section 3.4
Wikipedia page for RTP and the Wireshark Wiki for RTP
YouTube video Lecture - 30 RTP
(Lecture Series on Broadband Networks by Prof. Karandikar, Department of Electrical Engineering, IIT Bombay)
Google Talk for Developers: Call Signaling
A page on RTP, Real-time Transport Protocol

The most detailed and authoritative source is RFC-3550
Also check RFC-3711 for description of security encryption in RTP.

This YouTube video explains how to decode RTP packets. It is not in English, but you can see how the author decodes the packets.

1. Experiment Description

The first step is to install video conferencing software on the computers of your project team. The most important requirements are that the software is based on RTP/RTCP and does not encrypt the packet payload. Google Hangouts, Skype, and Apple FaceTime are the most popular choices, but Hangouts encrypts its traffic and so does Skype, as described in the TLS and sRTP for Skype Connect Technical Datasheet. It appears that it is not possible to selectively disable the encryption feature. FaceTime appears not to use payload encryption as of this writing (October 2014), but it runs only on Apple computers.
An alternative is to search for open source video conferencing software that does not encrypt the payload. This Wikipedia page lists the features of various web conferencing software.
It appears that Linphone meets our requirements well, so you may try using Linphone.
You may still run Hangouts or Skype for comparison, because they provide mature and widely used platforms, although you will not be able to accomplish all requirements of this project by using only Hangouts or Skype.

Establish a conferencing session over a wireless/Wi-Fi LAN (activate both the audio and video options).
Each participant should be in a different geographic location, or at least should try to connect to a different wireless LAN while conferencing. Conference for about 5 – 10 min; longer durations will ensure more meaningful statistics, which is particularly important because these data will be used in WS Project 4.
At the same time all participants should use Wireshark to capture all the IP packets sent from their host and received from other host(s).
For example, knowing that the IP address of your host is 192.168.2.11, you could use these Wireshark filters:

ip.src == 192.168.2.11: to display all packets sent from your host
ip.dst == 192.168.2.11: to display all packets received by your host

Use traceroute (Assignment #2) during or immediately after each session to determine the network paths between the participants. Each participant should determine their IP address and the destination IP addresses for all packets sent from their hosts (using Wireshark). Check if the destination address of the packets sent from your computer is the same IP address as that of another participant, or is it something else? You could use Spy-IP.com to determine the geographic location and the owner of the endpoint IP address.

Because of the complexity of this project, you may wish to first run some preliminary experiments to establish the methods for performing the actual experiments:

Determining the payload type for audio and video, as described in Section 2.2 below. This is necessary because applications such as Google Hangouts use dynamic payload types.
Determining the correspondences between source identifiers (SSRC numbers) carried in RTP and RTCP packets, as described in Sections 2.3 and 2.4 below. Start by running a one-way conference session with two participants, where one participant has activated both audio and video, and the other participant has muted audio and turned off video. Then capture all RTP and RTCP packets and list their properties (see Table I and Table II below), both at the sending side and at the receiving side.
This approach is necessary because it may be difficult to determine these correspondences in a two-way or n-way conference.

Note that even with audio muted you may observe audio packet transmissions, although these packets will have a much smaller size. These packets might be Silence Insertion Descriptor (SID) packets. RFC-3389 gives an explanation of SID and comfort noise.

Describe these preliminary experiments in your report before you describe your actual experiments. Call this section “Methods” and describe the actual experiments in the “Experiments” section. It makes sense to describe the methods first, not last.

Perform your experiment at least two times, once during a period of low-intensity network traffic, and once during a busy period, when you expect that many people will be using the same wireless network. (In your report describe the locations of all participants, which Wi-Fi networks they used, and during which times of day.) Note that we are assuming that the wireless link is the “bottleneck.” However, this assumption may need to be tested if you are using Linphone, for which the server is located overseas (in France).

2. RTP and RTCP Packet Analysis

After capturing the packets, use Wireshark filters to partition the traffic to/from your computer so they can be analyzed separately.
The data preprocessing consists of the following steps:

Separate the RTP data packets from RTCP control packets.
Identify the encoding schemes used to create the packet payload for audio/video RTP streams.
Determine the synchronization source (SSRC) identifiers for RTP packets.
Determine different types of SSRC identifiers in RTCP packets (“reporter,” “first source,” …, “n-th source”)

The following subsections describe the details of each step.

2.1 Separating RTP and RTCP packets

The first step is to separate the RTP data packets from RTCP control packets. If you are experimenting with Google Hangouts, note the description of its connection methods: “The UDP traffic consists of STUN, RTP, and RTCP packets, with SRTP encrypted data payloads.” Because the payloads are encrypted using SRTP, you will not be able to perform the analysis required for this project.
Note that the conference participants will not be directly connected to each other, but indirectly via the conference servers of your conferencing software (as you may have already discovered using traceroute).

It is not possible to recognize RTCP packets based only on their header. You must infer RTCP from the UDP port: the UDP port(s) carrying the majority of packets belong to RTP data sessions. To separate the RTP data packets from RTCP control packets, use the fact that they are usually transmitted on different ports.
RFC-3550 in Section 11: “RTP over Network and Transport Protocols” describes some guidelines on demultiplexing of RTP data and RTCP control streams. It says that:

   For UDP and similar protocols,
   RTP SHOULD use an even destination port number and the corresponding
   RTCP stream SHOULD use the next higher (odd) destination port number.
   ...
   For applications in which the RTP and RTCP destination port numbers are
   specified via explicit, separate parameters (using a signaling
   protocol or other means), the application MAY disregard the
   restrictions that the port numbers be even/odd and consecutive
   although the use of an even/odd port pair is still encouraged.

However, a particular application may implement something different from the recommended port assignment.
Even if RTP and RTCP connections are not over consecutive UDP ports, it should be easy to recognize the RTP data port by the significantly larger number of packets compared to RTCP control packets.
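
The port-based separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify_udp_ports and its input (a plain list with one UDP port number per captured packet, for example exported from the Wireshark udp.srcport column) are hypothetical, and the rule "busiest port is RTP, next higher odd port is RTCP" is a heuristic that a particular application may not follow.

```python
from collections import Counter

def classify_udp_ports(ports):
    """Count packets per UDP port and guess the role of each port.

    `ports` holds one UDP port number per captured packet
    (hypothetical input format, not a Wireshark API).
    """
    counts = Counter(ports)
    if not counts:
        return {}
    # Assumption: the port with the most packets carries RTP data.
    rtp_port = max(counts, key=counts.get)
    roles = {}
    for port, n in sorted(counts.items()):
        if port == rtp_port:
            role = "RTP (most packets)"
        elif rtp_port % 2 == 0 and port == rtp_port + 1:
            # RFC 3550 recommendation: RTCP on the next higher odd port.
            role = "RTCP (next higher odd port)"
        else:
            role = "unknown"
        roles[port] = (n, role)
    return roles
```

Ports labeled "unknown" still need manual inspection, since the heuristic says nothing about additional sessions on other port pairs.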

First list the number of packets that one participant’s computer received on different UDP ports (by using the Wireshark display filter udp.srcport).
For example, you may observe the distribution of packets on different UDP ports as shown in Table I.

For clarity, indicate in your tables the type of RTP packet payload (audio or video). In the “Methods” section of the report explain how you determined this type (for dynamic types).
Include in your tables the statistics of lost packets (of each type) as well as the Mean Times Between Two Packets of the same type. State what units are these times (e.g., milliseconds) and provide accompanying discussion.
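
As a rough illustration of how the last two table columns might be computed for one SSRC, the sketch below counts lost packets from the 16-bit RTP sequence numbers (expected minus received, in the spirit of Appendix A.3 of RFC-3550) and computes the mean time between consecutive packets. The function names and the input lists are assumptions made for this example; duplicates and heavy reordering are not handled carefully.

```python
def lost_packets(seqs):
    """Estimate lost packets from RTP sequence numbers of one SSRC.

    `seqs` lists the 16-bit sequence numbers in capture order.
    Returns expected - received, allowing for 16-bit wrap-around.
    """
    if not seqs:
        return 0
    base = seqs[0]
    max_seq = seqs[0]
    cycles = 0          # counts wrap-arounds, in units of 65536
    received = 0
    for s in seqs:
        received += 1
        delta = (s - max_seq) & 0xFFFF
        if 0 < delta < 0x8000:      # packet is ahead of the current maximum
            if s < max_seq:         # sequence number wrapped past 65535
                cycles += 1 << 16
            max_seq = s
    expected = cycles + max_seq - base + 1
    return expected - received

def mean_interarrival_ms(times):
    """Mean time between consecutive packets, in milliseconds.

    `times` lists capture timestamps in seconds, in capture order.
    """
    if len(times) < 2:
        return None
    return (times[-1] - times[0]) / (len(times) - 1) * 1000.0
```

For example, the sequence 65534, 65535, 0, 2 spans five expected packets but contains four, so one packet is counted as lost.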

Table I: Incoming packets to Participant 1
Source Port Number	Number of Packets	Packet Type	SSRC	Total Number of Lost Packets	Mean Time Between Two Packets
16384	6314	103 (audio data)	1562797448	…	…
16384	19304	99 (video data)	1172311862	…	…
16385	47	200 (RTCP SR)	1562797448 (reporter), 3459438840 (first source), 489794638 (second source)	…	…
16385	26	201 (RTCP RR)	1172311862 (reporter), 3459438840 (first source), 489794638 (second source)	…	…
16385	7	202 (RTCP SDES)	…	…	…
16386	5	…	…	…	…
16402	12	…	…	…	…

Note that if you are capturing only RTCP Sender Report (SR) packets but no Receiver Report (RR) packets, this may be because receiver reports are piggybacked on SR packets (see the SR packet format in Section 6.4.1 of RFC-3550). This is usually the case when the reporter host is both source and receiver at the same time. In other words, this participant is both sending audio/video to other participants and receiving their audio/video.

The remaining columns of Table I are described in the following subsections.
Also show the distribution of packets from the first participant to the second participant (Table II).

Table II: Outgoing packets from Participant 1
Destination Port Number	Number of Packets	Packet Type	SSRC	Total Number of Lost Packets	Mean Time Between Two Packets
19302	4394	103 (audio data)	3459438840	…	…
19302	11749	99 (video data)	489794638	…	…
19303	31	200 (RTCP SR)	3459438840 (reporter), 1562797448 (first source), 1172311862 (second source)	…	…
19303	7	…	…	…	…
19304	5	…	…	…	…
19305	12	…	…	…	…

You should show at least four tables in your report. For example, if your conferencing session had two participants, show separate tables for incoming and outgoing packets for each participant.
For each table, indicate the host on which the packets were captured and the direction (Inbound versus Outbound).

Note that the number and properties of RTP packets sent by one participant may not exactly correspond to those of packets received by the other participant. First, some packets may be lost in transmission (recall that UDP is an unreliable protocol). Second, the Google server over which the session is run may transcode audio or video from one compression format to another. As a result, more (smaller) packets or fewer (larger) packets may be received by the receiver than what the sender sent. Also, some of the packets’ parameters (such as SSRC identifiers) may be changed.

Note also that in addition to RTP data packets (audio or video), you may capture “RTP event packets” (shown in Wireshark as “RTP EVE”). These events support telephony-related signaling during the session, such as initiation of ringing tones. Check RFC-4733 for RTP payload format for named telephone events. List them in your tables, but do not mix them up with RTP data packets.

2.2 Identifying Payload Types for Audio/Video

The second step is to determine the encoding schemes used to create the packet’s payload for audio/video RTP streams. For example, G.711 is an audio codec.
RTP streams can only carry media from a single source. According to RFC-3550 in Section 2.2, if both audio and video media are used in a conference, they should be transmitted as separate RTP sessions. That is, separate RTP and RTCP packets should be transmitted for each medium using a different UDP-port pair for each.
Again, a particular application may instead implement multiple media streams over the same UDP-port pair.

Using Wireshark packet inspection, determine the codecs used in RTP stream and write them down in your report. The RTP header has a 7-bit field named “payload type”, which indicates the specific encoding scheme used to create the packet’s payload. For example, in Google Talk types 99 and 126 represent video, and types 103 and 105 represent audio. See more details in Google Talk Call Signaling.
Google Hangouts uses the dynamic payload type range from 96 to 127 and does not specify which payload type is used for audio or video. One approach is to guess the RTP packet type (audio/video) based on the packet length.
A more accurate approach is to turn the video option ON or OFF and see which RTP packets disappear or reappear. Similarly, you can “mute” audio and check which RTP packets disappear.
Perhaps the best way is to apply both approaches and see whether large-sized packets disappear when you turn off the video.
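
To support the packet-length approach, the following Python sketch reads the 12-byte fixed RTP header (Section 5.1 of RFC-3550) and reports the mean packet size per payload type. The function names and the input (a list of raw UDP payloads as bytes) are assumptions for illustration. Apply it only to ports that you have already identified as carrying RTP, because an RTCP packet fed to this parser would be misread as RTP.

```python
import struct
from collections import defaultdict

def parse_rtp_header(payload):
    """Parse the fixed RTP header from a UDP payload (bytes).

    Returns None if the payload is too short or the version is not 2.
    """
    if len(payload) < 12:
        return None
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", payload, 0)
    if b0 >> 6 != 2:
        return None
    return {
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,   # the 7-bit payload type field
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

def mean_size_by_payload_type(udp_payloads):
    """Mean UDP payload length (bytes) for each RTP payload type."""
    sizes = defaultdict(list)
    for p in udp_payloads:
        hdr = parse_rtp_header(p)
        if hdr is not None:
            sizes[hdr["payload_type"]].append(len(p))
    return {pt: sum(v) / len(v) for pt, v in sizes.items()}
```

A payload type whose mean size is much larger than the others is a plausible video candidate, which you can then confirm by turning the video off and checking that those packets disappear.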

For each synchronization source or SSRC i (described in the next section) determine the sampling rate τ_{SSRC_i}.
You can read the sampling rate of the used audio and video codecs directly from the SIP invite message.
Alternatively, the sampling rate can also be found experimentally by applying the following calculation to a pcap of sent packets (i.e., departing packets captured at their source, on the sending side). For each pair i, i+1 of consecutive packets, given their RTP timestamps t_{RTP_i} and t_{RTP_{i+1}} (from the RTP header) and their Wireshark departure timestamps t_{ws_i} and t_{ws_{i+1}}, calculate the instantaneous sampling rate τ_{i+1} as:
τ_{i+1} = (t_{RTP_{i+1}} – t_{RTP_i}) / (t_{ws_{i+1}} – t_{ws_i})
The approximate sampling rate τ is calculated by averaging the instantaneous values τ_i over a long interval, say 5 minutes. For example, you may calculate 48.12 kHz and 90.22 kHz instead of the actual values 48 kHz and 90 kHz, because of varying delays in the packet transmission process.
Note that the timestamps found in the RTP header must be converted from random-based numbers to actual time, as described in WS Project #4.
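
The experimental estimate of the sampling rate can be sketched as follows. This is a minimal illustration, assuming the input is a list of (RTP timestamp, Wireshark time in seconds) pairs for one SSRC, captured on the sending side and listed in sending order; the function name is hypothetical. Because only timestamp differences are used, the random initial offset of the RTP timestamp cancels out in this particular calculation.

```python
def estimate_clock_rate(samples):
    """Average the instantaneous rates (RTP ticks per second).

    `samples` is a list of (rtp_timestamp, wireshark_time_seconds)
    tuples for a single SSRC, in sending order.
    """
    rates = []
    for (rtp_a, ws_a), (rtp_b, ws_b) in zip(samples, samples[1:]):
        dt = ws_b - ws_a
        # RTP timestamps are 32-bit, so take the difference modulo 2**32.
        d_rtp = (rtp_b - rtp_a) & 0xFFFFFFFF
        # Skip pairs with equal timestamps (e.g. packets of one video
        # frame), backward jumps, and non-increasing capture times.
        if dt <= 0 or d_rtp == 0 or d_rtp >= 0x80000000:
            continue
        rates.append(d_rtp / dt)
    return sum(rates) / len(rates) if rates else None
```

For bursty video streams the instantaneous values can vary widely, so it may be worth comparing this average with the ratio of the total timestamp advance to the total elapsed time.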

2.3 Synchronization Source (SSRC) Identifiers of RTP Packets

The third step is to determine the synchronization source (SSRC) identifiers for RTP packets. A synchronization source (SSRC) is the source of a stream of RTP data packets, such as a microphone or a camera. Each source must be identified by a different SSRC. All packets from a synchronization source belong to the same timing and sequence number space, so a receiver groups packets by their synchronization source for playback.
A synchronization source may change its data format, e.g., audio encoding, over time. (See more details in Section 3 of RFC-3550, on [Page 9].)

Note that when showing the SSRC identifiers in your report (such as in Table I and Table II), the SSRC identifiers must be aligned with the packet types, so that it is clear which source generated which type of packets.

2.4 Determining SSRC Identifiers in RTCP Packets

All receivers of RTP data packets issue reports about reception quality by sending RTCP report packets to senders of RTP data packets.
If a receiver is also a source of RTP data packets, then it generates sender reports (SR).
According to RFC-3550 (Section 6.4), the only difference between a sender report (SR) and a receiver report (RR), other than the packet type code (“200” versus “201”), is that the sender report includes a 20-byte sender information section for use by active senders. The rest of a sender report is exactly the same as a receiver report.
Note that if a source is only generating RTP data packets and sending them to receivers, but does not receive any RTP data packets from other sources, then sender report packets from this source do not contain receiver report blocks.

For explanation of the meaning of different types of SSRC identifiers in RTCP packets (“reporter,” “first source,” “second source,” …, “n-th source”), check the textbook (Section 3.4) and RFC-3550 (Section 6).
Here is a brief explanation:

Reporter SSRC (or “SSRC of packet sender”) is the SSRC identifier of the entity that generated this RTCP packet and sent it as a report to the data source from which it received RTP data packets.
SSRC_1 is the SSRC identifier of the first source from which this reporter received some RTP data packets, such as audio or video.
SSRC_2 is the SSRC identifier of the second source from which this receiver received some RTP data packets, such as audio or video.
Etc. (Note that there may be more than two sources from which this reporter received RTP data packets during the reporting period.)

Unfortunately, Wireshark currently does not support generating summaries of RTCP packets, unlike for RTP packets. Therefore, to extract this information, you will need to program your own code in a language such as C, C++, Java, or C#.

There are two main aspects of identifying SSRC identifiers in RTCP packets:

The “i-th source SSRC” in “report blocks” must correspond to some source from which the sender of this RTCP packet received RTP packets during the previous period. For example, our Table I shows that Participant 1 received data packets from these sources: 1562797448 and 1172311862.
Therefore, Table II must include these SSRC identifiers in RTCP Receiver Report (RR) packets for “first source” and “second source.” No more and no less.
On the other hand, RTCP Sender Report (SR) packets from Participant 1 must contain the same SSRC identifiers (1562797448 and 1172311862), as shown in Table II, because Participant 1 (the report sender) is using SR packets to report about the RTP packets it received from these sources (see Table I).
Otherwise, this information is not decoded correctly.
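
The consistency requirement above ("no more and no less") can be checked mechanically. The sketch below is a hypothetical helper, not part of any tool: it compares the set of SSRCs seen in incoming RTP packets with the set of SSRCs found in the report blocks of outgoing RTCP packets.

```python
def check_report_sources(received_rtp_ssrcs, report_block_ssrcs):
    """Compare RTP sources received with sources reported in RTCP.

    Both arguments are iterables of SSRC identifiers (integers).
    Empty lists in the result mean the two sets match exactly.
    """
    received = set(received_rtp_ssrcs)
    reported = set(report_block_ssrcs)
    return {
        "missing_from_reports": sorted(received - reported),
        "unexpected_in_reports": sorted(reported - received),
    }

# Values taken from Table I (incoming RTP) and Table II (outgoing SR).
result = check_report_sources(
    [1562797448, 1172311862],
    [1562797448, 1172311862],
)
```

With the example values from the tables, both lists in the result are empty, which is what a correct decoding should produce.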

It is possible that your compound RTCP packets carry reports about more than two sources (“first” and “second”), particularly if the conference session includes more than two participants. You can know this only by checking the packet length field in the first word of the RTCP header. The first word (4 bytes) is followed by 4 bytes of the SSRC of this reporter, which in turn is followed by 20 bytes of “sender info,” and then multiple “report block i,” each 24 bytes long.
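
The layout just described can be walked with a short parser. The sketch below is written in Python only for brevity and is an illustration under stated assumptions: the function name is hypothetical, the input is the raw UDP payload of one compound RTCP packet, and only SR (200) and RR (201) packets are decoded; other packet types are skipped using the length field.

```python
import struct

def parse_rtcp_compound(data):
    """Extract (packet_type, reporter_ssrc, [source_ssrcs]) for each
    SR/RR packet inside a compound RTCP packet given as bytes."""
    reports = []
    offset = 0
    while offset + 8 <= len(data):
        first, ptype, length_words = struct.unpack_from("!BBH", data, offset)
        version = first >> 6
        count = first & 0x1F               # reception report count (RC)
        size = (length_words + 1) * 4      # length field is in 32-bit words minus one
        if version != 2 or offset + size > len(data):
            break
        if ptype in (200, 201):
            (reporter,) = struct.unpack_from("!I", data, offset + 4)
            # An SR carries 20 bytes of sender info before the report blocks.
            block = offset + 8 + (20 if ptype == 200 else 0)
            sources = []
            for _ in range(count):
                if block + 24 > offset + size:
                    break
                (ssrc,) = struct.unpack_from("!I", data, block)
                sources.append(ssrc)
                block += 24                # each report block is 24 bytes
            reports.append((ptype, reporter, sources))
        offset += size
    return reports
```

Each returned tuple gives the reporter SSRC followed by the first, second, ..., n-th source SSRC in the order they appear in the packet.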

When displaying the statistics of RTCP packets in tables, ensure that entries in all rows are properly aligned so that the correspondences between port numbers, payload types, and SSRC’s are clear.
Also, it must be clear which “reporter” SSRC identifier appears with which “i-th source” SSRC identifier.

2.5 Analyzing the Fractions of Different Packet Types

Compare the empirically observed fractions of packets of different type with the expected fractions calculated by the theoretical algorithm for computing the RTCP reporting interval (see Section 6.3.1 and Appendix A.7 of RFC-3550, as well as end of Section 3.4.2 in the textbook). The empirical fractions can be obtained from the statistics in Table I and Table II. If you cannot calculate the exact fractions, find the best approximation.
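
As a starting point for this comparison, the sketch below computes the deterministic RTCP interval following the rules in Section 6.3.1 of RFC-3550, together with the observed packet fractions. The function and parameter names are assumptions for illustration; the RTCP bandwidth is in octets per second and the average RTCP packet size in octets. The actual transmission interval is additionally randomized, so treat the result as an approximation of the expected spacing.

```python
def rtcp_deterministic_interval(members, senders, rtcp_bw,
                                avg_rtcp_size, we_sent, initial=False):
    """Deterministic RTCP interval Td in seconds (RFC 3550, Sec. 6.3.1)."""
    t_min = 2.5 if initial else 5.0
    n = members
    bw = rtcp_bw
    if senders <= 0.25 * members:
        if we_sent:
            bw = 0.25 * rtcp_bw        # senders share 25% of RTCP bandwidth
            n = senders
        else:
            bw = 0.75 * rtcp_bw        # receivers share the remaining 75%
            n = members - senders
    # The interval actually used is Td scaled by a random factor in
    # [0.5, 1.5] and divided by about 1.21828 (e - 3/2).
    return max(t_min, n * avg_rtcp_size / bw)

def packet_fractions(counts):
    """Fraction of packets of each type, from a dict of counts."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

# Counts taken from the example Table I (RTP: 6314 + 19304, RTCP: 47 + 26 + 7).
observed = packet_fractions({"RTP": 25618, "RTCP": 80})
```

Dividing the session duration by the computed interval gives a rough expected number of RTCP packets per participant, which can be set against the observed counts.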

3. Report Preparation and Submission

NOTE: Keep the Wireshark data collected in this project because they will be used in WS Project 4.

As a minimum, include the following information in your report:

Describe the experimental conditions:
- Location where the experiment was done. Were the participants co-located? (e.g., on the University campus, at home, or someplace else?)
- Hardware and software (e.g., Linphone or Apple Facetime) used in the experiment.
- Wired or wireless networks used in the experiment (specify by name, datarate, or other characteristics). Did all participants use the same Wi-Fi network?
- Date of each experiment and the time of day (would you consider it a busy or quiet period? Explain.) Ideally, you should do it both during busy and quiet periods.
Describe your methods (such as the method for identifying payload type in RTP packets: audio versus video, and the method for determining the sampling rates of SSRCs) near the front of the report. Include any experimental results that support your methods.
IP addresses of the participating hosts. Draw the network paths between the participants for each traced route, as was done in WS Project 2.
Statistics of captured packets for each experimental session and all participants (as in Table I and Table II). Use different tables for inbound and outbound packets and include the following:
- Total duration of each session (in minutes).
- Number of captured packets listed by UDP port number, payload type and SSRC identifier.
- Payload type numbers of RTP and RTCP packets.
- Type of media in RTP data packets (audio, video), and the compression coding scheme.
Ensure that entries in all rows are properly aligned so that the correspondences between port numbers, payload types, and SSRC’s are clear.
All filters used for filtering the packets and description of what each filter does. Describe the exact procedure how you decoded UDP packets into RTP/RTCP and how you obtained the statistics shown in your tables.
If you developed your own programs to extract statistics from the captured packets include your source code at the end of the report.
The code must be properly described and commented to help the grader to understand it.
Analysis of the correspondence between the observed and theoretically-expected fractions of packets of different types.
List of references used during the data analysis and report preparation, such as websites, blogs, books, etc.

For all charts provide the measurement units for both horizontal and vertical axes. The units should be shown either in the chart itself or in its caption.
Discuss the differences in results for sessions recorded during periods of low-intensity network traffic versus the periods when the network is busy for wired and wireless scenarios.

The items listed above form just a minimum requirement for the report and can be satisfied to a different degree. Only the students who have performed the greatest number of experiments and provided the most extensive analysis and discussion of their results shall receive the maximum score (100%).

The report format is the same as for project 1.

The submission deadline is given on the course syllabus page. (Only PDF format will be accepted!)


Last Modified: Wed Oct  8 23:49:30 EDT 2014
Maintained by: Ivan Marsic