VoIP-Voice over Internet Protocol

1. Introduction

The evolution of communication systems dates back centuries, to when people first developed means of communicating over a distance. These methods led to the development of the postal system, which is a very slow means of communication. Alexander Graham Bell's invention of the telephone made it possible to transmit voice over wires, and the news sent shock waves around the world. This led to the deployment of telephone systems, and within a matter of decades the telephone was in households all over the world.

The most widely accepted and fully developed means of communication is the telephone. The telephone was designed for the transmission of analog voice signals. The telephone system creates a “dedicated line” between the calling parties. By dedicated line I mean that a specific channel is allocated to the call for the transmission of voice. This channel is shared with no one else until the caller disconnects.

The Internet came into its own in the mid-1980s, with great potential to change society. It offered services such as FTP, e-mail and remote login. Within seven years of its inception, the Internet had grown to span hundreds of individual networks located throughout the United States and Europe, connecting nearly 20,000 computers at universities, government and corporate research laboratories. Both the size and the use of the Internet continued to grow much faster than anticipated. By late 1987, it was estimated that the growth had reached 15% per month. By 1994 the global Internet reached over 3 million computers in 61 countries.

At the corporate level, a company had to pay both the telephone company for telephone service and an ISP for Internet access. The Internet is a cheap communication medium, but the same is not true of the telephone system. With globalization, companies throughout the world needed to make a large number of international calls, and this placed a heavy financial burden on them.

2. Traditional PSTN

The public telephone network and the equipment that makes voice transmission possible are taken for granted in most parts of the world. Availability of a telephone and access to a high-quality worldwide network is considered to be essential in modern society. The PSTN was originally designed for transmission of voice only. There is no problem with the quality of service it provides. The telephone system is expected to operate even when the power is off. These features of the telephone system have led to its wide acceptance. The system that has developed has a large infrastructure and a huge capital investment behind it.

But with the developments in modern telecommunication technologies, this system shows some problems:

It relies on dedicated-line technology.
It uses old analog circuit-switching technology.
It has a high cost of operation.

The number of users and the need for communication are growing at an explosive rate, which has put constraints on the plain old telephone service (POTS). This system “dedicates” an entire channel to the calling party. This is a problem in modern systems, because channels are limited and must be used efficiently.

Old analog systems have costly equipment and support a limited number of users, so the cost of operating them is high. This places a cost burden on the user. A subscriber has to pay not only for the time spent talking but also for the time spent silent. That is, if a subscriber makes an international call and says nothing, he still has to pay for the duration of the call.

All the above reasons have led to high telephone bills in the corporate world.

3. Dawn of Packet Switching Networks

Over the past decade, the telecommunications industry has witnessed rapid changes in the way people and organizations communicate. Many of these changes spring from the explosive growth of the Internet and from applications based on the Internet Protocol (IP). The Internet has become a ubiquitous means of communication, and the total amount of packet-based network traffic has quickly surpassed traditional voice (circuit-switched) network traffic.

Availability of a telephone and access to a low-cost, high-quality worldwide network is considered to be essential in modern society (telephones are even expected to work when the power is off). Anything that would jeopardize this is usually treated with suspicion. There is, however, a paradigm shift beginning to occur, since more and more communication is in digital form and transported via packet networks such as IP, ATM, and Frame Relay. Since data traffic is growing much faster than telephone traffic, there has been considerable interest in transporting voice over data networks (as opposed to the more traditional data over voice networks).

One legacy of the Internet bubble is the idea that Internet Protocol (IP) can be used for more than just basic exchange of files and Web content. IP offers enormous potential for communications. E-mail was the first (and is still the most) successful communication application that leveraged the Internet, and instant messaging has since transformed the communication habits of many workgroups. Two features of packet networks make this possible:

Connectionless Packet Delivery Service: -Connectionless delivery is an abstraction of the service that most packet-switching networks offer. It means simply that a TCP/IP internet routes small messages from one machine to another based on address information carried in the message. Because the connectionless service routes each packet separately, it doesn’t guarantee reliable, in-order delivery.
Low cost: -The cost of operation also decreases, as one channel is now shared by many users rather than one. The packets from all users are placed one after another on the channel and then routed to their destinations. This divides the cost among many users.

4. VoIP (Voice over Internet Protocol)

We now know that there are two methods of communication: circuit-switched networks and packet-switched networks. To understand the advantages of one over the other, we need a complete comparison of the two technologies.

Description	PSTN (Circuit Switched)	Internet(Packet switched)
Designed for	Voice only	Packetized data, voice & video
Bandwidth Assignment	64 Kbps (Dedicated)	Full-line bandwidth over a period of time
Delivery	5.40ms(Distance dependent)	Not predictable (usually more than PSTN)
Cost for the Service	Per-minute charge	Monthly Flat rate for Access
Voice quality	Toll Quality	Depends on customer Equipment
Connection Type	Telephone, PBX, Switches with frame relay and backbone	Modems, T1/E1,Gateways, switches, ISDN, Bridges, Routers, Backbone.
Quality of Service (QoS)	Real time delivery	Not Real time delivery
Network Characteristics (Hardware)	Switching systems for assigned bandwidth	Routers & Bridges for layer 3 & 2 Switching
Network Characteristics (Software)	Homogeneous	Various interoperable software systems
Access Points	Telephone, PBX, PABX, ISDN, Switches, High-speed trunks.	Modems, ISDN, T1/E1 gateways, high-speed DSL, Cable Modems

Table No. 1. Features of PSTN and Internet.

The advantages of reduced cost and bandwidth savings of carrying voice over packet networks are associated with some quality of service issues unique to packet networks. These issues are explored below.

Delay
Delay causes two problems -- echo and talker overlap. Echo is caused by the signal reflections of the speaker's voice from the far end telephone equipment back into the speaker's ear. Echo becomes a significant problem when the round trip delay becomes greater than 50 milliseconds. Since echo is perceived as a significant quality problem, Voice over Packet systems must address the need for echo control and implement some means of echo cancellation.

Talker overlap (or the problem of one talker stepping on the other talker's speech) becomes significant if the one-way delay becomes greater than 250 msec. The end-to-end delay budget is therefore the major constraint and driving requirement for reducing delay through a packet network.

Following are sources of delay in an end to end Voice over Packet call:
1. Accumulation Delay (sometimes called algorithmic delay): This delay is caused by the need to collect a frame of voice samples to be processed by the voice coder. It is related to the type of voice coder used and varies from a single sample time (125 microseconds, i.e. 0.125 milliseconds, at 8,000 samples per second) to many milliseconds. A representative list of standard voice coders and their frame times follows:

G.726 ADPCM (16, 24, 32, 40 Kbps) - 125 microseconds
G.728 - LD-CELP(16 Kbps) - 2.5 milliseconds
G.729 - CS- ACELP (8 Kbps) - 10 milliseconds
G.723.1 - Multi Rate Coder (5.3, 6.3 Kbps) - 30 milliseconds

2. Processing Delay: This delay is caused by the actual process of encoding and collecting the encoded samples into a packet for transmission over the packet network. The encoding delay is a function of both the processor execution time and the type of algorithm used. Often, multiple voice coder frames will be collected in a single packet to reduce the packet network overhead. For example, three frames of G.729 codewords, equaling 30 milliseconds of speech, may be collected and packed into a single packet.

3. Network Delay: This delay is caused by the physical medium and protocols used to transmit the voice data, and by the buffers used to remove packet jitter on the receive side. Network delay is a function of the capacity of the links in the network and the processing that occurs as the packets transit the network. The jitter buffers add delay which is used to remove the packet delay variation that each packet is subjected to as it transits the packet network. This delay can be a significant part of the overall delay since packet delay variations can be as high as 70-100 msec in some Frame Relay networks and IP networks.
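
The delay sources listed above add up to a one-way delay budget that can be checked against the 250 msec talker-overlap limit. The following is a minimal sketch of that arithmetic. Only the 10 msec G.729 frame time, the three-frames-per-packet example and the 250 msec limit come from the text; the processing, network and jitter-buffer figures, and the function name, are illustrative assumptions.

```python
# One-way delay budget for a Voice over Packet call (illustrative figures).
TALKER_OVERLAP_LIMIT_MS = 250.0  # one-way limit quoted in the text


def one_way_delay_ms(frame_ms, frames_per_packet, processing_ms,
                     network_ms, jitter_buffer_ms):
    """Sum the delay contributions described in the list above."""
    accumulation_ms = frame_ms * frames_per_packet  # time to collect the frames
    return accumulation_ms + processing_ms + network_ms + jitter_buffer_ms


# Three 10 ms G.729 frames per packet; the other three values are assumptions.
total = one_way_delay_ms(frame_ms=10.0, frames_per_packet=3,
                         processing_ms=10.0, network_ms=60.0,
                         jitter_buffer_ms=80.0)
within_budget = total <= TALKER_OVERLAP_LIMIT_MS
print(f"one-way delay: {total:.0f} ms, within budget: {within_budget}")
```

With these assumed figures the jitter buffer is the largest single contribution, which is why the adaptation schemes discussed under Jitter matter.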

Jitter
The delay problem is compounded by the need to remove jitter, a variable inter-packet timing caused by the network a packet traverses. Removing jitter requires collecting packets and holding them long enough to allow the slowest packets to arrive in time to be played in the correct sequence. This causes additional delay.

The two conflicting goals of minimizing delay and removing jitter have engendered various schemes to adapt the jitter buffer size to match the time varying requirements of network jitter removal. This adaptation has the explicit goal of minimizing the size and delay of the jitter buffer, while at the same time preventing buffer underflow caused by jitter.

Two approaches to adapting the jitter buffer size are detailed below. The approach selected will depend on the type of network the packets are traversing.

1. The first approach is to measure the variation of packet level in the jitter buffer over a period of time, and incrementally adapt the buffer size to match the calculated jitter. This approach works best with networks that provide a consistent jitter performance over time, such as ATM networks.

2. The second approach is to count the number of packets that arrive late and create a ratio of these packets to the number of packets that are successfully processed. This ratio is then used to adjust the jitter buffer to target a predetermined allowable late packet ratio. This approach works best with the networks with highly variable packet inter-arrival intervals, such as IP networks.
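
The second approach can be sketched in a few lines. This is a hypothetical illustration, not a description of any particular product: the class name, the 1% target ratio, the 10 msec step, the window of packets per adaptation and the size limits are all assumptions chosen for the example.

```python
class LateRatioJitterBuffer:
    """Adapts the playout delay toward a target late-packet ratio (sketch)."""

    def __init__(self, delay_ms=40.0, target_late_ratio=0.01,
                 step_ms=10.0, min_ms=10.0, max_ms=200.0, window=100):
        self.delay_ms = delay_ms
        self.target = target_late_ratio
        self.step_ms = step_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.window = window
        self.late = 0
        self.on_time = 0

    def on_packet(self, delay_variation_ms):
        """Record one packet; it is late if its delay variation exceeds the buffer."""
        if delay_variation_ms > self.delay_ms:
            self.late += 1
        else:
            self.on_time += 1
        if self.late + self.on_time >= self.window:
            self._adapt()

    def _adapt(self):
        # Ratio of late packets to successfully processed packets.
        ratio = self.late / max(self.on_time, 1)
        if ratio > self.target:
            self.delay_ms = min(self.delay_ms + self.step_ms, self.max_ms)
        elif ratio < self.target / 2:
            self.delay_ms = max(self.delay_ms - self.step_ms, self.min_ms)
        self.late = self.on_time = 0


buf = LateRatioJitterBuffer(delay_ms=40.0, window=10)
for variation in [5, 10, 60, 70, 15, 65, 20, 80, 10, 5]:
    buf.on_packet(variation)
print(buf.delay_ms)  # the buffer grows because 4 of 10 packets were late
```

The buffer grows quickly when too many packets are late and shrinks only when the late ratio falls well below the target, which avoids oscillating around the threshold.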

In addition to the techniques described above, the network must be configured and managed to provide minimal delay and jitter, enabling a consistent quality of service.

Lost Packet Compensation
Lost packets can be an even more severe problem, depending on the type of packet network that is being used. Because IP networks do not guarantee service, they will usually exhibit a much higher incidence of lost voice packets than ATM networks. In current IP networks, all voice frames are treated like data. Under peak loads and congestion, voice frames will be dropped equally with data frames. The data frames, however, are not time sensitive and dropped packets can be appropriately corrected through the process of retransmission. Lost voice packets, however, cannot be dealt with in this manner.

Some schemes used by Voice over Packet software to address the problem of lost frames are:

1. Interpolate for lost speech packets by replaying the last packet received during the interval when the lost packet was supposed to be played out. This scheme is a simple method that fills the time between non-contiguous speech frames. It works well when the incidence of lost frames is infrequent. It does not work very well if there are a number of lost packets in a row or a burst of lost packets.

2. Send redundant information at the expense of bandwidth utilization. The basic approach replicates and sends the nth packet of voice information along with the (n+1)th packet. This method has the advantage of being able to exactly correct for the lost packet. However, this approach uses more bandwidth and also creates greater delay.

3. A hybrid approach uses a much lower bandwidth voice coder to provide redundant information carried along in the (n+1)th packet. This reduces the problem of the extra bandwidth required, but fails to solve the problem of delay.
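
The first scheme, replaying the last packet received, is simple enough to show directly. In this minimal sketch a lost frame is represented by None, and the function name and the choice of silence before the first frame arrives are assumptions made for the example.

```python
def conceal_losses(frames):
    """Replace each lost frame (None) with the last frame received.

    `frames` is the playout sequence in order; None marks a frame that
    did not arrive in time. Losses before the first frame become silence.
    """
    frame_len = next((len(f) for f in frames if f is not None), 0)
    last = bytes(frame_len)  # all-zero frame used until real audio arrives
    played = []
    for frame in frames:
        if frame is not None:
            last = frame
        played.append(last)
    return played


received = [b"\x01\x01", None, b"\x03\x03", None, None]
played = conceal_losses(received)
print(played)
```

The last two entries show the weakness noted above: a burst of losses repeats the same frame several times, which becomes audible.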

Echo Compensation
Echo in a telephone network is caused by signal reflections generated by the hybrid circuit that converts between a 4-wire circuit (a separate transmit and receive pair) and a 2-wire circuit (a single transmit and receive pair). These reflections of the speaker's voice are heard in the speaker's ear. Echo is present even in a conventional circuit switched telephone network. However, it is acceptable because the round trip delays through the network are smaller than 50 msec. and the echo is masked by the normal side tone every telephone generates.

Echo becomes a problem in Voice over Packet networks because the round trip delay through the network is almost always greater than 50 msec. Thus, echo cancellation techniques are always used. ITU standard G.165 defines performance requirements that are currently required for echo cancellers. The ITU is defining much more stringent performance requirements in the G.IEC specification.

Echo is generated toward the packet network from the telephone network. The echo canceller compares the voice data received from the packet network with voice data being transmitted to the packet network. The echo from the telephone network hybrid is removed by a digital filter on the transmit path into the packet network.
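
The digital filter in an echo canceller is usually adaptive. The text does not name an algorithm, so the sketch below assumes normalized least mean squares (NLMS), a common choice, and a made-up hybrid whose echo is the far-end signal delayed by three samples at 40% level. The function name, tap count and step size are likewise assumptions.

```python
import random


def nlms_echo_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the near-end signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what is actually sent into the packet network
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical hybrid: echo is the far-end signal delayed 3 samples, at 40% level.
echo = [0.0] * 3 + [0.4 * s for s in far[:-3]]
out = nlms_echo_cancel(far, echo)
tail_echo = sum(s * s for s in echo[-200:])
tail_residual = sum(s * s for s in out[-200:])
print(tail_residual < 0.01 * tail_echo)
```

Here the near-end signal contains only echo, so the residual shows directly how much echo remains once the filter has converged. A real canceller must also cope with near-end speech and longer echo paths.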

4.1 What is Voice over Internet Protocol (VoIP)

Organizations around the world want to reduce rising communications costs. The consolidation of separate voice and data networks offers an opportunity for significant savings. Accordingly, the challenge of integrating voice and data networks is becoming a rising priority for many network managers. Organizations are pursuing solutions which will enable them to take advantage of excess capacity on broadband networks for voice and data transmission, as well as utilize the Internet and company Intranets as alternatives to costlier mediums.

A voice over packet application meets the challenges of combining legacy voice networks and packet networks by allowing both voice and signaling information to be transported over the packet network.

The possibility of voice communications traveling over the Internet, rather than the PSTN, first became a reality in February 1995 when Vocaltec, Inc. introduced its Internet Phone software, designed to run on a 486/33-MHz (or higher) personal computer (PC) equipped with a sound card, speakers, microphone, and modem. This was the first step towards sending voice over the Internet Protocol, now better known as Voice over IP (VoIP). VoIP refers to the convergence of traditional telephony networks with data networks, utilizing the existing data network infrastructure as the transport system for both services. To achieve this convergence, technology has been developed to take a voice signal, which originates as an analogue signal, and transport it within a digital medium.

In VoIP systems, analog voice signals are digitized and transmitted as a stream of packets over a digital data network. IP networks allow each packet to independently find the most efficient path to the intended destination, thereby best using the network resources at any given instant. The packets associated with a single source may thus take many different paths to the destination in traversing the network, arriving with different end-to-end delays, arriving out of sequence, or possibly not arriving at all. At the destination, however, the packets are re-assembled and converted back into the original voice signal. VoIP technology ensures proper reconstruction of the voice signals, compensating for echoes made audible due to the end-to-end delay, for jitter, and for dropped packets.

VoIP supplies many unique capabilities to the carriers and customers who depend on IP or other packet-based networks. The most important benefits include the following:

Cost savings— By moving voice traffic to IP networks, companies can reduce or eliminate the toll charges associated with transporting calls over the Public Switched Telephone Network (PSTN). Service providers and end users can also conserve bandwidth by investing in additional capacity only when it is needed. This is made possible by the distributed nature of VoIP and by reduced operations costs as companies combine voice and data traffic onto one network.
Open standards and multivendor interoperability— By adopting open standards, both businesses and service providers can purchase equipment from multiple vendors and eliminate their dependency on proprietary solutions.
Integrated voice and data networks— By making voice "just another IP application," companies can build truly integrated networks for voice and data. These integrated networks not only provide the quality and reliability of today's PSTN, they also enable companies to quickly and flexibly take advantage of new opportunities within the changing world of communications.

As these packet telephony networks grew and interconnection dependencies emerged, it became clear that the industry needed standard VoIP protocols. Several groups took up the challenge, resulting in independent standards, each with its own unique characteristics. In particular, network equipment suppliers and their customers were left to sort out the similarities and differences between four different signaling and call-control protocols for VoIP:

H.323
Media Gateway Control Protocol (MGCP)
Session Initiation Protocol (SIP)
H.248/Megaco

4.2 Modes of Operation

There are three different modes of operation of VoIP. These modes define the ways in which VoIP can be implemented. They are explained as follows.

PC-to-PC: - This is the simplest way of carrying voice in VoIP. In this mode a Personal Computer (PC) is used at both the source and the destination end of the call. The source PC takes the analog voice signal and converts it into digital data. This digital data is then put into IP packets and routed to its destination. At the destination these packets are converted back into an analog signal and output to the speaker.

Following is the basic circuit of this mode.

Figure No.1 PC-to-PC Mode

PC-to-Phone: - Computers are not the basic voice-carrying equipment. Telephones have been in use for a long time, and almost all of a company's customers have one. Therefore an interface to the PSTN is very important in a VoIP system. This is the second mode of operation of VoIP.

In this mode the computer (caller) converts the voice to digital data and loads it into IP packets, which are transmitted over the Internet to the local ISP of the destination, where the data is interfaced to the telephone network. At the interface the digital data must be converted back into analog voice. The call, which until now was routed by IP address, must also be mapped to the telephone number of the destination party. All these functions are performed at the interface by the IP gateway. The signal is then sent over the PSTN to the destination party.

Figure No. 2. PC-to-Phone Mode

Phone-to-Phone Mode : -This is the most important mode of VoIP. It utilizes all the features of VoIP to the fullest extent. In this mode the costly toll of the international carriers is bypassed completely.

In this mode the gateways at both the source and the destination ISPs perform all the functions. The source gateway does the analog-to-digital conversion and loads the data into IP packets for transmission over the Internet to the called party's local ISP (the destination ISP). There the reverse process is carried out to put the call back on the PSTN, and the call finally reaches the called party.

Figure No. 3. Phone-to-Phone Mode

5. Basics of IP Telephony

So far we have looked at the technology called Voice over Internet Protocol (VoIP). We will now look, one by one, at the basics involved in this technology.

5.1 ADC & DAC Conversion of Voice

Voice originates naturally as an analog signal. This analog signal cannot be transmitted over a data network as it is; traditionally it was transmitted over the circuit-switched PSTN. To be transmitted over the Internet, voice must be converted into digital format. For this we need analog-to-digital converters (ADC) at the transmitter and digital-to-analog converters (DAC) at the receiver.

The converters are constrained by the delay they introduce. A number of different coding methods exist. They are as follows.

G.726 ADPCM.
G.728 - LD-CELP.
G.729 - CS- ACELP.
G.723.1 - Multi Rate Coder.
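
The analog-to-digital and digital-to-analog steps described in this section can be illustrated with plain linear PCM, before any of the coders listed above is applied. This is a rough sketch: the 8,000 samples per second rate and 16-bit word size are assumed telephony values, and the function names and test tone are made up for the example.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumed telephony sampling rate
BITS_PER_SAMPLE = 16    # assumed linear PCM word size
FULL_SCALE = 2 ** (BITS_PER_SAMPLE - 1) - 1


def digitize(signal, duration_s):
    """Sample a signal (function of time in seconds, range -1..1) and
    quantize each sample to a signed 16-bit integer (the ADC step)."""
    n = int(round(duration_s * SAMPLE_RATE_HZ))
    return [int(round(signal(i / SAMPLE_RATE_HZ) * FULL_SCALE)) for i in range(n)]


def reconstruct(samples):
    """Map quantized samples back to the -1..1 range (the DAC step, unfiltered)."""
    return [s / FULL_SCALE for s in samples]


def tone(t):
    """A 1 kHz test tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


pcm = digitize(tone, duration_s=0.01)          # 10 ms of audio
bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE    # bits per second before compression
print(len(pcm), bit_rate)
```

Ten milliseconds of speech becomes 80 samples, and the uncompressed stream runs at 128 Kbps, which is the figure the voice coders then reduce.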

5.2 Compression

When voice is converted into digital format it contains many redundant bits. This increases the amount of data to be transmitted, and more data consumes precious bandwidth. So, to increase the capacity of the channel and to serve more users per channel, we need to compress the data.

Consider, for example, a channel of 64 Kbps bandwidth that is used to transmit voice data. If uncompressed voice data is transmitted, it can use all 64 Kbps. If we compress the same voice data, we can transmit it in just 10-12 Kbps. There are many compression technologies on the market.

As a practical example, PCM samples are converted to voice frames and a vocoder compresses each frame. G.729a creates a 10 msec long frame with 10 bytes of speech. It compresses the 128 Kbps linear PCM stream to 8 Kbps.
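
The figures in this example can be verified with a short calculation. In this sketch the helper name is hypothetical, and the 128 Kbps figure is taken to mean 8,000 samples per second at 16 bits per sample, which is an assumption consistent with the text.

```python
def codec_bit_rate_bps(frame_bytes, frame_ms):
    """Bit rate of a voice coder that emits frame_bytes every frame_ms."""
    return frame_bytes * 8 * 1000 / frame_ms


linear_pcm_bps = 8000 * 16                       # assumed: 8 kHz sampling, 16-bit samples
g729a_bps = codec_bit_rate_bps(frame_bytes=10, frame_ms=10)
compression_ratio = linear_pcm_bps / g729a_bps
calls_per_64k = int(64000 // g729a_bps)          # payload only, ignoring packet headers
print(g729a_bps, compression_ratio, calls_per_64k)
```

The calls-per-channel figure counts coder payload only. Once each packet also carries protocol headers, the usable number is lower.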

5.3 Embedding in Packets

The most important part of VoIP is the IP, which stands for Internet Protocol. Speech is real-time data. That means speech loses its meaning if it is delayed beyond a limit. But the Internet is a very unreliable medium and its delay cannot be predicted. Ordinary protocols used for data transmission are built on the requirement that data be received without any error at the receiver, and to achieve this they retransmit data. This would be pointless for voice transmission, since a retransmitted packet arrives too late to be played, and an error of a few bits does not noticeably affect the output speech.

Following are a few points about the protocol used in transmission of Speech.

Network Quality of Service: A TCP/IP network must have mechanisms in place to prioritize VoIP traffic above all other traffic on the network (except other real-time application traffic such as video). The Resource Reservation Protocol (RSVP) has been designed to reserve resources across the network for real-time transmissions, while the Real-time Transport Protocol (RTP) carries the real-time data itself. Quality of Service (QoS) mechanisms within TCP/IP have also recently been implemented by a number of TCP/IP router and switch vendors. ATM networks and, to a lesser degree, Frame Relay networks have QoS functionality already built into them. Generally, TCP/IP routers and switches will use a priority queuing system to buffer non-VoIP packets and send them only after all of the VoIP packets have been transferred to the next network element. Large IP packets (non-VoIP) are buffered to the side so that the smaller VoIP packets can be sent on time. Other mechanisms will predict times of congestion over the wide area link and throttle back bandwidth demands from non-real-time applications.

IP Packet Precedence: IP precedence bits should be set at the edge of the network, with VoIP traffic given the highest possible precedence. Data networks with protocols other than TCP/IP running on them are not as well suited for VoIP as a purely TCP/IP network because it’s more difficult to give traffic priority when it is not a TCP/IP packet. Whenever possible, all bridged traffic should be segregated from the TCP/IP WAN links where VoIP will flow. Bridging of any sort over the wide area will hinder TCP/IP QoS implementations.

Weighted Fair Queuing: Weighted fair queuing (WFQ) is a buffering mechanism that will buffer TCP/IP packets, classify them based on a number of different criteria, and then de-buffer the packets based on IP precedence or traffic flow. The classifications available are: source and destination address, protocol, and session identifier. During the de-queuing procedure, packets are given privilege based on the three IP precedence bits in the packet’s IP header.
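
The priority queuing idea described in the points above can be sketched as a strict priority queue keyed on the three IP precedence bits. This is deliberately simpler than weighted fair queuing, which also shares bandwidth between flows. The class name and the use of precedence 5 for voice are assumptions made for the example.

```python
import heapq
import itertools


class PrecedenceQueue:
    """Dequeues packets with the highest IP precedence first (FIFO within a level)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, precedence, packet):
        if not 0 <= precedence <= 7:     # precedence is a 3-bit field
            raise ValueError("IP precedence must be between 0 and 7")
        heapq.heappush(self._heap, (-precedence, next(self._order), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PrecedenceQueue()
q.enqueue(0, "ftp-1")
q.enqueue(5, "voice-1")   # assumed precedence value for voice traffic
q.enqueue(0, "ftp-2")
q.enqueue(5, "voice-2")
sent = [q.dequeue() for _ in range(len(q))]
print(sent)
```

Both voice packets leave before either data packet even though a data packet arrived first. Strict priority can starve low-priority traffic, which is one reason routers use weighted schemes instead.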

These mechanisms support the real-time transport protocol used to carry voice, RTP (RFC 1889). We will consider this protocol in detail.

5.4 RTP 1889 Internet Protocol

5.4.1 Internet Protocol

The Internet Protocol is the lowest level protocol considered in this document. It is responsible for the delivery of packets (or datagrams) between host computers. IP is a connectionless protocol, that is, it does not establish a virtual connection through a network prior to commencing transmission; this is the job for higher level protocols.

IP makes no guarantees concerning reliability, flow control, error detection or error correction. The result is that datagrams could arrive at the destination computer out of sequence, with errors or not even arrive at all. Nevertheless, RTP 1889 succeeds in making the network transparent to the upper layers involved in voice transmission through an IP based network.

Any Voice over IP transmission must use IP (by definition). IP is not well suited to voice transmission. Real time applications such as voice and video require guaranteed connection with consistent delay characteristics. Higher layer protocols address these issues (to a certain extent).

The diagram below shows the header that precedes the data payload to be transmitted. In its most basic form, the header comprises 20 octets. There are optional fields which can be appended to the basic header, but these offer additional capabilities which are not necessary for VoIP transmission as described in this document.

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31
	Octet 1,5,9...								Octet 2,6,10...								Octet 3,7,11...								Octet 4,8,12...
1 - 4	Version				IHL				Type of service								Total length
5 - 8	Identification																Flags			Fragment offset
9 - 12	Time to live								Protocol								Header checksum
13 - 16	Source address
17 - 20	Destination address

The fields shown are briefly described below:

Version

The version of IP being used. For this format header, the version would be 4.

IHL

The length of the IP header in units of four octets (32 bits). For the basic header shown in this diagram, the value would be 5 (each line in the diagram represents four octets).

Type of service

Specifies the quality of service requested by the host computer sending the datagram. This is not always effectively supported by routers or Internet Service Providers.

Total length

The length of the datagram, measured in octets, including the header and payload.

Identification

As well as handling the addressing of datagrams between two computers (or hosts), IP needs to handle the splitting of data payloads into smaller packages. This process, known as fragmentation, is required because, although a single IP datagram can carry a theoretical maximum payload of 65,515 octets (65,535 octets in total, less the 20-octet basic header), lower link layer protocols such as Ethernet cannot always handle these large packet sizes. This field is a unique reference number assigned by the sending host to aid in the reassembly of a fragmented datagram.

Flags

These flags indicate whether the datagram may be fragmented, and, if it has been fragmented, whether further fragments follow this one.

Fragment offset

This field indicates where in the datagram this fragment belongs. It is measured in units of 8 octets (64 bits).

Time to live

This field indicates the maximum time the datagram is permitted to remain in the internet system. This parameter ensures that a datagram which cannot reach its destination host is given a finite lifetime.

Protocol

This indicates the higher level protocol in use for this datagram. Numbers have been assigned for use with this field to represent such transport layer protocols as TCP and UDP.

Header checksum

This is a checksum covering the header only.

Source address

The IP address of the host which generated this datagram. IPv4 addresses are 32 bits in length and, when written or spoken, a dotted decimal notation is used (e.g.: 192.168.0.1).

Destination address

The IP address of the destination host.
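
The field layout described above maps directly onto a fixed 20-octet binary structure. The following minimal sketch packs and parses the basic header with Python's standard struct module. The function name and the sample field values are illustrative, and the checksum is left at zero, not computed.

```python
import socket
import struct

IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 octets, network byte order


def parse_ipv4_header(data):
    """Split the basic 20-octet IPv4 header into the fields described above."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = IPV4_HEADER.unpack_from(data)
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,
        "type_of_service": tos,
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "time_to_live": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Sample header: version 4, IHL 5, "don't fragment" flag set, protocol 17 (UDP).
sample = IPV4_HEADER.pack(0x45, 0, 60, 1, 0x4000, 64, 17, 0,
                          socket.inet_aton("192.168.0.1"),
                          socket.inet_aton("192.168.0.2"))
fields = parse_ipv4_header(sample)
print(fields["version"], fields["ihl"], fields["protocol"], fields["source"])
```

Version and IHL share the first octet, and the flags share a 16-bit word with the fragment offset, which is why both pairs are separated with shifts and masks.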

5.4.2 UDP (User Datagram Protocol)

Generally, there are two protocols available at the transport layer when transmitting information through an IP network. These are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). Both protocols enable the transmission of information between the correct processes (or applications) on host computers. These processes are associated with unique port numbers (for example, the HTTP application is usually associated with port 80).

TCP is a connection oriented protocol; that is, it establishes a communications path prior to transmitting data. It handles sequencing and error detection, ensuring that a reliable stream of data is received by the destination application.

Voice is a real-time application, and mechanisms must be in place which ensure that information is received in the correct sequence, reliably and with predictable delay characteristics. Although TCP would address these requirements to a certain extent, some of these functions are left to the layer above the transport layer. Therefore, for the transport layer, TCP is not used, and the alternative protocol, UDP, is commonly used.

In common with IP, UDP is a connectionless protocol. UDP routes data to its correct destination port, but does not attempt to perform any sequencing, or to ensure data reliability.

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31
	Octet 1,5								Octet 2,6								Octet 3,7								Octet 4,8
1 - 4	Source port																Destination port
5 - 8	Length																Checksum

The fields shown are briefly described below:

Source port

Identifies the higher layer process which originated the data.

Destination port

Identifies the higher layer process to which this data is being transmitted.

Length

The length in octets of the UDP header and payload (minimum 8).

Checksum

Optional field supporting error detection.
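
The four UDP fields can be packed and unpacked in the same way as the IP header. In this minimal sketch the function names are hypothetical, port 5004 is just an example value, and the checksum is set to zero, which IPv4 accepts as "no checksum computed".

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def build_udp_datagram(src_port, dst_port, payload):
    """Prefix the payload with an 8-octet UDP header (checksum left at 0)."""
    length = UDP_HEADER.size + len(payload)  # header plus payload, in octets
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram):
    """Return (source port, destination port, length, checksum, payload)."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_udp_datagram(5004, 5004, b"\x00" * 10)
print(parse_udp_datagram(datagram))
```

A datagram with no payload has a length of exactly 8, the minimum noted above.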

5.4.3 RTP (Real-time Transport Protocol)

Real time applications require mechanisms to be in place to ensure that a stream of data can be reconstructed accurately. Datagrams must be reconstructed in the correct order, and a means of detecting network delays must be in place.

Jitter is the variation in delay times experienced by the individual packets making up the data stream. In order to reduce the effects of jitter, data must be buffered at the receiving end of the link so that it can be played out at a constant rate. To support this requirement, two protocols have been developed. These are RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol).

RTCP provides feedback on the quality of the transmission link. RTP transports the digitized samples of real time information. RTP and RTCP do not reduce the overall delay of the real time information. Nor do they make any guarantees concerning quality of service.
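To make the idea of jitter concrete, the sketch below computes a running jitter estimate from send and arrival times. It uses the smoothed interarrival jitter formula from the RTP specification, J = J + (|D| - J)/16, where D is the change in transit time between two successive packets. The function name and the sample times are invented for this example.

```python
def interarrival_jitter(send_times, arrival_times):
    """Return the running jitter estimate after each packet.

    Both lists hold times in the same unit (for example milliseconds).
    """
    jitter = 0.0
    estimates = []
    for i in range(1, len(send_times)):
        # Change in transit time between this packet and the previous one.
        d = (arrival_times[i] - arrival_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Smooth with gain 1/16 so that one late packet does not dominate.
        jitter += (abs(d) - jitter) / 16.0
        estimates.append(jitter)
    return estimates


send = [0, 20, 40, 60]        # one packet every 20 ms
steady = [50, 70, 90, 110]    # constant 50 ms network delay
uneven = [50, 75, 90, 118]    # delay varies from packet to packet
print(interarrival_jitter(send, steady))
print(interarrival_jitter(send, uneven))
```

A constant delay, however large, gives a jitter of zero. Only the variation in delay contributes, which is why buffering at the receiver can smooth it out.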

The RTP header, which precedes the data payload, is shown in the diagram below:

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31
	Octet 1,5,9								Octet 2,6,10								Octet 3,7,11								Octet 4,8,12
1 - 4	V=2		P	X	CC				M	PT							Sequence number
5 - 8	Timestamp
9 - 12	Synchronization source (SSRC) number

Version

Identifies the version of RTP (currently 2).

Padding

A flag which indicates whether the packet has been appended with padding octets after the payload data.

X (Header extension)

Indicates whether an optional header extension has been added after the fixed RTP header.

CC (CSRC count)

Gives the number of contributing source (CSRC) identifiers that follow the fixed header. Although not shown on this header diagram, the 12 octet header can optionally be expanded to include a list of up to 15 contributing sources, the largest count the 4-bit CC field can hold. Contributing sources are added by mixers, and are only relevant for conferencing applications where elements of the data payload have originated from different computers. For point to point communications, CSRCs are not required.

M (Marker)

Allows significant events such as frame boundaries to be marked in the packet stream.

PT (Payload type)

This field identifies the format of the RTP payload and determines its interpretation by the application.

Sequence number

A unique reference number which increments by one for each RTP packet sent. It allows the receiver to reconstruct the sender's packet sequence.

Timestamp

The sampling instant of the first octet of payload in this packet. This field allows the receiver to buffer and play out the data in a continuous stream.

Synchronization source (SSRC) number

A randomly chosen number which identifies the source of the data stream.
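The field layout described above can be sketched in Python as follows. This is a minimal illustration covering only the 12 octet fixed header, without a CSRC list or header extension. The function names and the example values (payload type 0, SSRC 0x12345678) are chosen only for this example.

```python
import struct

# Two single octets of flags, a 16-bit sequence number,
# a 32-bit timestamp and a 32-bit SSRC: 12 octets in total.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def build_rtp_header(payload_type, sequence, timestamp, ssrc,
                     marker=False, padding=False, extension=False, csrc_count=0):
    """Pack the fixed RTP header (version 2)."""
    version = 2
    byte0 = (version << 6) | (int(padding) << 5) | (int(extension) << 4) | (csrc_count & 0x0F)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_FIXED_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                                 timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Unpack the fixed RTP header into a dict of named fields."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = build_rtp_header(payload_type=0, sequence=1, timestamp=160,
                          ssrc=0x12345678, marker=True)
parsed = parse_rtp_header(header)
print(len(header), parsed)
```

The sequence number is masked to 16 bits, so in a long stream it wraps around to zero and the receiver has to allow for that.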

5.4.4 The complete header

The headers of the three payload carrying protocols discussed (IP, UDP and RTP) are sent sequentially before the digitised voice or video samples, which form the payload of the RTP packet.

The result is a 40 octet overhead for every packet of data (20 octets of IP header, 8 of UDP and 12 of RTP):

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31
	Octet 1,5,9...								Octet 2,6,10...								Octet 3,7,11...								Octet 4,8,12...
1 - 4	Version				IHL				Type of service								Total length
5 - 8	Identification																Flags			Fragment offset
9 - 12	Time to live								Protocol								Header checksum
13 - 16	Source address
17 - 20	Destination address
21 - 24	Source port																Destination port
25 -28	Length																Checksum
29 - 32	V=2		P	X	CC				M	PT							Sequence number
33 - 36	Timestamp
37 - 40	Synchronization source (SSRC) number
	The headers are followed by a payload of digitized voice or video samples

5.4.5 RTP payload

The IP, UDP and RTP headers are followed by the RTP data payload. This comprises digitized samples of voice or video. The length of these samples can vary, but for voice, samples representing 20 ms are considered the maximum duration for the payload.

The selection of this payload duration is a compromise between bandwidth requirements and quality. Smaller payloads demand higher bandwidth per channel, because the header length remains at forty octets. However, if payloads are increased, the overall delay of the system will increase, and the system will be more susceptible to the loss of individual packets by the network.
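The trade-off can be worked through numerically. The sketch below assumes an uncompressed 64 kbit/s voice codec, which is an assumption made for this example, together with the 40 octet header overhead described above. It computes the bandwidth one direction of a call needs for several payload durations.

```python
HEADER_OCTETS = 20 + 8 + 12  # IP + UDP + RTP headers


def channel_bandwidth_kbps(payload_ms, codec_kbps=64):
    """Return (payload octets, total kbit/s) for one direction of a call.

    codec_kbps is the assumed codec bit rate before any headers are added.
    """
    payload_octets = codec_kbps * 1000 * payload_ms // (8 * 1000)
    packets_per_second = 1000 / payload_ms
    total_bits_per_second = (HEADER_OCTETS + payload_octets) * 8 * packets_per_second
    return payload_octets, total_bits_per_second / 1000


for ms in (5, 10, 20):
    octets, kbps = channel_bandwidth_kbps(ms)
    print(f"{ms:2d} ms payload: {octets:3d} octets, {kbps:.0f} kbit/s")
```

Under these assumptions a 20 ms payload needs 80 kbit/s, a 10 ms payload 96 kbit/s and a 5 ms payload 128 kbit/s. At 5 ms the headers take up as much bandwidth as the voice itself.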

5.5 Routing of an IP Packet

A communication system is said to supply universal communication service if it allows any host computer to communicate with any other host. To make our communication system universal, it needs a globally accepted method of identifying each computer that attaches to it. This global identification is done by the IP address. Every system must have a unique IP address. Addresses are assigned under the Internet Assigned Numbers Authority (IANA), which has ultimate control over the numbers assigned and sets policy.

The above diagram shows the overall process of routing a packet. You can note the following things in the figure.

Each IP machine is assigned a different IP address.
All IP machines connected to the same Ethernet switch have identical address fields except the last.
A router is used to route packets when there is more than one Ethernet switch.
Modems are used to transport data over long distances.

Now we will see in detail how an IP packet is routed.

First the IP machine, a PC in our case, takes the speech signal, converts it into digital data and embeds it in an IP packet.
Depending on the source and destination, the address fields of the packet are filled in.
The loaded packet, carrying the source and destination addresses, the data and other details, is placed on the network.
The packet travels to the Ethernet switch.
At the Ethernet switch the destination address field is checked.
If only the last field differs, with no change in the second-last field, the packet is delivered to the destination IP machine connected to the same Ethernet switch.
If the second-last field is different, the packet is sent to the router.
The router analyses the destination address.
If it matches one of the Ethernet switches connected to the router, the router forwards the packet to that Ethernet switch.
If it does not match, the router forwards the packet to the other routers connected to it.
At the destination router the address field is checked and the packet is routed to the desired Ethernet switch.
At the Ethernet switch the packet is routed to the IP machine.
Finally the packet is received by the destination machine.
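The first decision in the steps above, local delivery or hand-off to the router, can be sketched in Python with the standard ipaddress module. The function name and the addresses are invented for this example. A prefix length of 24 is assumed to model the case where all address fields are the same except the last.

```python
import ipaddress


def next_hop(source, destination, prefix_length=24):
    """Decide where the sender's local network hands the packet.

    prefix_length=24 means the first three address fields identify
    the local network and only the last field identifies the machine.
    """
    local_net = ipaddress.ip_network(f"{source}/{prefix_length}", strict=False)
    if ipaddress.ip_address(destination) in local_net:
        return "deliver via local Ethernet switch"
    return "forward to router"


print(next_hop("192.168.1.10", "192.168.1.25"))  # only the last field differs
print(next_hop("192.168.1.10", "192.168.2.25"))  # second-last field differs
```

A router repeats the same kind of comparison against each network it is attached to, and passes the packet on to another router when none of them matches.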

5.6 Interface with PSTN

We will first understand the need for the interface with the PSTN. The PSTN is a circuit-switched network and, because of its features, it is accepted all over the world. The emerging technology of VoIP, by contrast, is not yet available everywhere. So if a person implements VoIP and wants to call someone, the destination party may have VoIP equipment, but most of the time it does not.

To enable the user to call anyone and also to receive calls from anyone, the VoIP network needs to interface with the PSTN. For this interface we need an IP gateway.

From the above diagram we see that a gateway must sit at the interface between the PSTN and the VoIP network. The following operations are carried out by the IP gateway.

First the IP packets it receives are converted into an analog speech signal.
Then the IP destination address is converted to the telephone number of the called party.
After this the call is established.
Then speech is transferred to the called party.
On the other hand, when a telephone calls an IP machine, the steps are reversed.
The IP gateway converts the analog signal into digital data and loads it into IP packets.
Then it converts the telephone number of the destination into the IP address of the destination.
The packets are then transmitted on the VoIP network.
Each packet finds the IP machine according to its destination address.
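The address conversion the gateway performs in both directions can be pictured as a two-way lookup. The sketch below is only an illustration. The table, the telephone numbers and the IP addresses are all invented, and a real gateway would obtain this mapping from its signalling system rather than from a fixed table.

```python
# Hypothetical directory held by the gateway. All entries are invented.
NUMBER_TO_IP = {
    "0221234567": "192.168.1.10",
    "0227654321": "192.168.2.25",
}
IP_TO_NUMBER = {ip: number for number, ip in NUMBER_TO_IP.items()}


def telephone_to_ip(dialled_number):
    """Telephone calling an IP machine: telephone number to IP address."""
    return NUMBER_TO_IP.get(dialled_number)


def ip_to_telephone(destination_ip):
    """IP machine calling a telephone: IP address to telephone number."""
    return IP_TO_NUMBER.get(destination_ip)


print(telephone_to_ip("0221234567"))
print(ip_to_telephone("192.168.2.25"))
print(telephone_to_ip("0000000000"))  # unknown number gives None
```

When the lookup returns None the gateway has no destination for the call, so it cannot be established.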

5.7 Digital-to-Analog conversion

Until now we have seen the conversion of speech to data, and then the Internet protocols used for speech data transmission. That is, we know in detail every step from the analog speech to the digital IP packets received at the destination.

Now we will look briefly at the internal details of the receiver. The receiver can be of two types: an IP machine such as a PC on the IP network, or a telephone on the PSTN.

PSTN: - There is not much to say about the PSTN receiver, as most of it is covered in the ‘IP gateways’ topic. The final receiver is the telephone set.

IP Machine: - The IP machine is a device which accepts IP packets and converts them into a speech signal. This IP machine can be a PC loaded with soft-phone software. It can also be a special telephone with an IP-to-speech converting circuit and algorithm. Such phones are called IP phones.

With an IP-based voice platform, enterprises can deploy IP phones. These phones support traditional PBX functionalities (call forwarding, camp, hold, caller ID, etc.), but will also be able to integrate with the Internet and Internet-based services (central directory information, Web pushes, e-mail integration, etc.). The phones pictured in the figure are IP phones currently sold by Cisco© and PingTel©. Other companies that sell IP phones today include Nortel©, Siemens©, NEC©, Mitel©, and Avaya©. Most of these phones follow the standards-based approach for convergence. This means a company can choose between phone manufacturers and know that whichever phone it chooses will interoperate with the company’s underlying network. Also, applications created by different vendors will work seamlessly with the IP phone handsets, just as third-party applications can integrate with PDAs today. After all, the network recognizes the IP phone as just another IP end device.

One feature that IP phones enable that is not possible in the traditional PBX world is mobility. By plugging into any Internet connection, IP phones can be used at home (or anywhere in the world) and work the same as in the office. IP phones will enable communication mobility in a manner analogous to logging on to an ISP account (MSN©, AOL©) with a laptop: no matter where you log on, the service is able to identify you and deliver your account-specific, personalized information (e-mail, instant messages, home page configuration, etc.). Every IP phone will have a globally accepted IP address.

6. Requirements in INDIA

From an organizational point of view, if we want to implement VoIP in India we first need to know the terms and conditions laid down by the government concerning it.

Until April 1st, 2001 the transport of speech signals over data networks, that is, the technology of VoIP, was banned in India. That means no network meant to carry data could also carry voice traffic. But this did not mean that voice data could not be transported outside India: a person could send voice data anywhere outside India, but not within India. After this date the restrictions were eased.

Now a person can use the data network to transport speech. But the mode of operation is restricted: a user can only go for PC-to-PC operation.

That means an organization will have to get IP phones or soft-phones on PCs. If the phones are large in number, we need an Ethernet switch; if that is still not enough we need routers; and to transmit the signal we need modems. These modems will be connected to the ISP. For more information refer to Figure No.4.

Also, if the ISP is to be connected to the PSTN, we need IP gateways. But under the law of the land it is illegal to install a gateway in India.

7. Why is VoIP widely accepted?

7.1 VoIP Cost Structure & Channel Utilization: -

Long distance and especially international voice communications can be significantly less expensive when supported by an IP network rather than by the PSTN. Calls supported by VoIP technology are not subject to the same cost structure of access charges, transmission costs, and settlement charges.

Access charges are imposed by the local telephone company to allow long-distance carriers to originate or terminate the local portion of each telephone call. Transmission costs associated with the actual long distance transmission are typically much less thanks to the reduced bandwidth required by the data packets associated with the call. And, finally, the settlement charges associated with international calls are not present when international transmission is carried by an IP network. A call supported by the PSTN involves the establishment and cost of an end-to-end circuit that is maintained for the duration of the call. A call supported by VoIP technology, by contrast, involves the transmission of many individual packets over an IP network. The cost of a VoIP call thus depends in part on the number and size of the packets that must be transmitted; i.e., the bandwidth required. Use of speech compression algorithms can reduce the required bandwidth by a factor of 8 or more. Further bandwidth reductions can be obtained by recognizing and not explicitly transmitting the silences that naturally occur in human speech. These reductions in bandwidth directly translate to a reduction in cost.
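The bandwidth reductions described above can be put into numbers with a short calculation. The sketch assumes an uncompressed voice channel of 64 kbit/s and, for silence suppression, a speech activity of 50%. Both figures are assumptions made for this example, and packet header overhead is ignored.

```python
def voice_bitrate_kbps(uncompressed_kbps=64.0, compression_factor=8.0,
                       speech_activity=1.0):
    """Average voice bit rate after compression and silence suppression.

    speech_activity is the assumed fraction of time the speaker is talking;
    1.0 means silences are transmitted like speech.
    """
    return uncompressed_kbps / compression_factor * speech_activity


print(voice_bitrate_kbps())                     # compression by a factor of 8 only
print(voice_bitrate_kbps(speech_activity=0.5))  # plus silence suppression
```

Under these assumptions compression alone brings the channel down to 8 kbit/s, and suppressing silences halves the average again to 4 kbit/s.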

The total cost per VoIP call is thus due to the costs associated with access to the gateways at both ends and the cost of transmission over the IP network. If originating calls access the VoIP gateway through the PSTN, access costs at the originating end may include the costs of local or long distance connections or the monthly cost of a toll-free access number. Access costs at the terminating end may include the costs of the local or long distance connections associated with terminating a call from the nearest gateway to the destination number.

The cost of transmission over the IP network depends on what IP network is employed. If the Internet is employed as the underlying IP network, then the only cost is the cost of Internet access at each gateway. Costs are higher if a proprietary or leased IP network is employed, but, in return, the network can provide enhanced reliability and assured Quality of Service.

7.2 Simplification.

An integrated infrastructure that supports all forms of communication allows more standardization and reduces the total equipment complement. This combined infrastructure can support dynamic bandwidth optimization and a fault tolerant design. The differences between the traffic patterns of voice and data offer further opportunities for significant efficiency improvements.

7.3 Consolidation.

Since people are among the most significant cost elements in a network, any opportunity to combine operations, to eliminate points of failure, and to consolidate accounting systems would be beneficial. In the enterprise, SNMP-based management can be provided for both voice and data services using VoIP. Universal use of the IP protocols for all applications holds out the promise of both reduced complexity and more flexibility. Related facilities such as directory services and security services may be more easily shared.

7.4 Advanced Applications.

Even though basic telephony and facsimile are the initial applications for VoIP, the longer term benefits are expected to be derived from multimedia and multi-service applications. For example, Internet commerce solutions can combine WWW access to information with a voice call button that allows immediate access to a call center agent from the PC. Needless to say, voice is an integral part of conferencing systems that may also include shared screens, white boarding, etc. Combining voice and data features into new applications will provide the greatest returns over the longer term. Although the use of voice over packet networks is relatively limited at present, there is considerable user interest and trials are beginning. End user demand is expected to grow rapidly over the next five years. Frost & Sullivan and other research firms have estimated that the compound annual growth rate for IP-enabled telephone equipment will be 132% over the period from 1997 to 2002 (from $47.3M in 1997 to $3.16B by 2002). It is expected that VoIP will be deployed by 70% of the Fortune 1000 companies by the year 2000. Industry analysts have also estimated that the annual revenues for the IP fax gateway market will increase from less than $20M in 1996 to over $100M by the year 2000. It is clear that a market has already been established and there exists a window of opportunity for developers to bring their products to market.

8. What are the constraints?

Following are the constraints that are faced by the VoIP technology.

The PSTN is the most important competitor of VoIP. The Quality of Service (QoS) provided by the PSTN is very good and reliable, and telephones have been accepted globally. Implementing a new technology needs a complete change in the overall system. All existing circuits are circuit switched, and a packet-switched network needs all new equipment. It therefore needs a huge capital investment. Though the equipment cost of IP systems is comparatively low, the changeover will cost more.
The ‘law of the land’ doesn’t allow a full implementation of VoIP, as it would dramatically reduce the number of calls on the PSTN. This is therefore also a restriction, at least in India.

9. Future of VoIP Telephone

Several factors will influence future development in VoIP products and services. Currently, the most promising areas for VoIP are corporate intranets and commercial extranets. Their IP-based infrastructures enable operators to control who can, and cannot, use the network.

Another influential element in the ongoing Internet-telephony evolution is the VoIP gateway. As these gateways evolve from PC-based platforms to robust embedded systems, each will be able to handle hundreds of simultaneous calls. Consequently, corporations will deploy large numbers of them in an effort to reduce the expenses associated with high-volume voice, fax, and videoconferencing traffic. The economics of placing all traffic (data, voice, and video) over an IP-based network will pull companies in this direction, simply because IP will act as a unifying agent, regardless of the underlying architecture (i.e. leased lines, frame relay, or ATM) of an organization’s network.

Commercial extranets, based on conservatively engineered IP networks, will deliver VoIP and Facsimile over Internet Protocol (FAXoIP) services to the general public. By guaranteeing specific parameters, such as packet delay, packet jitter, and service interoperability, these extranets will ensure reliable network support for such applications.

VoIP products and services transported via the public Internet will be niche markets that can tolerate the varying performance levels of that transport medium. Telecommunication carriers will most likely rely on the public Internet to provide telephone service between/among geographic locations that today are high-traffic areas. It is unlikely that the public Internet’s performance characteristics will improve sufficiently within the next two years to stimulate significant growth in VoIP for that medium.

However, the public Internet will be able to handle voice and video service quite reliably within the next three to five years, once two critical changes take place:

An increase by several orders of magnitude in backbone bandwidth and access speeds, stemming from the deployment of IP/ATM/synchronous optical networks and of ISDN, cable modems, and x digital subscriber line (xDSL) technologies, respectively.
The tiering of the public Internet, in which users will be required to pay for the specific service level they require.

On the other hand, FAXoIP products and services via the public Internet will become economically viable more quickly than voice and video, primarily because the technical roadblocks are less challenging. Within two years, corporations will take their fax traffic off the PSTN and move it quickly to the public Internet and corporate intranets, first through FAXoIP gateways and then via IP-capable fax machines. Standards for IP-based fax transmission will be in place by the end of this year.

Throughout the remainder of this decade, videoconferencing (H.323) with data collaboration (T.120) will become the normal method of corporate communications, as network performance and interoperability increase and business organizations appreciate the economics of telecommuting. Soon, the video camera will be a standard piece of computer hardware, for full-featured multimedia systems as well as for the less-than-$500 network-computer appliance now starting to appear in the market. The latter in particular should stimulate residential demand and bring VoIP service to the mass market.