What is VoIP?
A Detailed Guide to Hosted Phone Service Terminology
Last updated: 01-07-2019
Voice Over Internet Protocol (VoIP) is a set of digital technologies that allow you to make phone calls over local networks and the internet. They simultaneously work together to facilitate the movement of voice and video between callers.
You’ll learn in this document that VoIP refers to many subsets of communications tech. All at once, it’s a large set of codecs like G.711, protocols like H.323, quality of service techniques like bandwidth allotment, and metrics like the Mean Opinion Score.
You can read from the top or jump into any of the sections below.
- The Public Switched Telephone Network
- Packet Switched Networks
- Challenges to VoIP Service Providers
- Codecs, Mean Opinion Score, and Protocols
- Quality of Service
- VoIP Phones and Other Equipment
The Public Switched Telephone Network
In order to understand how VoIP manages calls, it’s first necessary to learn about the Public Switched Telephone Network (PSTN) and historical methods of calling.
PSTN Has Many Moving Parts
The PSTN, like VoIP, exists as a superset that contains multiple moving parts. When you consider copper and fiber optic telephone lines, cellular networks, communications satellites, and undersea cables – and even switching centers that route calls between different types of information-carrying devices – you’re speaking about the PSTN.
What you’ll see most often of that superset are the telephone poles and copper wires that line your streets. The traditional, analog method of calling uses those wires to carry information between household telephones. Whenever household residents picked up a telephone and heard a dial tone, they were using miles of copper wire to connect to a local telephone carrier’s office, which would route calls to the proper destinations.
In discussions about this type of traditional calling, you may also hear the term Plain Old Telephone Service (POTS). It isn’t uncommon to find PSTN and POTS used interchangeably in conversation.
Still, you should know that POTS, in definition, is more closely aligned with copper wires and home phone lines. PSTN can refer to those elements in addition to the rest of the analog and digital infrastructure elements listed above.
Circuit switching is the method used to connect traditional phone calls that move through a copper medium, like the wires that line your neighborhood’s streets. When one caller dialed another person, the network established a circuit that would remain active until the call ended. This connection was required before individuals could speak to one another.
Although connecting two individuals in this manner over long distances – perhaps over thousands of miles – is a somewhat weighty endeavor, it guaranteed that the full bandwidth of the connection was dedicated to those users for the full length of the call. It provided callers with high reliability and call quality.
The Public Switched Telephone Network gets its name from the fact that it is the sum of all communications infrastructure that uses circuit switching to manage calls between users.
Protocols and Regulations
There are a number of protocols a telephone network can use when routing calls, including the E.164 standard the International Telecommunication Union created to define the international phone numbers, up to 15 digits long, that we all recognize. The international phone numbering plan, defined in this high-level protocol, applies to calls made from your business desk phone, home phone, or cell phone.
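As a rough illustration of the E.164 numbering format, the sketch below checks whether a string looks like an international number: a leading plus sign followed by at most 15 digits, the first of which is not zero. The function name `is_e164` and the sample numbers are hypothetical, and the check covers only the overall shape of a number, not whether a country code is actually assigned.

```python
import re

# Shape of an E.164 number in international notation:
# "+" followed by 1 to 15 digits, where the first digit is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{0,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` has the E.164 shape once spaces and dashes are removed."""
    compact = re.sub(r"[ \-]", "", number)
    return E164_PATTERN.fullmatch(compact) is not None


# Hypothetical sample numbers, for illustration only.
print(is_e164("+1 415-555-0123"))   # formatted international number
print(is_e164("4155550123"))        # missing the leading "+"
print(is_e164("+" + "1" * 16))      # too many digits
```

A desk phone, a home phone, and a cell phone would all be addressed with numbers that pass the same check, which is the point of a single global numbering plan.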
Other protocols, such as the signaling protocols H.323 and Session Initiation Protocol (SIP), which are discussed at length later in this article, are more specific in their applications. They were created for use within Voice Over IP (VoIP) systems, which are not strictly part of the PSTN. However, they do govern digital voice traffic that ultimately makes use of PSTN components.
For instance, you may see an internet-based service provider use SIP to connect a business’s on-site phone server to the internet. The service provider could route its client’s digital calls through copper infrastructure to a second business that’s still connected to the PSTN through a traditional analog service. In this case, you would see outgoing calls utilize the E.164 protocol and SIP simultaneously.
E.164 means to reach across national borders and unify calling across the globe; it has seen widespread adoption toward that goal. In contrast, individual phone service vendors often take the option of using H.323, SIP, or some other type of signaling protocol, which can lead to inconsistency in the market and interoperability issues between VoIP clients.
This isn’t the place to discuss all the regulations that govern public communications systems. Still, you should be aware that specific protocols exist and take different liberties as part of digital calling systems like VoIP and global networks like the PSTN.
Packet Switched Networks
VoIP does not use circuit switching. Instead, it relies on packet switching to transmit information from one caller to another.
The most fundamental answer to the question “What is VoIP service?” is that it’s a packet-switched communications network.
Networks that send packets do not create continuous connections between two or more nodes. They transmit small pieces of information, called packets, from the sender to the receiver. Those packets can take different paths and arrive at the receiver at different times, but they ultimately get put back together in the correct order.
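The idea of packets arriving out of order and being put back together can be sketched in a few lines. This is a toy model, not how any particular network stack works: the helper names `packetize` and `reassemble` are made up for this example, and the sequence number attached to each packet is an assumption about how the receiver knows the original order.

```python
import random


def packetize(message: str, size: int) -> list[tuple[int, str]]:
    """Split a message into (sequence_number, payload) packets of at most `size` characters."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Sort packets by sequence number and join their payloads."""
    return "".join(payload for _, payload in sorted(packets))


message = "Hello from a packet-switched network"
packets = packetize(message, 5)

# Simulate packets taking different paths and arriving in a scrambled order.
random.shuffle(packets)

restored = reassemble(packets)
print(restored == message)
```

However the packets are shuffled in transit, sorting on the sequence number recovers the original message.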
The internet uses packets to send and retrieve information. Therefore, since VoIP is joined fundamentally to the internet, it must also use packets and will be subject to mechanics of that type of network.
VoIP’s link to packet switching provides it with a number of advantages and disadvantages. Let’s first look at the advantages. We’ll address disadvantages in the next section so you can see what companies like VirtualPBX must take into consideration when delivering high-quality voice service.
How Packet Switching Helps VoIP
Many organizations use VoIP calling to make it easier to route phones between departments. Instead of using complicated and expensive private branch exchanges (PBXs) on their sites, businesses can adopt a virtual PBX that uses VoIP techniques to connect phones across a local network.
The use of packet-based systems is also efficient. Whereas circuit switched networks create a connection that persists throughout downtimes – like when you’re not speaking – packet switched networks only send information when required.
Packet switching can also make long distance calling easier because, again, it doesn’t require a steady connection between two end points. It doesn’t need to tie New York and California with thousands of miles of copper wiring for the duration of a call. Therefore, the caller doesn’t need to “own” those lines for that period; the caller only needs to request use of copper, fiber, satellites, and data routers when data needs to be sent.
Challenges to VoIP Service Providers
In contrast to circuit switching, the packet switching method of transferring voice data can be relatively unreliable when it’s not managed properly. Such unreliability can lead to poor call quality.
Service providers want to offer their clients clear, consistent audio. Therefore they must address jitter, packet loss, and latency if they want to provide a high-quality service — especially in situations where clients will switch from an on-site PBX to VoIP.
Jitter is a primary offender when it comes to voice quality. Although the word jitter makes it sound like this culprit is the cause of shaky or choppy audio, that isn’t always the case.
Jitter may actually be heard as a wave-like motion that distorts the playback speed of a sound.
This occurs when individual packets are not sent and received in the same time frame.
Example: If a sentence you spoke to a colleague during a VoIP conversation used three packets – A, B, C – that were sent at intervals of 1 ms, you would need to have them reach your colleague as A, B, and C in that order with 1 ms between them in order to have the sentence make sense.
If A and B arrived as expected and were played to the colleague, but C was several hundred milliseconds late, your colleague might hear a distortion in the final part of your sentence.
Today’s advanced VoIP networks control jitter with techniques such as buffers and data prioritization, which you’ll learn more about later in this article.
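To put numbers on the A, B, C example, the sketch below compares the gap between packets when they were sent with the gap between them when they arrived. The function name and all timestamps are hypothetical. Real systems typically track a smoothed running estimate of jitter rather than raw differences, so treat this as a simplified view of the same idea.

```python
def interarrival_deviation(send_times_ms, arrival_times_ms):
    """For each consecutive packet pair, return arrival gap minus send gap, in ms.

    A value of 0 means the spacing was preserved; a large value means the
    later packet of the pair was delayed relative to the earlier one.
    """
    deviations = []
    for i in range(1, len(send_times_ms)):
        send_gap = send_times_ms[i] - send_times_ms[i - 1]
        arrival_gap = arrival_times_ms[i] - arrival_times_ms[i - 1]
        deviations.append(arrival_gap - send_gap)
    return deviations


# Hypothetical timestamps: A, B, C are sent 1 ms apart.
send = [0, 1, 2]
# A and B arrive with their spacing intact; C arrives 300 ms late.
arrive = [40, 41, 342]

deviations = interarrival_deviation(send, arrive)
print(deviations)  # [0, 300]
```

The first pair keeps its 1 ms spacing, while the second pair shows the 300 ms of extra delay that the listener would notice.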
Packet loss occurs when data packets are either discarded or don’t get transmitted to the receiver in a call. The auditory feedback from packet loss you might hear in a call could manifest as choppy voice quality or a cutting-out of entire words or sentences.
Packets can be interrupted by bursts of data within a network. In a case where this happens, network routers could choose to discard some of the voice data you intend to send to your colleague because they need room to send other packets deemed more important.
Keep in mind the effect of jitter. It refers to the time between individual packets, so it’s possible that jitter could become so large that a conversation would no longer make sense if packets weren’t discarded on purpose.
Example: Jitter could occur throughout your transmission of an entire spoken sentence. It could even reach the point where it causes delays of one to two seconds between packets. But by that time, you may have started a second sentence before the first sentence entirely reached your colleague.
The network could, in that case, reach a point where it wants to present your second sentence to the colleague before the first had finished being presented. At that point, it would have to decide whether or not it wanted to discard the severely-delayed packets from the first sentence.
If the network didn’t start to discard some of your packets, it may try to present everything at once, which could result in a jumbled transmission.
Alternatively, it could continue to delay the presentation of any new packets until the jittered packets were finally able to be received. This would cause a noticeable delay overall and would disrupt the flow of the entire conversation.
Neither of those situations is desirable. Yet, packet loss could also create quality concerns within the conversation. This is why commercial VoIP phone systems use Quality of Service techniques to prioritize VoIP calls so packet loss and delay are prevented.
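The trade-off between discarding late packets and delaying playback can be sketched with a fixed playout deadline. In this simplified model, which is an assumption for illustration and not a description of any vendor's buffer, each packet is due to be played a fixed number of milliseconds after it was sent, and anything that arrives after its deadline is discarded. The function name, field layout, and timings are all hypothetical.

```python
def schedule_playout(packets, playout_delay_ms):
    """Split packets into those played on time and those discarded as too late.

    `packets` is a list of (sequence_number, send_ms, arrival_ms) tuples.
    A packet is playable if it arrives no later than send_ms + playout_delay_ms.
    """
    played, discarded = [], []
    for seq, send_ms, arrival_ms in packets:
        deadline = send_ms + playout_delay_ms
        if arrival_ms <= deadline:
            played.append(seq)
        else:
            discarded.append(seq)
    return played, discarded


# Hypothetical call: packets sent every 20 ms, packet 2 is badly delayed.
packets = [(0, 0, 45), (1, 20, 70), (2, 40, 400), (3, 60, 110)]

played, discarded = schedule_playout(packets, playout_delay_ms=60)
print(played, discarded)  # [0, 1, 3] [2]

# A much longer playout delay rescues packet 2, at the cost of added latency.
played_slow, discarded_slow = schedule_playout(packets, playout_delay_ms=400)
print(played_slow, discarded_slow)  # [0, 1, 2, 3] []
```

With a short deadline the late packet is lost and the listener hears a gap. With a long deadline nothing is lost, but every word is heard 400 ms after it was spoken.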
The previous two issues lead us to another hiccup in the packet switching system when it comes to voice quality: Latency.
Latency refers to the overall delay in a call between when the sender speaks and the audio information is presented to the receiver. In a conversation, the receiver might hear the sender’s words several seconds after they were initially spoken.
Latency may be created by inefficiencies in a VoIP network or by the physical distance between users. Usually, latency is seen as having a detrimental effect on the quality of a conversation. It can interrupt the natural rhythm of how individuals usually conduct themselves when speaking.
In some cases, however, latency can be a useful tool when combating jitter and packet loss. If those issues begin to severely affect call quality, then purposeful latency in a system could help add padding to the time a network has to present useful audio to a receiver.
You don’t want too much latency in your VoIP conversation, regardless of how it’s being used. When the gap between users becomes too large, it can be difficult to hold a meaningful conversation that includes all the nuance we expect with regard to presenting ideas. Networks that use latency to ease the effects of jitter and packet loss must balance any possible benefits with the negative effect of long pauses in conversation.
Example: Have you ever seen a news report where a newscaster interviews a politician who is in a remote location? The newscaster will eventually bring up an important point that the politician wants to respond to. But the delay between them will be so large that, by the time the politician begins to interrupt the newscaster, the newscaster has moved to their next point.
From there, the conversation experiences some strange pauses because both parties are caught trying to figure out a way to give each participant their time to speak.
This same sort of gap in conversation can happen when latency is too large in VoIP calls. It can create confusion in important business meetings and between colleagues who use internet-based calling to assist a remote workforce.
Codecs, Mean Opinion Score, and Protocols
There is a lot of variability in the way service providers can implement a business VoIP system. Depending on the client’s needs, providers may choose one codec or protocol over another.
Let’s take a look at some of the popular codecs and protocols that have become industry standards. Along the way, we’ll talk about the Mean Opinion Score and its integral position in service provision.
Don’t worry – this will not be an exhaustive list of all the codecs available to IP phones. We will get started by listing a few: G.711, G.729, G.723.1, and G.726.
A codec is a computer program that encodes and decodes audio. As an encoder, it prepares a data stream, such as a collection of voice packets, for transmission over a computer network. The same codec that encodes the audio is used on the receiving end to help organize the data stream into a mode that humans can understand, such as audio played through a set of headphones.
In a VoIP environment, codecs can compress data so the information sent through a network is less bandwidth-intensive. Encryption, where it is used, is typically handled by a separate layer of the system rather than by the codec itself.
The VoIP codecs listed above can be rated by their Mean Opinion Score (MOS), which is discussed in the following section of this article. This rating system gives service providers a good idea of how clear and reliable a voice transmission will be when using one codec versus another.
Codecs have the power to compress audio so that the bandwidth requirement for a call is not overbearing. G.711 uses no compression; the others in our short list do. Compression can result in a loss of audio quality, but a balance can be achieved that keeps calls efficient and at a satisfactory quality for end users.
The FCC lists a minimum connection speed of “Less than 0.5” Mbps for a single VoIP call to deliver “adequate performance” in real-world scenarios. G.711 calls use about 0.087 Mbps, so a single VoIP call on any broadband connection should be fine when no other applications are running on that same connection.
When grouped together with many other calls and many other applications, however, businesses may experience a steep drop in the performance of their VoIP phones that use G.711.
Service providers can cater to an organization’s needs by recommending less bandwidth-intensive codecs such as G.723.1, which can use as little as 0.021 Mbps.
It’s clear that G.723.1 would use much less bandwidth per call than G.711. That said, using a codec with higher compression isn’t a magic bullet. If compressed too much and used in inappropriate circumstances, a lighter codec can show degradation just as easily as a heavier codec might.
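The per-call figures quoted above can be reproduced with some simple arithmetic. The sketch below assumes each voice packet carries 58 bytes of overhead (20 for IP, 8 for UDP, 12 for RTP, and 18 for Ethernet), that G.711 sends a 160-byte payload every 20 ms, and that G.723.1 at its lower bit rate sends a 20-byte payload every 30 ms. Those values are common textbook assumptions, not measurements from any particular network, and the function name is hypothetical.

```python
# Assumed per-packet overhead in bytes: IP (20) + UDP (8) + RTP (12) + Ethernet (18).
OVERHEAD_BYTES = 20 + 8 + 12 + 18


def call_bandwidth_bps(payload_bytes: int, packet_interval_ms: int) -> float:
    """Bits per second for one direction of a call, including packet overhead."""
    bits_per_packet = (payload_bytes + OVERHEAD_BYTES) * 8
    return bits_per_packet * 1000 / packet_interval_ms


# Assumed packetization: G.711 = 160-byte payload every 20 ms,
# G.723.1 (lower rate) = 20-byte payload every 30 ms.
g711_bps = call_bandwidth_bps(payload_bytes=160, packet_interval_ms=20)
g7231_bps = call_bandwidth_bps(payload_bytes=20, packet_interval_ms=30)

print(g711_bps / 1_000_000)   # about 0.087 Mbps
print(g7231_bps / 1_000_000)  # about 0.021 Mbps
```

Notice that for the lighter codec most of each packet is overhead, not voice. That is one reason a codec with a smaller payload does not reduce bandwidth in direct proportion to its compression.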
Everything depends on your business’s needs and your network’s capabilities.
Service providers can cater to an organization’s circumstances by recommending phones and codecs that meet specific bandwidth and usage requirements. They will match the power of your network to a codec that will give you the highest expected call quality during normal workday conditions.
Mean Opinion Score
The Mean Opinion Score (MOS) of any VoIP codec provides a numerical representation of the call quality a user can expect when using a given codec. Although an MOS is only an indicator of how a codec could perform, it’s an effective tool in the service provider tool belt.
The MOS for any VoIP codec comes from a standardized analysis of audio samples. Either software or an auditing panel of humans can perform this task. In either case, the audio samples are rated on a scale of 1-5 with 1 being “Unsatisfactory” and 5 being “Excellent.”
The range of scores is meant to capture the real-world effect of a codec on the transmission of audio in a VoIP system. Human participants will inherently be subjective about how an audio codec makes them feel, and software can use algorithms to scan transmitted audio for signs of degradation or loss of quality.
As we addressed earlier, the voice data in these tests could be subject to delays or errors in transmission. The MOS tries to provide a realistic look at how a codec will perform under normal network conditions so service providers can make an informed choice about using one over any other.
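At its simplest, a Mean Opinion Score is the arithmetic mean of the individual ratings collected for a sample. The sketch below shows that calculation for a hypothetical panel of eight listeners. The ratings are invented for illustration, and formal testing procedures involve far more than this single step.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of 1-to-5 listener ratings into a single score."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return mean(ratings)


# Hypothetical ratings from an eight-person listening panel.
panel_ratings = [4, 5, 4, 3, 4, 4, 5, 4]

mos = mean_opinion_score(panel_ratings)
print(mos)  # 4.125
```

A provider comparing two codecs would repeat this for each codec under the same conditions and compare the resulting scores.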
Likewise, protocols in the VoIP landscape are legion, so we won’t list them here ad infinitum. Two you might want to be aware of, though, are H.323 and Session Initiation Protocol (SIP).
Protocols act as a set of guidelines or recommendations for completing a task. In the VoIP landscape, they might define how a service provider would manage its hardware-based or virtual terminals, gateways, and gatekeepers to ensure that VoIP calls maintain their integrity.
For instance, the International Telecommunication Union provides documentation for H.323 that defines how telecommunications companies should use their hardware and virtual systems. The Internet Engineering Task Force has released similar documentation in its attempt to standardize SIP.
Both H.323 and SIP are used for signaling. This means that, within a VoIP call, they help identify the state of a connection between two or more phones. They each approach this broad task in unique ways.
H.323 has been referred to as an “old world” VoIP protocol because of its relatively complex setup and its wide reach into other telecommunications systems.
In contrast, SIP takes the “new world” approach by leaving many duties, such as call reliability, to complementary protocols. This can make it simpler for organizations to create SIP applications, but it can also lead to interoperability issues because multiple network operators may choose to follow differing protocols that support their SIP applications.
When one set of supporting protocols doesn’t play nicely with another set, clients’ applications may not be able to converse easily with one another.
One of H.323’s advantages is that it specifies how an operator should manage the entire network. Everything from VoIP codecs to network reliability can fall into that envelope. It helps create an end result where multiple clients will use the same network configurations and may therefore be more interoperable.
Quality of Service
Despite the daunting nature of jitter, packet loss, and latency, the challenges those issues present are manageable with Quality of Service (QoS) techniques such as bandwidth allotment, packet classification, queuing, fragmentation, and traffic shaping.
It isn’t enough for service providers to choose a voice codec and a set of protocols and then provision phones for a business. QoS techniques demand that providers manage network traffic efficiently – regardless of codec/protocol.
In the following sections, you can find more detail about how each of the above elements plays a part in the need for efficiency.
The most important thing you can do to improve call quality on your network is to allot the correct amount of bandwidth your calls demand.
If, for instance, you have a VoIP call that demands 90 kbps and you can only provide a 64 kbps link, you will fall well short of the required bandwidth for the call. The network would drop nearly a third of the packets associated with the transmission.
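The "nearly a third" figure follows directly from the two numbers in the example. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def shortfall_fraction(required_kbps: float, available_kbps: float) -> float:
    """Fraction of the required bandwidth that the link cannot supply."""
    if available_kbps >= required_kbps:
        return 0.0
    return (required_kbps - available_kbps) / required_kbps


shortfall = shortfall_fraction(required_kbps=90, available_kbps=64)
print(f"{shortfall:.1%}")  # 28.9%
```

The link is 26 kbps short of the 90 kbps the call needs, which is about 29 percent of the traffic with nowhere to go.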
As was pointed out earlier, most modern networks will have enough bandwidth to support a single call. Bandwidth allocation only becomes a problem when other applications take up the same space your VoIP calls want when sending voice traffic.
Now consider that every person in your office is doing the same thing. As a group, you demand a lot from the company network. One VoIP call might actually perform well within all that traffic, but an entire office full of callers could easily drain everything the network has left in the tank.
If you can add more bandwidth to the equation, you can lessen the overall impact any VoIP call will have on the network. Businesses that use a single all-purpose network may attempt to mitigate the problem of bandwidth consumption by simply adding more available bandwidth to the system. Other businesses may choose to use a separate network altogether — one that handles only VoIP traffic.
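A quick way to reason about bandwidth allotment is to estimate how many simultaneous calls fit into the share of a link you are willing to give to voice. The sketch below uses hypothetical numbers: a 10,000 kbps link, half of it set aside for voice, and per-call rates of 87.2 kbps and 20.8 kbps, which are assumed values for a heavier and a lighter codec once packet overhead is included.

```python
def max_concurrent_calls(link_kbps: float, per_call_kbps: float,
                         voice_share: float = 1.0) -> int:
    """Whole number of calls that fit in the voice share of a link."""
    voice_kbps = link_kbps * voice_share
    return int(voice_kbps // per_call_kbps)


# Hypothetical office link with half of its capacity reserved for voice.
heavy_codec_calls = max_concurrent_calls(10_000, per_call_kbps=87.2, voice_share=0.5)
light_codec_calls = max_concurrent_calls(10_000, per_call_kbps=20.8, voice_share=0.5)

print(heavy_codec_calls)  # 57
print(light_codec_calls)  # 240
```

The same estimate shows the two levers described above: adding bandwidth raises the first argument, and dedicating a separate network to voice is equivalent to a voice share of 1.0.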
Cisco explains that packet classification is the basis for providing any QoS for a network.
The manner in which packet classification takes place can be complex. At its core, classification comes when a network is able to recognize voice packets. The network recognizes such packets by identifying either the source and destination IP addresses or the UDP port numbers associated with the voice traffic.
This process can be simplified further with a technique called marking, which replaces repeated header classification by tagging each packet with the type of service it belongs to. Marks on any packet may tell the network that the packet is part of the global VoIP configuration, part of a specific protocol, or part of an access list, among other specifications.
Networks that use IP addresses in header information for classification will try to classify packets late in the data transfer process. Since classification of this type is processor-intensive, late-term action keeps the whole network efficient by reducing the amount of information any transfer node needs to read.
The network core will have no choice but to mark packets early. The core takes advantage of marking, instead of using IP addresses or UDP port numbers, because of its simplicity. From the core, a packet will have to travel through many nodes to reach its destination; marking makes it possible for those steps to remain efficient.
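The difference between classifying by header fields and checking a mark can be sketched as two small functions. Everything here is an illustrative assumption: the `Packet` class is a stand-in for real packet headers, the UDP port range 16384 to 32767 is one common default for voice media rather than a universal rule, and the mark used is the DSCP value 46 (Expedited Forwarding), which is widely used for voice traffic.

```python
from dataclasses import dataclass

# Assumptions for illustration: voice media uses UDP ports 16384-32767,
# and voice packets are marked with DSCP 46 (Expedited Forwarding).
VOICE_UDP_PORTS = range(16384, 32768)
DSCP_EF = 46
DSCP_DEFAULT = 0


@dataclass
class Packet:
    """Hypothetical stand-in for the header fields a router would inspect."""
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int
    dscp: int = DSCP_DEFAULT


def classify_and_mark(packet: Packet) -> Packet:
    """Inspect header fields once and record the result as a mark on the packet."""
    if packet.protocol == "udp" and packet.dst_port in VOICE_UDP_PORTS:
        packet.dscp = DSCP_EF
    return packet


def is_voice(packet: Packet) -> bool:
    """Later nodes only need to read the mark, not re-inspect the headers."""
    return packet.dscp == DSCP_EF


voice = classify_and_mark(Packet("10.0.0.5", "10.0.1.9", "udp", 20000))
web = classify_and_mark(Packet("10.0.0.5", "203.0.113.7", "tcp", 443))

print(is_voice(voice), is_voice(web))  # True False
```

The header inspection happens once in `classify_and_mark`. Every node after that can make its decision with the single comparison in `is_voice`, which is why marking keeps the remaining hops efficient.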
The process of queuing gives VoIP traffic a higher priority than other traffic on a network.
Remember how jitter and packet loss can occur when bandwidth can’t keep pace with all the applications demanding attention? Queuing gives VoIP traffic special consideration because its packets are so sensitive to delay.
Service providers may use the low-latency queuing (LLQ) method to give VoIP traffic a high priority in a network. LLQ utilizes the Modular QoS Command-Line Interface (MQC) method of classification, which uses broad traffic policies to determine how to route packets. MQC templates define such policies and give system administrators a lot of control while maintaining usability.
Many other queuing systems exist. They all make sure VoIP packets gain a high priority inside a network, but some suffer from limitations such as difficulty in configuration and the inability to provide bandwidth guarantees to packets.
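The core idea shared by these queuing systems, that voice packets leave before anything else that is waiting, can be sketched with a strict priority queue. This is not an implementation of low-latency queuing or of any vendor's scheduler. The class and packet names are hypothetical, and a strict scheme like this one has a known weakness that production schedulers guard against: data traffic waits for as long as voice keeps arriving.

```python
import heapq
from itertools import count


class PriorityScheduler:
    """Strict priority queue: lower class number leaves first, FIFO within a class."""

    VOICE = 0
    DATA = 1

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def enqueue(self, traffic_class: int, payload: str) -> None:
        heapq.heappush(self._heap, (traffic_class, next(self._arrival), payload))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


scheduler = PriorityScheduler()
scheduler.enqueue(PriorityScheduler.DATA, "backup-1")
scheduler.enqueue(PriorityScheduler.DATA, "backup-2")
scheduler.enqueue(PriorityScheduler.VOICE, "voice-1")
scheduler.enqueue(PriorityScheduler.DATA, "backup-3")
scheduler.enqueue(PriorityScheduler.VOICE, "voice-2")

sent_order = [scheduler.dequeue() for _ in range(len(scheduler))]
print(sent_order)
# ['voice-1', 'voice-2', 'backup-1', 'backup-2', 'backup-3']
```

Both voice packets jump ahead of the backup traffic that arrived before them, while packets within each class keep their original order.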
The act of fragmentation makes sure that VoIP packets can successfully be interleaved with data packet fragments from other classes.
A simple example here is to consider a VoIP transmission happening in tandem with a system backup. There may be a time when the VoIP conversation doesn’t have any packets to send but the backup does, so the backup continues transmitting packets as it would normally.
If the conversation begins sending packets again, it will want to send those packets at an industry standard of no greater than a 20 ms delay. If it waits any longer, jitter could occur.
The conversation can’t wait for the backup to terminate. It can’t even wait for any relatively large backup packets to finish sending because those larger packets could take more than 20 ms to complete.
Fragmentation techniques make sure data from the prioritized VoIP conversation makes it through the system in an orderly fashion. In this situation, techniques like Maximum Transmission Unit (MTU) Fragmentation, Multilink Point-to-Point Protocol (MLP) Link Fragmentation, and Frame Relay Fragmentation would break the larger backup packets into pieces small enough that a whole VoIP packet would make it through in less than the 20 ms delay.
Those types of fragmentation each have unique benefits, but they all work toward the same goal of splitting up packets and making sure the receiving application can put them back together once all the data has been received.
It is not recommended that VoIP packets themselves ever be fragmented because call quality can suffer. Instead, the rest of the traffic in a network must be fragmented so whole VoIP packets can be sent at a regular rate.
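The reason large packets must be fragmented comes down to how long a packet occupies a slow link. The sketch below works through the arithmetic for a hypothetical 128 kbps link, a 1,500-byte backup packet, and the 20 ms budget mentioned above. The function names are made up, and the model is deliberately simplified: it sizes fragments so that one fragment alone fits in the budget and ignores the time the voice packet itself needs.

```python
def serialization_delay_ms(packet_bytes: int, link_kbps: float) -> float:
    """Time a packet occupies the link: bits divided by kilobits-per-second gives ms."""
    return packet_bytes * 8 / link_kbps


def max_fragment_bytes(link_kbps: float, delay_budget_ms: float) -> int:
    """Largest fragment that can be sent within the delay budget."""
    return int(link_kbps * delay_budget_ms / 8)


def fragment(packet_bytes: int, fragment_bytes: int) -> list[int]:
    """Split a packet into fragment sizes no larger than `fragment_bytes`."""
    full, remainder = divmod(packet_bytes, fragment_bytes)
    return [fragment_bytes] * full + ([remainder] if remainder else [])


# Hypothetical values: 128 kbps link, 1,500-byte backup packet, 20 ms budget.
whole_packet_delay = serialization_delay_ms(1500, link_kbps=128)
limit = max_fragment_bytes(link_kbps=128, delay_budget_ms=20)
pieces = fragment(1500, limit)

print(whole_packet_delay)  # 93.75
print(limit)               # 320
print(pieces)              # [320, 320, 320, 320, 220]
```

Sent whole, the backup packet would block the link for almost 94 ms. Split into pieces of 320 bytes or less, the link frees up at least every 20 ms so a waiting voice packet can be interleaved between fragments.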
Some fragmentation schemes take advantage of traffic shaping. Traffic shaping is necessary in environments where a network sends data from a high-speed central location to remote hubs that are slower than the central site.
Think of a central site (512 kbps) that’s connected to a slower hub (128 kbps). Eventually, the hub will have to drop packets because the central site sends packets too quickly.
Frame Relay traffic shaping is common. It holds data that’s sent at speeds too high for the hub to receive. Then it groups that data together and sends it in short bursts to the hub at a standard transmission rate.
A frame relay will know how to prioritize VoIP traffic. It may still incur the need to drop packets, but the relay can decide which packets to drop. It will make sure the VoIP packets are sent in bursts the hub can handle and drop any packets associated with programs that don’t have bandwidth guaranteed to them.
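The behavior described above (hold traffic that arrives too fast, release it at a rate the hub can handle, send voice first, and drop unguaranteed traffic when necessary) can be sketched as a small simulation. This is a toy model and not Frame Relay traffic shaping itself. Time is divided into fixed ticks, the hub's rate sets a byte budget per tick, and data packets are dropped once a small holding queue is full. The function name, packet sizes, and queue limit are all hypothetical, and the sketch assumes no single packet is larger than one tick's budget.

```python
from collections import deque


def shape(arrivals, rate_kbps: int, tick_ms: int, max_queued_data: int):
    """Release queued packets at the hub's rate, voice first.

    `arrivals` holds one list of (kind, size_bytes) packets per tick.
    Returns (sent_per_tick, dropped).
    """
    budget_per_tick = rate_kbps * tick_ms // 8  # bytes the hub can accept per tick
    voice, data = deque(), deque()
    sent_per_tick, dropped = [], []

    for tick_arrivals in arrivals:
        for packet in tick_arrivals:
            kind, _size = packet
            if kind == "voice":
                voice.append(packet)            # voice is always held, never dropped here
            elif len(data) < max_queued_data:
                data.append(packet)             # hold data until there is room to send it
            else:
                dropped.append(packet)          # no guarantee for data: drop when full

        budget = budget_per_tick
        sent = []
        for queue in (voice, data):             # voice queue is drained before data
            while queue and queue[0][1] <= budget:
                packet = queue.popleft()
                budget -= packet[1]
                sent.append(packet)
        sent_per_tick.append(sent)

    return sent_per_tick, dropped


# Hypothetical burst from the central site toward a 128 kbps hub, in 20 ms ticks.
arrivals = [
    [("voice", 200), ("data", 300), ("data", 300)],
    [("voice", 200), ("data", 300)],
    [],
    [],
]
sent_per_tick, dropped = shape(arrivals, rate_kbps=128, tick_ms=20, max_queued_data=2)

for tick, sent in enumerate(sent_per_tick):
    print(tick, sent)
print("dropped:", dropped)
```

In this run each voice packet leaves in the tick it arrived, the held data packets drain during the quiet ticks that follow, and the one packet dropped is a data packet.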
VoIP Phones and Other Equipment
In some cases, service providers can get businesses up and running with VoIP in less than a day. How is this possible?
The technology used to make VoIP work properly largely occurs on the service provider’s end. The provider handles all the networking and call quality concerns before your business even thinks about purchasing a phone plan.
Still, you will need some equipment. Many businesses will want desk phones for their employees. Alternatively, employees can use their computers and mobile phones to make calls natively or through software applications. In some cases, you may also require a phone adapter to begin making calls. We’ll discuss all that in the following sections.
Most VoIP-ready phones sold to businesses are able to connect directly to the internet. These phones are able to send packets of voice data and interpret the packets sent to them.
Individual phone models have specific capabilities (such as WiFi integration and group calling support) depending on their included hardware and the firmware that controls their sending and receiving of data. In particular, firmware can define which codecs a phone understands and therefore how efficiently it can act in a conversation.
Even with their limitations, business VoIP phones are meant to be delivered to the client, plugged into a power source, plugged into the internet, and begin making calls. Once they are recognized in a network, many models can automatically download the updates they need from a central server.
Analog Phones with an ATA
An Analog Telephone Adaptor (ATA) lets you connect a traditional telephone to the internet so it can be used in a VoIP environment. Although this isn’t typically necessary for businesses – since most will have the capital to buy IP phones that include this functionality natively – home users often find a use for ATAs.
The ATA is a small box with plugs that accept the analog phone cord, an ethernet cord, and a power unit to provide power for the ATA itself.
ATAs use the same protocols and codecs used with IP phones. They can, for instance, understand H.323 and G.711 to communicate effectively with a service provider’s server.
Softphones and Native Smartphones
A business can opt for softphones instead of desk phones.
Softphones often provide the same functionality you would find in a desk phone, including the ability to send and receive calls, scan through an address book, and dial extensions in a VoIP network. These software applications can be built for specific phone platforms, such as iOS and Android, and may also be built for desktop environments, including Linux, Mac, and Windows.
Some VoIP service providers even offer smartphones that are built to natively interact with a voice network they control. In this type of situation, the service provider makes use of smartphone SIM cards to interact with the network. Users’ calls are made normally through a cellular network, but their phones are able to interact with an underlying VoIP platform like a softphone or desk phone would.
Although much of the intended functionality is the same as the softphone, web phones differ in one key way from softphones: They run in a web browser.
VoIP web phones make use of the real-time calling project WebRTC. Modern browsers like Firefox and Chrome understand how to run WebRTC code and connect callers.
Many businesses without a large budget can use web phones with their VoIP plans for little or no cost. Users can make and receive calls, and they can save contacts for easy calling to outside lines or extensions.