WebRTC

WebRTC is an open-source project that enables real-time communication of audio, video, and data in web and native apps.

WebRTC has several JavaScript APIs: getUserMedia() captures audio and video, RTCPeerConnection streams audio and video between peers, and RTCDataChannel streams arbitrary data between peers.

Where can I use WebRTC?

In Firefox, Opera, and Chrome on desktop and Android. WebRTC is also available for native apps on iOS and Android.

What is signaling?

WebRTC uses RTCPeerConnection to communicate streaming data between browsers, but also needs a mechanism to coordinate communication and to send control messages, a process known as signaling. Signaling methods and protocols are not specified by WebRTC. In this codelab you will use Socket.IO for messaging, but there are many alternatives.

Signaling is the process of coordinating communication. In order for a WebRTC app to set up a call, its clients need to exchange the following information:

  • Session-control messages used to open or close communication
  • Error messages
  • Media metadata, such as codecs, codec settings, bandwidth, and media types
  • Key data used to establish secure connections
  • Network data, such as a host’s IP address and port as seen by the outside world

This signaling process needs a way for clients to pass messages back and forth. That mechanism is not implemented by the WebRTC APIs. You need to build it yourself.

To avoid redundancy and to maximize compatibility with established technologies, signaling methods and protocols are not specified by WebRTC standards. This approach is outlined by the JavaScript Session Establishment Protocol (JSEP).
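
In practice, the messages a signaling channel carries are just serialized JavaScript objects. As a minimal sketch (the envelope shapes below match the W3C-style example later in this article; the field contents are placeholders), three message kinds cover the whole process:

// Session descriptions carry the media metadata; candidates carry network data.
const offerMsg     = {desc: {type: 'offer', sdp: '...'}};
const answerMsg    = {desc: {type: 'answer', sdp: '...'}};
const candidateMsg = {candidate: {candidate: '...', sdpMid: '0', sdpMLineIndex: 0}};
// Any transport that can deliver JSON.stringify(msg) to the other peer will
// do: WebSocket, Socket.IO, XHR polling, even copy and paste.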

What are STUN and TURN?

WebRTC is designed to work peer-to-peer, so users can connect by the most direct route possible. However, WebRTC is built to cope with real-world networking: client applications need to traverse NAT gateways and firewalls, and peer-to-peer networking needs fallbacks in case direct connection fails. As part of this process, the WebRTC APIs use STUN servers to get the IP address of your computer, and TURN servers to function as relay servers in case peer-to-peer communication fails. (WebRTC in the real world explains in more detail.)

RTCPeerConnection API and signaling: Offer, answer, and candidate

RTCPeerConnection is the API WebRTC apps use to create a connection between peers and communicate audio and video.

To initialize this process, RTCPeerConnection has two tasks:

  • Ascertain local media conditions, such as resolution and codec capabilities. This is the metadata used for the offer-and-answer mechanism.
  • Get potential network addresses for the app’s host, known as candidates.

Once this local data has been ascertained, it must be exchanged through a signaling mechanism with the remote peer.

Imagine Alice is trying to call Eve. Here’s the full offer/answer mechanism in all its gory detail:

  1. Alice creates an RTCPeerConnection object.
  2. Alice creates an offer (an SDP session description) with the RTCPeerConnection createOffer() method.
  3. Alice calls setLocalDescription() with her offer.
  4. Alice stringifies the offer and uses a signaling mechanism to send it to Eve.
  5. Eve calls setRemoteDescription() with Alice’s offer, so that her RTCPeerConnection knows about Alice’s setup.
  6. Eve calls createAnswer(), which provides a local session description—Eve’s answer.
  7. Eve sets her answer as the local description by calling setLocalDescription().
  8. Eve then uses the signaling mechanism to send her stringified answer to Alice.
  9. Alice sets Eve’s answer as the remote session description using setRemoteDescription().

Alice and Eve also need to exchange network information. The expression “finding candidates” refers to the process of finding network interfaces and ports using the ICE framework.

  1. Alice creates an RTCPeerConnection object with an onicecandidate handler.
  2. The handler is called when network candidates become available.
  3. In the handler, Alice sends stringified candidate data to Eve through their signaling channel.
  4. When Eve gets a candidate message from Alice, she calls addIceCandidate() to add the candidate to the remote peer description.

JSEP supports ICE Candidate Trickling, which allows the caller to incrementally provide candidates to the callee after the initial offer, and for the callee to begin acting on the call and set up a connection without waiting for all candidates to arrive.

Code WebRTC for signaling

The following code snippet is a W3C code example that summarizes the complete signaling process. The code assumes the existence of some signaling mechanism, SignalingChannel. Signaling is discussed in greater detail later.

// The signaling channel handles JSON.stringify/parse.
const signaling = new SignalingChannel();
const constraints = {audio: true, video: true};
const configuration = {iceServers: [{urls: 'stuns:stun.example.org'}]};
const pc = new RTCPeerConnection(configuration);

// Send any ICE candidates to the other peer.
pc.onicecandidate = ({candidate}) => signaling.send({candidate});

// Let the "negotiationneeded" event trigger offer generation.
pc.onnegotiationneeded = async () => {
  try {
    await pc.setLocalDescription(await pc.createOffer());
    // Send the offer to the other peer.
    signaling.send({desc: pc.localDescription});
  } catch (err) {
    console.error(err);
  }
};

// After remote track media arrives, show it in the remote video element.
pc.ontrack = (event) => {
  // Don't set srcObject again if it is already set.
  if (remoteView.srcObject) return;
  remoteView.srcObject = event.streams[0];
};

// Call start() to initiate.
async function start() {
  try {
    // Get local stream, show it in self-view, and add it to be sent.
    const stream =
        await navigator.mediaDevices.getUserMedia(constraints);
    stream.getTracks().forEach((track) => pc.addTrack(track, stream));
    selfView.srcObject = stream;
  } catch (err) {
    console.error(err);
  }
}

signaling.onmessage = async ({desc, candidate}) => {
  try {
    if (desc) {
      // If you get an offer, you need to reply with an answer.
      if (desc.type === 'offer') {
        await pc.setRemoteDescription(desc);
        const stream =
            await navigator.mediaDevices.getUserMedia(constraints);
        stream.getTracks().forEach((track) => pc.addTrack(track, stream));
        await pc.setLocalDescription(await pc.createAnswer());
        signaling.send({desc: pc.localDescription});
      } else if (desc.type === 'answer') {
        await pc.setRemoteDescription(desc);
      } else {
        console.log('Unsupported SDP type.');
      }
    } else if (candidate) {
      await pc.addIceCandidate(candidate);
    }
  } catch (err) {
    console.error(err);
  }
};
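
The SignalingChannel above is deliberately left abstract. As a rough sketch only (the class name matches the example, but the WebSocket URL and the forward-everything server behavior are assumptions, not part of any standard), it could be as small as this:

// A hypothetical minimal SignalingChannel: JSON messages over a WebSocket
// whose server simply forwards each message to the other peer in the room.
class SignalingChannel {
  constructor(url = 'wss://signaling.example.org') { // placeholder URL
    this.ws = new WebSocket(url);
    this.ws.onmessage = (event) => {
      if (this.onmessage) this.onmessage(JSON.parse(event.data));
    };
  }
  send(message) {
    // Handles JSON.stringify, as the example above assumes. A real
    // implementation would also queue messages until the socket opens.
    this.ws.send(JSON.stringify(message));
  }
}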

To see the offer/answer and candidate-exchange processes in action, see simpl.info RTCPeerConnection and look at the console log for a single-page video chat example. If you want more, download a complete dump of WebRTC signaling and stats from the chrome://webrtc-internals page in Google Chrome or the opera://webrtc-internals page in Opera.

Peer discovery

Peer discovery mechanisms are not defined by WebRTC, and this article doesn’t go into the options. The process can be as simple as emailing or messaging a URL. For video chat apps, such as Talky, tawk.to and Browser Meeting, you invite people to a call by sharing a custom link. Developer Chris Ball built an intriguing serverless-webrtc experiment that enables WebRTC call participants to exchange metadata by any messaging service they like, such as IM, email, or homing pigeon.

Is WebRTC secure?

Encryption is mandatory for all WebRTC components, and its JavaScript APIs can only be used from secure origins (HTTPS or localhost). Signaling mechanisms aren’t defined by WebRTC standards, so it’s up to you to make sure your signaling uses secure protocols.
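
For example, a page can guard against being loaded from an insecure origin before attempting a call (a minimal check using standard browser properties):

// getUserMedia is only exposed in secure contexts, so fail early and loudly.
if (!window.isSecureContext) {
  console.error('Serve this page over HTTPS (or from localhost) to use WebRTC.');
} else if (!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia)) {
  console.error('This browser does not expose getUserMedia().');
}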

Capture video with getUserMedia

<video autoplay playsinline></video>

const mediaStreamConstraints = {
  video: true,
};

// Stricter constraints, such as a minimum HD resolution, can be requested
// instead of {video: true}.
const hdConstraints = {
  video: {
    width: {min: 1280},
    height: {min: 720}
  }
};

// Video element where stream will be placed.
const localVideo = document.querySelector('video');
// Local stream that will be reproduced on the video.
let localStream;

// Handles success by adding the MediaStream to the video element.
function gotLocalMediaStream(mediaStream) {
  localStream = mediaStream;
  localVideo.srcObject = mediaStream;
}

// Handles error by logging a message to the console with the error message.
function handleLocalMediaStreamError(error) {
  console.log('navigator.getUserMedia error: ', error);
}

// Initializes media stream.
navigator.mediaDevices.getUserMedia(mediaStreamConstraints)
    .then(gotLocalMediaStream)
    .catch(handleLocalMediaStreamError);
HTML Media Capture
<input type="file" accept="image/*;capture=camera">
<input type="file" accept="video/*;capture=camcorder">
<input type="file" accept="audio/*;capture=microphone">
device element (never supported by any browser)
<device type="media" onchange="update(this.data)"></device>
<video autoplay></video>
<script>
  function update(stream) {
    document.querySelector('video').src = stream.url;
  }
</script>
getUserMedia
function hasGetUserMedia() {
  return !!(navigator.mediaDevices &&
      navigator.mediaDevices.getUserMedia);
}

if (hasGetUserMedia()) {
  // Good to go!
} else {
  alert('getUserMedia() is not supported by your browser');
}

<video autoplay></video>

<script>
  const constraints = {
    video: true
  };

  const video = document.querySelector('video');

  navigator.mediaDevices.getUserMedia(constraints)
      .then((stream) => {video.srcObject = stream;});
</script>

const hdConstraints = {
  video: {width: {min: 1280}, height: {min: 720}}
};
navigator.mediaDevices.getUserMedia(hdConstraints)
    .then((stream) => {video.srcObject = stream;});

const vgaConstraints = {
  video: {width: {exact: 640}, height: {exact: 480}}
};
navigator.mediaDevices.getUserMedia(vgaConstraints)
    .then((stream) => {video.srcObject = stream;});
Select a media source
const videoElement = document.querySelector('video');
const audioSelect = document.querySelector('select#audioSource');
const videoSelect = document.querySelector('select#videoSource');

audioSelect.onchange = getStream;
videoSelect.onchange = getStream;

getStream().then(getDevices).then(gotDevices);

function getDevices() {
  // AFAICT in Safari this only gets default devices until gUM is called :/
  return navigator.mediaDevices.enumerateDevices();
}

function gotDevices(deviceInfos) {
  window.deviceInfos = deviceInfos; // make available to console
  console.log('Available input and output devices:', deviceInfos);
  for (const deviceInfo of deviceInfos) {
    const option = document.createElement('option');
    option.value = deviceInfo.deviceId;
    if (deviceInfo.kind === 'audioinput') {
      option.text = deviceInfo.label || `Microphone ${audioSelect.length + 1}`;
      audioSelect.appendChild(option);
    } else if (deviceInfo.kind === 'videoinput') {
      option.text = deviceInfo.label || `Camera ${videoSelect.length + 1}`;
      videoSelect.appendChild(option);
    }
  }
}

function getStream() {
  if (window.stream) {
    window.stream.getTracks().forEach(track => {
      track.stop();
    });
  }
  const audioSource = audioSelect.value;
  const videoSource = videoSelect.value;
  const constraints = {
    audio: {deviceId: audioSource ? {exact: audioSource} : undefined},
    video: {deviceId: videoSource ? {exact: videoSource} : undefined}
  };
  return navigator.mediaDevices.getUserMedia(constraints)
      .then(gotStream).catch(handleError);
}

function gotStream(stream) {
  window.stream = stream; // make stream available to console
  audioSelect.selectedIndex = [...audioSelect.options]
      .findIndex(option => option.text === stream.getAudioTracks()[0].label);
  videoSelect.selectedIndex = [...videoSelect.options]
      .findIndex(option => option.text === stream.getVideoTracks()[0].label);
  videoElement.srcObject = stream;
}

function handleError(error) {
  console.error('Error: ', error);
}
Taking screenshots
<video autoplay></video>
<img src="">
<canvas style="display:none;"></canvas>

<script>
  const captureVideoButton =
      document.querySelector('#screenshot .capture-button');
  const screenshotButton = document.querySelector('#screenshot-button');
  const img = document.querySelector('#screenshot img');
  const video = document.querySelector('#screenshot video');

  const canvas = document.createElement('canvas');

  // Not shown in the original snippet: the constraints and error handler
  // it relies on.
  const constraints = {video: true};
  function handleError(error) {
    console.error('Error: ', error);
  }

  captureVideoButton.onclick = function() {
    navigator.mediaDevices.getUserMedia(constraints)
        .then(handleSuccess).catch(handleError);
  };

  screenshotButton.onclick = video.onclick = function() {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext('2d').drawImage(video, 0, 0);
    // Other browsers will fall back to image/png.
    img.src = canvas.toDataURL('image/webp');
  };

  function handleSuccess(stream) {
    screenshotButton.disabled = false;
    video.srcObject = stream;
  }
</script>
CSS Filters
<video autoplay></video>
<p><button class="capture-button">Capture video</button></p>
<p><button id="cssfilters-apply">Apply CSS filter</button></p>

<script>
  const captureVideoButton =
      document.querySelector('#cssfilters .capture-button');
  const cssFiltersButton =
      document.querySelector('#cssfilters-apply');
  const video =
      document.querySelector('#cssfilters video');

  let filterIndex = 0;
  // These are CSS class names; each is expected to apply a CSS filter
  // (e.g. .grayscale { filter: grayscale(1); }) in the page's stylesheet.
  const filters = [
    'grayscale',
    'sepia',
    'blur',
    'brightness',
    'contrast',
    'hue-rotate',
    'hue-rotate2',
    'hue-rotate3',
    'saturate',
    'invert',
    ''
  ];

  // Not shown in the original snippet: the constraints and error handler
  // it relies on.
  const constraints = {video: true};
  function handleError(error) {
    console.error('Error: ', error);
  }

  captureVideoButton.onclick = function() {
    navigator.mediaDevices.getUserMedia(constraints)
        .then(handleSuccess).catch(handleError);
  };

  cssFiltersButton.onclick = video.onclick = function() {
    video.className = filters[filterIndex++ % filters.length];
  };

  function handleSuccess(stream) {
    video.srcObject = stream;
  }
</script>

Build a signaling service with Socket.io on Node

Socket.io uses WebSocket with fallbacks: AJAX long polling, AJAX multipart streaming, Forever Iframe, and JSONP polling. It has been ported to various backends, but is perhaps best known for its Node version used in this example.

There’s no WebRTC in this example. It’s designed only to show how to build signaling into a web app. View the console log to see what’s happening as clients join a room and exchange messages. This WebRTC codelab gives step-by-step instructions for how to integrate this into a complete WebRTC video chat app.

<!-- index.html -->

<!DOCTYPE html>
<html>
<head>
  <title>WebRTC client</title>
</head>
<body>
  <script src='/socket.io/socket.io.js'></script>
  <script src='js/main.js'></script>
</body>
</html>
// main.js

let isInitiator = false;

const room = prompt('Enter room name:');

const socket = io.connect();

if (room !== '') {
  console.log('Joining room ' + room);
  socket.emit('create or join', room);
}

socket.on('full', (room) => {
  console.log('Room ' + room + ' is full');
});

// The server emits 'created' to the client that opens a new room.
socket.on('created', (room) => {
  isInitiator = true;
  console.log('Created room ' + room + ' - you are the initiator!');
});

// The server emits 'join' to the peer already in the room when a
// second client asks to join.
socket.on('join', (room) => {
  console.log('Another peer made a request to join room ' + room);
});

// The server emits 'joined' to the client that joins an existing room.
socket.on('joined', (room) => {
  console.log('Joined room ' + room);
});

socket.on('log', (array) => {
  console.log.apply(console, array);
});
// server.js

const nodeStatic = require('node-static');
const http = require('http');
const fileServer = new nodeStatic.Server();
const app = http.createServer((req, res) => {
  fileServer.serve(req, res);
}).listen(2013);

// Socket.IO 0.9-era API; later versions attach with require('socket.io')(app).
const io = require('socket.io').listen(app);

io.sockets.on('connection', (socket) => {

  // Convenience function to log server messages to the client.
  function log() {
    const array = ['>>> Message from server: '];
    for (let i = 0; i < arguments.length; i++) {
      array.push(arguments[i]);
    }
    socket.emit('log', array);
  }

  socket.on('message', (message) => {
    log('Got message:', message);
    // For a real app, send to the room only (not broadcast).
    socket.broadcast.emit('message', message);
  });

  socket.on('create or join', (room) => {
    // Socket.IO 0.9-era API; newer versions use io.sockets.adapter.rooms.
    const numClients = io.sockets.clients(room).length;

    log('Room ' + room + ' has ' + numClients + ' client(s)');
    log('Request to create or join room ' + room);

    if (numClients === 0) {
      socket.join(room);
      socket.emit('created', room);
    } else if (numClients === 1) {
      io.sockets.in(room).emit('join', room);
      socket.join(room);
      socket.emit('joined', room);
    } else { // max two clients
      socket.emit('full', room);
    }
    // Notify this client and everyone else via the 'log' channel.
    socket.emit('log', ['Client ' + socket.id + ' joined room ' + room]);
    socket.broadcast.emit('log', ['Client ' + socket.id + ' joined room ' + room]);
  });

});

Ready-made signaling servers

If you don’t want to roll your own, several ready-made WebRTC signaling servers are available that, like the previous example, use Socket.IO and are integrated with WebRTC client JavaScript libraries.

If you don’t want to write any code at all, complete commercial WebRTC platforms are available from companies such as vLine, OpenTok, and Asterisk.

For the record, Ericsson built a signaling server using PHP on Apache in the early days of WebRTC. This is now somewhat obsolete, but it’s worth looking at the code if you’re considering something similar.

WebRTC apps can use the ICE framework to overcome the complexities of real-world networking. To enable this to happen, your app must pass ICE server URLs to RTCPeerConnection, as described in this article.

ICE tries to find the best path to connect peers. It tries all possibilities in parallel and chooses the most efficient option that works. ICE first tries to make a connection using the host address obtained from a device’s operating system and network card. If that fails (which it will for devices behind NATs), ICE obtains an external address using a STUN server and, if that fails, traffic is routed through a TURN relay server.
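
Once a connection is up, you can ask RTCPeerConnection which path ICE actually chose. The following diagnostic sketch uses the standard getStats() API (field names follow the W3C webrtc-stats spec; browser coverage varies):

async function logSelectedCandidatePair(pc) {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    // The nominated, succeeded candidate pair is the one carrying traffic.
    if (report.type === 'candidate-pair' && report.nominated &&
        report.state === 'succeeded') {
      const local = stats.get(report.localCandidateId);
      const remote = stats.get(report.remoteCandidateId);
      // candidateType is 'host', 'srflx' (via STUN), 'prflx', or 'relay' (via TURN).
      console.log('ICE selected:', local.candidateType, '->', remote.candidateType);
    }
  });
}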

In other words, a STUN server is used to get an external network address and TURN servers are used to relay traffic if direct (peer-to-peer) connection fails.

Every TURN server supports STUN. A TURN server is a STUN server with additional built-in relaying functionality. ICE also copes with the complexities of NAT setups. In reality, NAT hole-punching may require more than just a public IP:port address.

URLs for STUN and/or TURN servers are (optionally) specified by a WebRTC app in the iceServers configuration object that is the first argument to the RTCPeerConnection constructor. For appr.tc, that value looks like this:

{
  'iceServers': [
    {
      'urls': 'stun:stun.l.google.com:19302'
    },
    {
      'urls': 'turn:192.158.29.39:3478?transport=udp',
      'credential': 'JZEOEt2V3Qb0y27GRntt2u2PAYA=',
      'username': '28224511:1379330808'
    },
    {
      'urls': 'turn:192.158.29.39:3478?transport=tcp',
      'credential': 'JZEOEt2V3Qb0y27GRntt2u2PAYA=',
      'username': '28224511:1379330808'
    }
  ]
}

STUN

NATs provide a device with an IP address for use within a private local network, but this address can’t be used externally. Without a public address, there’s no way for WebRTC peers to communicate. To get around this problem, WebRTC uses STUN.

STUN servers live on the public internet and have one simple task—check the IP:port address of an incoming request (from an app running behind a NAT) and send that address back as a response. In other words, the app uses a STUN server to discover its IP:port from a public perspective. This process enables a WebRTC peer to get a publicly accessible address for itself and then pass it to another peer through a signaling mechanism in order to set up a direct link. (In practice, different NATs work in different ways and there may be multiple NAT layers, but the principle is still the same.)
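
You can watch this discovery happen from the browser console. The sketch below (using Google's public test STUN server) starts ICE gathering with no remote peer and logs each candidate's type: 'host' comes from a local interface, 'srflx' (server reflexive) is the public address learned through STUN, and 'relay' would come from a TURN server.

const pc = new RTCPeerConnection({
  iceServers: [{urls: 'stun:stun.l.google.com:19302'}]
});
pc.onicecandidate = ({candidate}) => {
  if (candidate) {
    console.log(candidate.type, candidate.protocol, candidate.address);
  }
};
// Creating a data channel and setting a local offer is enough to start
// candidate gathering; no media permissions or remote peer are needed.
pc.createDataChannel('probe');
pc.createOffer().then((offer) => pc.setLocalDescription(offer));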

STUN servers don’t have to do much or remember much, so relatively low-spec STUN servers can handle a large number of requests.

Most WebRTC calls successfully make a connection using STUN—86% according to Webrtcstats.com, though this can be less for calls between peers behind firewalls and complex NAT configurations.

TURN

RTCPeerConnection tries to set up direct communication between peers over UDP. If that fails, RTCPeerConnection resorts to TCP. If that fails, TURN servers can be used as a fallback, relaying data between endpoints.

Just to reiterate, TURN is used to relay audio, video, and data streaming between peers, not signaling data!

TURN servers have public addresses, so they can be contacted by peers even if the peers are behind firewalls or proxies. TURN servers have a conceptually simple task—to relay a stream. However, unlike STUN servers, they inherently consume a lot of bandwidth. In other words, TURN servers need to be beefier.
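
When testing a TURN deployment, it's handy to force relaying. Setting iceTransportPolicy to 'relay' makes ICE ignore host and server-reflexive candidates, so a call only connects if the TURN server is doing its job (the server URL and credentials below are placeholders):

const pc = new RTCPeerConnection({
  iceServers: [{
    urls: 'turn:turn.example.org:3478', // placeholder TURN server
    username: 'user',
    credential: 'secret'
  }],
  iceTransportPolicy: 'relay' // only relay candidates are gathered
});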

Deploying STUN and TURN servers

For testing, Google runs a public STUN server, stun.l.google.com:19302, as used by appr.tc. For a production STUN/TURN service, use the rfc5766-turn-server. Source code for STUN and TURN servers is available on GitHub, where you can also find links to several sources of information about server installation. A VM image for Amazon Web Services is also available.

An alternative TURN server is restund, available as source code and also for AWS. Here are instructions for how to set up restund on Compute Engine.

  1. Open firewall as necessary for tcp=443, udp/tcp=3478.
  2. Create four instances, one for each public IP address, using the standard Ubuntu 12.04 image.
  3. Set up local firewall config (allow ANY from ANY).
  4. Install tools:
    sudo apt-get install make
    sudo apt-get install gcc
  5. Install libre from creytiv.com/re.html.
  6. Fetch restund from creytiv.com/restund.html and unpack.
  7. wget hancke.name/restund-auth.patch and apply with patch -p1 < restund-auth.patch.
  8. Run make, sudo make install for libre and restund.
  9. Adapt restund.conf to your needs (replace IP addresses and make sure it contains the same shared secret) and copy to /etc.
  10. Copy restund/etc/restund to /etc/init.d/.
  11. Configure restund:
    a. Set LD_LIBRARY_PATH.
    b. Copy restund.conf to /etc/restund.conf.
    c. Set restund.conf to use the right 10.x.x.x (internal) IP address.
  12. Run restund.
  13. Test using the stund client from a remote machine: ./client IP:port.

Find out more

The WebRTC codelab provides step-by-step instructions for how to build a video and text chat app using a Socket.io signaling service running on Node.

Google I/O WebRTC presentation from 2013 with WebRTC tech lead Justin Uberti

Chris Wilson’s SFHTML5 presentation—Introduction to WebRTC Apps

The 350-page book WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Time Web provides a lot of detail about data and signaling pathways, and includes a number of detailed network topology diagrams.

WebRTC and Signaling: What Two Years Has Taught Us—TokBox blog post about why leaving signaling out of the spec was a good idea

Ben Strong’s A Practical Guide to Building WebRTC Apps provides a lot of information about WebRTC topologies and infrastructure.

The WebRTC chapter in Ilya Grigorik’s High Performance Browser Networking goes deep into WebRTC architecture, use cases, and performance.

Reference
  1. https://codelabs.developers.google.com/codelabs/webrtc-web#0
  2. https://www.html5rocks.com/en/tutorials/webrtc/infrastructure/