With WebRTC [20], a wide range of applications can be built, not only communication features. WebRTC is not only (or even mainly) about “calling” from within the browser; it enables web developers to access audio/video input devices via JavaScript and abstracts away the problem of browser-to-browser communication for ordinary web developers.
Once the browser-to-browser communication problem has been solved, WebRTC provides both a channel for real-time communication data and a data channel for sending any other kind of data in a peer-to-peer manner.
Most of this requires no plug-ins; it is natively supported in the browsers (currently Google Chrome, Mozilla Firefox, Opera, Microsoft Edge).
The simplest application of WebRTC is audio/video communication between browsers. The built-in WebRTC capability provides access to the microphone (audio) and camera (video); the user selects the device and grants permission.
The important API functions for this use case are
Before getUserMedia was available, browsers already handled “static” media objects (<img>, <video>, <audio>). These objects could be displayed, but also manipulated (e.g. an <img> element can be scaled with the width="400" attribute). The getUserMedia API adds access to dynamic sources such as microphones and cameras, whose characteristics can change in response to application needs. MediaConstraints are the standard way of restricting which sources and settings (for example the resolution) an application requests.
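The use of constraints with getUserMedia can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names buildConstraints and openCamera, the types MediaDevicesLike and CaptureConstraints, and the default resolution are assumptions made for this example. In a browser, navigator.mediaDevices would be passed as the devices argument.

```typescript
// Subset of the constraint dictionary accepted by getUserMedia
// (only the fields used in this sketch).
interface CaptureConstraints {
  audio: boolean;
  video: boolean | { width: { ideal: number }; height: { ideal: number } };
}

// Structural subset of the browser's media devices object, declared here so
// the sketch does not depend on browser globals (hypothetical type name).
interface MediaDevicesLike<S> {
  getUserMedia(constraints: CaptureConstraints): Promise<S>;
}

// Hypothetical helper: request audio plus video at a preferred resolution.
// "ideal" expresses a preference; the browser may deliver something else.
function buildConstraints(width: number, height: number): CaptureConstraints {
  return {
    audio: true,
    video: { width: { ideal: width }, height: { ideal: height } },
  };
}

// Hypothetical helper: ask for a stream. The browser prompts the user for
// permission; the returned promise rejects if access is denied.
function openCamera<S>(
  devices: MediaDevicesLike<S>,
  width: number = 1280,
  height: number = 720
): Promise<S> {
  return devices.getUserMedia(buildConstraints(width, height));
}
```

In a page, the resolved stream would typically be assigned to the srcObject property of a <video> element to display the local camera image.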
The PeerConnection allows two users to communicate directly, browser to browser. This communication is coordinated via a signaling channel, which WebRTC does not specify; it is generally provided by a script in the web page that has been delivered by the web server. Many websites can already exchange messages between web client and server (e.g. via WebSockets).
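Since the signaling channel is left unspecified, every application defines its own message format. The following sketch shows one possible shape, under stated assumptions: the SignalMessage format, the helper names sendOffer and parseSignal, and the CallerConnection type are hypothetical. CallerConnection is a structural subset intended to match the createOffer and setLocalDescription methods of the browser's RTCPeerConnection, and send stands for whatever transport the site already has (e.g. a WebSocket's send method).

```typescript
// Messages exchanged over the signaling channel. This JSON format is an
// assumption of the sketch; WebRTC does not prescribe one.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

interface SessionDescriptionLike {
  type: "offer" | "answer" | "pranswer" | "rollback";
  sdp?: string;
}

// Structural subset of RTCPeerConnection needed on the calling side.
interface CallerConnection {
  createOffer(): Promise<SessionDescriptionLike>;
  setLocalDescription(description: SessionDescriptionLike): Promise<void>;
}

// Caller: create an offer, apply it locally, then hand it to the signaling
// channel so that the web server can relay it to the other browser.
async function sendOffer(
  pc: CallerConnection,
  send: (text: string) => void
): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const message: SignalMessage = { kind: "offer", sdp: offer.sdp ?? "" };
  send(JSON.stringify(message));
}

// Receiver: validate an incoming signaling message before acting on it.
function parseSignal(text: string): SignalMessage {
  const msg = JSON.parse(text);
  if (msg.kind === "offer" || msg.kind === "answer") {
    if (typeof msg.sdp === "string") return { kind: msg.kind, sdp: msg.sdp };
  } else if (msg.kind === "candidate" && typeof msg.candidate === "string") {
    return { kind: "candidate", candidate: msg.candidate };
  }
  throw new Error("unrecognized signaling message");
}
```

The callee would answer symmetrically: apply the received offer with setRemoteDescription, call createAnswer, and send the answer back over the same channel.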
Sample services are:
The RTCDataChannel lets a web application send and receive generic application data peer-to-peer.
The DataChannel interface represents a bidirectional data channel between two peers. While the media streams of a PeerConnection carry real-time audio and video only, the DataChannel can transport any type of data.
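As an illustration of sending arbitrary data, a file can be split into chunks that are sent one by one and reassembled by the receiver. The sketch below assumes hypothetical helper names (splitIntoChunks, joinChunks, sendFile) and a default chunk size of 16 KiB, which is a conservative choice of this example rather than a value taken from this document. The send parameter stands for a wrapper around the send method of an RTCDataChannel.

```typescript
// Split a byte buffer into pieces of at most chunkSize bytes.
// The 16 KiB default is an assumption chosen for this sketch.
function splitIntoChunks(data: Uint8Array, chunkSize: number = 16 * 1024): Uint8Array[] {
  if (Math.floor(chunkSize) !== chunkSize || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view on the same memory, so nothing is copied here.
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Receiving side: concatenate the chunks in the order they arrived.
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return result;
}

// Sending side: pass every chunk to the channel; returns the chunk count.
function sendFile(
  send: (chunk: Uint8Array) => void,
  data: Uint8Array,
  chunkSize?: number
): number {
  const chunks = splitIntoChunks(data, chunkSize);
  for (const chunk of chunks) send(chunk);
  return chunks.length;
}
```

Reassembling in arrival order relies on the channel being ordered and reliable, which is the default configuration of a data channel; a real file transfer would also need to announce the file name and size and to respect the channel's send buffer.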
A sample service is sharefest.me.
The getUserMedia API can access not only the camera and microphone as media sources, but also the shared screen. For security reasons, accessing the screen requires a plug-in. However, this plug-in does not provide screen sharing as such (this is done by the WebRTC part of the browser); it only grants access to the browser API for certain domains that are explicitly permitted through the plug-in.
Most services that are used to communicate with audio/video also offer screen sharing.
Besides A/V communication and screen sharing, the data channel can be used to transfer not only files, but also control information. This control information can be used to modify the displayed browser content.
A sample application for this is a collaborative whiteboard. By sending the inputs from one whiteboard (“editor”) to all other whiteboards under the same link (“viewers”), the browser application can act as a shared whiteboard. Supported by the WebRTC communication features, such a website can be used e.g. within e-learning.
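The editor/viewer exchange can be sketched as follows. This is only one possible design: the Stroke fields, the JSON encoding, and the names broadcastStroke and ViewerBoard are assumptions of this example. Each entry in viewers stands for a function that sends text over the data channel to one viewer.

```typescript
// One drawing action on the whiteboard. The field set is an assumption.
interface Stroke {
  fromX: number;
  fromY: number;
  toX: number;
  toY: number;
  color: string;
}

// Sends one text message to one viewer, e.g. over that viewer's data channel.
type Send = (text: string) => void;

// Editor side: serialize a stroke once and send it to every viewer.
function broadcastStroke(stroke: Stroke, viewers: Send[]): void {
  const text = JSON.stringify(stroke);
  for (const send of viewers) send(text);
}

// Viewer side: collects validated strokes that a renderer would then draw.
class ViewerBoard {
  readonly strokes: Stroke[] = [];

  // Called with the payload of each message received on the data channel.
  receive(text: string): void {
    const s = JSON.parse(text);
    const numeric = [s.fromX, s.fromY, s.toX, s.toY].every(
      (value) => typeof value === "number"
    );
    if (!numeric || typeof s.color !== "string") {
      throw new Error("malformed stroke message");
    }
    this.strokes.push({
      fromX: s.fromX,
      fromY: s.fromY,
      toX: s.toX,
      toY: s.toY,
      color: s.color,
    });
  }
}
```

Validating each message before applying it keeps a faulty or malicious peer from corrupting the viewers' state, since the control information directly modifies what is displayed.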
In its pure browser concept, WebRTC is conceived as peer-to-peer communication that requires no additional infrastructure.
This architectural approach makes it difficult to realize sessions with multiple streams such as group video conferences or other "n-to-m" broadcasting scenarios.
This is where the conferencing building block comes into play. The conferencing building block takes care of distributing media traffic to a group of peers. That distribution is possible in three different ways, which differ primarily in their requirements for additional servers.
Let’s start with the peer-to-peer concept, which results in a fully meshed network approach.
The biggest advantage of this approach is that it is simple for a developer to build, since it does not require any kind of distribution point in the center of the network (compare Figure 12).
On the other hand, this simplicity comes at the price of a very high demand on network performance: with n participants, every peer has to send its stream to, and receive a stream from, each of the other n - 1 peers. The more participants attend a conference, the higher the required network performance becomes.
In contrast to the peer-to-peer approach, involving a selective forwarding unit or a media control unit (MCU, more commonly called a multipoint control unit) requires additional servers (see Figure 13). A selective forwarding unit acts similarly to a router or proxy: it forwards each media stream that it receives from one peer to all the other peers.
On the one hand, this reduces the required network performance at the peers, because each peer uploads only one media stream.
On the other hand, the required network performance is merely shifted towards the selective forwarding unit.
In the case of a media control unit, the central unit receives all the media streams from the peers, just like the selective forwarding unit. In a second step, however, the central unit processes the traffic in order to build one individual stream for every peer. Finally, the media control unit transmits only this one individual stream to each peer, which greatly reduces the network performance required at the peers. Additionally, this approach enables a broad variety of conceivable application cases by using different types of media processing on the central unit.
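The difference between the three distribution approaches can be made concrete by counting media streams. The following sketch assumes that every participant contributes exactly one stream and that all streams are equally expensive; the names Topology, StreamLoad and streamLoad are introduced here for illustration only.

```typescript
type Topology = "mesh" | "sfu" | "mcu";

// Number of media streams per conference, counted in units of one stream.
interface StreamLoad {
  uploadsPerPeer: number;
  downloadsPerPeer: number;
  serverIn: number; // streams arriving at the central unit
  serverOut: number; // streams leaving the central unit
}

// Stream counts for a conference with n participants, assuming each
// participant contributes exactly one stream.
function streamLoad(topology: Topology, n: number): StreamLoad {
  if (Math.floor(n) !== n || n < 2) {
    throw new RangeError("a conference needs at least two participants");
  }
  switch (topology) {
    case "mesh":
      // Every peer exchanges streams directly with each of the others.
      return { uploadsPerPeer: n - 1, downloadsPerPeer: n - 1, serverIn: 0, serverOut: 0 };
    case "sfu":
      // One upload per peer; the unit forwards it to the n - 1 other peers.
      return { uploadsPerPeer: 1, downloadsPerPeer: n - 1, serverIn: n, serverOut: n * (n - 1) };
    case "mcu":
      // One upload per peer; the unit builds one individual stream per peer.
      return { uploadsPerPeer: 1, downloadsPerPeer: 1, serverIn: n, serverOut: n };
  }
}
```

Under these assumptions, a conference with five participants requires four uploads per peer in the mesh but only one with a central unit, while the selective forwarding unit has to send out 20 streams and the media control unit only five. The count does not capture the processing cost that the media control unit incurs for building the individual streams.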