Otalk & WebRTC on iOS

Sep 23rd, 2014 • Development • Nigel Brooke

This year, Steamclock collaborated with &yet to build an iOS version of their video chat service, Talky. We released the first version in June, and a 1.1 update a few weeks ago. Along with the release of 1.1, &yet released important portions of the app as open source, under the Otalk project. Otalk is a collection of open source components to simplify setting up real-time messaging systems based on modern, open standards: WebRTC, XMPP and WebSockets.

We learned a lot along the way, and would like to pass that along to anyone who is interested in creating their own iOS application that ties into WebRTC.

WebRTC on iOS

WebRTC is a standard for supporting real-time communication on the web, especially voice and video chat. The project originated at Google, is already implemented in Chrome and Firefox, and is winding its way towards standardization. If you’ve ever been filled with rage by needing to install some random plugin like GoToMeeting or WebEx to video chat with someone, then WebRTC is the future cure for what ails you.

WebRTC is definitely an important part of the future of internet audio and video chat, but here in the present, particularly on iOS, there are still issues. The biggest one is that Apple hasn’t committed to supporting WebRTC in Mobile Safari, although there are strong rumblings suggesting that it is coming. On the desktop, using Chrome or Firefox - or installing a plugin - is an option. But today on iOS, if you want to tie into a WebRTC-based service, you need a native iOS app.

Luckily, the core WebRTC libraries that Google uses in Chrome can be built for iOS, and they have recently gained a nice Objective-C API that you can use to access them. Unfortunately, there are two big problems that anyone looking to experiment with them will encounter. One is specific to iOS, and the other is a more general problem with the WebRTC ecosystem.

Signaling

The most significant problem for someone interested in trying out WebRTC is that, by design, there is a big hole in the way that the standard is defined. WebRTC focuses on solving the problem “Given a set of clients that want to chat, how do I connect their media streams?”, but there is also another question that needs to be answered, namely “How do I signal which set of clients want to chat?”, and WebRTC is silent on that issue. These decisions don’t need to be baked into the standard, because that work doesn’t need to be built right into the browser. Running a signaling layer in ordinary JavaScript on a webpage is pretty straightforward, whereas direct access to the camera and video codecs needs the lower level APIs that we call WebRTC.

However, not having a standard for WebRTC signaling means a lot more work for someone looking to experiment. Most of our open source WebRTC iOS code is aimed at solving this problem. On the web side, &yet supplies a simple signaling server and the associated client-side JavaScript code to quickly prototype WebRTC applications, so you don’t have to figure out signaling on day one. You can find the client-side JavaScript form of that work on simplewebrtc.com, and the backend code for the signaling server is there as well if you want to switch to hosting it yourself later. The bulk of the source code for iOS that we’ve released is focused on bringing this SimpleWebRTC experience to iOS, in the form of some straightforward Objective-C interfaces for communicating with the SimpleWebRTC signaling server. This library is the TLKSimpleWebRTC part of Otalk.

iOS libraries

The signaling problem isn’t the only issue you face experimenting with WebRTC on iOS. WebRTC is a big project, with a complicated build system, and is not very friendly to just quickly dropping into a small project. Getting an iOS WebRTC project running in minutes, rather than hours, requires a set of pre-compiled iOS libraries that you can drop into your app. We have released some such libraries as otalk/webrtc-ios.

There are two main limitations with the prebuilt libraries to be aware of. One is that because actually building them is a bit of a pain, we don’t do it very often either. The libs that are there now are still from the launch of Talky 1.0, and thus a few months out of date compared to the latest WebRTC. We will update them in the future, and may eventually automate the process, but for now, they will lag behind the state of the art in proportion to how much the changes in newer WebRTC versions matter to us.

The second limitation with these prebuilt libraries is that the WebRTC build system is not currently capable of building armv7s or arm64 binaries, so the libraries there are armv7 only. This is something that we don’t currently have the resources to fix ourselves, but there are some encouraging signs that folks at Google are aware of the issue and are making fixes. Maybe it’s been fixed recently, maybe it’ll start working some time in the future - we’ll let you know when we find out. For now, any apps built with this will be 32-bit only.

Usage

The last piece of the puzzle is how to actually use this stuff. We’ve put together a sample project to demonstrate the use of TLKSimpleWebRTC in an iOS app, at otalk/iOS-demo. Grab a copy and follow along.

Build Environment

We recommend pulling the open source code (TLKSimpleWebRTC, as well as TLKWebRTC - a small part of the project that is independent of the signaling server) and the iOS libraries via CocoaPods.

Here’s the Podfile that we use for that:

target "ios-demo" do

pod 'webrtc-ios', :git => 'https://github.com/otalk/webrtc-ios.git'
pod 'TLKWebRTC', :git => 'https://github.com/otalk/TLKWebRTC.git'
pod 'TLKSimpleWebRTC', :git => 'https://github.com/otalk/TLKSimpleWebRTC.git'

end

post_install do |installer_representation|
    installer_representation.project.targets.each do |target|
        target.build_configurations.each do |config|
            config.build_settings['ONLY_ACTIVE_ARCH'] = 'NO'
        end
    end
end

The post_install hook above requires a little more explanation. As mentioned, the libraries are currently armv7 only. However, Xcode tries to intelligently reduce what architectures it’s building for when you are debugging. So if you have an armv7s or arm64 device plugged in, by default it will ONLY build for those architectures when debugging. But for WebRTC, only the armv7 build is valid, so if you have a modern device plugged in, it won’t build the armv7 libraries that it needs, and nothing will link. The post install hook forces it to build all the architectures so that you will always have up-to-date armv7 libraries even if your dev device has a newer architecture.

Along these lines, you also need to go and set the same flag in your app project. In the project build settings, you should set “Build Active Architecture Only” to “No” for the Debug build (by default it is “No” only for Release), and also set “Architectures” to “armv7” only (the default includes all three ARM architectures).

Note that these projects are not currently in the main CocoaPods database, so you have to pull them directly from the Otalk GitHub for now.

Connecting to the signaling server

Once you’ve got your build going, it’s only a few lines of code to start a chat by connecting to the signaling server. In iOS-demo, that code looks like this:

self.signaling = [[TLKSocketIOSignaling alloc] initAllowingVideo:YES];

First you allocate a TLKSocketIOSignaling object. This class is your main gateway for communicating with the signaling server. At init time, you can specify if you want to allow video chat or not. Video capture has significant overhead, so you definitely want to disable it if you are doing audio only.
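
For example, an audio-only app would initialize with video disabled instead:

// Audio-only: avoids the overhead of video capture entirely.
self.signaling = [[TLKSocketIOSignaling alloc] initAllowingVideo:NO];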

self.signaling.delegate = self;

If the server wants to communicate with you, it needs a delegate, so you’ll want to adopt the TLKSocketIOSignalingDelegate protocol in one of your classes and set an instance of it as the delegate.

[self.signaling connectToServer:@"signaling.simplewebrtc.com" port:8888 secure:NO success:^{
    [self.signaling joinRoom:@"ios-demo" success:^{
        NSLog(@"join success");
    } failure:^{
        NSLog(@"join failure");
    }];
    NSLog(@"connect success");
} failure:^(NSError* error) {
    NSLog(@"connect failure");
}];

To join a chat, you need to both connect to a server and join a room. For demo purposes, we are using the SimpleWebRTC demo server on signaling.simplewebrtc.com. The room name can be anything you choose - it’s how you decide who you want to talk to. All the chat participants can share the room name through email, web, or whatever.

Once you’ve made these calls, the signaling server will figure out if other people are in the room, and connect the WebRTC video and audio streams to them.

There are a few additional calls on TLKSocketIOSignaling that you might want to be aware of. Signalmaster (the actual software running the signaling server) does have a notion of rooms being locked with an additional password (so you can’t get in just by guessing the name), so there are calls related to locking and unlocking rooms, and to joining a room with a password. Also important are the localAudioMuted and localVideoMuted properties, which can be used to mute your local audio and video streams so they are temporarily not sent to other participants.
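
For example, a mute button could toggle one of those properties directly (the muteTapped: action here is just a hypothetical button handler for illustration):

- (IBAction)muteTapped:(id)sender {
    // While localAudioMuted is YES, your audio is temporarily not sent
    // to the other participants in the room.
    self.signaling.localAudioMuted = !self.signaling.localAudioMuted;
}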

Showing the video

If you did nothing other than the above, you’d have a functional audio chat app, but if you want to do video chat, you’ve got one more thing to do: start rendering incoming video to UIViews in your app when other clients connect.

To do this, you need both a UIView that you are going to display the video in, as well as a RTCVideoRenderer object to manage the decoding of the video. RTCVideoRenderer is not part of TLKSimpleWebRTC, but rather part of the Objective-C interface for WebRTC itself. The header files for the public Objective-C interface to WebRTC are in the webrtc-ios project along with the compiled libs.

For our iOS demo code, we just keep references to those in our view controller, along with the signaling object:

@interface ViewController () <TLKSocketIOSignalingDelegate>

@property (strong, nonatomic) TLKSocketIOSignaling* signaling;
@property (strong, nonatomic) UIView* renderView;
@property (strong, nonatomic) RTCVideoRenderer* renderer;

@end

To create a video view when a new client is added to the conversation, you need to implement the addedStream: selector on your TLKSocketIOSignalingDelegate. TLKSocketIOSignaling will call this any time a new client is added to the chat.

Here’s the code in iOS-demo that handles that:

-(void)addedStream:(TLKMediaStreamWrapper *)stream {
    if (!self.renderView) {
        self.renderView = [[UIView alloc] initWithFrame:CGRectMake(0, 0, 480, 640)];
        self.renderView.layer.transform = CATransform3DMakeScale(1, -1, 1);

        self.renderer = [[RTCVideoRenderer alloc] initWithView:self.renderView];
        [self.view addSubview:self.renderView];

        [(RTCVideoTrack*)stream.stream.videoTracks[0] addRenderer:self.renderer];
        [self.renderer start];
    }
}

Basically, the code above sets up a plain UIView to act as a container for the video (WebRTC will add some of its own views/layers to it), allocates an RTCVideoRenderer, connects it to both the view and the video track inside the media stream you are given, and then calls start on the renderer.

There’s also an equivalent removedStream: call that you’ll receive when a client’s stream goes away. This isn’t used in the demo app, but you will probably want to implement it in yours.
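
Here’s a minimal sketch of what that could look like, assuming removedStream: mirrors addedStream: and that stop and removeRenderer: exist as counterparts to the start and addRenderer: calls above (check the headers in webrtc-ios for the exact signatures):

-(void)removedStream:(TLKMediaStreamWrapper *)stream {
    // Stop rendering and disconnect from the departing stream's video track.
    [self.renderer stop];
    if ([stream.stream.videoTracks count] > 0) {
        [(RTCVideoTrack*)stream.stream.videoTracks[0] removeRenderer:self.renderer];
    }

    // Throw the view and renderer away; per the gotchas below, recreating
    // them for the next participant is more reliable than reusing them.
    [self.renderView removeFromSuperview];
    self.renderView = nil;
    self.renderer = nil;
}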

Gotchas and Hints

Here are a couple of things we encountered in the development of the Talky app that you should be aware of or consider if you are implementing your own WebRTC projects on iOS:

  • WebRTC’s main codecs are not hardware accelerated on iOS. This means both that there can be significant battery usage from long chat sessions, and that performance on older devices falls off dramatically. For Talky, we had to limit the iPhone 4 to audio-only chat, and limit the iPhone 4S, iPad 2, iPad Mini and 5th gen iPod Touch to just one video stream. Newer devices can handle group conversations with multiple video streams. You will want some user-friendly communication about these limits in your app.

  • There seems to be some instability in the UIView/RTCVideoRenderer/RTCVideoTrack relationship, either underlying bugs or subtleties we just aren’t handling right yet. When we tried reusing views and renderers as participants went in and out of chats, we saw lots of weird crashes. You should consider (as we have, for now) recreating the view and the video renderer for each participant added, rather than trying to reuse ones from a previous participant.

  • If you want to add a preview video for the local camera, you can do so by creating a render view the same way as for a remote video stream, but using the ‘localMediaStream’ property of TLKSocketIOSignaling (see the sketch after this list).

  • The Objective-C API of WebRTC itself does not specify anything about what threads things happen on, and in practice lots of delegate calls and so on that you would prefer to get on the main thread seem to happen on background threads. In our code, we’ve tried to bridge everything across to the main thread, so you shouldn’t need to worry about it for basic use of TLKSocketIOSignaling. But if you interact with the WebRTC API (any of the RTC* classes) directly, or make modifications to any of our code, make sure you check what thread things are happening on.

  • RTCVideoRenderer uses OpenGL behind the scenes to do its rendering, which means that it can’t run when the app is backgrounded. Make sure you respond to entering the background by stopping all the renderers (see the sketch after this list), or the OS will kill your app.
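
To illustrate those last two points, here’s a rough sketch that wires up a local camera preview and stops rendering when the app is backgrounded. The previewView and previewRenderer properties are hypothetical additions to the view controller above, and we’re assuming localMediaStream exposes its video tracks the same way the remote stream wrappers do - double check against the actual headers:

// Local preview: the same pattern as a remote view, but fed from the
// localMediaStream property instead of a remote participant's stream.
self.previewView = [[UIView alloc] initWithFrame:CGRectMake(0, 0, 120, 160)];
self.previewRenderer = [[RTCVideoRenderer alloc] initWithView:self.previewView];
[self.view addSubview:self.previewView];
[(RTCVideoTrack*)self.signaling.localMediaStream.videoTracks[0] addRenderer:self.previewRenderer];
[self.previewRenderer start];

// RTCVideoRenderer draws with OpenGL, which must not run in the background,
// so stop every renderer on the way out and restart on the way back in.
[[NSNotificationCenter defaultCenter] addObserverForName:UIApplicationDidEnterBackgroundNotification
                                                  object:nil
                                                   queue:[NSOperationQueue mainQueue]
                                              usingBlock:^(NSNotification *note) {
    [self.previewRenderer stop];
    [self.renderer stop];
}];
[[NSNotificationCenter defaultCenter] addObserverForName:UIApplicationWillEnterForegroundNotification
                                                  object:nil
                                                   queue:[NSOperationQueue mainQueue]
                                              usingBlock:^(NSNotification *note) {
    [self.previewRenderer start];
    [self.renderer start];
}];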

Future Directions

As with WebRTC itself, a lot of this is preliminary work. These packages will continue to evolve as &yet’s services that use them evolve, and if you are serious about WebRTC, at some point you’ll probably need to do the same thing we’ve done and start re-implementing some of the signaling layer yourself - or use the Otalk signaling methods that &yet is working to document at the XMPP Standards Foundation.

In the meantime, hopefully this will provide a leg up to anyone interested in experimenting with iOS applications that integrate with WebRTC. Have fun, and good luck.

Nigel Brooke • CTO