User InterACTION Aware Content Generation and Distribution for Next Generation Social TeleVision

The project's goal is to develop technologies that allow viewers at home to engage with pre-produced live-action television shows, relaxing the rigid and passive nature of present broadcasting ecosystems. It has two key aims:

  • a group of users can take part in TV shows, which provides a sense of immersion in the show and seamless engagement with the content;
  • users are encouraged to use TV shows as a means of social engagement and of keeping themselves and their talents more visible across social circles.

An advanced digital media access and delivery platform will be developed that augments traditional audio-visual broadcasts with novel interactivity elements to encourage natural engagement with the content. Users will be able to take part in pre-recorded live-action shows, which will be made ‘responsive’ to users’ actions through a set of auxiliary streams. Through mixed-reality technologies, one or more users can appear in the show, which can then be watched by a selected audience, often the users’ social peers. In this way, end users will have access to more engaging, personalised content and will be able to socialise with community members who share their interests.

ACTION-TV supports a range of applications, from an individual trying on a garment in a TV advert to a group of users interactively attending a TV talent show from the convenience of home. The proposed interactive television concept can be used in many more ways, limited only by the imagination of content producers.

Project objectives:

  • User interaction and engagement aware content creation

Today, TV content is mainly produced under the assumption of a passive audience. Interaction between users and TV content is in practice limited to phone calls, Internet chats, and live video feeds. To bring interactivity between viewers and TV content to the next level, ACTION-TV will develop novel content production technologies that enable viewers to become actors: they will be able to become an active part of the broadcast TV content by inserting themselves as dynamic 3D objects in the broadcast scene, to influence the plot of the content, and to share the same media environment with a social group.

This objective covers the revision of conventional content production workflows for studio environments and the development of new modules that extend these workflows and enable the creation of content for personalised, interactive, engaging, and shared media experiences. This is achieved by investigating and specifying camera configurations, designing and managing spaces of interaction, acquiring and reconstructing scene geometry information, analysing video and geometry data, and extracting other metadata required for immersive rendering of home user objects. The metadata also provides guidelines and associated constraints for automating the non-linear editing workflow, helping intermediaries produce outcomes that respond to user interactions while remaining meaningful, relevant, satisfying, and non-disruptive.

  • User model generation and real-time rendering

ACTION-TV will extract a 3D model of each user in order to render them with the correct perspective in the virtual scene. The producer may define the setup, which adds further constraints to the rendering process. The overall goal is to adapt the users’ gaze direction and perspective to a given shared virtual environment. A key issue here is eye contact between different virtual users.

FP7-ICT-2013-10 STREP proposal ACTION-TV

Depending on the given scenario, 3D models with different levels of accuracy are required. ACTION-TV will adapt different sensor technologies, which may be used in a scalable manner. Although the basic approach will be based on a Kinect-like system, the use of additional sensors will be investigated to improve the robustness and precision of the model. With this scalable approach, the user can apply systems of increasing complexity, which give increasing system flexibility and virtualism. The 3D model information is furthermore required to determine correct scene rendering relative to the depth layering and positioning of other graphical content that will be inserted into the broadcast video stream as a function of the current viewer content.
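
The role of depth layering in rendering can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the broadcast scene and the captured user are available as per-pixel colour and depth maps of the same size; the function name `composite_by_depth` and the data layout are hypothetical and are not part of the ACTION-TV specification. A user pixel is shown only where the user is closer to the camera than the scene content at that pixel.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # RGB colour


def composite_by_depth(
    scene_rgb: List[List[Pixel]],
    scene_depth: List[List[float]],
    user_rgb: List[List[Pixel]],
    user_depth: List[List[Optional[float]]],
) -> List[List[Pixel]]:
    """Insert a user into a scene with a per-pixel depth test.

    Assumption: depth is distance from the camera, so smaller means closer.
    A user depth of None means the user does not cover that pixel.
    """
    output = []
    for y, scene_row in enumerate(scene_rgb):
        out_row = []
        for x, scene_pixel in enumerate(scene_row):
            d_user = user_depth[y][x]
            # Show the user only where they are in front of the scene content.
            if d_user is not None and d_user < scene_depth[y][x]:
                out_row.append(user_rgb[y][x])
            else:
                out_row.append(scene_pixel)
        output.append(out_row)
    return output


# Example: a 1x3 frame. The user is in front of the scene at pixel 0,
# behind a scene object at pixel 1, and absent at pixel 2.
SCENE, USER = (0, 0, 255), (255, 0, 0)
frame = composite_by_depth(
    scene_rgb=[[SCENE, SCENE, SCENE]],
    scene_depth=[[2.0, 2.0, 2.0]],
    user_rgb=[[USER, USER, USER]],
    user_depth=[[1.0, 3.0, None]],
)
```

A production renderer would additionally handle soft edges, lighting, and perspective correction; the sketch covers only the occlusion ordering.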

  • User interaction capture and analysis

In order to enable natural interaction between remotely collaborating users and the broadcast content, ACTION-TV will capture real-time user interactions at home. A set of available interaction tasks and appropriate modalities will be selected, given context parameters such as content type, content-dependent script, and network resources. The captured interactions will be processed by cloud-based software in order to concurrently analyse the interactions of all peers involved. The processing will result in a unified AV stream that will be broadcast to all users within the social group, responding to the group’s joint behaviour and reacting to specific events (predetermined during content production). The unified AV stream will be generated from the detected interaction events, which will determine the appropriate auxiliary streams, the user posture for rendering user models, and other personalisation functions in such a way as to maximise the user QoE. Two distinct domains of interaction will be addressed: user interaction with content and social interaction between users. User interaction with content will focus on the multimodal processing of information captured by interaction sensors such as cameras and Kinect, distilling the interactions relevant to the given content type and narrative. Underpinned by user interactions with content, social interactions between peers will be critical to user engagement and collaboration. Issues such as task coupling and the balance between interaction and interface in the social context will be addressed.
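
How a group's joint behaviour might select an auxiliary stream can be sketched as follows. This is an illustrative assumption, not the project's algorithm: each peer is assumed to report at most one recognised interaction event, the mapping from events to auxiliary streams is assumed to be fixed during content production, and a strict majority rule is assumed as the decision policy. All names (`select_auxiliary_stream`, the event and stream labels) are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def select_auxiliary_stream(
    peer_events: Dict[str, Optional[str]],
    stream_for_event: Dict[str, str],
    default_stream: str = "main",
) -> str:
    """Pick the stream to play next from the events reported by all peers.

    peer_events maps a peer id to its detected event (None if nothing was
    detected). stream_for_event holds the events predetermined at production
    time and the auxiliary stream each one triggers.
    """
    # Count only events that the content was produced to react to.
    votes = Counter(e for e in peer_events.values() if e in stream_for_event)
    if not votes:
        return default_stream
    event, count = votes.most_common(1)[0]
    # Assumed policy: react only when a strict majority of peers agree.
    if count * 2 > len(peer_events):
        return stream_for_event[event]
    return default_stream


streams = {"clap": "aux_applause", "wave": "aux_greeting"}
split_group = {"anna": "clap", "ben": "clap", "cara": "wave", "dev": None}
joint_group = {"anna": "clap", "ben": "clap", "cara": "clap", "dev": None}
split_choice = select_auxiliary_stream(split_group, streams)
joint_choice = select_auxiliary_stream(joint_group, streams)
```

Other policies (weighted votes, per-user personalised streams, QoE-driven selection) fit the same interface by replacing the majority test.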

  • Real-time transmission over media clouds

Real-time rendering and transmission will pose significant challenges to the overall ACTION-TV system architecture, which must deliver high-quality interactive media content to the collaborating users at nearly the same time while meeting their QoE expectations. ACTION-TV proposes an innovative, community-centric media cloud system leveraging a fog computing architecture in order to handle the computationally intensive tasks while enabling efficient and personalised delivery of the live interactive content. The architecture will exploit the geographical location of globally distributed end users and focus on the strategic placement of computing resources (local clouds, shown in the figure) at the edge of the network to address stringent delay requirements. The cloud and network resources will be jointly optimised by taking into account the dynamic nature of network conditions and the extensive computation required for augmentation. In this respect, both the computation time spent in the cloud to create interactive and personalised media and the time required to deliver this content to end users will be taken into account. Although performing computationally intensive tasks at a single selected processing node is the ideal situation, it may be necessary to perform computational tasks concurrently and in a coordinated manner at multiple local clouds in order to maintain the QoE for a geographically distributed group of users. In addition, ACTION-TV adopts a social communication overlay in which groups of collaborating users can be formed, share rendered content specific to their group, and communicate at the same time. The innovative techniques proposed by the ACTION-TV media cloud enable synchronisation among collaborating users for seamless interaction.

Furthermore, the QoE expectations of end users will be addressed by considering the network conditions and terminal capabilities of each user, and adaptive processing on the virtualised cloud resources will be implemented based on this model. By taking into account the QoE expectations of all users within social groups, optimum usage of cloud resources will be accomplished, enabling personalised content consumption.
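
The trade-off between a single processing node and multiple local clouds can be made concrete with a small sketch. It assumes a simplified delay model in which a user's end-to-end delay is the computation time at a node plus the delivery time from that node to the user; the delay budget, the node and user names, and the function `plan_processing` are hypothetical, and the cost of synchronising several local clouds is deliberately ignored.

```python
from typing import Dict, List


def end_to_end_delay(
    compute_ms: Dict[str, float],
    delivery_ms: Dict[str, Dict[str, float]],
    node: str,
    user: str,
) -> float:
    """Assumed model: computation time in the cloud plus delivery time."""
    return compute_ms[node] + delivery_ms[node][user]


def plan_processing(
    users: List[str],
    compute_ms: Dict[str, float],
    delivery_ms: Dict[str, Dict[str, float]],
    budget_ms: float,
) -> Dict[str, str]:
    """Return a user -> node assignment.

    Prefer one node for the whole group; fall back to per-user local clouds
    when no single node keeps every user within the delay budget.
    """
    def worst_case(node: str) -> float:
        return max(end_to_end_delay(compute_ms, delivery_ms, node, u) for u in users)

    best_single = min(compute_ms, key=worst_case)
    if worst_case(best_single) <= budget_ms:
        return {u: best_single for u in users}
    # Fallback: each user is served by the node with the lowest delay for them.
    return {
        u: min(compute_ms, key=lambda n: end_to_end_delay(compute_ms, delivery_ms, n, u))
        for u in users
    }


users = ["lisbon_user", "tokyo_user"]
compute = {"eu_cloud": 20.0, "asia_cloud": 20.0}
delivery = {
    "eu_cloud": {"lisbon_user": 10.0, "tokyo_user": 120.0},
    "asia_cloud": {"lisbon_user": 120.0, "tokyo_user": 10.0},
}
relaxed_plan = plan_processing(users, compute, delivery, budget_ms=150.0)
strict_plan = plan_processing(users, compute, delivery, budget_ms=100.0)
```

With the relaxed budget the whole group shares one node; with the strict budget each user is assigned to the nearest local cloud, which is the situation in which coordinated processing across clouds becomes necessary.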

  • Build the ACTION-TV demonstrator

A complete system demonstration will be achieved by integrating all system components, with at least four users in AV communication with one another while watching the media. At least two of the users will be active collaborators, who will have the opportunity to be present in the scene, while the others will be passive collaborators, who will only be able to enjoy the active collaborators’ performance on the TV.