Tag: Forschung
Considerations about Mobile Video Telephony (Part II)
by Christoph Köpernick on Sep. 06, 2009, in Allgemeines, Grundlagen, In English
In desktop video conferencing, the video conferencing application is normally bundled with instant messaging software that includes text chat, so users can arrange a video conference beforehand by chatting. In contrast, video telephony in UMTS (Universal Mobile Telecommunications System) networks, as it has evolved on top of the circuit-switched 3G-324M service, does not combine video conferencing seamlessly with other communication channels. In the mobile environment, a video call is closer to a standard voice call than it is in the stationary world, so it is more likely that somebody will place a video call without prior announcement. This raises privacy and inconvenience concerns. The callee might not want to be seen during a conversation for a variety of reasons: a video call “turns you ugly” (Harlow) because built-in cameras are usually not placed just above the user’s line of sight but in the suboptimal position below the nose. Further, the video quality is meagre, and lighting conditions are poor. People might feel that exposing their face over a video call invades their privacy, and callees often do not want callers to see how they look.
Furthermore, the use of video telephony can depend on social factors. Societies in South East Asian countries, for example Malaysia, are considered non-confrontational, which shows in the channels people choose for communication. The author’s experiences in South East Asia revealed that most people prefer non-confrontational communication such as SMS (Short Message Service), instant messaging, or e-mail, even in a business environment or with good friends. Voice calls are avoided as much as possible for a first or unexpected contact. It follows that P2P (Peer-to-peer) video calling, which is perceived as even more intrusive, is unlikely to succeed in these societies.
According to informal research by Sachendra Yadav (Yadav), opinion leaders and technology experts feel that video calling does not add much to a conversation compared with voice calling. Compared with desktop video conferencing, which is mostly free nowadays, mobile video telephony fares poorly in a cost-benefit comparison, and users resist adopting it.
For many reasons, 3G video telephony as a person-to-person conversational service has not been as successful as projected. However, the existing technical foundation for video calling can be used to deliver IVVR (Interactive Voice & Video Response) services. A wide range of IVVR applications is imaginable, and some service providers and network operators already deploy them. Furthermore, special IVVR applications such as P2P Video Avatar can even compensate for the drawbacks of classic P2P video telephony, making P2P-like video telephony successful after all.

Considerations about Mobile Video Telephony (Part I)
by Christoph Köpernick on Sep. 06, 2009, in Allgemeines, Grundlagen, In English
Even with a great deal of marketing, early attempts to win users over to video telephony flopped (Jones and Marsden). In contrast, desktop video conferencing is enormously popular for private person-to-person conversations and is widely used in business environments, for example as telepresence for computer-supported cooperative work (CSCW).
In desktop video conferencing scenarios, a stationary computer is typically used. Camera and microphone are fixed and usually stay at the same distance from the participant throughout the conversation. Moreover, lighting conditions are generally better than on the go, as a desk is easier to illuminate correctly than a scene in the mobile environment. In mobile video telephony, lighting conditions change over time as the caller moves or the environment changes, and the camera is usually not fixed. The caller is likely to hold the handset at arm’s length in front of their face, which makes the video shaky. Combined with the meagre bandwidth and low-resolution video, this can considerably degrade the video quality on the callee’s side. These video quality problems in the mobile environment also play a major role in IVVR (Interactive Voice & Video Response) applications that take advantage of the instant video streaming capabilities of 3G-324M video telephony. Bad video quality hurts camera-based games, gesture recognition, and P2P (Peer-to-peer) services that intentionally modify the video with dynamic overlays, such as the P2P Avatar, because motion analysis algorithms perform better with a sharp and clear video signal.

Side-Effects of IVVR Quality on the User Experience
by Christoph Köpernick on Sep. 03, 2009, in Grundlagen, In English
Media compression, error concealment measures, and the characteristics of wireless networks have side effects on the quality of 3G video telephony and IVVR (Interactive Voice & Video Response) applications.
3G-324M mandates only speech codecs. Unlike general audio codecs, speech codecs are designed to transmit speech within a narrow frequency range, which makes them unsuitable for music and many artificial sounds. This needs to be considered when designing IVVR applications, especially games, as most games use music and sound effects to create an immersive atmosphere.
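To make the bandwidth limitation concrete, the following minimal sketch checks which spectral components of a game soundtrack would fall inside a narrowband speech channel. It assumes a classic telephony passband of roughly 300-3400 Hz and the 8 kHz sampling rate of narrowband codecs such as AMR; the passband edges and the example frequencies are illustrative assumptions, not values from the 3G-324M specifications. It only models bandwidth: speech codecs such as AMR are also built around a model of human speech production, which degrades music further.

```python
# Minimal sketch: which parts of a game soundtrack fit a narrowband speech channel?
# Assumption: a classic telephony passband of roughly 300-3400 Hz. The 8 kHz
# sampling rate of narrowband codecs such as AMR caps everything below 4 kHz anyway.
SAMPLE_RATE_HZ = 8000
PASSBAND_HZ = (300.0, 3400.0)  # assumed passband, not taken from the 3G-324M spec


def fits_passband(freq_hz: float) -> bool:
    """True if a spectral component lies inside the assumed speech passband."""
    low, high = PASSBAND_HZ
    nyquist = SAMPLE_RATE_HZ / 2
    return low <= freq_hz <= high and freq_hz < nyquist


def surviving_components(components: dict[str, float]) -> dict[str, float]:
    """Keep only the named components a speech channel could carry."""
    return {name: f for name, f in components.items() if fits_passband(f)}


# Hypothetical dominant frequencies (Hz) of typical game audio elements.
soundtrack = {"bass line": 60.0, "melody": 880.0, "explosion rumble": 40.0,
              "cymbal shimmer": 9000.0, "spoken hint": 1200.0}
print(surviving_components(soundtrack))  # {'melody': 880.0, 'spoken hint': 1200.0}
```

Bass, low rumbles, and high-frequency shimmer all fall outside the assumed band, which is why soundtrack design for IVVR games has to be planned around the speech channel.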
H.263 and MPEG-4 Part 2 baseline were designed for images of natural scenes with predominantly low-frequency components, meaning that the colour values of spatially and temporally adjacent pixels vary smoothly except in regions with sharp edges. In addition, human eyes tolerate more distortion of high-frequency components than of low-frequency components (Kwon and Driessen). Following Kwon and Driessen, the video codecs used for 3G-324M video telephony are well suited to natural scenes and talking-head scenarios. Depending on the type of IVVR application, however, these characteristics can work against a good user experience.
Typical desktop or web applications have a user interface of flat, uniformly coloured boxes and buttons with clearly readable fonts. Depending on user interaction, the interface can change its appearance frequently, sometimes only in parts and sometimes across the whole screen. Codecs used for 3G-324M video telephony are unsuitable for this kind of content, because sharp edges and text consist largely of high-frequency components, which these codecs quantize coarsely. Compressing such user interfaces with H.263 produces blurred fonts and ragged buttons and lines, leaving the interface too distorted for a good user experience. In addition, the comparatively high round-trip delays make interfaces that require frequent user interaction and screen changes tedious to use.
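The difference between natural scenes and user interfaces can be illustrated with the discrete cosine transform, since H.263 codes 8×8 pixel blocks with a DCT. The sketch below is a simplified 1-D illustration, not codec code: it applies an orthonormal DCT-II to one 8-pixel row and measures how much of the non-DC energy sits beyond the lowest-frequency coefficient. The pixel values are made up for illustration.

```python
import math


def dct_ii(samples: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II (H.263 applies a 2-D version to 8x8 blocks)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs


def high_freq_share(samples: list[float], cutoff: int = 2) -> float:
    """Fraction of the AC energy (DC excluded) held by coefficients k >= cutoff."""
    coeffs = dct_ii(samples)
    ac_energy = sum(c * c for c in coeffs[1:])
    return sum(c * c for c in coeffs[cutoff:]) / ac_energy


# One 8-pixel row of a smooth, natural-looking gradient ...
gradient = [100, 104, 108, 112, 116, 120, 124, 128]
# ... versus one row cutting through thin black-on-white text strokes.
text_row = [0, 0, 255, 255, 0, 0, 255, 255]

print(f"gradient: {high_freq_share(gradient):.3f}")  # gradient: 0.012
print(f"text row: {high_freq_share(text_row):.3f}")  # text row: 0.859
```

Almost all detail of the gradient survives if an encoder keeps only the lowest coefficients, whereas the text row loses most of its structure, which is what shows up as blurred fonts and ragged lines.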
Depending on the type of game, the compression characteristics of the video codecs used in 3G-324M can also be advantageous. Contemporary 3D games such as first-person shooters or simulation games try to model the game environment as realistically as possible; the resulting natural-looking scenes are well suited to compression with the video codecs defined for 3G-324M.
More problematic, however, are the delay requirements of mobile games, which are essential for a good gameplay experience. 3GPP (3rd Generation Partnership Project) defines a delay variation below 75 ms for real-time games and considers first-person shooters the most demanding ones with respect to delay (3GPP). Other types of games, such as turn-based strategy games or visual novels, may tolerate a higher end-to-end delay and may require lower data rates.

Multimedia Codecs, Compression, and Streaming in 3G-324M
by Christoph Köpernick on Jul. 02, 2009, in Allgemeines, Grundlagen, In English
3G video telephony generally operates over a single 64 kbit/s connection, and both parties need to share the available bandwidth. Effectively, about 60 kbit/s or less remain for the two media types, since H.245 call control messages reduce the gross bandwidth. In 3G-324M systems, the bandwidth is allocated dynamically; as a rule of thumb, however, each party has 50% of the bandwidth available for sending audio and video signals. In a typical unidirectional scenario, 12.2 kbit/s are allocated to the speech codec, and a bitrate of 43–48 kbit/s is allowed for the video data (Sang-Bong, Tae-Jung and Jae-Won).
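The arithmetic of the unidirectional scenario can be sketched as follows. The list of AMR narrowband codec modes is standard; the 4 kbit/s control overhead is a hypothetical round figure chosen to match the "about 60 kbit/s" quoted above, not a value from H.245 or 3G-324M.

```python
# AMR narrowband codec modes in kbit/s (12.2 kbit/s is the mode quoted above).
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

GROSS_KBPS = 64.0              # circuit-switched bearer
CONTROL_OVERHEAD_KBPS = 4.0    # hypothetical: leaves ~60 kbit/s as in the text
NET_KBPS = GROSS_KBPS - CONTROL_OVERHEAD_KBPS


def video_budget_kbps(speech_kbps: float, net_kbps: float = NET_KBPS) -> float:
    """Bitrate left for video in the unidirectional case once speech is allocated."""
    if speech_kbps not in AMR_MODES_KBPS:
        raise ValueError(f"{speech_kbps} kbit/s is not an AMR mode")
    return net_kbps - speech_kbps


for mode in (12.2, 4.75):
    print(f"AMR {mode} kbit/s -> video {video_budget_kbps(mode):.2f} kbit/s")
# AMR 12.2 kbit/s -> video 47.80 kbit/s
# AMR 4.75 kbit/s -> video 55.25 kbit/s
```

With the 12.2 kbit/s speech mode, this lands at the upper end of the 43–48 kbit/s range; any additional overhead pushes the video budget further down.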
By employing rate control in the media encoders, the network can change these bitrates dynamically depending on network conditions and application demand. When both parties communicate simultaneously, the encoders of both parties can reduce the speech and video bitrates to keep the overall bitrate below 64 kbit/s. For instance, when only one party is speaking, the speech bitrate of the other party can be reduced to a minimum at which only comfort noise is generated on the receiver side (Holma and Toskala); AMR (Adaptive Multi-Rate) can change its bitrate every 20 ms. For video, the encoder can reduce the average bitrate either by lowering the frame rate or simply by dropping frames during transmission. To increase the overall frame rate on the receiver side, the decoder can employ H.263 temporal scalability.
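Lowering the frame rate through frame skipping can be sketched with a simple buffer model. This is a hypothetical controller for illustration, not the rate control of any particular H.263 encoder: the channel drains the encoder buffer at the target bitrate, and a frame that would overflow the buffer is skipped. The frame sizes, frame rate, and buffer limit are made-up values.

```python
def encoded_frames(frame_bits: list[int], target_kbps: float, fps: float,
                   buffer_limit_bits: float) -> list[int]:
    """Indices of the frames a simple buffer-based rate controller would encode.

    Hypothetical frame-skipping controller: the channel drains the encoder's
    output buffer at target_kbps; a frame that would overflow the buffer is
    skipped, and the decoder keeps showing the previous frame.
    """
    drain_per_frame = target_kbps * 1000 / fps  # bits sent per frame interval
    buffer_bits = 0.0
    sent = []
    for index, bits in enumerate(frame_bits):
        buffer_bits = max(0.0, buffer_bits - drain_per_frame)
        if buffer_bits + bits > buffer_limit_bits:
            continue  # skip this frame: lowers the effective frame rate
        buffer_bits += bits
        sent.append(index)
    return sent


# 15 fps camera and a 48 kbit/s video budget: 3200 bits drain per frame interval.
# Each frame costs 6400 bits (made-up figure), so only every second frame fits.
print(encoded_frames([6400] * 6, target_kbps=48, fps=15, buffer_limit_bits=6400))
# [0, 2, 4]
```

The effective frame rate halves to 7.5 fps while the bitrate stays within budget, which is the trade-off the text describes.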
In 3G video telephony, the audio and video signals are streamed bidirectionally over dedicated circuit-switched W-CDMA (Wideband Code Division Multiple Access) paths. Streaming means that media is continuously received or sent and played back on a terminal. Non-conversational one-way audio or video streaming requires a transport delay variation below 2 s (3GPP). Two-way video telephony imposes much stricter real-time requirements, with an end-to-end one-way delay below 150–400 ms (3GPP) to maintain a smooth conversation. The one-way delay in W-CDMA networks alone is already approximately 100 ms, and when IVVR (Interactive Voice & Video Response) services are delivered, media generation time adds to the transmission time. Because of these tight delay requirements, there is no time for retransmission when transmission errors are detected. Retransmission would reduce bit errors and thus improve video quality, but resending PDUs would add undesired delay. Therefore, instead of retransmitting, H.223 and the media codecs work together to detect errors, resynchronise, and perform error concealment.
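A small budget check shows how little time remains for an IVVR server once the network delay is subtracted. The 100 ms network delay and the 150–400 ms target range come from the text; the per-stage timings are hypothetical placeholders for a server-side application.

```python
W_CDMA_ONE_WAY_MS = 100        # approximate network delay quoted in the text
TARGET_RANGE_MS = (150, 400)   # end-to-end one-way target range quoted in the text


def processing_budget_ms(target_ms: float, network_ms: float = W_CDMA_ONE_WAY_MS) -> float:
    """Time left for capture, encoding, IVVR media generation, and decoding."""
    return target_ms - network_ms


def fits(stages_ms: dict[str, float], target_ms: float) -> bool:
    """True if the summed processing stages stay within the remaining budget."""
    return sum(stages_ms.values()) <= processing_budget_ms(target_ms)


# Hypothetical per-stage timings (ms) for a server-side IVVR application.
stages = {"render frame": 40, "H.263 encode": 30, "H.223 mux": 10, "decode/display": 30}
for target in TARGET_RANGE_MS:
    print(target, processing_budget_ms(target), fits(stages, target))
# 150 50 False
# 400 300 True
```

At the strict end of the range, only 50 ms remain for all processing, so even modest media generation times exceed the budget.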
Characteristics of Wireless Networks
by Christoph Köpernick on Apr. 30, 2009, in Grundlagen, In English
In the following, I describe how the characteristics of wireless networks affect the audiovisual quality of 3G video telephony and especially of IVVR (Interactive Voice & Video Response) applications. The impact of these characteristics needs to be considered when designing IVVR applications and services:
Wireless networks are inherently error-prone, and their bitrates tend to fluctuate more than those of wired networks. In wired networks, phenomena such as fading, shadowing, or reflection do not occur, so the available bandwidth stays largely constant during transmission and is generally much higher. In wireless systems, influences on signal propagation cause constantly changing bandwidth. In general, the receiving power depends on the distance between sender and receiver: even in free space, the receiving power p decreases in inverse proportion to the square of the distance:
$$p \propto \frac{1}{d^{2}}$$
where d is the distance between sender and receiver (Schiller).
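In decibels, the inverse-square law means that doubling the distance costs about 6 dB of received power. The sketch below expresses this as a log-distance relation; the exponent parameter is an assumption added for illustration, with 2 reproducing the free-space law and larger values being a common way to model cluttered real environments.

```python
import math


def relative_power_db(d: float, d_ref: float = 1.0, exponent: float = 2.0) -> float:
    """Received power at distance d relative to distance d_ref, in dB.

    exponent=2 is the free-space inverse-square law p ~ 1/d^2; larger
    exponents model environments with stronger attenuation.
    """
    return -10 * exponent * math.log10(d / d_ref)


print(f"{relative_power_db(2.0):.2f} dB")   # -6.02 dB (doubling the distance)
print(f"{relative_power_db(10.0):.2f} dB")  # -20.00 dB (ten times the distance)
```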
Receiving power is further influenced by frequency-dependent fading, shadowing, reflection at large obstacles, refraction depending on the density of the medium, scattering at small objects, and diffraction at edges.
Multipath propagation can cause jitter when the radio signal reaches the receiver via two or more paths at different times. Moreover, user mobility adds another set of problems that results in fading of the received power over time, because the channel characteristics change with time and location. This exacerbates the effect of multipath propagation, because the signal paths change as the user changes location. Changes in the distance between sender and receiver cause different delay variations for different parts of the signal.
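The delay between two copies of the same signal follows directly from their path lengths. This sketch computes the excess delay of a reflected path over the direct path; the path lengths are hypothetical and only show how a moving user changes the relative delay.

```python
SPEED_OF_LIGHT_M_S = 299_792_458


def excess_delay_us(direct_m: float, reflected_m: float) -> float:
    """Extra arrival delay (microseconds) of a reflected path over the direct path."""
    return (reflected_m - direct_m) / SPEED_OF_LIGHT_M_S * 1e6


# Hypothetical geometry: the direct path is 1000 m; as the user moves,
# the reflected path shortens from 1300 m to 1150 m.
for reflected_m in (1300, 1150):
    print(f"{excess_delay_us(1000, reflected_m):.2f} microseconds")
# 1.00 microseconds
# 0.50 microseconds
```

Because the relative delay changes as the user moves, the receiver cannot compensate for multipath with a fixed correction.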
The phenomenon of “cell breathing” is a particular problem in CDM systems. In CDM systems, all terminals use the same frequency spectrum. Therefore, the more information terminals send and receive in a cell, the more noise is produced. As the noise level rises, reception for distant terminals eventually becomes impossible; in effect, the cell shrinks.
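Cell breathing can be quantified with the uplink noise rise used in W-CDMA radio network planning: with a load factor η, the interference level rises by a factor of 1/(1 − η). The sketch below combines this with the power law above to estimate how far the cell radius shrinks. The radius model is a simplifying assumption for illustration (a single path-loss exponent and no other margins), not a planning method.

```python
import math


def _check_load(load_factor: float) -> None:
    """Reject load factors outside [0, 1); at eta = 1 the noise rise is unbounded."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")


def noise_rise_db(load_factor: float) -> float:
    """Uplink noise rise 1 / (1 - eta), expressed in dB."""
    _check_load(load_factor)
    return 10 * math.log10(1 / (1 - load_factor))


def relative_cell_radius(load_factor: float, exponent: float = 2.0) -> float:
    """Cell radius relative to an empty cell, assuming p ~ 1/d**exponent.

    A terminal must overcome the noise rise with a shorter path, so the
    radius shrinks by the factor (1 - eta) ** (1 / exponent).
    """
    _check_load(load_factor)
    return (1 - load_factor) ** (1 / exponent)


for eta in (0.0, 0.5, 0.75):
    print(f"load {eta:.2f}: noise rise {noise_rise_db(eta):.1f} dB, "
          f"radius x{relative_cell_radius(eta):.2f}")
# load 0.00: noise rise 0.0 dB, radius x1.00
# load 0.50: noise rise 3.0 dB, radius x0.71
# load 0.75: noise rise 6.0 dB, radius x0.50
```

Keeping the estimated load below a planned threshold therefore also keeps the cell radius above a planned minimum.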
UMTS (Universal Mobile Telecommunications System) or W-CDMA (Wideband Code Division Multiple Access) networks counteract, but do not eliminate, these effects by implementing error detection, error correction, and error concealment measures. For example, W-CDMA effectively prevents cell breathing with wideband power-based load estimation, which keeps the cell coverage within the planned limits (Holma and Toskala). Nonetheless, these phenomena can still degrade the audiovisual quality of 3G video calls and IVVR applications, for example through high delays, bit errors, or varying bitrates.




