
ScienceWeek
Crossing Barriers Since 1997


COMPUTER SCIENCE: ON SPEECH PROCESSING

The following points are made by Lawrence Rabiner (Science 2003 301:1494):

1) In the next few decades, advances in communications will radically change the way we live and work. The concept of "going to work" will change from commuting to a particular place to get things done, to "getting things done" no matter where you are. Life at home will also change radically as communications between individuals become multimodal (using voice, visual, and tactile modes) and multimedia (with sharing of text, data, audio, images, video, and other forms of information). For example, you will be able to control virtually any device in the home -- such as the family home entertainment center -- by pointing to it with your finger and issuing voice commands such as "find me a good classical music station."

2) The driving force for these changes is the seamless integration of real-time communications (voice, audio, video, virtual reality) and data (text, images, files) into a single network that can be accessed anywhere, anytime, and by a wide range of devices. Speech and language processing plays a crucial role in this network by enabling enhanced services and providing seamless access to new services (1).

3) Speech coding has existed for more than 60 years, beginning with the classic work of Dudley on the "vocoder" (2). The original goal of speech coding was to provide a compression technology that would enable existing copper wires to handle the continual growth in voice traffic without having to continuously add new lines. Recently, the need for speech coding has grown because of the rapid growth in wireless systems and in the transmission of voice signals over data networks, where speech is just one (very important) data type.

4) The goal of speech coding (3) is to compress the speech signal -- that is, to reduce the bit rate necessary to accurately represent the speech signal -- without distorting it excessively. Two main techniques have been used in speech coding. Waveform coding tries to match waveform characteristics directly, whereas model-based coding tries to match spectral and source-excitation characteristics of speech.

5) Today, speech can be coded down to bit rates of about 8000 bps, with intelligibility and quality approaching those of telephone-bandwidth speech (which has a bit rate of about 64,000 bps). The challenge for the next few years is to lower the bit rate by a factor of 2 without seriously lowering the quality of the resulting speech. Achieving this goal requires improved signal processing for accurately representing the excitation source and the short-time spectral properties of the time-varying speech signal (4,5).
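
The bit rates in points 4 and 5 can be illustrated with a minimal Python sketch of a waveform coder. It is not taken from the article: it assumes telephone-band speech sampled at 8000 samples per second with 8 bits per sample, and it uses continuous mu-law companding (mu = 255) as one familiar example of waveform coding. The function names and the test tone are hypothetical, and deployed telephone codecs use a piecewise-linear approximation rather than this exact formula.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling rate
BITS_PER_SAMPLE = 8     # assumption: one 8-bit code per sample
MU = 255                # assumption: mu-law companding parameter


def compress(x, mu=MU):
    """Logarithmically compand a sample in [-1, 1] into [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Invert compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode(samples, bits=BITS_PER_SAMPLE):
    """Waveform coding: compand each sample, then quantize it uniformly."""
    top = 2 ** bits - 1
    return [round((compress(s) + 1.0) / 2.0 * top) for s in samples]


def decode(codes, bits=BITS_PER_SAMPLE):
    """Map integer codes back to approximate sample values."""
    top = 2 ** bits - 1
    return [expand(c / top * 2.0 - 1.0) for c in codes]


def bit_rate(sample_rate_hz, bits_per_sample):
    """Bits per second for a coder that emits one code per sample."""
    return sample_rate_hz * bits_per_sample


# Hypothetical input: 20 ms of a 200 Hz tone at half of full scale.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
        for n in range(160)]
codes = encode(tone)
decoded = decode(codes)
max_error = max(abs(a - b) for a, b in zip(tone, decoded))

telephone_bps = bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)  # 64000
coded_bps = 8000              # "about 8000 bps" from point 5
target_bps = coded_bps // 2   # "a factor of 2" lower: 4000

print(telephone_bps)               # 64000
print(telephone_bps / coded_bps)   # 8.0
print(telephone_bps / target_bps)  # 16.0
```

Under these assumptions, the 64,000 bps figure is simply 8000 samples per second times 8 bits per sample. Coding at about 8000 bps is then an 8:1 reduction, and the stated goal of halving that rate corresponds to 16:1. A sample-by-sample waveform coder like this one cannot reach such rates, which is where the model-based techniques of point 4 come in.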

References (abridged):

1. R. V. Cox, C. A. Kamm, L. R. Rabiner, J. H. Schroeter, J. G. Wilpon, Proc. IEEE 88, 1314 (2000)

2. R. V. Cox, B. G. Haskell, Y. LeCun, B. Shahraray, L. R. Rabiner, Proc. IEEE 86, 755 (1998)

3. W. B. Kleijn, K. K. Paliwal, Eds., Speech Coding and Synthesis (Elsevier, Amsterdam, 1995)

4. A. Hunt, A. Black, Proc. ICASSP'96, 373 (1996)

5. M. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou, A. Syrdal, J. Acoust. Soc. Am. 105, 1030 (1999)

Science http://www.sciencemag.org

--------------------------------

SPEECH AND LANGUAGE PROCESSING FOR NEXT-MILLENNIUM COMMUNICATIONS SERVICES

The following points are made by R. V. Cox et al. (Proc. IEEE 2000 88:1314):

1) The world of communication in the twentieth century was characterized by two major trends, namely person-to-person voice communication over the traditional telephone network and data communications over the evolving data networks, especially the Internet. In the new millennium, the world of telecommunications will be vastly different. The driving force will be the seamless integration of real-time communications (e.g., voice, video, and music) and data into a single network, with ubiquitous access to that network anywhere, anytime, and by a wide range of devices. From a human perspective, the new network will increase the range of communication services to include expanded people-to-people communications (e.g., audio and video conferencing, distance learning, and telecommuting) and people-to-machine interactions (e.g., messaging, search, help, commerce, and entertainment services). These new services will meet the basic human needs for communication, entertainment, security, sense of community and belonging, and learning, and will increase productivity in numerous ways.

2) In order to understand the role of speech and language processing in the communications environment of the twenty-first century, we first have to look at how things will change as we build out the new network. There are five areas where there will be major changes to the communication paradigm as we know it today.

a) The network will evolve from a circuit-switched, connection-oriented network with a 64-kb/s connection dedicated to every voice and dial-up data call to a packet-switched, connectionless network based on the Internet Protocol (IP).

b) Access to the network will evolve from narrow-band voice and data to broad-band multimedia integrating voice, image, video, text, handwriting, and all types of data in a seamless access infrastructure.

c) Devices connected to the network will evolve from standard telephones and PCs (personal computers) to a range of universal communication devices including wireless adjuncts, mobile adjuncts, appliances, cars, etc. The common characteristic of such devices is that they have IP addresses and can be networked together to communicate over the IP network.

d) Services on the network will evolve from simple dial-up voice and data services to a range of universal communication services including communication, messaging, find, help, sell, entertain, control, storage, and community services. These services will be synergistic with each other and with features of the network that enable them to seamlessly interoperate with all devices and methods of access to the network.

e) Operations will evolve from people-oriented processes (which are extremely expensive and highly inefficient) to machine-oriented processes, including natural language voice interactions with computerized agents, self-provisioning of services, web-based billing and accounting, web-based customer care, and automated testing and maintenance procedures.

The new network provides a wide range of opportunities for speech and language processing to become a major component of the telecommunications environment of the new millennium. First of all, the need for speech and audio coding and compression remains high, even as bandwidth increases dramatically to the home, to the office, and in wireless environments. This need remains because the new network offers the opportunity for high-speed streaming of voice, CD-quality audio, and HDTV-quality video, and each of these technologies imposes tight constraints on network performance to maintain high quality with low delay. Coding and compression enable networks to provide high levels of quality at low delays without requiring excessive amounts of network resources (1-5).

References (abridged):

1. J. C. Ramming, “PML: A language interface to networked voice response units,” in Proc. Workshop on Internet Programming Languages, Chicago, IL, May 1998

2. H. Dudley, “The vocoder,” Bell Lab. Rec., vol. 18, pp. 122–126, 1939

3. D. W. Petr, “32 kb/s ADPCM-DLQ coding for network applications,” in Proc. IEEE GLOBECOM, Dec. 1982, pp. A8.3-1–A8.3-5

4. P. Kroon and W. B. Kleijn, “Linear-prediction based analysis-by-synthesis coding,” in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier, 1995, pp. 79–119

5. R. E. Crochiere and J. M. Tribolet, “Frequency domain coding of speech,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 512–530, 1979

IEEE http://www.ieee.org

ScienceWeek http://scienceweek.com

Copyright © 2004 ScienceWeek
All Rights Reserved
US Library of Congress ISSN 1529-1472