Advanced technology in speech-based interfaces

Abstract

Speech-based interfaces are not new to computing, but they have been relatively underused as an efficient and effective method of human-computer interaction. The technology has attracted great interest over the past few years, although significant room for improvement remains. This paper investigates current uses and standards of the technology and the contributions being made to it. It also identifies possible future uses of speech-based interfaces and the benefits they may offer over current input methods, particularly for certain types of users.

Speech-based interfaces are not new to computing, but they have been relatively underused as an efficient and effective method of human-computer interaction. This paper gives a background to the technology and describes how the growing need for natural language and speech interfaces created a need for standardisation, which led to the release of the VoiceXML standard. From this standard other technologies were born, including a combination of XHTML and VoiceXML for developing Internet applications with a speech-based interface. Combined with web and in-car technologies, these provide an opportunity for voice-controlled motor vehicle functions in the near future. While the technology has been designed to help the average person be more efficient, with some small changes it can also benefit elderly and disabled users. Every new technology has problems, and these are discussed as well, leading to a conclusion that summarises the main points and justifies the benefits.

Natural language interfaces are an important part of human-computer interaction: the number of telephones in the world still exceeds the number of computers, so spoken natural language is more widely available as an input method than a mouse or keyboard. To ease exchanges between humans and machines, the World Wide Web Consortium (W3C) has published a recommendation for an XML-based voice interaction language, VoiceXML, which supports interaction across many interfaces, including Internet applications that combine XHTML with VoiceXML. Because VoiceXML uses the HTTP protocol to communicate, a VoiceXML telephone gateway can communicate with a web server. In this type of environment the web server provides its response to a user on a telephone, bridging the gap between the phone and the Internet. This is supported by the World Wide Web Consortium (2010):

The telephone was invented more than 150 years ago, and continues to be a very important means for us to communicate with each other. The Web by comparison is very recent, but has rapidly become a competing communications channel. The convergence of telecommunications and the Web is now bringing the benefits of Web technology to the telephone, enabling Web developers to create applications that can be accessed via any telephone, and allowing people to interact with these applications via speech and telephone keypads (p. 1).

VoiceXML is becoming a standard for human-computer audio dialogue, covering speech synthesis and the recognition of spoken input. This technology makes it possible to use natural conversation as an interface for the Internet and for manipulating content. An automated phone system built on VoiceXML can also be set up to understand, and respond in, multiple languages. Its popularity is increasing as major companies such as IBM, HP and Motorola now support and use VoiceXML. A major goal is to “bring the advantages of web-based development and content delivery to interactive voice response applications” (Rouillard, 2007, p. 27).
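
The gateway arrangement described above, in which a telephone gateway fetches VoiceXML from a web server over HTTP, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the prompt wording and the `/weather.vxml` and `/news.vxml` URLs are hypothetical, and a real deployment would depend on the gateway product in use. The element names (`vxml`, `menu`, `prompt`, `choice`) and the namespace are those of VoiceXML 2.0.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_menu_document(prompt_text, choices):
    """Return a VoiceXML 2.0 menu as a string.

    `choices` maps a spoken phrase to the URL of the next dialog.
    """
    ET.register_namespace("", VXML_NS)  # serialise without a prefix
    root = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    menu = ET.SubElement(root, f"{{{VXML_NS}}}menu")
    prompt = ET.SubElement(menu, f"{{{VXML_NS}}}prompt")
    prompt.text = prompt_text
    for phrase, next_url in choices.items():
        choice = ET.SubElement(menu, f"{{{VXML_NS}}}choice", {"next": next_url})
        choice.text = phrase
    return ET.tostring(root, encoding="unicode")


def application(environ, start_response):
    """Minimal WSGI app: every HTTP request receives the same voice menu."""
    body = build_menu_document(
        "Say weather or news.",
        {"weather": "/weather.vxml", "news": "/news.vxml"},  # hypothetical URLs
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/voicexml+xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point of the sketch is that the server side is ordinary web development: the same server that returns XHTML to a browser can return a VoiceXML document to a telephone gateway, which then speaks the prompt and listens for one of the listed choices.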

XHTML + Voice (X+V) is a technology for describing web pages that are both visual and audible: visual interaction is described by XHTML and auditory interaction by VoiceXML. This gives users an HTML display of a website together with the ability to navigate and use the site either by voice or by traditional methods of input. Until recently X+V functionality had not been implemented by the major Internet browser companies; instead it had been used by small companies with government grants and talked about as a possible future technology. Currently the Opera web browser offers native support for X+V, and it will also attempt voice interaction with standard XHTML pages. Internet Explorer and Firefox still do not have native support for X+V, although third-party extensions and add-ons have been created. Opera Software ASA (2010) says, “any ordinary browser command can be done by voice, such as navigating to, and following the next link in a document, going to the next slide in an Opera Show presentation, or logging on to a password protected Website” (p. 1). With the Opera web browser now being installed in Ford vehicles, X+V offers an increased opportunity for a speech-based interface that enables eyes-free and hands-free computer interaction while driving. This technology could potentially control dash-panel and computer systems through speech, giving users functionality that ranges from changing the temperature of the heater to sending emails by voice while driving a car. Opera Software ASA (2009) says, “This solution will allow Ford truck and van owners to maintain a virtual work environment with access to all of the important files, information and applications they need on a daily basis” (p. 1).

Because XML is a flexible and universal language overseen by the W3C, XML-based technologies such as VoiceXML are not limited to Internet applications. The same piece of XML can be used by various applications and imported into others that support it, and there is no reason why VoiceXML cannot be used in the same way in the future. Mobile phones have for some time been able to read text messages and email messages aloud to the user, which can be beneficial for visually impaired people and for people operating a vehicle. “Text-to-speech software reads the text on the screen aloud in a natural sounding voice, giving you convenient access to phone menus and functions, short messages, e-mail messages” (Nokia, n.d., p. 1). Using VoiceXML-based technology it is entirely possible for a user to dictate a text message to the mobile phone, which converts the speech to text and sends it via the SMS service. This may sound unnecessary at first, since the user could simply call the recipient and speak to them without a computer transcribing the words. It would, however, give businesses a greater ability to stay in contact while on the move, as text messaging is used extensively in business and is preferred in some cases, depending on the message being sent. It could also provide a solution to a major problem with cellular phones: texting while driving. In principle, a technology that allows a user to send text messages safely while driving, by talking to the cell phone, could save lives and make lives easier. Talking to a passenger or singing along to the radio, activities very similar to verbalising a text message, have not been noted as significant causes of crashes. “Government officials aren’t the only ones getting on the texting ban-wagon. TV talk show host Oprah Winfrey has launched a national television and Internet campaign to encourage people to commit to putting their cell phones away while driving” (Hattiesburg American, 2010, p. 1). As technology has progressed, people have continuously sought smaller and smaller devices with greater detail and speed. Technology has reached the point where the input devices themselves are preventing devices from becoming any smaller. “Voice interaction can escape the physical limitations on keypads and displays as mobile devices become ever smaller” (World Wide Web Consortium, 2010, p. 5).

With a global aging population it is important that we help elderly people to function and live as independently as technology will allow. Elderly people may benefit from the advancement of speech-based technologies, but to understand how, it is first important to understand their characteristics. “The human interfaces to most computer systems for general use have been designed, either deliberately or by default, for a ‘typical’, younger user” (Gregor & Newell, 2001, p. 1). Elderly people can be crudely generalised into three groups: fit older people, frail older people and older people with long-term disabilities. Fit older people are those who do not appear, and do not consider themselves, disabled. Frail older people would be considered disabled and have one or more difficulties, including at least one that impairs their functionality in some way. Older people with long-term disabilities are those whose disability has affected the aging process and whose ability to function depends on other faculties that are themselves declining. Another aspect to take into consideration is the variability in physical, sensory and cognitive abilities among the elderly, as one size does not fit all in this situation. A further aspect is the variation in ability to operate a computer system due to disabilities, impairments and learning capabilities. Gregor and Newell (2001) conclude:

In general, as people grow older their abilities change. This process of change includes a decline over time in the cognitive, physical and sensory functions, and each of these will decline at different rates relative to one another for each individual. This pattern of capabilities varies widely between individuals, and as people grow older, this variability increases. In addition, any given individual’s capabilities vary in the short term due, for example, to temporary decrease in, or loss of, function due to a variety of causes including illness, blood sugar levels and state of arousal (p. 2).

Interfaces for older people need a greater diversity of functionality than those for a younger group, in order to meet their more varied needs. A speech-based interface offered as an option for operating a computer relies on a function that most people have used their entire lives and that is not considered to decline dramatically with age. Through the VoiceXML capabilities described previously, it can also let those who are intimidated by technology, and by the thought of using a computer, use a computer system by telephone. Finally, the interface needs to use general terms rather than technical ones, for example “moving to the main section” rather than “clicking on the home link”.

Most systems and interfaces are designed for typical healthy or high-functioning users rather than for users with disabilities, who can have difficulty using a standard keyboard or mouse. With the growth of the Internet and technology it is important that disabled users are not left out, and that they are able to access these resources if they choose or if it could benefit their lives. There may be situations where a computer application could benefit the life of a person with a disability who nevertheless cannot use a computer because of motor-function restrictions. This demonstrates the need for hands-free or eyes-free computer access, which serves two main groups: visually impaired users and motor-handicapped users. “The Web Accessibility Initiative (WAI) works with organizations around the world to develop strategies, guidelines, and resources to help make the Web accessible to people with disabilities” (Web Accessibility Initiative, 2009, p. 1). Many applications and web browsers have been developed to assist people with disabilities, although many have since been quietly withdrawn, leaving broken links, and those still available for download may have been abandoned and no longer maintained. An important aspect of developing voice applications for handicapped users is that they may want to use voice control in combination with other interfaces, such as a joystick or other aid devices. The aim of speech systems is generally naturalness, copying the conversations that we have had our entire lives, but for users with disabilities it may be more beneficial to aim for learnability over naturalness. For example, instead of saying “activate microphone” or something similarly technical to activate the microphone, “saying ‘Wake Up’: un-mutes the microphone and turns on the light in left side” (Brondsted & Aaskoven, 2005, p. 4).
Technology is currently heading toward eyes-free and hands-free access to systems, for purposes such as accessing a computer while driving a car or making us more productive. The same base technology is required to support speech-based services for disabled users, but the interaction needs are very different. We would generally prefer to speak to a computer in a turn-based conversation like the ones we have with other human beings, whereas as an aid or interface for disabled users it would be more beneficial to use command-driven voice systems with non-technical terms. These commands can still be drawn from everyday human-to-human language, such as “wake up” and “sleep”, which even users with severe cognitive disabilities are likely to understand. For people whose disabilities are so severe that even these terms cannot be understood, a computer interface of this kind is unlikely to be a priority.
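
A command-driven system of the kind described above can be sketched as a small lookup from everyday phrases to actions. The sketch below is illustrative only and assumes that a separate speech recogniser has already turned the audio into text; the class and method names are hypothetical, and only the “wake up” and “sleep” commands come from the discussion above.

```python
def normalize(utterance):
    """Lower-case the text and treat punctuation as spaces,
    so that 'Wake-up!' and 'wake up' are handled identically."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in utterance.lower())
    return " ".join(cleaned.split())


class VoiceController:
    """Maps a small, fixed set of everyday phrases to actions."""

    def __init__(self):
        self.microphone_on = False
        # Non-technical command words, easy to learn and remember.
        self.commands = {
            "wake up": self.wake_up,
            "sleep": self.sleep,
        }

    def wake_up(self):
        self.microphone_on = True
        return "microphone on"

    def sleep(self):
        self.microphone_on = False
        return "microphone off"

    def handle(self, utterance):
        """Run the action for a recognised phrase, or report a miss."""
        action = self.commands.get(normalize(utterance))
        if action is None:
            return "not understood"
        return action()
```

Keeping the vocabulary this small trades naturalness for learnability: the user has only a handful of phrases to remember, and the system has only a handful to tell apart.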

The VoiceXML standard provides a guideline for developing voice applications, but there are currently no standards for the development environments or their interfaces. This means that the layout and functionality of development environments can be completely different, and the code they generate will not necessarily be compatible, as two different environments may generate completely different tags and formats. Building spoken applications from scratch can take a long time and require several different frameworks and technologies. VoiceXML also works with predetermined grammars, which can be troublesome in the development of some applications, although combining the VoiceXML platform with independent voice recognition systems can increase its capacity for understanding. VoiceXML is a great step toward speech and voice-based interfaces, but a lot of work remains before it becomes a complete framework for developing speech applications. “Accordingly, a great deal of emphasis has been placed on the development of toolkits and environments that hide some of this complexity and allow developers to rapidly prototype and deploy speech-based applications” (Bennett, Llitjod, Shriver, Rudnicky, & Black, 2002, p. 1).
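
The limitation of a predetermined grammar can be illustrated with a small sketch. The grammar below is a hypothetical example, not taken from any VoiceXML platform: each slot lists the words allowed at that position, and only utterances built from those words in that order are accepted. The string "nomatch" is borrowed from the name of the VoiceXML event raised when input matches no active grammar; everything else here is an assumption made for illustration.

```python
from itertools import product

# Hypothetical fixed grammar: one tuple of allowed words per position.
GRAMMAR = [
    ("turn",),
    ("on", "off"),
    ("the",),
    ("heater", "radio"),
]


def expand(grammar):
    """Return every phrase the grammar accepts."""
    return {" ".join(words) for words in product(*grammar)}


def recognise(utterance, grammar=GRAMMAR):
    """Return the phrase if it is in the grammar, otherwise 'nomatch'."""
    phrase = " ".join(utterance.lower().split())
    return phrase if phrase in expand(grammar) else "nomatch"
```

With this grammar, "Turn on the heater" is accepted, while a perfectly reasonable request such as "make it warmer" falls outside the grammar and is rejected. That gap is what an independent recognition system with a larger vocabulary would be expected to narrow.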

Natural speech-based interfaces can provide a known and familiar way of interacting with computer systems, because we spend our lives conversing with other people and communicating over the telephone. Current technology makes it possible to interact with a website or computer application via a telephone, to translate the spoken language for the system, and to translate a response back to the user. The ability to use a generic markup language like VoiceXML alongside XHTML is a leap forward in creating an Internet that is accessible via speech-based interfaces. This enables future technology such as voice-controlled functions in a motor vehicle and improved cell phone speech interfaces. One of the most significant impacts of this technology is that elderly people can use, as a computing interface, a function that is not known to degenerate with age. It will also enable users who are new to computers but familiar with telephones to use a computer more easily. Many disabled people struggle to maintain their independence, with motor-function restrictions that prevent them from using a computer effectively. The ability to manipulate programs and browse the Internet with a speech interface could help them maintain their freedom and independence. As with all new technologies, there are serious problems that must be solved before this technology can take off, including the need for a standard that provides a complete framework, with grammar and large-vocabulary support, rather than just a markup language. It is concluded that speech-based interfaces currently provide, and will continue to provide, benefits as the technology advances, provided that the right people get access to it and not just the average user who is happy to type.

References

  • Bennett, C., Llitjod, A. F., Shriver, S., Rudnicky, A., & Black, A. W. (2002). Building VoiceXML-based applications. Paper presented at the 7th International Conference on Spoken Language Processing, September 2002, Denver, Colorado, United States of America. Retrieved February 19, 2010, from http://www.cs.cmu.edu/~awb/papers/ICSLP2002/voicexml.pdf
  • Brondsted, T., & Aaskoven, E. (2005). Voice-controlled internet browsing for motor-handicapped users. Design and Implementation Issues, Interspeech 2005. doi:10.1.1.65.3974
  • Gregor, P., & Newell, A. F. (2001). Designing for Dynamic Diversity – Making accessible interfaces for older people. In J. Jorge, R. Heller, & R. Guedj (Eds.), Proceedings of 2001 EC/NSF Workshop on Universal Accessibility of Ubiquitous Computing: Providing for the Elderly: 22-25 May 2001, Alcacer do Sal, Portugal. Dundee: University of Dundee.
  • Hattiesburg American. (2010). Texting while driving deadly at any age. Retrieved March 1, 2010, from http://www.hattiesburgamerican.com/article/20100221/OPINION01/2210304/Texting-while-driving-deadly-at-any-age
  • Nokia. (n.d.). Nokia accessibility: Text to speech. Retrieved March 1, 2010, from http://www.nokiaaccessibility.com/tts.html
  • Opera Software ASA. (2009). Opera brings full web browsing to new Ford trucks and vans. Retrieved March 3, 2010, from http://www.opera.com/press/releases/2009/04/02_2/
  • Opera Software ASA. (2010). Opera tutorials. Retrieved March 1, 2010, from http://www.opera.com/browser/tutorials/voice/using/
  • Rouillard, J. (2007). Web services and speech-based applications around VoiceXML. Journal of Networks, 2(1), 27-35.
  • Web Accessibility Initiative. (2009). About WAI. Retrieved March 1, 2010, from http://www.w3.org/WAI/about-links.html
  • World Wide Web Consortium. (2010). W3C voice browser working group. Retrieved March 1, 2010, from http://www.w3.org/Voice/