Today, at the 8th edition of Foritaal in Genoa, I presented a talk on the problem of older adults' access to mobile technology, and how we addressed it in ECOMODE, a European project funded under Horizon 2020, using multimodal interaction with mid-air one-hand gestures and voice commands.
Increasing older adults’ accessibility to mobile technology: The ECOMODE camera
We all know that life expectancy is increasing, and that older adults already make up a substantial share of the overall population. Europe has the world's highest proportion of people aged 65 or over (18.9%), with 10 million seniors in Spain, 12 million in France, and 13 million in Italy, and projections show that these numbers are going to rise in the coming years.
Mobile technology can enrich the lives of older adults and support their wellbeing in many ways, for instance by helping them to achieve an active lifestyle and healthy living, and by fostering social inclusion (helping older people to expand their social networks and keep in touch with their peers and family members). Despite this, older adults lag in technology adoption. The reason lies in a complex mix of factors, such as skeptical attitudes toward technology, low expertise, and cognitive and physical challenges (e.g., reduced visual capability, which makes it difficult to recognize fine details of icons or pointers; tremors, joint pain, and arthritis, which hamper fine movements).
Davis (1989) showed that technology acceptance depends on a number of intertwined aspects. First of all, technology has to be useful to people. Of course, perceived usefulness is a very personal matter related to individual needs and desires, and is the result of a tradeoff between the perceived benefits and the costs (e.g., frustration) of learning a new technology. The second aspect is that technology has to be easy to use: ease of use also depends on individual characteristics (e.g., previous experience with computers, technical self-confidence), but what we, as designers and developers, can work on is the technology itself. For a technology to be easy to use, it needs to work properly: that is, be fast, responsive, efficient, and accurate. It also needs to provide a good user experience: the interaction has to be intuitive and natural, and the technology has to be usable, accessible, and inclusive.
Multimodal interaction combines two or more interaction modalities, such as speech, touch, vision, and gesture; in our case, mid-air one-hand gestures and voice interaction. Research with older adults has shown that it can provide a powerful and compelling interactive experience. For instance, Carreira (2016) compared the performance of two groups of seniors: a physically fit group and a conditioned group (with some kind of impairment, such as rheumatism, tendinitis, osteoarthritis, Parkinson’s disease, leg and back pain). Although the physically fit group was faster in the interaction, there were no significant differences in the number of errors, showing that mild physical limitations do not compromise the ability to use mid-air gestures. Moreover, nearly all participants completed the tasks with at least one gesture and without any major difficulties, and overall enjoyed the interaction. Multimodal interaction is also flexible (because the user can choose a preferred interaction modality), robust, and easy and natural, because we use gestures and speech in everyday communication.
So, ECOMODE focused on both the technical accuracy of the automatic recognition of mid-air gestures and voice commands, and the user-centered design of multimodal interaction. With regard to the technical accuracy, while traditional computational algorithms process the visual information frame by frame, resulting in significant latencies and high energy consumption, ECOMODE employs a novel method based on the EDC (Event-Driven Compressive) paradigm. This paradigm is event-based and biologically inspired, because it resembles the functioning of the human brain: retina cells activate in response to external stimuli, sending signals (events) to the cortical structures through the optic nerve. There, each neuron is autonomous in deciding when to send out an event, based on the spatio-temporal information received. So, there is no representation of static images anywhere in the brain, but a continuous flow of information. Applying the same concepts to the automatic recognition of mid-air gestures and voice interaction, we obtain a high temporal resolution combined with continuous-time operation, which results in an extremely fast, efficient, and precise computation. This method analyses visual information of hand and finger gestures and voice commands, combining the auditory input with visual cues from lip and chin motion to gain robustness and background noise immunity.
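To make the contrast between frame-by-frame and event-based processing more concrete, here is a minimal sketch in Python. It is not the ECOMODE sensor or the EDC method: it only emulates the general event-based idea by emitting an event when a pixel's log-intensity changes by more than a threshold since that pixel's last event. The function name `frames_to_events`, the threshold value, and the toy frames are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]          # grayscale intensities in [0, 1], row-major
Event = Tuple[int, int, int, int]  # (time step, x, y, polarity)


def frames_to_events(frames: List[Frame], threshold: float = 0.2) -> List[Event]:
    """Emulate an event-based sensor on a sequence of frames (illustrative only).

    A pixel emits an event when its log-intensity differs from the value
    stored at its last event by at least `threshold`. Static pixels stay silent.
    """
    if not frames:
        return []
    eps = 1e-3  # avoids log(0) for black pixels
    # Per-pixel reference: log-intensity at the last emitted event.
    ref = [[math.log(v + eps) for v in row] for row in frames[0]]
    events: List[Event] = []
    for t, frame in enumerate(frames[1:], start=1):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                current = math.log(value + eps)
                diff = current - ref[y][x]
                if abs(diff) >= threshold:
                    polarity = 1 if diff > 0 else -1
                    events.append((t, x, y, polarity))
                    ref[y][x] = current  # each pixel updates independently
    return events


# Toy scene (hypothetical data): a static 2x2 background where one pixel
# brightens at t=1, stays bright at t=2, and darkens again at t=3.
frames = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.9], [0.5, 0.5]],
    [[0.5, 0.9], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]
events = frames_to_events(frames)
pixels_read_frame_based = sum(len(row) for frame in frames for row in frame)
```

In this toy scene a frame-based pipeline touches all 16 pixel values, while the event stream contains only two events, one for the brightening and one for the darkening. That sparsity is the intuition behind the lower latency and energy consumption mentioned above.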
But we also designed the interaction around the users. For the user-centered design (UCD) of multimodal interaction, my colleagues and I studied our users from different perspectives. At first, we carried out a literature review on ICT and older adults, on seniors’ attitudes toward technology, on age-related functional decline, and on the ergonomics and biomechanics of the hand. Then, we focused on the people using personas, which are archetypes of actual users: fictional people who have names, age, gender, ethnicity, preferences, occupations, families, friends, and a certain income. They also have an education, a socioeconomic status, life stories, goals and tasks. Personas are usually employed to improve the design and to guide decisions about the interaction. We also interviewed experts working closely with older adults, and we started to define our users: a heterogeneous group in need of social contact (one of the main reasons for them to attend community centers is to meet with others and overcome loneliness), mostly active (both physically and socially, but with some exceptions), and interested in learning new technology.
We also carried out different studies with older adults, both to develop a set of gestures and voice commands and to evaluate the interaction. In one of these studies, we used the Wizard of Oz (WoZ) technique, in which the participant operates an apparently fully functioning system whose missing functions are supplied by a human operator, called the “wizard”. WoZ simulates functions that are not yet implemented in the system and has been used in the assessment of multimodal applications and in the development of gestures. The procedure followed four stages:
- Intro on touch-based interaction: a tutorial on the interface and the touch commands for using the camera application;
- Intro on multimodal interaction: participants were shown mid-air gestures and vocal commands, with videos displayed on the tablet screen;
- Task: participants used multimodal interaction for taking pictures with the tablet. In this part, the WoZ setup was used to operate the tablet device (using a software application for remotely controlling the tablet PC);
- Semi-structured interview and questionnaire to explore the overall interaction.
Results showed that, in general, our participants were satisfied with the interaction (M=4.40; SD=0.48), they did not feel that holding the tablet was an obstacle (M=4.10; SD=0.99), and making gestures while holding the tablet was considered quite comfortable (“How would you rate the comfort of holding the tablet on one hand and making gestures with the other hand?”, M=3.75; SD=1.03). We noted that the participants performed the mid-air gestures at different distances from the tablet device, but most of them performed the gestures very close (6-15 cm). They also changed some mid-air gestures to make them more comfortable. With regard to vocal commands, they made extensive use of synonyms, and they expressed concerns regarding the use of voice interaction in public (raising privacy issues and fear of disturbing others).
All these elements and interactions with the users allowed us to define initial guidelines for the design of multimodal interaction. In the presentation, I summarized them into three categories, naming only a few examples for each one:
- Hardware: it needs to be light and portable for seniors to hold it with one hand, even if interactions are very quick.
- Mid-air gestures:
- Mechanics of gestures: the wave-up gesture for the vertical scrolling of the page should be performed with the palm facing down. At first, we had designed it with the palm facing up, not considering that this movement requires an excessive rotation of the wrist that can exert increased pressure on the carpal tunnel. But, fortunately, frequent iterations with the real users allowed us to correct the initial mistakes 🙂
- Other characteristics of the gestures: we recommended the use of metaphorical gestures (e.g., “turning the pages” for horizontal scrolling), and preferred macro-gestures (with the entire hand, such as closing the hand in a fist) over micro-gestures because our participants found them more comfortable. Micro-gestures (like the “click” gesture with the forefinger) were performed with high variability by older adults, and therefore all the possible alternatives of micro-gestures should be included in the dictionary so that the recognition algorithm is flexible enough.
Moreover, our users did not like the “unlock gesture” (the gesture that is sometimes required to unlock the system and switch it to listening mode, to avoid false positives): therefore, if one is needed, we recommended the use of a very simple gesture or an alternative voice command.
Finally, the short distance from the screen (10-15 cm) at which gestures were performed and the small amplitude of the gestures influenced the choice of the optics of the camera.
- Voice commands: we recommended including a wide set of synonyms to be recognized by the system, or giving users the possibility to personalize their preferred voice commands. Moreover, since participants expressed concerns about the use of voice interaction in public, we recommended employing a flexible system that lets the user choose the preferred interaction modality depending on personal preferences and contexts of use.
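The voice-command guideline (a wide set of synonyms plus user personalization) can be sketched as a simple lookup from spoken phrases to commands. This is only an illustration, not the vocabulary or recognizer used in ECOMODE: the command names, the phrases, and the `CommandVocabulary` class are hypothetical, and the utterance is assumed to arrive already transcribed by some speech recognizer.

```python
from typing import Dict, List, Optional

# Hypothetical command names and phrases, chosen only for illustration.
DEFAULT_SYNONYMS: Dict[str, List[str]] = {
    "take_picture": ["take picture", "take a photo", "shoot", "snap"],
    "scroll_next": ["next", "forward", "go on"],
}


class CommandVocabulary:
    """Maps many spoken phrases to one command and lets users add their own."""

    def __init__(self, synonyms: Dict[str, List[str]]) -> None:
        self._lookup: Dict[str, str] = {}
        for command, phrases in synonyms.items():
            for phrase in phrases:
                self.add_phrase(command, phrase)

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so small variations still match.
        return " ".join(text.lower().split())

    def add_phrase(self, command: str, phrase: str) -> None:
        """Register a built-in or user-defined phrase for a command."""
        self._lookup[self._normalize(phrase)] = command

    def resolve(self, utterance: str) -> Optional[str]:
        """Return the command for a transcribed utterance, or None if unknown."""
        return self._lookup.get(self._normalize(utterance))


vocab = CommandVocabulary(DEFAULT_SYNONYMS)
builtin_match = vocab.resolve("  Take a PHOTO ")   # matches a built-in synonym
before_custom = vocab.resolve("cheese")            # unknown phrase
vocab.add_phrase("take_picture", "cheese")         # user personalization
after_custom = vocab.resolve("cheese")
```

Returning `None` for an unknown phrase leaves room for the application to fall back on another modality, such as a mid-air gesture or touch, in line with the recommendation to let users choose how they interact.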
So, here I presented the initial steps of the project. Recently, we have further explored multimodal interaction for older adults, comparing different age groups and how users combine different input modalities. Now, we are investigating the personal values related to mid-air gesture interaction and the characteristics of the interaction that can sustain those values, and we are about to start the first evaluation of multimodal interaction with users.