6.217 Sum: Multimodal references

The Linguist List linguist at tam2000.tamu.edu
Tue Feb 14 11:23:13 UTC 1995


----------------------------------------------------------------------
LINGUIST List:  Vol-6-217. Tue 14 Feb 1995. ISSN: 1068-4875. Lines: 201
 
Subject: 6.217 Sum: Multimodal references
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Asst. Editors: Ron Reck <rreck at emunix.emich.edu>
               Ann Dizdar <dizdar at tam2000.tamu.edu>
               Ljuba Veselinova <lveselin at emunix.emich.edu>
 
-------------------------Directory-------------------------------------
 
1)
Date: Mon, 13 Feb 95 12:05:33 +0100
From: jussi at sics.se
Subject: summary of multimodal refs
 
-------------------------Messages--------------------------------------
1)
Date: Mon, 13 Feb 95 12:05:33 +0100
From: jussi at sics.se
Subject: summary of multimodal refs
 
 
Summary to my query of early January:
 
) Dear linguists,
)
) Does anyone know of work -- studies, analyses -- on multimodal reference:
) i.e. how words and pictures interact and refer to each other in texts?
)
 
Thanks to all who responded, among them:
 
Susan Meredith Burt, Robert Dale, Karen Emmorey,
Sharon Flank, Lisa Frumkes, Sabine Geldof, James A. McGilvray,
Georgia Green, Marti Hearst, Richard Hirsch, Sally Jacoby,
Pirjo Karvonen, John Lee, Dick Oehrle, Toshio Ohori, Laurent Romary,
Deborah D K Ruuskanen, Roberta Trites, Mats Wiren.
 
----------------------------------
 
The background to my query is that my group has worked for some time with
multimodal input to computer systems, using combinations of text and direct
manipulation: point-and-click gestures. Recently we extended our scope
somewhat, and we have just completed a first running version of a speech
interface to a 3-D graphic virtual environment. I myself am looking at
reference resolution in the interface. We are currently experimenting and
fooling about with it. I was surprised at the strong effects of visual focus
and interactivity on referent choice in situations which could be construed
as ambiguous, and conversely, at how in certain constructions the textual
effects completely override visual and gestural cues. We have just completed
the first stage of the project, and I will be happy to mail anyone a copy of
our first tentative report on how things turned out. This led me onto a side
track: I am currently working on a statistical study of a Tintin album
(L'Oreille Cassee) to see how the pictorial mode of the text affects the
structure of referential expressions.
 
Jussi Karlgren, Ivan Bretan, Niklas Frost, Lars Jonsson. 1995.
``Interaction Models for Speech Interfaces to Virtual Environments'',
Proceedings of Second Eurographics Workshop on Virtual Environments
-- Realism and Real Time, Monte Carlo. Darmstadt:Fraunhofer IGD.
 
Several research groups are working simultaneously on integrating speech and
virtual reality technology, and several publications from various sources are
in the works.
 
Naturally, multimodality in the human-computer interface can be more than
speech, vision, and gestures, and a fair amount of work has gone into
investigating the effects of adding text input or output to graphical
displays, or vice versa. There are also several studies of multimodal aspects
of human-human discourse, both spoken and written, some with an eye to
application in human-computer interaction and some not. Several responses
encouraged me to look more at studies of children's literature: the standard
reference there appears to be Nodelman, in the following list.
 
The following list contains only the complete references I have been sent so
far. They keep arriving, so I may post a revised version if I receive enough
additional material. Naturally, the field is large and diverse, and this list
in no way approximates the breadth and depth of study in the area, but much
of the work in it is new to me. I hope it is of some use!
 
(More) computer-oriented work -- system descriptions and empirical studies:
 
Edwin Bos, Carla Huls, and Wim Claassen. 1994. ``EDWARD: full integration of
 language and action in a multimodal user interface'', International Journal
 of Human-Computer Studies 40:473-495.
 
R. Chandrasekar and S. Ramani. 1989. ``Interactive communication of sentential
 structure and content: an alternative approach to man-machine communication'',
 International Journal of Man-Machine Studies 30:121-148.
 
Philip R. Cohen. 1992. ``The Role of Natural Language in a Multimodal
 Interface'', Proceedings of the ACM Symposium on User Interface Software and
 Technology (UIST), Monterey, pp. 143-150.
 
Steven K. Feiner and Kathleen R. McKeown. 1991. ``Automating the Generation of
 Coordinated Multimedia Explanations'', IEEE Computer 24(10):33-41.
 
Christophe Godereaux, Korinna Diebel, Pierre-Olivier El Guedj, and Pierre
 Nugues. 1994. ``Interactive Spoken Dialogue Interface in Virtual Worlds'',
 Proceedings of Linguistic Concepts and Methods in CSCW, London, November 1994.
 
V. Govindaraju, S. N. Srihari, and D. B. Sher. 1992. ``Caption-aided face
 location in newspaper photographs'', Proceedings of the 11th IAPR International
 Conference on Pattern Recognition, Vol. 1: Computer Vision and Applications,
 The Hague, Netherlands. Los Alamitos, CA: IEEE Computer Society Press,
 pp. 474-477.
 
V. Govindaraju, S. N. Srihari, and D. Sher. 1992. ``A computational model for
 face location based on cognitive principles'', AAAI-92: Proceedings of the
 Tenth National Conference on Artificial Intelligence, San Jose, CA. Menlo
 Park, CA: AAAI Press, pp. 350-355.
 
Susann LuperFoy. 1992. ``The Representation of Multimodal User Interface
 Dialogues Using Discourse PEGS'', Proceedings of the 30th Annual Meeting of
 the Association for Computational Linguistics, Newark.
 
Johanna D. Moore and William R. Swartout. 1990. ``Pointing: A Way Toward
 Explanation Dialogue'', Proceedings of AAAI, Boston.
 
Dagmar Schmauks. 1991. ``Deixis in der Mensch-Maschine-Interaktion:
 Multimediale Referentenidentifikation durch natürliche und simulierte
 Zeigegesten'' [Deixis in human-machine interaction: multimedia referent
 identification through natural and simulated pointing gestures].
 Tübingen: Niemeyer.
 
W. Wahlster, E. Andre, W. Finkler, H.-J. Profitlich, and T. Rist. 1993.
 ``Plan-based integration of natural language and graphics generation'',
 Artificial Intelligence 63(1-2):387-427.
 
 
(Less) computer-based work, mainly human-human interaction studies, although
some have an eye on application in human-computer interface design:
 
Alphonse Chapanis, Robert B. Ochsman, Robert N. Parrish, and Gerald D. Weeks.
 1972. ``Studies in Interactive Communication: I. The Effects of Four
 Communication Modes on the Behavior of Teams During Cooperative Problem
 Solving'', Human Factors 14(6):487-509.
 
Alphonse Chapanis, Robert N. Parrish, Robert B. Ochsman, and Gerald D. Weeks.
 1977. ``Studies in Interactive Communication: II. The Effects of Four
 Communication Modes on The Behavior of Teams During Cooperative Problem
 Solving'', Human Factors 19(2):101-126.
 
Philip R. Cohen. 1984. ``The Pragmatics of Referring and the Modality of
 Communication'', Computational Linguistics 10(2):97-147.
 
A. Glenberg and P. Kruley. 1992. ``Pictures and anaphora: Evidence for
 independent processes'', Memory and Cognition 20(5):461-471.
 
A. Glenberg and M. McDaniel. 1992. ``Mental models, pictures, and text:
 Integration of spatial and verbal information'', Memory and Cognition
 20(5):458-460.
 
Nelson Goodman. Languages of Art and Routes of Reference.
 
Georgia M. Green and Margaret Olsen. ``Interactions of text and illustration
 in beginning reading''. Technical Report 355, Center for the Study of Reading,
 University of Illinois, Champaign, Illinois.
 
Eugene A. Hammel. 1972. ``The Myth of Structural Analysis: Levi-Strauss and
 the Three Bears''. Current Topics in Anthropology, vol. 5, module 26,
 pp. 1-29.
 
Neilson and Lee. 1994. ``Conversations with Graphics ...'', International
 Journal of Human-Computer Studies 40.
 
Perry Nodelman. 1988. Words About Pictures. Athens, Georgia: University of
 Georgia Press.
 
E. Ochs, S. Jacoby, and P. Gonzales. 1994. ``Interpretive Journeys: How
 Physicists Talk and Travel through Graphic Space'', Configurations
 2(1):151-171.
 
E. Ochs, P. Gonzales, and S. Jacoby. Forthcoming. ``"When I come down I'm in
 the domain state": Grammar and graphic representation in the interpretive
 activity of physicists''. In E. A. Schegloff, E. Ochs, and S. Thompson (eds.),
 Grammar and Interaction. Cambridge University Press.
 
Again, thanks to all who responded!
 
 
J
 
Jussi Karlgren, fil. lic.                               Jussi.Karlgren at sics.se
Sw Inst of Comp Sc (SICS)         Språkteknologi / Natural Language Processing
Box 1263, 164 28 Kista                 ph +46 8 752 15 00, fax +46 8 751 72 30
Stockholm, Sweden                    http://sics.se/~jussi/jussi-karlgren.html
 
--------------------------------------------------------------------------
LINGUIST List: Vol-6-217.


