Spring 2018 Seminar Presentations

Here’s a summary of the Spring 2018 Seminar Series.

Beyond Direct Command-Based Natural Language Interactions with Robots

Matthias Scheutz
Tufts University

Friday, April 20

A main goal of human-robot interaction research is to make human-robot interactions as natural as possible. Critically, this includes natural language (NL) interactions, even though NL capabilities were traditionally either not included at all in robotic architectures or at best restricted to simple command-based interfaces.

In this presentation, I will provide an overview of our recent architectural and empirical work in NL-based human-robot interaction, with a focus on the pragmatic aspects of situated NL understanding and generation. In addition to video demonstrations showing our algorithms at work, I will also present results from human subject experiments that hint at the complex interplay between linguistic and non-linguistic aspects of human-robot interactions.

Bio: Matthias Scheutz received degrees in philosophy (M.A. 1989, Ph.D. 1995) and formal logic (M.S. 1993) from the University of Vienna and in computer engineering (M.S. 1993) from the Vienna University of Technology in Austria. He also received a joint Ph.D. in cognitive science and computer science from Indiana University in 1999. Matthias is currently a full professor of computer and cognitive science in the Department of Computer Science at Tufts University, and Senior Gordon Faculty Fellow in the School of Engineering at Tufts, where he also directs the Human-Robot Interaction Laboratory. He has over 300 peer-reviewed publications in artificial intelligence, artificial life, agent-based computing, natural language processing, cognitive modeling, robotics, human-robot interaction and foundations of cognitive science. His current research and teaching interests include multi-scale agent-based models of social behavior and complex cognitive and affective autonomous robots with natural language and ethical reasoning capabilities for natural human-robot interaction.

Variability in input: a corpus study of discourse markers in immigrant parents’ speech.

Professor Sophia Malamud

Friday, April 13

We focus on the input properties in the bilingual acquisition of two Russian expressions (namely, aa and mm), extending the line of research that treats disfluencies not as performance errors irrelevant to the study of grammar, but as part of native speakers’ linguistic competence (Bell et al. 2003; Erker & Bruso 2017; Ginzburg et al. 2014 inter alia). Using the audio-aligned corpus of monolingual and bilingual child-directed and child speech in Russia, Germany and the U.S., which is being constructed by the authors of the study (BiRCh Corpus), the paper discusses the range of functions these words can serve in Russian and presents evidence of differences between the monolingual and bilingual parents in their use of these words.

Alexa Spoken Language Understanding

Andy Rosenbaum

Friday, March 23

Alexa is the groundbreaking cloud-based voice service that powers Amazon Echo and other devices designed around your voice. Our mission is to push the envelope in Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Audio Signal Processing, in order to provide the best-possible experience for our customers. In this seminar, I’ll give an overview of the science and technology that powers Alexa Spoken Language Understanding.

Bio: Andy Rosenbaum graduated from the Brandeis Computational Linguistics MA program in 2014. He works on Machine Learning, Speech Recognition, and Natural Language Understanding for Amazon Alexa.

What can be Accomplished with the State of the Art in Information Extraction?
Ralph Weischedel, Senior Scientist,
Information Sciences Institute
Friday, March 2 at 3:30
We present three deployed applications where information extraction is being used, and note the common features of those applications that have already led to success. Thus, the state of the art is proving valuable for some applications. We also identify key research challenges whose solution seems essential for further successes. Since a few practical deployments already exist and since breakthroughs on particular challenges would greatly broaden the technology’s deployment, further research will yield even greater value.
Dr. Ralph Weischedel, a Senior Supervising Computer Scientist and Research Team Lead at the University of Southern California’s Information Sciences Institute, has diverse experience in Natural Language Processing and its application to Government needs. For over 30 years, he has led text understanding research, focusing on statistical learning algorithms. He has more than 120 papers, three patents, and a best paper award from the Association for Computational Linguistics (ACL) in Machine Translation. He is a Fellow of the ACL, a distinction held by barely more than 1% of ACL members. He has served as principal investigator on diverse efforts.

NOTE: Dr. Weischedel is from ISI’s Waltham Office, where they are looking for both CL and CS interns and new grads. ISI will also be at the Industry Reception on the 28th.

Question Answering R&D at Microsoft

TJ Hazen, Microsoft Research

Friday, February 9

Natural language processing technology for open-ended question answering tasks is now readily available to anyone with internet access using web sites such as Bing or Google. This talk will present a general overview of how question answering inside of Microsoft Bing works and discuss techniques used to expand and improve Bing’s question answering capabilities. The talk will also discuss recent advances in deep learning modeling techniques to perform open-ended machine reading comprehension and question answering tasks.

TJ Hazen is a Principal Research Manager with Microsoft Research where his current work is focused on the tasks of machine reading comprehension and question answering. Prior to joining Microsoft in 2013, TJ was a Research Scientist at MIT where he spent six years as a member of the Human Language Technology Group at MIT Lincoln Laboratory and nine years as a Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory. TJ holds S.B., S.M., and Ph.D. degrees in Electrical Engineering and Computer Science from MIT.

Translation Divergence in Chinese-English Machine Translation:
An Empirical Investigation

Nianwen Xue

Friday, February 2 

Alignment is an important part of building a parallel corpus that is useful for both linguistic analysis and NLP applications. We propose a hierarchical alignment scheme where word-level and phrase-level alignments are carefully coordinated to eliminate conflicts and redundancies. Using a parallel Chinese-English Treebank annotated with this scheme we show that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that translation divergences can to a large extent be captured by the syntax-based translation rules extracted from the parallel treebank, a result that supports the contention that semantic representations may be impractical and unnecessary to bridge translation divergences in Chinese-English MT.

Nianwen Xue is an Associate Professor in the Computer Science Department and the Language & Linguistics Program at Brandeis University. He has devoted substantial efforts to developing linguistically annotated resources for natural language processing purposes. The other thread of his research involves using statistical and machine learning techniques to solve natural language processing problems. His research has received support from the National Science Foundation (NSF), IARPA and DARPA. He is currently the editor-in-chief of the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), and he also serves on the editorial boards of Language Resources and Evaluation, and Lingua Sinica. He is currently the Vice Chair/Chair-Elect of SIGHAN, an ACL special interest group in Chinese language processing.

Can we Produce Multilingual NLP Workbenches for DH Researchers?  Report on the LitText Experiment

Andrew U. Frank
TU Wien, Department of Geodesy and Geoinformation frank@geoinfo.tuwien.ac.at

Tuesday, January 16

Many researchers in the Digital Humanities are mining corpora of natural language texts in several languages; they would be helped by an NLP workbench to process the texts and annotate them for querying. NLP support for many languages is available; the impediment is “only” the integration.
LitText is an experiment to build a workbench to advance a Computational Comparative Literary study. It is a work in progress and demonstrates current difficulties. The workbench proposes a three-step process to the DH researcher:
1. Collect the texts and prepare them for NLP processing; each text is a file and metadata is included with a simple and extensible markup language.
2. Process the text with NLP tools and convert the result to RDF triples.
3. Build the corpus in a triple store and query with SPARQL.
Current state:
• Use of Stanford CoreNLP with support for English, German, French, Spanish and Tint for Italian, each as a server at a specific URL.
• Coding of POS is somewhat uniform, but NER includes surprises and UD is certainly a step in the right direction.
• Translation of NLP output (XML) to RDF triples is currently preserving structure and encoding.
• Triple stores (we use Jena and Fuseki) seem fast enough.
• SPARQL is very flexible and likely to be difficult for DH researchers.
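
As a rough illustration of steps 2 and 3 above, the sketch below turns part-of-speech output into RDF triples in N-Triples syntax and shows the kind of SPARQL query a researcher would then have to write. It is not the LitText implementation: the namespace, the property names (inDocument, wordForm, pos), and the sample tokens are all hypothetical, and the input is assumed to be a plain list of (word, tag) pairs rather than the XML that the NLP servers return.

```python
# Minimal sketch only: vocabulary and sample data are hypothetical,
# not taken from LitText.
BASE = "http://example.org/littext/"  # assumed namespace


def iri(local):
    """Wrap a local name in the assumed namespace as an N-Triples IRI."""
    return f"<{BASE}{local}>"


def literal(text):
    """Quote a string as an N-Triples literal (simplified escaping)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def tokens_to_ntriples(doc_id, tokens):
    """Emit three triples per token: its document, word form, and POS tag.

    tokens is a list of (word, pos) pairs, e.g. taken from a tagger's output.
    """
    lines = []
    for i, (word, pos) in enumerate(tokens, start=1):
        tok = iri(f"{doc_id}/token/{i}")
        lines.append(f"{tok} {iri('inDocument')} {iri(doc_id)} .")
        lines.append(f"{tok} {iri('wordForm')} {literal(word)} .")
        lines.append(f"{tok} {iri('pos')} {literal(pos)} .")
    return lines


# A query of the kind step 3 implies: all word forms tagged as nouns.
SPARQL_NOUNS = f"""
PREFIX lt: <{BASE}>
SELECT ?word WHERE {{
  ?tok lt:pos "NOUN" ;
       lt:wordForm ?word .
}}
"""

triples = tokens_to_ntriples("doc1", [("Wien", "PROPN"), ("liegt", "VERB")])
print("\n".join(triples))
print(SPARQL_NOUNS)
```

The resulting lines could be loaded into a triple store such as Fuseki and queried there. Even this small query hints at why SPARQL may be a hurdle for DH researchers: they need to know the vocabulary and the graph shape before they can ask anything.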