Spring 2018 Seminar Presentations

Here’s a summary of the Spring 2018 Seminar Series.

Beyond Direct Command-Based Natural Language Interactions with Robots

Matthias Scheutz
Tufts University

Friday, April 20

A main goal of human-robot interaction research is to make human-robot interactions as natural as possible. Critically, this includes natural language (NL) interactions, even though NL capabilities were traditionally either not included at all in robotic architectures or at best restricted to simple command-based interfaces.

In this presentation, I will provide an overview of our recent architectural and empirical work in NL-based human-robot interaction, with a focus on the pragmatic aspects of situated NL understanding and generation. In addition to video demonstrations showing our algorithms at work, I will also present results from human subject experiments that hint at the complex interplay between linguistic and non-linguistic aspects of human-robot interactions.

Bio: Matthias Scheutz received degrees in philosophy (M.A. 1989, Ph.D. 1995) and formal logic (M.S. 1993) from the University of Vienna, and in computer engineering (M.S. 1993) from the Vienna University of Technology in Austria. He also received a joint Ph.D. in cognitive science and computer science from Indiana University in 1999. Matthias is currently a full professor of computer and cognitive science in the Department of Computer Science at Tufts University, and Senior Gordon Faculty Fellow in the School of Engineering at Tufts, where he also directs the Human-Robot Interaction Laboratory. He has over 300 peer-reviewed publications in artificial intelligence, artificial life, agent-based computing, natural language processing, cognitive modeling, robotics, human-robot interaction, and the foundations of cognitive science. His current research and teaching interests include multi-scale agent-based models of social behavior and complex cognitive and affective autonomous robots with natural language and ethical reasoning capabilities for natural human-robot interaction.

Variability in input: a corpus study of discourse markers in immigrant parents’ speech.

Professor Sophia Malamud

Friday, April 13

We focus on the properties of the input in the bilingual acquisition of two Russian expressions (namely, aa and mm), extending the line of research that treats disfluencies not as performance errors irrelevant to the study of grammar, but as part of native speakers’ linguistic competence (Bell et al. 2003; Erker & Bruso 2017; Ginzburg et al. 2014, inter alia). Using the audio-aligned corpus of monolingual and bilingual child-directed and child speech in Russia, Germany, and the U.S. that is being constructed by the authors of the study (the BiRCh Corpus), we discuss the range of functions these words can serve in Russian and present evidence of differences between monolingual and bilingual parents in their use of these words.
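
A minimal sketch of the kind of frequency comparison such a corpus study involves is below. This is purely illustrative: the utterances, speaker-group labels, and tokenization are invented for the example and are not the BiRCh data format or the authors’ actual pipeline.

```python
# Illustrative only: a toy count of the two discourse markers per speaker
# group; the romanized utterances and labels below are invented examples.
from collections import Counter
import re

# Hypothetical (speaker_group, utterance) pairs.
utterances = [
    ("monolingual", "mm nu i chto ty dumaesh'?"),
    ("bilingual",   "aa ty uzhe poel?"),
    ("bilingual",   "mm mozhet byt'"),
    ("monolingual", "aa ponyatno, khorosho"),
]

MARKERS = {"aa", "mm"}

def marker_counts(rows):
    """Count occurrences of each discourse marker per speaker group."""
    counts = Counter()
    for group, utt in rows:
        for token in re.findall(r"[a-z']+", utt.lower()):
            if token in MARKERS:
                counts[(group, token)] += 1
    return counts

for (group, marker), n in sorted(marker_counts(utterances).items()):
    print(f"{group:12s} {marker}: {n}")
```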

Alexa Spoken Language Understanding

Andy Rosenbaum

Friday, March 23

Alexa is the groundbreaking cloud-based voice service that powers Amazon Echo and other devices designed around your voice. Our mission is to push the envelope in Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Audio Signal Processing, in order to provide the best-possible experience for our customers. In this seminar, I’ll give an overview of the science and technology that powers Alexa Spoken Language Understanding.

Bio: Andy Rosenbaum graduated from the Brandeis Computational Linguistics MA program in 2014. He works on Machine Learning, Speech Recognition, and Natural Language Understanding for Amazon Alexa.

What can be Accomplished with the State of the Art in Information Extraction?

Ralph Weischedel, Senior Scientist, Information Sciences Institute

Friday, March 2 at 3:30

We present three deployed applications where information extraction is being used, and note the common features of those applications that have already led to success. Thus, the state of the art is proving valuable for some applications. We also identify key research challenges whose solution seems essential for further successes. Since a few practical deployments already exist, and since breakthroughs on particular challenges would greatly broaden the technology’s deployment, further research will yield even greater value.

Dr. Ralph Weischedel, a Senior Supervising Computer Scientist and Research Team Lead at the University of Southern California’s Information Sciences Institute, has diverse experience in Natural Language Processing and its application to Government needs. For over 30 years, he has led text understanding research, focusing on statistical learning algorithms. He has more than 120 papers, three patents, and a best paper award in Machine Translation from the Association for Computational Linguistics (ACL). He is a Fellow of the ACL, a distinction held by barely more than 1% of ACL members. He has served as principal investigator on diverse efforts.

NOTE: Dr. Weischedel is from ISI’s Waltham office, which is looking for both CL and CS interns and new grads. ISI will also be at the Industry Reception on the 28th.

Question Answering R&D at Microsoft

TJ Hazen, Microsoft Research

Friday, February 9 at 3:30
Volen 101

Natural language processing technology for open ended question answering tasks is now readily available to anyone with internet access using web sites such as Bing or Google. This talk will present a general overview of how question answering inside of Microsoft Bing works and discuss techniques used to expand and improve Bing’s question answering capabilities. The talk will also discuss recent advances in deep learning modeling techniques to perform open ended machine reading comprehension and question answering tasks.
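
As a point of reference for the machine reading comprehension task mentioned above, here is a minimal extractive question-answering sketch using a publicly available pretrained model via the Hugging Face transformers library. This is not Bing’s system; the model checkpoint named below is simply a common public one used for illustration.

```python
# Minimal extractive QA sketch with a public pretrained model.
# NOT Bing's QA system; the model name is just a common public checkpoint.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("Brandeis University is a private research university in "
           "Waltham, Massachusetts, founded in 1948.")
question = "Where is Brandeis University located?"

# The model extracts an answer span from the context, with a confidence
# score and character offsets.
result = qa(question=question, context=context)
print(result["answer"], result["score"])
```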

TJ Hazen is a Principal Research Manager with Microsoft Research where his current work is focused on the tasks of machine reading comprehension and question answering. Prior to joining Microsoft in 2013, TJ was a Research Scientist at MIT where he spent six years as a member of the Human Language Technology Group at MIT Lincoln Laboratory and nine years as a Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory. TJ holds S.B., S.M., and Ph.D. degrees in Electrical Engineering and Computer Science from MIT.

Translation Divergence in Chinese-English Machine Translation:
An Empirical Investigation

Nianwen Xue

Friday, February 2 at 3:30pm
Volen 101

Alignment is an important part of building a parallel corpus that is useful for both linguistic analysis and NLP applications. We propose a hierarchical alignment scheme where word-level and phrase-level alignments are carefully coordinated to eliminate conflicts and redundancies. Using a parallel Chinese-English Treebank annotated with this scheme we show that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that translation divergences can to a large extent be captured by the syntax-based translation rules extracted from the parallel treebank, a result that supports the contention that semantic representations may be impractical and unnecessary to bridge translation divergences in Chinese-English MT.
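
To make the idea of coordinated word- and phrase-level alignments concrete, here is a small data-structure sketch with one simple consistency check. The class and field names are hypothetical and are not the annotation scheme actually used in the parallel treebank.

```python
# Illustrative sketch of hierarchical (word + phrase) alignments and a
# simple check that word alignments are covered by phrase alignments.
# Names are hypothetical, not the actual annotation scheme.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Alignment:
    src_span: Tuple[int, int]   # token span in the Chinese sentence [start, end)
    tgt_span: Tuple[int, int]   # token span in the English sentence [start, end)
    level: str                  # "word" or "phrase"

def contains(outer: Tuple[int, int], inner: Tuple[int, int]) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def check_coordination(alignments: List[Alignment]) -> List[str]:
    """Flag word-level alignments not contained in any phrase-level
    alignment on both sides -- one way to surface conflicts between levels."""
    phrases = [a for a in alignments if a.level == "phrase"]
    issues = []
    for w in (a for a in alignments if a.level == "word"):
        ok = any(contains(p.src_span, w.src_span) and
                 contains(p.tgt_span, w.tgt_span) for p in phrases)
        if not ok:
            issues.append(f"word alignment {w.src_span}->{w.tgt_span} "
                          "has no covering phrase alignment")
    return issues

# Toy example: one phrase alignment covering two word alignments.
toy = [
    Alignment((0, 3), (0, 4), "phrase"),
    Alignment((0, 1), (0, 1), "word"),
    Alignment((2, 3), (3, 4), "word"),
]
print(check_coordination(toy))   # -> [] (no conflicts in this toy case)
```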

Nianwen Xue is an Associate Professor in the Computer Science Department and the Language & Linguistics Program at Brandeis University. He has devoted substantial efforts to developing linguistically annotated resources for natural language processing purposes. The other thread of his research involves using statistical and machine learning techniques to solve natural language processing problems. His research has received support from the National Science Foundation (NSF), IARPA, and DARPA. He is currently the editor-in-chief of the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), and he also serves on the editorial boards of Language Resources and Evaluation and Lingua Sinica. He is the Vice Chair/Chair-Elect of SIGHAN, the ACL special interest group on Chinese language processing.

Can We Produce Multilingual NLP Workbenches for DH Researchers? Report on the LitText Experiment

Andrew U. Frank
TU Wien, Department of Geodesy and Geoinformation (frank@geoinfo.tuwien.ac.at)

Tuesday, January 16

Many researchers in the Digital Humanities are mining corpora of natural language texts in several languages; they would be helped by an NLP workbench to process the texts and annotate them for querying. NLP support for many languages is available; the impediment is “only” the integration.
LitText is an experiment to build a workbench to advance a computational comparative literary study. It is work in progress and demonstrates current difficulties. The workbench proposes a three-step process to the DH researcher:
1. Collect the texts and prepare them for NLP processing; each text is a file, and metadata is included with a simple and extensible markup language.
2. Process the texts with NLP tools and convert the result to RDF triples.
3. Build the corpus in a triple store and query with SPARQL.
Current state:
• Use of Stanford CoreNLP, with support for English, German, French, and Spanish, and Tint for Italian, each running as a server at a specific URL.
• Coding of POS is somewhat uniform, but NER includes surprises; UD is certainly a step in the right direction.
• Translation of NLP output (XML) to RDF triples currently preserves structure and encoding.
• Triple stores (we use Jena and Fuseki) seem fast enough.
• SPARQL is very flexible but likely to be difficult for DH researchers; a small example query is sketched below.
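
To give a flavor of step 3, here is a small sketch of the kind of query a DH researcher might run over such a corpus, written with rdflib for self-containment (the project itself uses Jena and Fuseki). The namespace and property names are invented for the example and are not the actual LitText schema.

```python
# Sketch of querying NLP-derived triples with SPARQL.
# The nlp: vocabulary below is invented for illustration only.
from rdflib import Graph, Literal, Namespace, URIRef

NLP = Namespace("http://example.org/littext/nlp#")

g = Graph()
# Toy triples standing in for converted CoreNLP output.
sent = URIRef("http://example.org/text1#s1")
tok = URIRef("http://example.org/text1#s1_t3")
g.add((tok, NLP.inSentence, sent))
g.add((tok, NLP.ner, Literal("PERSON")))
g.add((tok, NLP.surface, Literal("Odysseus")))

# Find sentences containing a PERSON entity, with its surface form.
query = """
PREFIX nlp: <http://example.org/littext/nlp#>
SELECT ?sentence ?surface WHERE {
  ?token nlp:inSentence ?sentence ;
         nlp:ner "PERSON" ;
         nlp:surface ?surface .
}
"""
for row in g.query(query):
    print(row.sentence, row.surface)
```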

Jibo, the first social robot for the home

The Brandeis CL Seminar Series hosts Jibo:

Roberto Pieraccini, Head of Conversational Technologies, Jibo Inc.

Friday, December 1, 3pm, Volen 101

Jibo is a robot that understands speech and sees. He has a moving body that complements his verbal communication and expresses his emotions, and cameras and microphones to make sense of the world around him. He detects where sounds come from and can track and recognize people’s faces. He has a display to show images, an eye that follows you, and touch sensors. With this array of technologies, Jibo embodies the ultimate human-machine interface.

In this talk we will give an overview of the technological complexity we took on when, more than four years ago, we started the journey of building the first consumer social robot. We will describe some of the solutions we adopted and give a demo of the product, which started shipping a few weeks ago. We will conclude with a discussion of the challenges for future short- and long-term research.

ABOUT THE SPEAKER:

Roberto Pieraccini, a scientist, technologist, and the author of “The Voice in the Machine” (MIT Press, 2012), has been at the forefront of speech, language, and machine learning innovation for more than 30 years. He is widely known as a pioneer in the fields of statistical natural language understanding and machine learning for automatic dialog systems, and their practical application to industrial solutions. As a researcher he worked at CSELT (Italy), Bell Laboratories, AT&T Labs, and IBM T.J. Watson. He led the dialog technology team at SpeechWorks International, was the CTO of SpeechCycle, and was the CEO of the International Computer Science Institute (ICSI) in Berkeley. He now leads the Conversational Technologies team at Jibo. http://robertopieraccini.com

 

Harry Bunt: Issues in the semantic annotation of quantification

Harry Bunt

Professor of Language and Artificial Intelligence
Tilburg University

Thursday, October 26 at 3:30
Volen 101

Quantification is ubiquitous in natural language: it occurs in every sentence. It occurs whenever a predicate P is applied to a set S of objects, where it gives rise to such questions as (1) To how many members of S is P applied? (2) Is P applied to individual members of S, or to S as a whole, or to certain subsets of S? (3) What is the size of S? (4) How is S determined by lexical, syntactic and contextual information? Moreover, if P is applied to combinations of members from different sets, issues of relative scope arise.
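
As a small worked illustration of question (2), consider "Three students lifted the piano": the predicate can apply to the individual members of the set or to the set as a whole. The notation below is one standard way of writing the two readings and is not taken from the talk itself.

```latex
% Distributive vs. collective application of a predicate to a set S,
% for "Three students lifted the piano" (illustrative notation).
\begin{align*}
\text{distributive:} &\quad \exists S\,[\, |S| = 3 \wedge S \subseteq \textit{student}
      \wedge \forall x \in S\; \textit{lift}(x, \textit{the-piano}) \,] \\
\text{collective:}   &\quad \exists S\,[\, |S| = 3 \wedge S \subseteq \textit{student}
      \wedge \textit{lift}(S, \textit{the-piano}) \,]
\end{align*}
```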

Quantification is a complex phenomenon, both from a semantic point of view and because of the complexity of the relation between the syntax and the semantics of quantification, and it has been studied extensively by logicians, linguists, and computational semanticists. Nowadays it is generally agreed that quantifier expressions in natural language are noun phrases, which is why quantification arises in every sentence.

In recent years, the International Organization for Standardization (ISO) has started to develop annotation schemes for semantic phenomena, both in support of linguistic research in semantics and for building semantically more advanced NLP systems. The ISO-TimeML scheme (ISO 24617-1), based on Pustejovsky’s TimeML, was the first ISO standard established in this area; others concern the annotation of dialogue acts, discourse relations, semantic roles, and spatial information. Quantification is currently considered a candidate for a next ISO standard annotation scheme. In this talk I will discuss some of the issues involved in developing such an annotation scheme, including the definition of an abstract syntax for the annotations, of concrete XML representations, and of the semantics of the annotations.

Harry Bunt is professor of Linguistics and Computer Science at Tilburg University, The Netherlands. Before that he worked at Philips Research Labs. He studied physics and mathematics at the University of Utrecht and obtained a doctorate (cum laude) in Linguistics at the University of Amsterdam. His main areas of interest are computational semantics and pragmatics, especially in relation to (spoken) dialogue. He developed a framework for dialogue analysis called Dynamic Interpretation Theory, which has been the basis of an international standard for dialogue annotation (ISO 24617-2).

CL Seminar: Lexicography from Scratch

Lexicography from Scratch: Quantifying meaning descriptions with feature engineering

Orion Montoya 
Friday, October 20 at 3pm

Volen 101

When computational linguistics wishes to engage with the meaning of words, it asks the experts: lexicographers, who analyze evidence of usage and then record judgments in dictionaries, in the form of definitions. A definition is a finely wrought piece of natural language, whose nuances are as elusive to computational processes as any other unstructured data. Computational linguists nevertheless squeeze as much utility as they can out of dictionaries of every stripe, from Webster’s 1913 to WordNet. None of these resources had computational analysis of lexical meaning in mind when they were conceived or created. Despite the immense human cognitive effort that went into making them, most lexical resources constrain their computational users to a few simplistic lookup tasks.
If a lexical resource is designed, from its origins, to serve all the diverse human and computational applications for which dictionaries have been repurposed in the digital era, it might yield significant improvements both theoretically and practically. But who wants to make a dictionary from scratch? The theme of the 2017 Electronic Lexicography conference (Leiden, September 19-21: http://elex.link/elex2017/) was “Lexicography From Scratch”. This talk assembles a number of isolated recent innovations in lexicographical practice — often corpus-driven retrofits onto existing dictionary data — and attempts to map out a lexicographical process that would connect them all.
Such a process would yield meaning descriptions that are quantified, linked to corpus data, decomposable into individual semantic factors, and conducive to insightful comparison of lexicalized concepts in pairs and in groups. We describe a cluster-analysis framework that shows promise for automating the fussier parts of this process by reducing cognitive loads on the lexical analyst. If aspects of lexical analysis can be automated through feature engineering, we may produce computational models of lexical meaning that are more useful for NLP tasks and more maintainable by lexicographers.
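
A minimal sketch of what cluster analysis over usage contexts of near-synonyms might look like with off-the-shelf tools is below. It is not the framework from the talk or thesis; the toy contexts and the bag-of-words features are invented purely to illustrate grouping corpus contexts so an analyst can inspect them for distinguishing semantic factors.

```python
# Illustrative only: cluster toy usage contexts of two near-synonyms with
# scikit-learn, as a stand-in for corpus-driven lexical feature engineering.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Invented corpus contexts for two near-synonyms.
contexts = [
    "the murky water of the harbor",          # murky, literal
    "a murky legal situation",                # murky, abstract
    "the cloudy water in the glass",          # cloudy, literal
    "a cloudy sky before the storm",          # cloudy, literal
]

# Simple bag-of-words features over the contexts.
X = CountVectorizer().fit_transform(contexts)

# Cluster the contexts; in practice the analyst would inspect each cluster
# for the semantic factor it reflects (e.g. literal liquid vs. abstract use).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```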
Bio: Orion Montoya graduated from the Brandeis CL MA program in 2017, with the thesis Lexicography as feature engineering: automatic discovery of distinguishing semantic factors for synonyms. Before coming to Brandeis, he spent fifteen years in and around the lexicography industry, computing with lexical data in all of its manifestations: digitizing old print dictionaries, managing lexicographical corpora, linking old lexical data to new corpus data. He also has a BA in Classics from the University of Chicago.

CL Seminar on the LAPPS Natural Language Grid, Feb 1

The Language Application Grid as a Platform for NLP Research
Keith Suderman
Vassar College

Wednesday, February 1 at 3pm
Volen 101
Brandeis University

The LAPPS Grid project (Vassar, Brandeis, CMU, LDC) has developed a platform providing access to a vast array of language processing tools and resources for research and development in natural language processing (NLP), and has recently expanded to enhance its usability by non-technical users such as those in the Digital Humanities community. We provide a live demonstration of LAPPS Grid use, ranging from “from scratch” construction of a workflow using atomic tools to a pre-configured Docker image that can be run off-the-shelf on a laptop or in the cloud, for several tasks of relevance to the NLP and DH communities.
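
For readers unfamiliar with the idea of composing atomic tools into a workflow, here is a generic sketch of the pattern. It is not the LAPPS Grid API; the function names and the placeholder tools are invented solely to illustrate chaining independent processing steps over a shared document representation.

```python
# Generic illustration of chaining "atomic" NLP tools into a workflow.
# NOT the LAPPS Grid API; all names here are hypothetical placeholders.
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]

def tokenizer(doc: Document) -> Document:
    doc["tokens"] = doc["text"].split()
    return doc

def pos_tagger(doc: Document) -> Document:
    # Placeholder tagger: a real workflow would call a tagging service here.
    doc["pos"] = ["NN"] * len(doc["tokens"])
    return doc

def run_workflow(text: str, steps: List[Callable[[Document], Document]]) -> Document:
    """Apply each tool in order, threading the document through the chain."""
    doc: Document = {"text": text}
    for step in steps:
        doc = step(doc)
    return doc

result = run_workflow("Keith demonstrates the LAPPS Grid .", [tokenizer, pos_tagger])
print(result["tokens"], result["pos"])
```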

Keith Suderman is a Research Assistant with the Department of Computer Science at Vassar College in Poughkeepsie, New York. Keith works full time on the development of the LAPPS Grid API, architecture, and tool integrations.