Spring 2018 Seminar Presentations

Here’s a summary of the Spring 2018 Seminar Series.

Beyond Direct Command-Based Natural Language Interactions with Robots

Matthias Scheutz
Tufts University

Friday, April 20

A main goal of human-robot interaction research is to make human-robot interactions as natural as possible. Critically, this includes natural language (NL) interactions, even though NL capabilities were traditionally either not included at all in robotic architectures or at best restricted to simple command-based interfaces.

In this presentation, I will provide an overview of our recent architectural and empirical work in NL-based human-robot interaction, with a focus on the pragmatic aspects of situated NL understanding and generation. In addition to video demonstrations showing our algorithms at work, I will also present results from human subject experiments that hint at the complex interplay between linguistic and non-linguistic aspects of human-robot interactions.

Bio: Matthias Scheutz received degrees in philosophy (M.A. 1989, Ph.D. 1995) and formal logic (M.S. 1993) from the University of Vienna and in computer engineering (M.S. 1993) from the Vienna University of Technology in Austria. He also received a joint Ph.D. in cognitive science and computer science from Indiana University in 1999. Matthias is currently a full professor of computer and cognitive science in the Department of Computer Science at Tufts University, and Senior Gordon Faculty Fellow in the School of Engineering at Tufts, where he also directs the Human-Robot Interaction Laboratory. He has published over 300 peer-reviewed papers in artificial intelligence, artificial life, agent-based computing, natural language processing, cognitive modeling, robotics, human-robot interaction, and the foundations of cognitive science. His current research and teaching interests include multi-scale agent-based models of social behavior and complex cognitive and affective autonomous robots with natural language and ethical reasoning capabilities for natural human-robot interaction.

Variability in input: a corpus study of discourse markers in immigrant parents’ speech.

Professor Sophia Malamud

Friday, April 13

We focus on the input properties in the bilingual acquisition of two Russian expressions (namely, aa and mm), extending the line of research that treats disfluencies not as performance errors irrelevant to the study of grammar, but as part of native speakers’ linguistic competence (Bell et al. 2003; Erker & Bruso 2017; Ginzburg et al. 2014 inter alia). Using the BiRCh Corpus, an audio-aligned corpus of monolingual and bilingual child-directed and child speech in Russia, Germany, and the U.S. that is being constructed by the authors of the study, the paper discusses the range of functions these words can serve in Russian and presents evidence of differences between monolingual and bilingual parents in their use of these words.

Alexa Spoken Language Understanding

Andy Rosenbaum

Friday, March 23

Alexa is the groundbreaking cloud-based voice service that powers Amazon Echo and other devices designed around your voice. Our mission is to push the envelope in Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Audio Signal Processing, in order to provide the best-possible experience for our customers. In this seminar, I’ll give an overview of the science and technology that powers Alexa Spoken Language Understanding.

Bio: Andy Rosenbaum graduated from the Brandeis Computational Linguistics MA program in 2014. He works on Machine Learning, Speech Recognition, and Natural Language Understanding for Amazon Alexa.

What can be Accomplished with the State of the Art in Information Extraction?
Ralph Weischedel, Senior Scientist
Information Sciences Institute
Friday, March 2 at 3:30
We present three deployed applications where information extraction is being used, and note the common features of those applications that have already led to success. Thus, the state of the art is proving valuable for some applications. We also identify key research challenges whose solution seems essential for further successes. Since a few practical deployments already exist and since breakthroughs on particular challenges would greatly broaden the technology’s deployment, further research will yield even greater value.
Dr. Ralph Weischedel, a Senior Supervising Computer Scientist and Research Team Lead at the University of Southern California’s Information Sciences Institute, has diverse experience in Natural Language Processing and its application to government needs. For over 30 years, he has led text understanding research, focusing on statistical learning algorithms. He has published more than 120 papers, holds three patents, and received a best paper award from the Association for Computational Linguistics (ACL) in Machine Translation. He is a Fellow of the ACL, a distinction held by barely more than 1% of ACL members. He has served as principal investigator on diverse efforts.

NOTE: Dr. Weischedel is from ISI’s Waltham office, where they are looking for both CL and CS interns and new grads. ISI will also be at the Industry Reception on the 28th.

Question Answering R&D at Microsoft

TJ Hazen, Microsoft Research

Friday, February 9

Natural language processing technology for open ended question answering tasks is now readily available to anyone with internet access using web sites such as Bing or Google. This talk will present a general overview of how question answering inside of Microsoft Bing works and discuss techniques used to expand and improve Bing’s question answering capabilities. The talk will also discuss recent advances in deep learning modeling techniques to perform open ended machine reading comprehension and question answering tasks.

TJ Hazen is a Principal Research Manager with Microsoft Research where his current work is focused on the tasks of machine reading comprehension and question answering. Prior to joining Microsoft in 2013, TJ was a Research Scientist at MIT where he spent six years as a member of the Human Language Technology Group at MIT Lincoln Laboratory and nine years as a Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory. TJ holds S.B., S.M., and Ph.D. degrees in Electrical Engineering and Computer Science from MIT.

Translation Divergence between Chinese-English Machine Translation: An Empirical Investigation

Nianwen Xue

Friday, February 2 

Alignment is an important part of building a parallel corpus that is useful for both linguistic analysis and NLP applications. We propose a hierarchical alignment scheme where word-level and phrase-level alignments are carefully coordinated to eliminate conflicts and redundancies. Using a parallel Chinese-English Treebank annotated with this scheme we show that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that translation divergences can to a large extent be captured by the syntax-based translation rules extracted from the parallel treebank, a result that supports the contention that semantic representations may be impractical and unnecessary to bridge translation divergences in Chinese-English MT.
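The abstract does not spell out the coordination rules between the two alignment levels. As an illustration only, one common way to detect a conflict between a phrase-level alignment and the word-level links beneath it is the consistency criterion familiar from phrase-based MT: a phrase pair conflicts with the word alignment if any link connects a word inside one of its spans to a word outside the other. This is an assumption for exposition, not the authors’ annotation scheme; the `is_consistent` function and the toy indices are hypothetical.

```python
# Illustrative sketch, not the parallel treebank's actual scheme:
# check a phrase-level alignment against word-level links using the
# consistency criterion from phrase-based MT.

Link = tuple[int, int]  # (source word index, target word index)


def is_consistent(links: set[Link], src: tuple[int, int], tgt: tuple[int, int]) -> bool:
    """Return True if aligning the src span to the tgt span (inclusive) conflicts with no word link."""
    s_lo, s_hi = src
    t_lo, t_hi = tgt
    has_link_inside = False
    for i, j in links:
        in_src = s_lo <= i <= s_hi
        in_tgt = t_lo <= j <= t_hi
        if in_src != in_tgt:
            # One end of the link is inside the phrase pair, the other outside: conflict.
            return False
        if in_src:
            has_link_inside = True
    # This sketch also requires the phrase pair to contain at least one word link.
    return has_link_inside


# Toy example: 我 的 朋友 / "my friend", with 的 left unaligned at the word level.
links = {(0, 0), (2, 1)}
print(is_consistent(links, (0, 2), (0, 1)))  # True: whole phrase pair
print(is_consistent(links, (1, 2), (0, 0)))  # False: a link crosses the span boundary
```

Requiring at least one internal link is a design choice of this sketch; without it, a phrase pair made up only of unaligned words would pass trivially.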

Nianwen Xue is an Associate Professor in the Computer Science Department and the Language & Linguistics Program at Brandeis University. He has devoted substantial efforts to developing linguistically annotated resources for natural language processing purposes. The other thread of his research involves using statistical and machine learning techniques to solve natural language processing problems. His research has received support from the National Science Foundation (NSF), IARPA, and DARPA. He is currently the editor-in-chief of the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), and he also serves on the editorial boards of Language Resources and Evaluation and Lingua Sinica. He is currently the Vice Chair/Chair-Elect of SIGHAN, an ACL special interest group in Chinese language processing.

Can we Produce Multilingual NLP Workbenches for DH Researchers?  Report on the LitText Experiment

Andrew U. Frank
TU Wien, Department of Geodesy and Geoinformation
frank@geoinfo.tuwien.ac.at

Tuesday, January 16

Many researchers in the Digital Humanities are mining corpora of natural language texts in several languages; they would benefit from an NLP workbench to process the texts and annotate them for querying. NLP support for many languages is available; the impediment is “only” the integration.
LitText is an experiment in building a workbench to advance a Computational Comparative Literary study. It is a work in progress and demonstrates current difficulties. The workbench proposes a three-step process to the DH researcher:
1. Collect the texts and prepare them for NLP processing; each text is a file, and metadata is included with a simple and extensible markup language.
2. Process the texts with NLP tools and convert the results to RDF triples.
3. Build the corpus in a triple store and query with SPARQL.
Current state:
• Stanford CoreNLP is used for English, German, French, and Spanish, and Tint for Italian, each running as a server at a specific URL.
• Coding of POS tags is somewhat uniform, but NER includes surprises; UD (Universal Dependencies) is certainly a step in the right direction.
• Translation of the NLP output (XML) to RDF triples currently preserves structure and encoding.
• Triple stores (we use Jena and Fuseki) seem fast enough.
• SPARQL is very flexible but likely to be difficult for DH researchers.
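The three-step process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LitText’s code: the `.key value` metadata markup, the `http://example.org/littext/` namespace, and the `Token`, `to_triples`, and `match` names are all hypothetical; the token annotations are hand-written stand-ins for CoreNLP output; and `match` is a single-pattern stand-in for a SPARQL query against a store such as Fuseki.

```python
from dataclasses import dataclass

# Hypothetical namespace; the abstract does not give LitText's RDF vocabulary.
NS = "http://example.org/littext/"

Triple = tuple[str, str, str]


@dataclass
class Token:
    index: int
    word: str
    pos: str
    ner: str


def parse_document(raw: str) -> tuple[dict[str, str], str]:
    """Step 1 (assumed markup): leading '.key value' lines hold metadata; the rest is text."""
    meta: dict[str, str] = {}
    body: list[str] = []
    for line in raw.splitlines():
        if line.startswith(".") and not body:
            key, _, value = line[1:].partition(" ")
            meta[key] = value.strip()
        else:
            body.append(line)
    return meta, "\n".join(body).strip()


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_triples(doc_id: str, meta: dict[str, str], tokens: list[Token]) -> list[Triple]:
    """Step 2: turn document metadata and token annotations into RDF triples."""
    doc = f"<{NS}{doc_id}>"
    triples = [(doc, f"<{NS}{key}>", literal(value)) for key, value in meta.items()]
    for tok in tokens:
        node = f"<{NS}{doc_id}/tok{tok.index}>"
        triples += [
            (node, f"<{NS}inDocument>", doc),
            (node, f"<{NS}word>", literal(tok.word)),
            (node, f"<{NS}pos>", literal(tok.pos)),
            (node, f"<{NS}ner>", literal(tok.ner)),
        ]
    return triples


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize triples as N-Triples, a format triple stores such as Fuseki can load."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def match(triples: list[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Step 3 stand-in: a single SPARQL-style triple pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


raw = ".title Der Prozess\n.lang de\nJosef K. wurde verhaftet."
meta, text = parse_document(raw)
# Hand-written toy annotations standing in for CoreNLP output on `text`.
tokens = [
    Token(1, "Josef", "PROPN", "PERSON"),
    Token(2, "K.", "PROPN", "PERSON"),
    Token(3, "wurde", "AUX", "O"),
    Token(4, "verhaftet", "VERB", "O"),
    Token(5, ".", "PUNCT", "O"),
]
triples = to_triples("kafka-process", meta, tokens)
# Roughly: SELECT ?t WHERE { ?t <http://example.org/littext/ner> "PERSON" }
persons = [s for s, _, _ in match(triples, p=f"<{NS}ner>", o=literal("PERSON"))]
print(len(triples), len(persons))  # 22 2
```

In a real setup the output of `to_ntriples` would be loaded into the triple store and queried with full SPARQL; the one-pattern `match` helper covers only the simplest query shape, which is exactly where SPARQL's flexibility (joins, filters, optional patterns) begins to demand more of DH users.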

CL Industry Meet’n’Greet Reception February 28

Our Spring Industry Reception will be Wednesday, February 28 from 3-5 pm.  It’s a great opportunity to talk with technical people from area companies about what they are doing as well as what opportunities they have for interns and new employees.

Companies attending:

Basis
Jibo
Mobile Heartbeat
Digital Lumens
Tibco

Dr. Ralph Weischedel to present March 2

Language Technologies Seminar Series

What can be Accomplished with the State of the Art in Information Extraction?

Ralph Weischedel, Senior Scientist
Information Sciences Institute

Friday, March 2 at 3:30
GZang. 124

Jibo, the first social robot for the home

The Brandeis CL Seminar Series hosts Jibo:

Roberto Pieraccini, Head of Conversational Technologies, Jibo Inc.

Friday, December 1, 3pm, Volen 101

Jibo is a robot that understands speech and sees. He has a moving body that complements his verbal communication and expresses his emotions, as well as cameras and microphones to make sense of the world around him. He detects where sounds come from and can track and recognize people’s faces. He has a display to show images, an eye that follows you, and touch sensors. With this array of technologies, Jibo embodies the ultimate human-machine interface.

In this talk we will give an overview of the technological complexity we took on when, more than four years ago, we started the journey of building the first consumer social robot. We will describe some of the solutions we adopted and give a demo of the product, which started shipping a few weeks ago. We will conclude with a discussion of future challenges for short- and long-term research.

ABOUT THE SPEAKER:

Roberto Pieraccini, a scientist, technologist, and the author of “The Voice in the Machine” (MIT Press, 2012), has been at the forefront of speech, language, and machine learning innovation for more than 30 years. He is widely known as a pioneer in the fields of statistical natural language understanding and machine learning for automatic dialog systems, and their practical application to industrial solutions. As a researcher he worked at CSELT (Italy), Bell Laboratories, AT&T Labs, and IBM T.J. Watson. He led the dialog technology team at SpeechWorks International, was the CTO of SpeechCycle, and was the CEO of the International Computer Science Institute (ICSI) in Berkeley. He now leads the Conversational Technologies team at Jibo. http://robertopieraccini.com

 

Justin Su from QPID talks about Machine Learning in Healthcare

Machine Learning Approaches to Evaluate Clinical Evidence Quality

Justin Su
NLP/ML Engineer, QPID
 
Friday, November 10 at 3:00
Volen 101
 
The Data Science team at QPID has conducted a machine learning project to develop an approach that decides which requests for a medical procedure are clinically appropriate, based on clinical information from the patient’s medical record provided by a human user. In this talk, I will present and compare various machine learning and deep learning models that we have experimented with to identify supportive clinical evidence for a given procedure.
Justin Su graduated from the CL MA program at Brandeis in 2017, and joined QPID Health as an NLP/ML Engineer. He does a bit of everything at QPID, which includes software engineering, NLP, machine learning, data science, and baking. 

Homesite

Homesite is committed to being the most trusted and valued customer-driven insurance company. Homesite is experiencing outstanding growth. Homesite was built on integrity, respect, and striving for excellence. These values, combined with discipline and focus, will lead to success. We’re experts in homeowners, renters and condo insurance. It’s all we do. That’s why we’re really good at tailoring all of our products and services to your needs. Here at Homesite, your home is our focus.

The Experimentation Lab was formed to foster technology innovation, agility, and rapid development. The Experimentation Lab is a place to establish thought leadership backed by tactical and strategic projects and products that revolutionize the industry. It is an opportunity to work with state-of-the-art technologies and start-ups combined with the business knowledge that comes from established leaders in the insurance industry. Specifically, the Lab is looking to incorporate emerging Big Data and Analytics technologies to revolutionize how the insurance industry works. This ranges from drone and satellite imagery to aid post-catastrophe efforts, to social media analytics to better understand our customers’ needs, to the Internet of Things and connected home and car technologies.

Adobe

Adobe is changing the world through digital experiences. We give everyone – from emerging artists to global brands – everything they need to design and deliver exceptional digital experiences. Adobe Document Cloud is revolutionizing the way the world works with documents. It’s the newest cloud offering at Adobe, and a very exciting place to be. The Document Cloud combines a collection of online services integrated with Adobe Reader and Adobe Acrobat. Our subscription base is growing rapidly and we are continually rolling out new features and services. We work in small, agile teams with considerable autonomy and we value engineers with technical competence, creativity, flexibility, strong customer focus and an eagerness for learning and collaboration.

CL Seminar Series March 3: Dr. Sravana Reddy

Obfuscating Gender in Social Media Writing
Sravana Reddy
Researcher, Wellesley College
Friday, March 3, 3:15, in Volen 101
The vast availability of textual data on social media has led to an interest in algorithms to automatically predict demographic attributes based on the user’s writing. These methods are valuable for social science research as well as targeted advertising and profiling, but also compromise the privacy of users who may not realize that their personal idiolects can give away their identities. Can we automatically modify a text so that the author is classified as a certain target gender, under limited knowledge of the classifier, while preserving the text’s fluency and meaning? In this talk, I present a model to modify a text, show empirical results with Twitter and Yelp data, and outline future directions.
Bio:
Sravana Reddy is a researcher at Wellesley College working on natural language processing and its intersections with privacy, the digital humanities, and sociolinguistics. She received her PhD from the University of Chicago and her bachelor’s degree from Brandeis University.

Brandeis CL spring “Meet’n’Greet” Industry Reception

The Brandeis Computational Linguistics program held its spring Industry Reception networking event.  Technical representatives from twelve companies attended and spent two hours talking with students about what they do in their companies and collected resumes from students looking for internships and jobs.  Alumni were among the representatives from Amazon, BBN, Basis, Burning Glass, and Callminer.  Other companies attending included Fidelity, Cobalt, ISI, SAP, Optum, SIFT, and Spotify.  Many of those companies also have interns or alumni from the program.

Check out descriptions of these and other companies that have attended our receptions in the past in the Industry Catalog.  The reception is held every semester, with generally 12-15 companies attending each time.

Welcome

The industry catalog is a resource to help our students learn which companies do work related to computational linguistics. “Brandeis connections” lists students who work or have worked as interns or employees at each company. If you’d like to add your company, send a short description to mmeteer at brandeis dot edu, along with the categories you’d like to be listed under.

MITRE

The MITRE Corporation is a not-for-profit organization that operates research and development centers sponsored by the federal government. Our centers support our sponsors with scientific research and analysis, development and acquisition, and systems engineering and integration. We also have an independent research program that explores new and expanded uses of technologies to meet our sponsors’ needs. Our principal locations are in Bedford, Mass., and McLean, Va.
Brandeis Connections: David Tresner-Kirsch

Genesys

Genesys is the market leader in omnichannel customer experience (Voice, IVR, Web, Chat, E-Mail) and in contact center solutions, both in our Cloud offering and on Customer premises. We help brands of all sizes, from Microsoft, Nike, and Fidelity to Fred’s Pizza Shop, make great Customer Experience great business.
The Genesys Customer Experience platform powers optimal end-customer journeys across all touchpoints, channels and interactions to turn customers into brand advocates.
Genesys is trusted by over 4,500 Customers in 80 countries to orchestrate more than 100 million digital and voice interactions each day.

CL Seminar on the LAPPS Natural Language Grid, Feb 1

The Language Application Grid as a Platform for NLP Research
Keith Suderman
Vassar College

Wednesday, February 1 at 3pm
Volen 101
Brandeis University

The LAPPS Grid project (Vassar, Brandeis, CMU, LDC), which has developed a platform providing access to a vast array of language processing tools and resources for the purposes of research and development in natural language processing (NLP), has recently expanded to enhance its usability by non-technical users such as those in the Digital Humanities community. We provide a live demonstration of LAPPS Grid use, ranging from “from scratch” construction of a workflow using atomic tools to a pre-configured docker image that can be run off-the-shelf on a laptop or in the cloud, for several tasks of relevance to the NLP and DH communities.

Keith Suderman is a Research Assistant with the Department of Computer Science at Vassar College in Poughkeepsie, New York. Keith works full time on the development of the LAPPS Grid API, architecture, and tool integrations.

UFA

UFA is a leading, privately held software engineering firm specializing in Air Traffic Control simulation technologies. UFA, Inc. provides simulation products to civil aviation, military, and university customers worldwide. A key element of the product is speech recognition that models Air Traffic Control commands.
Brandeis connections: Cynthia Goodman, Travis Hasley

Partners Healthcare

Partners HealthCare is a not-for-profit health care system that is committed to patient care, research, teaching, and service to the community locally and globally. Collaboration among our institutions and health care professionals is central to our efforts to advance our mission. Founded in 1994 by Brigham and Women’s Hospital and Massachusetts General Hospital, Partners HealthCare includes community and specialty hospitals, a managed care organization, a physician network, community health centers, home care and other health-related entities. Several of our hospitals are teaching affiliates of Harvard Medical School, and Partners is a national leader in biomedical research.
Brandeis connections: Ken Lai, Suzanne Blakeley, Jessica Huynh, Clay Riley

Nuance

Nuance Communications is reinventing the relationship between people and technology. We believe in the power of intelligent systems, and specifically in what that power can do for you. Our innovations in voice, natural language understanding, reasoning and systems integration come together to create more human technology.
We are pioneers in making technology fluent in all things human: from understanding spoken words and extracting their meaning to adaptively and seamlessly interpreting the swipe of a fingertip. Every interaction can finally be understood to deliver exactly what a person needs. And we continuously evolve the ability to perceive the nuance of words, actions and meaning — to fit seamlessly into your life, your business and your world.
Nuance is headquartered in Burlington, Massachusetts, with more than 35 offices and approximately 14,000 employees worldwide. We have a significant portfolio of intellectual property, with more than 4,500 issued and pending patents globally. Every day, millions of users and thousands of businesses, including more than two-thirds of the Fortune 100, use our proven applications.

IBM WATSON

IBM Watson Group is a leading-edge start-up business within IBM, charged with ushering in the new era of cognitive computing. Watson mirrors the same cognitive processes as humans with the ability to ingest massive amounts of unstructured data. At IBM Watson Group, we’re transforming a range of industries and professions from medical research and diagnosis to investment guidance and customer service. You can see some of the impact that IBM Watson is making at https://ibm.biz/watsondiscoveryadvisor or you can start building apps yourself with the Watson Developer Cloud at http://ibmwatson.com/developercloud
We have opportunities across the business, whether it’s developing code, conducting research in the cognitive space, or implementing solutions for clients.

HUMEDICA

Humedica is a clinical intelligence company that powers health care, life sciences, and research organizations to make better-informed, more effective decisions. Our cloud-based analytics solutions create a longitudinal view of both individual patients and patient populations. We gather, normalize, and analyze data from disparate sources that, uniquely, span the continuum of care–including EHRs, Practice Management Systems and claims. By the end of 2015, our data will account for 65M patient lives in the U.S.

Humedica NLP is at the heart of the process, extracting structure from billions of free-form physician notes. NLP data is combined with our other data assets to form one of the largest actionable pools of healthcare information in the world.
Big Data is ushering in a revolution in health care. We are looking for scientists and engineers who are excited by the idea of using their skills to make a difference in this field. We have the infrastructure and the data to enable that revolution–we are looking for people who want to be part of it.

Basis Technology

For over 20 years, Basis Technology has been a pioneer in machine learning, revealing meaningful information and intelligence from raw, unstructured text. The Rosette® text analytics platform, accessible on premise or as a web API, gives businesses and government agencies around the world the necessary interoperable linguistic tools for deep knowledge decision-making. We work with companies, large and small, building enterprise search solutions spanning social media monitoring, risk and compliance, identity management, and security scanning. Rosette adds a wealth of powerful functionality—from pure linguistics to analyses centered around entities, names and relationships in Asian, European, and Middle Eastern languages—to any underlying search or database infrastructure. For more information, email info@basistech.com or visit www.basistech.com.
Brandeis students and alumni: Zachary Yocum, Anna Astori, Justin Su

Apple Speech

Play a part in the next revolution in human-computer interaction. Contribute to a product that is redefining mobile computing. Create groundbreaking technology for large scale systems, spoken language, big data, and artificial intelligence. And work with the people who created the intelligent assistant that helps millions of people get things done — just by asking. Join the Siri Speech team at Apple.
The Siri team is looking for exceptionally skilled and creative Engineers eager to get involved in hands-on work improving the Siri experience.