A Day with BERT and Friends
Posted: 15 Jan 2020
BERT (Bidirectional Encoder Representations from Transformers) was announced by Google in November 2018. Since then it has had a tremendous impact on NLP. It has proven supremely flexible; the state of the art on many NLP benchmarks is now set by systems that incorporate BERT, or similar, representations. Staff from LexisNexis and Elsevier have been experimenting with BERT, exploring modifications and extensions that tailor it to our needs. But what does BERT do well, and what does it do poorly? Does it have anything to contribute to search? And how does it really compare with alternative methods like XLNet and FLAIR, and with less general methods like BiDAF?
Due to the intense interest in BERT, we held a day-long workshop on the topic. The day began with Ron Daniel presenting an overview of BERT and a tutorial exercise that used it for a reading-comprehension / question-answering task. Participants could enter a paragraph or two of text and a question, and the system would select a span of text from the paragraph(s) as the answer.
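The core of that kind of extractive QA is span selection: a BERT-style model scores each token as a possible answer start and answer end, and the system picks the best valid span. A minimal sketch of just that selection step, with toy scores standing in for real model output:

```python
def best_span(start_scores, end_scores, max_len=15):
    """Return (start, end) token indices maximizing start + end score,
    subject to start <= end and a maximum span length."""
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best

# Toy example: scores a model might assign for "When was BERT announced?"
tokens = ["BERT", "was", "announced", "by", "Google", "in", "November", "2018"]
start = [0.1, 0.0, 0.0, 0.0, 0.2, 0.0, 2.5, 0.3]
end   = [0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.4, 2.9]
i, j = best_span(start, end)
print(" ".join(tokens[i : j + 1]))  # prints "November 2018"
```

A real system would obtain `start` and `end` from the model's two output heads over the full question-plus-paragraph token sequence; the selection logic is the same.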
Jana Punuru then spoke about efforts to ‘distill’ a large BERT model into something that requires less memory and computation without sacrificing too much accuracy. This is a topic of considerable interest because BERT is very computationally demanding.
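One common ingredient in distillation is training the small "student" model to match the temperature-softened output distribution of the large "teacher". A minimal sketch of that loss term (the details of Jana's approach are not given here; this is just the generic soft-target idea):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives a softer distribution."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's softened distribution — lower when the student matches."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary hard-label loss, and the soft targets carry information about which wrong answers the teacher considers "almost right".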
Corey Harper then spoke about research on extracting measurement information from articles. Identifying measurements is not difficult – they are a number (or a range or tolerance) and a unit. But by itself, a measurement like “1.4 ml” is not very useful. Corey described using question-answering methods similar to those in the morning’s tutorial. By asking questions like “What was 1.4 ml?”, the system could provide answers like “dosage”. That answer could then be used in a follow-up question like “What had a dosage of 1.4 ml?”.
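The iterative pattern — ask about the quantity, then fold the answer into a follow-up question — can be sketched as a simple loop around a QA model. The `answer` function below is a stand-in that looks answers up in a toy table; the drug name and context are invented for illustration, and a real system would run the BERT QA step instead:

```python
def answer(question, context):
    """Toy stand-in for a QA model: look the question up in a fixed table."""
    toy_qa = {
        "What was 1.4 ml?": "dosage",
        "What had a dosage of 1.4 ml?": "midazolam",
    }
    return toy_qa.get(question)

context = "Patients received a midazolam dosage of 1.4 ml before the procedure."
quantity = "1.4 ml"
role = answer(f"What was {quantity}?", context)             # "dosage"
entity = answer(f"What had a {role} of {quantity}?", context)
print(role, entity)
```

The point is the question templating: each round's answer becomes part of the next round's question, gradually attaching context to the bare measurement.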
Sujit Pal gave the last talk of the morning session, on “Searching for Knowledge Graph Evidence”. This described taking information from knowledge graph triples, such as “Diazepam treats anxiety”, converting it into a question, and searching large amounts of medical content to obtain evidence for the triple. The method used in the tutorial can’t be used here because it relies on knowing the question before scanning the text, which is impractical when there are gigabytes of text. Instead, Sujit described an approach based on “BERTserini”, in which an Anserini information-retrieval component returned the top N paragraphs for the question, and the BERT QA component was then used to find the best answer.
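In the BERTserini paper, the retriever and reader scores are combined by linear interpolation to rank candidate answers. A minimal sketch of that retrieve-then-read scoring step, with toy scores standing in for real Anserini and BERT output:

```python
def combined_score(retriever_score, reader_score, mu=0.5):
    """BERTserini-style interpolation: weight the IR score against the
    QA model's answer score (mu controls the balance)."""
    return (1 - mu) * retriever_score + mu * reader_score

# (answer span, Anserini retrieval score, BERT QA score) — toy values
candidates = [
    ("span A", 0.9, 0.2),   # strong retrieval match, weak answer
    ("span B", 0.6, 0.8),   # weaker retrieval match, strong answer
]
best = max(candidates, key=lambda c: combined_score(c[1], c[2]))
print(best[0])  # prints "span B"
```

The retrieval stage keeps the QA model affordable: BERT only ever sees the top N paragraphs rather than the whole corpus.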
Kate Farmer opened the afternoon session, speaking about several BERT-related search efforts at LexisNexis. Like Sujit, Kate mentioned the need to search large amounts of content and why the tutorial’s question-answering approach was not suitable. She described several experiments they had run. One was an alternative to the BERTserini approach, in which BERT was used to compute an embedding vector for each sentence within the legal content. A vector for the question was computed in the same manner, and sentences were identified based on the similarity of the vectors. The containing paragraphs for the best sentences were fetched and displayed with the best sentence highlighted.
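The matching step in that experiment is a nearest-neighbor search over sentence embeddings. A minimal sketch using cosine similarity, with toy three-dimensional vectors standing in for real BERT embeddings and invented example sentences:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy "embeddings"; a real system would get these from BERT.
sentences = {
    "The court granted the motion.": [0.9, 0.1, 0.0],
    "The weather was mild.":         [0.0, 0.2, 0.9],
}
paragraphs = {
    "The court granted the motion.":
        "The judge reviewed the filings. The court granted the motion.",
    "The weather was mild.":
        "The hearing was in spring. The weather was mild.",
}

question_vec = [0.8, 0.2, 0.1]  # embedding of e.g. "Was the motion granted?"
best = max(sentences, key=lambda s: cosine(sentences[s], question_vec))
print(paragraphs[best])  # paragraph containing the best-matching sentence
```

Because sentence embeddings can be computed and indexed ahead of time, only the question needs to be embedded at query time, which is what makes this approach workable at scale.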
The day concluded with an open mic question and answer session.
Keywords: Neural Language Models, Question Answering, Search