Google's LaMDA model is too convincing, and a researcher is fired
LaMDA: Language Models for Dialog Applications
A while ago, there was a very interesting story about a Google researcher who became convinced that the LaMDA AI was sentient [2]. It did the rounds for a while, but now let’s spend some time investigating what made the model so convincing in the first place.
Introduction and Motivation
There are a few core problems with open dialog systems (and LLMs at large) right now:
Safety: How can we prevent models from outputting toxic or harmful content?
Factual Grounding: How can we help the model stay up to date with facts (“what’s the weather today?”)?
Interestingness: How can we bias our models to produce interesting answers? No boring answers.
It is these problems that LaMDA attempts to solve.
Goal: Create a model that can self-evaluate SSI (sensibleness, specificity, and interestingness) while staying safe, and that can use a toolset to figure out answers it does not know.
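To make that concrete, here is a minimal sketch of the generate-then-rank idea this implies: sample several candidate responses, drop the ones a safety scorer rejects, and return the survivor with the best SSI score. The generate, score_safety, and score_ssi callables are hypothetical stand-ins for the fine-tuned model and its discriminators, not anything the paper ships.

```python
from typing import Callable, List

def respond(
    context: str,
    generate: Callable[[str], str],             # samples one candidate response
    score_safety: Callable[[str, str], float],  # hypothetical safety scorer
    score_ssi: Callable[[str, str], float],     # hypothetical combined SSI scorer
    num_candidates: int = 16,
    safety_threshold: float = 0.5,
) -> str:
    """Generate candidates, filter for safety, return the best one by SSI."""
    candidates: List[str] = [generate(context) for _ in range(num_candidates)]

    # Drop any candidate the safety scorer flags as unsafe.
    safe = [c for c in candidates if score_safety(context, c) >= safety_threshold]
    if not safe:
        return "I'm not sure how to respond to that."  # conservative fallback

    # Rank the survivors by sensibleness/specificity/interestingness.
    return max(safe, key=lambda c: score_ssi(context, c))
```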
Development Details
The core contribution of this paper [1] is the finding that fine-tuning on a small set of crowdworker-annotated data offers a promising approach to improving model safety and factual grounding. This annotated data amounts to roughly 0.001% of the pre-training data, but it is effective nonetheless.
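To give a feel for what that annotated data might look like, here is a rough sketch of turning one crowdworker-rated exchange into text-to-text fine-tuning strings, along the lines of the “<context> RESPONSE <response> <attribute> <rating>” serialization the paper describes. The AnnotatedTurn structure and the exact attribute names below are my own illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AnnotatedTurn:
    context: str      # dialog history shown to the crowdworker
    response: str     # candidate response being rated
    sensible: int     # binary (0/1) crowdworker labels
    specific: int
    interesting: int
    safe: int

def to_discriminator_examples(turn: AnnotatedTurn) -> List[str]:
    """Serialize one rated turn into one training string per attribute."""
    attrs = {
        "SENSIBLE": turn.sensible,
        "SPECIFIC": turn.specific,
        "INTERESTING": turn.interesting,
        "SAFE": turn.safe,
    }
    return [f"{turn.context} RESPONSE {turn.response} {name} {rating}"
            for name, rating in attrs.items()]

# One rated exchange becomes four fine-tuning examples.
print(to_discriminator_examples(
    AnnotatedTurn("What's up?", "Not much.",
                  sensible=1, specific=0, interesting=0, safe=1)))
```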
The model is also given a toolset that includes an information retrieval system, a calculator, and a translator. The model learns when to call these tools through fine-tuning as well.
The model can thus cite sources, do math, and swap between languages. What a polyglot!
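Here is a rough sketch of how that tool dispatch could be wired up: at each step the model either emits a toolset query or a final user-facing reply, and any tool result gets appended to the context before the next step. The research_model callable, the prompt format, and the string prefixes are simplified stand-ins for the paper’s actual setup.

```python
from typing import Callable, Dict

def research_loop(
    context: str,
    draft_response: str,
    research_model: Callable[[str], str],      # stand-in for the fine-tuned model
    toolset: Dict[str, Callable[[str], str]],  # e.g. {"search": ..., "calc": ..., "translate": ...}
    max_steps: int = 4,
) -> str:
    """Let the model query the toolset a few times before committing to a reply."""
    scratchpad = f"{context}\nDRAFT: {draft_response}"
    for _ in range(max_steps):
        output = research_model(scratchpad)       # e.g. "TS search: boiling point of gold"
        if output.startswith("TS "):
            tool_name, _, query = output[3:].partition(": ")
            result = toolset[tool_name](query)    # retrieval / calculator / translator call
            scratchpad += f"\nTS RESULT: {result}"  # feed the evidence back to the model
        else:
            return output.removeprefix("User: ")  # final, hopefully grounded, reply
    return draft_response                         # give up and fall back to the draft
```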
Evaluation
First, we see that fine-tuning yields a clear improvement in quality, safety, and groundedness.
On a wider set of benchmarks, we see that LaMDA continues to perform impressively.
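For context on where those numbers come from: as I read the paper, the headline metrics are essentially proportions of binary crowdworker labels, with groundedness computed only over responses that make claims about the external world. A small sketch of that aggregation, with illustrative field names:

```python
from statistics import mean
from typing import Dict, List

def aggregate_metrics(labels: List[Dict[str, int]]) -> Dict[str, float]:
    """Turn per-response binary crowdworker votes into the headline metrics.

    Each dict holds 0/1 labels for one response, e.g.
    {"sensible": 1, "specific": 1, "interesting": 0, "safe": 1,
     "has_claim": 1, "grounded": 1}. Field names are illustrative.
    """
    quality_keys = ("sensible", "specific", "interesting", "safe")
    metrics = {key: mean(row[key] for row in labels) for key in quality_keys}

    # Groundedness only counts responses that actually make external-world claims.
    claims = [row for row in labels if row["has_claim"]]
    metrics["groundedness"] = mean(r["grounded"] for r in claims) if claims else 0.0
    return metrics
```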
Limitations and Future Work
I am disappointed that LaMDA’s performance wasn’t compared to other dialog models like ChatGPT, which would have shown how these techniques stack up against tactics like RLHF. To be fair, ChatGPT was released well after this paper was written 🤷
A lot of this technique depends on the quality of the crowdworker annotations. It is therefore important to evaluate the patterns in these annotations, as well as to make sure the annotators are diverse and representative of various beliefs.
Despite having access to all of these tools, the model is still not able to execute complex reasoning.
Fun Examples
I’ll end with some LaMDA examples, since the model’s responses are fun to read through!
References
[1] LaMDA: Language Models for Dialog Applications. https://arxiv.org/abs/2201.08239
[2] The Washington Post’s coverage of the Blake Lemoine story: https://www.washingtonpost.com/technology/2022/06/11/google-ai-lamda-blake-lemoine/