Knowledge-Grounded Dialog Generation

Current approaches to dialog generation using external knowledge.

Mykyta Oliinyk
ML Researcher @ DataRoot Labs
17 Nov 2021

What is dialog generation, and where is it used?

Dialog generation (or dialogue generation) is the process of automatically generating a text response to text input provided by a user. It is widely used in chatbots and virtual assistants. Real-life uses range from well-known all-purpose assistants such as Siri, Alexa, and Cortana to more specialized ones, such as dialog agents that help lonely people. Automatically generated dialogs are used in practice in industries such as tech, medtech, fintech, sales automation, and first-line technical support.

How does dialogue generation work?

Popular language models such as GPT-2, XLNet, and T5 produce coherent, fluent text. Unfortunately, the output often lacks common sense and factual knowledge.

Classic models store knowledge implicitly, in the parameters of a neural network. This means that:

  1. To store more knowledge, even larger networks need to be used (and they are already enormous).
  2. To update knowledge, the whole network needs to be re-trained.
  3. There is no guarantee that the end result is correct: models can “hallucinate” and say things that aren't true, making slight factual mistakes such as confusing dates and numbers.

As the scheme below shows, knowledge-grounded dialog generation adds a step in which relevant knowledge is introduced into the process to help generate a relevant response.

Classic vs knowledge-grounded dialog generation.
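The difference between the two setups can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `<knowledge>`, `<dialog>`, and `<sep>` markers, the toy fact list, and the word-overlap retriever are hypothetical stand-ins, not the input format of any particular model. The sketch only builds the text that would be passed to a generator.

```python
from typing import Callable, List


def build_classic_input(history: List[str]) -> str:
    """Classic setup: the model sees only the dialog history."""
    return " <sep> ".join(history)


def build_grounded_input(
    history: List[str],
    retrieve: Callable[[str], List[str]],
    top_k: int = 2,
) -> str:
    """Grounded setup: retrieved knowledge is placed before the dialog history."""
    query = history[-1]  # use the last user utterance as the retrieval query
    knowledge = retrieve(query)[:top_k]
    parts = ["<knowledge> " + fact for fact in knowledge]
    parts.append("<dialog> " + build_classic_input(history))
    return " ".join(parts)


# Hypothetical knowledge source, for illustration only.
FACTS = [
    "The Eiffel Tower is 330 metres tall.",
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is the highest mountain on Earth.",
]


def toy_retrieve(query: str) -> List[str]:
    """Rank facts by the number of words they share with the query."""
    q = set(query.lower().replace("?", "").split())
    scored = [(len(q & set(f.lower().rstrip(".").split())), f) for f in FACTS]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [fact for score, fact in ranked if score > 0]


history = ["Hi!", "When was the Eiffel Tower completed?"]
classic_input = build_classic_input(history)
grounded_input = build_grounded_input(history, toy_retrieve)
print(classic_input)
print(grounded_input)
```

In the grounded case the generator can copy the date from its input instead of recalling it from its parameters, and updating the fact list does not require re-training.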

What is the best approach to knowledge-grounded dialog generation?

In this article, we consider and compare document-grounded (document-based) and knowledge graph grounded approaches, and also touch on non-dialog text generation from knowledge graphs. None of these approaches is a silver bullet. However, each has advantages that can outweigh its drawbacks for a given use case, and each improves text generation by adding external knowledge.

To select the right approach, be guided by what you are trying to build and how you are trying to help your client. This article won’t give you a clear-cut answer as to which approach will work best for you. Instead, we’ve gone through a considerable number of recent research papers on different types of knowledge-grounded dialogue generation. We also briefly cover non-dialog text generation with the help of knowledge graphs and a few popular datasets. All of this should help you find the approach that works best for you.

Document-grounded dialog generation

The document-grounded dialog generation approach creates the response from the contents of a given document and the user’s query. Relevant textual documents are added as model input; usually, these are excerpts from Wikipedia articles.

All of the papers that take this approach (including the ones listed below) assume that a document is provided, which is usually the case in datasets. The most frequently used datasets are:

High-level example of document grounded dialog system.


Wizard of Wikipedia

A large dataset of conversations grounded in knowledge retrieved from Wikipedia articles.

1,365 discussion topics, 202,000 utterances


Holl-E

A dataset of movie chats. Each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments, and movie reviews.

90,000 utterances from 9,000 conversations


CMU DoG (CMU Document Grounded Conversations)

A dataset based on Wikipedia articles about popular movies. Along with the chat history for each response, it provides a source of information that the models can use.

4,112 conversations with an average of 21.43 turns per conversation (per-utterance documents)

The datasets come with the relevant documents already attached, so using these models in production requires a separate document retriever. Classic retrievers are built on TF-IDF and BM25. Recently, however, methods based on dense vector representations have started to outperform them. Examples include Google’s ORQA and Facebook’s DPR, which were previously applied mainly to question answering rather than dialog generation.
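To make the classic option concrete, here is a minimal, self-contained sketch of BM25 scoring. It is an illustration under stated assumptions rather than a production retriever: the tokenizer is a naive whitespace split, the parameters `k1=1.5` and `b=0.75` are common defaults, the IDF uses the non-negative variant with `1 +` inside the logarithm, and the three documents are made up for the example.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase, split on whitespace, strip basic punctuation."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


class BM25:
    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(docs)
        # Document frequency: in how many documents each term occurs.
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        dl = len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 1) -> List[str]:
        order = sorted(
            range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True
        )
        return [self.docs[i] for i in order[:k]]


# Made-up documents, for illustration only.
DOCS = [
    "The Matrix is a 1999 science fiction film directed by the Wachowskis.",
    "Inception is a 2010 science fiction film directed by Christopher Nolan.",
    "Titanic is a 1997 romance film directed by James Cameron.",
]
retriever = BM25(DOCS)
print(retriever.top_k("Who directed Inception?", k=1))
```

A dense retriever such as DPR replaces the term-matching score with a similarity between learned query and passage vectors, which lets it match paraphrases that share no words with the query.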

We have studied numerous research articles on document-grounded dialogue generation and selected the papers below as the most useful. They are presented in order of relevance (most notable first), with our own opinionated summary for each under “Concept”:

Internet-Augmented Dialogue Generation (2021)
  Concept: Part of BlenderBot 2.0. Converts the dialog context into a query for a search engine. All the retrieved documents are then used in the Fusion-in-Decoder scheme. A new dataset was used for supervised training of all the components.
  Framework / Model backbone: PyTorch / Transformers
  Dataset(s): WoW, WoI (Wizard of Internet)

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models (2020)
  Concept: Uses a knowledge selection module to fit the input (dialog context and external knowledge) into the GPT-2 length constraints. Training uses reinforcement learning and curriculum learning, with warm-up on pseudo ground-truth selected knowledge. No training script and no retriever are available for this approach. However, a checkpoint is provided.
  Framework / Model backbone: PyTorch / Transformers

Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation (2020)
  Concept: Computes explicit differential information between the selected knowledge candidates and the knowledge from previous turns, to be used for knowledge selection later.
  Framework / Model backbone: PyTorch / RNN
  Dataset(s): WoW, Holl-E

DukeNet: A Dual Knowledge Interaction Network for Knowledge-Grounded Conversation (2020)
  Concept: Uses dual learning for simultaneous unsupervised learning of a knowledge shifter and a posterior knowledge tracker, the latter being used only during training. The initial knowledge tracker used during inference is trained via the posterior.
  Framework / Model backbone: PyTorch / RNN
  Dataset(s): WoW, Holl-E

Incremental Transformer with Deliberation Decoder for Document Grounded Conversations (2019)
  Concept: Uses an Incremental Transformer for incremental encoding of utterances with knowledge and a Deliberation Decoder for two-step decoding: one step for text coherence and the other for knowledge correctness.
  Framework / Model backbone: PyTorch / Transformers

RefNet: A Reference-aware Network for Background Based Conversation (2020)
  Concept: Dynamically switches between generative and extractive methods during response generation to combine the best of both worlds.
  Framework / Model backbone: TensorFlow / RNN

Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue (2020)
  Concept: Treats selected knowledge as a latent variable for end-to-end sequential knowledge selection and response generation, with a copy mechanism.
  Framework / Model backbone: TensorFlow / Transformers, RNN
  Dataset(s): WoW, Holl-E

Thinking Globally, Acting Locally: Distantly Supervised Global-to-Local Knowledge Selection for Background Based Conversation (2020)
  Concept: Selects global knowledge utterances. Then, on each generation step, it can select local knowledge, that is, tokens from the global knowledge.
  Framework / Model backbone: PyTorch / RNN

Knowledge graph grounded dialog generation

The knowledge graph grounded dialog generation approach adds a relevant chunk of a large knowledge graph, or a single triple from it, as model input. The main knowledge graphs used in these approaches are ConceptNet, ATOMIC, and Freebase, and the main source of dialogs is Reddit.

High-level example of knowledge graph grounded dialog system.
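As a rough illustration of what "adding triples as model input" can look like, the sketch below matches the words of an utterance against a tiny hand-written graph and turns the matching triples into text. The graph contents, the `<triple>` and `<dialog>` markers, and the exact-word matching are assumptions made for the example; real systems use entity linking and a full graph such as ConceptNet.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Tiny hand-written graph, for illustration only.
KG: List[Triple] = [
    ("coffee", "IsA", "beverage"),
    ("coffee", "Causes", "alertness"),
    ("tea", "IsA", "beverage"),
    ("guitar", "IsA", "instrument"),
]


def retrieve_triples(utterance: str, kg: List[Triple]) -> List[Triple]:
    """Return triples whose head or tail appears as a word in the utterance."""
    words = set(utterance.lower().strip("?!.").split())
    return [t for t in kg if t[0] in words or t[2] in words]


def linearize(triples: List[Triple]) -> str:
    """Flatten triples into a string a sequence model can consume."""
    return " ".join(f"<triple> {h} {r} {t}" for h, r, t in triples)


utterance = "I need some coffee."
triples = retrieve_triples(utterance, KG)
model_input = linearize(triples) + " <dialog> " + utterance
print(model_input)
```

Linearizing triples is the simplest option. Several of the papers listed below instead encode the retrieved subgraph with graph attention or let the decoder copy entities from it directly.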

These are the selected papers that propose knowledge graph grounded approaches. As before, we chose and annotated them ourselves (see "Concept"), and they are presented in order of relevance:

Commonsense Knowledge Aware Conversation Generation with Graph Attention (2018)
  Concept: A classic work on combining dialog generation with external commonsense knowledge, and one of the first of its kind. Its official codebase is written in Python 2; however, there is an unofficial implementation that uses Python 3.
  Source code: @thu-coai/ccm (official), unofficial
  Framework / Model backbone: TensorFlow / RNN
  Dataset(s) / Knowledge graph: Reddit / ConceptNet

Diverse and Informative Dialogue Generation with Context-Specific Commonsense Knowledge Awareness (2020)
  Concept: Can be viewed as a further development of CCM. The knowledge fact set is retrieved from the graph and ranked by a felicitous fact recognizer. The results are later used by a triple decoder, which can copy words from the context or the subgraph, or select them from the vocabulary.
  Framework / Model backbone: TensorFlow / RNN
  Dataset(s) / Knowledge graph: Reddit, Weibo / ConceptNet

Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs (2020)
  Concept: Augments response generation with a chunk of the commonsense knowledge graph. It starts with entities found in the previous utterance and grows a subgraph by adding 1- and 2-hop concepts, which form the inner and the outer flow and are used separately.
  Framework / Model backbone: PyTorch / RNN
  Dataset(s) / Knowledge graph: Reddit / ConceptNet

TopicKA: Generating Commonsense Knowledge-Aware Dialogue Responses Towards the Recommended Topic Fact (2020)
  Concept: Predicts a triple from the knowledge graph for language model conditioning and diffuses the triple into a distribution over facts. Since the generator requires a discrete fact for conditioning, this approach integrates non-differentiable sampling from the distribution via four different schemes, including latent variable and Gumbel-softmax approaches. The triple decoder can sample tokens from three distributions: fixed vocabulary, conditioning triple, and dialog context.
  Framework / Model backbone: TensorFlow / RNN
  Dataset(s) / Knowledge graph: A Chinese dataset from Weibo / ConceptNet

Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering (2020)
  Concept: Uses a separately trained knowledge-base question-answering model for dialog generation.
  Framework / Model backbone: TensorFlow / RNN
  Dataset(s) / Knowledge graph: Reddit / ConceptNet and FB2M (Freebase)

OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs (2019)
  Concept: A dialog dataset annotated with a large-scale knowledge graph, alongside a graph walker model.
  Framework / Model backbone: No implementation code for the baseline model is available, only data / RNN
  Dataset(s) / Knowledge graph: New dataset / Freebase subset

Proactive Human-Machine Conversation with Explicit Conversation Goal (2019)
  Concept: A dialog dataset in Chinese with dialogs over a knowledge graph, plus a baseline model.
  Framework / Model backbone: PaddlePaddle / Transformers, RNN
  Dataset(s) / Knowledge graph: Original Chinese dataset (DuConv) / Small knowledge graphs from the dataset

DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs (2019)
  Concept: A dialog dataset with dialogs from two sitcoms, one Chinese and one English-language (“Friends”), alongside dynamic knowledge graphs and a baseline model for the new tasks.
  Framework / Model backbone: TensorFlow / RNN
  Dataset(s) / Knowledge graph: Own data based on the “Friends” TV series and a Chinese sitcom / Small dynamic knowledge graphs from the dataset
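The subgraph-growing step mentioned for “Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs” (start from entities in the utterance, then add 1-hop and 2-hop concepts) can be sketched as a plain graph expansion. This covers only the graph side and is not the paper's model. The toy triples and the choice to treat edges as undirected are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_adjacency(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Undirected adjacency map over the concepts in the graph."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def expand_hops(
    seeds: Iterable[str], adj: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Return (1-hop concepts, 2-hop concepts) around the seed entities."""
    seed_set = set(seeds)
    one_hop: Set[str] = set()
    for s in seed_set:
        one_hop |= adj.get(s, set())
    one_hop -= seed_set
    two_hop: Set[str] = set()
    for n in one_hop:
        two_hop |= adj.get(n, set())
    two_hop -= seed_set | one_hop  # keep only concepts that are new at hop 2
    return one_hop, two_hop


# Toy triples, for illustration only.
TRIPLES: List[Triple] = [
    ("rain", "Causes", "wet_ground"),
    ("rain", "RelatedTo", "umbrella"),
    ("umbrella", "UsedFor", "staying_dry"),
    ("wet_ground", "Causes", "slipping"),
    ("sun", "RelatedTo", "beach"),
]
adjacency = build_adjacency(TRIPLES)
one_hop, two_hop = expand_hops(["rain"], adjacency)
print(sorted(one_hop), sorted(two_hop))
```

Keeping the two sets separate mirrors the description above, where 1-hop and 2-hop concepts are used separately as the inner and the outer flow.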

Non-dialog text generation from knowledge graphs

Even though this article focuses on dialog generation, a number of other text-to-text problems, such as story generation, can also benefit from external commonsense knowledge. All the tasks here are sequence-to-sequence text generation, just like dialog generation itself, so an approach that solves one of them can often be applied to the others. Transferring results from these tasks to dialog generation is an interesting research direction.

These are the selected papers that propose non-dialog text generation with knowledge graphs. Again, we chose and annotated them ourselves (see "Concept"), and they are presented in order of relevance:

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning (2020)
  Concept: A knowledge-graph augmented pretrained language model. It is trained on a concepts-to-text task and can later be transferred to other problems, including dialog generation.
  Framework / Model backbone: PyTorch / Transformers
  Dataset(s) / Knowledge graph: CommonGen / ConceptNet

Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph (2020)
  Concept: At its core, this is a Seq2Seq approach that uses the commonsense knowledge graph ConceptNet. The subgraph with the mentioned entities is extracted and enriched with adjacent nodes. The node and relation embeddings are computed via a graph network and later scored using the context embedding. During generation, a gate mechanism can select these entities as tokens based on the computed score.
  Framework / Model backbone: PyTorch / Transformers
  Dataset(s) / Knowledge graph: ROC stories / ConceptNet

Text Generation from Knowledge Graphs with Graph Transformers (2019)
  Concept: A new dataset of scientific texts paired with knowledge graphs for a data-to-text task, plus a Transformer model.
  Framework / Model backbone: PyTorch / Transformers
  Dataset(s) / Knowledge graph: New dataset (AGENDA) / Graphs from the dataset

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation (2020)
  Framework / Model backbone: TensorFlow / Transformers
  Dataset(s) / Knowledge graph: ROC stories / ConceptNet, ATOMIC

Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward (2020)
  Framework / Model backbone: PyTorch / Transformers
  Dataset(s) / Knowledge graph: NYT, CNN/DM / Graphs built using StanfordNLP and OpenIE

Story Ending Generation with Incremental Encoding and Commonsense Knowledge (2018)
  Framework / Model backbone: PyTorch / RNN
  Dataset(s) / Knowledge graph: ROC stories / ConceptNet

More daily dialogue datasets and additional information

It’s also worth mentioning several new dataset-only papers and a knowledge graph paper that come with no baseline models. These papers can be as important as the dialog generation approaches covered above.

Last but not least, we recommend the 2020 survey paper “Towards information-rich, logical text generation with knowledge-enhanced neural models”, which covers knowledge-enhanced text generation systems and the state of research on them, and suggests new research directions.

Knowledge-Grounded Dialogue Generation: Summary

Generating dialogues that are both accurate and usable in real-life applications (virtual assistants, intelligent chatbots, etc.) requires relevant external knowledge. The main existing approaches to knowledge-grounded dialogue generation are:

  • Document-grounded dialog generation
  • Knowledge graph grounded dialog generation

Each approach has advantages and disadvantages, depending on your end goal. That is why this article covers dozens of research papers, each with its own take on knowledge-grounded dialogue generation, and adds a short summary for each. Remember that no single approach is perfect, and consider your initial goal when selecting one. We hope this overview makes the choice easier for you.


