Empathetic & Intelligent Virtual Assistant

Talk to her and stop being lonely.

Yuliya Sychikova
COO @ DataRoot Labs
11 Jun 2020
4 min read

Summary

  • Millions of people, especially in Asia, suffer from loneliness.
  • Our client is developing a dialog agent that acts as the user's friend.
  • The dialog agent can remember all of the user's information and detect emotions both in the user's voice and in the content of what they say.
  • DRL used advanced NLP technologies to build an end-to-end dialog agent that can be installed on Alexa or Google devices (currently available as a closed beta).
11 Months · 7 Engineers · 7+ Technologies

Tech Stack

AWS
GCP
PyTorch
Python
React Native
TensorFlow
Scala

Tech Challenge

  • The system has to be able to speak with different well-known voices, such as Scarlett Johansson's.
  • The voice has to be generated with 15 different emotions at varying intensity levels.
  • The voice has to be generated in under 1/100 of a second, regardless of audio length.
  • The system has to chat with the user, show common sense, know and remember a wide range of information, and work disconnected from the internet (the goal is not to create yet another Google Assistant).
  • The system should be able to pass the Turing test (the imitation game).

Solution

  • For the voice, a seq2seq-with-attention architecture, Tacotron 2, was used. It was implemented almost from scratch, since at the time there were no good open implementations.
  • For voice emotion control, GST (Global Style Tokens) were applied, which use a multi-head self-attention mechanism.
  • For faster-than-real-time inference, the WaveGlow post-processor was used, which is essentially a parallelized version of WaveNet.
  • For the best possible voice quality, additional data was gathered from voice actors; a dedicated technical specification was written to reach that quality.
  • For the dialog system, Blender + a large attention model + PPLM are used with a custom conditioning mechanism, which gives us control over dialogue acts, emotions, topics, contexts, and tasks (Q&A, chit-chat, generation).
  • The consistency problem is solved by parsing all utterances with SOA and NER mechanisms and updating the database. Neo4j turned out to be the best database for knowledge about the user; we store the data as a graph.
  • To solve the dialog-management problem, we apply DDPG, Rainbow, and A3C with Imagination & Curiosity blocks, using dialogue acts, emotions, topics, contexts, and tasks as the action space (reinforcement learning).
  • To collect reinforcement-learning rewards, a WebSocket interface was created for Amazon Mechanical Turk, through which users rate dialogs.
  • To gather high-quality dialogs, we randomly pair users to chat with and rate each other, without them knowing whether they are speaking with a bot or a human.
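To illustrate the GST idea mentioned above, here is a minimal NumPy-only sketch of multi-head attention over a bank of style tokens. The real GST layer uses learned projections per head and trained token embeddings; here dimensions are simply split across heads, and all sizes are illustrative.

```python
import numpy as np

def gst_attention(ref_embedding, style_tokens, num_heads=4):
    """Attend a reference encoding over style tokens (simplified GST sketch).

    The style embedding is the attention-weighted sum of the tokens;
    real GST layers use learned per-head projections instead of slicing.
    """
    d = style_tokens.shape[1]
    assert d % num_heads == 0
    head_dim = d // num_heads
    heads, weights_all = [], []
    for h in range(num_heads):
        q = ref_embedding[h * head_dim:(h + 1) * head_dim]    # query slice
        k = style_tokens[:, h * head_dim:(h + 1) * head_dim]  # (tokens, head_dim)
        scores = k @ q / np.sqrt(head_dim)                    # scaled dot product
        w = np.exp(scores - scores.max())
        w /= w.sum()                                          # softmax over tokens
        heads.append(w @ k)                                   # weighted sum per head
        weights_all.append(w)
    return np.concatenate(heads), np.stack(weights_all)

rng = np.random.default_rng(0)
style_emb, attn = gst_attention(rng.normal(size=16), rng.normal(size=(10, 16)))
```

Each head's weights form a distribution over the token bank, so the resulting style embedding is a convex mixture of styles per head.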
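The RL action space described above can be sketched as the Cartesian product of the conditioning labels. The label sets below are hypothetical placeholders; the production inventories are larger.

```python
from itertools import product

# Hypothetical label sets for illustration only.
DIALOGUE_ACTS = ["inform", "question", "acknowledge"]
EMOTIONS = ["neutral", "joy", "empathy"]
TASKS = ["qa", "chit_chat", "generation"]

# The policy picks one combination per turn, so the discrete action
# space is the Cartesian product of the label sets.
ACTIONS = list(product(DIALOGUE_ACTS, EMOTIONS, TASKS))

def decode_action(index):
    """Map a flat action index (what a DDPG/Rainbow/A3C-style agent emits)
    back to the (act, emotion, task) condition fed to the generator."""
    return ACTIONS[index]
```

With 3 values per axis this yields 27 discrete actions; the agent's output layer only needs to rank flat indices, and `decode_action` recovers the structured condition.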
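The graph-update step can be sketched as follows, assuming utterances have already been parsed into (subject, relation, object) triples. The Cypher builder shows the upsert shape one would send to Neo4j; the in-memory class is a stand-in for the store, not the actual implementation.

```python
def merge_fact_query(relation):
    """Build a Cypher MERGE statement that upserts one user fact, so
    repeated mentions update rather than duplicate nodes. Node labels
    and parameter names here are illustrative."""
    return (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{relation.upper()}]->(o)"
    )

class UserGraph:
    """In-memory stand-in for the Neo4j knowledge graph."""
    def __init__(self):
        self.edges = set()

    def upsert(self, subject, relation, obj):
        # MERGE semantics: idempotent, so re-parsed facts don't duplicate.
        self.edges.add((subject, relation, obj))

g = UserGraph()
g.upsert("user", "has_pet", "dog")
g.upsert("user", "has_pet", "dog")  # repeated mention, no duplicate edge
```

Idempotent upserts are what keep the agent consistent: hearing the same fact twice leaves a single edge in the user's graph.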
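The reward-collection side can be sketched as a message handler. In the real system this would run inside the WebSocket endpoint that the Mechanical Turk rating page connects to; the JSON message schema below is an assumption for illustration.

```python
import json

def handle_reward_message(raw, rewards):
    """Parse one rating message from the (hypothetical) Turk interface,
    append it to the per-dialog reward buffer, and return the running
    mean reward for that dialog."""
    msg = json.loads(raw)
    buf = rewards.setdefault(msg["dialog_id"], [])
    buf.append(float(msg["reward"]))
    return sum(buf) / len(buf)

rewards = {}
avg = handle_reward_message('{"dialog_id": "d1", "reward": 4}', rewards)
```

The running mean per dialog is what would be fed back to the RL agent as the episode return signal.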

Impact

  • The project is in an active development phase and is currently purely technical, without a business component.

Author

Yuliya Sychikova
COO @ DataRoot Labs
Yuliya is a co-founder and COO of DataRoot Labs, where she oversees operations, sales, communication, and Startup Venture Services. She brings business and venture-capital experience gained at a leading tech investment company in CEE, where she oversaw numerous deals and managed a portfolio across various tech niches, including AI and IT service companies.

Co-Authors

Ivan Didur
CTO @ DataRoot Labs