The system has to be able to speak in different well-known voices, such as Scarlett Johansson's.
The voice has to be generated with 15 different emotions and intensity levels.
The voice has to be generated in less than 1/100 of a second, independently of audio length.
The system has to be able to chat with the user, have some common sense, know and remember a lot of different information, and work while disconnected from the internet (we don’t want to create one more Google Assistant).
The system should be able to pass the Turing Test (Imitation Game).
For the voice, a seq2seq architecture with attention, Tacotron 2, was used. It was implemented almost from scratch, because at that time there were no good implementations available.
For voice emotion control, Global Style Tokens (GST) were applied, which use a multi-head attention mechanism over a bank of learned style tokens.
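As a rough illustration of how style tokens can be mixed by attention, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not the project's implementation: the learned query, key and value projections of a real GST layer are omitted, heads are simply slices of the vectors, and the names `attend` and `style_embedding` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Scaled dot-product attention of one query over a bank of tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    mixed = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
    return weights, mixed


def style_embedding(reference, token_bank, num_heads):
    """Attend per head (a slice of each vector) and concatenate the results.

    Assumption: no learned projections, unlike a trained GST layer.
    """
    head_dim = len(reference) // num_heads
    embedding = []
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        _, mixed = attend(reference[lo:hi], [tok[lo:hi] for tok in token_bank])
        embedding.extend(mixed)
    return embedding
```

In this sketch the attention weights play the role of a soft choice among style tokens; the resulting embedding would be what conditions the synthesizer.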
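For faster-than-real-time inference, the WaveGlow vocoder was used as a post-processor. It is a flow-based model that combines ideas from Glow and WaveNet and generates audio samples in parallel rather than one at a time.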
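WaveGlow is built from invertible flow layers, which is what allows all samples to be produced in one pass instead of sample by sample. The sketch below shows a single affine coupling layer, the kind of building block such flow models use. It is a toy: `toy_net` is a hypothetical stand-in for the WaveNet-like network that predicts scale and shift, and inputs are assumed to have even length.

```python
import math


def toy_net(xa):
    """Hypothetical stand-in for the network that predicts log-scale and shift."""
    mean = sum(xa) / len(xa)
    log_s = [0.1 * mean for _ in xa]
    shift = [0.5 * v for v in xa]
    return log_s, shift


def coupling_forward(x):
    """Keep the first half unchanged and affinely transform the second half."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, shift = toy_net(xa)
    # Each output element depends only on xa, so all can be computed at once.
    yb = [b * math.exp(s) + t for b, s, t in zip(xb, log_s, shift)]
    return xa + yb


def coupling_inverse(y):
    """Exactly undo coupling_forward, using the untouched first half."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, shift = toy_net(ya)
    xb = [(b - t) * math.exp(-s) for b, s, t in zip(yb, log_s, shift)]
    return ya + xb
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift, which makes the layer invertible no matter how complex the inner network is.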
For the best possible voice quality, additional data was gathered from voice actors. A special technical specification was created to reach the best possible quality.
For the dialog system, Blender + Large Attention + PPLM are used with a custom conditioning mechanism, which gives us control over different Dialogue Acts, Emotions, Topics, Contexts and Tasks (Q&A, chit-chatting, generation).
The consistency problem is solved by parsing all utterances with SOA and NER mechanisms and updating the database. We consider Neo4j the best fit for a database of knowledge about the user, and we store the data as a graph.
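The update step can be sketched with a small in-memory graph. This is only an illustration under assumptions: a plain dictionary stands in for Neo4j, the extraction step is assumed to have already produced (subject, relation, object) triples, and the class name and relation names such as `LIVES_IN` are hypothetical.

```python
from collections import defaultdict


class UserKnowledgeGraph:
    """Toy in-memory stand-in for a graph database of facts about the user."""

    def __init__(self, single_valued=()):
        # (subject, relation) -> set of objects
        self.edges = defaultdict(set)
        # Relations that may hold only one value at a time (assumption).
        self.single_valued = set(single_valued)

    def update(self, subject, relation, obj):
        """Add a fact; for single-valued relations, replace the old value."""
        key = (subject, relation)
        if relation in self.single_valued:
            self.edges[key] = {obj}
        else:
            self.edges[key].add(obj)

    def query(self, subject, relation):
        """Return the known objects for a subject and relation, sorted."""
        return sorted(self.edges.get((subject, relation), set()))
```

A short usage example, with made-up facts:

```python
kg = UserKnowledgeGraph(single_valued={"LIVES_IN"})
kg.update("user", "LIVES_IN", "Kyiv")
kg.update("user", "LIVES_IN", "Berlin")  # newer fact replaces the older one
kg.update("user", "LIKES", "jazz")
print(kg.query("user", "LIVES_IN"))
```

Overwriting single-valued relations is one simple way to keep later answers consistent with what the user said most recently.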
In order to solve the dialog management problem, we apply DDPG, Rainbow and A3C + Imagination & Curiosity blocks with Dialogue Acts, Emotions, Topics, Contexts and Tasks as an action space (Reinforcement Learning).
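To make the action space concrete, the sketch below enumerates discrete actions as combinations of dialogue act, emotion, topic and task. The specific label values are hypothetical (only the three tasks come from the description above), Contexts are left out for brevity, and the epsilon-greedy selector is a generic stand-in rather than the DDPG, Rainbow or A3C agents themselves.

```python
import itertools
import random
from typing import NamedTuple


class DialogAction(NamedTuple):
    dialogue_act: str
    emotion: str
    topic: str
    task: str


# Hypothetical label sets, for illustration only.
DIALOGUE_ACTS = ["statement", "question", "greeting"]
EMOTIONS = ["neutral", "joy", "sadness"]
TOPICS = ["music", "travel"]
# Tasks named in the text: Q&A, chit-chatting, generation.
TASKS = ["qa", "chit_chat", "generation"]

ACTION_SPACE = [
    DialogAction(*combo)
    for combo in itertools.product(DIALOGUE_ACTS, EMOTIONS, TOPICS, TASKS)
]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action index with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

The chosen `DialogAction` would then be passed to the generator as its conditioning, so the policy decides what kind of reply to produce and the language model decides the wording.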
To obtain rewards for Reinforcement Learning, a WebSocket interface was created for Amazon Mechanical Turk, through which users submit rewards for dialogs.
In order to gather good-quality dialogs, we randomly pair users to chat with and rate each other, without them knowing whether they are speaking with a bot or another user.
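The random pairing can be sketched as follows. This is a minimal illustration under assumptions: the function name `make_sessions`, the `bot_share` parameter and the `BOT` marker are hypothetical, and in a real deployment the partner's identity would simply never be sent to the client.

```python
import random

BOT = "bot"  # hypothetical marker for the dialog system


def make_sessions(user_ids, bot_share, rng):
    """Randomly pair users; with probability bot_share a user gets the bot.

    A user left without a human partner is also paired with the bot.
    """
    pool = list(user_ids)
    rng.shuffle(pool)
    sessions = []
    while pool:
        first = pool.pop()
        if pool and rng.random() >= bot_share:
            sessions.append((first, pool.pop()))
        else:
            sessions.append((first, BOT))
    return sessions
```

Since both human-human and human-bot sessions come from the same random draw, ratings collected from either kind of session are directly comparable.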
The project is in the active development phase and is currently purely technical, without any business part.
Yuliya is a co-founder and COO of DataRoot Labs, where she oversees operations, sales, communication, and Startup Venture Services. She brings onboard business and venture capital experience that she gained at a leading tech investment company in CEE, where she oversaw numerous deals and managed a portfolio across various tech niches including AI and IT service companies.