ML-Powered Data Warehouse for SERP Analytics

Monitoring a website's position in global and local search engines.

Yuliya Sychikova
COO @ DataRoot Labs
15 Jun 2020
6 min read


  • Modern approaches to monitoring a site's position in global and local search engines require large volumes of diverse text queries.
  • A leading digital marketing agency wanted a scalable, automated service dedicated to search engine optimization (SEO) analysis.
  • The agency used such an engine for day-to-day and long-term analytics and for monitoring the performance of SEO-optimized sites.
  • Natural Language Processing (NLP) and other Machine Learning methods formed the foundation of the solution our team implemented.

Tech Stack

Apache Spark


Project Timeline

  • Data Gathering Parser (2 weeks): Data Engineer
  • Solution Architecture Design (1 week): Solution Architect
  • Feature Extraction Pipeline Development (1 week): Deep Learning Engineer, Deep Learning Researcher
  • Development of Customized LSH (1 week): Deep Learning Researcher, Deep Learning Engineer
  • Clustering Performance Optimization (1 week): Deep Learning Engineer, Data Engineer
  • Data Warehouse Configuration (2 weeks): Data Engineer
  • Web Platform Development (6 weeks): Backend Developer, Frontend Developer
  • Integration & Deployment (2 weeks): Backend Developer, DevOps

Tech Challenge

  • The implementation had to remove laborious manual tasks from the SEO team's workload, allowing the client to considerably improve both the quality of its services and its revenue.
  • The entire range of analytics, from days to years, had to be at the SEO specialist's disposal for adjusting parameters and predicting outcomes.
  • Each website's data had to be parsed from Google Search Console (GSC), and Google then had to be queried with the search queries taken from GSC and the results parsed.
  • The query-results matrix had to be clustered by links, with the matrix reaching tens of millions of rows by tens of millions of columns.
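To make the clustering task concrete: each query can be read as one row of the query-results matrix, with the result links as columns, so two queries are related when their result pages share links. The minimal sketch below assumes Jaccard similarity over each query's set of result URLs as the link-overlap measure (the source does not name its metric), and the queries and URLs are made up. The exact all-pairs loop is shown deliberately, because it makes the scaling problem visible.

```python
from itertools import combinations

# Hypothetical SERP snapshot: each query maps to the set of result URLs,
# i.e. one row of the query-results matrix stored sparsely as a set.
serp = {
    "buy running shoes": {"a.com/shoes", "b.com/run", "c.com/sale"},
    "running shoes sale": {"a.com/shoes", "c.com/sale", "d.com/deals"},
    "marathon training plan": {"e.com/plan", "f.com/marathon"},
}


def jaccard(a: set, b: set) -> float:
    """Fraction of links two result pages share: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def all_pairs_similarity(rows: dict) -> dict:
    """Exact comparison of every pair of queries: n(n - 1)/2 Jaccard computations."""
    return {
        (q1, q2): jaccard(rows[q1], rows[q2])
        for q1, q2 in combinations(sorted(rows), 2)
    }


for pair, similarity in all_pairs_similarity(serp).items():
    print(pair, round(similarity, 2))
```

For n queries the loop performs n(n - 1)/2 comparisons, roughly 5 × 10^13 at n = 10M, which is why an approximate method is needed at the scale described above.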


Solution

  • Peek queries are generated automatically by our own Natural Language Processing algorithm, which applies a modern approach to monitoring a site's position in global and local search engines.
  • The algorithm considers the structure and content of the target site's pages and builds a large number of diverse text queries to capture the full picture of the site's performance.
  • These queries are run daily for the whole range of sites, and their results are stored in the database. Each site is then analyzed against its competitors using the tool we built.
  • Analytics on scales from days to years are available to the SEO specialist for further iterations.
  • A customized and optimized Apache Spark-based LSH (Locality Sensitive Hashing) implementation brings clustering close to linear complexity, O(bn). Average clustering time for a 10M × 10M matrix is about 15 minutes.
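The source does not describe its query-generation algorithm, only that it draws on page structure and content. As a rough illustration of that idea, the sketch below swaps in a simple weighted n-gram extractor: phrases found in more prominent page elements score higher, and longer phrases get a boost for being more specific. The page fields, the weights in `FIELD_WEIGHTS` and the `STOP_WORDS` list are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop words; a production system would use a full, language-specific list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to", "with"}

# Hypothetical weights: words in more prominent page elements count more.
FIELD_WEIGHTS = {"title": 3, "h1": 2, "body": 1}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token phrases."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_queries(page: dict[str, str], max_n: int = 3, top_k: int = 5) -> list[str]:
    """Score 1..max_n word phrases by where they occur; return the top_k as queries."""
    scores = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(page.get(field, ""))
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                # Phrases that start or end with a stop word make poor queries.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                # Longer phrases are more specific, so they get a proportional boost.
                scores[" ".join(gram)] += weight * n
    return [phrase for phrase, _ in scores.most_common(top_k)]


# Hypothetical page extracted from a target site.
example_page = {
    "title": "Trail Running Shoes",
    "h1": "Running Shoes for Trail Running",
    "body": "Lightweight trail running shoes with grip.",
}
print(candidate_queries(example_page, top_k=4))
```

The candidates produced this way would then be sent to the search engine and their result pages stored, which is the input the clustering step works on.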

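The LSH step can be sketched on a single machine with the textbook MinHash-plus-banding technique. This is not the team's customized Spark implementation, which the source does not describe; the band and row counts (b = 20 bands of r = 5 rows), the 0.5 verification threshold and the crc32-plus-linear hash family are assumptions. Each query lands in b buckets, so bucketing costs O(bn) for n queries, which is one way to read the O(bn) figure above. Two queries with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, which rises steeply around (1/b)^(1/r) ≈ 0.55 for these settings. Spark ships a distributed counterpart as `pyspark.ml.feature.MinHashLSH` (with `numHashTables` and `approxSimilarityJoin`).

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

# Mersenne prime for the universal hash family h(x) = (a*x + b) mod PRIME.
PRIME = (1 << 61) - 1


def make_hash_funcs(k: int, seed: int = 42) -> list[tuple[int, int]]:
    """k random (a, b) pairs; a fixed seed keeps signatures reproducible."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(links: set[str], hash_funcs: list[tuple[int, int]]) -> list[int]:
    """Minimum hash over the query's result links, once per hash function.
    Two signatures agree at a position with probability ~ their Jaccard similarity."""
    # crc32 gives a stable integer per URL (Python's built-in hash() is salted per process).
    ids = [zlib.crc32(link.encode("utf-8")) for link in links]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs]


def lsh_candidate_pairs(serp: dict, bands: int = 20, rows: int = 5, seed: int = 42) -> set:
    """Split each signature into bands; queries sharing any band bucket become candidates."""
    hash_funcs = make_hash_funcs(bands * rows, seed)
    buckets = defaultdict(list)
    for query, links in serp.items():
        if not links:
            continue  # an empty result set has no MinHash signature
        sig = minhash_signature(links, hash_funcs)
        for band in range(bands):
            buckets[(band, tuple(sig[band * rows:(band + 1) * rows]))].append(query)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def cluster_queries(serp: dict, threshold: float = 0.5, bands: int = 20, rows: int = 5) -> list:
    """Verify LSH candidates with exact Jaccard, then merge them with union-find."""
    parent = {q: q for q in serp}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving keeps trees shallow
            q = parent[q]
        return q

    for q1, q2 in lsh_candidate_pairs(serp, bands, rows):
        links_1, links_2 = serp[q1], serp[q2]
        if len(links_1 & links_2) / len(links_1 | links_2) >= threshold:
            parent[find(q1)] = find(q2)

    groups = defaultdict(list)
    for q in serp:
        groups[find(q)].append(q)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: two queries with identical result pages and one unrelated query.
example = {
    "buy running shoes": {"a.com/1", "b.com/2", "c.com/3"},
    "running shoes buy": {"a.com/1", "b.com/2", "c.com/3"},
    "marathon plan": {"e.com/4", "f.com/5"},
}
print(cluster_queries(example))
# prints [['buy running shoes', 'running shoes buy'], ['marathon plan']]
```

Candidates are re-checked with exact Jaccard before merging, so LSH only decides which pairs get compared. Raising `rows` makes buckets stricter (fewer false candidates), while raising `bands` catches more true pairs at the cost of more buckets per query.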

Impact

  • Our team built a scalable service that performs search engine optimization analysis.
  • It is used for daily as well as long-term analytics and for monitoring the performance of SEO-optimized sites.
  • By eliminating the need to create SEO queries manually, our solution has saved the agency thousands of working hours, allowing it to reallocate staff elsewhere.
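Serving analytics from a single day up to multi-year trends usually means rolling daily measurements up into coarser periods. The minimal sketch below assumes each daily measurement is stored as a (query, date, position) record and that the average rank position is the metric of interest; the source describes neither the warehouse schema nor the metrics, and the records are made up. In the warehouse itself the same rollup would typically be a GROUP BY on the period key.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily records: (query, day, rank position; 1 = top result).
daily = [
    ("running shoes", date(2020, 1, 30), 8),
    ("running shoes", date(2020, 1, 31), 6),
    ("running shoes", date(2020, 2, 1), 5),
    ("running shoes", date(2020, 2, 2), 3),
]


def week_key(d: date) -> str:
    """ISO year and week number, e.g. 2020-W05."""
    iso = d.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


# Period name -> function mapping a date to its bucket label.
PERIODS = {
    "day": lambda d: d.isoformat(),
    "week": week_key,
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}


def rollup(records, period: str) -> dict:
    """Average position per (query, period bucket)."""
    key_fn = PERIODS[period]
    buckets = defaultdict(list)
    for query, day, position in records:
        buckets[(query, key_fn(day))].append(position)
    return {key: round(mean(values), 2) for key, values in sorted(buckets.items())}


for period in PERIODS:
    print(period, rollup(daily, period))
```

ISO weeks are used for the weekly key, so a week that spans two months stays whole: in the sample, 30 January to 2 February all fall in 2020-W05 even though the monthly view splits them.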


Yuliya Sychikova
COO @ DataRoot Labs
Yuliya is a co-founder and COO of DataRoot Labs, where she oversees operations, sales, communication, and Startup Venture Services. She brings business and venture capital experience gained at a leading tech investment company in CEE, where she oversaw numerous deals and managed a portfolio spanning various tech niches, including AI and IT service companies.

