ML-Powered Data Warehouse for SERP Analytics

Monitoring website positions in global and local search engines.

DRL Team
AI R&D Center
15 Jun 2020

Summary

  • Modern approaches to monitoring a site's position in global and local search engines require a large number of diverse text queries.
  • A leading digital marketing agency wanted a scalable, automated service dedicated to search engine optimization (SEO) analysis.
  • The agency uses this service for day-to-day and long-term analytics and for monitoring the performance of SEO-optimized sites.
  • Natural Language Processing (NLP) and other Machine Learning methods formed the foundation of the solution our team implemented.

Tech Stack

Akka
Apache Spark
Cassandra
GlusterFS
Kafka
PostgreSQL
Python
Scala
TensorFlow

Timeline

  • Data Gathering Parser (2 weeks): Data Engineer
  • Solution Architecture Design (1 week): Solution Architect
  • Feature Extraction Pipeline Development (1 week): Deep Learning Engineer, Deep Learning Researcher
  • Development of Customised LSH (1 week): Deep Learning Researcher, Deep Learning Engineer
  • Clustering Performance Optimization (1 week): Deep Learning Engineer, Data Engineer
  • Data Warehouse Configuration (2 weeks): Data Engineer
  • Web Platform Development (6 weeks): Backend Developer, Frontend Developer
  • Integration & Deployment (2 weeks): Backend Developer, DevOps

Tech Challenge

  • The implementation had to take laborious manual tasks off the SEO team, allowing the client to considerably improve the quality of its services and the revenue they generate.
  • The full range of analytics, from days to years, had to be at the SEO specialist's disposal for adjusting parameters and predicting outcomes.
  • Each website's Google Search Console (GSC) data had to be parsed, and Google's search results then had to be parsed for the queries taken from GSC.
  • The query-results matrix had to be clustered by links, and its size could reach tens of millions of queries by tens of millions of links.
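
To make the scale of the clustering problem concrete, here is a minimal sketch. It assumes each parsed SERP row is a (query, result URL) pair and that two queries are similar when their result links overlap, measured with Jaccard similarity; the row format and the function names (build_query_links, all_pairs_similarity, pair_count) are hypothetical illustrations, not the production code. An exact all-pairs comparison needs n(n-1)/2 similarity computations, which for n = 10 million queries is roughly 5 × 10^13.

```python
from collections import defaultdict
from itertools import combinations


def build_query_links(serp_rows):
    """Group hypothetical SERP rows of the form (query, url) into query -> set of URLs."""
    query_links = defaultdict(set)
    for query, url in serp_rows:
        query_links[query].add(url)
    return dict(query_links)


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_pairs_similarity(query_links):
    """Exact pairwise similarities; this loop runs n * (n - 1) / 2 times."""
    return {
        (q1, q2): jaccard(query_links[q1], query_links[q2])
        for q1, q2 in combinations(sorted(query_links), 2)
    }


def pair_count(n):
    """Number of comparisons an exact all-pairs pass needs for n queries."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    rows = [
        ("buy running shoes", "shop.example/run"),
        ("buy running shoes", "shoes.example/sale"),
        ("running shoes sale", "shoes.example/sale"),
        ("running shoes sale", "shop.example/run"),
        ("best pizza near me", "pizza.example/menu"),
    ]
    links = build_query_links(rows)
    for pair, score in all_pairs_similarity(links).items():
        print(pair, round(score, 2))
    print(pair_count(10_000_000))  # 49999995000000 comparisons
```

The quadratic pair count is what rules out exact comparison at this scale and motivates an approximate method such as LSH.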

Solution

  • Peek queries are generated automatically by our own Natural Language Processing algorithm, which applies a modern approach to monitoring a site's position in global and local search engines.
  • The algorithm takes into account the structure and content of the target site’s pages and builds a large number of diverse text queries to capture a complete picture of the site’s performance.
  • These queries are run daily for the whole range of sites, and the results are stored in the database. Each site is then analyzed against its competitors using the tool we built.
  • Analytics at scales ranging from days to years are available to the SEO specialist for further iterations.
  • A customized and optimized Apache Spark-based implementation of LSH (Locality-Sensitive Hashing) brings clustering complexity close to linear, O(bn). The average clustering time for a 10M × 10M matrix is about 15 minutes.
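
The customized Spark-based LSH itself is not described in detail here, so the sketch below only illustrates the general technique on a single machine. It uses MinHash signatures with banding, a standard LSH family for Jaccard similarity, and merges queries that land in the same bucket in any band using a union-find. The parameters num_perm and bands, the blake2b-based hash functions, and all function names are assumptions for illustration. With b bands and n queries, the bucketing step touches b × n (band, query) entries, which is one way near-linear cost arises; whether this b matches the b in O(bn) above is also an assumption.

```python
import hashlib
from collections import defaultdict


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (assumed hash family)."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(links, num_perm):
    """MinHash signature: for each seeded hash function, the minimum hash over the set."""
    return [min(_hash(seed, url) for url in links) for seed in range(num_perm)]


class UnionFind:
    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def lsh_clusters(query_links, num_perm=32, bands=8):
    """Cluster queries whose MinHash signatures collide in at least one band."""
    assert num_perm % bands == 0, "num_perm must be divisible by bands"
    rows = num_perm // bands
    uf = UnionFind(query_links)
    buckets = defaultdict(list)
    for query, links in query_links.items():
        if not links:
            continue  # queries without results get no signature and stay singletons
        sig = minhash_signature(links, num_perm)
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(query)
    for members in buckets.values():
        for other in members[1:]:
            uf.union(members[0], other)
    clusters = defaultdict(set)
    for query in query_links:
        clusters[uf.find(query)].add(query)
    return sorted(clusters.values(), key=lambda c: sorted(c))


if __name__ == "__main__":
    query_links = {
        "buy running shoes": {"shop.example/run", "shoes.example/sale", "blog.example/top10"},
        "running shoes sale": {"shop.example/run", "shoes.example/sale", "blog.example/top10"},
        "best pizza near me": {"pizza.example/menu", "maps.example/pizza"},
    }
    for cluster in lsh_clusters(query_links):
        print(sorted(cluster))
```

A collision requires every row of a band to match, so more rows per band makes matching stricter while more bands raises recall. Tuning that trade-off and distributing the bucketing step across a Spark cluster is where a production version would differ from this sketch.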

Impact

  • Our team built a scalable service that performs search engine optimization analysis.
  • It is used for daily and long-term analytics and for monitoring the performance of SEO-optimized sites.
  • By eliminating the manual creation of SEO queries, our solution has saved the agency thousands of working hours, allowing it to reallocate staff elsewhere.

Author

DRL Team
AI R&D Center
Our team shares experiences and insights on how AI and ML change and shape new markets, optimize various industries and our lives.

Co-Authors

Ivan Didur
CTO @ DataRoot Labs
Copyright © 2016-2024 DataRoot Labs, Inc.