An Introduction To Python & Machine Learning For Technical SEO – Search Engine Journal


Python is used to power platforms, perform data analysis, and run machine learning models. Get started with Python for technical SEO.
Since I first started talking about how Python is being used in the SEO space two years ago, it has gained even more popularity and a lot of people have started to utilize and see the benefits of using it in their day-to-day roles.
It’s really exciting to see so many SEOs share their experiences, the cool scripts they have written, and the impact it has had on their jobs.
It wouldn’t be right for me to publish this without mentioning the impact that Hamlet Batista had on me and so many other people. He loved seeing people learn and use Python.
I know he would be so proud to see so many people sharing their journey of learning Python, and all of the amazing scripts that people have written.
In short, Python is an open-source, object-oriented interactive programming language that is interpreted line by line.
With a simple, easy-to-learn syntax, strong readability, and support for a wide range of modules and libraries, Python is well-loved for the productivity it provides.
As a testament to this, Python is used by some of the biggest organizations in the world to power their platforms, perform data analysis, and run their machine learning models.
Companies including Google, YouTube, Netflix, NASA, Spotify, and IBM have publicly stated Python has been an important part of their growth, due to its simplicity, speed, and scalability.
In fact, Google’s first web-crawler was actually written in Python and it remains one of their official server-side languages.
You can run Python scripts in several ways, depending on what works best for you.
Most systems come with Python already installed. This will more than likely be Python 3, but you can find out which version you have by typing python --version in your terminal.
If you have Python 2 installed, you can upgrade by downloading Python 3 from the Python website. Python 2 officially reached end of life in 2020 and there are some syntax differences between the two, so it is best to ensure you use Python 3.
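If you would rather check the version from inside a script, a minimal sketch looks like this. The helper name is_python3 is my own and not part of any library.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3."""
    return version_info[0] == 3


# Prints the interpreter version, similar to running `python --version`.
print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
if not is_python3():
    print("This is not Python 3, so consider upgrading.")
```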
You can run Python from your terminal or command line, or from a desktop IDE (Integrated Development Environment) such as PyCharm or VS Code. Alternatively, you can use cloud-based notebooks such as Google Colab, which is used for one of the scripts later in this article.
These provide an easier experience for beginners to learn and test elements of code line by line, as well as to share and collaborate with your team.
There are several online tools available for learning Python, and the best method depends on your own learning style. For example, if you are a visual learner and enjoy following along to video coding, then freeCodeCamp is a great place to start.
If you work better with a more project-structured learning style then Codecademy and Sololearn are great places to try out. These websites also provide a way to track your learning and start a project portfolio.
Some sites, such as CodeCombat and CheckiO, gamify the learning journey. These provide a fun way to build a habit of coding each day.
If you prefer to code along with an instructor in real-time and identify as a woman or non-binary, then you can also sign up for a free 8-week course with Code First Girls (disclaimer, I work for Code First Girls).
Once you feel comfortable with the fundamentals of Python, the best thing to do is start working on projects, either creating your own, or building upon one of the many scripts that have been shared in the Python community.
These projects don’t necessarily need to be related to SEO, but it can sometimes be useful to have practical examples to use when working on projects.
If you’re interested in the data analysis side of Python, then it’s definitely worth checking out and using the free datasets available on Kaggle.
The main power of Python is in its libraries, which add functionality well beyond the core language.
Many of these are useful for data analysis and automation tasks in SEO, including Pandas, which is used for the pivot tables in the scripts later in this article.
While having an understanding of the languages which power the websites we work on (such as HTML, CSS, and JavaScript) is important, Python provides many automation opportunities for low-level tasks which we would usually spend several hours undertaking.
Python empowers SEO professionals in several ways as it not only enables us to automate repetitive tasks but also to extract and analyze large data sets.
The amount of data marketers work with is only increasing, so being able to efficiently analyze this will help to solve many complex problems in a shorter amount of time.
This in turn saves valuable time and allows us to be more efficient in undertaking other important SEO tasks. These factors combined have led to a growth in the popularity of Python amongst SEO professionals.
The ability to better understand data will not only help us do our jobs better but will also allow us to make data-driven decisions.
These decisions will then enable us to provide concrete insights for our clients and stakeholders and have more confidence in the recommendations we implement.
While Python will not be able to imitate human, emotion-led strategies, Python scripts can be used to automate a large number of time-consuming tasks.
The list of tasks you can automate with Python is growing continuously.
The best way to add Python into your workflow is to start thinking about what can be automated, particularly tedious, time-consuming tasks.
Alternatively, think of ways you can more efficiently deal with and make conclusions from the data you have available to you.
A great way to get started is to play around with the data from your website that you already have access to, for example from a site crawl or your analytics tool.
Don’t be afraid to take inspiration from other people’s scripts, play around and even break something when learning, as this is often the best way to learn.
Finding the cause of an issue and ways to fix it is a big part of what we do as SEOs, and it’s really the same when learning and using Python.
There are also so many useful articles from other SEOs who have shared practical examples of how they are using Python for SEO-related tasks. I would recommend checking out SEO Pythonistas to explore some of these.
Ready to get started with Python?
Here are a few scripts which I have found useful for numerous tasks, along with a brief description of how each one works and the challenges it solves.
The first practical way you can use Python is to identify if the redirect mapping that has been implemented for a migration is accurate, by creating a redirect relevancy script.
This involves taking a crawl of your site pre and post-migration and segmenting the different categories based on their URL structure.
You can then use some of Python’s built-in comparison operators to determine if the folder and depth of each page have stayed the same or changed following the migration.
The script will take each of your URLs and compare them pre and post-migration to identify if they are the same and the results will output to a new table that will state True if they are the same, or False if they have changed.
You can also use the Python library Pandas to create a pivot table that can display a count of how many URLs for each category match and how many have changed.
This will enable you to investigate any categories or URLs which don’t match and review the redirect rules that have been set up.
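To make the comparison concrete, here is a minimal sketch of the idea rather than the original script. It assumes you already have a list of (old URL, new URL) pairs from the two crawls, and it treats the first folder in the path as the category. The function names and sample URLs are made up for illustration, and a Counter from the standard library stands in for the Pandas pivot table so that the example runs without any installs.

```python
from collections import Counter
from urllib.parse import urlparse


def folder_and_depth(url):
    """Return (top-level folder, number of path segments) for a URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    folder = parts[0] if parts else "(home)"
    return folder, len(parts)


def compare_migration(mapping):
    """mapping: list of (old_url, new_url) pairs from the two crawls."""
    rows = []
    for old_url, new_url in mapping:
        old_folder, old_depth = folder_and_depth(old_url)
        new_folder, new_depth = folder_and_depth(new_url)
        rows.append({
            "old_url": old_url,
            "new_url": new_url,
            "category": old_folder,
            # True if unchanged after the migration, False if changed.
            "folder_match": old_folder == new_folder,
            "depth_match": old_depth == new_depth,
        })
    return rows


def summarise(rows):
    """Count, per category, how many URLs kept or changed their folder."""
    return Counter((row["category"], row["folder_match"]) for row in rows)


# Hypothetical sample data, not taken from a real crawl.
SAMPLE_MAPPING = [
    ("https://old.example.com/blog/python-seo/",
     "https://www.example.com/blog/python-seo/"),
    ("https://old.example.com/blog/ml-intro/",
     "https://www.example.com/resources/ml-intro/"),
    ("https://old.example.com/products/widgets/blue/",
     "https://www.example.com/products/blue/"),
]

rows = compare_migration(SAMPLE_MAPPING)
summary = summarise(rows)
for (category, matched), total in sorted(summary.items()):
    print(category, "match" if matched else "changed", total)
```

If you do have Pandas installed, the same rows could be loaded into a DataFrame and summarised with a pivot table instead of the Counter.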
Another practical script that uses crawl data is using Python to perform internal link analysis.
This will allow you to identify the sections of your site that have the most internal links, as well as discover opportunities to improve internal linking for different sections.
This will again use segmentation to determine the different categories of the URLs and pivot tables to export a count of the number of internal links to each category on the site.
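As a rough sketch of that counting step, assume your crawler can export internal links as (source URL, target URL) pairs. The function names and sample links below are hypothetical, and the category is again assumed to be the first folder in the URL path.

```python
from collections import Counter
from urllib.parse import urlparse


def category_of(url):
    """Use the first path folder as the category (an assumed convention)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else "(home)"


def inlinks_per_category(links):
    """Count how many internal links point at each category."""
    return Counter(category_of(target) for _source, target in links)


def link_matrix(links):
    """Count links from each source category to each target category."""
    return Counter(
        (category_of(source), category_of(target)) for source, target in links
    )


# Hypothetical sample of (source, target) internal links.
SAMPLE_LINKS = [
    ("https://www.example.com/", "https://www.example.com/blog/"),
    ("https://www.example.com/", "https://www.example.com/products/"),
    ("https://www.example.com/blog/", "https://www.example.com/blog/python-seo/"),
    ("https://www.example.com/blog/python-seo/",
     "https://www.example.com/products/widgets/"),
    ("https://www.example.com/blog/python-seo/", "https://www.example.com/"),
]

counts = inlinks_per_category(SAMPLE_LINKS)
matrix = link_matrix(SAMPLE_LINKS)
for category, total in counts.most_common():
    print(category, total)
```

The second function is optional, but knowing which sections link to which can help when you are looking for sections that are under-linked.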
This is the first script that introduced me to the language and the one that kick-started my desire to learn.
Using Pythia, which is a modular deep learning framework created by Facebook, this script generates a caption for an image URL.
This caption can then be used for images that are currently missing alt text, which is important for accessibility and image search.
The script is based upon the bottom-up and top-down attention mechanism, which produces a caption by focusing attention on different elements within an image.
For each word generated, attention is weighted across the image, and the region with the maximum attention is outlined.
This script is easy to use because it can be run straight from Google Colab and requires no direct coding.
Once a copy of the necessary code is saved to your personal Google Colab drive, all cells can be run, performing each step for you.
This will download the data sources needed to run the process, as well as automatically complete all of the steps that would typically need to be undertaken manually.
For example, all libraries will be installed, classes will be created and functions assigned.
This will generate an area to add in your image URL and a button to caption the image.
A caption will then be provided for each image, which can be used directly as alt text or to inspire the creation of one.
Hamlet has written a comprehensive guide to generate text from images with Python which shows this script in action.
Python is also great to use with APIs, for example, Google’s PageSpeed Insights API. This will allow you to measure key performance metrics at scale, saving you the time of having to test each URL individually.
Using a CSV file with all of the URLs you want to test, you can run each through the API and create a response object to hold all of the metrics for each URL.
You can then extract the specific metrics, for example, LCP, CLS, and FID, and generate a table displaying these metrics for each URL.
You can also extract other useful things from the API including layout shifting elements for each page, the largest contentful paint element, and a list of all third-party blocking tags or unused CSS and JS files on each page.
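Here is a hedged sketch of how the pieces could fit together using only the standard library. The endpoint and the metric keys follow version 5 of the PageSpeed Insights API as I understand it, so please check them against the official reference before relying on them. The CSV layout (a single url column), the function names, and the sample response are all assumptions for illustration. The fetch function needs network access and is defined but not called here.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Label -> key under loadingExperience.metrics in the API response.
FIELD_METRICS = {
    "LCP": "LARGEST_CONTENTFUL_PAINT_MS",
    "CLS": "CUMULATIVE_LAYOUT_SHIFT_SCORE",
    "FID": "FIRST_INPUT_DELAY_MS",
}


def read_urls(csv_text):
    """Read URLs from CSV text that has a 'url' header (assumed layout)."""
    return [row["url"] for row in csv.DictReader(io.StringIO(csv_text))]


def build_request_url(page_url, strategy="mobile", api_key=None):
    """Build the request URL for one page."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return PSI_ENDPOINT + "?" + urlencode(params)


def fetch_report(page_url, **kwargs):
    """Call the API for one page (needs network access, not run below)."""
    with urlopen(build_request_url(page_url, **kwargs)) as response:
        return json.load(response)


def extract_field_metrics(report):
    """Pull the percentile value for each metric, or None if missing."""
    metrics = report.get("loadingExperience", {}).get("metrics", {})
    return {
        label: metrics.get(key, {}).get("percentile")
        for label, key in FIELD_METRICS.items()
    }


# Hand-made sample data standing in for a CSV file and an API response.
SAMPLE_CSV = "url\nhttps://www.example.com/\nhttps://www.example.com/blog/\n"
SAMPLE_REPORT = {
    "loadingExperience": {
        "metrics": {
            "LARGEST_CONTENTFUL_PAINT_MS": {"percentile": 2100},
            "CUMULATIVE_LAYOUT_SHIFT_SCORE": {"percentile": 5},
        }
    }
}

urls = read_urls(SAMPLE_CSV)
request_url = build_request_url(urls[0])
metrics = extract_field_metrics(SAMPLE_REPORT)
print(request_url)
print(metrics)
```

In a real run you would loop over urls, call fetch_report for each one, and collect the extracted metrics into a table.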
These examples are just scratching the surface. There are many more automation and optimization possibilities using Python scripts.
Python is also a popular language used to power machine learning applications due to its simple, intuitive, and accessible syntax.
In addition, there are a large number of useful libraries which are helpful when working with and training machine learning models.
Machine learning is essentially “an application of artificial intelligence that provides systems with the ability to automatically learn and improve from experience, without the need to be explicitly programmed.”
Machine learning is often used to identify patterns in data, upon which predictions can then be made.
There are two main types of machine learning. The first is supervised learning, which is trained on labeled data, where each input in the training set comes with the desired output.
The learning algorithm is therefore already given the answer when reading the data. The correct outcome for each data point is explicitly labeled when training the model.
Unsupervised learning, on the other hand, is trained using information that is not labeled, so the algorithm acts on the information without guidance. This is often used to test the capabilities of the system or when you do not have pre-labeled data.
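To show the difference in practice, here is a small sketch using two deliberately simple techniques that I have picked for illustration: a nearest-neighbour classifier for the supervised case and a two-cluster version of k-means for the unsupervised case. The page data and labels are invented, and real projects would normally use a library and scale the features first.

```python
def nearest_neighbour_predict(training, point):
    """Supervised: training is a list of ((x, y), label) pairs.

    The label of the closest labeled example is used as the prediction.
    """
    def squared_distance(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    _features, label = min(
        training, key=lambda pair: squared_distance(pair[0], point)
    )
    return label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled numbers into two clusters."""
    low, high = min(values), max(values)
    cluster_low, cluster_high = list(values), []
    for _ in range(iterations):
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not cluster_high:  # all values identical, nothing to split
            break
        low = sum(cluster_low) / len(cluster_low)
        high = sum(cluster_high) / len(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)


# Labeled data: (word count, internal links) -> label chosen by a person.
LABELED_PAGES = [
    ((300, 2), "thin"),
    ((350, 1), "thin"),
    ((1500, 12), "in-depth"),
    ((1800, 9), "in-depth"),
]
prediction = nearest_neighbour_predict(LABELED_PAGES, (400, 3))

# Unlabeled data: only word counts, with no answers provided.
WORD_COUNTS = [280, 1650, 310, 1800, 350, 1500]
short_pages, long_pages = two_means(WORD_COUNTS)

print(prediction)
print(short_pages, long_pages)
```

In the first case the answers are supplied up front, while in the second the algorithm finds the two groups on its own.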
Run in conjunction with machine learning, Python can be used to power scripts that load, summarize, and visualize a dataset before a model is trained on it.
From here, the candidate algorithms are evaluated so that the best-performing model can be used to make predictions.
The use of machine learning on the web is increasing all the time, with new models being created and training data becoming more accessible daily. In some cases, we are also being used to help train them.
Some real-world machine learning examples include:
Due to their ability to solve complex problems, it is no surprise that machine learning models are being used to help make marketers’ lives easier.
As Britney Muller says:
“Machine Learning is becoming more accessible and will free us up to work on higher-level strategy.”
This will enable you to spend more time finding solutions, rather than just identifying problems.
Here are some examples of machine learning being used for SEO tasks, some of which you may have already come across.
Based on user navigation patterns from website analytics, tools such as guess.js build machine learning models that can predict which pages users are most likely to visit next and prefetch the resources that will need loading.
Other examples of this in practice include predicting the next piece of content a user is likely to want to view and adjusting user experience to account for this, as well as predicting widgets that a user is likely to interact with and tailoring a more custom experience with this in mind.
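To give a feel for how next-page prediction can work, here is a simplified sketch based on counting page-to-page transitions (a first-order Markov model). This is my own stand-in and not how guess.js or any particular tool is implemented. The sessions, function names, and probability threshold are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often each page is followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, current_page, min_probability=0.5):
    """Return (page, probability) for the likeliest next page, or None."""
    counts = transitions.get(current_page)
    if not counts:
        return None
    page, hits = counts.most_common(1)[0]
    probability = hits / sum(counts.values())
    if probability < min_probability:
        return None  # not confident enough to be worth prefetching
    return page, probability


# Hypothetical sessions, as might be exported from an analytics tool.
SESSIONS = [
    ["/", "/blog/", "/blog/python-seo/"],
    ["/", "/blog/", "/contact/"],
    ["/", "/products/"],
    ["/blog/", "/blog/python-seo/"],
]

transitions = build_transition_counts(SESSIONS)
print(predict_next(transitions, "/"))
print(predict_next(transitions, "/blog/"))
```

A page returned by predict_next would be the candidate to prefetch, and the threshold stops low-confidence guesses from wasting bandwidth.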
There are two different ways machine learning can help with internal linking.
The first is to update broken links. This can be done by crawling the site to identify broken internal links, then using an algorithm to suggest the most accurate replacement page for each one and updating the links.
The other is suggesting relevant internal linking based on big data. These tools use algorithms that are fine-tuned to constantly acquire new information so that they can suggest more internal links after some time.
They also start suggesting relevant internal links as an article is being written.
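For the broken link case, a very simple way to suggest a replacement is to compare URL strings. The sketch below uses get_close_matches from Python's built-in difflib module as a stand-in for the algorithm. Real tools are likely to compare page content as well, so treat this as an illustration only. The URLs and the 0.8 similarity cutoff are assumptions.

```python
from difflib import get_close_matches


def suggest_replacements(broken_urls, live_urls, cutoff=0.8):
    """Map each broken URL to its most similar live URL, or None."""
    suggestions = {}
    for broken in broken_urls:
        matches = get_close_matches(broken, live_urls, n=1, cutoff=cutoff)
        suggestions[broken] = matches[0] if matches else None
    return suggestions


# Hypothetical crawl output: paths that return 404 and paths that are live.
BROKEN = ["/blog/python-seo/", "/about-us/team/"]
LIVE = [
    "/blog/python-for-seo/",
    "/blog/machine-learning-basics/",
    "/products/widgets/",
]

suggestions = suggest_replacements(BROKEN, LIVE)
for broken, replacement in suggestions.items():
    print(broken, "->", replacement)
```

URLs with no close match come back as None, which is a useful prompt to review them manually rather than redirect them somewhere irrelevant.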
The next example is improving content quality by predicting what users and search engines would prefer. You can do this by building a model that generates insights on the factors that are most important.
These factors can include things such as search volume and traffic, conversion rate, internal links, bounce rate, time on page, and word count.
You will then use those important factors to train a machine learning model, which generates a content quality score for each page.
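To illustrate what a score like this could look like, here is a simplified sketch that scales each factor between 0 and 1 and combines the factors with weights. Please note that the weights here are set by hand as a stand-in for the importance a trained model would learn, so this is not a machine learning model in itself. The factor names, weights, and page data are all made up.

```python
# Factors where a lower value is better, so the scaled value is inverted.
LOWER_IS_BETTER = {"bounce_rate"}


def min_max_scale(values):
    """Scale a list of numbers to the 0-1 range."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def quality_scores(pages, weights):
    """Return a 0-100 score per URL from weighted, scaled factors."""
    urls = list(pages)
    totals = {url: 0.0 for url in urls}
    for factor, weight in weights.items():
        scaled = min_max_scale([pages[url][factor] for url in urls])
        for url, value in zip(urls, scaled):
            if factor in LOWER_IS_BETTER:
                value = 1.0 - value
            totals[url] += weight * value
    weight_sum = sum(weights.values())
    return {
        url: round(100 * total / weight_sum, 1)
        for url, total in totals.items()
    }


# Hypothetical page data, with rates stored as whole percentages.
PAGES = {
    "/a/": {"traffic": 1000, "conversion_rate": 4, "bounce_rate": 30},
    "/b/": {"traffic": 500, "conversion_rate": 2, "bounce_rate": 50},
    "/c/": {"traffic": 0, "conversion_rate": 0, "bounce_rate": 70},
}
# Hand-set weights standing in for learned factor importance.
WEIGHTS = {"traffic": 2, "conversion_rate": 2, "bounce_rate": 1}

scores = quality_scores(PAGES, WEIGHTS)
print(scores)
```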
Machine learning is also being used to help improve user experience, and there are many examples of this. For instance, Instagram uses sentiment analysis to identify and address bullying language.
Twitter also uses it for image cropping, to ensure images are cropped to display the most important part, for example, the text.
Even when the text sits in a different place in each image, Twitter crops the image so that the text is displayed in the preview. The model was trained on thousands of images before it was able to identify the most important part of an image.
Computer vision is also being used to help with user experience, by automatically identifying what is in an image, to make images accessible by explaining to users what an image is.
I hope this has inspired you to start learning Python and explore how it can help you with automating tasks and analyzing complex data to increase your efficiency.
As a final note, please remember that you don’t need to learn Python to be a good SEO, but if you’re intrigued or interested then I hope you have fun learning and putting some Python scripts into practice in your workflow.
To continue to honor Hamlet’s passion for encouraging and celebrating others, I wanted to share some of the amazing things shared by the SEO community this year.
Moshe Ma-yafit wrote a cool script on how to detect competitors’ price changes with Python and send email alerts. You can find an article explaining this together with a GitHub repository.
Lazarina Stoy has a script for generating meta descriptions as well as a guide to using Pytrends with Python.
Francis Angelo Reyes has written a script for a simple redirect mapping tool in Python. It goes through each URL and finds its match. The app is also in the article so you can try it there!
Yaniss Illoul has worked on a Broken Links Finder in Python, as well as a tool to capture keyword rankings across multiple domains.
Danielle Rohe shared a script to download all sitemaps within a sitemap index as well as loop through each and extract all URLs into a CSV file.
Muhammad Hammad has built a really cool script for NLP and content analysis of SERPs.
Charly Wargnier has also shared some awesome scripts this year, including one to generate FAQs for your pages automatically, the BERT Keyword Extractor, and a Keyword Clustering app.
Featured Image: fatmawati achmad zaenuri/Shutterstock
Ruth is a Programmes and Data Manager at Code First Girls and spends her time managing the coding programmes and … [Read full bio]