At SMX Advanced back in June I attended a session where I heard about a really cool idea. The concept was that you could take a ton of keywords and then map out which sites were showing up for those keywords to get a good idea of who your real search competitors are.
The bad news was that they were using software they had developed in-house. It always bothers me when I get excited about an idea but then have no way of reproducing it. Well, I did some digging around and was able to find a way to build a similar type of report. Sure, it’s more labor intensive, but you can really learn something about who else is competing for your keyword sets.
Below I’m going to outline the techniques I use to get the data, how to organize it, and what to do with it once you have the information. I admit that it may be a little choppy, so if you have any questions on it feel free to hit me up on Twitter: @dan_patterson
Step 1 – Get a List of Keyword Sets
Since the goal of this whole exercise is to find other sites that are going after the same keyword sets as you, the first step is to come up with a large list of keywords. A good place to start is to go through your analytics to find keywords that are already sending you traffic.
Once you have a list, break it up into topic sets. This way you’ll be able to see which sites are full competitors and which are only partial competitors. Come up with a short name for each set, and use it when you’re scraping results to identify which set a scrape belongs to.
Step 2 – Scraping the SERPs
There are plenty of tools out there that you can use to scrape the search results. Some of them use proxies and other tactics that the search engines aren’t fond of, so instead I’m going to go over two tools you can use that shouldn’t raise any of these problems.
The first is a handy little Firefox plugin called OutWit Hub. The second I’ll go over is the SEOmoz Pro Keyword Difficulty & SERP Analysis Tool for those of you that are already members of SEOmoz Pro.
Scraping SERPs with OutWit Hub
1- Download OutWit Hub
Go to http://www.outwit.com and download OutWit Hub (Free Firefox Plugin). Technically it’s a site content scraper tool, so we’re going to use it to scrape URLs from the SERPs.
2- Change Your Google Search Settings
In order to use the plugin effectively, you’re going to have to change a few of your Google search settings:
- Turn off Google Instant by going to your account Search Settings and choosing “Do not use Google Instant”.
- Decide how deep you want to look and set “Number of Results” to match. You can choose 10 (default), 20, 30, 50, or 100.
- Save your preferences and go back to Google Search.
3- Scrape the Results
Do a search for your first term. Once the SERPs come up, click on the OutWit button in Firefox.
This will give you the OutWit Hub Window. In the window, click on the ‘Guess’ option.
This will give you the info from the SERPs. We’re most interested in the “id” (rank) and “URL” columns. You can either export this info or just copy and paste it into Excel.
Ann Smarty did a post about OutWit a while back and also has a custom scraper you can use for Google results. The only problem I’ve found with this is that sometimes you’ll get URLs with spaces in them from breadcrumbs, which makes it a little harder to filter things down in Step 3. If you are in a niche that doesn’t have this problem, this can be a faster way to go.
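If you end up with those space-polluted URLs and would rather script the cleanup than fix them by hand, a quick pass like this can help. This is a minimal sketch in Python, assuming the breadcrumb text got appended after the URL separated by spaces (adjust if your scrape looks different):

```python
def clean_url(raw):
    """Keep only the URL portion when breadcrumb text with spaces got scraped in."""
    return raw.strip().split(" ")[0]

clean_url("http://example.com/widgets Home > Widgets > Blue")
# -> 'http://example.com/widgets'
```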
4- Download Your Scrape and Clean It Up
One problem with OutWit Hub is that it can be inconsistent. Sometimes you get local listings in the export, sometimes you don’t. Sometimes you get paid listings in there, sometimes you don’t. So you have to watch what you’re scraping and make sure you’re actually getting the right info. There’s usually a heading row, but you’ll still have to do some filtering and cleanup work to get an accurate list. When you do, make sure you also update the id (rank) column to reflect the real rankings you’re seeing.
You can either export the data to a CSV, or you can also just copy and paste it into Excel. I like the copy and paste option because if I see some paid ads at the top or bottom of the data, I can just not copy those rows.
5- Rinse and Repeat
This is unfortunately the labor intensive part of this whole process. You’ll have to repeat this process for all of the keywords you want to check. Again, there are other tools that do a little bit more brute force against Google, but OutWit Hub is a great FREE tool that will help you get the data you need if you’re willing to take the time.
No matter which method you use, make sure that you add a column at the beginning that includes your shortname for each set before the rank and URL of each scrape. This way you can identify which set the rankings and URLs belong to later.
Also, make sure you’re combining all of your data into one spreadsheet so we can do the comparison and filtering later. In Step 3 of this whole process I’ll show you what to do once you have all of your scraping done.
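If you saved each scrape as its own file, the combining step can be scripted instead of done by copy and paste. Here’s a minimal Python sketch; the file pattern `scrape_*.csv`, the output name `master.csv`, and the row layout (shortname, rank, URL) are all assumptions, so adjust them to match your exports:

```python
import csv
import glob

def combine_scrapes(pattern, out_path):
    """Combine every per-keyword scrape CSV into one master file.
    Assumed row layout: set shortname, rank, URL."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if row and row[0].lower() != "set":  # skip any header rows
                    rows.append(row)
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows

# Example: combine_scrapes("scrape_*.csv", "master.csv")
```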
SEOmoz Keyword Difficulty & SERP Analysis Tool
In the long run, I think the SEOmoz tool is a lot easier and cleaner to use for this exercise. One nice thing about the Keyword Difficulty & SERP Analysis Tool is that you can run up to 5 keywords at a time, and you don’t have the cleanup work that OutWit requires. One difference between the two is that with OutWit you can dig as deep as your Google search settings allow, while with SEOmoz you get the top 25 results and that’s it.
Here are the steps to getting the same data with the SEOmoz Keyword Difficulty & SERP Analysis Tool:
1- Run a Report (up to 5 at a time)
Sometimes I’ve found that the tool will time out if you run 4 or 5 keywords, so if you’re having that problem just run 3 and you’ll have an easier time.
2- CSV Export
Once the report loads, click on it and then choose the “Export to CSV” link down towards the bottom. It’s above the table with all the pretty greens and reds.
3- Rinse and Repeat
The only columns we need are ‘Rank’ and ‘URL’. If you want to start getting into Domain and Page Authority comparisons you could use that data as well, but for this post I’m going to keep it simple.
Just like with the OutWit Hub data, make sure you’re combining all of your CSV downloads into one master file so you can do the filtering you’ll need to do.
Step 3 – Filter Down to Just Domain Names
In order to really do the comparison, you need to filter your SERP scrapes down to just the domain names. With a little Excel formula magic, this is easily done. Since there are so many different versions of Excel and other spreadsheet programs, I’m just going to give you the basic steps and formulas here so you can apply them in whatever program/version you’re using.
Before you do the steps below, make sure to MAKE A COPY OF ALL YOUR SCRAPED DATA. We’re going to filter down to just the domain names you’ve scraped, but that’s only so we have a list of unique domains that we can then do some counting and averages on. You have to leave your original data so you can get the counts. So I repeat, make a copy of all your scraped data and do the steps below on the copy.
- Use ‘Text to Columns’ and delimit on ‘/’. This is probably the easiest way to break out the http: and any other folders in the URLs you’ve scraped. Delete all of the columns that don’t have just the domain name.
- Get rid of www. Since some of your scraped URLs will have www and some won’t, we need to strip it. Sort your list of domain names alphabetically, then do another Text to Columns on the domains that have the www in them. The easiest way is the ‘Fixed Width’ option, since www. is always the same width. You may also have other subdomains in your list, but honestly I would just treat those as separate sites from the main domain.
- De-dupe. Now that you have your list of just domain names without the www and folders, de-dupe this list so that you have a list of unique domain names.
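All three of these cleanup steps can also be scripted if you prefer. Here’s a minimal Python sketch using the standard library: it drops the scheme and folders, strips www, leaves other subdomains as separate sites (matching the approach above), and de-dupes. The example URLs are made up:

```python
from urllib.parse import urlparse

def to_domain(url):
    """Reduce a scraped URL to its bare domain: no scheme, no folders, no www."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host

urls = [
    "http://www.example.com/widgets/blue",
    "http://example.com/",
    "http://blog.example.com/post",  # other subdomains stay separate sites
]
unique_domains = sorted(set(to_domain(u) for u in urls))
# -> ['blog.example.com', 'example.com']
```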
Step 4 – Count # of Results and Average Rank For All Unique Domains
Once again, for this part I’m just going to give you the steps rather than screenshots since it might vary a little bit from spreadsheet program to spreadsheet program.
To set up your spreadsheet matrix, you should have all of your unique domains down the left, and then across the top you’ll have a column for # Results and Avg Rank for each of your keyword set shortnames. Put these at the top of the two columns and merge over them if you want to make it a little prettier, and it will give you something to reference in your formulas.
Getting Number of Results Per Unique Domain
- This formula will vary a little bit based on how big your data set is and where your list of original domains is.
- For example, say your first unique domain is in cell B3, your full data set of URLs is in cells D120 to D293, the column of keyword set short names is in cells A120 to A293, and your first short name column header is in cell C1. Your formula would look like this: =COUNTIFS($D$120:$D$293,"*"&B3&"*",$A$120:$A$293,$C$1)
- Notice the absolute references for the cell ranges. This is critical, otherwise you won’t get the correct count.
- The "*"&B3&"*" is a wildcard that basically says match B3 with anything before or after it, so you’ll get www, non-www, the home page, and any other page for that domain name.
- If your formula looks good, copy it down to all of your unique domains.
- Repeat this process for all of your keyword sets.
What this number tells you is how many times that unique domain shows up in your scrapes for that keyword set. If a domain shows up a lot, then that set is something they’re going after, and the count can be even higher if they have multiple listings. If they don’t show up very much, then it isn’t an important set for them.
Getting the Average Rank Per Unique Domain
- This formula will also vary a little bit based on how big your data set is, etc.
- Let’s use the same example cells as above, but with your Rank data in cells B120 to B293. Your formula in Excel would look like this: =AVERAGEIFS($B$120:$B$293,$D$120:$D$293,"*"&B3&"*",$A$120:$A$293,$C$1)
- Again, notice the absolute references and make sure you have them in there.
- If your formula looks good, copy it down for all of your unique domains.
- Repeat this process for all of your keyword sets.
What this number tells you is the average rank for that domain name for that keyword set. Naturally, the lower the number the better they rank on average. The higher the number, the less of a threat they currently are, but it also shows that they are at least showing up for that set.
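The AVERAGEIFS side can be sketched the same way. Again this assumes (shortname, rank, URL) rows and made-up example data, with the substring check standing in for the wildcard match:

```python
def average_rank(domain, keyword_set, rows):
    """AVERAGEIFS equivalent: mean rank of rows in the set whose URL contains the domain."""
    ranks = [rank for s, rank, url in rows if s == keyword_set and domain in url]
    return sum(ranks) / len(ranks) if ranks else None  # None: domain absent from the set

data = [
    ("widgets", 1, "http://www.acme.com/widgets"),
    ("widgets", 2, "http://acme.com/blue-widgets"),
    ("widgets", 3, "http://example.com/"),
]
average_rank("acme.com", "widgets", data)  # -> 1.5
```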
Step 5 – Organize and Analyze
Once you have all of your formulas in place and you’re happy with what you see, I recommend copying your matrix and pasting the values back in. This way you can sort the data any way you want without messing up the formulas (I made this mistake once and it wasn’t pretty).
Now you have a really cool matrix that will show you by keyword set which sites are going after different sets, and how important each set is to them based on how often they show up and what their average ranking is.
Have some fun sorting by different columns and even highlighting the numbers and sites that stand out to you. Here’s a screenshot sample of a matrix I did once to help.
What To Do With This Info
One of the problems with competitive analysis is that site owners and marketers only look at the companies they know, the major players in their space. Well, with this technique you will also see the affiliate sites that are competing that you may have overlooked, how big of a player sites like Wikipedia are in your space, etc.
If you run this every couple of months, you can also see the changes that are happening in the SERPs and better keep an eye on those sites that are becoming more of a threat.
As you identify new competitors, you also now have another site to analyze for marketing ideas, competitive links, etc.
Let Me Know What You Think
I really hope that this has been a helpful post for you to learn a technique to identify more of the true competitors in your space, and then what to do with that information. I’m sure that there are other ways to get this information, and if you have any additional tips please share them in the comments below.