How Do SEO Experts Analyze Competitors?
Yes, today I'll be shameless for once and call myself an SEO expert.
By scraping keyword libraries from webmaster tools and Baidu Fengchao's Keyword Planner, we can gather a large batch of keywords. How do we get them ranking? Directly imitating (read: copying) competitors with strong SEO performance is the time- and effort-saving approach. But how do you imitate the right competitors? Copy the wrong ones and it ends in tears. So the problem becomes finding the competitors whose SEO genuinely performs well.
Searching a few keywords will turn up some top-ranking sites in the industry, but the sample is small, so the bias can be large: a site that ranks well for a handful of keywords isn't necessarily a leader across the whole industry. We need a bigger sample, and ideally one we can quantify.
For example: scrape the landing pages that rank on page one for 10,000 keywords and record their domains, assign each ranking position a score, then total the scores per domain and sort from high to low. Obviously, the higher a site's total score, the better its overall SEO performance.
Hold on: a high total score doesn't by itself mean stronger ranking ability. A site with somewhat weaker ranking ability can still rack up a high total simply by covering more keywords. So we also count how many keywords each domain ranks on page one for, and divide the total score by that count to represent ranking ability. The sites worth studying are those with a high total score, many page-one keywords, and strong ranking ability.
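To make the two metrics concrete, here is a toy example in Python. The domains are invented for illustration; the position scores are the first three values from the score table in run.py below.

# Hypothetical (domain, rank) observations and per-position scores
scores = {1: 2.856, 2: 1.923, 3: 1.020}
pages = [('site-a.com', 1), ('site-a.com', 3), ('site-b.com', 2)]

totals, counts = {}, {}
for domain, rank in pages:
    totals[domain] = totals.get(domain, 0) + scores[rank]
    counts[domain] = counts.get(domain, 0) + 1

for domain, total in totals.items():
    # ranking ability = total score / number of page-one keywords
    print(domain, round(total, 3), round(total / counts[domain], 3))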
Obviously, doing these searches and tallies by hand is unrealistic, so here we use the Scrapy crawler framework together with pandas, the data-analysis workhorse. Scrapy fetches the search results and extracts the keyword, domain, landing page, and rank; pandas handles the statistics. The example targets mobile search, since that's where the traffic is nowadays.
You'll need scrapy, pandas, and pyquery installed (e.g. via pip). Sample code follows:
The spider file:
import scrapy, urllib.parse, re, json
from mobilerank.items import MobilerankItem
from pyquery import PyQuery as pq

def search(req, html):
    # Return the first regex capture group, or 404 if there is no match
    text = re.search(req, html)
    data = text.group(1) if text else 404
    return data

# URLs already crawled in a previous run (empty on the first run)
try:
    finish = [line.rstrip() for line in open('finished.txt')]
except FileNotFoundError:
    finish = []

allurls = ['https://m.baidu.com/s?word=%s&' % urllib.parse.quote(kw.rstrip())
           for kw in open('keywords.txt', encoding='utf-8-sig')]
not_grap = [url for url in allurls if url not in finish]  # after an interruption, keep only the URLs not yet crawled
print(len(not_grap))

class seoSpider(scrapy.spiders.Spider):
    name = "mobilerank"
    start_urls = not_grap

    def parse(self, response):
        link = response.url
        if link not in finish:
            # Record the URL so an interrupted crawl can resume where it left off
            with open('finished.txt', 'a+') as f:
                f.write(link + '\n')
        doc = pq(response.text)
        for div in doc('.result').items():
            item = MobilerankItem()
            m = search('word=(.*?)&', link)
            kw = urllib.parse.unquote(m) if m != 404 else 404
            datalog = div.attr('data-log')
            if datalog:
                # The data-log attribute is JSON-like but single-quoted
                data = json.loads(datalog.replace('\'', '"'))
                url = data.get('mu', '404.html')   # 'mu' holds the real landing-page URL
                item['kw'] = kw
                item['domain'] = urllib.parse.urlparse(url).netloc
                item['url'] = url
                item['rank'] = int(data['order'])  # 'order' is the position in the SERP
                yield item
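Each mobile result carries a data-log attribute whose payload uses single quotes, so the parse method above swaps them for double quotes before handing the string to json.loads. A simplified, hypothetical payload, showing only the mu and order fields the spider actually uses:

import json

# Hypothetical data-log value as found on a result div (simplified)
datalog = "{'mu': 'https://www.example.com/page', 'order': '3'}"
data = json.loads(datalog.replace('\'', '"'))  # single quotes -> valid JSON
print(data.get('mu', '404.html'), int(data['order']))  # landing URL and its position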
The settings.py file:
# -*- coding: utf-8 -*-
BOT_NAME = 'mobilerank'
SPIDER_MODULES = ['mobilerank.spiders']
NEWSPIDER_MODULE = 'mobilerank.spiders'
# An iPhone user agent so Baidu serves the mobile SERP
USER_AGENT = 'Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_2 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A4449d Safari/9537.53'
CONCURRENT_REQUESTS = 64
# Column order of the exported CSV
FEED_EXPORT_FIELDS = ['kw', 'domain', 'url', 'rank']
FEED_EXPORT_ENCODING = 'utf-8'
The items.py file:
# -*- coding: utf-8 -*-
import scrapy

class MobilerankItem(scrapy.Item):
    kw = scrapy.Field()      # keyword
    domain = scrapy.Field()  # landing-page domain
    url = scrapy.Field()     # landing-page URL
    rank = scrapy.Field()    # position in the SERP (1-10)
The run.py file:
# -*- coding: utf-8 -*-
from subprocess import Popen
import time, datetime

print(datetime.datetime.now())
start = time.time()

# Launch the spider as a subprocess (an argument list works across platforms)
spider = Popen(['scrapy', 'crawl', 'mobilerank', '-o', 'new_ranks.csv'])
spider.wait()  # wait for the spider process to exit before crunching the data

print('Starting data processing')

# Score assigned to each ranking position, 1 through 10
rank_values = [2.856, 1.923, 1.020, 0.814, 0.750, 0.572, 0.401, 0.441, 0.553, 0.670]

def num(pos):
    # Map a ranking position to its score
    return rank_values[int(pos) - 1]

import pandas as pd
df = pd.read_csv('new_ranks.csv', encoding='utf-8', on_bad_lines='skip')  # skip malformed rows
df = df[df['rank'] < 11].drop(columns='kw')
df['分值'] = df['rank'].apply(num)  # 分值 = score
df['个数'] = 1                      # 个数 = count
datas = df.groupby('domain').agg({'分值': 'sum', '个数': 'sum'}).sort_values(by='分值', ascending=False)
datas['排名能力'] = datas['分值'] / datas['个数']  # 排名能力 = ranking ability
datas['排名能力'] = datas['排名能力'].round(2)     # keep two decimals
print(datas.reset_index().head(60))
datas.to_excel('排名能力分析.xlsx')  # "ranking ability analysis"
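Note that to_excel needs an Excel writer such as openpyxl installed. As a quick sanity check after a run, you can read the workbook back; a minimal sketch:

import pandas as pd

# Read the exported workbook back; the first column is the domain index
result = pd.read_excel('排名能力分析.xlsx', index_col=0)
print(result.head(10))  # top 10 domains by total score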
Save your keywords in keywords.txt, one per line, then run run.py. When it finishes, the final results are written to an Excel file where each competitor's standing is clear at a glance.
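For completeness, a minimal end-to-end run looks like this (the two keywords are placeholders):

# keywords.txt -- one keyword per line (placeholder examples)
装修公司哪家好
装修报价明细

# then, from the Scrapy project root:
python run.py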