A Python script to check Google rankings for a spe

Posted on 2009-04-06 17:15

Using Python’s pycurl (cURL) and re (Regular Expression) libraries, it’s possible to write a script that will check the Google ranking of a specific domain for a specific search term.
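The core of the approach can be sketched without any network access: given result URLs in the order they are listed, find the first one whose host belongs to the domain. This is only a minimal sketch, not the script's own code. The function name find_position, the sample URLs and the ten-results-per-page default are illustrative assumptions, and it is written for current Python rather than 2.4.

```python
import re

def find_position(result_urls, domain, per_page=10):
    """Return (position, page) of the first URL on `domain`, or None.

    Hypothetical helper: positions are 1-based and `per_page` assumes
    ten results per page.
    """
    # re.escape keeps the dots in the domain from matching any character;
    # the optional repeated group allows subdomains such as "www.".
    pattern = re.compile(
        r"https?://([\w-]+\.)*" + re.escape(domain) + r"(?=[/:?#]|$)"
    )
    for position, url in enumerate(result_urls, 1):
        if pattern.match(url):
            page = (position - 1) // per_page + 1
            return position, page
    return None


# made-up sample data, in result order:
sample = [
    "http://example.com/bash",
    "http://www.geekology.co.za/blog/",
    "http://example.org/",
]
print(find_position(sample, "geekology.co.za"))
```

Escaping the domain matters because an unescaped '.' in a regular expression matches any character, so a pattern built from the raw domain could match unrelated hosts.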
To check for and install Python 2.4 and the py-curl library on Mac OS X:
Follow these instructions to install MacPorts if it hasn’t been installed yet, then open a new Terminal window and enter the following command to see a listing of all installed ports:
sudo port installed
If ‘python24’ and ‘py-curl’ are not listed amongst the installed ports, install them by entering:
sudo port install python24
sudo port install py-curl
To check for and install Python 2.4 and the pycurl library on Ubuntu Linux:
Open a new Terminal window and enter the following command to install Python and the pycurl library (you’ll be notified if they’ve already been installed):
sudo apt-get install python2.4
sudo apt-get install python-pycurl
To run the rankcheck.py script:
Download Geekology’s version of this script here, or copy the code below to create your own rankcheck.py script file:
#!/usr/bin/python

"""

This script accepts Domain, Search String and Google Locale arguments, then returns
which Search String results page for the Google Locale the Domain appears on.


Usage example:

  rankcheck {domain} {searchstring} {locale}


Output example:

  rankcheck geekology.co.za 'bash scripting' .co.za
   - The domain 'geekology.co.za' is listed in position 2 (page 1) for the search 'bash+scripting' on google.co.za

"""

__author__    = "Willem van Zyl (willem@geekology.co.za)"
__version__   = "$Revision: 1.5 $"
__date__      = "$Date: 2009/02/10 12:10:24 $"
__license__   = "GPLv3"

import sys, pycurl, re

# check if all arguments were specified and whether help was requested:
if len(sys.argv) < 4:
  if len(sys.argv) == 1:
    print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
    print "`rankcheck --help' for more information."
    sys.exit()
  elif sys.argv[1] == '--help':
    print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
    print "Check the Search String page ranking of a Domain on a specific Google Locale"
    print "\nExample: rankcheck geekology.co.za 'bash scripting' .co.za"
    print "\nReport bugs to ."
    sys.exit()
  else:
    print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
    print "`rankcheck --help' for more information."
    sys.exit()


# some initial setup:
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
FIND_DOMAIN = sys.argv[1]
SEARCH_STRING = sys.argv[2].replace(' ', '+')
LOCALE = sys.argv[3]

# check if the locale is valid:
if sys.argv[3] == '.co.za':
  SEARCH_COUNTRY = '&meta=cr%3DcountryZA'
elif sys.argv[3] == '.co.uk':
  SEARCH_COUNTRY = '&meta=cr%3DcountryUK'
elif sys.argv[3] == '.com':
  SEARCH_COUNTRY = ''
else:
  print "Only the '.com', '.co.uk' and '.co.za' locales are allowed."
  sys.exit()

ENGINE_URL = 'http://www.google' + LOCALE + '/search?q=' + SEARCH_STRING + SEARCH_COUNTRY


# define class to store result:
class RankCheck:
  def __init__(self):
    self.contents = ''

  def body_callback(self, buf):
    self.contents = self.contents + buf


# instantiate curl and result objects:
rankRequest = pycurl.Curl()
rankCheck = RankCheck()


# set up curl:
rankRequest.setopt(pycurl.USERAGENT, USER_AGENT)
rankRequest.setopt(pycurl.FOLLOWLOCATION, 1)
rankRequest.setopt(pycurl.AUTOREFERER, 1)
rankRequest.setopt(pycurl.WRITEFUNCTION, rankCheck.body_callback)
rankRequest.setopt(pycurl.COOKIEFILE, '')
rankRequest.setopt(pycurl.HTTPGET, 1)
rankRequest.setopt(pycurl.REFERER, '')

# run curl:
for i in range(0, 10):
  rankRequest.setopt(pycurl.URL, ENGINE_URL + '&start=' + str(i * 10))
  rankRequest.perform()

# close curl:
rankRequest.close()


# collect the search results
html = rankCheck.contents
counter = 0
result = 0

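# match each result link. NOTE: the '<h3 class=r><a href="' prefix is an
# assumed reconstruction (part of the original pattern was lost); it was
# chosen because it is exactly the 21 characters removed by this_url[21:] below:
url = unicode(r'(<h3 class=r><a href=")((https?):((//))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)')

# compile the domain pattern once, escaping the dots in the domain name:
google_url_regex = re.compile(r"((https?):((//))+([\w\d:#@%/;$()~_?\+-=\\\.&])*" + re.escape(FIND_DOMAIN) + r"([\w\d:#@%/;$()~_?\+-=\\\.&])*)")

for google_result in re.finditer(url, html):
  # strip the 21-character HTML prefix so only the URL remains:
  this_url = google_result.group()
  this_url = this_url[21:]
  counter += 1

  google_url_regex_result = google_url_regex.match(this_url)
  if google_url_regex_result:
    result = counter
    break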


# show results
if result == 0:
  print " - The domain '" + FIND_DOMAIN + "' wasn't listed in the first 10 pages for the search '" + SEARCH_STRING + "' on google" + LOCALE
else:
  print " - The domain '" + FIND_DOMAIN + "' is listed in position " + str(result) + " (page " + str(((result - 1) / 10) + 1) + ") for the search '" + SEARCH_STRING + "' on google" + LOCALE
Open a new Terminal window and navigate to the folder containing the script, then execute it by entering:
python ./rankcheck.py {domain} '{search string}' {locale}
… filling in the Domain, Search String and Locale that you want to check.
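The script turns spaces into '+' by hand, which leaves other characters (such as '&' or '#') unescaped in the query string. As a hedged sketch of how the page URLs could be built with proper escaping: build_search_urls is a hypothetical helper, not part of rankcheck.py, it relies on Python 3's urllib.parse, and the URL layout simply mirrors the ENGINE_URL and '&start=' pattern in the script above (the country restriction parameter is left out for brevity).

```python
from urllib.parse import quote_plus

def build_search_urls(search_string, locale, pages=10, per_page=10):
    """Return one search URL per results page (hypothetical helper)."""
    # quote_plus turns spaces into '+' and percent-encodes reserved characters
    query = quote_plus(search_string)
    base = "http://www.google" + locale + "/search?q=" + query
    # page 0 starts at result 0, page 1 at result 10, and so on
    return [base + "&start=" + str(page * per_page) for page in range(pages)]


urls = build_search_urls("bash scripting & awk", ".co.za", pages=2)
print(urls[0])
print(urls[1])
```

Here the '&' inside the search string becomes '%26', so it cannot be mistaken for the separator in front of 'start='.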
Because the Python script file starts with ‘#!/usr/bin/python’, you’ll be able to execute it from the command line without invoking the python executable if you set execute permissions on the file:
sudo chmod 744 rankcheck.py

./rankcheck.py {domain} '{search string}' {locale}
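Mode 744 gives the file's owner read, write and execute permission and everyone else read-only access. For readers who would rather set it from Python, a small sketch follows. make_executable is a hypothetical helper name, the demonstration runs on a throwaway temporary file, and the 0o744 literal needs Python 2.6 or later (Python 2.4 would write 0744).

```python
import os
import stat
import tempfile

def make_executable(path):
    """Set mode 744 (rwxr--r--) on `path` and return the resulting mode bits."""
    os.chmod(path, 0o744)
    # S_IMODE strips the file-type bits, leaving only the permission bits
    return stat.S_IMODE(os.stat(path).st_mode)


# demonstrate on a throwaway file rather than a real script:
handle, demo_path = tempfile.mkstemp(suffix=".py")
os.close(handle)
print(oct(make_executable(demo_path)))
os.remove(demo_path)
```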


This article is from the ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u/78/showart_1891400.html