- 论坛徽章:
- 0
|
Using
Python
’s pycurl (
cURL
) and re (
Regular Expression
) libraries, it’s possible to write a script that will check the Google ranking of a specific domain for a specific search term.
To check for and install Python 2.4 and the py-curl library on Mac OS X:
Follow these instructions to install MacPorts
if it hasn’t been installed yet, then open a new Terminal window and enter the following command to see a listing of all installed ports:
sudo port installed
If ‘python24‘ and ‘py-curl‘ are not listed amongst the installed ports, install them by entering:
sudo port install python24
sudo port install py-curl
To check for and install Python 2.4 and the pycurl library on
Ubuntu
Linux:
Open a new Terminal window and enter the following command to install Python and the pycurl library (you’ll be notified if they’ve already been installed):
sudo apt-get install python2.4
sudo apt-get install python-pycurl
To run the rankcheck.py script:
Download Geekology’s version of this script here
, or copy the code below to create your own rankcheck.py script file:
#!/usr/bin/python
"""
This script accepts Domain, Search String and Google Locale arguments, then returns
which Search String results page for the Google Locale the Domain appears on.
Usage example:
rankcheck {domain} {searchstring} {locale}
Output example:
rankcheck geekology.co.za 'bash scripting' .co.za
- The domain 'geekology.co.za' is listed in position 2 (page 1) for the search 'bash+scripting' on google.co.za
"""
__author__ = "Willem van Zyl (willem@geekology.co.za)"
__version__ = "$Revision: 1.5 $"
__date__ = "$Date: 2009/02/10 12:10:24 $"
__license__ = "GPLv3"
import sys, pycurl, re
# check if all arguments were specified and whether help was requested:
if len(sys.argv) 4:
if len(sys.argv) == 1:
print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE";
print "`rankcheck --help' for more information."
sys.exit()
elif sys.argv[1] == '--help':
print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
print "Check the Search String page ranking of a Domain on a specific Google Locale"
print "\nExample: rankcheck geekology.co.za 'bash scripting' .co.za"
print "\nReport bugs to ."
sys.exit()
else:
print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE";
print "`rankcheck --help' for more information."
sys.exit()
# some initial setup:
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
FIND_DOMAIN = sys.argv[1]
SEARCH_STRING = sys.argv[2].replace(' ', '+')
LOCALE = sys.argv[3]
# check if the locale is valid:
if sys.argv[3] == '.co.za':
SEARCH_COUNTRY = '&meta=cr%3DcountryZA'
elif sys.argv[3] == '.co.uk':
SEARCH_COUNTRY = '&meta=cr%3DcountryUK'
elif sys.argv[3] == '.com':
SEARCH_COUNTRY = ''
else:
print "Only the '.com', '.co.uk' and '.co.za' locales are allowed."
sys.exit()
ENGINE_URL = 'http://www.google' + LOCALE + '/search?q=' + SEARCH_STRING + SEARCH_COUNTRY
# define class to store result:
class RankCheck:
def __init__(self):
self.contents = ''
def body_callback(self, buf):
self.contents = self.contents + buf
# instantiate curl and result objects:
rankRequest = pycurl.Curl()
rankCheck = RankCheck();
# set up curl:
rankRequest.setopt(pycurl.USERAGENT, USER_AGENT)
rankRequest.setopt(pycurl.FOLLOWLOCATION, 1)
rankRequest.setopt(pycurl.AUTOREFERER, 1)
rankRequest.setopt(pycurl.WRITEFUNCTION, rankCheck.body_callback)
rankRequest.setopt(pycurl.COOKIEFILE, '')
rankRequest.setopt(pycurl.HTTPGET, 1)
rankRequest.setopt(pycurl.REFERER, '')
# run curl:
for i in range(0, 10):
rankRequest.setopt(pycurl.URL, ENGINE_URL + '&start=' + str(i * 10))
rankRequest.perform()
# close curl:
rankRequest.close()
# collect the search results
html = rankCheck.contents
counter = 0
result = 0
url=unicode(r'(\w\d:#@%/;$()~_?\+-=\\\.&]*)')
for google_result in re.finditer(url, html):
# print m.group()
this_url = google_result.group()
this_url = this_url[21:]
counter += 1
google_url_regex = re.compile("((https?):((//))+([\w\d:#@%/;$()~_?\+-=\\\.&])*" + FIND_DOMAIN + "+([\w\d:#@%/;$()~_?\+-=\\\.&])*)")
google_url_regex_result = google_url_regex.match(this_url)
if google_url_regex_result:
result = counter
break
# show results
if result == 0:
print " - The domain '" + FIND_DOMAIN + "' wasn't listed in the first 10 pages for the search '" + SEARCH_STRING + "' on google" + LOCALE
else:
print " - The domain '" + FIND_DOMAIN + "' is listed in position " + str(result) + " (page " + str((result / 10) + 1) + ") for the search '" + SEARCH_STRING + "' on google" + LOCALE
Open a new Terminal window and navigate to the folder containing the script, then execute it by entering:
python ./rankcheck.py {domain} '{search string}' {locale}
… filling in the Domain, Search String and Locale that you want to check.
Because the Python script file starts with ‘#!/usr/bin/python‘, you’ll be able to execute it from the command line without invoking the python executeable if you
set execute permissions
on the file:
sudo chmod 744 rankcheck.py
./rankcheck.py {domain} '{search string}' {locale}
本文来自ChinaUnix博客,如果查看原文请点:http://blog.chinaunix.net/u/78/showart_1891400.html |
|