Friday 27 February 2009

PHP Google Blog Search URL Scraper

Sometimes you just need a crapload of URLs from WordPress blogs. It’s nobody’s business why you need them; if you need them, you need them.
Enter DaPimp’s Google Blogsearch URL scraper.
In a nutshell, this script grabs the first 1,000 Google Blogsearch results from WordPress blogs for a given keyword, and spits them out in a nice numbered list for you.
**You’ll need PHP5, and a server with cURL enabled for this script to work**
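If you're not sure whether your server meets those requirements, a quick check along these lines can tell you before you upload anything. This is only a sketch: missing_requirements() is a made-up helper name, not part of the scraper, and the SimpleXML check is an extra that the note above doesn't mention (the script does use SimpleXMLElement, though).

```php
<?php
// Hypothetical helper (not part of the original script): lists whichever
// requirements are missing on the server this file runs on.
function missing_requirements()
{
    $missing = array();
    // The post asks for PHP5
    if (version_compare(PHP_VERSION, '5.0.0', '<')) {
        $missing[] = 'PHP 5 or later';
    }
    // curl_init() only exists when the cURL extension is loaded
    if (!function_exists('curl_init')) {
        $missing[] = 'the cURL extension';
    }
    // Assumption: also check SimpleXML, since the script parses the feed with it
    if (!class_exists('SimpleXMLElement')) {
        $missing[] = 'the SimpleXML extension';
    }
    return $missing;
}

$missing = missing_requirements();
echo $missing ? 'Missing: ' . implode(', ', $missing) : 'Ready to go';
echo "\n";
```

Upload it as its own .php file and open it in your browser, the same way you would run the scraper.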
Instructions for use:
1. Either download the script here (change the file extension to .php), or copy and paste the code below. WordPress mangles the quote marks in code, so pasted code will probably need its quotes replaced by hand; downloading the script is much easier.
2. Open the script in a text editor, and change the $keyword variable at the top to the keyword you want to search for.
3. Save the script and upload it to your server.
4. Navigate to the script in your browser, and wait. The script pauses for 30 seconds after each of its 10 requests, so expect the full list to take at least five minutes.
============ Start PHP Script ================
<?php
//give the script a keyword to search for
$keyword = "ipod touch";
//URL-encode the keyword (spaces become "+") so it is safe to drop into the query string
$keyword = urlencode($keyword);
//start a counter so we can number our results
$num = 0;
//set a start for our paging of Google Blogsearch (we're going to be getting 10 pages X 100 results)
$start = 0;
do {
    //Create the feed URL we're going to get from Google Blogsearch
    //(the keyword and "powered by wordpress" are each wrapped in %22, so both are searched as quoted phrases)
    $feed = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q=%22' .$keyword. '%22+%22powered+by+wordpress%22&ie=utf-8&num=100&start=' .$start. '&output=rss';
    //We're using cURL to actually go fetch the page from Google Blogsearch
    $ch = curl_init($feed);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $page = curl_exec($ch);
    curl_close($ch);
    //Stop if the fetch failed, otherwise SimpleXMLElement will choke on it
    if ($page === false || $page === '') {
        break;
    }
    //Loop through the feed, and suck out the URLs
    $xml = new SimpleXMLElement($page);
    foreach ($xml->channel->item as $item) {
        //Add 1 to our counter, so our list has numbers next to the URLs
        $num = $num + 1;
        $link = $item->link;
        //Print our shit to the page, one URL per line
        echo $num. ' - ' .$link. "<br />\n";
    }
    //Have a rest so we don't get banned for hitting Google too hard and fast
    sleep(30);
    //Add 100 to the start, so we can fetch the next 100 results
    $start = $start + 100;
}
//Keep doing this shit until we get to page 10 of the Google results
while ($start < 1000);
?>
============ End PHP Script ================
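
If you'd rather poke at the two interesting bits of the script (building the feed URL and pulling the links out of the RSS) without hammering Google, they can be split into small functions and tried on a hand-written feed. This is a rough sketch, not the script itself: build_feed_url() and extract_links() are made-up names, the sample feed and its example.com URLs are invented, and using simplexml_load_string() instead of new SimpleXMLElement() is a swap made here so that bad XML returns an empty list instead of throwing.

```php
<?php
// Hypothetical helpers, not from the original script.

// Build the feed URL for one page of results. urlencode() turns spaces
// into "+" and escapes characters such as "&" that would break the query.
function build_feed_url($keyword, $start)
{
    // Both the keyword and the WordPress footer text are quoted phrases
    $query = '"' . $keyword . '" "powered by wordpress"';
    return 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q='
        . urlencode($query)
        . '&ie=utf-8&num=100&start=' . (int) $start . '&output=rss';
}

// Pull every <item><link> out of an RSS string. Returns an empty array
// if the string is not valid XML.
function extract_links($rss)
{
    // Collect XML parse errors quietly instead of printing warnings
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($rss);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    if ($xml === false || !isset($xml->channel->item)) {
        return $links;
    }
    foreach ($xml->channel->item as $item) {
        $links[] = trim((string) $item->link);
    }
    return $links;
}

// Try it on a tiny hand-written feed (made-up URLs)
$sample = '<rss version="2.0"><channel>'
    . '<item><link>http://example.com/a</link></item>'
    . '<item><link>http://example.com/b</link></item>'
    . '</channel></rss>';
print_r(extract_links($sample));
echo build_feed_url('ipod touch', 100), "\n";
```

Keeping the parsing in its own function means you can test it against a saved copy of a feed as often as you like, and only the cURL fetch ever touches Google.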
