PHP Web Scraping



Goutte is a screen scraping and web scraping library for PHP. Web scraping means making an HTTP(S) request to a server and pulling down data. A good web scraping library will also allow large binary data blobs to be written directly to disk as they come down off the network instead of loading the whole thing into RAM. The ability to do dynamic form extraction and submission is also very handy. A really good library will let you fine-tune every aspect of each request.


In this post, I’ll explain how to do a simple web page extraction in PHP using cURL, the ‘Client URL library’.
cURL is built on libcurl, a library that lets you connect to servers over many different protocols, including HTTP and HTTPS. Fetching data this way handles headers, cookies, and errors more robustly than a plain file_get_contents() call. If cURL is not installed, consult the installation instructions for Windows or for Linux.


Setting Up cURL

First, we need to initiate the cURL handle:
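A minimal sketch of this step, assuming a placeholder target URL (https://example.com/ is an assumption, not the page scraped in the original post):

```php
<?php
// Hypothetical target URL; substitute the page you want to scrape.
$url = 'https://example.com/';

// curl_init() returns a cURL handle, or false if it could not be created.
$curl = curl_init($url);

if ($curl === false) {
    exit('Could not initialise cURL.');
}
```

No request is sent yet; the handle only stores the URL and the options set on it later.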


Then, set CURLOPT_RETURNTRANSFER to TRUE so that the transferred page is returned as a string rather than output directly:
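As a sketch, again with a placeholder URL that is not from the original post:

```php
<?php
// Placeholder URL (assumption); the handle is created as in the previous step.
$curl = curl_init('https://example.com/');

// With CURLOPT_RETURNTRANSFER enabled, curl_exec() returns the page
// as a string instead of printing it. curl_setopt() returns true on success.
$ok = curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
```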

Executing the Request & Checking for Errors

Now, start the request and perform an error check:
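One way this step might look, wrapped in a helper so it can be reused. The function name fetchPage is hypothetical and chosen only for illustration:

```php
<?php
// Hypothetical helper: runs the request and reports cURL errors.
function fetchPage(string $url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // curl_exec() returns the page as a string, or false on failure.
    $page = curl_exec($curl);

    // curl_errno() is non-zero when the transfer failed;
    // curl_error() gives the matching human-readable message.
    if (curl_errno($curl)) {
        echo 'Scraper error: ' . curl_error($curl);
        return false;
    }

    return $page;
}
```

Calling fetchPage() with the URL of the target page returns its HTML as a string, or false after printing the error.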

Closing the Connection

To close the connection, type the following:
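A short sketch with a placeholder URL. Note that since PHP 8.0 curl_close() has no effect, because the handle is an object that is released automatically; the call matters on older PHP versions.

```php
<?php
// Placeholder URL (assumption).
$curl = curl_init('https://example.com/');

// ... set options and run the request here ...

// Release the handle once the page content has been read.
curl_close($curl);
```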

Extracting Only the Needed Part and Printing It

After we have the page content, we may extract only the needed code snippet, under id="case_textlist":
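The sketch below shows one possible way to do this with PHP's DOM extension and an XPath query. It is not necessarily the technique the original listing used, and the function name extractById and the sample HTML are made up for illustration:

```php
<?php
// Hypothetical helper: returns the outer HTML of the element with the
// given id, or null if there is no such element.
function extractById(string $page, string $id): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $id contains no double quotes.
    $nodes = (new DOMXPath($doc))->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $doc->saveHTML($nodes->item(0)) ?: null;
}

// Made-up sample page standing in for the downloaded content.
$sample = '<html><body><div id="case_textlist"><p>First case</p></div>'
        . '<p>Other text</p></body></html>';

echo extractById($sample, 'case_textlist');
```

Parsing the page is usually more tolerant of layout changes than cutting the string at fixed positions, at the cost of a little more code.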

The Whole Scraper Listing
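Putting the steps together, a complete scraper could look roughly like this. It is a sketch under assumptions: the function name scrapeElement is hypothetical, the extraction uses the DOM extension rather than whatever the original listing used, and the URL in the usage comment is a placeholder.

```php
<?php
// Hypothetical function: downloads $url and returns the outer HTML of
// the element with the given id, or null on any failure.
function scrapeElement(string $url, string $id = 'case_textlist'): ?string
{
    // Set up cURL and ask for the page as a string.
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // Execute the request and remember any error before closing.
    $page = curl_exec($curl);
    $failed = curl_errno($curl) !== 0;
    $error = curl_error($curl);
    curl_close($curl); // no effect since PHP 8.0

    if ($failed || !is_string($page) || $page === '') {
        error_log('Scraper error: ' . $error);
        return null;
    }

    // Parse the page and pick out only the needed element.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $id contains no double quotes.
    $nodes = (new DOMXPath($doc))->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $doc->saveHTML($nodes->item(0)) ?: null;
}

// Usage (placeholder URL):
// echo scrapeElement('https://example.com/') ?? 'Nothing found.';
```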


This sample will guide you and give you further practice in daily web scraping.

As PHP programmers, we often need to get data from another website. Getting data from other websites is known as web scraping. Scraping website data is not an easy task, as it poses many challenges.

So if you’re looking for a solution to scrape data, you’re in the right place. In this tutorial you will learn how to scrape data from a website using PHP.

The tutorial is explained in easy steps, with a live demo and downloadable demo source code.


So let’s start coding. We will use the following file structure for this data scraping tutorial:

  • index.php
  • scrape.php

Step 1: Create a Form to Enter the Website URL
Since this tutorial is built around a demo, we first create a form in index.php with a submit button to enter the URL of the website to scrape.
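A minimal sketch of such a form for index.php. The field names website_url and submit match the $_POST keys read in Step 3; the labels, the input type, and the $form variable are assumptions made for illustration:

```php
<?php
// index.php: the form posts back to the same page.
// Field names match the $_POST keys used by the scraping step.
$form = <<<'HTML'
<form method="post" action="">
    <label for="website_url">Website URL</label>
    <input type="url" id="website_url" name="website_url" required>
    <input type="submit" name="submit" value="Scrape">
</form>
HTML;

echo $form;
```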


Step 2: Create a PHP Function to Get Website Data
Now we will create a PHP function, scrapeWebsiteData, in scrape.php to get website data using the PHP cURL library, which lets you connect to and communicate with many different types of servers over many different protocols.
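A sketch of what scrapeWebsiteData might look like, based on the functions and options the tutorial names. The exact body and the error message are assumptions:

```php
<?php
// scrape.php: fetches a page with cURL and returns it as a string,
// or false if the request fails.
function scrapeWebsiteData($website_url)
{
    // Stop early if the cURL extension is not available.
    if (!function_exists('curl_init')) {
        die('cURL is not installed. Please install it and try again.');
    }

    $curl = curl_init();

    // The URL to scrape.
    curl_setopt($curl, CURLOPT_URL, $website_url);
    // Return the page as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    $result = curl_exec($curl);
    curl_close($curl); // no effect since PHP 8.0

    return $result;
}
```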

In the above function, we first check whether PHP cURL is installed. We use three cURL functions: curl_init() initializes the session, curl_exec() executes it, and curl_close() closes the connection. The CURLOPT_URL option sets the URL of the website we are scraping. The second option, CURLOPT_RETURNTRANSFER, tells cURL to store the scraped page in a variable rather than follow its default behaviour, which is to simply display the entire page as it is.

Step 3: Scrape Particular Data from the Website
Finally, we scrape a particular section of the page. Usually we don’t want all the data on a page, just one section of it. In this example, we look for the latest posts at PHPZAG.COM. To do this, we pass the marker where the section starts and the marker where it ends. We use strpos() to find these two points and substr() to cut out that section of the scraped page.

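if (isset($_POST['submit'])) {
    // Fetch the page the user asked for (false if the request failed).
    $html = scrapeWebsiteData($_POST['website_url']);
    // Find the heading that opens the section, then the next closing </div>.
    $start_point = is_string($html) ? strpos($html, '<h3>Latest Posts</h3>') : false;
    $end_point = ($start_point !== false) ? strpos($html, '</div>', $start_point) : false;
    if ($end_point !== false) {
        $length = $end_point - $start_point;
        echo substr($html, $start_point, $length);
    } else {
        echo 'The requested section was not found.';
    }
}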

Now we have a list of the latest posts from PHPZAG.COM. This is a really simple example of getting a particular section of a page. You can go further and get useful data from websites according to your requirements. For example, you can scrape data from eCommerce websites to get product details, prices, etc. The point is, once the website data is in your hands, you can do whatever you want with it.

