Several ways to fetch web pages with PHP

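I have recently been building a joke platform with a web version and an installable version. Since I had no joke content, I wrote a back-end program in PHP that fetches data from the major joke sites on a schedule every day. Below I have collected some basic ways to fetch web page content with PHP.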

I. Main ways to fetch a page with PHP:

1. The file() function 2. The file_get_contents() function 3. The fopen()->fread()->fclose() pattern 4. cURL 5. The fsockopen() function (socket mode) 6. Third-party libraries.

II. Main ways to parse HTML or XML with PHP:

1. Regular expressions 2. The PHP DOMDocument object 3. Third-party libraries

(for example, PHP Simple HTML DOM Parser)

If you already know all of the above well, you can skip the rest.

Fetching pages with PHP

1. The file() function

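<?php
$url=''; // URL omitted in the original; set it to the page to fetch
$lines_array=file($url); // one array element per line; false on failure
if ($lines_array === false) {
    die('Failed to fetch ' . $url);
}
$lines_string=implode('',$lines_array);
echo htmlspecialchars($lines_string);
?>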

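2. The file_get_contents() function

Using file_get_contents and fopen on remote URLs requires that the hosting environment has allow_url_fopen enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.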

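<?php
$url=''; // URL omitted in the original; set it to the page to fetch
$lines_string=file_get_contents($url); // whole page as one string; false on failure
if ($lines_string === false) {
    die('Failed to fetch ' . $url);
}
echo htmlspecialchars($lines_string);
?>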
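The allow_url_fopen requirement above can also be checked at run time before fetching. The following is a minimal sketch, not the author's code: fetch_page() is a hypothetical helper name, and the 5-second default timeout is an assumption chosen for illustration.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL with a
// timeout and return null instead of emitting a warning on failure.
function fetch_page(string $url, int $timeout = 5): ?string
{
    // The stream wrappers can only open remote URLs when allow_url_fopen is on.
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }
    // The "http" context options apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeout],
    ]);
    // @ suppresses the warning; failure is signalled by the false return value.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

A caller would then write something like `$page = fetch_page($url);` and treat a null result as "could not fetch", rather than passing false on to htmlspecialchars().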

3. The fopen()->fread()->fclose() pattern

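<?php
$url=''; // URL omitted in the original; set it to the page to fetch
$handle=fopen($url,"rb");
if ($handle === false) {
    die('Failed to open ' . $url);
}
$lines_string="";
do {
    $data=fread($handle,1024); // read in 1 KB chunks
    if ($data === false || strlen($data) == 0) {
        break; // nothing more to read
    }
    $lines_string.=$data;
} while(true);
fclose($handle);
echo htmlspecialchars($lines_string);
?>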

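4. cURL

Using cURL requires that the hosting environment has the curl extension enabled. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll; you also need to copy ssleay32.dll and libeay32.dll to c:\windows\system32. On Linux, install the curl extension.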

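<?php
$url=''; // URL omitted in the original; set it to the page to fetch
$ch=curl_init();
$timeout=5; // connection timeout in seconds
// cURL option constants are case-sensitive and must be written in upper case.
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$lines_string=curl_exec($ch); // false on failure
curl_close($ch);
echo htmlspecialchars($lines_string);
?>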
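For a scheduled scraper, it may help to wrap the cURL calls so that a failed request is distinguishable from an empty page. The sketch below is an illustration under assumptions: fetch_with_curl() is a hypothetical name, and the total timeout and redirect-following settings are choices made here, not taken from the original.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL with cURL
// and return null on any transfer error.
function fetch_with_curl(string $url, int $timeout = 5): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_CONNECTTIMEOUT => $timeout,      // seconds allowed to connect
        CURLOPT_TIMEOUT        => $timeout * 2,  // seconds for the whole transfer (assumed value)
        CURLOPT_FOLLOWLOCATION => true,          // follow HTTP redirects
    ]);
    $body = curl_exec($ch); // false on failure; curl_error($ch) would describe why
    return $body === false ? null : $body;
}
```

The handle is left to be released when it goes out of scope. curl_setopt_array() sets the same options as the individual curl_setopt() calls above in one step.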

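5. The fsockopen() function (socket mode)

Whether socket mode works also depends on the server configuration. You can use phpinfo() to see which transports the server has enabled. For example, my local PHP socket setup does not have http enabled, so I could only test with udp.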

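<?php
$fp = fsockopen("udp:", 13, $errno, $errstr); // the host after "udp:" was lost in the original
if (!$fp) {
    echo "ERROR: $errno - $errstr<br />\n";
} else { // branch bodies were lost in the original; reconstructed as a daytime query (port 13)
    fwrite($fp, "\n"); echo fread($fp, 26); fclose($fp);
}
?>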
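Besides reading the phpinfo() page, the enabled transports and URL wrappers can be queried from code. This is a small sketch; transport_available() and wrapper_available() are hypothetical helper names introduced here for illustration.

```php
<?php
// Hypothetical helpers (not from the original article).

// True if the named socket transport (e.g. "tcp", "udp", "ssl") is registered.
function transport_available(string $name): bool
{
    return in_array($name, stream_get_transports(), true);
}

// True if the named stream wrapper (e.g. "http", "https", "file") is registered.
function wrapper_available(string $name): bool
{
    return in_array($name, stream_get_wrappers(), true);
}
```

For example, `transport_available('udp')` indicates whether a "udp://" address can be passed to fsockopen(), and `wrapper_available('http')` whether fopen() knows the http scheme at all.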

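6. Third-party libraries

There are quite a few third-party libraries available online. Snoopy is one I found by searching; have a look at it if you are interested.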

Parsing XML (HTML) with PHP

1. Regular expressions:

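<?php
$url=''; // URL omitted in the original; set it to the page to fetch
$lines_string=file_get_contents($url);
// eregi() was removed in PHP 7, so preg_match() with the i flag is used instead.
// The <title> tags in the pattern were stripped from the original and are restored here.
if (preg_match('/<title>(.*)<\/title>/is',$lines_string,$title)) {
    echo htmlspecialchars($title[0]); // $title[0] is the whole match, $title[1] the text inside the tags
}
?>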
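The same regular-expression approach can be packaged as a function that works on an already fetched string, which makes it easy to try without a network connection. This is a minimal sketch; extract_title() is a hypothetical name, and the non-greedy pattern and entity decoding are choices made here for illustration.

```php
<?php
// Hypothetical helper (not from the original article): return the text of the
// first <title> element, or null if there is none.
function extract_title(string $html): ?string
{
    // i: case-insensitive, s: "." also matches newlines.
    // (.*?) is non-greedy so the match stops at the first closing tag.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        $text = html_entity_decode($matches[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        return trim($text);
    }
    return null;
}
```

Regular expressions are fine for picking out a single, simple element like this; for nested or irregular markup a DOM parser is usually less fragile.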

2. The PHP DOMDocument object

If the remote HTML or XML contains syntax errors, PHP emits warnings while parsing it into a DOM.

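<?php
$url=''; // URL omitted in the original; set it to the page to fetch
$html=new DOMDocument();
$html->loadHTMLFile($url);
$title=$html->getElementsByTagName('title');
echo $title->item(0)->nodeValue; // property names are case-sensitive: nodeValue, not nodevalue
?>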
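Parse warnings from malformed markup can be collected quietly instead of being printed, using libxml's internal error handling. The following sketch assumes the page has already been fetched into a string; dom_title() is a hypothetical helper name, not part of the original article.

```php
<?php
// Hypothetical helper (not from the original article): read the <title> of an
// HTML string with DOMDocument while tolerating malformed markup.
function dom_title(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();                 // discard the collected errors
    libxml_use_internal_errors($previous); // restore the earlier setting
    if (!$loaded) {
        return null;
    }
    $nodes = $doc->getElementsByTagName('title');
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}
```

Checking `$nodes->length` before calling `item(0)` avoids an error on pages that have no title element.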

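3. Third-party libraries

This article uses PHP Simple HTML DOM Parser as an example for a brief introduction. The syntax of simple_html_dom is similar to jQuery, so working with the DOM in PHP becomes as simple as working with it in jQuery.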

<?php
$url=''; // URL omitted in the original; set it to the page to fetch
include_once('../simplehtmldom/simple_html_dom.php');
$html=file_get_html($url);
$title=$html->find('title'); // array of matching elements
echo $title[0]->plaintext;
?>
