
Batch-scraping website content with PHP's file_get_contents

Published: 2016-10-08  Editor: jiaochengji.com


file_get_contents is a function that reads files both from the local filesystem and from remote servers. Below, we use it to build a small scraping tool.
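As a minimal sketch of the "batch" idea (not taken from the article, whose own code is not shown here), the hypothetical helper fetchAll() below loops over a list of local paths or URLs, reads each one with file_get_contents, and records false for any source that cannot be read. The timeout and User-Agent values are assumptions. The http context options only apply to http:// and https:// URLs, and reading remote URLs additionally requires allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: read several local files or URLs with file_get_contents.
// Returns an array mapping each source to its contents, or to false on failure.
function fetchAll(array $sources, float $timeout = 10.0): array
{
    // The 'http' options take effect for http:// and https:// URLs only;
    // plain local files ignore them. Timeout and User-Agent are assumed values.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout,
            'user_agent' => 'Mozilla/5.0 (compatible; demo-scraper)',
        ],
    ]);

    $results = [];
    foreach ($sources as $source) {
        // @ suppresses the E_WARNING emitted on failure; we check for false instead.
        $results[$source] = @file_get_contents($source, false, $context);
    }
    return $results;
}
```

For example, fetchAll(['https://example.com/', '/tmp/page.html']) returns both bodies keyed by source (both sources are placeholders). Keeping failures as false, rather than aborting, lets one bad URL in a batch leave the rest of the run intact.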

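Recently I came across a rather "stingy" study site: it won't let you copy its content. How are we supposed to make crib notes for the exam like that, by typing everything out one character at a time? Fortunately, being a techie, this kind of problem doesn't stump me. If you won't let me copy, fine; that saves me the trouble anyway. Why not write a script that pulls down the whole lesson so I can read it more conveniently?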

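No sooner said than done. The first step was to look at the page source. The page disables right-clicking, however, and right-clicking shows the following prompt: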

(Screenshot: the prompt shown when right-clicking the page)

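This is easy to get around; there are plenty of ways to view a page's source (search online if you don't know any). Having opened the source, I found that the page's content did not appear in it. So I captured the traffic with HttpWatch and found the page source in the response to a separate request, but that source was encrypted, as shown below: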
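Once the capture shows which request actually carries the content, that URL can be fetched directly. A minimal sketch, assuming the server might check browser-like headers such as Referer (whether this site does is unknown): the hypothetical helpers makeBrowserContext() and fetchWithReferer() send those headers through a stream context. The header values and timeout are placeholders to be replaced with what HttpWatch shows.

```php
<?php
// Hypothetical sketch: request the secondary URL seen in the packet capture
// directly, sending headers a browser would normally send. Which headers the
// site actually checks (if any) is an assumption to verify in the capture.
function makeBrowserContext(string $referer, float $timeout = 10.0)
{
    $headers = [
        'Referer: ' . $referer,
        'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Accept: */*',
    ];
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", $headers),
            'timeout' => $timeout,
        ],
    ]);
}

function fetchWithReferer(string $url, string $referer): string
{
    $body = @file_get_contents($url, false, makeBrowserContext($referer));
    if ($body === false) {
        // error_get_last() still records the warning silenced by @.
        $err = error_get_last();
        throw new RuntimeException('Request failed: ' . ($err['message'] ?? 'unknown error'));
    }
    return $body;
}
```

Usage would look like fetchWithReferer($dataUrl, $lessonUrl), where both variables are hypothetical placeholders for the request found in HttpWatch and the lesson page that triggered it.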

(Screenshot: the encrypted page source returned by the separate request)

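This "encryption" is weak, though: the payload states plainly that it is Base64, which is an encoding rather than encryption. Decoding it is easy; the base64 tool that ships with Linux does the job: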

<pre>[root@web20 php]# base64 --help
Usage: base64 [OPTION] [FILE]
Base64 encode or decode FILE, or standard input, to standard output.

  -w, --wrap=COLS       Wrap encoded lines after COLS character (default 76).
                        Use 0 to disable line wrapping.

  -d, --decode          Decode data.
  -i, --ignore-garbage  When decoding, ignore non-alphabet characters.

      --help            Display this help and exit.
      --version         Output version information and exit.

With no FILE, or when FILE is -, read standard input.

The data are encoded as described for the base64 alphabet in RFC 3548.
Decoding require compliant input by default, use --ignore-garbage to
attempt to recover from non-alphabet characters (such as newlines) in
the encoded stream.</pre>
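For readers who would rather stay in PHP, the same step can be done with base64_decode(). A minimal sketch using a hypothetical helper decodePayload(): with strict set to true, base64_decode() returns false on characters outside the Base64 alphabet (whitespace such as line breaks is still skipped), which is roughly analogous to plain base64 -d; with strict set to false it silently drops them, similar in spirit to --ignore-garbage. The two are not guaranteed to behave identically on every input.

```php
<?php
// Hypothetical helper: decode a Base64 payload in PHP instead of with the CLI.
// $strict = true rejects non-alphabet characters (whitespace is still skipped);
// $strict = false silently discards them, like `base64 -d --ignore-garbage`.
function decodePayload(string $encoded, bool $strict = true): string
{
    $decoded = base64_decode($encoded, $strict);
    if ($decoded === false) {
        throw new InvalidArgumentException('Input is not valid Base64');
    }
    return $decoded;
}
```

Strict mode is the safer default: a payload that fails to decode usually means the wrong part of the response was captured, and failing loudly surfaces that instead of producing silently corrupted text.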

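Running base64 -d followed by the file name is all it takes. After decoding, however, the output turned out to be URL-encoded. The result is as follows: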

<pre>[Memorize] What is the meaning of accounting?</pre>
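Both decoding steps can also be chained in PHP, which fits the article's goal of scripting the whole scrape. A minimal sketch using a hypothetical helper decodeLessonPayload(), assuming the text was standard percent-encoded UTF-8 (as produced by PHP urlencode() or JavaScript encodeURIComponent()). If the site used JavaScript escape(), which emits %uXXXX sequences for characters such as Chinese, urldecode() will not decode those.

```php
<?php
// Hypothetical helper for the two-step decode: Base64 first, then
// percent-decoding. Assumes standard percent-encoded UTF-8; JavaScript
// escape()-style "%uXXXX" sequences are NOT handled by urldecode().
function decodeLessonPayload(string $payload): string
{
    // Strict mode still skips whitespace, so a trailing newline is harmless.
    $urlEncoded = base64_decode($payload, true);
    if ($urlEncoded === false) {
        throw new InvalidArgumentException('Payload is not valid Base64');
    }
    // urldecode() also turns '+' into a space; switch to rawurldecode()
    // if the site left literal '+' characters unescaped.
    return urldecode($urlEncoded);
}
```

For example, echo decodeLessonPayload(file_get_contents('payload.txt')); would print the lesson text, assuming the captured response body was saved to a file named payload.txt (a hypothetical name).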
