Reading the files fetched by a crawler

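# coding: utf-8
'''
Created on 2014-7-24
@author: Administrator
'''
import urllib2
from urllib2 import Request
import re
import sys


def p(f):
    print '%s.%s(): %s' % (f.__module__, f.__name__, f())

# Print the default character encoding used by the current system
p(sys.getdefaultencoding)

req = Request('')  # the target URL was omitted in the original post
req.add_header('User-Agent', 'aa')
response = urllib2.urlopen(req)
html = response.read()
# print html
# The HTML tags in this regex were lost when the post was scraped; the
# original pattern had two capture groups, so each match is a 2-tuple
myitems = re.findall('(.*?)\n', html, re.S)
print myitems
for i in myitems:    # myitems is a list
    for j in range(len(i)):   # i is a tuple; len(i) is 2, so j is 0 and then 1
        print i[j]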
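In Python 3, urllib2 was merged into urllib.request, and the same fetch-and-extract flow can be sketched as follows. The regular expression in the script above lost its HTML tags when this page was scraped. The <td>(.*?)</td> pattern used here is therefore a hypothetical stand-in with two capture groups, and should be adapted to the real page's markup. The names fetch_html, extract_pairs and PAIR_PATTERN are also illustrative.

```python
import re
import urllib.request

# Hypothetical pattern: two capture groups in adjacent <td> cells.
# The original post's pattern was lost, so adapt this to the real page.
PAIR_PATTERN = re.compile(r'<td>(.*?)</td>\s*<td>(.*?)</td>', re.S)


def fetch_html(url, user_agent='aa', encoding='utf-8'):
    """Download a page with a custom User-Agent and decode it to str."""
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(req) as response:
        return response.read().decode(encoding, errors='replace')


def extract_pairs(html):
    """Return a list of 2-tuples, one per match of PAIR_PATTERN."""
    return PAIR_PATTERN.findall(html)


if __name__ == '__main__':
    sample = '<td>姓名</td> <td>張三</td><td>學號</td><td>001</td>'
    for item in extract_pairs(sample):
        for field in item:   # each item is a tuple of two captured strings
            print(field)
```

Because findall returns one tuple per match when the pattern has several groups, the nested loop mirrors the i and j loops in the Python 2 script.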

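When myitems is printed all at once (print myitems), the Chinese text comes out garbled. When the items are iterated over and printed one by one, the Chinese prints correctly. Very strange.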
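This is most likely not an encoding error. In Python 2, response.read() returns a byte string. Printing a list formats every element with repr(), so each non-ASCII byte appears as an escape such as \xe4. Printing a single element writes the raw UTF-8 bytes instead, and a UTF-8 terminal displays them as Chinese. The sketch below reproduces the effect in Python 3 using bytes objects, which behave like Python 2 str here. The sample values are made up.

```python
# Python 3 stand-in for Python 2 byte strings: same repr-vs-str effect
items = [(b'\xe4\xb8\xad', b'\xe6\x96\x87')]   # UTF-8 bytes for "中" and "文"

# Printing the container formats each element with repr(): escapes appear
print(items)             # [(b'\xe4\xb8\xad', b'\xe6\x96\x87')]

# Decoding and printing elements one by one shows the actual characters
for pair in items:
    for field in pair:
        print(field.decode('utf-8'))

# A list of decoded str objects prints readably, since repr() of a Python 3
# str keeps printable non-ASCII characters as they are
print([field.decode('utf-8') for field in items[0]])   # ['中', '文']
```

In Python 3 the usual fix is to decode once, right after downloading, with html = response.read().decode('utf-8'). From then on, the page is handled as str throughout.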

A crawler for collecting information after logging in

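Sometimes the user information we need is only available after the user logs in, but it can still be collected with a crawler. This demo uses two external libraries: org.jsoup, and jxl for writing Excel files. As before, it uses the student personal-information system of the Academic Affairs Office of Zhongnan University of Economics and Law as the example. The next step is to view the page source to find the target address to which the account and password are submitted, and, after logging in, the address of the page to navigate to ...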
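The demo described here is written in Java with jsoup. A rough Python equivalent using only the standard library rests on one idea: keep cookies between requests. The script logs in with a POST to the address the login form submits to, then requests the information page through the same opener so the session cookie is sent along. The URLs, the form field names username and password, and the helper names are all hypothetical. The real ones come from the login page's source, as described above.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Create an opener that stores and resends cookies, like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password,
                        user_field='username', password_field='password'):
    """Build a POST request carrying the credentials as form data.

    The field names are hypothetical; read the real ones from the
    name="..." attributes of the login form's <input> elements.
    """
    form = urllib.parse.urlencode({user_field: username, password_field: password})
    return urllib.request.Request(login_url, data=form.encode('ascii'), method='POST')


def fetch_after_login(login_url, info_url, username, password):
    """Log in, then fetch a page that is only visible to logged-in users."""
    opener, _ = make_session()
    with opener.open(build_login_request(login_url, username, password)) as resp:
        resp.read()  # the login response sets the session cookie in the jar
    # The stored session cookie is sent automatically with this request
    with opener.open(info_url) as resp:
        return resp.read().decode('utf-8', errors='replace')
```

Some sites also expect hidden form fields or a Referer header. Those would show up in the same page source and can be added to the form dictionary or the request headers.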

Java: reading the last n lines of a file

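public class readfile: reads the last n lines of a file. It determines the current line from newline characters and keeps a count to know when the n-th line has been read. PS: the output list is in reverse order, so it must be reversed before output. @param file the file to read; @param numread the number of lines to read; @return list. public static list...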
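A Python sketch of the same idea might look like this. It reads the file backwards in blocks instead of from the start, and counts newlines to know when enough lines have been loaded. The function name read_last_lines, the block_size parameter and the UTF-8 default are assumptions, not part of the original Java code. Like the Java version, it gathers lines newest-first and reverses the list before returning.

```python
import os
import tempfile


def read_last_lines(path, n, block_size=4096, encoding='utf-8'):
    """Return the last n lines of a file, in their original order."""
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b''
        # Read fixed-size blocks backwards until the buffer holds more than
        # n newlines (so the n-th line from the end is complete) or we reach
        # the start of the file.
        while pos > 0 and data.count(b'\n') <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data

    collected = []
    for line in reversed(data.splitlines()):  # newest line first
        if len(collected) == n:
            break
        collected.append(line.decode(encoding))
    collected.reverse()  # restore file order, like the Java version's reversal
    return collected


if __name__ == '__main__':
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                     encoding='utf-8') as tmp:
        tmp.write(''.join(f'line {i}\n' for i in range(1, 11)))
    print(read_last_lines(tmp.name, 3))   # ['line 8', 'line 9', 'line 10']
    os.remove(tmp.name)
```

Splitting on b'\n' before decoding means a multi-byte UTF-8 character cut at a block boundary can only end up in the partial first line, which is never returned unless the whole file has been read.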

Crawler: Python basics practice (reading files)

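coding: utf-8; time: 2021-2-10 16:58; author: foryou; file: demo7.py; software: PyCharm. f = open('test.txt', 'w'). The read() method reads the specified number of characters. It starts at the beginning of the file, and each call moves forward by that many characters. Opening a file: 'w' is write mode ...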
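The behaviour of read() described above can be checked with a short script. A file opened with 'w' is write-only and truncated, so calling read() on it raises io.UnsupportedOperation. This sketch therefore writes with 'w' and reads back with 'r'. In text mode, n counts characters, not bytes. The function name and file contents are made up for illustration.

```python
import os
import tempfile


def demo_read_positions(path):
    """Show that read(n) returns n characters and advances the file position."""
    # 'w' opens for writing only (and truncates); reading needs 'r', 'r+' or 'w+'
    with open(path, 'w', encoding='utf-8') as f:
        f.write('hello world')

    chunks = []
    with open(path, 'r', encoding='utf-8') as f:
        chunks.append(f.read(5))   # starts at the beginning of the file
        chunks.append(f.read(1))   # continues right after the previous read
        chunks.append(f.read())    # no argument: read everything that is left
        chunks.append(f.read())    # at end of file: returns an empty string
    return chunks


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'test.txt')
    print(demo_read_positions(path))   # ['hello', ' ', 'world', '']
```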
