The source code is as follows:
from urllib.request import Request, urlopen
import json
import re
import time

def gethtml(url):
    # Set fake headers so the request looks like it comes from a browser
    headers = {'User-Agent': 'Mozilla/5.0'}
    request = Request(url, headers=headers)
    response = urlopen(request)
    html = response.read().decode('utf-8')
    return html

def write_to_file(content):
    with open('duanzi.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')

def gettext(pagenum=1):
    text_list = []
    for page in range(1, pagenum + 1):
        url = '' + str(page)  # base URL omitted in the original
        html = gethtml(url)
        time.sleep(1)
        # The HTML tags inside the original pattern were lost when the page was
        # converted to plain text; the capture groups extract each joke's fields
        pattern = re.compile('(.*?).*?(.*?)', re.S)
        items = re.findall(pattern, html)
        text_list.append(items)
    for each_items in text_list:  # iterate over the jokes collected from each page
        for item in each_items:
            count = 0
            for i in item:  # clean up the text for easier reading
                i = i.strip('\n')  # drop stray '\n' to avoid stacked blank lines
                # <br/> is the HTML line-break tag used inside paragraphs; replace
                # it with '\n' so the original paragraph layout is kept when reading
                i = i.replace('<br/>', '\n')
                print(i)
                count += 1
                if count % 3 == 0:
                    print('----' * 20)

if __name__ == '__main__':
    try:
        num = int(input('Enter the number of pages to crawl: '))
        gettext(num)
    except Exception as e:
        print('Sorry, something went wrong!')
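The cleanup step above, stripping stray newlines and turning the HTML line-break tag into a text newline, can be sketched as a small standalone helper. This is only an illustration: the function name clean_joke and the sample string are assumptions, not part of the original script, and the regex tolerates the common <br>, <br/>, and <br /> spellings rather than matching one exact tag.

```python
import re

def clean_joke(text):
    """Strip stray leading/trailing newlines and replace <br/>-style tags with '\n'."""
    text = text.strip('\n')          # avoid stacked blank lines
    return re.sub(r'<br\s*/?>', '\n', text)  # keep the original paragraph layout

sample = '\nFirst line<br/>Second line<br />Third line\n'
print(clean_joke(sample))
```

A helper like this keeps the text-normalization logic in one place, so the page-iteration loop in gettext only has to print the cleaned strings.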