Two spider exercises (not fully understood yet)

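# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import time

'''
1. Requirements
   Fetch, for each exercise:
   title = Python exercise example 1
   timu  = Problem: given the four digits 1, 2, 3 and 4, how many distinct
           three-digit numbers with no repeated digit can be formed, and
           what are they?
   cxfx  = Program analysis: the hundreds, tens and units places can each
           hold 1, 2, 3 or 4. Generate every arrangement, then discard the
           ones that do not satisfy the condition.
   code  = the source code
2. Page source analysis
   Entry page:
   1. Get all the <a> tags
      find(id = 'content').find_all('a')
   2. Get the title
      find(id = 'content').h1
   3. Get the problem statement
      find(id = 'content').find_all('p')[1]
   4. Get the program analysis
      find(id = 'content').find_all('p')[2]
   5. Get the source code
      find(class_ = 'hl-main').text
3. Implementation
'''

# Part 1: request the py100 index page and get its source
starturl = ''   # the URL is blank in the original post
headers = {}    # the header values are missing in the original post
# Send the request
response = requests.get(starturl, headers=headers).content.decode('utf-8')
# print(response)
# Parse with BeautifulSoup
soup = BeautifulSoup(response, 'lxml')
# print(soup)
# Extract the <a> tags
link = soup.select('#content a')
num = 1
for i in link:
    print('Exercise {}'.format(num))
    # Part 2: request the detail page and extract its content
    # (the base URL prefix is also blank in the original post)
    response2 = requests.get('' + i.attrs['href'], headers=headers).content.decode('utf-8')
    # Parse
    html = BeautifulSoup(response2, 'lxml')
    # Title
    title = html.select('#content h1')[0].text
    # Problem statement
    timu = html.select('#content p')[1].text
    # Program analysis
    cxfx = html.select('#content p')[2].text
    # Source code: fall back to <pre> when the page has no .hl-main block
    try:
        code = html.select('.hl-main')[0].text
    except IndexError:
        code = html.select('pre')[0].text
    # Save the content
    with open('py100.txt', 'a+', encoding='utf-8') as file:
        file.write(title + '\n' + timu + '\n' + cxfx + '\n' + code + '\n' + '=' * 50 + '\n')
    # time.sleep(1)
    num += 1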
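
The detail-page request above builds its URL by concatenating a base string with each `href`. As a minimal sketch of an alternative, assuming the links on the index page may be relative, `urllib.parse.urljoin` from the standard library resolves an `href` against the page it was scraped from. The base URL and the sample hrefs below are hypothetical, since the post leaves the real URL blank, and `resolve_links` is a helper name made up for this illustration.

```python
from urllib.parse import urljoin

# Hypothetical index-page URL; the original post leaves the real one blank.
BASE_URL = 'https://example.com/python/python-100-examples.html'


def resolve_links(base_url, hrefs):
    """Turn href values scraped from base_url into absolute URLs."""
    return [urljoin(base_url, href) for href in hrefs]


# Made-up hrefs in the three forms an <a> tag commonly carries.
sample_hrefs = [
    '/python/python-exercise-example1.html',   # root-relative
    'python-exercise-example2.html',           # relative to the current page
    'https://example.com/other/page.html',     # already absolute
]

for url in resolve_links(BASE_URL, sample_hrefs):
    print(url)
```

With this approach the loop would not need to know whether the site writes its links as absolute or relative paths, which is one less thing to hard-code.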

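# -*- coding: utf-8 -*-
from lxml import etree
import requests
import time

'''
1. Requirements
   1. Get the title of every post
   2. Get the content of every post
2. Page source analysis
   Entry page:
   1. Get the <a> link of every post, plus the pager's last link and its text
      //div[@class='post_item_body']/h3/a[@href]
      //div[@class='pager']/a[last()]/@href
      //div[@class='pager']/a[last()]/text()
   2. Get the title
      //div[@class='post_item_body']/h3/a/text()
   3. Get the content
      string(//div[@id='cnblogs_post_body'])
3. Implementation
'''

# Part 1: request the post links on the index page
straturl = ''   # the URL is blank in the original post
headers = {}    # the header values are missing in the original post
# Starting page number
page = 1
while True:
    # Request the index page source
    response = requests.get(straturl, headers=headers).text
    # Parse
    html = etree.HTML(response)
    # Extract the post links, and the href and text of the pager's last link
    link = html.xpath("//div[@class='post_item_body']/h3/a/@href")
    nextpage = html.xpath("//div[@class='pager']/a[last()]/@href")
    nextpagetext = html.xpath("//div[@class='pager']/a[last()]/text()")
    # Part 2: fetch the full content of each post
    # Counter
    num = 1
    for i in link:
        print('Page {}, post {}'.format(page, num))
        # Request the post
        response_info = requests.get(i, headers=headers).text
        # Parse
        html_info = etree.HTML(response_info)
        # print(html_info)
        # Extract the title
        title = html_info.xpath("//a[@id='cb_post_title_url']/text()")[0]
        # Extract the content
        content = html_info.xpath("string(//div[@id='cnblogs_post_body'])")
        # Save to file
        with open('cnblogs.txt', 'a+', encoding='utf-8') as file:
            file.write(title + '\n' + content + '=' * 50 + '\n')
        time.sleep(0.5)
        num += 1
    # Follow the pager only while its last link is the "next" link.
    # The comparison ignores case because the label's capitalization is not
    # certain from the original post; the base URL prefix is blank there too.
    if nextpagetext and nextpagetext[0].strip().lower() == 'next >':
        straturl = '' + nextpage[0]
        page += 1
        time.sleep(1)
    else:
        # No next page: stop instead of requesting the same page forever
        break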
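
The `while` loop above keeps following the pager's last link for as long as that link is a next-page link. A minimal sketch of that control flow follows, with the network requests and XPath parsing replaced by an in-memory dictionary so it runs offline. `FAKE_SITE`, `crawl`, `max_pages` and the `/page/N` paths are all hypothetical and not part of the original script.

```python
# Hypothetical stand-in for the site: each page lists its post links and,
# if there is one, the link to the next page.
FAKE_SITE = {
    '/page/1': {'posts': ['/p/1', '/p/2'], 'next': '/page/2'},
    '/page/2': {'posts': ['/p/3'], 'next': '/page/3'},
    '/page/3': {'posts': ['/p/4', '/p/5'], 'next': None},
}


def crawl(start, site, max_pages=100):
    """Collect post links page by page until no next-page link is left."""
    collected = []
    url = start
    page = 1
    while url is not None and page <= max_pages:
        listing = site[url]
        for num, post in enumerate(listing['posts'], start=1):
            collected.append((page, num, post))
        # Move to the next page; None ends the loop on the last page.
        url = listing['next']
        page += 1
    return collected


for page, num, post in crawl('/page/1', FAKE_SITE):
    print('Page {}, post {}: {}'.format(page, num, post))
```

The point of the sketch is the exit condition: the loop needs an explicit way to end on the last page, and a page cap such as `max_pages` is a cheap guard against a pager that never stops offering a next link.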
