B10 Web Scraping Course 02

# Usage: use // to select elements anywhere in the page, then write the tag name, then add a predicate to narrow the selection.

//div[@class='abc']

1. Difference between / and //: / selects only direct child nodes, while // selects descendant nodes at any depth.

2. contains: an attribute sometimes holds several values; in that case use contains to match one of them.

//div[contains(@class,'job_detail')]

3. Indexes in predicates start at 1, not 0.
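The three notes above can be checked on a small document. The following is a minimal sketch; the HTML snippet and its class names are invented for illustration and do not come from any real page.

```python
from lxml import etree

# Hypothetical HTML, invented for this example only.
text = """
<html><body>
  <div class="outer">
    <p>direct child</p>
    <div><p>nested descendant</p></div>
  </div>
  <div class="job_detail clearfix">first job</div>
  <div class="job_detail">second job</div>
</body></html>
"""
html = etree.HTML(text)

# 1. "/" selects only direct children, "//" selects descendants at any depth.
direct = html.xpath("//div[@class='outer']/p/text()")
nested = html.xpath("//div[@class='outer']//p/text()")

# 2. An exact match misses a class attribute that holds several values;
#    contains() still matches it.
exact = html.xpath("//div[@class='job_detail']/text()")
partial = html.xpath("//div[contains(@class,'job_detail')]/text()")

# 3. Predicate indexes start at 1, so div[1] is the first div under body.
first_div_class = html.xpath("/html/body/div[1]/@class")

print(direct, nested, exact, partial, first_div_class)
```

Note that contains does a substring match, so it would also match a class such as 'job_detail_old'; that is usually acceptable for scraping but worth keeping in mind.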

1. Parsing an HTML string: use 'lxml.etree.HTML'.

htmlelement = etree.HTML(text)
print(etree.tostring(htmlelement, encoding='utf-8').decode('utf-8'))
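As a self-contained illustration of the call above, the sketch below feeds etree.HTML a deliberately incomplete fragment. The fragment is a made-up example; the point is that the parser returns the root html element and fills in the missing tags.

```python
from lxml import etree

# Hypothetical, deliberately incomplete HTML: no <html>/<body>, unclosed <li>.
text = "<ul><li>first<li>second</ul>"

htmlelement = etree.HTML(text)

# etree.HTML returns the root <html> element of the repaired document.
root_tag = htmlelement.tag
items = htmlelement.xpath("//li/text()")

# Serialize back to a string to inspect what the parser produced.
print(etree.tostring(htmlelement, encoding='utf-8').decode('utf-8'))
```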

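2. Parsing an HTML file: use 'lxml.etree.parse'. This function uses an XML parser by default, so for HTML that is not well-formed XML you need to create an HTML parser yourself and pass it in.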

htmlelement = etree.parse('qingyunian.html')
print(etree.tostring(htmlelement, encoding='utf-8').decode('utf-8'))

from lxml import etree

# Parse the Qing Yu Nian short reviews
def parse_qyn_file():
    parser = etree.HTMLParser(encoding='utf-8')
    htmlelement = etree.parse('qingyunian.html', parser=parser)
    print(etree.tostring(htmlelement, encoding='utf-8').decode('utf-8'))

# Parse the Lagou page
def parse_lagou_file():
    parser = etree.HTMLParser(encoding='utf-8')
    htmlelement = etree.parse('lagou.html', parser=parser)
    print(etree.tostring(htmlelement, encoding='utf-8').decode('utf-8'))

if __name__ == '__main__':
    # parse_lagou_file()
    # parse_text()
    parse_qyn_file()
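To see why an explicit HTML parser matters, the sketch below parses the same bytes twice, once with the default XML parser and once with etree.HTMLParser. The page content is a made-up example, and io.BytesIO stands in for a file on disk so the snippet runs without any local files.

```python
import io
from lxml import etree

# Hypothetical HTML with an unclosed <br>: valid HTML, but not well-formed XML.
page = b"<html><body><p>short review<br></p></body></html>"

# Default XML parser: the unclosed tag raises XMLSyntaxError.
try:
    etree.parse(io.BytesIO(page))
    xml_ok = True
except etree.XMLSyntaxError:
    xml_ok = False

# Explicit HTML parser: tolerates the same input.
parser = etree.HTMLParser(encoding='utf-8')
tree = etree.parse(io.BytesIO(page), parser=parser)
paragraphs = tree.xpath("//p/text()")

print(xml_ok, paragraphs)
```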

from lxml import etree

parser = etree.HTMLParser(encoding='utf-8')
html = etree.parse("tencent.html", parser=parser)

# 1. Get all the a tags
# //a
# xpath returns a list
alis = html.xpath("//a[@class='recruit-list-link']")
for a in alis:
    print(etree.tostring(a, encoding='utf-8').decode("utf-8"))

# 2. Get all the job titles
alis = html.xpath("//h4")
for a in alis:
    print(etree.tostring(a, encoding='utf-8').decode("utf-8"))

# 3. Get all the job descriptions
alis = html.xpath("//p[@class='recruit-text']")
for a in alis:
    print(etree.tostring(a, encoding='utf-8').decode("utf-8"))

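# 4. Get all the plain-text information
alis = html.xpath("//a[@class='recruit-list-link']")
positions = []
for a in alis:
    # When calling xpath on an element to get its descendants, put a dot
    # before // so the search starts from the current element.
    title = a.xpath(".//h4[@class='recruit-title']/text()")
    daihao = a.xpath("./p[1]//span[1]/text()")
    address = a.xpath("./p[1]//span[2]/text()")
    category = a.xpath("./p[1]//span[3]/text()")
    time = a.xpath("./p[1]//span[5]/text()")
    needs = a.xpath("./p[2]/text()")
    # The dict literal is missing from the source; the keys below are
    # assumed from the variable names.
    position = {
        "title": title,
        "daihao": daihao,
        "address": address,
        "category": category,
        "time": time,
        "needs": needs,
    }
    positions.append(position)
print(positions)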
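The leading dot mentioned in the comment above is easy to get wrong, so here is a minimal sketch of the difference. The markup is invented and only loosely modelled on the structure those XPaths assume; first_text is a hypothetical helper, not part of lxml, added because xpath always returns a list.

```python
from lxml import etree

# Hypothetical markup, invented for this example only.
text = """
<div>
  <a class="recruit-list-link"><h4 class="recruit-title">Engineer A</h4></a>
  <a class="recruit-list-link"><h4 class="recruit-title">Engineer B</h4></a>
</div>
"""
html = etree.HTML(text)
alis = html.xpath("//a[@class='recruit-list-link']")

def first_text(nodes):
    """Return the first string of an xpath result, stripped, or '' if empty."""
    return nodes[0].strip() if nodes else ""

first = alis[0]
# Without the dot, "//" starts at the document root and sees every h4.
from_root = first.xpath("//h4[@class='recruit-title']/text()")
# With the dot, the search is limited to descendants of this <a>.
from_here = first.xpath(".//h4[@class='recruit-title']/text()")

# One title per link, flattened from a list to a plain string.
titles = [first_text(a.xpath(".//h4[@class='recruit-title']/text()")) for a in alis]
print(from_root, from_here, titles)
```

Flattening each result to a string before collecting it may also give cleaner cells when the records are later written to a spreadsheet.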

# Write to an Excel spreadsheet
import pandas as pd

datadf = pd.DataFrame(positions)
datadf.to_excel('result.xlsx', sheet_name='pachong_cc')  # Export to Excel
