Jieba Word Segmentation of Python Job Postings from Zhaopin (智聯招聘), Part 1

For how the data was collected, click here.

The data looks roughly like this. What I analyze below is the job requirements, that is, the column marked by the green box.

import csv
import os

import jieba.posseg as psg  # the posseg module also returns each word's part of speech

datapath = os.path.join(os.getcwd(), "all_results.csv")

with open(datapath, 'r', newline='', encoding='utf-8') as csvfile:
    # Alternative: read the column by position instead of by name
    # rows = csv.reader(csvfile)
    # headers = next(rows)
    # for i, row in enumerate(rows):
    #     if i % 50 == 0:
    #         print("Processing row {}".format(i))
    #     job_required = row[8]
    #     job_requirednew = job_required.strip().replace(" ", "")
    rows = csv.DictReader(csvfile)
    result_list = [row['job_description'].strip().replace('\xa0', '').replace('\r\n', '') for row in rows]

# x.word is the word itself and x.flag is its part of speech; single-character words are dropped
info_attr = [(x.word, x.flag) for x in psg.cut(''.join(result_list)) if len(x.word) >= 2]

with open('out.txt', 'w+', encoding='utf-8') as f:
    for x in info_attr:
        # one "word<TAB>flag" pair per line
        f.write('{}\t{}\n'.format(x[0], x[1]))
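To see what the cleanup chain on job_description does before the text reaches jieba, here is a minimal sketch that runs it on a made-up two-row sample held in memory. The sample rows, the job_name column, and the clean helper are assumptions for illustration only; the real all_results.csv may look different.

```python
import csv
import io

# Hypothetical two-row sample standing in for all_results.csv.
# newline='' keeps the \r\n inside quoted fields untouched, as in the script.
sample = io.StringIO(
    'job_name,job_description\r\n'
    'Python工程師,"  熟悉\xa0Python\r\n熟悉Django  "\r\n'
    '資料分析師,"熟悉\xa0pandas"\r\n',
    newline='',
)


def clean(text):
    """Apply the same cleanup chain as the script: trim, drop \\xa0 and \\r\\n."""
    return text.strip().replace('\xa0', '').replace('\r\n', '')


cleaned = [clean(row['job_description']) for row in csv.DictReader(sample)]
print(cleaned)
```

Note that the chain removes only '\r\n' pairs, so a lone '\n' inside a description would survive; whether that matters depends on how the CSV was written.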

Running the script writes out.txt, which lists each word together with its part-of-speech tag.
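A file in that layout is easy to read back for a quick frequency check. The sketch below assumes one "word<TAB>flag" pair per line; count_words and the sample lines are hypothetical names and data, not part of the original script.

```python
from collections import Counter


def count_words(lines):
    """Tally words from 'word<TAB>flag' lines, skipping malformed ones."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        if len(parts) == 2 and parts[0]:
            counts[parts[0]] += 1
    return counts


# Hypothetical sample lines in the assumed out.txt layout.
sample_lines = ['python\teng\n', '開發\tv\n', 'python\teng\n', '\t\n']
top = count_words(sample_lines).most_common(2)
print(top)
```

With the real file, the same function can be fed an open file handle, for example count_words(open('out.txt', encoding='utf-8')).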
