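Exercise overview
Task: Use multiple coroutines and a queue to scrape the Mtime (時光網) TV series Top 100 data (title, director, lead actors, and synopsis), then save the data with the csv module.
Goals: 1. Practice using gevent.
2. Practice using queue.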
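Before the full scraper, the core pattern can be tried without any network access. The following is a minimal sketch, not the exercise solution: the helper name `run_workers`, the worker names, and the `gevent.sleep` call that stands in for a blocking HTTP request are all assumptions made for illustration. It shows several greenlets draining one shared `gevent.queue.Queue`.

```python
import gevent
from gevent.queue import Queue


def run_workers(items, worker_count=3):
    """Process every item once, using worker_count greenlets and one shared queue."""
    work = Queue()
    for item in items:
        work.put_nowait(item)

    results = []

    def worker(name):
        # Same loop shape as the exercise: keep taking work until the queue is empty.
        while not work.empty():
            item = work.get_nowait()
            gevent.sleep(0.01)  # hypothetical stand-in for a blocking network request
            results.append((name, item))

    tasks = [gevent.spawn(worker, 'worker-{}'.format(n)) for n in range(worker_count)]
    gevent.joinall(tasks)  # block until every greenlet has finished
    return results


if __name__ == '__main__':
    pages = ['page-{}'.format(i) for i in range(1, 7)]
    for name, item in run_workers(pages):
        print(name, item)
```

Because greenlets only switch at cooperative points such as `gevent.sleep`, nothing can run between the `empty()` check and `get_nowait()`, so each item is handed to exactly one worker.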
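from gevent import monkey
monkey.patch_all()  # patch blocking I/O first so that requests cooperates with gevent

from bs4 import BeautifulSoup
import csv
import gevent
import requests
from gevent.queue import Queue

# Page 1 plus pages 2 to 10 (index-2.html ... index-10.html).
# The URL text is missing from this listing, so the strings are left as they appear.
url_list = ['']
for i in range(2, 11):
    url_list.append('index-{}.html'.format(i))

# Put every page URL into the shared work queue.
work = Queue()
for url in url_list:
    work.put_nowait(url)


def pachong():
    # Each worker keeps taking URLs until the queue is empty.
    while not work.empty():
        url = work.get_nowait()
        res = requests.get(url)
        items = BeautifulSoup(res.text, 'html.parser').find_all('div', class_='mov_con')
        for item in items:
            title = item.find('h2').text.strip()
            # Default to 'null' when a field is absent from the page.
            director = 'null'
            actor = 'null'
            remarks = 'null'
            tag_ps = item.find_all('p')
            for tag_p in tag_ps:
                if tag_p.text[:2] == '導演':
                    director = tag_p.text[3:].strip()
                elif tag_p.text[:2] == '主演':
                    actor = tag_p.text[3:].strip().replace('\t', '')
                elif tag_p.get('class'):
                    # A <p> with a class attribute holds the synopsis;
                    # .get() avoids a KeyError on <p> tags without one.
                    remarks = tag_p.text.strip()
            # Append one row per series.
            with open('top100.csv', 'a', newline='', encoding='utf-8-sig') as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow([title, director, actor, remarks])


task_list = []
for x in range(3):
    task = gevent.spawn(pachong)
    task_list.append(task)

# Spawned greenlets do not start until the main greenlet yields in joinall,
# so the header row below is written before any data row.
with open('top100.csv', 'w', newline='', encoding='utf-8-sig') as csv_file:
    writer = csv.writer(csv_file)
    # Header: series title, director, lead actors, synopsis.
    writer.writerow(['電視劇集名', '導演', '主演', '簡介'])

gevent.joinall(task_list)  # run the workers and wait for all of them to finish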
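The field extraction in the listing relies on slicing: the first two characters of a paragraph identify the field, and `text[3:]` skips the label plus one separator character. The sketch below isolates that logic so it can be checked offline. It is an illustration under assumptions: `parse_item` and `rows_to_csv` are hypothetical helpers, the sample strings are made up rather than taken from the site, the CSV header uses English names, and any unlabeled paragraph is treated as the synopsis instead of checking for a `class` attribute.

```python
import csv
import io


def parse_item(title, paragraph_texts):
    """Build one CSV row from an item's title and the text of its <p> tags."""
    director = actor = remarks = 'null'
    for text in paragraph_texts:
        if text[:2] == '導演':
            director = text[3:].strip()  # skip the label and one separator character
        elif text[:2] == '主演':
            actor = text[3:].strip().replace('\t', '')
        else:
            remarks = text.strip()  # simplification: unlabeled text is the synopsis
    return [title, director, actor, remarks]


def rows_to_csv(rows):
    """Write the header first, then the data rows, and return the CSV text."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(['title', 'director', 'actor', 'remarks'])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == '__main__':
    # Hypothetical sample text, not real page content.
    row = parse_item(
        'Example Show',
        ['導演:Director A', '主演:Actor A\t Actor B', ' A short synopsis. '],
    )
    print(rows_to_csv([row]))
```

Writing to an in-memory `io.StringIO` buffer keeps the sketch free of file side effects; swapping in `open('top100.csv', 'w', newline='', encoding='utf-8-sig')` would produce a file on disk instead.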
Teacher's **