Python Web Scraping and Databases

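This is a simple script that scrapes the Douban Movie Top 250 list. It collects 18 fields for each movie and stores them in a local MySQL database.

The full code follows.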

requests: sends the HTTP request and fetches the page data

lxml: parses the page data quickly using XPath syntax
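As a minimal sketch of the lxml step described above, the snippet below parses an HTML string and pulls values out with XPath. The HTML fragment, its class names, and the example.com links are made up for illustration; they are not the markup of the real Douban page, and the XPath expressions used by the original script are not shown here.

```python
from lxml import etree

# Hypothetical HTML fragment standing in for a downloaded page
html = """
<ol class="grid_view">
  <li><div class="hd"><a href="https://example.com/subject/1/"><span class="title">Movie A</span></a></div></li>
  <li><div class="hd"><a href="https://example.com/subject/2/"><span class="title">Movie B</span></a></div></li>
</ol>
"""

tree = etree.HTML(html)  # parse the string into an element tree

# xpath() returns a list of matches: here, the link and the title of each entry
links = tree.xpath('//div[@class="hd"]/a/@href')
titles = tree.xpath('//div[@class="hd"]/a/span[@class="title"]/text()')

for title, link in zip(titles, links):
    print(title, link)
```

In a real run the string would come from requests (for example the `text` attribute of the response) instead of a literal.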

# -*- coding: utf-8 -*-
"""
Created on Tue Jan 22 20:55:02 2019
@author: tide1
"""
import requests
from lxml import etree
import re
import time
import pymysql
import numpy as np
'''1. Database operations
'''
# Open the connection and create a cursor
conn = pymysql.connect(host='localhost', user='dns',
                       passwd='123456', db='mydb', port=3306, charset='utf8')
cursor = conn.cursor()  # cursor used to execute the SQL statements
cursor.execute("drop table if exists douban_movie")
sql = """create table douban_movie (
movie_name  text,
director text,
writers text,  
actors text,
style text,
country text,
language text,
release_times text,
time text,
anthor_name text,
score text,
num_comments text,
five_star text,
four_star text,
three_star text,
two_star text,
one_star text,
better text)default charset = utf8;"""
cursor.execute(sql)
# Store one record
def tosql(a):
    # a holds the 18 values of one movie, in the same order as the table columns
    cursor.execute(
        'insert into douban_movie(movie_name,'
        'director,writers,actors,style,country,language,release_times,'
        'time,anthor_name,score,num_comments,five_star,four_star,three_star,two_star,one_star,better) '
        'values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)',
        (a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7], a[8], a[9],
         a[10], a[11], a[12], a[13], a[14], a[15], a[16], a[17]))
'''2. Scraping
'''
# Copy a User-Agent string so the request looks like it comes from the Chrome browser
headers = '.format(str(i)) for i in range(0,250,25)]
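count = 0
for url in urls:
    get_movie_url(url)
    # Pause between pages: 10 s plus a draw from a normal distribution (mean 5, std 1)
    time.sleep(10 + np.random.normal(5))
    count += 1
    print(count)  # number of list pages processed so far
conn.commit()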
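The 18 placeholders in tosql are written out by hand, which makes it easy to miscount. As a hedged alternative sketch, the statement can be generated from a list of column names. The helpers build_insert and to_params are hypothetical names introduced here, not part of the original script; the column names are copied from the create table statement above.

```python
# Column names copied from the create table statement above
COLUMNS = [
    "movie_name", "director", "writers", "actors", "style", "country",
    "language", "release_times", "time", "anthor_name", "score",
    "num_comments", "five_star", "four_star", "three_star", "two_star",
    "one_star", "better",
]


def build_insert(table, columns):
    """Return a parameterized INSERT statement with one %s per column."""
    placeholders = ",".join(["%s"] * len(columns))
    return "insert into {}({}) values({})".format(
        table, ",".join(columns), placeholders)


def to_params(record, columns):
    """Return the record as a tuple after checking it has one value per column."""
    if len(record) != len(columns):
        raise ValueError(
            "expected {} values, got {}".format(len(columns), len(record)))
    return tuple(record)


INSERT_SQL = build_insert("douban_movie", COLUMNS)
print(INSERT_SQL)
```

With these in place, the body of tosql could become `cursor.execute(INSERT_SQL, to_params(a, COLUMNS))`. The values are still passed separately from the statement, so the driver handles quoting. Table and column names are not covered by placeholders, so they should only ever come from constants in the code.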

