Scraping with lxml XPath and displaying Chinese content correctly

import os
import lxml
from urllib2 import urlopen  # mac
# from urllib.request import Request, urlopen  # win
from lxml import etree

hfile = urlopen('').read()  # page URL goes here
tree = etree.HTML(hfile)
strs = tree.xpath("//title")
strs = strs[0]
# strs = (etree.tostring(strs))  # Chinese text is not displayed correctly
strs = (etree.tostring(strs, encoding="utf-8", pretty_print=True, method="html"))  # Chinese text is displayed correctly
print(strs)
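The effect of the `encoding` argument can be reproduced without a network request. The following is a minimal sketch, assuming Python 3 and a hypothetical in-memory page (`html_bytes`) standing in for the fetched one; the variable names are illustrative and not from the original post.

```python
from lxml import etree

# Hypothetical in-memory page, used in place of a fetched URL (assumption).
html_bytes = (
    "<html><head><title>百度一下，你就知道</title></head><body></body></html>"
).encode("utf-8")

# Tell the parser the input is UTF-8 so the bytes are decoded correctly.
parser = etree.HTMLParser(encoding="utf-8")
tree = etree.HTML(html_bytes, parser=parser)
title = tree.xpath("//title")[0]

# Default serialization is ASCII bytes, so non-ASCII characters
# are written as numeric character references such as &#30334;.
escaped = etree.tostring(title)

# With an explicit UTF-8 encoding the characters are kept. In Python 3
# the result is bytes, so decode it before printing.
utf8_text = etree.tostring(title, encoding="utf-8", method="html").decode("utf-8")

# encoding="unicode" returns a str directly, with no decode step.
unicode_text = etree.tostring(title, encoding="unicode", method="html")

print(escaped)
print(utf8_text)
print(unicode_text)
```

In Python 3, printing the undecoded result of `tostring(..., encoding="utf-8")` shows a `b'...'` bytes literal, which is why the sketch decodes it or asks for `encoding="unicode"`.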

If the tostring function is not configured correctly, the output will be:

&#30334;度一下，你就知道

whereas the correct output should be:

百度一下，你就知道

Source: grandyang
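The `&#30334;` in the misconfigured output is a decimal numeric character reference, that is, the Unicode code point of the character 百 written out as an entity. As a small sketch using only the Python 3 standard library (the variable names are illustrative), the reference can be decoded back to the readable text:

```python
import html

# The escaped title as it appears when tostring is not given an encoding.
escaped = "&#30334;度一下，你就知道"

# The number inside the reference is the decimal code point of the character.
code_point = ord("百")
print(code_point)  # 30334

# html.unescape turns numeric character references back into characters.
decoded = html.unescape(escaped)
print(decoded)  # 百度一下，你就知道
```

Decoding after the fact works, but passing an explicit `encoding` to `tostring` avoids producing the references in the first place.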

