Implementing a Python script to crawl font files

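This article shows how to crawl font files with a Python script and shares examples for two different sites. It should be a useful reference for anyone who needs it, so let's take a look.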

Implementation

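#coding:utf-8
#Python 2 script: it relies on urllib2, cookielib, xrange and the print statement
import urllib2,cookielib,sys,re,os,zipfile,shutil
import numpy as np
#** login (the login code and the site address did not survive in this copy of the article)
items=[]#download links collected by search()
def search(host):
    #the lines that fetch the listing page are missing; they are reconstructed here from the download loop below
    request=urllib2.Request(host)
    response=urllib2.urlopen(request)
    html=response.read()
    html=html.replace('\n',' ')#remove every line break, because the regular expression matches on a single line
    #the HTML tags of this pattern were stripped from the article; it needs two groups (link, inner HTML) for the unpacking below
    urls=re.findall(r'(.*?)',html)
    for i in urls:
        url,inner=i
        if not re.findall(r'download ',inner)==[] and re.findall(r'offsite',inner)==[] and url not in items:
            items.append(url)
for i in xrange(15):
    host=''+str(i*50)+'?filter%5bdownload%5d=local'#the site prefix is elided in the source
    search(host)
if not os.path.exists('ttf'):
    os.mkdir('ttf')
os.chdir('ttf')
def unzip(rawfile,outputdir):
    if zipfile.is_zipfile(rawfile):
        print 'yes'
        fz=zipfile.ZipFile(rawfile,'r')
        for files in fz.namelist():
            print(files) #print each entry in the zip archive
            fz.extract(files,outputdir)#extract the file
        fz.close()
    else:
        print 'no'
for i in items:
    print i
    request=urllib2.Request(''+i)
    response=urllib2.urlopen(request)
    html=response.read()
    name=i.split('/')[-1]+'.zip'
    f=open(name,'wb')#binary mode, since a zip archive is not text
    f.write(html)
    f.close()#remember to close the file, otherwise unzip below will fail
    unzip(name,'./')
    os.remove(name)
print os.listdir(os.getcwd())
os.chdir('../')
files=os.listdir('ttf/')
for i in files:#delete useless files
    if not (i.split('.')[-1]=='ttf' or i.split('.')[-1]=='otf'):
        if os.path.isdir('ttf/'+i):
            shutil.rmtree('ttf/'+i)#os.removedirs only handles empty directories
        else:
            os.remove('ttf/'+i)
print len(os.listdir('ttf/'))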
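The link-collection step can be tried without touching any site. The following is only a sketch in Python 3, not the author's code: the anchor markup in `SAMPLE_HTML`, the pattern `LINK_RE`, and the function name `collect_items` are all assumptions made for illustration, because the real pattern and page structure were not preserved. It keeps the filter the script uses: the inner text must contain `download `, must not contain `offsite`, and the link must not already be in the list.

```python
import re

# Hypothetical markup: the real page structure is not shown in the article.
SAMPLE_HTML = """
<a href="/font/alpha">Alpha <span>download </span></a>
<a href="/font/beta">Beta <span>download </span> <i>offsite</i></a>
<a href="/font/alpha">Alpha <span>download </span></a>
<a href="/about">About</a>
"""

# Assumed pattern with two groups: the link target and the anchor's inner HTML.
LINK_RE = re.compile(r'<a href="(.*?)"[^>]*>(.*?)</a>')


def collect_items(html, items=None):
    """Return link targets whose inner HTML mentions 'download ' but not 'offsite'."""
    items = [] if items is None else items
    html = html.replace("\n", " ")  # join lines so '.' is never stopped by a line break
    for url, inner in LINK_RE.findall(html):
        if "download " in inner and "offsite" not in inner and url not in items:
            items.append(url)
    return items


print(collect_items(SAMPLE_HTML))  # → ['/font/alpha']
```

Passing an existing list as `items` lets several listing pages feed one de-duplicated list, which is what the paging loop in the script relies on.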

The first script pulled down more than 2,000 fonts covering a good range of styles, which is a nice result.
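The save-then-unzip step is where the script is most fragile: the archive has to be written in binary mode and closed before it is opened as a zip. Below is a minimal Python 3 sketch of that step under stated assumptions. The helper name `save_and_extract` is hypothetical, and an archive built in memory stands in for the downloaded response body, so no network access is needed.

```python
import io
import os
import tempfile
import zipfile


def save_and_extract(data, name, outputdir):
    """Write downloaded bytes to `name`, extract the archive, then delete it."""
    path = os.path.join(outputdir, name)
    # 'wb' because a zip archive is binary; the with-block closes the file
    # before zipfile tries to read it.
    with open(path, "wb") as f:
        f.write(data)
    extracted = []
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path, "r") as fz:
            for member in fz.namelist():
                fz.extract(member, outputdir)
                extracted.append(member)
    os.remove(path)  # the archive itself is no longer needed
    return extracted


# Build a small archive in memory to stand in for a downloaded response body.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo.ttf", b"not a real font")
    zf.writestr("readme.txt", b"license text")

workdir = tempfile.mkdtemp()
result = save_and_extract(buf.getvalue(), "demo.zip", workdir)
print(result)
print(sorted(os.listdir(workdir)))
```

Data that is not a zip archive is simply written and deleted again, and the function returns an empty list, which mirrors the `'no'` branch of `unzip`.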

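#second site: the imports, the link collection and the opening lines of unzip() are missing from this copy
def unzip(rawfile,outputdir):#opening lines restored from the first script
    if zipfile.is_zipfile(rawfile):
        print 'yes'
        fz=zipfile.ZipFile(rawfile,'r')
        for files in fz.namelist():
            print(files) #print each entry in the zip archive
            fz.extract(files,outputdir)
        fz.close()
    else:
        print 'no'
for i in items:
    print i
    request=urllib2.Request(i)
    response=urllib2.urlopen(request)
    html=response.read()
    name=i.split('=')[-1]+'.zip'
    f=open(name,'wb')#binary mode, since a zip archive is not text
    f.write(html)
    f.close()
    unzip(name,'./')
    os.remove(name)
print os.listdir(os.getcwd())
for root ,dire,fis in os.walk('./'):#walk the folder recursively
    for i in fis:
        if not (i.split('.')[-1]=='ttf' or i.split('.')[-1]=='otf'):
            os.remove(os.path.join(root,i))#join root and name; plain concatenation breaks inside subfolders
            print i
for i in os.listdir('./'):
    if os.path.isdir(i) and not os.listdir(i):#os.rmdir only removes empty directories
        os.rmdir(i)
os.chdir('../')

The overall procedure is much the same as for the first site. After running for a few dozen minutes, it downloaded more than 4,000 fonts.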
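The recursive cleanup can be checked on a throwaway directory before pointing it at real downloads. This is a Python 3 sketch with assumptions of its own: the function name `prune_non_fonts` is hypothetical, the extension check ignores case (the script compares against lowercase `ttf` and `otf` only), and the walk runs bottom-up so that folders emptied along the way can be removed in the same pass.

```python
import os
import tempfile

FONT_EXTENSIONS = (".ttf", ".otf")


def prune_non_fonts(top):
    """Delete every file under `top` that is not a .ttf/.otf, then drop empty folders."""
    removed = []
    # topdown=False visits children first, so a folder emptied here
    # can be removed before its parent is examined.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            if not name.lower().endswith(FONT_EXTENSIONS):
                os.remove(os.path.join(root, name))
                removed.append(name)
        if root != top and not os.listdir(root):
            os.rmdir(root)
    return sorted(removed)


# Throwaway tree: two fonts, two unrelated files, one nested folder.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "pack", "docs"))
for rel in ("a.ttf", "notes.txt",
            os.path.join("pack", "b.OTF"),
            os.path.join("pack", "docs", "license.txt")):
    with open(os.path.join(top, rel), "wb") as f:
        f.write(b"x")

removed = prune_non_fonts(top)
print(removed)
print(sorted(os.listdir(top)))
```

Folders that still hold a font are kept, so nothing is lost when an archive unpacks into subfolders.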

Summary
