Scraping historical Shuangseqiu (double-color ball) lottery data
Required package: BeautifulSoup (install with pip install beautifulsoup4)
The script is below. After it runs, the scraped data is JSON-encoded and saved to a file named data.
# -*- coding: utf-8 -*-
import urllib
import json
from bs4 import BeautifulSoup

li = []
for year in range(2003, 2015):
    print year
    # the original URL was lost here; it took the year as a format argument
    htmlcon = urllib.urlopen('' % year)
    html = htmlcon.read()
    htmlcon.close()
    soup = BeautifulSoup(html)
    table_html_set = soup.find_all(id='draw_list')
    num_tuple_list = []
    for table_html in table_html_set:
        tr_html_set = table_html.find_all('tr')
        for tr_html in tr_html_set:
            # the original attrs filter selecting the number spans was lost
            span_html_set = tr_html.find_all('span')
            num_tuple = tuple([int(x.text) for x in span_html_set])
            if num_tuple:
                num_tuple_list.append(num_tuple)
    print "count: %s" % len(num_tuple_list)
    li.extend(num_tuple_list)

fl = open('data', 'w')
fl.write(json.dumps(li))
fl.close()
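One detail worth noting: JSON has no tuple type, so the tuples collected above are serialized as arrays and come back as lists when the file is re-read. That is why the reader script below converts each entry back to a tuple. A minimal round-trip sketch (the sample draw values are made up for illustration):

```python
import json

# Hypothetical sample draws: six red balls plus one blue ball each.
draws = [(1, 5, 12, 18, 22, 30, 7), (2, 6, 11, 19, 23, 31, 4)]

encoded = json.dumps(draws)
decoded = json.loads(encoded)

# The round trip turns tuples into lists...
assert decoded == [list(t) for t in draws]

# ...so the reader has to re-tuple them to recover the original shape.
restored = [tuple(x) for x in decoded]
assert restored == draws
```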
Script to read the data file, sort it, and write it to ticket.txt. You can drop a stop() breakpoint in to analyze the data interactively, which saves re-running the scraper every time.
import json
try:
    from IPython import embed as stop
except ImportError:
    from pdb import set_trace as stop

fl = open('data')
li_json = fl.read()
fl.close()

li = json.loads(li_json)
# JSON arrays come back as lists; convert each draw back to a tuple
li = [tuple(x) for x in li]
li.sort(lambda x, y: cmp(x, y))

fl = open('ticket.txt', 'w')
for item in li:
    line = ",".join([str(x) for x in item])
    fl.write("%s\n" % line)
fl.close()
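With the sorted draws in hand, a common next step is frequency analysis. A minimal sketch using collections.Counter, assuming each tuple holds six red balls followed by one blue ball (the sample draws here are made-up placeholders, not real results):

```python
from collections import Counter

# Hypothetical sample draws: six red balls then one blue ball.
draws = [
    (1, 5, 12, 18, 22, 30, 7),
    (1, 6, 12, 19, 23, 31, 7),
    (2, 5, 13, 18, 24, 32, 8),
]

# Count how often each red ball appears across all draws.
red_counts = Counter(n for draw in draws for n in draw[:6])
# Count the blue ball (last position) separately.
blue_counts = Counter(draw[6] for draw in draws)

print(red_counts.most_common(3))
print(blue_counts.most_common(1))
```

In the real analysis, draws would be the list loaded from the data file instead of the inline sample.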