Python: collect all links, store them in a database table, and open them one by one

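This script uses Python to collect every link on a web page, store the links in a sqlite3 table, and open them in a browser. If the table already exists, the links are read directly from the table and opened.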
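The "read from the table if it already exists, otherwise fetch and store" flow can be sketched as follows. This is a minimal sketch in Python 3, not the original script's code: the helper names (`table_exists`, `save_urls`, `load_urls`, `get_or_fetch`), the single `url` column, and the `fetch` callback are all assumptions made for illustration.

```python
import sqlite3


def table_exists(conn, table):
    """Return True if a table with this name is already in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    return row is not None


def save_urls(conn, table, urls):
    """Create the table and store one link per row."""
    # Table names cannot be bound as SQL parameters, so the name is quoted
    # by hand. This assumes the name was already sanitized and contains
    # no double quotes.
    conn.execute('CREATE TABLE "%s" (url TEXT)' % table)
    conn.executemany(
        'INSERT INTO "%s" (url) VALUES (?)' % table, [(u,) for u in urls]
    )
    conn.commit()


def load_urls(conn, table):
    """Read the stored links back in insertion order."""
    cursor = conn.execute('SELECT url FROM "%s" ORDER BY rowid' % table)
    return [row[0] for row in cursor]


def get_or_fetch(conn, table, fetch):
    """Use the cached table if present; otherwise call fetch() and store it."""
    if table_exists(conn, table):
        return load_urls(conn, table)
    urls = fetch()
    save_urls(conn, table, urls)
    return urls
```

With this shape, `fetch` is only called the first time a given table name is seen; later runs go straight to the database.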

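The table name is derived from the URL: strip the leading "http://", the trailing "/", and the port number, then replace every "." and "/" in between with "_".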
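As an illustration of that naming rule, here is a small sketch in Python 3. The function name `table_name_from_url` and the example URL are assumptions, and the port is only stripped when it directly follows the host name.

```python
import re


def table_name_from_url(url):
    """Turn a URL into a table name following the rule described above."""
    name = re.sub(r'^http://', '', url)               # drop the leading scheme
    name = name.rstrip('/')                           # drop trailing slash(es)
    name = re.sub(r'^([^/:]+):\d+', r'\1', name)      # drop the port after the host
    return re.sub(r'[./]', '_', name)                 # "." and "/" become "_"


print(table_name_from_url("http://www.example.com:8080/news/"))
# www_example_com_news
```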

Python libraries used:

sgmllib, urllib: fetching and parsing web pages

re: regular expressions

sqlite3: database tables

subprocess: child processes
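The listing in this post does not reach the point where subprocess is used, so the following is only a guess at how links might be opened one by one through child processes. It is a Python 3 sketch; the function name `open_in_browser`, the `browser_cmd` parameter, and the `delay` between launches are assumptions.

```python
import subprocess
import time


def open_in_browser(urls, browser_cmd, delay=0.0):
    """Launch one child process per URL and return the Popen objects.

    browser_cmd is the command as a list, e.g. ["firefox"] (hypothetical;
    use whatever browser executable exists on your system).
    """
    procs = []
    for url in urls:
        # The URL is passed as the last command-line argument.
        procs.append(subprocess.Popen(list(browser_cmd) + [url]))
        if delay:
            time.sleep(delay)  # pause so the browser is not flooded with tabs
    return procs
```

If controlling the child processes is not needed, the standard library's `webbrowser.open(url)` is a simpler way to open a link in the default browser.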

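#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 only: sgmllib was removed in Python 3.

from sgmllib import SGMLParser
import urllib, re
import sys, os, string, time
import sqlite3
import subprocess, signal


class urllist(SGMLParser):
    # Collects the href value of every <a> tag fed to the parser.
    def reset(self):
        self.urls = []
        SGMLParser.reset(self)

    def start_a(self, attrs):
        href = [v for k, v in attrs if k == 'href']
        if href:
            self.urls.extend(href)


def get_urls(url):
    try:
        usock = urllib.urlopen(url)
    except IOError:
        print "get url except " + url
        return
    result = []
    parser = urllist()
    parser.feed(usock.read())
    usock.close()
    parser.close()
    urls = parser.urls
    for url in urls:
        # Filter by regular expression: keep absolute http:// links and
        # links that start with ../../ (dots escaped so they match literally).
        if len(re.findall(r'^http://', url)) > 0 or len(re.findall(r'^\.\./\.\./', url)) > 0:
            # Assumption: the listing is cut off after the condition above;
            # collecting the matches into result is a guess at the missing body.
            result.append(url)
    return result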
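Because sgmllib and `urllib.urlopen` only exist in Python 2, the same link extraction needs different modules on Python 3. The sketch below swaps in `html.parser.HTMLParser` from the standard library; it is not the original author's code, and the names `LinkCollector`, `extract_links`, and `keep_wanted` are made up for illustration. It works on an HTML string, which on Python 3 would typically come from `urllib.request.urlopen(url).read()` after decoding.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            self.urls.extend(v for k, v in attrs if k == "href" and v)


def extract_links(html):
    """Return all href values found in <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


def keep_wanted(urls):
    """Keep absolute http:// links and links that start with ../../"""
    pattern = re.compile(r'^http://|^\.\./\.\./')
    return [u for u in urls if pattern.search(u)]
```

For example, `keep_wanted(extract_links(page_html))` gives the filtered list that would then be stored in the table, where `page_html` stands for the downloaded page text.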
