Python: Validating a Blog URL List

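After building the blog list, some validation is needed, such as checking for duplicate or missing entries. To do that, first extract every URL from the list.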

Code:

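import re

# Read the pasted page content that contains all the blog URLs.
with open('d:\\in.txt', 'r', encoding="utf-8") as fp:
    html = fp.read()

# The regex pattern is empty in the source; supply one that matches your blog URLs.
all_url = re.findall('', html, re.IGNORECASE)
all_url = list(set(all_url))  # drop duplicate URLs

# Write one URL per line and count how many were written.
s = 0
with open('d:\\csdn.txt', 'w') as fp:
    for each in all_url:
        fp.write(each + '\n')
        s = s + 1
print(s)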

Copy the page content that contains all the blog URLs, paste it into in.txt, and run the program.
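The script above removes duplicates with set(), so it never reports which URLs were repeated. The following is a minimal sketch of one way to list them, not the original author's method. The URL pattern, the function names extract_urls and find_duplicates, and the example.com sample data are all assumptions made for illustration; the pattern used in the original script is not known.

```python
import re
from collections import Counter

# Assumed pattern: any http/https URL, ending at whitespace, a quote or an angle bracket.
# This is an illustrative guess, not the pattern from the original script.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+', re.IGNORECASE)


def extract_urls(text):
    """Return every URL found in text, in order, duplicates included."""
    return URL_PATTERN.findall(text)


def find_duplicates(urls):
    """Return a dict mapping each repeated URL to its number of occurrences."""
    counts = Counter(urls)
    return {url: n for url, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of in.txt.
    sample = (
        '<a href="https://example.com/post/1">one</a>\n'
        '<a href="https://example.com/post/2">two</a>\n'
        '<a href="https://example.com/post/1">one again</a>\n'
    )
    urls = extract_urls(sample)
    print(len(urls), "URLs found,", len(set(urls)), "unique")
    for url, n in find_duplicates(urls).items():
        print(url, "appears", n, "times")
```

Counting before deduplicating keeps the information that set() throws away, which is what the duplicate check needs.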

Once you have all the URLs, you can also sort them in Excel and use Beyond Compare to find the differences between lists.
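The same comparison can be done in Python with set differences. This is a rough sketch under assumptions: the helper names read_urls and compare_url_lists are hypothetical, each input file is assumed to hold one URL per line, and the sample lists are made up.

```python
def read_urls(path):
    """Read one URL per line from a UTF-8 text file, skipping blank lines."""
    with open(path, 'r', encoding='utf-8') as fp:
        return [line.strip() for line in fp if line.strip()]


def compare_url_lists(expected, actual):
    """Return (missing, extra): URLs only in expected and URLs only in actual, each sorted."""
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)
    extra = sorted(actual_set - expected_set)
    return missing, extra


if __name__ == "__main__":
    # Hypothetical lists; in practice they could come from read_urls() on two files.
    expected = [
        "https://example.com/post/1",
        "https://example.com/post/2",
        "https://example.com/post/3",
    ]
    actual = [
        "https://example.com/post/3",
        "https://example.com/post/1",
        "https://example.com/post/4",
    ]
    missing, extra = compare_url_lists(expected, actual)
    print("missing:", missing)
    print("extra:", extra)
```

Sorting the two result lists makes the output stable between runs, so it can be saved and diffed later.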

