How to run a crawler script on a schedule in Python

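When crawling data, we often want the crawler to run on a schedule, usually in the early morning, which takes a lot of load off the server. That calls for a scheduled task. This article uses the Scrapy framework and shows one way to run a spider on a schedule.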

Disclaimer: this is not necessarily the best method, but it definitely gets the job done.

There are many tutorials online, and the most common approach is the following:

import time
import datetime
from scrapy.cmdline import execute


def dosth():
    # Run the "lcp" spider inside this Python process
    execute(['scrapy', 'crawl', 'lcp'])


# Set h and m to the time of day at which the crawl should run
def time_ti(h=20, m=24):
    while True:
        now = datetime.datetime.now()
        # print(now.hour, now.minute)
        if now.hour == h and now.minute == m:
            dosth()
        # Check once every 60 seconds
        time.sleep(60)


time_ti()

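However, the method above runs only once and then stops, so it does not meet our requirement at all. The reason is that scrapy.cmdline.execute() calls sys.exit() once the crawl finishes, which terminates the whole Python process, including the while loop in time_ti().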
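To make the difference concrete, here is a small illustrative sketch (not the author's code). in_process_job is a hypothetical stand-in for calling execute() directly: it calls sys.exit(), so the loop stops at the first call. subprocess_job instead runs a child Python process that exits the same way; CHILD_CMD is an assumed placeholder for ["scrapy", "crawl", "lcp"], chosen so the sketch runs without Scrapy installed. Because only the child process exits, the loop keeps going.

```python
import subprocess
import sys

# Hypothetical stand-in for "scrapy crawl lcp": a child Python process
# that ends with sys.exit(0), the way scrapy.cmdline.execute() does.
CHILD_CMD = [sys.executable, "-c", "import sys; sys.exit(0)"]


def in_process_job():
    """Mimic calling execute() directly: sys.exit() raises SystemExit here."""
    sys.exit(0)


def subprocess_job():
    """Run the job in a child process; its sys.exit() only ends the child."""
    subprocess.run(CHILD_CMD)


def count_runs(job, attempts):
    """Call job() up to `attempts` times and return how many calls returned.

    SystemExit is caught only so the demo can report the result; in a real
    script it would terminate the interpreter, loop and all.
    """
    runs = 0
    try:
        for _ in range(attempts):
            job()
            runs += 1
    except SystemExit:
        pass
    return runs


if __name__ == "__main__":
    print(count_runs(in_process_job, 3))  # prints 0: the first call exits
    print(count_runs(subprocess_job, 3))  # prints 3: the loop survives each run
```

In a real script CHILD_CMD would be ["scrapy", "crawl", "lcp"]. subprocess.run() returns a CompletedProcess whose returncode can be logged if you want to notice failed crawls.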

So here is the simplest approach:

import os
import time


def dingshi():
    while True:
        # lcp is the name of our spider; os.system blocks until the crawl finishes
        os.system("scrapy crawl lcp")
        # Wait 60 seconds, then start the next crawl
        time.sleep(60)


dingshi()

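On the Linux server, we simply run this dingshi() function once. It keeps working because os.system() starts each crawl as a separate process, so when the crawl exits, only that child process ends and the loop carries on.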
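The dingshi() loop starts a new crawl 60 seconds after the previous one ends, whereas the goal stated at the start was to run at a fixed time such as the early morning. One way to get that, sketched here under the assumption that the server's local clock is the one you want, is to sleep until the next occurrence of a given hour and minute instead of polling once a minute. seconds_until and run_daily are hypothetical helper names, and 3:00 is only an example time. Unlike the minute-polling loop in time_ti(), this cannot skip the target minute when successive time.sleep(60) calls drift just past it.

```python
import datetime
import os
import time


def seconds_until(hour, minute, now=None):
    """Return how many seconds remain until the next hour:minute (local time).

    If that time has already passed today, or is exactly now, the target
    is the same time tomorrow, so the result is always positive.
    """
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(command, hour=3, minute=0):
    """Hypothetical helper: run a shell command once a day at hour:minute.

    Loops forever, so call it only from the script you leave running.
    """
    while True:
        time.sleep(seconds_until(hour, minute))
        # Same call as in dingshi(): blocks until the crawl finishes
        os.system(command)
```

For example, run_daily("scrapy crawl lcp", hour=3) would start the lcp spider at 03:00 every day. The delay is recomputed after each crawl, so a crawl that ends at 03:40 simply waits for the next day's 03:00. Because the sketch uses naive local time, a daylight-saving change can shift one run by an hour.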

