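The news data is collected by crawlers built on the Scrapy framework. The crawl jobs are managed and monitored with Airflow, a workflow scheduling and monitoring platform; data collection can run distributed with CeleryExecutor or as multiple local processes with LocalExecutor. This post mainly covers installing and configuring Airflow.
The system used here is CentOS Linux release 7.4.1708 (Core), with kernel Linux version 3.10.0-693.2.2.el7.x86_64.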
1. Run the Anaconda installer
cd /opt
sh Anaconda3-5.2.0-Linux-x86_64.sh
(Press Enter until the >>> prompt appears, then type yes.)
/opt/anaconda3
(the installation directory)
2. Configure the environment variable
echo 'export PATH=/opt/anaconda3/bin:$PATH' >> /etc/profile
source /etc/profile
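MySQL serves as the Airflow metadata database and mainly stores Airflow's own records.
Redis serves as the Celery broker and result backend (RabbitMQ would also work); if you are not using CeleryExecutor, no Redis configuration is needed.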
Create the virtual environment news_push with Anaconda:
/opt/anaconda3/bin/conda create -y --name news_push python=3.6.5
Installing and configuring Airflow
mysql -uroot -p
Press Enter, then type the password.
create user 'airflow'@'localhost' identified by 'airflow';
create database airflow;
grant all privileges on airflow.* to 'airflow'@'localhost' identified by 'airflow';
flush privileges;
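Edit the Airflow configuration file:
vim /opt/newspush/airflow/airflow.cfg
Update the settings there: the metadata database connection and, when using CeleryExecutor, the Celery broker and result backend.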
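The exact values changed in airflow.cfg are not shown in the post, so the following is only a sketch of what such an edit might look like. It assumes the MySQL account created above (airflow/airflow on localhost, default port 3306), a local Redis on its default port 6379, and the Airflow 1.x option names `executor`, `sql_alchemy_conn`, `broker_url` and `celery_result_backend`; check them against the airflow.cfg generated by your own version. The helper names `airflow_overrides` and `apply_overrides` are hypothetical.

```python
import configparser


def airflow_overrides(use_celery=True):
    """Return the assumed settings to change, grouped by config section."""
    overrides = {
        "core": {
            "executor": "CeleryExecutor" if use_celery else "LocalExecutor",
            # Assumed URL, built from the MySQL user and database created earlier.
            "sql_alchemy_conn": "mysql://airflow:airflow@localhost:3306/airflow",
        }
    }
    if use_celery:
        # Redis is only needed for CeleryExecutor; host, port and DB numbers are assumptions.
        overrides["celery"] = {
            "broker_url": "redis://localhost:6379/0",
            "celery_result_backend": "redis://localhost:6379/1",
        }
    return overrides


def apply_overrides(cfg_path, overrides):
    """Write the overrides into an existing INI-style config file."""
    # interpolation=None keeps any '%' characters in the file literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    for section, values in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, value)
    with open(cfg_path, "w") as handle:
        parser.write(handle)
```

Calling `apply_overrides("/opt/newspush/airflow/airflow.cfg", airflow_overrides())` would apply the CeleryExecutor variant. Note that configparser drops comments when it rewrites a file, so editing the same keys by hand in vim, as the post does, keeps the file more readable.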
Install Celery support and the Redis component for Celery:
pip install airflow[celery]
pip install celery[redis]
Install MySQL-python:
yum install MySQL-python
pip install pymysql==0.7.1
With pymysql 0.8.0 or later, initialization prints a warning like this:
/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 chara
  result = self._query(query)
Initialize the database:
airflow initdb
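To check whether an installed pymysql falls in the range that produces this warning, the version threshold can be compared numerically. This is a minimal sketch; the helper names are hypothetical, and it only handles plain dotted numeric versions such as 0.7.1.

```python
def parse_version(text):
    """Turn '0.7.1' into (0, 7, 1); suffixes like 'rc1' are not handled."""
    return tuple(int(part) for part in text.split("."))


def triggers_warning(installed, threshold="0.8.0"):
    """Return True when the installed version is at or above the threshold."""
    # Tuples compare element by element, so '0.10.0' sorts above '0.8.0',
    # which a plain string comparison would get wrong.
    return parse_version(installed) >= parse_version(threshold)
```

The installed version can be read from the output of `pip show pymysql` and passed in as a string.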
Troubleshooting
Traceback (most recent call last):
  File "/opt/anaconda3/envs/news_push/bin/airflow", line 17, in <module>
    from airflow import configuration
  File "/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/airflow/__init__.py", line 30, in <module>
    from airflow import settings
  File "/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/airflow/settings.py", line 159, in <module>
    configure_orm()
  File "/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/airflow/settings.py", line 147, in configure_orm
    engine = create_engine(sql_alchemy_conn, **engine_args)
  File "/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/sqlalchemy/engine/__init__.py", line 424, in create_engine
    return strategy.create(*args, **kwargs)
  File "/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 81, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/opt/anaconda3/envs/news_push/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 102, in dbapi
    return __import__('MySQLdb')
ModuleNotFoundError: No module named 'MySQLdb'
vim /opt/anaconda3/envs/news_push/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/mysqldb.py
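(the .py file path from the last frame of the error message)
Add the following at the beginning of the file:
import pymysql
pymysql.install_as_MySQLdb()
Then initialize again: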
airflow initdb
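The fix works because `pymysql.install_as_MySQLdb()` registers the pymysql module under the name MySQLdb, so SQLAlchemy's `__import__('MySQLdb')` finds it. The sketch below illustrates that aliasing mechanism with a stand-in module, so it runs without pymysql installed; `install_alias`, `fake_pymysql` and `FakeMySQLdb` are hypothetical names used only for illustration.

```python
import sys
import types


def install_alias(real_name, alias):
    """Make `import <alias>` return the already-loaded module `real_name`."""
    sys.modules[alias] = sys.modules[real_name]


# Stand-in for pymysql so the sketch needs no third-party package.
fake_driver = types.ModuleType("fake_pymysql")
fake_driver.paramstyle = "pyformat"
sys.modules["fake_pymysql"] = fake_driver

# After aliasing, importing the alias resolves to the stand-in module,
# because the import system consults sys.modules first.
install_alias("fake_pymysql", "FakeMySQLdb")
resolved = __import__("FakeMySQLdb")
```

As an alternative to editing a file inside site-packages, SQLAlchemy also accepts a `mysql+pymysql://` URL prefix, which selects the pymysql driver directly; the original post does not use that approach, so treat it as an option to verify in your own setup.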