Deduplicating data in Python and storing it in a Redis database

import os
import re

import redis

# Connect to the Redis database
r = redis.StrictRedis(host='localhost', port=6379, db=0)
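
# Deduplicate using the Redis set 'total'
def add():
    # os.getcwd() returns a string representing the current working directory
    path = os.path.join(os.getcwd(), "data", "source_word")
    # os.listdir(path) returns a list with the names of all files and directories in path.
    # The list is in arbitrary order and omits the special entries '.' and '..',
    # even if they are present in the directory.
    word_file_list = os.listdir(path)
    # open(file, mode='r', buffering=-1, encoding=None, errors=None,
    #      newline=None, closefd=True, opener=None)
    # opens file and returns the corresponding file object; raises OSError if it cannot be opened.
    for word_file in word_file_list:
        # The source files are assumed to be UTF-8 encoded
        with open(os.path.join(path, word_file), encoding='utf-8') as f:
            for line in f:
                # re.findall(pattern, string, flags=0)
                # returns all non-overlapping matches of pattern in string as a list.
                # The string is scanned left to right and matches are returned in the order found.
                # If the pattern has one or more groups, a list of groups is returned
                # (a list of tuples if the pattern has more than one group).
                # Empty matches are included in the result.
                # [\u4e00-\u9fa5] matches Chinese characters
                chars = re.findall('[\u4e00-\u9fa5]', line)
                line = ''.join(chars)
                # The Redis SADD command adds one or more members to a set;
                # members that already exist in the set are ignored.
                if line:  # skip lines that contain no Chinese characters
                    r.sadd('total', line)


# Export as a training corpus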
def output_review(review_file_path):
    members = r.smembers('total')
    # redis-py returns set members as bytes by default, so decode before writing
    with open(review_file_path, 'w', encoding='utf-8') as f:
        for member in members:
            # Write one deduplicated entry per line
            f.write(member.decode('utf-8') + '\n')


add()
review_file_path = "/home/data" + "/review.txt"
output_review(review_file_path)
r.close()
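
The deduplication logic can be tried out without a running Redis server. The following is a minimal sketch, not the script above: it assumes a plain Python set as a stand-in for the Redis set `total`, and the helper names `extract_chinese` and `dedupe_lines` and the sample lines are hypothetical, introduced only for illustration. Adding an existing member to a Python set is a no-op, which mirrors how SADD ignores members already in the set.

```python
import re

# Matches runs of characters in the same range the script uses: [\u4e00-\u9fa5]
CHINESE_RUN = re.compile(r'[\u4e00-\u9fa5]+')


def extract_chinese(line):
    """Return only the Chinese characters of `line`, joined into one string."""
    return ''.join(CHINESE_RUN.findall(line))


def dedupe_lines(lines):
    """Collect the distinct, non-empty Chinese-only strings found in `lines`.

    A local set stands in for the Redis set here (an assumption made so the
    sketch runs without a server).
    """
    seen = set()
    for line in lines:
        text = extract_chinese(line)
        if text:  # skip lines that contained no Chinese characters
            seen.add(text)
    return seen


# Hypothetical sample input: two lines reduce to the same Chinese text
sample = ["你好, world!\n", "你好 world\n", "12345\n", "今天天氣不錯。\n"]
result = dedupe_lines(sample)
print(sorted(result))
```

Because punctuation, digits, and Latin letters are stripped before insertion, lines that differ only in those characters collapse into a single set member.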

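Extracting JSON data in Python and saving it to CSV

import json import csv with open e 道路資料.json encoding utf 8 as f json file json.load f select the contents of features in the JSON table arr json file features declare an empty two-dimensional list with 6 columns to store the data csv...

Generating activation codes in Python and storing them in MySQL

Import pymysql import pymysql connect to the database conn pymysql.connect user root password password database test cursor conn.cursor execute the MySQL statement cursor.execute create tabl...

Fetching data on a schedule and storing it in a database

This part mainly implements the program that fetches data on a schedule; the fetching and storage programs are already written. Historical data is fetched from tushare for trading days, Monday to Friday, at 15:30 each day. It mainly uses the schedule module to run the task on a timer, as follows: import schedule import time from datetim...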
