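1. Bike Sharing Demand
Kaggle:
Goal: predict the number of bike rentals from features such as date, time, weather, and temperature.
Preprocessing: 1. From the datetime field (year, month, day, hour, minute, second), extract the year, month, day of the week, and hour.
2. season and weather are categorical codes, so encode them as dummy variables.
Model selection:
This is a regression problem: 1. RandomForestRegressor
2. GradientBoostingRegressor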
# -*- coding: utf-8 -*-
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')

# Select features
selected_features = ['datetime', 'season', 'holiday', 'workingday',
                     'weather', 'temp', 'atemp', 'humidity', 'windspeed']
# x_train = train[selected_features]
y_train = train["count"]
result = test["datetime"]

# Feature engineering
train_dt = pd.DatetimeIndex(train.datetime)
test_dt = pd.DatetimeIndex(test.datetime)

# One-hot encode the categorical columns; the prefix keeps the column names unique strings
season = pd.get_dummies(train.season, prefix='season')
weather = pd.get_dummies(train.weather, prefix='weather')
x_train = pd.concat([season, weather], axis=1)
x_test = pd.concat([pd.get_dummies(test.season, prefix='season'),
                    pd.get_dummies(test.weather, prefix='weather')], axis=1)
# Give the test set exactly the same dummy columns as the training set
x_test = x_test.reindex(columns=x_train.columns, fill_value=0)

# Calendar features taken from the datetime column
x_train['month'] = train_dt.month
x_test['month'] = test_dt.month
x_train['day'] = train_dt.dayofweek
x_test['day'] = test_dt.dayofweek
x_train['hour'] = train_dt.hour
x_test['hour'] = test_dt.hour

# Copy the remaining features over unchanged
for col in ['holiday', 'workingday', 'temp', 'humidity', 'windspeed']:
    x_train[col] = train[col]
    x_test[col] = test[col]
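
from sklearn.ensemble import GradientBoostingRegressor

clf = GradientBoostingRegressor(n_estimators=200, max_depth=3)
# Fit on log1p(count) so that the expm1 below maps predictions back to counts
clf.fit(x_train, np.log1p(y_train))
result = clf.predict(x_test)
result = np.expm1(result)
# Build the submission frame from the test timestamps and the predictions
df = pd.DataFrame({'datetime': test['datetime'], 'count': result})
df.to_csv('results1.csv', index=False, columns=['datetime', 'count'])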

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(x_train, y_train)
y_predict = rf.predict(x_test).astype(int)
# Build the submission frame from the test timestamps and the predictions
df = pd.DataFrame({'datetime': test['datetime'], 'count': y_predict})
df.to_csv('result2.csv', index=False, columns=['datetime', 'count'])
# predictions_file = open("randomforestregssor.csv", "wb")
# open_file_object = csv.writer(predictions_file)
# open_file_object.writerow(["datetime", "count"])
# open_file_object.writerows(zip(res_time, y_predict))
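
Kaggle scores this competition with RMSLE, and the np.expm1 call on the gradient boosting predictions pairs naturally with a target transformed by log1p. The following is a rough sketch, not part of the original script, of how the two regressors could be compared on a hold-out split before submitting. The names rmsle and compare_models and the synthetic demo data are my own assumptions for illustration; with the real data you would pass x_train and y_train instead.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; assumes non-negative inputs."""
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def compare_models(X, y, seed=0):
    """Fit both regressors on log1p(y) and score them on a 20% hold-out split."""
    X_fit, X_val, y_fit, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    models = {
        "gbr": GradientBoostingRegressor(
            n_estimators=200, max_depth=3, random_state=seed),
        "rf": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_fit, np.log1p(y_fit))
        # Undo the log transform and clip so the metric never sees negatives
        pred = np.clip(np.expm1(model.predict(X_val)), 0, None)
        scores[name] = rmsle(y_val, pred)
    return scores


# Synthetic stand-in for the real features and counts (illustration only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(300, 4))
y_demo = np.round(50 * X_demo[:, 0] + 20 * X_demo[:, 1]
                  + rng.uniform(0, 5, size=300))
scores = compare_models(X_demo, y_demo)
print(scores)
```

A lower score is better. The hold-out split here is random; since the data is a time series, a split by date may give a more honest estimate.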
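2. Daily News for Stock Market Prediction
Using historical data, namely the 25 most-clicked news headlines for each day together with whether the stock market rose or fell that day, predict the market's future rise or fall.
Method 1:
1. Merge the 25 headlines into a single document, preprocess each word (remove special characters, drop words containing digits, remove stop words, lowercase, stem), then extract features with TF-IDF and train an SVM.
2. Extract features with word2vec.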
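
As a minimal sketch of step 1 (merge, clean, TF-IDF, SVM), assuming scikit-learn is available: the stop word list, the naive_stem helper (a crude stand-in for a real stemmer such as Porter), the toy headlines, and the labels (1 = up, 0 = down) are all made up for illustration and are not from the dataset. The linear-kernel SVC is also just one possible choice of SVM.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny illustrative stop word list; a real run would use a fuller one
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def naive_stem(word):
    """Crude suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(headlines):
    """Merge one day's headlines into one document and clean each token."""
    text = " ".join(headlines).lower()
    # Replace special characters with spaces, then split into tokens
    tokens = re.sub(r"[^a-z0-9\s]", " ", text).split()
    kept = [naive_stem(t) for t in tokens
            if not any(c.isdigit() for c in t) and t not in STOPWORDS]
    return " ".join(kept)


# Hypothetical toy data: one list of headlines per day, one label per day
days = [
    ["Markets surge on strong earnings", "Tech shares climbing"],
    ["Stocks plunge amid crisis fears", "Banks report heavy losses"],
    ["Investors cheer strong growth", "Shares surge again"],
    ["Crisis deepens, losses mount", "Markets plunge"],
]
labels = [1, 0, 1, 0]

docs = [preprocess(day) for day in days]
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(docs, labels)
pred = model.predict(docs)
print(pred)
```

With the real data, each day's 25 headlines would replace the toy lists, and the model should be evaluated on later dates than it was trained on.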
Implementation:
3.