Data cleaning in R

# tran is the data frame name
# find columns with NA or NaN
colSums(is.na(tran))
# keep the complete observations (rows with all variables present)
tran.comp <- tran[complete.cases(tran), ]
# class and summary statistics of the whole dataset
summary(tran)
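
The checks above can be tried on a small made-up data frame. This is only a sketch: the name `tran` follows the text, while the column names and values are invented for illustration.

```r
# Toy data frame standing in for `tran` (values are invented for illustration)
tran <- data.frame(
  id     = 1:4,
  amount = c(10.5, NA, 7.2, NaN),
  city   = c("Taipei", "Tainan", NA, "Hsinchu"),
  stringsAsFactors = FALSE
)

# Count missing values per column; is.na() is TRUE for both NA and NaN
na_counts <- colSums(is.na(tran))

# Names of the columns that contain at least one missing value
na_cols <- names(na_counts)[na_counts > 0]

# Keep only the rows in which every variable is present
tran.comp <- tran[complete.cases(tran), ]
```

Here `na_cols` lists the columns that need attention, and `tran.comp` keeps only the fully observed rows.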

> class(dataname$columnname)

# the unique values (levels) of the column

> levels(dataname$columnname)

# the number of unique values
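
A minimal sketch of these column checks, assuming the column is a factor. `dataname` and `columnname` are the placeholders from the text; the values are invented. `nlevels()` and `length(unique())` are shown here as two common ways to count unique values, not necessarily the call the original notes used.

```r
# Toy data frame; `dataname` and `columnname` are placeholder names
dataname <- data.frame(
  columnname = factor(c("a", "b", "a", "c", "b"))
)

col_class  <- class(dataname$columnname)    # class of the column
col_levels <- levels(dataname$columnname)   # the distinct levels, sorted
n_levels   <- nlevels(dataname$columnname)  # how many levels there are

# levels() returns NULL for non-factor columns; unique() works on any vector
n_unique <- length(unique(as.character(dataname$columnname)))
```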

**Excel's default date storage format**

as.Date(-5252, origin = "1900-01-01")
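
Excel stores dates as serial day numbers. According to the examples in R's `?as.Date` help page, dates after 1901 exported from Excel on Windows are usually converted with the origin `"1899-12-30"`, because Excel wrongly treats 1900 as a leap year. The sketch below uses the serial value from that help page; the variable names are assumptions.

```r
# Serial day number as exported by Excel on Windows (example value)
excel_serial <- 35981

# Origin "1899-12-30" compensates for Excel counting 1900 as a leap year
converted <- as.Date(excel_serial, origin = "1899-12-30")

format(converted)  # "1998-07-05"
```

Files saved with the 1904 date system (the older Mac default) would need the origin `"1904-01-01"` instead.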

# convert character into date

Base R can convert it to an object of class POSIXct, but this adds a date to the time:

# df has the columns id and time
as.POSIXct(df$time, format = "%H:%M:%S")
[1] "2012-08-20 00:00:01 CEST" "2012-08-20 01:02:00 CEST"
[3] "2012-08-20 09:30:01 CEST" "2012-08-20 14:15:25 CEST"
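
A minimal sketch of this conversion, assuming `df` holds clock times as character strings (the values mirror the output above; `tz = "UTC"` is an assumption that keeps the result independent of the local time zone). If only the clock time is wanted for display, `format()` can drop the date again.

```r
# Toy data frame with the two columns named in the text (values assumed)
df <- data.frame(
  id   = 1:4,
  time = c("00:00:01", "01:02:00", "09:30:01", "14:15:25"),
  stringsAsFactors = FALSE
)

# Parsing a time-only string fills in the current date
parsed <- as.POSIXct(df$time, format = "%H:%M:%S", tz = "UTC")

# format() returns character strings that contain the clock time only
time_only <- format(parsed, "%H:%M:%S")
```

Note that `time_only` is a character vector again, so it suits printing and export rather than date arithmetic.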

install.packages("dplyr")
library(dplyr)

dataset: cd

> distinct(cd)
# A tibble: 4,000 x 13
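
A small sketch of removing duplicate rows, assuming dplyr is installed. The name `cd` follows the text; the columns and values are invented. A base R equivalent is included for comparison.

```r
library(dplyr)

# Toy tibble standing in for `cd`; rows 1 and 3 are exact duplicates
cd <- tibble(
  id    = c(1, 2, 1, 3),
  title = c("A", "B", "A", "C")
)

# distinct() keeps the first occurrence of each unique row
cd_unique <- distinct(cd)

# Base R equivalent that needs no extra package
cd_unique_base <- cd[!duplicated(cd), ]
```

Comparing `nrow(cd)` with `nrow(cd_unique)` shows how many duplicate rows were dropped.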
