Understanding factors in R

2021-08-10 16:32:20

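In R, a factor represents a label or a level, that is, a discrete value. For example, if the number of people is recorded as 1, 2, 3, 4, ..., the factor's values are 1, 2, 3, 4, .... Covariate levels described as high, medium, and low are also a factor, because each of them is a discrete value. A numeric vector, by contrast, holds quantities such as 1, 1.1, 1.2, ... that can be used in arithmetic, which a factor cannot. Put simply: a factor's values are discrete points, while numeric values lie on a continuous scale. In R, if the numbers in your data stand for categories, you need to convert the numeric vector to a factor after importing the data; from then on the values are no longer treated as numbers during computation but merely as "symbols".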
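The difference can be seen in a minimal sketch. The vector `codes` and the other variable names are made up for illustration and are not part of the original example.

```r
# Hypothetical category codes stored as plain numbers
codes <- c(1, 2, 2, 3)

# As a numeric vector, the values take part in arithmetic
codes_mean <- mean(codes)

# As a factor, the same values are only labels
f <- factor(codes)
f_is_numeric <- is.numeric(f)

# Arithmetic on a factor is not meaningful: R warns and returns NA
f_plus_one <- suppressWarnings(f + 1)

print(codes_mean)     # 2
print(f_is_numeric)   # FALSE
print(f_plus_one)     # NA NA NA NA
```

The warning is suppressed here only to keep the output short; in interactive use it is a useful sign that a column was read as a factor by mistake.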

The following examples illustrate this.

> data <- c(1,2,2,3,1,2,3,3,1,2,3,3,1)
> data
 [1] 1 2 2 3 1 2 3 3 1 2 3 3 1
> fdata <- factor(data)
> fdata
 [1] 1 2 2 3 1 2 3 3 1 2 3 3 1
Levels: 1 2 3
> class(fdata)
[1] "factor"
> class(data)
[1] "numeric"

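# The factor() function converts the original numeric vector into a factor. A factor has the concept of levels: the set of all distinct values in the factor. Levels are the factor's elements with duplicates removed and converted to character strings, so every level is a character value.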
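This description of levels can be checked directly. The sketch below rebuilds the same `data` vector so that it runs on its own; the helper names `expected_levels`, `codes`, and `recovered` are illustrative only. It also shows a detail not spelled out above: each element of a factor is stored as an integer index into the levels.

```r
data <- c(1,2,2,3,1,2,3,3,1,2,3,3,1)
fdata <- factor(data)

# Levels: the distinct values, sorted and converted to character
expected_levels <- as.character(sort(unique(data)))
same_levels <- identical(levels(fdata), expected_levels)

# Each element is stored as an integer index into the levels
codes <- as.integer(fdata)

# Indexing the levels by those codes recovers the values as character strings
recovered <- levels(fdata)[codes]

print(same_levels)                                 # TRUE
print(identical(recovered, as.character(fdata)))   # TRUE
```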

> levels(fdata)
[1] "1" "2" "3"

# When creating a factor, we can use the labels argument to name the levels. Continuing the program above:

> rdata <- factor(data,labels=c("i","ii","iii"))
> rdata
 [1] i   ii  ii  iii i   ii  iii iii i   ii  iii iii i
Levels: i ii iii
> rdata <- factor(data,labels=c("e","ee","eee"))
> rdata
 [1] e   ee  ee  eee e   ee  eee eee e   ee  eee eee e
Levels: e ee eee
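How `labels` lines up with the data may be easier to see in a small sketch. It assumes the same `data` vector as above; passing `levels = c(1, 2, 3)` explicitly and the names `check` and `bad` are choices made for this illustration.

```r
data <- c(1,2,2,3,1,2,3,3,1,2,3,3,1)

# labels are matched to the levels by position: 1 -> "i", 2 -> "ii", 3 -> "iii"
rdata <- factor(data, levels = c(1, 2, 3), labels = c("i", "ii", "iii"))

# Cross-tabulate the original values against the labelled factor;
# all counts fall on the diagonal
check <- table(original = data, labelled = rdata)
print(check)

# The number of labels must match the number of levels (or be 1);
# here two labels for three levels is an error
bad <- tryCatch(factor(data, labels = c("i", "ii")),
                error = function(e) "error")
print(bad)   # "error"
```

Spelling out `levels` alongside `labels` makes the mapping explicit instead of relying on the default sorted order of the distinct values.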

# Factors can specify the order of the data

> mons <- c("march","april","january","november","january",
+           "september","october","september","november","august",
+           "january","november","november","february","may","august",
+           "july","december","august","august","september","november",
+           "february","april")
> mons <- factor(mons)
> mons
 [1] march     april     january   november  january
 [6] september october   september november  august
[11] january   november  november  february  may
[16] august    july      december  august    august
[21] september november  february  april
11 Levels: april august december february ... september

> table(mons)
mons
    april    august  december  february   january
        2         4         1         2         3
     july     march       may  november   october
        1         1         1         5         1
september
        3

# Months clearly have an order, so we can specify that order for the factor

> mons <- factor(mons,levels=c("january","february","march","april","may","june","july","august","september","october","november","december"),ordered=TRUE)

> table(mons)
mons
  january  february     march     april       may
        3         2         1         2         1
     june      july    august september   october
        0         1         4         3         1
 november  december
        5         1
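Besides fixing the order of the `table()` output, an ordered factor also supports comparisons and sorting by level. The sketch below uses a small hypothetical sample `x` (not the full `mons` vector) with the same lowercase month names; all variable names are illustrative.

```r
month_levels <- c("january","february","march","april","may","june",
                  "july","august","september","october","november","december")

# A small hypothetical sample, not the full vector from the example
x <- factor(c("march", "january", "november"),
            levels = month_levels, ordered = TRUE)

# Ordered factors support comparisons based on level order
before_june <- x < "june"

# Sorting follows the declared level order, not alphabetical order
sorted <- as.character(sort(x))

# Levels that never occur still appear in table() with a count of 0
june_count <- table(x)[["june"]]

print(before_june)   # TRUE TRUE FALSE
print(sorted)        # "january" "march" "november"
print(june_count)    # 0
```

This is also why june shows up with a count of 0 in the table above: it was declared as a level even though it never occurs in the data.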
