R Correlation Coefficient Significance: Correlation Analysis in R

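Data: x is the first 4 columns of R's built-in iris dataset (the 5th column, the Species factor, is dropped); x2 is R's built-in state.x77 dataset.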

```r
x <- iris[, -5]    # the 4 numeric iris columns
x2 <- state.x77    # 50 US states x 8 variables
```

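Meaning of the state.x77 columns:

- Population: population estimate as of July 1, 1975
- Income: per capita income (1974)
- Illiteracy: illiteracy rate (1970, percent of population)
- Life Exp: life expectancy in years (1969-71)
- Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
- HS Grad: percent high-school graduates (1970)
- Frost: mean number of days with minimum temperature below freezing (1931-1960) in the capital or a large city
- Area: land area in square miles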
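The partial-correlation examples later on refer to these variables by column position (for example `c(1, 3, 2)`), so it helps to see which index belongs to which name. A small base-R sketch:

```r
x2 <- state.x77

# Pair each column position with its name; later pcor() calls use these indices
data.frame(index = seq_len(ncol(x2)), variable = colnames(x2))

dim(x2)  # 50 states (rows) by 8 variables (columns)
```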

Covariance is computed with the cov() function.
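As a minimal sketch using the same `x` as above: `cov()` accepts either two vectors, returning a single number, or a numeric data frame/matrix, returning the full covariance matrix, whose diagonal holds the variance of each column.

```r
x <- iris[, -5]

# Covariance of two vectors: a single number
cov(x$Sepal.Length, x$Petal.Length)

# Covariance matrix of all four columns (4 x 4, symmetric)
S <- cov(x)
S

# The diagonal entries are the variances of the individual columns
diag(S)
sapply(x, var)
```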

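About covariance: cov(x, y) > 0 means x and y tend to move in the same direction, < 0 means they move in opposite directions, and = 0 means there is no linear trend. Covariance depends on the scale (units) of the data, so its magnitude cannot tell you how strong the trend is or how dispersed the data are; it is nevertheless the building block of many more advanced analyses (the correlation coefficient itself is a standardized covariance).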
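To illustrate the scale sensitivity, the sketch below rescales one variable, as if Sepal.Length had been recorded in millimetres instead of centimetres. The covariance is multiplied by the same factor, while Pearson's correlation, which divides the covariance by both standard deviations, stays the same.

```r
x <- iris[, -5]
a <- x$Sepal.Length
b <- x$Petal.Length

cov(a, b)        # original units (cm)
cov(a * 10, b)   # a rescaled to mm: covariance is 10 times larger

# Pearson's r standardizes the covariance by both standard deviations,
# so it is unaffected by the change of units
cov(a, b) / (sd(a) * sd(b))
cor(a, b)
cor(a * 10, b)
```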

3.1 Computing correlation coefficients

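The cor() function can compute three kinds of correlation coefficient: Pearson, Kendall and Spearman; the default is Pearson. Pearson's correlation is parametric: its significance test assumes that both variables are normally distributed. Kendall's and Spearman's coefficients are nonparametric (rank-based).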
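One way to see why Spearman's coefficient is rank-based: it is Pearson's r computed on the ranks of the data, with tied values given their average rank. A quick base-R check of that identity:

```r
x <- iris[, -5]
a <- x$Sepal.Length
b <- x$Petal.Length

# rank() gives tied values their average rank by default
rho_manual <- cor(rank(a), rank(b))          # Pearson on the ranks
rho_cor <- cor(a, b, method = "spearman")    # built-in Spearman
c(manual = rho_manual, cor = rho_cor)
```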

```r
# Column names are case-sensitive: Sepal.Length, not sepal.length
cor(x$Sepal.Length, x$Petal.Length)
#> [1] 0.8717538
cor(x$Sepal.Length, x$Petal.Length, method = "kendall")
#> [1] 0.7185159
cor(x$Sepal.Length, x$Petal.Length, method = "spearman")
#> [1] 0.8818981
```

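- The input can also be a numeric data frame or matrix.

In that case, cor(x) returns the correlation between every pair of x's 4 variables (4 columns).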
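A minimal sketch of that call, assuming `x` is the 4-column iris data frame defined at the start. The result is a 4 x 4 symmetric matrix with 1s on the diagonal, and `method` works the same way as for two vectors:

```r
x <- iris[, -5]

# Pairwise Pearson correlations between the 4 numeric columns
r_mat <- cor(x)
round(r_mat, 3)

# The same matrix using Spearman's rank correlation
round(cor(x, method = "spearman"), 3)
```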

3.2 Significance test of the correlation coefficient

```r
cor.test(x$Sepal.Length, x$Petal.Length)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  x$Sepal.Length and x$Petal.Length
#> t = 21.646, df = 148, p-value < 2.2e-16
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.8270363 0.9055080
#> sample estimates:
#>       cor 
#> 0.8717538
cor.test(x$Sepal.Length, x$Petal.Length, method = "kendall")
#> 
#>  Kendall's rank correlation tau
#> 
#> data:  x$Sepal.Length and x$Petal.Length
#> z = 12.647, p-value < 2.2e-16
#> alternative hypothesis: true tau is not equal to 0
#> sample estimates:
#>       tau 
#> 0.7185159
cor.test(x$Sepal.Length, x$Petal.Length, method = "spearman")
#> Warning in cor.test.default(x$Sepal.Length, x$Petal.Length, method =
#> "spearman"): Cannot compute exact p-value with ties
#> 
#>  Spearman's rank correlation rho
#> 
#> data:  x$Sepal.Length and x$Petal.Length
#> S = 66429, p-value < 2.2e-16
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#>       rho 
#> 0.8818981
```
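`cor.test()` returns an object of class `htest`, so the numbers in the printout can also be extracted programmatically. The sketch below also recomputes Pearson's t statistic from r with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2), which is where the `t = 21.646, df = 148` line comes from for n = 150:

```r
x <- iris[, -5]
res <- cor.test(x$Sepal.Length, x$Petal.Length)  # Pearson by default

res$estimate   # sample correlation r
res$statistic  # t statistic
res$parameter  # degrees of freedom, n - 2
res$p.value
res$conf.int   # 95% confidence interval

# Recompute t from r: t = r * sqrt(n - 2) / sqrt(1 - r^2)
n <- length(x$Sepal.Length)
r <- unname(res$estimate)
r * sqrt(n - 2) / sqrt(1 - r^2)
```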

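cor.test() also has an alternative argument that selects a two-sided or one-sided test. It takes three values: "two.sided" (two-sided test), "less" and "greater". "greater" tests the alternative hypothesis that the true correlation is positive and "less" that it is negative, so use "greater" when a positive correlation is expected and "less" when a negative one is expected; if the argument is not given, the default is "two.sided".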
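A small sketch of the `alternative` argument, using Sepal.Length and Sepal.Width, whose correlation in iris is weak and negative, so the p-values are not all vanishingly small. For the Pearson test, the one-sided p-value in the direction of the observed estimate is half of the two-sided one, and the two one-sided p-values add up to 1:

```r
x <- iris[, -5]
a <- x$Sepal.Length
b <- x$Sepal.Width

two_sided   <- cor.test(a, b)                          # default: "two.sided"
less_res    <- cor.test(a, b, alternative = "less")    # H1: true correlation < 0
greater_res <- cor.test(a, b, alternative = "greater") # H1: true correlation > 0

c(r = unname(two_sided$estimate),
  two.sided = two_sided$p.value,
  less = less_res$p.value,
  greater = greater_res$p.value)
```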

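Partial correlation is the relationship between two variables while controlling for one or more other variables (all variables here should be continuous).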

Controlling for a variable means removing its influence. The variables being controlled for are called conditioning variables.

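Partial correlations can be computed with the ggm::pcor() function, used as pcor(u, S). Here u is an integer vector whose first two elements are the indices of the two variables to correlate and whose remaining elements are the indices of the conditioning variables, and S is the covariance matrix (here cov(x2)).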
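Conceptually, controlling for variables means regressing them out: the partial correlation equals the ordinary correlation between the residuals of each of the two variables after a linear regression on the conditioning variables. The helper `pcor_resid()` below is hypothetical (it is not part of ggm); it borrows the `u` index convention of `pcor(u, S)` but takes the raw data matrix instead of a covariance matrix, and is only meant to show what is being computed:

```r
# Hypothetical helper, not from ggm: partial correlation via regression residuals.
# u[1], u[2] = the two variables of interest; u[-(1:2)] = conditioning variables.
pcor_resid <- function(u, data) {
  data <- as.matrix(data)
  y1 <- data[, u[1]]
  y2 <- data[, u[2]]
  if (length(u) == 2) return(cor(y1, y2))  # nothing to control for
  Z <- data[, u[-(1:2)], drop = FALSE]     # conditioning variables as a matrix
  e1 <- resid(lm(y1 ~ Z))                  # part of y1 not linearly explained by Z
  e2 <- resid(lm(y2 ~ Z))                  # part of y2 not linearly explained by Z
  cor(e1, e2)                              # correlate what is left
}

x2 <- state.x77
pcor_resid(c(1, 3, 2), x2)  # Population vs. Illiteracy, controlling for Income
pcor_resid(c(2, 3, 1), x2)  # Income vs. Illiteracy, controlling for Population
```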

For example, the ordinary correlations of Population and of Income with Illiteracy are:

```r
cor(x2[, 1], x2[, 3])  # Population vs. Illiteracy
#> [1] 0.1076224
cor(x2[, 2], x2[, 3])  # Income vs. Illiteracy
#> [1] -0.4370752
```

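The correlation coefficients are about 0.11 and -0.44. Controlling for one of the variables while looking at the other gives different results: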

```r
# install.packages("ggm")
library(ggm)
# Population vs. Illiteracy, controlling for Income
pcor(c(1, 3, 2), cov(x2))
#> [1] 0.2257943
# Income vs. Illiteracy, controlling for Population
pcor(c(2, 3, 1), cov(x2))
#> [1] -0.4725271
```

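The partial correlations are about 0.23 and -0.47; compared with the ordinary correlations, both are somewhat larger in absolute value.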
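With a single conditioning variable there is a closed-form expression that makes the change visible: the partial correlation of i and j given k is (r_ij - r_ik * r_jk) / sqrt((1 - r_ik^2) * (1 - r_jk^2)). Because Population and Income are correlated with each other, part of each one's association with Illiteracy is shared, and removing it changes the values. A base-R sketch with a hypothetical helper `pcor1()` that reproduces the two results above from the ordinary correlation matrix:

```r
x2 <- state.x77
R <- cor(x2)

# Hypothetical helper: first-order partial correlation of i and j given one variable k
pcor1 <- function(R, i, j, k) {
  (R[i, j] - R[i, k] * R[j, k]) / sqrt((1 - R[i, k]^2) * (1 - R[j, k]^2))
}

pcor1(R, 1, 3, 2)  # Population vs. Illiteracy given Income
pcor1(R, 2, 3, 1)  # Income vs. Illiteracy given Population
```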

```r
# Population vs. Murder, controlling for Income and Illiteracy
pcor(c(1, 5, 2, 3), cov(x2))
#> [1] 0.3621683
```
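With more than one conditioning variable, as in `c(1, 5, 2, 3)`, a standard way to get the partial correlation is through the inverse of the covariance matrix restricted to the variables in `u`: if P = solve(S[u, u]), the partial correlation of the first two variables is -P[1, 2] / sqrt(P[1, 1] * P[2, 2]). The helper `pcor_prec()` below is a hypothetical base-R sketch that uses the same `u` convention as ggm::pcor(); it is not presented as ggm's implementation:

```r
# Hypothetical helper: partial correlation via the precision (inverse covariance) matrix.
# u[1:2] = variables of interest, u[-(1:2)] = conditioning variables.
pcor_prec <- function(u, S) {
  P <- solve(S[u, u])                 # precision matrix of the selected variables
  -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
}

x2 <- state.x77
# Population vs. Murder, controlling for Income and Illiteracy
pcor_prec(c(1, 5, 2, 3), cov(x2))
```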

Significance test of the partial correlation

```r
pcor.test(pcor(c(2, 3, 1), cov(x2)), q = 3, n = 50)
#> $tval
#> [1] -3.596675
#> 
#> $df
#> [1] 45
#> 
#> $pvalue
#> [1] 0.0007972922
```

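The usage is pcor.test(r, q, n), where r is the partial correlation computed by pcor(), q is the number of conditioning variables (the variables being controlled for) and n is the sample size, as described in the help page (?pcor.test).

Note that pcor(c(2, 3, 1), cov(x2)) controls for a single variable (Population), so strictly q = 1 here, which would give df = 50 - 2 - 1 = 47; the call above passes q = 3, hence df = 45.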
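The output above is consistent with the usual t test for a partial correlation: with df = n - 2 - q, t = r * sqrt(df) / sqrt(1 - r^2), and the two-sided p-value is 2 * P(T_df <= -|t|). The sketch below uses a hypothetical helper `pcor_t_test()` (not ggm's code) that mirrors the arguments of pcor.test(r, q, n); with q = 3 and n = 50 it should match the printed tval, df and pvalue, and it also shows the q = 1 case.

```r
# Hypothetical helper mirroring the arguments of ggm::pcor.test(r, q, n):
# r = partial correlation, q = number of conditioning variables, n = sample size
pcor_t_test <- function(r, q, n) {
  df <- n - 2 - q
  tval <- r * sqrt(df) / sqrt(1 - r^2)
  pvalue <- 2 * pt(-abs(tval), df)   # two-sided p-value
  list(tval = tval, df = df, pvalue = pvalue)
}

r <- -0.4725271                  # pcor(c(2, 3, 1), cov(state.x77)) from above
pcor_t_test(r, q = 3, n = 50)    # same arguments as the pcor.test() call above
pcor_t_test(r, q = 1, n = 50)    # one conditioning variable (Population)
```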
