Simulating the PageRank Computation

2021-06-26 03:15:08

This comes from Stanford's Mining Massive Datasets course on Coursera. One of the week 1 assignments asks for some PageRank-related calculations.

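// Assumed: the header list and every part marked "assumed" below are a
// best-guess reconstruction that is consistent with the sample run.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>
#include <boost/graph/adjacency_list.hpp>
#include <boost/numeric/ublas/io.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>

const double convergence_criteria = 1e-6;
double beta;                                   // damping factor
int num_vertices;
std::map<std::string, int> vertex_index;       // vertex name -> index
std::vector<std::string> index_vertex;         // index -> vertex name
using graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS>;
graph connection_graph;
boost::numeric::ublas::matrix<double> m;       // column-stochastic link matrix
boost::numeric::ublas::vector<double> v, v1, vtax;

void getbeta(const std::string& betaarg)
{
    beta = std::stod(betaarg);                 // assumed body
}

void mapvertexindex(const std::string& vertices_line)
{
    // Assumed: vertex names are whitespace-separated on the first line.
    std::istringstream iss(vertices_line);
    std::string name;
    while (iss >> name) {
        vertex_index[name] = static_cast<int>(index_vertex.size());
        index_vertex.push_back(name);
    }
    num_vertices = index_vertex.size();
}

void loadgraph(const std::string& filename)
{
    std::ifstream ifs(filename);
    std::string vertex_declaration;
    std::getline(ifs, vertex_declaration);
    mapvertexindex(vertex_declaration);
    connection_graph = graph(num_vertices);
    std::string v1, v2;
    while (ifs >> v1 >> v2)                    // one "source target" pair per edge
        boost::add_edge(vertex_index.at(v1), vertex_index.at(v2), connection_graph);
}

void displayinput()
{
    std::cout << "beta: " << beta << "\n" << "vertices: ";
    for (const auto& name : index_vertex) std::cout << name << " ";
    std::cout << "\nedges:\n";
    auto edge_iters = boost::edges(connection_graph);
    std::for_each(edge_iters.first, edge_iters.second,
        [](decltype(*edge_iters.first) edge) {
            std::cout << index_vertex[boost::source(edge, connection_graph)] << " -> "
                      << index_vertex[boost::target(edge, connection_graph)] << "\n";
        });
}

void generatematrix()
{
    m = boost::numeric::ublas::zero_matrix<double>(num_vertices, num_vertices);
    for (int vertex = 0; vertex < num_vertices; ++vertex) {
        auto degree = boost::out_degree(vertex, connection_graph);
        double degree_weight = 1.0 / degree;
        auto adjacent_vertices_iter = boost::adjacent_vertices(
            vertex, connection_graph);
        for (auto v1 = adjacent_vertices_iter.first;
             v1 != adjacent_vertices_iter.second; ++v1)
            m(*v1, vertex) = degree_weight;    // column = source, row = target
    }
}

void initialguess()
{
    // Assumed: uniform start and uniform teleport ("tax") vector.
    v = boost::numeric::ublas::scalar_vector<double>(num_vertices, 1.0 / num_vertices);
    vtax = boost::numeric::ublas::scalar_vector<double>(num_vertices, (1.0 - beta) / num_vertices);
}

inline bool checkconvergence()
{
    // Assumed: the choice of norm is a guess.
    return boost::numeric::ublas::norm_inf(v - v1) < convergence_criteria;
}

void solve() {
    int n_iter = 0;
    do {
        v1 = beta * boost::numeric::ublas::prod(m, v) + vtax;
        v.swap(v1);
        std::cout << "iteration " << (++n_iter) << ": " << v << "\n";
    } while (!checkconvergence());
}

int main(int argc, char* argv[])               // assumed: main() is missing from the post
{
    if (argc < 3) { std::cerr << "usage: " << argv[0] << " beta graph_file\n"; return 1; }
    getbeta(argv[1]);
    loadgraph(argv[2]);
    displayinput();
    generatematrix();
    initialguess();
    solve();
    for (int i = 0; i < num_vertices; ++i)
        std::cout << index_vertex[i] << " = " << v(i) << "\n";
    return 0;
}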

Sample input file:

a b c
a b
a c
b c
c a

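The first line lists all the vertex names; every line after it is a directed edge, written as source vertex then target vertex.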

Example run:

~/p/m/pagerank> ./a.out 0.8 q3

beta: 0.8

vertices: a b c

edges:

a -> b

a -> c

b -> c

c -> a

iteration 1: [3](0.333333,0.2,0.466667)

iteration 2: [3](0.44,0.2,0.36)

iteration 3: [3](0.354667,0.242667,0.402667)

iteration 4: [3](0.3888,0.208533,0.402667)

iteration 5: [3](0.3888,0.222187,0.389013)

iteration 6: [3](0.377877,0.222187,0.399936)

iteration 7: [3](0.386615,0.217818,0.395567)

iteration 8: [3](0.38312,0.221313,0.395567)

iteration 9: [3](0.38312,0.219915,0.396965)

iteration 10: [3](0.384239,0.219915,0.395847)

iteration 11: [3](0.383344,0.220362,0.396294)

iteration 12: [3](0.383702,0.220004,0.396294)

iteration 13: [3](0.383702,0.220147,0.396151)

iteration 14: [3](0.383587,0.220147,0.396265)

iteration 15: [3](0.383679,0.220102,0.396219)

iteration 16: [3](0.383642,0.220138,0.396219)

iteration 17: [3](0.383642,0.220124,0.396234)

iteration 18: [3](0.383654,0.220124,0.396222)

iteration 19: [3](0.383645,0.220128,0.396227)

iteration 20: [3](0.383648,0.220125,0.396227)

iteration 21: [3](0.383648,0.220126,0.396226)

iteration 22: [3](0.383647,0.220126,0.396227)

iteration 23: [3](0.383648,0.220126,0.396226)

iteration 24: [3](0.383648,0.220126,0.396226)

a = 0.383648

b = 0.220126

c = 0.396226
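For reference, the update behind each "iteration" line can be written out by hand. This is a sketch inferred from the line `v1 = beta * prod(m, v) + vtax` and the printed numbers. The symbols $M$, $n$, and $\mathbf{1}$ are notation introduced here, and the uniform starting vector and the value $(1-\beta)/n$ for each entry of `vtax` are assumptions that happen to reproduce iteration 1.

```latex
% Update rule, with M the column-stochastic link matrix of the sample graph
% (columns = source a, b, c; rows = target a, b, c).
v^{(k+1)} = \beta M v^{(k)} + \frac{1-\beta}{n}\,\mathbf{1},
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ \tfrac12 & 0 & 0 \\ \tfrac12 & 1 & 0 \end{pmatrix},
\qquad \beta = 0.8,\quad n = 3.

% First iteration from v^{(0)} = (1/3, 1/3, 1/3)^T:
\begin{aligned}
v^{(1)}_a &= 0.8 \cdot \tfrac13 + \tfrac{0.2}{3} = 0.333333,\\
v^{(1)}_b &= 0.8 \cdot \tfrac12 \cdot \tfrac13 + \tfrac{0.2}{3} = 0.2,\\
v^{(1)}_c &= 0.8 \left(\tfrac12 \cdot \tfrac13 + \tfrac13\right) + \tfrac{0.2}{3} = 0.466667.
\end{aligned}
```

These three values match the line `iteration 1: [3](0.333333,0.2,0.466667)` in the run above.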

Of course, there are many ways to solve a matrix equation, and iteration is not necessarily the best one.
