Loading word vectors: word2vec and GloVe

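1. Google used word2vec to pre-train 300-dimensional word vectors on a news corpus, distributed as GoogleNews-vectors-negative300.bin, which is 3.39 GB after decompression.

The file can be loaded with gensim, but the machine needs enough memory to hold it.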

# Load the word vectors trained by Google
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print(model['love'])
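To make "enough memory" concrete, here is a minimal sketch that estimates the raw size of the embedding matrix. It assumes the published vocabulary size of 3,000,000 entries for this model and float32 storage. The helper names `estimate_vector_bytes` and `load_top_vectors` are hypothetical, introduced only for illustration. The second helper relies on the `limit` argument of gensim's `load_word2vec_format`, which reads only the first N vectors from the file and may help when memory is tight.

```python
def estimate_vector_bytes(vocab_size, dim, bytes_per_value=4):
    """Raw size in bytes of a dense embedding matrix (float32 by default)."""
    return vocab_size * dim * bytes_per_value


def load_top_vectors(path, limit=500_000):
    """Load only the first `limit` vectors from a binary word2vec file.

    The default of 500,000 is an arbitrary illustrative value.
    """
    # Imported lazily so the estimate above works without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)


if __name__ == "__main__":
    # Assumption: 3,000,000 vocabulary entries, 300 dimensions, 4 bytes each.
    full = estimate_vector_bytes(3_000_000, 300)
    print("full model, GiB:", full / 2**30)
    # Loading only the first 500,000 entries needs one sixth of that.
    print("truncated, GiB:", estimate_vector_bytes(500_000, 300) / 2**30)
```

The estimate covers only the vectors themselves; the vocabulary dictionary and Python object overhead come on top of it.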

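2. Word vectors pre-trained with GloVe can also be loaded with gensim; the only difference is that one extra step is needed before loading.

The 300-dimensional GloVe vector file is 5.25 GB.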

# To open GloVe vectors with gensim, prepend one line to the file:
# <number of words> <vector dimension>
import shutil
from sys import platform

import gensim

# Count the lines in the file, which equals the number of words
def getfilelinenums(filename):
    count = 0
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            count += 1
    return count

# Open the vector file on Linux or Windows and add a line at the beginning
def prepend_line(infile, outfile, line):
    with open(infile, 'r', encoding='utf-8') as old:
        with open(outfile, 'w', encoding='utf-8') as new:
            new.write(str(line) + "\n")
            shutil.copyfileobj(old, new)

def prepend_slow(infile, outfile, line):
    with open(infile, 'r', encoding='utf-8') as fin:
        with open(outfile, 'w', encoding='utf-8') as fout:
            fout.write(line + "\n")
            for row in fin:
                fout.write(row)

def load(filename):
    num_lines = getfilelinenums(filename)
    gensim_file = 'glove_model.txt'
    gensim_first_line = "{} {}".format(num_lines, 300)
    # Prepend the header line.
    if platform == "linux" or platform == "linux2":
        prepend_line(filename, gensim_file, gensim_first_line)
    else:
        prepend_slow(filename, gensim_file, gensim_first_line)
    # Load the converted file and return the model to the caller
    model = gensim.models.KeyedVectors.load_word2vec_format(gensim_file)
    return model

model = load('glove.840B.300d.txt')

The generated glove_model.txt is a model file that gensim can open directly.
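The header step can be tried on a tiny file before running it on the 5.25 GB download. The sketch below uses only the standard library and, as a variation on the code above, infers the vector dimension from the first line instead of hard-coding 300. The function name `add_word2vec_header` and the sample file contents are hypothetical. It assumes that tokens contain no spaces, so every field after the first is a vector component. Recent gensim releases (4.0 and later) are also understood to accept `no_header=True` in `load_word2vec_format`, which would make the conversion unnecessary; treat that as something to check against the installed version.

```python
import os
import shutil
import tempfile


def add_word2vec_header(src, dst):
    """Copy a GloVe text file to `dst` with a '<count> <dim>' header line.

    Returns (count, dim). Assumes tokens contain no spaces.
    """
    count = 0
    dim = 0
    with open(src, "r", encoding="utf-8") as f:
        for line in f:
            if count == 0:
                # First field is the word, the rest are vector components.
                dim = len(line.rstrip("\n").split(" ")) - 1
            count += 1
    with open(src, "r", encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        fout.write("{} {}\n".format(count, dim))
        shutil.copyfileobj(fin, fout)
    return count, dim


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "tiny_glove.txt")
    dst = os.path.join(workdir, "tiny_word2vec.txt")
    # Two made-up 3-dimensional vectors, for illustration only.
    with open(src, "w", encoding="utf-8") as f:
        f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    print(add_word2vec_header(src, dst))
```

Counting the dimension from the data avoids a mismatch when the same script is reused for GloVe files of other sizes, at the cost of trusting the first line to be well formed.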

