NLP (5): Text Classification

1. SVM

2. Naive Bayes

3. LDA

This article gives code for each method and notes how it performed in practice.

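1. SVM

In my experience, SVM does not reach high accuracy on text classification and is slow to train. When the training set is small, logistic regression is the better choice.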

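from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

tfidf = TfidfVectorizer()
# x_train format: a list of strings, e.g. ['string 1', 'string 2']
# .toarray() densifies the sparse matrix; SVC also accepts the sparse form directly
matrix = tfidf.fit_transform(x_train).toarray()
svm = SVC()
# Fit on the TF-IDF features, not on the raw strings
svm.fit(matrix, y_train)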
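
To make the comparison with logistic regression concrete, here is a minimal sketch that runs both classifiers through the same TF-IDF pipeline. The toy sentences, the labels, the helper `fit_and_predict`, and the choice of `kernel="linear"` are assumptions made for illustration and do not come from the original experiments. The strings are assumed to be whitespace-separated tokens, so Chinese text would need to be segmented first.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data: 1 = positive, 0 = negative.
x_train = [
    "good movie great acting",
    "great plot good ending",
    "bad movie boring plot",
    "boring acting bad ending",
]
y_train = [1, 1, 0, 0]
x_test = ["great movie", "boring movie"]


def fit_and_predict(classifier):
    """Fit TF-IDF + classifier on the training strings, then label x_test."""
    model = make_pipeline(TfidfVectorizer(), classifier)
    model.fit(x_train, y_train)
    return model.predict(x_test).tolist()


# A linear kernel is an assumption here; the snippet above uses SVC() defaults.
svm_pred = fit_and_predict(SVC(kernel="linear"))
lr_pred = fit_and_predict(LogisticRegression())
print("SVC:", svm_pred)
print("LogisticRegression:", lr_pred)
```

Wrapping the vectorizer and the classifier in one pipeline means the test strings are transformed with the vocabulary learned from the training set, which avoids fitting the vectorizer twice by mistake.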

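2. Naive Bayes

I used this in the preliminary round of a big-data competition. It did not generalize as well as logistic regression, but its accuracy was high and it was fast.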

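from sklearn.naive_bayes import MultinomialNB

mu = MultinomialNB(alpha=2)  # alpha is the additive smoothing strength
# Input: the same x_train strings as above, vectorized into the TF-IDF `matrix`
mu.fit(matrix, y_train)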
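
To show what `alpha` does, here is a minimal pure-Python sketch of multinomial Naive Bayes with additive smoothing. It is not scikit-learn's implementation. The function names `train_nb` and `predict_nb` and the toy documents are hypothetical, and the documents are assumed to be tokenized already.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=2.0):
    """Estimate log priors and alpha-smoothed log likelihoods per class."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = defaultdict(Counter)  # class -> token counts
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    class_totals = Counter(labels)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_totals.items()}
    log_like = {}
    for c in class_totals:
        # Adding alpha to every token count keeps unseen tokens from getting zero probability.
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like


def predict_nb(doc, log_prior, log_like):
    """Return the class with the highest log score; out-of-vocabulary tokens are skipped."""
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = [["good", "great"], ["great", "fun"], ["bad", "boring"], ["boring", "dull"]]
labels = ["pos", "pos", "neg", "neg"]
log_prior, log_like = train_nb(docs, labels, alpha=2.0)
print(predict_nb(["great", "fun", "movie"], log_prior, log_like))  # prints "pos"
```

A larger `alpha` pulls every class's token probabilities toward uniform, which makes the model less sensitive to rare tokens.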

3. LDA

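# This uses a sentiment-classification dataset with the number of topics set to 2.
# Saving the precomputed word-count matrix to a .npy file works as well.
import numpy as np
import lda

# np.int was removed in NumPy 1.24; the builtin int is equivalent here
x = np.genfromtxt("datasets/cnews/vocab.txt", skip_header=1, dtype=int)
model = lda.LDA(random_state=1, n_topics=2, n_iter=1000)
model.fit(x)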
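
The comment above mentions precomputing the word-count matrix and storing it as `.npy`. Here is a minimal sketch of that step, assuming the documents are already tokenized. The helper `build_count_matrix`, the file name `counts.npy`, and the toy documents are hypothetical and are not part of the original dataset.

```python
import os
import tempfile
from collections import Counter

import numpy as np


def build_count_matrix(tokenized_docs):
    """Return (vocab, matrix) where matrix[i, j] counts vocab[j] in document i."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    matrix = np.zeros((len(tokenized_docs), len(vocab)), dtype=np.int64)
    for i, doc in enumerate(tokenized_docs):
        for tok, n in Counter(doc).items():
            matrix[i, index[tok]] = n
    return vocab, matrix


docs = [["good", "good", "film"], ["bad", "film"]]
vocab, counts = build_count_matrix(docs)

# Round-trip through a .npy file (a temporary path, used only for this demo).
path = os.path.join(tempfile.mkdtemp(), "counts.npy")
np.save(path, counts)
x = np.load(path)
print(vocab)  # ['bad', 'film', 'good']
print(x)
```

The loaded array keeps its integer dtype, so it should be usable as the input to `model.fit(x)` in place of the `np.genfromtxt` call, provided the rows are documents and the columns are vocabulary terms.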
