Reference book for study:
import nltk
nltk.set_proxy("**.com:80") # route downloads through an HTTP proxy
nltk.download() # open the NLTK downloader
2. Calling sents(fileid) raises: Resource 'tokenizers/punkt/english.pickle' not found. Please use the NLTK ********** to obtain the resource:
import nltk
nltk.download()
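If the interactive downloader is inconvenient, the check can be scripted. The following is a minimal sketch, not part of the original notes: the helper name ensure_resource and its download parameter are hypothetical, while nltk.data.find and nltk.download are the real NLTK calls it relies on.

```python
import nltk


def ensure_resource(path, package, download=nltk.download):
    """Return True if `path` is already installed, otherwise fetch `package`.

    `ensure_resource` is a hypothetical helper written for these notes.
    """
    try:
        nltk.data.find(path)  # raises LookupError when the resource is absent
        return True
    except LookupError:
        download(package)  # by default this calls nltk.download(package)
        return False
```

For the error above, the call would be ensure_resource("tokenizers/punkt", "punkt"). Newer NLTK releases may ask for a differently named package; the error message states which one is missing.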
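3. Functions for accessing corpus elements
from nltk.corpus import webtext
webtext.fileids() # ids of all files in the corpus
webtext.raw(fileid) # all characters of the given file
webtext.words(fileid) # all words of the given file
webtext.sents(fileid) # all sentences of the given file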
Example                      Description
fileids()                    the files of the corpus
fileids([categories])        the files of the corpus corresponding to these categories
categories()                 the categories of the corpus
categories([fileids])        the categories of the corpus corresponding to these files
raw()                        the raw content of the corpus
raw(fileids=[f1,f2,f3])      the raw content of the specified files
raw(categories=[c1,c2])      the raw content of the specified categories
words()                      the words of the whole corpus
words(fileids=[f1,f2,f3])    the words of the specified fileids
words(categories=[c1,c2])    the words of the specified categories
sents()                      the sentences of the whole corpus
sents(fileids=[f1,f2,f3])    the sentences of the specified fileids
sents(categories=[c1,c2])    the sentences of the specified categories
abspath(fileid)              the location of the given file on disk
encoding(fileid)             the encoding of the file (if known)
open(fileid)                 open a stream for reading the given corpus file
root()                       the path to the root of the locally installed corpus
readme()                     the contents of the README file of the corpus
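The access methods above can be tried without downloading any corpus. The sketch below is an illustration under stated assumptions: it builds a throwaway corpus in a temporary directory (the file names a.txt and b.txt and their contents are made up) and reads it with NLTK's PlaintextCorpusReader, which exposes the same fileids(), raw(), and words() methods as webtext.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Build a tiny throwaway corpus on disk (names and contents are made up).
corpus_dir = tempfile.mkdtemp()
samples = {
    "a.txt": "The cat sat on the mat. The dog barked.",
    "b.txt": "NLTK reads plain text files.",
}
for name, content in samples.items():
    with open(os.path.join(corpus_dir, name), "w", encoding="utf8") as handle:
        handle.write(content)

# Every file matching the regular expression becomes part of the corpus.
reader = PlaintextCorpusReader(corpus_dir, r".*\.txt")

file_ids = reader.fileids()              # ids of all files in the corpus
raw_text = reader.raw("a.txt")           # the file content as one string
word_list = list(reader.words("a.txt"))  # tokens of one file

print(file_ids)
print(word_list[:6])
print(reader.abspath("a.txt"))           # location of the file on disk
```

sents() is left out here on purpose: with the reader's default settings it depends on the punkt tokenizer data discussed in item 2.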
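4. Common text-processing functions
Assume text is a list of words. The methods from collocations() onward are not list methods; they require the words to be wrapped as nltk.Text(words).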
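from nltk import FreqDist, bigrams
len(text) # number of words
set(text) # remove duplicates
sorted(text) # sort
text.count('a') # number of occurrences of the given word
text.index('a') # position of the first occurrence of the given word
FreqDist(text) # words and their frequencies; keys() gives the words, indexing with [key] gives the count
FreqDist(text).plot(50, cumulative=True) # cumulative frequency plot of the 50 most common words
bigrams(text) # all adjacent word pairs
text.collocations() # frequently occurring adjacent pairs in the text
text.concordance("word") # occurrences of the given word with their context
text.similar("word") # words that appear in contexts similar to the given word
text.common_contexts(["a", "b"]) # contexts shared by the two words
text.dispersion_plot(['a','b','c',...]) # plot comparing where the words occur across the text
text.generate() # generate a random piece of text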
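A few of these calls on a concrete input may make the comments easier to check. This is a minimal sketch on a made-up sentence; it needs NLTK itself but no downloaded data, and it leaves out the plotting calls and the methods that depend on extra corpora.

```python
from nltk import FreqDist, Text, bigrams

# A made-up token list standing in for the words of a real text.
tokens = "the cat sat on the mat and the dog sat too".split()

n_tokens = len(tokens)         # number of words
n_types = len(set(tokens))     # number of distinct words
first_sat = tokens.index("sat")  # position of the first "sat"

fdist = FreqDist(tokens)       # word -> frequency
top_two = fdist.most_common(2)

pairs = list(bigrams(tokens))  # bigrams() returns a generator in NLTK 3

text = Text(tokens)            # wrapper that adds concordance(), similar(), ...
the_count = text.count("the")
text.concordance("sat")        # prints each occurrence of "sat" with context

print(n_tokens, n_types, first_sat)
print(top_two)
print(pairs[:3])
```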
NLTK's conditional frequency distributions: commonly used methods and idioms for defining, accessing, and visualizing a conditional frequency distribution of counters.
Example                                 Description
cfdist = ConditionalFreqDist(pairs)     create a conditional frequency distribution from a list of pairs
cfdist.conditions()                     list of the conditions
cfdist[condition]                       the frequency distribution for this condition
cfdist[condition][sample]               frequency for the given sample for this condition
cfdist.tabulate()                       tabulate the conditional frequency distribution
cfdist.tabulate(samples, conditions)    tabulation limited to the specified samples and conditions
cfdist.plot()                           graphical plot of the conditional frequency distribution
cfdist.plot(samples, conditions)        graphical plot limited to the specified samples and conditions
cfdist1 < cfdist2                       test if samples in cfdist1 occur less frequently than in cfdist2
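The table can be exercised on a handful of hand-written pairs. In this sketch the condition names ("news", "romance") and the words are made up for illustration; each pair has the form (condition, sample), and samples and conditions are passed to tabulate() as keyword arguments.

```python
from nltk import ConditionalFreqDist

# (condition, sample) pairs; the data is made up for illustration.
pairs = [
    ("news", "the"),
    ("news", "market"),
    ("news", "the"),
    ("romance", "love"),
    ("romance", "the"),
]

cfdist = ConditionalFreqDist(pairs)

conditions = sorted(cfdist.conditions())  # sort explicitly for a stable order
news_dist = cfdist["news"]                # a FreqDist for one condition
news_the = cfdist["news"]["the"]          # count of "the" under "news"

print(conditions)
print(news_dist.N(), news_the)

# Text table restricted to the chosen conditions and samples.
cfdist.tabulate(conditions=["news", "romance"], samples=["the", "love"])
```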
to be continued