10 Python String Processing Tips and Tricks (Part 1)

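Pursuing a path in text analytics but not sure where to start? Try this string processing primer, which begins with how to manipulate and process strings in Python at a basic level.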

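Natural language processing and text analytics are currently hot areas of research and application. These fields call for a variety of specific skills and concepts that need to be thoroughly understood before any meaningful practice is possible. Before all of that, however, basic string manipulation and processing must be mastered.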

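In my view, there are two broad types of computational string processing skills to master. The first is regular expressions, a pattern-based approach to text matching.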

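The other distinct skill is the ability to use a given programming language's standard library for basic string manipulation. This article is therefore a short primer on Python string processing, aimed at those pursuing more in-depth work in text analytics.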

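Note that meaningful text analytics goes well beyond string processing, and the core of those more advanced techniques may not require you to manipulate text yourself very often. Text data preprocessing, however, is an important and time-consuming part of any successful text analytics project, and the string processing skills covered here will be invaluable at that stage. A fundamental understanding of how text is processed computationally is also conceptually important for understanding more advanced text analytics techniques.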

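Many of the examples below use Python's built-in string methods, so keeping the standard library documentation handy for reference is a good idea.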

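Stripping whitespace is a basic string processing requirement. You can strip leading whitespace with the lstrip() method (left), trailing whitespace with rstrip() (right), and both leading and trailing whitespace with strip().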

s = '   this is a sentence with whitespace.       \n'
print('strip leading whitespace: {}'.format(s.lstrip()))
print('strip trailing whitespace: {}'.format(s.rstrip()))
print('strip all whitespace: {}'.format(s.strip()))

strip leading whitespace: this is a sentence with whitespace.

strip trailing whitespace:    this is a sentence with whitespace.
strip all whitespace: this is a sentence with whitespace.

Want to strip characters other than whitespace? The same methods work: pass in the character or characters you want stripped.

s = 'this is a sentence with unwanted characters.aaaaaaaa'
print('strip unwanted characters: {}'.format(s.rstrip('a')))

strip unwanted characters: this is a sentence with unwanted characters.
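One detail worth illustrating: the argument to these methods is treated as a set of characters to remove, not as a literal prefix or suffix. The sketch below shows that behavior and a common pitfall that follows from it. The file name and the variable names are made up for illustration.

```python
# The argument to rstrip()/lstrip()/strip() is a set of characters to remove,
# not a literal prefix or suffix.
s = 'this is a sentence with unwanted characters.aaaaaaaa'

# Every trailing 'a' or '.' is removed; the order of the characters
# in the argument makes no difference.
stripped_set = s.rstrip('a.')
same_result = s.rstrip('.a')
print(stripped_set)

# Pitfall: this does not mean "remove the '.csv' suffix". It removes any
# trailing run of the characters '.', 'c', 's' and 'v'.
name = 'class.csv'  # hypothetical file name
mangled = name.rstrip('.csv')
print(mangled)

# To drop an exact suffix, test for it and slice it off.
suffix = '.csv'
exact = name[:-len(suffix)] if name.endswith(suffix) else name
print(exact)
```

When the goal is to remove a fixed ending rather than a class of characters, the endswith() check with slicing is the safer choice.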

Don't forget to check the documentation for the string format() method if necessary.

Splitting a string into a list of smaller substrings is often useful, and Python's split() method makes it easy.

s = 'mooc is a fantastic resource'
print(s.split())

['mooc', 'is', 'a', 'fantastic', 'resource']

By default, split() splits on whitespace, but other character sequences can be passed in as well.

s = 'these,words,are,separated,by,comma'
print('\',\' separated split -> {}'.format(s.split(',')))
s = 'abacbdebfgbhhgbabddba'
print('\'b\' separated split -> {}'.format(s.split('b')))

',' separated split -> ['these', 'words', 'are', 'separated', 'by', 'comma']
'b' separated split -> ['a', 'ac', 'de', 'fg', 'hhg', 'a', 'dd', 'a']
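Two further details of split() tend to matter in preprocessing: calling it with no argument is not the same as passing a single space, and an optional second argument limits the number of splits. A minimal sketch follows; the sample strings and variable names are invented for illustration.

```python
# A string containing mixed whitespace: spaces, a tab, and a newline.
s = 'a \t b\n c'

# With no argument, runs of any whitespace act as one separator
# and no empty strings are produced.
default_split = s.split()
print(default_split)

# With an explicit ' ', only single space characters are separators,
# so the tab and the newline stay in the results.
space_split = s.split(' ')
print(space_split)

# The second argument (maxsplit) caps the number of splits.
record = 'key=value=with=equals'  # hypothetical input line
first_only = record.split('=', 1)

# rsplit() works the same way but starts from the right.
last_only = record.rsplit('=', 1)
print(first_only, last_only)
```

Limiting the split count is handy for "key=value" style lines where the value itself may contain the separator.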

Need the opposite of the above? You can use the join() method to concatenate a list of strings into a single string.

s = ['mooc', 'is', 'a', 'fantastic', 'resource']
print(' '.join(s))

mooc is a fantastic resource

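Well, isn't that the truth! And what if you want to join list elements with something other than whitespace between them? This may look a little stranger, but it is just as easy.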

s = ['eleven', 'mike', 'dustin', 'lucas', 'will']
print(' and '.join(s))

eleven and mike and dustin and lucas and will
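Note that join() only accepts strings, so a list containing numbers has to be converted first. The sketch below shows that, along with a split-then-join idiom for collapsing repeated whitespace. The sample values and variable names are assumptions made for illustration.

```python
values = [1, 2, 3]  # hypothetical non-string data

# join() raises TypeError when any element is not a string.
try:
    ', '.join(values)
    raised = False
except TypeError:
    raised = True

# Convert each element to a string before joining.
joined = ', '.join(str(v) for v in values)
print(joined)

# Splitting on whitespace and re-joining with one space
# normalizes runs of spaces, tabs, and newlines.
messy = 'too \t many\n\n spaces'
normalized = ' '.join(messy.split())
print(normalized)
```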

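Python has no built-in string reverse method. However, because strings can be sliced just like lists, a string can be reversed in the same succinct way that a list's elements can.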

s = 'kaggle'
print('the reverse of kaggle is {}'.format(s[::-1]))

the reverse of kaggle is elggak
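For comparison, here is a small sketch of an equivalent way to reverse a string with reversed() and join(), plus two simple applications of the slicing idiom. The helper is_palindrome and the sample strings are hypothetical, introduced only for this example.

```python
s = 'kaggle'

# Slicing with a step of -1 walks the string backwards.
sliced = s[::-1]

# reversed() yields the characters in reverse order; join() reassembles them.
joined = ''.join(reversed(s))
print(sliced, joined)


def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]


print(is_palindrome('level'), is_palindrome(s))

# The same slice reverses a list, so word order can be reversed too.
sentence = 'mooc is a fantastic resource'
reversed_words = ' '.join(sentence.split()[::-1])
print(reversed_words)
```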

Converting between upper and lower case can be done with the upper(), lower(), and swapcase() methods.

s = 'kaggle'
print('\'kaggle\' as uppercase: {}'.format(s.upper()))
print('\'kaggle\' as lowercase: {}'.format(s.lower()))

'kaggle' as uppercase: KAGGLE
'kaggle' as lowercase: kaggle
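Since swapcase() only has a visible effect on mixed-case input, a short sketch with a mixed-case string may help. The sample strings and variable names are assumptions chosen for illustration.

```python
s = 'Kaggle Competitions'  # hypothetical mixed-case input

upper = s.upper()
lower = s.lower()
# swapcase() flips each letter: uppercase becomes lowercase and vice versa.
swapped = s.swapcase()
print(upper)
print(lower)
print(swapped)

# Lowercasing both sides is a simple way to compare text case-insensitively.
same_word = 'Python'.lower() == 'PYTHON'.lower()
print(same_word)
```

Lowercasing text before comparison or counting is a common normalization step in text preprocessing.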
