Scraping Ziroom room listings with Python, part 2

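This part focuses on scraping Ziroom room prices. The code below processes the price image, extracts the digits from it, and then classifies them with the k-nearest-neighbors (KNN) algorithm.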

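import cv2
import numpy as np

#######   training part    ###############
# Each row of generalsamples.data is one flattened 30x30 digit image (900 values);
# each row of generalresponses.data is the digit label for that row.
samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32)
responses = responses.reshape((responses.size, 1))
model = cv2.ml.KNearest_create()
model.train(samples, cv2.ml.ROW_SAMPLE, responses)


def getnum(path):
    """Recognize the digits in a price image and return them from left to right."""
    im = cv2.imread(path)
    out = np.zeros(im.shape, np.uint8)  # debug canvas for the recognized digits
    gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    # Preprocessing: pure-black pixels become white (255), all others black (0).
    # The original loop used `==` instead of `=`, so it never changed the image.
    gray = np.where(gray == 0, 255, 0).astype(np.uint8)
    # Same arguments as the original (255, 1, 1, 11, 2), written as named constants
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 2)
    # `contours` was never defined originally; [-2] works on OpenCV 3.x and 4.x
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    count = 0
    numbers = []
    for cnt in contours:
        if cv2.contourArea(cnt) > 80:
            [x, y, w, h] = cv2.boundingRect(cnt)
            if h > 25:
                cv2.rectangle(im, (x, y), (x + w, y + h), (0, 255, 0), 2)
                # Scale the digit to 30x30 and flatten it into one 900-value sample
                roi = thresh[y:y + h, x:x + w]
                roismall = cv2.resize(roi, (30, 30))
                roismall = np.float32(roismall.reshape((1, 900)))
                retval, results, neigh_resp, dists = model.findNearest(roismall, k=1)
                digit = int(results[0][0])
                cv2.putText(out, str(digit), (x, y + h), 0, 1, (0, 255, 0))
                numbers.append((x, digit))
                count += 1
                if count == 10:  # stop after 10 digits
                    break
    # Contours come back in no particular order, so sort the digits by x position
    return [digit for x, digit in sorted(numbers)]

# Example: numbers = getnum('1.png')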
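With k=1, `model.findNearest` labels each digit with the label of the closest training row, using Euclidean (L2) distance over the 900 pixel values. To make that step less of a black box, here is a minimal NumPy sketch of k-nearest-neighbor voting; it is an illustration of the technique, not OpenCV's implementation, and `knn_predict` and the toy 2-D data are hypothetical stand-ins for the real 900-column samples.

```python
import numpy as np


def knn_predict(samples, responses, query, k=1):
    """Hypothetical helper: label each query row by majority vote of its k nearest training rows.

    samples:   (n, d) float array of training vectors (d = 900 for 30x30 digits)
    responses: (n,) array of labels
    query:     (m, d) float array of vectors to classify
    """
    # Squared Euclidean distance from every query row to every training row: shape (m, n)
    diffs = query[:, None, :] - samples[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training rows for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(responses[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # on a tie, the smallest label wins
    return np.array(preds)


# Toy 2-D data standing in for the flattened digit images
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]], dtype=np.float32)
train_y = np.array([1, 1, 7, 7])
print(knn_predict(train_x, train_y, np.array([[0.2, 0.1], [4.8, 5.1]], dtype=np.float32)))
# prints [1 7]
```

With k=1 this reduces to "copy the label of the single closest sample", which is why clean, consistently preprocessed training crops matter more than the choice of k here.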

Training data files used by the code above: generalsamples.data and generalresponses.data.
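The training part expects generalsamples.data to hold one flattened 30x30 digit per row (900 values) and generalresponses.data to hold the matching label on each row. How these files were produced isn't shown here; as a hedged sketch, assuming you already have labeled digit crops resized to 30x30 the same way `getnum` does, files in that layout could be written with `np.savetxt` and read back exactly as the training code reads them. `save_training_data` and the random crops are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_training_data(rois, labels, samples_path, responses_path):
    """Hypothetical helper: write digit crops and labels in the layout the KNN code loads.

    rois:   list of 30x30 digit crops (already resized as in getnum)
    labels: list of ints, the digit shown in each crop
    """
    # Flatten each crop into one 900-value row; reshape raises if a crop is not 30x30
    samples = np.array([np.asarray(roi).reshape(900) for roi in rois], dtype=np.float32)
    responses = np.array(labels, dtype=np.float32)
    np.savetxt(samples_path, samples)
    np.savetxt(responses_path, responses)


# Random 30x30 crops standing in for real thresholded digit images
rng = np.random.default_rng(0)
rois = [rng.integers(0, 256, size=(30, 30)).astype(np.uint8) for _ in range(3)]
tmp = tempfile.mkdtemp()
s_path = os.path.join(tmp, 'generalsamples.data')
r_path = os.path.join(tmp, 'generalresponses.data')
save_training_data(rois, [3, 1, 4], s_path, r_path)

# Reload the same way the training part does
samples = np.loadtxt(s_path, np.float32)
responses = np.loadtxt(r_path, np.float32).reshape(-1, 1)
print(samples.shape, responses.shape)  # (3, 900) (3, 1)
```

Note that with a single training row `np.loadtxt` returns a 1-D array; passing `ndmin=2` for the samples file keeps the (rows, 900) shape in that case.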
