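Many handwritten digit recognition examples today are built on the ready-made MNIST dataset. Kaggle, however, requires you to use the dataset it provides and to generate predictions for its test set for submission. Here I use Keras to build a convolutional network for the task. The script produces the test set predictions directly, and the final recognition accuracy is around 97%.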
# -*- coding: utf-8 -*-
"""
Created on Tue Jun 6 19:07:10 2017
@author: administrator
"""
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
import os
import pandas as pd
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
from keras import backend as K
import tensorflow as tf
# Global settings
batch_size = 100
nb_classes = 10
epochs = 20
# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
inputfile='f:/data/kaggle/mnist/train.csv'
inputfile2= 'f:/data/kaggle/mnist/test.csv'
outputfile= 'f:/data/kaggle/mnist/test_label.csv'
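# Read the Kaggle training data: change into the data directory, read the file, then restore the working directory
pwd = os.getcwd()
os.chdir(os.path.dirname(inputfile))
train = pd.read_csv(os.path.basename(inputfile))
os.chdir(pwd)
# Read the Kaggle test data the same way
pwd = os.getcwd()
os.chdir(os.path.dirname(inputfile2))
test = pd.read_csv(os.path.basename(inputfile2))
os.chdir(pwd)
x_train = train.iloc[:, 1:785]  # feature columns: the 784 pixel values
y_train = train['label']
y_train = np_utils.to_categorical(y_train, nb_classes)  # one-hot encode the labels
# Load the standard MNIST test split, used here as a labeled evaluation set
mnist = input_data.read_data_sets("mnist_data/", one_hot=True)
x_test = mnist.test.images
y_test = mnist.test.labels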
# Choose the array layout according to the backend's image dimension ordering
if K.image_dim_ordering() == 'th':
    # Channels first: (samples, channels, rows, cols)
    x_train = np.array(x_train)
    test = np.array(test)
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
    test = test.reshape(test.shape[0], 1, img_rows, img_cols)
else:
    # Channels last: (samples, rows, cols, channels)
    x_train = np.array(x_train)
    test = np.array(test)
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    test = test.reshape(test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
test = test.astype('float32')
# Scale the Kaggle pixel values from [0, 255] to [0, 1]
x_train /= 255
test /= 255
# x_test is not rescaled: input_data.read_data_sets already returns pixels in [0, 1]
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print(test.shape[0], 'test output samples')
model = Sequential()  # initialize the model
model.add(Convolution2D(nb_filters, (kernel_size[0], kernel_size[1]),
                        padding='same',
                        input_shape=input_shape))  # convolutional layer 1
model.add(Activation('relu'))  # activation layer
model.add(Convolution2D(nb_filters, (kernel_size[0], kernel_size[1])))  # convolutional layer 2
model.add(Activation('relu'))  # activation layer
model.add(MaxPooling2D(pool_size=pool_size))  # pooling layer
model.add(Dropout(0.25))  # randomly drop units
model.add(Flatten())  # flatten to a 1-D vector
model.add(Dense(128))  # fully connected layer 1
model.add(Activation('relu'))  # activation layer
model.add(Dropout(0.5))  # randomly drop units
model.add(Dense(nb_classes))  # fully connected layer 2
model.add(Activation('softmax'))  # softmax class scores
# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
# Train the model
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)
# Evaluate the model on the labeled MNIST test split
score = model.evaluate(x_test, y_test, verbose=0)
print('test score:', score[0])
print('test accuracy:', score[1])
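# Predict class scores for the Kaggle test set and keep the most likely digit for each image
y_test = model.predict(test)
sess = tf.InteractiveSession()
y_test = sess.run(tf.arg_max(y_test, 1))
y_test = pd.DataFrame(y_test)
y_test.to_csv(outputfile)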
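The script above writes the predicted labels with pandas' default index and column header. As a minimal sketch, assuming the competition expects a CSV with an `ImageId` column starting at 1 and a `Label` column, the softmax scores could be turned into a submission table with NumPy alone, without opening a TensorFlow session. The helper name `make_submission`, the made-up demo scores, and the output path in the final comment are all hypothetical and only serve as illustration.

```python
import numpy as np
import pandas as pd


def make_submission(probabilities):
    """Turn an (n_samples, n_classes) array of class scores into a submission table."""
    # The predicted digit is the index of the highest score in each row.
    labels = np.argmax(probabilities, axis=1)
    # Assumption: image ids are 1-based and follow the row order of the test file.
    image_ids = np.arange(1, len(labels) + 1)
    return pd.DataFrame({"ImageId": image_ids, "Label": labels})


# Tiny made-up example: three "images", ten classes.
demo_scores = np.zeros((3, 10), dtype="float32")
demo_scores[0, 7] = 0.9
demo_scores[1, 2] = 0.8
demo_scores[2, 0] = 0.6

submission = make_submission(demo_scores)
print(submission)
# submission.to_csv("submission.csv", index=False)  # hypothetical output path
```

With the real model, the argument would be the array returned by `model.predict(test)`. Passing `index=False` to `to_csv` keeps pandas from writing its own index as an extra unnamed column.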