Tutorial: Detecting Facial Keypoints with Convolutional Neural Networks (Part 1)

Date: 02-13  Source: eetrend

# file kfkd.py
import os

import numpy as np
from pandas import read_csv
from sklearn.utils import shuffle

FTRAIN = '~/data/kaggle-facial-keypoint-detection/training.csv'
FTEST = '~/data/kaggle-facial-keypoint-detection/test.csv'


def load(test=False, cols=None):
    """Loads data from FTEST if *test* is True, otherwise from FTRAIN.
    Pass a list of *cols* if you're only interested in a subset of the
    target columns.
    """
    fname = FTEST if test else FTRAIN
    df = read_csv(os.path.expanduser(fname))  # load pandas dataframe

    # The Image column has pixel values separated by space; convert
    # the values to numpy arrays:
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))

    if cols:  # get a subset of columns
        df = df[list(cols) + ['Image']]

    print(df.count())  # prints the number of values for each column
    df = df.dropna()  # drop all rows that have missing values in them

    X = np.vstack(df['Image'].values) / 255.  # scale pixel values to [0, 1]
    X = X.astype(np.float32)

    if not test:  # only FTRAIN has any target columns
        y = df[df.columns[:-1]].values
        y = (y - 48) / 48  # scale target coordinates to [-1, 1]
        X, y = shuffle(X, y, random_state=42)  # shuffle train data
        y = y.astype(np.float32)
    else:
        y = None

    return X, y


X, y = load()
print("X.shape == {}; X.min == {:.3f}; X.max == {:.3f}".format(
    X.shape, X.min(), X.max()))
print("y.shape == {}; y.min == {:.3f}; y.max == {:.3f}".format(
    y.shape, y.min(), y.max()))
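Two steps inside load() are easy to try in isolation: turning a space-separated pixel string into a scaled array, and mapping target coordinates on the 96x96 grid to [-1, 1] and back. The snippet below is only a sketch with made-up values; the names image_field, coords, scaled and restored are hypothetical and do not appear in kfkd.py, and it parses the string with str.split() rather than np.fromstring.

```python
import numpy as np

# A made-up 4-pixel "image" in the CSV's space-separated format (hypothetical data).
image_field = "0 64 128 255"

# Parse the text into a 1-D float array and scale pixel values to [0, 1].
pixels = np.array(image_field.split(), dtype=np.float32) / 255.0

# Target coordinates live on a 96x96 grid; (y - 48) / 48 maps them to [-1, 1].
coords = np.array([0.0, 48.0, 96.0], dtype=np.float32)
scaled = (coords - 48) / 48

# The inverse mapping turns scaled values back into pixel coordinates.
restored = scaled * 48 + 48

print(pixels.min(), pixels.max())
print(scaled)
print(restored)
```

The inverse mapping is what you would apply to a network's predictions before plotting them on top of an image.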

You don't need to understand every detail of this function. But let's look at the script's output:
$ python kfkd.py
left_eye_center_x 7034
left_eye_center_y 7034
right_eye_center_x 7032
right_eye_center_y 7032
left_eye_inner_corner_x 2266
left_eye_inner_corner_y 2266
left_eye_outer_corner_x 2263
left_eye_outer_corner_y 2263
right_eye_inner_corner_x 2264
right_eye_inner_corner_y 2264
…
mouth_right_corner_x 2267
mouth_right_corner_y 2267
mouth_center_top_lip_x 2272
mouth_center_top_lip_y 2272
mouth_center_bottom_lip_x 7014
mouth_center_bottom_lip_y 7014
Image 7044
dtype: int64
X.shape == (2140, 9216); X.min == 0.000; X.max == 1.000
y.shape == (2140, 30); y.min == -0.920; y.max == 0.996

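First, it prints a list of all columns in the CSV file along with the number of available values for each. So while we have an Image for every row in the training data, we only have 2,267 values for mouth_right_corner_x, and so on.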

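load() returns a tuple (X, y), where y is the target matrix. y has shape n × m, where n is the number of samples in the dataset that have all m keypoints. Dropping all rows with missing values is the job of this line:
    df = df.dropna()  # drop all rows that have missing values in them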

The script's output y.shape == (2140, 30) tells us that only 2,140 images in the dataset have all 30 target values.
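The effect of count() and dropna() is easier to see on a tiny table. The following sketch uses a made-up four-row DataFrame (the values are hypothetical, not taken from training.csv) in which one keypoint column is mostly missing:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of training.csv: 4 rows, one keypoint partly missing.
df = pd.DataFrame({
    "left_eye_center_x": [66.0, 64.3, 65.1, 65.2],
    "mouth_right_corner_x": [np.nan, 56.4, np.nan, np.nan],
    "Image": ["238 236", "219 215", "144 142", "193 192"],
})

counts = df.count()      # number of non-missing values per column
complete = df.dropna()   # keeps only rows without any missing value

print(counts)
print(len(df), "->", len(complete))
```

A single sparse column is enough to discard a row, which is why the number of complete rows can be far smaller than the number of images.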

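Initially, we'll train with these 2,140 samples only. This leaves us with many more input dimensions (9,216) than samples, so overfitting might become a problem. Of course, throwing away 70% of the training data is a bad idea as well. But let's go with it for now; we'll come back to this later.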
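The figures in this paragraph follow directly from the script output shown earlier. As a quick check, using only numbers that appear in that output (the variable names are made up for this sketch):

```python
n_rows = 7044          # rows with an Image, from the df.count() output
n_complete = 2140      # rows left after dropna(), from X.shape
n_inputs = 96 * 96     # pixels per image

# Share of the training rows that dropna() throws away.
dropped_fraction = 1 - n_complete / n_rows

# Input dimensions per remaining training sample.
inputs_per_sample = n_inputs / n_complete

print(n_inputs)
print(round(dropped_fraction, 3))
print(round(inputs_per_sample, 2))
```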

First model: a single hidden layer

Now that we're done with loading the data, let's use Lasagne to create a neural network with a single hidden layer. We'll start with the code:
# add to kfkd.py
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 9216),  # 96x96 input pixels per batch
    hidden_num_units=100,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=30,  # 30 target values

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
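
To make the layer parameters concrete, here is a minimal NumPy sketch of the forward pass that a network of this shape computes: 9,216 inputs, 100 hidden units, 30 linear outputs. It is not how Lasagne or nolearn implement it. The random weights, the helper name forward, and the use of ReLU in the hidden layer are assumptions made for illustration, since the listing does not set a hidden nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 9216, 100, 30

# Randomly initialised parameters (illustrative only, not trained weights).
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden)).astype(np.float32)
b1 = np.zeros(n_hidden, dtype=np.float32)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_outputs)).astype(np.float32)
b2 = np.zeros(n_outputs, dtype=np.float32)


def forward(X):
    """Hypothetical helper: ReLU hidden layer followed by a linear output layer."""
    hidden = np.maximum(0, X @ W1 + b1)  # assumed ReLU nonlinearity
    return hidden @ W2 + b2              # identity output, as in the listing


# A batch of 4 fake images with pixel values in [0, 1).
X_batch = rng.random((4, n_inputs), dtype=np.float32)
y_pred = forward(X_batch)

# Total number of trainable parameters for this architecture.
n_params = W1.size + b1.size + W2.size + b2.size

print(y_pred.shape)
print(n_params)
```

Even this small network has far more parameters than the 2,140 training samples, which is another way to see why overfitting is a concern here.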
