本文介绍了Python Numpy错误:ValueError:使用序列设置数组元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试构建类似于theano logistic_sgd.py实现中提供的mnist.pkl.gz的数据集.以下是我的代码段.

import numpy as np
import csv
from PIL import Image
import gzip, cPickle
import theano
from theano import tensor as T

def load_dir_data(csv_file=""):
    print(" reading: %s" %csv_file)
    dataset=[]
    labels=[]

    cr=csv.reader(open(csv_file,"rb"))
    for row in cr:
        print row[0], row[1]
        try:
            image=Image.open(row[0]+'.jpg').convert('LA')
            pixels=[f[0] for f in list(image.getdata())]
            dataset.append(pixels)
            labels.append(row[1])
            del image
        except:
            print("image not found")
    ret_val=np.array(dataset,dtype=theano.config.floatX)
    return ret_val,np.array(labels).astype(float)


def generate_pkl_file(csv_file=""):
    Data, y =load_dir_data(csv_file)
    train_set_x = Data[:1500]
    val_set_x = Data[1501:1750]
    test_set_x = Data[1751:1900]
    train_set_y = y[:1500]
    val_set_y = y[1501:1750]
    test_set_y = y[1751:1900]
    # Divided dataset into 3 parts. I had 2000 images.

    train_set = train_set_x, train_set_y
    val_set = val_set_x, val_set_y
    test_set = test_set_x, val_set_y

    dataset = [train_set, val_set, test_set]

    f = gzip.open('file.pkl.gz','wb')
    cPickle.dump(dataset, f, protocol=2)
    f.close()


if __name__=='__main__':
    generate_pkl_file("trainLabels.csv")

错误消息:追溯(最近一次通话):

  File "convert_dataset_pkl_file.py", line 50, in <module>
    generate_pkl_file("trainLabels.csv")
  File "convert_dataset_pkl_file.py", line 29, in generate_pkl_file
    Data, y =load_dir_data(csv_file)
  File "convert_dataset_pkl_file.py", line 24, in load_dir_data
    ret_val=np.array(dataset,dtype=theano.config.floatX)
ValueError: setting an array element with a sequence.


csv文件包含两个字段..图像名称,分类标签在python解释器中运行此命令时,它似乎对我有用.如下.

--------- python解释器输出----------

image=Image.open('sample.jpg').convert('LA')
pixels=[f[0] for f in list(image.getdata())]
dataset=[]
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
b=numpy.array(dataset,dtype=theano.config.floatX)
b
array([[ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.]])

即使我运行相同的指令集(从逻辑上),当我运行sample.py时,我也会遇到valueError:设置具有序列的数组元素..我试图了解这种行为..任何帮助都将是非常有用的.

解决方案

问题可能与.

您正在尝试创建一个像素值矩阵,每个图像一行.但是每张图像的大小都不同,因此每一行中的像素数也不同.

您不能在numpy中创建锯齿状"的浮点型数组-每行的长度必须相同.

您需要将每行填充到最大图像的长度.

I am trying to build a dataset similar to mnist.pkl.gz provided in theano logistic_sgd.py implementation. Following is my code snippet.

import numpy as np
import csv
from PIL import Image
import gzip, cPickle
import theano
from theano import tensor as T

def load_dir_data(csv_file=""):
    print(" reading: %s" %csv_file)
    dataset=[]
    labels=[]

    cr=csv.reader(open(csv_file,"rb"))
    for row in cr:
        print row[0], row[1]
        try:
            image=Image.open(row[0]+'.jpg').convert('LA')
            pixels=[f[0] for f in list(image.getdata())]
            dataset.append(pixels)
            labels.append(row[1])
            del image
        except:
            print("image not found")
    ret_val=np.array(dataset,dtype=theano.config.floatX)
    return ret_val,np.array(labels).astype(float)


def generate_pkl_file(csv_file=""):
    Data, y =load_dir_data(csv_file)
    train_set_x = Data[:1500]
    val_set_x = Data[1501:1750]
    test_set_x = Data[1751:1900]
    train_set_y = y[:1500]
    val_set_y = y[1501:1750]
    test_set_y = y[1751:1900]
    # Divided dataset into 3 parts. I had 2000 images.

    train_set = train_set_x, train_set_y
    val_set = val_set_x, val_set_y
    test_set = test_set_x, val_set_y

    dataset = [train_set, val_set, test_set]

    f = gzip.open('file.pkl.gz','wb')
    cPickle.dump(dataset, f, protocol=2)
    f.close()


if __name__=='__main__':
    generate_pkl_file("trainLabels.csv")

Error Message:Traceback (most recent call last):

  File "convert_dataset_pkl_file.py", line 50, in <module>
    generate_pkl_file("trainLabels.csv")
  File "convert_dataset_pkl_file.py", line 29, in generate_pkl_file
    Data, y =load_dir_data(csv_file)
  File "convert_dataset_pkl_file.py", line 24, in load_dir_data
    ret_val=np.array(dataset,dtype=theano.config.floatX)
ValueError: setting an array element with a sequence.


csv file contains two fields.. image name, classification labelwhen is run this in python interpreter, it seems to be working for me.. as follows.. I dont get error saying setting an array element with a sequence here..

---------python interpreter output----------

image=Image.open('sample.jpg').convert('LA')
pixels=[f[0] for f in list(image.getdata())]
dataset=[]
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
dataset.append(pixels)
b=numpy.array(dataset,dtype=theano.config.floatX)
b
array([[ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.],
       [ 2.,  0.,  0., ...,  0.,  0.,  0.]])

Even though i am running same set of instruction (logically), when i run sample.py, i get valueError: setting an array element with a sequence.. I trying to understand this behavior.. any help would be great..

解决方案

The problem is probably similar to that of this question.

You're trying to create a matrix of pixel values with a row per image. But each image has a different size so the number of pixels in each row is different.

You can't create a "jagged" float typed array in numpy -- every row must be of the same length.

You'll need to pad each row to the length of the largest image.

这篇关于Python Numpy错误:ValueError:使用序列设置数组元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-20 10:01