Problem description

I can't seem to get the CsvReader in the Bonobo ETL library to yield anything other than tuples. The documentation seems to indicate that it should yield dicts, not tuples, but try as I might, I can't get it to pass anything other than tuples. I'd really like to have access to the column names attached to each value. The error it throws suggests that the column names are present when the row is passed, but in the transform method I have defined, only the values themselves are available.

import bonobo


def printer(*csv):
    print(csv)


def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        bonobo.CsvReader('csv.txt'),
        printer
    )
    return graph


def get_services(**options):
    return {}


if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    with bonobo.parse_args(parser) as options:
        bonobo.run(get_graph(**options), services=get_services(**options))

Does it have something to do with the parameters of the printer method? I understand that *csv as the parameter collects all the positional arguments into a tuple, but any other way of declaring the parameters just throws a TypeError.

Any suggestions? Would it be better to avoid the built-in Bonobo CsvReader completely and just write an extract method that uses DictReader or something similar?

Here is the error that is thrown when I use anything other than *csv as the parameter of printer():

CRIT|0002|bonobo.execution.contexts.base:
Traceback (most recent call last):
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 102, in call
    bound = self._bind(_input)
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 89, in _bind
    return bind(*self.args, *_input, **self.kwargs)
  File "C:\Users\Accounting Admin\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 3002, in bind
    return args[0]._bind(args[1:], kwargs)
  File "C:\Users\Accounting Admin\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 2923, in _bind
    raise TypeError('too many positional arguments') from None
TypeError: too many positional arguments

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\execution\contexts\node.py", line 102, in loop
    self.step()
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\execution\contexts\node.py", line 132, in step
    results = self._stack(input_bag)
  File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 112, in call
    )) from exc
bonobo.errors.UnrecoverableTypeError: Input of does not bind to the node signature. Args: () Input: Bag(id='1', name='Alice', age='20', height='62', weight='120.6') Kwargs: {} Signature: (csv)
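The traceback shows bonobo calling `inspect.Signature.bind` with the row expanded into positional arguments. The sketch below reproduces that check in plain Python, without bonobo, to show why `printer(csv)` fails while `printer(*csv)` works. The `Row` namedtuple is a stand-in for bonobo's row type (an assumption), and `printer_variadic`, `printer_single` and `binds` are hypothetical names used only for illustration.

```python
import inspect
from collections import namedtuple

# Hypothetical stand-in for one CsvReader row (assumed to behave like a namedtuple).
Row = namedtuple("Row", ["id", "name", "age", "height", "weight"])
row = Row("1", "Alice", "20", "62", "120.6")


def printer_variadic(*csv):
    # *csv collects every positional argument into a single tuple.
    print(csv)


def printer_single(csv):
    # Exactly one positional parameter.
    print(csv)


def binds(func, values):
    """Return True if func accepts the values expanded as positional arguments."""
    try:
        # Mirrors the bind(*self.args, *_input, **self.kwargs) call in the traceback.
        inspect.signature(func).bind(*values)
        return True
    except TypeError:
        return False


print(binds(printer_variadic, row))  # True: all five values land in csv
print(binds(printer_single, row))    # False: five values, one parameter
```

In other words, the single-parameter version receives five positional values for one slot, which is exactly the "too many positional arguments" error above.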

Recommended answer

There may be an issue with the documentation, but CsvReader is indeed yielding a kind of tuple (in fact, something very similar to a namedtuple) for one simple reason: yielding dicts in Python 3.5 would change the field order, because dicts there do not preserve insertion order (CPython only started preserving it in 3.6, and the language guarantees it from 3.7). A simple csvread -> csvwrite round trip would then reorder the fields in a non-reproducible way.
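A namedtuple-like row keeps both the values and the column names in file order, so a dict can still be built when one is needed. The sketch uses `collections.namedtuple` as a stand-in for bonobo's row type; that bonobo rows expose `_asdict()` the same way is an assumption (the answer only relies on `_fields`).

```python
from collections import namedtuple

# Hypothetical stand-in for a CsvReader row (assumption: namedtuple-like).
Row = namedtuple("Row", ["id", "name", "age", "height", "weight"])
row = Row("1", "Alice", "20", "62", "120.6")

# Column names travel with the row, in the same order as the CSV header.
print(row._fields)  # ('id', 'name', 'age', 'height', 'weight')

# _asdict() returns an OrderedDict before Python 3.8 and a plain dict from 3.8 on;
# both keep the field order.
as_dict = row._asdict()
print(list(as_dict))     # ['id', 'name', 'age', 'height', 'weight']
print(as_dict["name"])   # Alice
```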

If you want to retrieve the "raw" input (i.e. the tuple object itself, not expanded into positional arguments), you can use the @use_raw_input decorator.

from bonobo.config import use_raw_input

@use_raw_input
def some_node(row):
    # row is the whole row object; row._fields holds the column names
    for f in row._fields:
        print(f, getattr(row, f))

Another option, if you know the expected fields, is to be explicit and declare one named parameter per field.

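def some_node(id, name, age, height, weight):
    # one parameter per CSV column, in column order (fields from the question's csv.txt)
    print(name, age)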
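Judging by the `bind(*self.args, *_input, **self.kwargs)` call in the traceback, the bonobo version used here appears to pass row values positionally, so explicit parameter names document the columns but are matched by position, not by name. The plain-Python sketch below (no bonobo; `Row`, `node_in_order` and `node_misordered` are hypothetical) shows that the number and order of parameters must match the CSV columns.

```python
from collections import namedtuple

# Hypothetical stand-in for a CsvReader row.
Row = namedtuple("Row", ["id", "name", "age", "height", "weight"])
row = Row("1", "Alice", "20", "62", "120.6")


def node_in_order(id, name, age, height, weight):
    # Parameters listed in the same order as the CSV header.
    return name


def node_misordered(name, id, age, height, weight):
    # Same names, different order: values are still assigned by position.
    return name


print(node_in_order(*row))    # Alice
print(node_misordered(*row))  # 1, because "name" received the id column
```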

Hope that helps.

09-12 11:43