Why does Bonobo's CsvReader() method yield tuples and not dicts?

Problem description

I can't seem to get the CsvReader in the Bonobo ETL library to yield anything other than tuples. The documentation seems to indicate that it should be yielding dicts and not tuples but try as I might I can't seem to get it to pass anything other than tuples. I'd really like to have access to the column names attached to each value. It throws an error that suggests the column names are present when passed but in the transform method I have defined, only the values themselves are available.

import bonobo


def printer(*csv):
    print(csv)


def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        bonobo.CsvReader('csv.txt'),
        printer
    )
    return graph


def get_services(**options):
    return {}


if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    with bonobo.parse_args(parser) as options:
        bonobo.run(get_graph(**options), services=get_services(**options))

Does it have something to do with the arguments of the printer method? I understand that *csv as the argument collects the unpacked values of an iterable, but any other declaration of arguments just throws a TypeError.

Any suggestions? Would it be better to avoid the built-in Bonobo CsvReader completely and just write an extract method that uses DictReader or something similar?

Edit: Here is the error that gets thrown when using anything other than *csv as the argument to printer().

CRIT|0002|bonobo.execution.contexts.base:
│ Traceback (most recent call last):
│   File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 102, in __call__
│     bound = self._bind(_input)
│   File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 89, in _bind
│     return bind(*self.args, *_input, **self.kwargs)
│   File "C:\Users\Accounting Admin\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 3002, in bind
│     return args[0]._bind(args[1:], kwargs)
│   File "C:\Users\Accounting Admin\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 2923, in _bind
│     raise TypeError('too many positional arguments') from None
├ TypeError too many positional arguments
│ The above exception was the direct cause of the following exception:
│ Traceback (most recent call last):
│   File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\execution\contexts\node.py", line 102, in loop
│     self.step()
│   File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\execution\contexts\node.py", line 132, in step
│     results = self._stack(input_bag)
│   File "X:\Programming\pyWarehouse\warehouse_env\lib\site-packages\bonobo\config\processors.py", line 112, in __call__
│     )) from exc
└ bonobo.errors.UnrecoverableTypeError Input of does not bind to the node signature.
  Args: ()
  Input: Bag(id='1', name='Alice', age='20', height='62', weight='120.6')
  Kwargs: {}
  Signature: (csv)

Recommended answer

There may be an issue with the documentation, but CsvReader is indeed yielding some kind of tuple (in fact, something very similar to namedtuples), for one simple reason: dicts in Python 3.5 do not guarantee field order, so yielding dicts would mean that a simple csvread->csvwrite changes the field order in a non-reproducible way.

If you want to retrieve the "raw" input (aka the tuple object, not expanded to args), you can use the @use_raw_input decorator.

from bonobo.config import use_raw_input

@use_raw_input
def some_node(row):
    for f in row._fields:
        ...
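
Since the row exposes `_fields`, the column names can be paired back with the values to get the dict the question asks for. The following is a minimal sketch that uses `collections.namedtuple` as a stand-in for Bonobo's row type (an assumption made so the example runs without Bonobo); `Row` and `row_to_dict` are hypothetical names, not part of the library.

```python
from collections import namedtuple

# Stand-in for the row Bonobo hands to a @use_raw_input node. The real object
# is Bonobo's own type; here it is only assumed to expose `_fields` and to be
# iterable in column order, as a namedtuple is.
Row = namedtuple("Row", ["id", "name", "age", "height", "weight"])


def row_to_dict(row):
    """Pair each column name with its value, keeping the column order."""
    return dict(zip(row._fields, row))


row = Row("1", "Alice", "20", "62", "120.6")
record = row_to_dict(row)
print(record["name"])
```

Inside a real node, the same `dict(zip(row._fields, row))` expression would go in the body of the function decorated with `@use_raw_input`.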

Another option if you know the expected fields is to be explicit, using keyword arguments.

def some_node(id, name, value):
    ...
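
The traceback in the question shows the row being unpacked with `bind(*self.args, *_input, **self.kwargs)` from the standard `inspect` module, which is why a single-parameter signature such as `(csv)` fails with "too many positional arguments". The sketch below reproduces that binding step with the standard library alone; the three `printer_*` functions and the `binds` helper are hypothetical, and the row is a plain tuple standing in for Bonobo's row object.

```python
import inspect


def printer_single(csv):
    return csv


def printer_star(*csv):
    return csv


def printer_named(id, name, age, height, weight):
    return name


# Plain tuple standing in for one CSV row with five columns.
row = ("1", "Alice", "20", "62", "120.6")


def binds(func, values):
    """Return True if the values can be passed to func as positional args."""
    try:
        inspect.signature(func).bind(*values)
    except TypeError:
        return False
    return True


for func in (printer_single, printer_star, printer_named):
    print(func.__name__, binds(func, row))
```

Only the one-parameter signature is rejected: five unpacked values need either a `*args` parameter or one parameter per column.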

Hope that helps.


09-12 11:43