我有多个合并并排序的.xlsx银行帐户对帐单,以便手动比较和匹配我的帐户之间进行的任何交易。目的是解析在将所有交易导入到GnuCash中之前进行的所有交易,以便在我的账户记录中不会出现重复的交易记录。

到目前为止,在手动比较合并的.xlsx文件之后,我已经制作了一个bash脚本来解析由我创建的.csv文件。 sed和awk的组合用于生成导入GnuCash时所需的.qif文件。

我希望自动进行交易的手动比较需要一些帮助,因为我发现自己无法使用sed或awk成功解析匹配的交易。

我的银行对帐单格式如下:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000


将所有银行对帐单合并并排序后,我添加了带有源银行对帐单帐号的列:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789
2014-04-25;2014-04-25;5629374859;MORTGAGE;5000;10000;543219876              # Merged from my second accounts statement
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69;123456789
2014-04-25;2014-04-25;5284690010;SAVINGS;200;1930;987654321                 # Merged from my third accounts statement
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69;123456789
2014-04-25;2014-04-25;5284690010;SAVINGS;1730;1730;987654321                # Merged from my third accounts statement
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789


我需要实现的帮助是使用合并的银行对帐单分析文件,以便找到我的帐户之间的交易。记录日期,发生日期,验证号,备注和金额(在比较两行时不考虑负金额符号)列的任何交易(文件中的行)都应按以下方式处理:1)将源帐户交易行保留在文件中,2 )将带有目标帐户帐号的新列(“目标帐户”)添加到源帐户交易行3)从文件中删除目标帐户交易行。

例如,这是一个匹配项:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789           # Source account
2014-04-25;2014-04-25;5629374859;MORTGAGE;5000;10000;543219876              # Destination account


处理了产生交易的这两行后,文件中的输出应为:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account;Destination account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789;543219876


在我的合并银行帐户对帐单示例中的所有交易都已处理之后,最终输出应为包含以下行的文件:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account;Destination account
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789;543219876
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69;123456789;987654321
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69;123456789;987654321
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789;
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789;
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789;
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789;


注意:这四笔交易不是我的帐户之间的交易-它们应保留在文件中,并在添加的“目标帐户”列中留空。

2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789;
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789;
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789;
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789;


任何使用与我当前bash脚本兼容的工具的解决方案(或者使用pythons pandas库的解决方案?)将不胜感激!

最佳答案

我认为这会处理您更新的问题中描述的交易记录。

它首先从输入的csv文件创建类型为defaultdict(list)的字典,该字典具有基于为事务匹配描述的标准的密钥。具有相同密钥的所有事务都存储在关联的list中。

然后,它以成对方式浏览为每个键收集的交易列表,并从中创建合并的交易记录,并在其中添加了第二笔交易的源帐户中的附加目标帐户字段。然后,将创建的每个合并的事务记录写入输出csv文件。

未配对的事务将简单地合并为具有空目标字段的记录。仅当两个金额的符号不同时,配对的交易才会合并,否则,它们将被视为两个未配对的交易,如前所述。

from collections import defaultdict, namedtuple
import csv
from itertools import imap, izip_longest

# a couple of utility string conversion functions
def rename(name):
    """ Convert csv column name to a valid namedtuple fieldname which must be a
    valid Python identifier. Not exhaustive, but good enough for the headers
    shown (and is reversable, see below).
    """
    return name.lower().replace(' ', '_')

def undo_rename(name):
    """ Convert munged namedtuple fieldname back to a csv column name. """
    return name.replace('_', ' ').capitalize()

banktrans_filename = 'banktrans.csv'
banktrans_merged_filename = 'banktransmerged.csv'
DELIMITER = ';'
matched_trans = defaultdict(list)

with open(banktrans_filename, 'rb') as banktrans_file:
    reader = csv.reader(banktrans_file, delimiter=DELIMITER)
    # create namedtuple fieldnames from csv header row
    fieldnames = [rename(columname) for columname in next(reader)]
    Transaction = namedtuple('Transaction', fieldnames)
    for transact in imap(Transaction._make, reader):
        match_key = (transact.date_recorded, transact.date_occurred,
                     transact.verification_number, transact.memo,
                     # disregard any leading minus sign in amount field
                     transact.amount[transact.amount.startswith('-'):])
        matched_trans[match_key].append(transact)

with open(banktrans_merged_filename, 'wb') as banktrans_merged_file:
    writer = csv.writer(banktrans_merged_file, delimiter=DELIMITER)
    # merged tranactions have an additonal fieldname at the end
    mergedfieldnames = fieldnames + [rename('Destination account')]
    MergedTransaction = namedtuple('MergedTransaction', mergedfieldnames)
    # write header row
    writer.writerow([undo_rename(fieldname) for fieldname in mergedfieldnames])
    # merge pairs of matched transactions
    for match_key, transacts in sorted(matched_trans.items()):
        for trans_pair in izip_longest(*([iter(transacts)]*2)):
            if trans_pair[1] is None:  # unmatched trans, copy & add empty col
                merged_transact = MergedTransaction._make(trans_pair[0] + ('',))
            elif (trans_pair[0].amount.startswith('-') ==
                  trans_pair[1].amount.startswith('-')):  # amts have same sign?
                # records shouldn't be merged, treat as two unmatched trans
                merged_transact = MergedTransaction._make(trans_pair[0] + ('',))
                writer.writerow(merged_transact)
                merged_transact = MergedTransaction._make(trans_pair[1] + ('',))
                writer.writerow(merged_transact)
                continue  # skip remainder of loop
            else:  # merge pair by making source of the second the dest account
                merged_transact = MergedTransaction._make(
                    trans_pair[0] + (trans_pair[1].source_account,))
            writer.writerow(merged_transact)
print('merged transactions saved to file: ' + repr(banktrans_merged_filename))


结果输出文件的内容:

Date recorded;Date occurred;Verification number;Memo;Amount;Balance;Source account;Destination account
2014-04-23;2014-04-22;5434473478;GAS STATION/14-04-22;-521;479;123456789;
2014-04-23;2014-04-22;5487473797;GROCERY STORE/14-04-22;-661;1000;123456789;
2014-04-24;2014-04-24;5484384195;PHARMACY /14-04-23;-79;400;123456789;
2014-04-25;2014-04-25;3788765004S;SALARY;10000.69;10400.69;123456789;
2014-04-25;2014-04-25;5284690010;SAVINGS;-1730;8670.69;123456789;987654321
2014-04-25;2014-04-25;5284690010;SAVINGS;-200;8470.69;123456789;987654321
2014-04-25;2014-04-25;5629374859;MORTGAGE;-5000;3470.69;123456789;543219876

关于python - 比较xlsx或csv文件格式的银行帐户对帐单,以查找任何匹配的交易,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/27851673/

10-16 22:54