I'm new to Python and Pandas, so I'd be very glad if someone could help me. My problem is as follows:

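Suppose I have a .txt file containing a set of reactions (R1, R2, ...) as strings. Each reaction has compounds (A, B, C, D, ...) with stoichiometric coefficients (1, 2, 3, ...), for example: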

R1: A + 2B + C <=> D

R2: A + B <=> C

How can I build a DataFrame in Python in the form of a stoichiometric matrix (compounds as rows, reactions as columns), like this:

  R1 R2
A -1 -1
B -2 -1
C -1  1
D  1  0


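Note: compounds on the left-hand side of an equation should get negative stoichiometric values, and compounds on the right-hand side should get positive ones.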

Thanks =D

Best answer

Try this:

import pandas as pd
import re  # regular expressions

def coeff_comp(s):
    # Separate stoichiometric coefficient and compound
    result = re.search(r'(?P<coeff>\d*)(?P<comp>.*)', s)  # raw string keeps \d intact
    coeff = result.group('coeff')
    comp = result.group('comp')
    if not coeff:
        coeff = '1'                          # coefficient=1 if it is missing
    return comp, int(coeff)

equations = ['R1: A + 2B + C <=> D', 'R2: A + B <=> C']  # some test data
reactions_dict = {}                          # results dictionary

for equation in equations:
    compounds = {}                           # dict -> compound: coeff
    eq = equation.replace(' ', '')
    r_id, reaction = eq.split(':')           # separate id from chem reaction
    lhs, rhs = reaction.split('<=>')         # split left and right hand side
    reagents = lhs.split('+')                # get list of reagents
    products = rhs.split('+')                # get list of products
    for reagent in reagents:
        comp, coeff = coeff_comp(reagent)
        compounds[comp] = -coeff             # negative on lhs
    for product in products:
        comp, coeff = coeff_comp(product)
        compounds[comp] = coeff              # positive on rhs
    reactions_dict[r_id] = compounds

# insert dict into DataFrame, replace NaN with 0, let values be int
df = pd.DataFrame(reactions_dict).fillna(value=0).astype(int)
print(df)


The output looks like this:

   R1  R2
A  -1  -1
B  -2  -1
C  -1   1
D   1   0
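
Since the question starts from a .txt file rather than a hard-coded list, here is a rough sketch of how the same idea could be wrapped into functions that accept any iterable of lines, including an open file. The names `parse_reaction`, `stoichiometric_matrix` and `read_reactions` are made up for illustration, and the sketch assumes one reaction per line, integer coefficients, and `<=>` as the only separator. Unlike the loop above, it adds coefficients up, so a compound that appears on both sides of a reaction ends up with its net value instead of being overwritten.

```python
import re
from collections import defaultdict

import pandas as pd

# Assumption: each term is an optional integer coefficient followed by a name.
TERM = re.compile(r'(?P<coeff>\d*)(?P<comp>.+)')


def parse_reaction(line):
    """Return (reaction id, {compound: net coefficient}) for one line."""
    r_id, reaction = line.strip().replace(' ', '').split(':', 1)
    lhs, rhs = reaction.split('<=>')
    compounds = defaultdict(int)
    # Left-hand side counts as negative, right-hand side as positive.
    for side, sign in ((lhs, -1), (rhs, 1)):
        for term in side.split('+'):
            match = TERM.fullmatch(term)
            coeff = int(match.group('coeff') or 1)  # missing coefficient means 1
            compounds[match.group('comp')] += sign * coeff
    return r_id, dict(compounds)


def stoichiometric_matrix(lines):
    """Build the compounds x reactions DataFrame, skipping blank lines."""
    reactions = dict(parse_reaction(line) for line in lines if line.strip())
    return pd.DataFrame(reactions).fillna(0).astype(int)


def read_reactions(path):
    """Hypothetical convenience wrapper: read reactions from a text file."""
    with open(path, encoding='utf-8') as handle:
        return stoichiometric_matrix(handle)
```

With a file laid out like the example in the question, something like `df = read_reactions('reactions.txt')` (the file name is only a placeholder) should give the same matrix as above. Malformed lines are not handled here; they would raise an exception rather than being skipped.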

08-20 02:36