My script prints a year, followed by the word count of an article for that year:

abcd
2013
118
2014
23
xyz
2013
1
2014
45


I want to add each year as a new column to an existing dataframe that contains only the terms.

Expected output:

Terms 2013  2014  2015
abc   118   76    90
xyz   23    0     36


The input to my script is a csv file:

Terms
xyz
abc
efg


The script I wrote is:

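# imports assumed here; the original snippet omitted them
import urllib.request
import xml.etree.ElementTree as ET

import pandas as pd

df = pd.read_csv('a.csv', header=None)

for row in df.itertuples():
    term = str(row[1])
    u = "http: term=%s&mindate=%d/01/01&maxdate=%d/12/31"
    print(term)
    startYear = 2013
    endYear = 2018

    # query every year for the current term (nested inside the row loop)
    for year in range(startYear, endYear + 1):
        url = u % (term.replace(" ", "+"), year, year)
        page = urllib.request.urlopen(url).read()
        doc = ET.XML(page)
        count = doc.find("Count").text
        print(year)
        print(count)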


df.head() gives:

                         0
0           1,2,3-triazole
1  16s rrna gene amplicons


Any help would be greatly appreciated. Thanks in advance!

Best answer

Something like this should do it:

#!/usr/bin/env python

def mkdf(filename):
    """Yield one dict per term: {"term": ..., year: count, ...}."""
    def combine(term, values):
        # values alternates year, count, year, count, ...
        d = {"term": term}
        d.update(dict(zip(values[::2], values[1::2])))
        return d

    term = None
    other = []
    with open(filename) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue    # skip blank lines
            try:
                int(line)
            except ValueError:
                # not an int, so this line starts a new term
                if term is not None:    # emit the record collected so far
                    yield combine(term, other)
                term = line
                other = []
            else:
                # an int: a year or a count belonging to the current term
                if term is not None:
                    other.append(line)

    # and the last one
    if term is not None:
        yield combine(term, other)

if __name__ == "__main__":
    import pandas as pd
    import sys

    df = pd.DataFrame(list(mkdf(sys.argv[1])))
    print(df)


Usage: python scriptname.py /tmp/IN (or whichever file contains your data)

Output (the column order may differ depending on the pandas version):

  2013 2014  term
0  118   23  abcd
1    1   45   xyz
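
If the goal is to end up with the years as columns on the existing Terms frame, the printed intermediate output can probably be skipped altogether. The following is only a minimal sketch, not the answerer's method: `counts_by_year`, `fetch_count` and `fake_fetch` are hypothetical names, and `fetch_count` stands in for the URL request and XML parsing in the question's loop. It collects one record per term, builds a frame from those records, and left-merges it onto the existing frame, filling missing counts with 0.

```python
import pandas as pd


def counts_by_year(terms, years, fetch_count):
    """Build one row per term with one column per year.

    fetch_count(term, year) is a hypothetical callable standing in for
    the urlopen + ET.XML lookup in the question; it returns an int count.
    """
    records = []
    for term in terms:
        record = {"Terms": term}
        for year in years:
            record[year] = fetch_count(term, year)
        records.append(record)
    return pd.DataFrame(records, columns=["Terms", *years])


# Existing frame that holds only the terms (as it would be read from the csv).
existing = pd.DataFrame({"Terms": ["xyz", "abcd", "efg"]})

# Stand-in data taken from the sample output in the question.
sample = {("abcd", 2013): 118, ("abcd", 2014): 23,
          ("xyz", 2013): 1, ("xyz", 2014): 45}


def fake_fetch(term, year):
    # Hypothetical replacement for the real HTTP request.
    return sample.get((term, year), 0)


years = [2013, 2014]
wide = counts_by_year(["abcd", "xyz"], years, fake_fetch)

# A left merge keeps every existing term; terms without counts get NaN,
# which is then replaced by 0 in the year columns only.
result = existing.merge(wide, on="Terms", how="left")
result = result.fillna({year: 0 for year in years})
result = result.astype({year: int for year in years})
print(result)
```

With the sample data, `efg` has no counts and ends up with 0 in both year columns, which matches the zero-filled layout in the expected output. Passing the lookup in as a callable keeps the reshaping logic separate from the network request, so it can be tried without hitting the real URL.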

Regarding python - writing rows in a pandas dataframe and appending them to an existing dataframe, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50965980/
