My script outputs each term, followed by each year and the count for that year:
abcd
2013
118
2014
23
xyz
2013
1
2014
45
I want to add each year as a new column to the existing DataFrame, which contains only the terms.
Expected output:
Terms 2013 2014 2015
abc 118 76 90
xyz 23 0 36
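One way to get from the printed output to this layout, sketched under the assumption that the counts are first collected as (term, year, count) tuples instead of being printed: build a long-format DataFrame and pivot it so that each year becomes a column. The values are the sample output above; the column names year and count are illustrative choices, not anything from the original script.

```python
import pandas as pd

# (term, year, count) triples, taken from the sample script output above
records = [
    ("abcd", 2013, 118),
    ("abcd", 2014, 23),
    ("xyz", 2013, 1),
    ("xyz", 2014, 45),
]
long_df = pd.DataFrame(records, columns=["Terms", "year", "count"])

# Reshape: one row per term, one column per year
wide = long_df.pivot(index="Terms", columns="year", values="count")
# Any missing term/year combination becomes 0, then counts go back to int
wide = wide.fillna(0).astype(int).reset_index()
wide.columns.name = None  # drop the leftover "year" label on the column axis
print(wide)
```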
The input to my script is a CSV file:
Terms
xyz
abc
efg
The script I wrote is:
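import urllib.request
import xml.etree.ElementTree as ET

import pandas as pd

df = pd.read_csv('a.csv', header=None)
for row in df.itertuples():
    term = str(row[1])  # row[0] is the index, row[1] is the first column
    u = "http: term=%s&mindate=%d/01/01&maxdate=%d/12/31"
    print(term)
    startYear = 2013
    endYear = 2018
    for year in range(startYear, endYear + 1):
        # spaces in the term become '+' in the query string
        url = u % (term.replace(" ", "+"), year, year)
        page = urllib.request.urlopen(url).read()
        doc = ET.XML(page)
        count = doc.find("Count").text  # count for this term and year
        print(year)
        print(count)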
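A minimal sketch of collecting the counts inside the loop instead of printing them, so the table is built directly. fetch_count is a hypothetical callable standing in for the urllib/ElementTree request (the real URL is elided in the question), which lets the example run offline with the sample counts above.

```python
import pandas as pd


def build_count_table(terms, years, fetch_count):
    """Return one row per term and one column per year.

    fetch_count(term, year) is a hypothetical callable standing in for the
    urllib/ElementTree lookup in the loop above; it must return an int.
    """
    years = list(years)
    rows = []
    for term in terms:
        row = {"Terms": term}
        for year in years:
            row[year] = fetch_count(term, year)
        rows.append(row)
    # Passing columns= fixes the column order: Terms first, then the years
    return pd.DataFrame(rows, columns=["Terms", *years])


# Offline stand-in for the web request, using the counts printed above
sample = {("abcd", 2013): 118, ("abcd", 2014): 23,
          ("xyz", 2013): 1, ("xyz", 2014): 45}
table = build_count_table(["abcd", "xyz"], range(2013, 2015),
                          lambda term, year: sample.get((term, year), 0))
print(table)
```

In the real script, fetch_count would build url from u as in the loop, read the page, and return int(ET.XML(page).find("Count").text).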
The output of df.head() is:
                         0
0           1,2,3-triazole
1  16s rrna gene amplicons
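The column is labelled 0 because header=None makes pandas number the columns itself. If a.csv does start with the Terms line shown in the sample, that line is read as an ordinary term and gets queried too. A small sketch comparing the two settings on the sample CSV (held in a string here instead of a file):

```python
import io

import pandas as pd

csv_text = "Terms\nxyz\nabc\nefg\n"

# header=None: columns are numbered and the "Terms" line becomes a data row
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Default (header=0): the first line supplies the column name
with_header = pd.read_csv(io.StringIO(csv_text))

print(no_header)
print(with_header)
```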
Any help would be greatly appreciated. Thanks in advance!
Best answer
Something like this should do it:
#!/usr/bin/env python


def mkdf(filename):
    """Parse a file of term / year / count lines into one dict per term."""

    def combine(term, l):
        # l alternates year, count, year, count, ...
        d = {"term": term}
        d.update(dict(zip(l[::2], l[1::2])))
        return d

    term = None
    other = []
    with open(filename) as I:
        n = 0
        for line in I:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of treating them as terms
            try:
                int(line)
            except ValueError:
                # not an int, so this line starts a new term
                if term:  # if we have one, create the record
                    yield combine(term, other)
                term = line
                other = []
                n = 0
            else:
                if n > 0:  # ignore an int that appears before any term
                    other.append(line)
            n += 1
    # and the last one
    yield combine(term, other)


if __name__ == "__main__":
    import sys

    import pandas as pd

    df = pd.DataFrame(list(mkdf(sys.argv[1])))
    print(df)
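The answer builds a new DataFrame rather than adding columns to the existing one, and mkdf() keeps the years and counts as strings. A sketch of the remaining step, assuming records shaped like mkdf()'s output and a stand-in frame for the term CSV: convert the counts to integers and left-join them onto the terms, filling years with no data with 0.

```python
import pandas as pd

# Records in the shape mkdf() yields: year keys and counts are strings
records = [
    {"term": "abcd", "2013": "118", "2014": "23"},
    {"term": "xyz", "2013": "1", "2014": "45"},
]
counts = pd.DataFrame(records).set_index("term").astype(int)

# Existing frame holding only the terms (stand-in for the input CSV)
terms = pd.DataFrame({"Terms": ["xyz", "abcd", "efg"]})

# Left join on the term column: every term is kept in its original order,
# and terms without counts get NaN, which is then replaced by 0
result = terms.join(counts, on="Terms")
year_cols = list(counts.columns)
result[year_cols] = result[year_cols].fillna(0).astype(int)
print(result)
```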
Usage: python scriptname.py /tmp/IN (or any other file containing your data)
Output:
2013 2014 term
0 118 23 abcd
1 1 45 xyz
Regarding "python - Writing rows in a pandas DataFrame and appending them to an existing DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50965980/