Problem description
I have a dataframe of individuals who each have multiple records. I want to number each individual's records in sequence in Python. Essentially, I would like to create the 'sequence' column in the following table:
patient date sequence
145 20Jun2009 1
145 24Jun2009 2
145 15Jul2009 3
582 09Feb2008 1
582 21Feb2008 2
987 14Mar2010 1
987 02May2010 2
987 12May2010 3
This is essentially the same question as here, but I am working in Python and cannot use the SQL solution. I suspect I can use a groupby statement with an iterable count, but so far I have been unsuccessful. Thanks!
Recommended answer
The underlying question is how to sort on multiple columns of data.
One simple trick is to use the key parameter of the sorted function.
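You sort by a string built from the columns of each row. The string has to be built so that its alphabetical order matches the order you want: a zero-padded patient ID followed by the date in year-month-day form.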
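from datetime import datetime

# Sample rows taken from the question's table; replace with your source data.
rows = [(145, "24Jun2009"), (145, "20Jun2009"), (582, "09Feb2008")]

def date_to_sortable_string(date):
    # Parse a string such as "20Jun2009" and return it as "2009-06-20",
    # a form whose alphabetical order is also chronological order.
    return datetime.strptime(date, "%d%b%Y").strftime("%Y-%m-%d")

# Assume x[0] is the patient ID (an integer) and x[1] is the encounter date.
# Sort by patient ID, then by date.
rows_sorted = sorted(rows, key=lambda x: "%05d-%s" % (x[0], date_to_sortable_string(x[1])))
for row in rows_sorted:
    print(row)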
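Sorting alone does not yet produce the 'sequence' column the question asks for. Once the rows are in patient and date order, they can be numbered within each patient. The following is a minimal sketch using only the standard library, assuming the rows are (patient, date) tuples and the dates look like "20Jun2009"; the function name add_sequence is made up for this illustration, and the sort uses a tuple key instead of the string key above.

```python
from datetime import datetime
from itertools import groupby

def add_sequence(rows):
    """Return (patient, date, sequence) tuples, numbered from 1 within each patient."""
    # Sort by patient, then chronologically (assumed date format: "20Jun2009").
    ordered = sorted(rows, key=lambda r: (r[0], datetime.strptime(r[1], "%d%b%Y")))
    result = []
    # groupby only merges adjacent items, so the sort above is required.
    for _, group in groupby(ordered, key=lambda r: r[0]):
        for seq, (patient, date) in enumerate(group, start=1):
            result.append((patient, date, seq))
    return result

# Sample data from the question's table, deliberately out of order.
rows = [
    (145, "24Jun2009"), (145, "20Jun2009"), (145, "15Jul2009"),
    (582, "21Feb2008"), (582, "09Feb2008"),
    (987, "14Mar2010"), (987, "12May2010"), (987, "02May2010"),
]
for row in add_sequence(rows):
    print(row)
```

If the data is already in a pandas dataframe sorted by patient and date, df.groupby('patient').cumcount() + 1 should give the same numbering; the column name 'patient' is taken from the question's table.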