Problem description
I am using Django 1.9. I have a Django table that represents the value of a particular measure, by organisation by month, with raw values and percentiles:
class MeasureValue(models.Model):
    org = models.ForeignKey(Org, null=True, blank=True)
    month = models.DateField()
    calc_value = models.FloatField(null=True, blank=True)
    percentile = models.FloatField(null=True, blank=True)
There are typically 10,000 or so per month. My question is about whether I can speed up the process of setting values on the models.
Currently, I calculate percentiles by retrieving all the measurevalues for a month using a Django filter query, converting it to a pandas dataframe, and then using scipy's rankdata to set ranks and percentiles. I do this because pandas and rankdata are efficient, able to ignore null values, and able to handle repeated values in the way that I want, so I'm happy with this method:
records = MeasureValue.objects.filter(month=month).values()
df = pd.DataFrame.from_records(records)
# use calc_value to set percentile on each row, using scipy's rankdata
However, I then need to retrieve each percentile value from the dataframe, and set it back onto the model instances. Right now I do this by iterating over the dataframe's rows, and updating each instance:
for i, row in df.iterrows():
    mv = MeasureValue.objects.get(org=row.org, month=month)
    if (row.percentile is None) or np.isnan(row.percentile):
        row.percentile = None
    mv.percentile = row.percentile
    mv.save()
This is unsurprisingly quite slow. Is there any efficient Django way to speed it up, by making a single database write rather than tens of thousands? I have checked the documentation, but can't see one.
Solution
Atomic transactions can reduce the time spent in the loop:
from django.db import transaction

with transaction.atomic():
    for i, row in df.iterrows():
        mv = MeasureValue.objects.get(org=row.org, month=month)
        if (row.percentile is None) or np.isnan(row.percentile):
            # if it's already None, why set it to None?
            row.percentile = None
        mv.percentile = row.percentile
        mv.save()
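As a possible variation on the loop above, the per-row get() can be skipped. This is a minimal sketch, assuming the dataframe still carries the primary key column that values() returns (named id by default). The helpers clean_percentile and write_percentiles are hypothetical names introduced here, not part of Django. The only Django calls relied on are QuerySet.filter(pk=...) and .update(...), which together issue a single UPDATE with no SELECT before it.

```python
import math


def clean_percentile(value):
    """Return None for a missing value (None or NaN), otherwise a plain float."""
    if value is None:
        return None
    value = float(value)  # also converts numpy floats to built-in floats
    return None if math.isnan(value) else value


def write_percentiles(manager, rows):
    """Issue one UPDATE per (pk, percentile) pair, without fetching the row first.

    `manager` is assumed to behave like MeasureValue.objects, i.e. it offers
    .filter(pk=...).update(percentile=...). Returns the number of UPDATEs issued.
    """
    count = 0
    for pk, percentile in rows:
        # int() guards against numpy integer types coming out of the dataframe.
        manager.filter(pk=int(pk)).update(percentile=clean_percentile(percentile))
        count += 1
    return count


# Intended use inside a Django project (not executed here):
#
#     from django.db import transaction
#
#     with transaction.atomic():
#         write_percentiles(MeasureValue.objects, zip(df["id"], df["percentile"]))
```

This still sends one statement per row, so it roughly halves the number of queries rather than collapsing them into one. Note that update() bypasses save() and its signals, which may or may not matter for this model.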
Django’s default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is active.
By using with transaction.atomic(), all the updates are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed update statements, so the time per statement is greatly reduced. The loop still issues one SELECT and one UPDATE per row; only the commit overhead is shared.
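The effect of commit overhead can be observed outside Django as well. The following is a rough sketch using only the standard-library sqlite3 module and a throwaway on-disk database. The table name measure and the helpers make_db, update_all and compare are made up for this illustration. It runs the same UPDATE statements twice, once committing after every row and once committing a single time at the end, and returns both timings.

```python
import os
import sqlite3
import tempfile
import time


def make_db(path, n_rows):
    """Create a small table with n_rows rows whose percentile is NULL."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE measure (id INTEGER PRIMARY KEY, percentile REAL)")
    conn.executemany("INSERT INTO measure (id) VALUES (?)", [(i,) for i in range(n_rows)])
    conn.commit()
    return conn


def update_all(conn, n_rows, commit_each_row):
    """Update every row and return the elapsed time in seconds."""
    start = time.perf_counter()
    for i in range(n_rows):
        conn.execute("UPDATE measure SET percentile = ? WHERE id = ?", (i / n_rows, i))
        if commit_each_row:
            conn.commit()  # mimics autocommit: one commit per statement
    conn.commit()  # single commit for the grouped case (no-op otherwise)
    return time.perf_counter() - start


def compare(n_rows=200):
    """Return (seconds with per-row commits, seconds with one commit, rows filled)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = make_db(os.path.join(tmp, "demo.sqlite3"), n_rows)
        per_row = update_all(conn, n_rows, commit_each_row=True)
        single = update_all(conn, n_rows, commit_each_row=False)
        filled = conn.execute(
            "SELECT COUNT(*) FROM measure WHERE percentile IS NOT NULL"
        ).fetchone()[0]
        conn.close()
    return per_row, single, filled
```

Calling compare() and printing the first two values gives a feel for the difference on a given machine. The actual ratio depends on the database backend and the storage it writes to, so no particular speedup should be expected to carry over to a Django setup.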