Problem description
I have time series of different lengths. I want to cluster them based on DTW distance, but I could not find any library that supports this. sklearn raises an error outright, while tslearn's k-means gave a wrong answer.
The problem goes away if I pad the series with zeros, but I am not sure whether padding time series data is valid when clustering.
Suggestions for other clustering techniques for time series data are welcome.
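# Imports assumed; the original snippet did not show them.
from keras.preprocessing import sequence
from tslearn.clustering import TimeSeriesKMeans

# Find the length of the longest series.
max_length = 0
for i in train_1:
    if len(i) > max_length:
        max_length = len(i)
print(max_length)

# Zero-pad every series to max_length (pad_sequences pads at the front and casts to int32 by default).
train_1 = sequence.pad_sequences(train_1, maxlen=max_length)
km3 = TimeSeriesKMeans(n_clusters=4, metric="dtw", verbose=False, random_state=0).fit(train_1)
print(km3.labels_)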
Recommended answer
You can try a custom k-means (or another clustering algorithm); the source code in the sklearn library is readily available as a starting point. Padding is not a great option, because it changes the problem itself. Alternatively, you can use tslearn or pyclustering (for optimal clusters), but remember to use DTW distance rather than Euclidean distance.
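As a rough illustration of the "custom clustering with DTW" idea, here is a minimal pure-Python sketch that clusters series of unequal length without any padding. It swaps k-means for k-medoids, since a medoid is an actual series and no averaging across different lengths is needed. The helper names `dtw_distance` and `k_medoids_dtw`, the farthest-first initialisation, and the toy data are all assumptions made for this example; they are not part of sklearn, tslearn, or pyclustering.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids_dtw(series, k, n_iter=20):
    """Hypothetical helper: k-medoids on a precomputed DTW distance matrix."""
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]

    # Deterministic farthest-first initialisation (an assumption of this sketch).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(current):
        # Each series joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid of an empty cluster
                continue
            # The new medoid minimises total DTW distance to its cluster members.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Toy data: two groups with different levels and different lengths.
series = [
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 1, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [10, 11, 12, 11, 10, 10, 10],
    [10, 12, 11, 10, 10],
    [10, 10, 11, 12, 12, 11, 10, 10, 10],
]
labels, medoids = k_medoids_dtw(series, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The full pairwise matrix costs O(n^2) DTW computations, so this is only practical for small datasets; for larger ones a library implementation is likely the better choice.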