

I'm using Mike Bostock's crossfilter library to filter and sort large datasets. My problem: Given a dataset with multiple dimensions, how can I sort on more than one dimension at a time?


    { cat: "A", val:1 },
    { cat: "B", val:2 },
    { cat: "A", val:11 },
    { cat: "B", val:5 },
    { cat: "A", val:3 },
    { cat: "B", val:2 },
    { cat: "A", val:11 },
    { cat: "B", val:100 }

Example of desired output, sorting by cat, val (ascending):

    { cat: "A", val:1 },
    { cat: "A", val:3 },
    { cat: "A", val:11 },
    { cat: "A", val:11 },
    { cat: "B", val:2 },
    { cat: "B", val:2 },
    { cat: "B", val:5 },
    { cat: "B", val:100 }


The approach I've used thus far is to use string concatenation on the desired dimensions:

    var combos = cf.dimension(function(d) { return d.cat + '|' + d.val; });

This works fine with multiple string-based dimensions, but it won't work with numeric dimensions, because the concatenated values are compared as strings rather than in natural numeric order ('4' > '11'). I think I could make this work by zero-padding the numbers, but that could get expensive for a large dataset, so I'd prefer to avoid it. Is there another way that might work here, using crossfilter?
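To make the ordering problem concrete, here is a minimal sketch in plain JavaScript (no crossfilter involved) that sorts the sample rows by the concatenated key, and then by a zero-padded key. The names `naiveKey`, `paddedKey`, `sortByKey` and the padding width `WIDTH` are made up for illustration. The padded variant assumes non-negative integers with at most `WIDTH` digits.

```javascript
// Sample rows from the question.
const rows = [
  { cat: "A", val: 1 },  { cat: "B", val: 2 },
  { cat: "A", val: 11 }, { cat: "B", val: 5 },
  { cat: "A", val: 3 },  { cat: "B", val: 2 },
  { cat: "A", val: 11 }, { cat: "B", val: 100 }
];

// Naive composite key: the number is compared as text.
const naiveKey = d => d.cat + '|' + d.val;

// Zero-padded composite key. WIDTH is an assumption: it must be at least
// the digit count of the largest value, and values must be non-negative integers.
const WIDTH = 6;
const paddedKey = d => d.cat + '|' + String(d.val).padStart(WIDTH, '0');

// Sort a copy of the data by comparing the string keys.
const sortByKey = (data, key) => data.slice().sort((a, b) => {
  const ka = key(a), kb = key(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
});

const naiveOrder = sortByKey(rows, naiveKey).map(d => d.val);
const paddedOrder = sortByKey(rows, paddedKey).map(d => d.val);

console.log(naiveOrder);  // [1, 11, 11, 3, 100, 2, 2, 5]
console.log(paddedOrder); // [1, 3, 11, 11, 2, 2, 5, 100]
```

The naive key puts 11 before 3 and 100 before 2 within each category, while the padded key gives the desired order, at the cost of building a longer string for every record.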


Bonus points for any solution that allows different dimensions to have different sort directions (ascending/descending).

Clarification: Yes, I may need to switch to a native Array.sort implementation. But the whole point of using crossfilter is that it's very, very fast, especially for large datasets, and it caches sort order in a way that makes repeated sorts even faster. So I'm really looking for a crossfilter-based answer here.



Here's what I ended up doing:

  • I still use string concatenation on a single new dimension, but
  • I convert the measure to a positive, comparable decimal before turning it into a string, using crossfilter to get the min/max:

    var vals = cf.dimension(function(d) { return d.val; }),
        min = vals.bottom(1)[0].val,
        // shift by |min| so that negative values become non-negative
        offset = min < 0 ? Math.abs(min) : 0,
        max = vals.top(1)[0].val + offset,
        valAccessor = function(d) {
            // offset ensures positive numbers, fraction ensures sort order
            return ((d.val + offset) / max).toFixed(8);
        },
        combos = cf.dimension(function(d) {
            return d.cat + '|' + valAccessor(d);
        });


This has the advantage of handling negative numbers properly, which is not possible with zero-padding, as far as I can tell. It seems to be just as fast. The downside is that it requires creating a new dimension on the numeric column, but in my case I usually need that dimension anyway.
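For the bonus question about sort directions, one possible extension (my own assumption, not something I've tested against crossfilter) is to flip the normalized fraction with `1 - fraction` for a dimension that should sort in descending order. The sketch below is plain JavaScript with no crossfilter dependency. `makeKey` and `sortByKey` are hypothetical helper names, and the min/max are found with a simple loop instead of `bottom(1)` / `top(1)`. The returned function has the same shape as the accessor passed to `cf.dimension` above.

```javascript
// Build a composite-key accessor for rows shaped like { cat, val }.
// `descending` flips the order of the numeric part only (assumed extension).
function makeKey(rows, descending) {
  let min = Infinity, max = -Infinity;
  for (const d of rows) {
    if (d.val < min) min = d.val;
    if (d.val > max) max = d.val;
  }
  const offset = min < 0 ? -min : 0;  // shift negatives up to zero or above
  const span = (max + offset) || 1;   // guard against dividing by zero
  return d => {
    const fraction = (d.val + offset) / span;          // in the range 0..1
    const part = descending ? 1 - fraction : fraction; // reverse if requested
    return d.cat + '|' + part.toFixed(8);              // fixed width, so string order matches numeric order
  };
}

// Sort a copy of the data by comparing the string keys.
const sortByKey = (data, key) => data.slice().sort((a, b) => {
  const ka = key(a), kb = key(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
});

// Made-up sample rows that include negative values.
const sample = [
  { cat: "A", val: -5 }, { cat: "A", val: 11 }, { cat: "A", val: 3 },
  { cat: "B", val: 100 }, { cat: "B", val: -2 }, { cat: "B", val: 2 }
];

const ascending = sortByKey(sample, makeKey(sample, false)).map(d => d.val);
const descendingVals = sortByKey(sample, makeKey(sample, true)).map(d => d.val);

console.log(ascending);      // [-5, 3, 11, -2, 2, 100]
console.log(descendingVals); // [11, 3, -5, 100, 2, -2]
```

One caveat with either direction: `toFixed(8)` keeps only eight decimal places, so two values whose normalized fractions differ by less than that will end up with the same key.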

