我有几个字符串集合(约100k个元素)。它们是从相同的网络源初始化的:
List<String> source = Arrays.asList("cat Tom", "dog Woof", "bird Carry", "fish Silent", "mice Dr Black");
List<String> list1 = new ArrayList<>(source);
List<String> list2 = new ArrayList<>(source);
List<String> list3 = new ArrayList<>(source);
在他们的一生中,他们可以独立进行修改:
list1.add("raccoon George");
list2.set(2, "bird Frank"); // Different bird
list3.remove(2); // No birds!
这些集合位于不同的JVM中,但是具有相同的(共享的)种子:
String seed = UUID.randomUUID().toString();
我需要一些排序/混洗方法(或Comparator)来随机化这些集合:
magicSort(list1, seed); // JVM #1
magicSort(list2, seed); // JVM #2
magicSort(list3, seed); // JVM #3
要接收一致/可重复的输出,如下所示:
[mice Dr Black, bird Carry, raccoon George, cat Tom, dog Woof, fish Silent]
[mice Dr Black, bird Frank, cat Tom, dog Woof, fish Silent]
[mice Dr Black, cat Tom, dog Woof, fish Silent]
这不起作用:
Collections.shuffle()中的this answer-因为它不提供对具有相同前缀的项目的亲和力;
保持
Map<String, UUID>
中的随机顺序-因为(a)集合的动态性质和(b)分布式设置。任何帮助将不胜感激
编辑#1
这不起作用:
sha1
,md5
和类似的哈希算法-因为它们不提供对具有相同前缀的项目的亲和力-bird Carry
和bird Frank
可以放置得彼此远离。编辑#2
附加条件:具有相同前缀的项目应分组在一起(相似性),例如所有的鸟都放在老鼠和浣熊之间。
最佳答案
非常感谢注释中的哈希提示!
我最终通过伪随机哈希(与Fisher-Yates随机播放类似)重新映射了字符顺序。其余代码是从String.compareTo
复制粘贴的:
private void magicSort(List<String> list, String seed) {
list.sort(new ShuffleComparator(seed));
}
public class ShuffleComparator implements Comparator<String> {
private static final long MAGIC = 0x5DEECE66DL;
private final String seed;
public ShuffleComparator(String seed) {
this.seed = seed;
}
@Override
public int compare(String s1, String s2) {
int len1 = s1.length();
int len2 = s2.length();
int lim = Math.min(len1, len2);
int seedLen = seed.length();
long random = seed.hashCode();
for (int k = 0; k < lim; k++) {
random = random * MAGIC + seed.charAt(k % seedLen);
char c1 = s1.charAt(k);
char c2 = s2.charAt(k);
if (c1 != c2) {
return ((random % (c1 + 0xFFFF)) & 0xFFFF) -
((random % (c2 + 0xFFFF)) & 0xFFFF) > 0 ? 1 : -1;
}
}
return (random > 0 ? 1 : -1) * (len1 - len2);
}
}
结果:
您可以随时对收藏进行随机排序
具有相同前缀的项目被分组在一起(相似性)
JVM之间共享唯一的资源(字符串种子)
对性能的影响可以忽略不计
零GC
例子:
seed: b85a9119-3a98-4491-b503-fc2cfdc344f3
[raccoon George, bird Carry, mice Dr Black, fish Silent, cat Tom, dog Woof]
[bird Frank, mice Dr Black, fish Silent, cat Tom, dog Woof]
[mice Dr Black, fish Silent, cat Tom, dog Woof]
seed: d5378421-d0ea-4b17-80e9-1de6a163bf38
[bird Carry, dog Woof, fish Silent, raccoon George, mice Dr Black, cat Tom]
[bird Frank, dog Woof, fish Silent, mice Dr Black, cat Tom]
[dog Woof, fish Silent, mice Dr Black, cat Tom]
seed: a5f94f60-1207-4f83-9723-bbab5e3b8075
[cat Tom, raccoon George, mice Dr Black, bird Carry, dog Woof, fish Silent]
[cat Tom, mice Dr Black, bird Frank, dog Woof, fish Silent]
[cat Tom, mice Dr Black, dog Woof, fish Silent]
seed: 4b77a61f-c639-42fc-96b2-e1add9f359e9
[bird Carry, raccoon George, cat Tom, mice Dr Black, dog Woof, fish Silent]
[bird Frank, cat Tom, mice Dr Black, dog Woof, fish Silent]
[cat Tom, mice Dr Black, dog Woof, fish Silent]
seed: 2dd78e21-d4ad-4e8a-b139-5d35fe2b0481
[dog Woof, fish Silent, raccoon George, mice Dr Black, bird Carry, cat Tom]
[dog Woof, fish Silent, mice Dr Black, bird Frank, cat Tom]
[dog Woof, fish Silent, mice Dr Black, cat Tom]
编辑#1
对于测试,我还实现了基于哈希的比较器,因此可以在此处发布。万一您不需要亲和力(例如我的情况),但需要纯随机播放:
private void magicSort(List<String> list, String seed) {
list.sort(new HashComparator(seed, "SHA-1"));
}
public class HashComparator implements Comparator<String> {
private static final long MAGIC = 0x5DEECE66DL;
private final ThreadLocal<MessageDigest> messageDigest;
private final byte[] seed;
private final int seedHash;
public HashComparator(String seed, String algorithm) {
this.seed = seed.getBytes(ISO_8859_1);
this.seedHash = seed.hashCode();
this.messageDigest = ThreadLocal.withInitial(() -> {
try {
return MessageDigest.getInstance(algorithm);
} catch (NoSuchAlgorithmException e) {
throw new RuntimeException(e.getMessage(), e);
}
});
}
@Override
public int compare(String s1, String s2) {
if (s1.equals(s2)) {
return 0;
}
int diff = getSignature(s1).compareTo(getSignature(s2));
if (diff != 0) {
return diff;
}
long random = seedHash;
random = random * MAGIC + s1.hashCode();
random = random * MAGIC + s2.hashCode();
return ((random & 1) == 0 ? 1 : -1) * s1.compareTo(s2);
}
private ByteBuffer getSignature(String s) {
MessageDigest digest = messageDigest.get();
digest.reset();
digest.update(seed);
for (int i = 0, size = s.length(); i < size; i++) {
char c = s.charAt(i);
digest.update((byte) (c >> 8));
digest.update((byte) c);
}
return ByteBuffer.wrap(digest.digest());
}
}
结果:
您可以随时对收藏进行随机排序
纯随机播放(前缀相同的项目没有亲和力)
JVM之间共享唯一的资源(字符串种子)
例子:
8d465fde-9310-4b6c-9709-5cc395deb45e
[raccoon George, mice Dr Black, dog Woof, fish Silent, cat Tom, bird Carry]
[bird Frank, mice Dr Black, dog Woof, fish Silent, cat Tom]
[mice Dr Black, dog Woof, fish Silent, cat Tom]
6daa90dd-f7e7-4615-a45c-fefb0de10c20
[bird Carry, mice Dr Black, fish Silent, cat Tom, dog Woof, raccoon George]
[mice Dr Black, bird Frank, fish Silent, cat Tom, dog Woof]
[mice Dr Black, fish Silent, cat Tom, dog Woof]
9f30b259-c8d1-41f5-8221-40b17cb04260
[cat Tom, mice Dr Black, raccoon George, dog Woof, fish Silent, bird Carry]
[bird Frank, cat Tom, mice Dr Black, dog Woof, fish Silent]
[cat Tom, mice Dr Black, dog Woof, fish Silent]
94b6710f-3232-453d-8009-6d81658b2cca
[dog Woof, fish Silent, bird Carry, cat Tom, mice Dr Black, raccoon George]
[dog Woof, fish Silent, cat Tom, mice Dr Black, bird Frank]
[dog Woof, fish Silent, cat Tom, mice Dr Black]
关于java - 如何在分布式系统中以相同的方式随机化两个String集合?,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/58920459/