我正在使用适用于Java的DynamoDB SDK在本地Dynamo DB中进行批量写入(约5.5k项)的POC。我知道每个批量写入操作不能超过25个写入操作,因此我将整个数据集分为每个25个项目的块。然后,我将这些块作为Executor框架中的可调用操作传递。不过,由于在100多秒钟内插入了5.5k记录,我的结果仍不令人满意。
我不确定该如何优化。在创建表时,我将WriteCapacityUnit设置为400(不确定我可以提供的最大值是多少),并对其进行了一些尝试,但没有任何区别。我也尝试过更改执行程序中的线程数。
这是执行批量写入操作的主要代码:
public static void main(String[] args) throws Exception {
AmazonDynamoDBClient client = new AmazonDynamoDBClient().withEndpoint("http://localhost:8000");
final AmazonDynamoDB aws = new AmazonDynamoDBClient(new BasicAWSCredentials("x", "y"));
aws.setEndpoint("http://localhost:8000");
JSONArray employees = readFromFile();
Iterator<JSONObject> iterator = employees.iterator();
List<WriteRequest> batchList = new ArrayList<WriteRequest>();
ExecutorService service = Executors.newFixedThreadPool(20);
List<BatchWriteItemRequest> listOfBatchItemsRequest = new ArrayList<>();
while(iterator.hasNext()) {
if (batchList.size() == 25) {
Map<String, List<WriteRequest>> batchTableRequests = new HashMap<String, List<WriteRequest>>();
batchTableRequests.put("Employee", batchList);
BatchWriteItemRequest batchWriteItemRequest = new BatchWriteItemRequest();
batchWriteItemRequest.setRequestItems(batchTableRequests);
listOfBatchItemsRequest.add(batchWriteItemRequest);
batchList = new ArrayList<WriteRequest>();
}
PutRequest putRequest = new PutRequest();
putRequest.setItem(ItemUtils.fromSimpleMap((Map) iterator.next()));
WriteRequest writeRequest = new WriteRequest();
writeRequest.setPutRequest(putRequest);
batchList.add(writeRequest);
}
StopWatch watch = new StopWatch();
watch.start();
List<Future<BatchWriteItemResult>> futureListOfResults = listOfBatchItemsRequest.stream().
map(batchItemsRequest -> service.submit(() -> aws.batchWriteItem(batchItemsRequest))).collect(Collectors.toList());
service.shutdown();
while(!service.isTerminated());
watch.stop();
System.out.println("Total time taken : " + watch.getTotalTimeSeconds());
}
}
这是用于创建dynamoDB表的代码:
public static void main(String[] args) throws Exception {
AmazonDynamoDBClient client = new AmazonDynamoDBClient().withEndpoint("http://localhost:8000");
DynamoDB dynamoDB = new DynamoDB(client);
String tableName = "Employee";
try {
System.out.println("Creating the table, wait...");
Table table = dynamoDB.createTable(tableName, Arrays.asList(new KeySchemaElement("ID", KeyType.HASH)
), Arrays.asList(new AttributeDefinition("ID", ScalarAttributeType.S)),
new ProvisionedThroughput(1000L, 1000L));
table.waitForActive();
System.out.println("Table created successfully. Status: " + table.getDescription().getTableStatus());
} catch (Exception e) {
System.err.println("Cannot create the table: ");
System.err.println(e.getMessage());
}
}
最佳答案
DynamoDB Local是为需要为DynamoDB进行脱机开发的开发人员提供的工具,并非针对规模或性能而设计。因此,它不适用于规模测试,并且如果您需要测试批量负载或其他高速工作负载,则最好使用真实表。在活动表上进行开发人员测试的实际成本通常非常小,因为在测试运行期间仅需要为高容量配置表即可。