Question
I am trying to find a way to further improve the performance of my console app (which already works fully).
I have a CSV file which contains a list of addresses (about 100k). I need to query a Web API whose POST response contains the geographical coordinates of each address. Then I write a GeoJSON file to the file system with the address data enriched with geographical coordinates (latitude and longitude).
My current solution splits the data into batches of 1000 records and sends async POST requests to the Web API using HttpClient (.NET Core 3.1 console app plus a class library targeting .NET Standard 2.0). GeoJSON is my DTO class.
public class GeoJSON
{
    public string Locality { get; set; }
    public string Street { get; set; }
    public string StreetNumber { get; set; }
    public string ZIP { get; set; }
    public string Latitude { get; set; }
    public string Longitude { get; set; }
}
public static async Task<List<GeoJSON>> GetAddressesInParallel(List<GeoJSON> geos)
{
    const int batchSize = 1000;
    var geoJSONs = new List<GeoJSON>(geos.Count);
    //calculating number of batches based on my batchsize (1000)
    int numberOfBatches = (int)Math.Ceiling((double)geos.Count / batchSize);
    for (int i = 0; i < numberOfBatches; i++)
    {
        var currentIds = geos.Skip(i * batchSize).Take(batchSize);
        var tasks = currentIds.Select(id => SendPOSTAsync(id));
        // the next batch starts only after every request in this one has finished
        geoJSONs.AddRange(await Task.WhenAll(tasks));
    }
    return geoJSONs;
}
My Async POST method looks like this:
public static async Task<GeoJSON> SendPOSTAsync(GeoJSON geo)
{
    // client (a shared HttpClient) and URL are fields not shown here
    string payload = JsonConvert.SerializeObject(geo);
    using HttpContent c = new StringContent(payload, Encoding.UTF8, "application/json");
    using HttpResponseMessage response = await client.PostAsync(URL, c).ConfigureAwait(false);
    if (response.IsSuccessStatusCode)
    {
        string body = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        var address = JsonConvert.DeserializeObject<GeoJSON>(body);
        geo.Latitude = address.Latitude;
        geo.Longitude = address.Longitude;
    }
    return geo;
}
The Web API runs on my local machine as a self-hosted x86 application. The whole application finishes in less than 30 s. The most time-consuming part is the async POST part (about 25 s). The Web API accepts only one address per POST; otherwise I would have sent multiple addresses in one request.
Any ideas on how to improve the performance of the requests against the Web API?
Answer
A potential problem with your batching approach is that a single delayed response can delay the completion of a whole batch. This may not be an actual problem, because the web service you are calling may have very consistent response times, but in any case you could try an alternative approach that controls the concurrency without batching. The example below uses the TPL Dataflow library, which is built into the .NET Core platform and available as a package (System.Threading.Tasks.Dataflow) for .NET Framework:
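// Requires the System.Threading.Tasks.Dataflow namespace
public static async Task<List<GeoJSON>> GetAddressesInParallel(List<GeoJSON> geos)
{
    // At most 1000 requests are in flight at any moment; a slow response
    // occupies a single slot instead of holding back a whole batch
    var block = new ActionBlock<GeoJSON>(async item =>
    {
        await SendPOSTAsync(item);
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 1000
    });
    foreach (var item in geos)
    {
        await block.SendAsync(item);
    }
    block.Complete();        // no more items will be posted
    await block.Completion;  // wait until every pending request has finished
    return geos;
}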
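To put rough numbers on the batching concern, here is an idealized model (assumptions: N requests, a concurrency limit of B, per-request latencies t_i, and a server that handles B concurrent requests without extra queuing). Because the batched loop awaits Task.WhenAll before starting the next batch, each batch costs as much as its slowest request. With a sliding window such as MaxDegreeOfParallelism, a new request starts as soon as any slot frees up, so the standard greedy list-scheduling bounds apply:

```latex
T_{\text{batch}} = \sum_{k=1}^{\lceil N/B \rceil} \max_{i \in \text{batch}_k} t_i
\qquad
\max\!\left(\frac{1}{B}\sum_{i=1}^{N} t_i,\; \max_i t_i\right)
\;\le\; T_{\text{window}} \;\le\;
\frac{1}{B}\sum_{i=1}^{N} t_i + \max_i t_i
```

With hypothetical figures N = 100,000 and B = 1000 (100 batches), where one request per batch takes 2 s and the other 99,900 take 10 ms: T_batch = 100 × 2 s = 200 s, while (1/B)·Σt_i = (999 s + 200 s) / 1000 ≈ 1.2 s, so 2 s ≤ T_window ≤ 3.2 s. When all latencies are equal (t_i = t), T_batch = 100t and 100t ≤ T_window ≤ 101t, which is why batching may cost little if the service's response times are very consistent.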
Your SendPOSTAsync method just returns the same GeoJSON object that it receives as an argument, so GetAddressesInParallel can also return the same List<GeoJSON> that it receives as an argument.
The ActionBlock is the simplest of the blocks available in the library. It just executes a sync or async action for every item, and lets you configure the MaxDegreeOfParallelism among other options. You could also try splitting your workflow into multiple blocks, and then linking them together to form a pipeline. For example:
1. A TransformBlock<GeoJSON, (GeoJSON, string)> that serializes the GeoJSON objects to JSON.
2. A TransformBlock<(GeoJSON, string), (GeoJSON, string)> that makes the HTTP requests.
3. An ActionBlock<(GeoJSON, string)> that deserializes the HTTP responses and updates the GeoJSON objects with the received values.
Such an arrangement would allow you to fine-tune the MaxDegreeOfParallelism of each block, and hopefully reach the best possible performance.
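A minimal sketch of that three-block pipeline follows. It is not tested against your Web API, and several names are assumptions: the GeocodingPipeline class, the postJsonAsync parameter (a hypothetical delegate that sends one JSON payload and returns the response body, or null on failure), the PostJsonAsync helper, and the maxRequests default of 100, which is only a starting point to measure from. It uses Newtonsoft.Json as in the question, and repeats the question's GeoJSON DTO so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Newtonsoft.Json;

// Same DTO as in the question.
public class GeoJSON
{
    public string Locality { get; set; }
    public string Street { get; set; }
    public string StreetNumber { get; set; }
    public string ZIP { get; set; }
    public string Latitude { get; set; }
    public string Longitude { get; set; }
}

public static class GeocodingPipeline
{
    // Serialize -> POST -> deserialize, each stage with its own MaxDegreeOfParallelism.
    // postJsonAsync (hypothetical) sends one JSON payload and returns the response body,
    // or null when the request was not successful.
    public static async Task<List<GeoJSON>> GetAddressesInParallel(
        List<GeoJSON> geos, Func<string, Task<string>> postJsonAsync, int maxRequests = 100)
    {
        // Stage 1: CPU-bound serialization; one worker per core is likely enough.
        var serialize = new TransformBlock<GeoJSON, (GeoJSON Geo, string Json)>(
            geo => (geo, JsonConvert.SerializeObject(geo)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

        // Stage 2: I/O-bound HTTP calls; this is the stage most worth tuning.
        var post = new TransformBlock<(GeoJSON Geo, string Json), (GeoJSON Geo, string Response)>(
            async x => (x.Geo, await postJsonAsync(x.Json).ConfigureAwait(false)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxRequests });

        // Stage 3: copy the coordinates from the response onto the original object.
        var update = new ActionBlock<(GeoJSON Geo, string Response)>(x =>
        {
            if (x.Response == null) return; // unsuccessful request: leave coordinates unset
            var address = JsonConvert.DeserializeObject<GeoJSON>(x.Response);
            if (address == null) return;
            x.Geo.Latitude = address.Latitude;
            x.Geo.Longitude = address.Longitude;
        });

        // Completion (and any fault) flows from one block to the next.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        serialize.LinkTo(post, linkOptions);
        post.LinkTo(update, linkOptions);

        foreach (var geo in geos)
            await serialize.SendAsync(geo).ConfigureAwait(false);
        serialize.Complete();
        await update.Completion.ConfigureAwait(false);
        return geos;
    }

    // One possible postJsonAsync implementation over a shared HttpClient (hypothetical helper).
    public static async Task<string> PostJsonAsync(HttpClient client, string url, string json)
    {
        using (var content = new StringContent(json, Encoding.UTF8, "application/json"))
        using (var response = await client.PostAsync(url, content).ConfigureAwait(false))
        {
            return response.IsSuccessStatusCode
                ? await response.Content.ReadAsStringAsync().ConfigureAwait(false)
                : null;
        }
    }
}
```

With the question's client and URL fields, a call could look like await GeocodingPipeline.GetAddressesInParallel(geos, json => GeocodingPipeline.PostJsonAsync(client, URL, json), maxRequests: 200). Passing the HTTP call in as a delegate keeps the pipeline easy to benchmark with different limits, or to exercise with a fake service.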