How to use Roslyn C# scripting in batch processing with several scripts?

Question

I am writing a multi-threaded solution for transferring data from different sources to a central database. In general, the solution has two parts:

  1. A single-threaded import engine.
  2. A multi-threaded client that invokes the import engine in threads.

To minimize custom development I am using Roslyn scripting. It is enabled through the NuGet Package Manager in the import engine project. Every import is defined as a transformation of an input table, which has a collection of input fields, into a destination table, which has a collection of destination fields.

The scripting engine is used here to allow custom transformations between input and output. For every input/output pair there is a text field holding a custom script. Here is the simplified code used for script initialization:

//Instance of class passed to script engine
_ScriptHost = new ScriptHost_Import();

if (!string.IsNullOrEmpty(Script)) //Here we have script fetched from DB as text
{
  try
  {
    //We are creating script object …
    ScriptObject = CSharpScript.Create<string>(Script, globalsType: typeof(ScriptHost_Import));
    //… and we are compiling it upfront to save time since this might be invoked multiple times.
    //Compile() returns diagnostics; any error means the script is not usable.
    //(Needs using System.Linq and using Microsoft.CodeAnalysis.)
    var diagnostics = ScriptObject.Compile();
    IsScriptCompiled = !diagnostics.Any(d => d.Severity == DiagnosticSeverity.Error);
  }
  catch
  {
    IsScriptCompiled = false;
  }
}

Later we will invoke this script with:

async Task<string> RunScript()
{
    //ReturnValue is already a string here; calling ToString() on it would throw if the script returned null.
    return (await ScriptObject.RunAsync(_ScriptHost)).ReturnValue;
}

So, after import definition initialization, where we might have any number of input/output pair descriptions along with their script objects, the memory footprint increases by approximately 50 MB for each pair that has a script defined. A similar usage pattern applies to validating destination rows before storing them in the DB (every field might have several scripts that check the validity of its data).

All in all, the typical memory footprint with modest transformation/validation scripting is 200 MB per thread. If we need to run several threads, memory usage becomes very high, and 99% of it is used for scripting. If the import engine is enclosed in a WCF-based middle layer (which I did), we quickly run into an "Insufficient memory" problem.

The obvious solution would be to have a single scripting instance that dispatches execution to a specific function inside the script depending on the need (input/output transformation, validation, or something else). That is, instead of script text for every field we would have a SCRIPT_ID that is passed to the script engine as a global parameter. Somewhere in the script we would switch to the specific portion of code that executes and returns the appropriate value.

The benefit of such a solution should be considerably better memory usage. The drawback is that script maintenance is moved away from the specific point where the script is used.
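
To make the idea concrete, here is a rough sketch of what a single dispatching script could look like. The names DispatchGlobals, ScriptId, Input and SingleScriptDispatch are made up for illustration, the two branches stand in for real per-field scripts, and the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package is assumed. The memory effect has not been measured here.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// Hypothetical globals type: ScriptId selects the branch, Input carries the field value.
public class DispatchGlobals
{
    public int ScriptId { get; set; }
    public string Input { get; set; }
}

public static class SingleScriptDispatch
{
    // One combined script text; in practice it would be assembled from the per-field scripts in the DB.
    private const string Code = @"
switch (ScriptId)
{
    case 1: return Input.Trim();
    case 2: return Input.ToUpperInvariant();
    default: throw new System.ArgumentException(""Unknown script id: "" + ScriptId);
}";

    // A single Script<string> object shared by all fields.
    private static readonly Script<string> Compiled =
        CSharpScript.Create<string>(Code, globalsType: typeof(DispatchGlobals));

    public static async Task<string> RunAsync(int scriptId, string input)
    {
        var globals = new DispatchGlobals { ScriptId = scriptId, Input = input };
        return (await Compiled.RunAsync(globals)).ReturnValue;
    }
}
```

A field definition would then store only its id, and the engine would call something like `await SingleScriptDispatch.RunAsync(1, value)`. The cost is the one named above: every script body now lives in one shared text instead of next to the field it belongs to.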

Before implementing this change, I would like to hear opinions about this solution and suggestions for a different approach.

Recommended answer

It seems that using scripting for this task might be wasteful overkill: you go through many application layers and the memory fills up.

Other solutions:

  • How do you interface with the DB? You can manipulate the query itself according to your needs instead of writing a whole script for that.
  • How about using generics, with enough T's to fit your needs:

public class ImportEngine<T1, T2, T3> // as many type parameters as you need

  • Using tuples (which is pretty much like using generics)
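
The generics and tuple suggestions could be combined roughly as follows. This is only a sketch: the two type parameters TIn and TOut, the delegate-based constructor and the Import method are assumptions made for illustration, not part of any existing engine.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic engine: TIn is the input row type, TOut the destination row type.
public class ImportEngine<TIn, TOut>
{
    private readonly Func<TIn, TOut> _transform;
    private readonly Func<TOut, bool> _isValid;

    public ImportEngine(Func<TIn, TOut> transform, Func<TOut, bool> isValid)
    {
        _transform = transform ?? throw new ArgumentNullException(nameof(transform));
        _isValid = isValid ?? throw new ArgumentNullException(nameof(isValid));
    }

    // Transforms every input row and yields only the rows that pass validation.
    public IEnumerable<TOut> Import(IEnumerable<TIn> rows)
    {
        foreach (var row in rows)
        {
            var result = _transform(row);
            if (_isValid(result))
                yield return result;
        }
    }
}

public static class ImportDemo
{
    // Tuples describe the row shapes without declaring a class for each import.
    public static List<(string Name, int Age)> Run()
    {
        var engine = new ImportEngine<(string Name, string Age), (string Name, int Age)>(
            row => (row.Name.Trim(), int.Parse(row.Age)),
            row => row.Age >= 0);

        return new List<(string Name, int Age)>(
            engine.Import(new[] { (" Ann ", "30"), ("Bob", "-1") }));
    }
}
```

The transformations here are compiled with the application, so there is no per-field script object at all. What is lost is the ability to edit them as text in the DB.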

But if you still think scripts are the right tool for you, I found that the memory usage of scripts can be lowered by running the script's work inside your application rather than inside RunAsync. You do this by getting the logic back from RunAsync and reusing it, instead of doing the work inside the heavy and memory-wasteful RunAsync. Here is an example:

Instead of simply (the script string):

DoSomeWork();

You can do this (IHaveWork is an interface defined in your app, with only one method, Work):

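public class ScriptWork : IHaveWork
{
    // The method needs a return type and public accessibility to implement the interface.
    public void Work()
    {
        DoSomeWork();
    }
}
// The script's return value is the reusable worker object.
return new ScriptWork();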

This way you call the heavy RunAsync only for a short period, and it returns a worker that you can reuse inside your application (you can of course extend this by adding parameters to the Work method, inheriting logic from your application, and so on).
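
On the host side, loading such a worker might look like the sketch below. WorkerLoader is a hypothetical helper name, the script body prints a line in place of DoSomeWork, and the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package is assumed.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// Host-side interface; it has to be public so the script can implement it.
public interface IHaveWork
{
    void Work();
}

public static class WorkerLoader
{
    public static async Task<IHaveWork> LoadAsync()
    {
        // Script text as it might be stored in the DB (a console write stands in for DoSomeWork).
        const string code = @"
public class ScriptWork : IHaveWork
{
    public void Work() { System.Console.WriteLine(""working""); }
}
return new ScriptWork();";

        // The script needs a reference to the assembly that defines IHaveWork.
        var options = ScriptOptions.Default.WithReferences(typeof(IHaveWork).Assembly);

        // Evaluate once; the returned object is what the application keeps and reuses.
        return await CSharpScript.EvaluateAsync<IHaveWork>(code, options);
    }
}
```

The application would call `LoadAsync` once per script and then invoke `worker.Work()` as often as needed, without going through the scripting API again.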

The pattern also breaks the isolation between your app and the script, so you can easily pass data to the script and get data back from it.

Some quick benchmarks:

This code:

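    // Needs using System, System.Linq, System.Collections.Generic, System.Threading.Tasks,
    // System.Windows.Forms, Microsoft.CodeAnalysis.Scripting and Microsoft.CodeAnalysis.CSharp.Scripting.
    static void Main(string[] args)
    {
        Console.WriteLine("Compiling");
        string code = "System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\");";
        // 50 Script objects that are kept alive and re-run below.
        List<Script<object>> scripts = Enumerable.Range(0, 50).Select(num =>
             CSharpScript.Create(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).ToList();

        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); // for fair-play

        // Run all scripts in parallel, ten rounds.
        for (int i = 0; i < 10; i++)
            Task.WaitAll(scripts.Select(script => script.RunAsync()).ToArray());
    }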

This consumes about 600 MB in my environment (I referenced System.Windows.Forms in the ScriptOptions only to give the scripts some size). It reuses the Script<object>: a second call to RunAsync does not consume more memory.

But we can do better:

    static void Main(string[] args)
    {
        Console.WriteLine("Compiling");
        string code = "return () => { System.Threading.Thread.SpinWait(100000000);  System.Console.WriteLine(\" Script end\"); };";

        List<Action> scripts = Enumerable.Range(0, 50).Select(async num =>
            await CSharpScript.EvaluateAsync<Action>(code, ScriptOptions.Default.WithReferences(typeof(Control).Assembly))).Select(t => t.Result).ToList();

        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

        for (int i = 0; i < 10; i++)
            Task.WaitAll(scripts.Select(script => Task.Run(script)).ToArray());
    }

In this script I simplify the solution I proposed a bit by returning an Action object, but I think the performance impact is small (in a real implementation I really think you should use your own interface to keep it flexible).

While the script is running you can see a steep rise in memory to ~240 MB, but after I call the garbage collector (for demonstration purposes; I did the same in the previous code) the memory usage drops back to ~30 MB. It is also faster.
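
A related option that is not part of the benchmarks above, but may be worth trying if the scripts must keep receiving a globals object: Script<T>.CreateDelegate() returns a ScriptRunner<T> that can be invoked repeatedly, and as far as I know the delegate does not hold on to the script or its compilation. The sketch below assumes a hypothetical FieldGlobals type and the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package; its memory footprint has not been measured here.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

// Hypothetical globals type carrying the value of one input field.
public class FieldGlobals
{
    public string Input { get; set; }
}

public static class DelegateDemo
{
    public static async Task<string[]> RunAsync(string[] inputs)
    {
        Script<string> script = CSharpScript.Create<string>(
            "Input.ToUpperInvariant()", globalsType: typeof(FieldGlobals));

        // Keep only the delegate; the Script object can go out of scope afterwards.
        ScriptRunner<string> runner = script.CreateDelegate();

        var results = new string[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
            results[i] = await runner(new FieldGlobals { Input = inputs[i] });
        return results;
    }
}
```

The engine would keep one runner per field instead of one Script object, and call it with a fresh globals instance for each row.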
