Question
I stumbled over node.js some time ago and like it a lot. But I soon found out that it badly lacks the ability to perform CPU-intensive tasks. So I started googling and found these answers to the problem: Fibers, Webworkers and Threads (thread-a-gogo). Now it is confusing which one to use, and one of them definitely needs to be used. After all, what is the purpose of a server that is good at IO and nothing else? Suggestions needed!
Update:
I have been thinking of an approach lately and just need suggestions on it. What I thought of was this: let's have some threads (using thread_a_gogo or maybe webworkers). When we need more of them, we can create more. But there will be some limit on how many we create (not imposed by the system, but probably because of overhead). When we exceed that limit, we can fork a new node process and start creating threads in it. This can go on until we reach some limit on processes (after all, processes have a big overhead too). When this limit is reached, we start queuing tasks. Whenever a thread becomes free, it is assigned a new task. This way, everything can go on smoothly.
So, that was what I thought of. Is this idea good? I am a bit new to all this process and thread stuff, so I don't have any expertise in it. Please share your opinions.
Thanks. :)
Recommended answer
Node has a completely different paradigm, and once it is correctly grasped, it is easier to see this different way of solving problems. You never need multiple threads in a Node application (1) because you have a different way of doing the same thing. You create multiple processes; but it is very different from, for example, how Apache Web Server's prefork MPM does it.
For now, let's assume that we have just one CPU core and that we will develop an application (in Node's way) to do some work. Our job is to process a big file, running over its contents byte by byte. The best way for our software to do this is to start at the beginning of the file and follow it byte by byte to the end.
-- Hey, Hasan, I suppose you are either a newbie or very old school, from my grandfather's time! Why don't you create some threads and make it much faster?
-- Oh, we have only one CPU core.
-- So what? Create some threads, man, make it faster!
-- It does not work like that. If I create threads, I will make it slower, because I will add a lot of overhead to the system for switching between threads and trying to give each of them a fair amount of time, and, inside my process, for communicating between these threads. On top of all that, I will also have to think about how to divide a single job into multiple pieces that can be done in parallel.
-- Okay okay, I see you are poor. Let's use my computer, it has 32 cores!
-- Wow, you are awesome, my dear friend, thank you very much. I appreciate it!
Then we turn back to work. Now we have 32 CPU cores thanks to our rich friend. The rules we have to abide by have just changed. Now we want to utilize all this wealth we have been given.
To use multiple cores, we need to find a way to divide our work into pieces that we can handle in parallel. If it were not Node, we would use threads for this: 32 threads, one for each CPU core. However, since we have Node, we will create 32 Node processes.
Threads can be a good alternative to Node processes, maybe even a better way; but only for a specific kind of job where the work is already defined and we have complete control over how to handle it. For every other kind of problem, where the job comes from outside in a way we do not control and we want to answer as quickly as possible, Node's way is unarguably superior.
-- Hey, Hasan, are you still working single-threaded? What is wrong with you, man? I have just provided you with what you wanted. You have no excuses anymore. Create threads, make it run faster.
-- I have divided the work into pieces, and every process will work on one of these pieces in parallel.
-- Why don't you create threads?
-- Sorry, I don't think that is usable here. You can take your computer back if you want?
-- No, okay, I am cool, I just don't understand why you don't use threads.
-- Thank you for the computer. :) I have already divided the work into pieces, and I create processes to work on these pieces in parallel. All the CPU cores will be fully utilized. I could do this with threads instead of processes; but Node has this way, and my boss Parth Thakkar wants me to use Node.
-- Okay, let me know if you need another computer. :p
If I create 33 processes instead of 32, the operating system's scheduler will pause one of them, start another, pause that one after some cycles, start the other one again... This is unnecessary overhead. I do not want it. In fact, on a system with 32 cores, I wouldn't even want to create exactly 32 processes; 31 can be nicer, because it is not just my application that will run on this system. Leaving a little room for other things can be good, especially if we have 32 rooms.
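The "one process per core, minus one" rule and the division of the file into pieces can be sketched roughly as follows. This is only an illustration under assumptions: the file size is known up front, the work can be cut into independent byte ranges, and `workerCount`, `splitIntoRanges`, `startWorkers` and the worker script are hypothetical names, not part of Node. Only `os.cpus()` and `child_process.fork` are Node APIs.

```javascript
const os = require('node:os');
const { fork } = require('node:child_process');

// Leave one core free for the rest of the system, but always use at least one.
function workerCount(cores = os.cpus().length) {
  return Math.max(1, cores - 1);
}

// Split the byte range [0, totalBytes) into `parts` contiguous, near-equal ranges.
function splitIntoRanges(totalBytes, parts) {
  const ranges = [];
  const base = Math.floor(totalBytes / parts);
  const extra = totalBytes % parts; // the first `extra` ranges get one more byte
  let start = 0;
  for (let i = 0; i < parts; i++) {
    const size = base + (i < extra ? 1 : 0);
    ranges.push({ start, end: start + size }); // `end` is exclusive
    start += size;
  }
  return ranges;
}

// Hypothetical launcher: `workerScript` is assumed to be a Node script that
// reads its byte range from process.argv and processes that part of the file.
function startWorkers(workerScript, totalBytes) {
  return splitIntoRanges(totalBytes, workerCount()).map(({ start, end }) =>
    fork(workerScript, [String(start), String(end)])
  );
}
```

The processes are created once, at startup, so the cost of forking them is paid a single time no matter how large the file is.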
I believe we are on the same page now about fully utilizing processors for CPU-intensive tasks.
-- Hmm, Hasan, I am sorry for mocking you a little. I believe I understand you better now. But there is still something I need an explanation for: what is all the buzz about running hundreds of threads? I read everywhere that threads are much faster to create and tear down than forked processes. You fork processes instead of creating threads, and you think that is the most you can get with Node. Then is Node not appropriate for this kind of work?
-- No worries, I am cool, too. Everybody says these things, so I think I am used to hearing them.
-- So? Node is not good for this?
-- Node is perfectly good for this, even though threads can be good too. As for thread/process creation overhead: on things that you repeat a lot, every millisecond counts. However, I create only 32 processes, and it will take a tiny amount of time. It will happen only once. It will not make any difference.
-- When do I want to create thousands of threads, then?
-- You never want to create thousands of threads. However, on a system that is doing work that comes from outside, like a web server processing HTTP requests, if you use a thread for each request, you will end up creating a great many threads.
-- Node is different, though? Right?
-- Yes, exactly. This is where Node really shines. Just as a thread is much lighter than a process, a function call is much lighter than a thread. Node calls functions instead of creating threads. In the example of a web server, every incoming request causes a function call.
-- Hmm, interesting; but you can only run one function at a time if you are not using multiple threads. How can this work when a lot of requests arrive at the web server at the same time?
-- You are perfectly right about how functions run: one at a time, never two in parallel. I mean that in a single process, only one scope of code is running at a time. The OS scheduler does not come and pause this function to switch to another one, unless it pauses the process to give time to another process, not to another thread in our process. (2)
-- Then how can a process handle 2 requests at a time?
-- A process can handle tens of thousands of requests at a time as long as our system has enough resources (RAM, network, etc.). How those functions run is THE KEY DIFFERENCE.
-- Hmm, should I be excited now?
-- Maybe :) Node runs a loop over a queue. In this queue are our jobs, i.e., the calls we started in order to process incoming requests. The most important point here is the way we design our functions to run. Instead of starting to process a request and making the caller wait until we finish the job, we quickly end our function after doing an acceptable amount of work. When we come to a point where we need to wait for another component to do some work and return us a value, instead of waiting for that, we simply finish our function and add the rest of the work to the queue.
-- It sounds too complex?
-- No no, I might sound complex; but the system itself is very simple and it makes perfect sense.
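The loop-over-a-queue idea from the dialogue can be illustrated with a toy model. This is not how Node's event loop is actually implemented; `simulateLoop` and its helpers are hypothetical names, and the "slow component" is imagined rather than real. It only shows the shape of the idea: each function does a little work, queues the rest, and returns.

```javascript
// Toy model of a run-to-completion job queue (not Node's real event loop).
function simulateLoop(requestIds) {
  const queue = [];
  const log = [];

  function handleRequest(id) {
    log.push(`request ${id}: started`);
    // Instead of waiting for a slow component, queue the rest of the work
    // and return immediately so the next job can run.
    queue.push(() => log.push(`request ${id}: finished`));
  }

  // Every incoming request becomes one queued function call.
  for (const id of requestIds) {
    queue.push(() => handleRequest(id));
  }

  // The loop: take the next job, run it to completion, repeat.
  while (queue.length > 0) {
    const job = queue.shift();
    job();
  }
  return log;
}

console.log(simulateLoop([1, 2]).join('\n'));
```

With two requests, both are started before either is finished, even though only one function ever runs at a time.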
Now I want to stop citing the dialogue between these two developers and finish my answer after one last quick example of how these functions work.
In this way, we are doing what the OS scheduler would normally do. We pause our work at some point and let other function calls (like other threads in a multi-threaded environment) run until we get our turn again. This is much better than leaving the work to the OS scheduler, which tries to give a fair amount of time to every thread on the system. We know what we are doing much better than the OS scheduler does, and we are expected to stop when we should stop.
Below is a simple example where we open a file and read it to do some work on the data.
Synchronous way:
    Open File
    Repeat This:
        Read Some
        Do the work
Asynchronous way:
    Open File and Do this when it is ready: // Our function returns
        Repeat this:
            Read Some and when it is ready: // Returns again
                Do some work
As you can see, our function asks the system to open a file and does not wait for it to be opened. It finishes by providing the next steps to run once the file is ready. When we return, Node runs other function calls on the queue. After running over all the functions, the event loop moves on to the next turn...
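The asynchronous outline above might look something like this with Node's callback-style `fs` API. It is a sketch, not the only way to do it: `processFile`, `onChunk` and `done` are hypothetical names, the 64 KiB buffer size is an arbitrary choice, and the same buffer is reused for every read, so `onChunk` is assumed to finish with each chunk before it returns.

```javascript
const fs = require('node:fs');

// Read `path` chunk by chunk without blocking. `onChunk(chunk)` does the work
// on each piece; `done(err)` is called once, at the end or on the first error.
function processFile(path, onChunk, done) {
  fs.open(path, 'r', (err, fd) => {
    // By the time this callback runs, processFile itself returned long ago.
    if (err) return done(err);
    const buffer = Buffer.alloc(64 * 1024);

    function readSome() {
      // A position of null means "continue from the current file position".
      fs.read(fd, buffer, 0, buffer.length, null, (err, bytesRead) => {
        if (err) return fs.close(fd, () => done(err));
        if (bytesRead === 0) return fs.close(fd, done); // end of file
        onChunk(buffer.subarray(0, bytesRead)); // do some work
        readSome(); // ask for the next piece and return again
      });
    }

    readSome();
  });
}
```

Between any two callbacks, Node is free to run whatever else is waiting on the queue.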
In summary, Node has a completely different paradigm from multi-threaded development; but this does not mean that it lacks things. For a synchronous job (where we can decide the order and way of processing), it works as well as multi-threaded parallelism. For a job that comes from outside, like requests to a server, it is simply superior.
(1) Unless you are building libraries in other languages like C/C++, in which case you still do not create threads for dividing jobs. For this kind of work you have two threads, one of which keeps communicating with Node while the other does the real work.
(2) In fact, every Node process has multiple threads, for the same reasons I mentioned in the first footnote. However, this is in no way like 1000 threads doing similar work. Those extra threads are for things like accepting IO events and handling inter-process messaging.
@Mark, thank you for the constructive criticism. In Node's paradigm, you should never have functions that take too long to process, unless all other calls in the queue are designed to be run one after another. In the case of computationally expensive tasks, if we look at the complete picture, we see that this is not a question of "Should we use threads or processes?" but a question of "How can we divide these tasks in a well-balanced manner into sub-tasks that we can run in parallel on the multiple CPU cores of the system?" Let's say we will process 400 video files on a system with 8 cores. If we want to process one file at a time, then we need a system that will process different parts of the same file, in which case, maybe, a multi-threaded single-process system will be easier to build and even more efficient. We can still use Node for this by running multiple processes and passing messages between them when state sharing or communication is necessary. As I said before, a multi-process approach with Node is as good as a multi-threaded approach for this kind of task; but not better than that. Again, as I said before, the situation where Node shines is when these tasks come into the system as input from multiple sources, since keeping many connections open concurrently is much lighter in Node than in a thread-per-connection or process-per-connection system.
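For the other way of dividing the 400-files-on-8-cores job, one whole file per worker, a small task pool is enough. The sketch below is only one possible shape: `runPool` and `runTask` are hypothetical names, and `runTask` is assumed to hand one file to a worker process (for example one started with `child_process.fork`) and return a promise that settles when that worker is finished. The pool itself does no CPU work; it only keeps at most `limit` tasks in flight and gives the next queued task to whichever slot becomes free.

```javascript
// Run `runTask` over all `tasks`, with at most `limit` tasks in flight at once.
// Results come back in the same order as `tasks`.
async function runPool(tasks, limit, runTask) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task nobody has picked up yet

  // Each "slot" keeps pulling the next queued task until none are left.
  async function slot() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await runTask(tasks[i]);
    }
  }

  const slots = Array.from({ length: Math.min(limit, tasks.length) }, () => slot());
  await Promise.all(slots);
  return results;
}
```

For the example above this would be called as `runPool(videoFiles, 8, processInWorker)`, where `videoFiles` and `processInWorker` are again placeholders for the real file list and the real hand-off to a worker process.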
As for setTimeout(...,0) calls: sometimes it can be necessary to take a break during a time-consuming task so that the calls in the queue get their share of processing. Dividing tasks in different ways can save you from these; but still, this is not really a hack, it is just the way event queues work. Also, using process.nextTick for this purpose is much better, since when you use setTimeout, the elapsed time has to be calculated and checked, while process.nextTick is simply what we really want: "Hey task, go back to the end of the queue, you have used your share!"
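Here is a rough sketch of a long computation that gives up its turn between slices. One caveat about the choice of call: in Node versions since 0.10, callbacks passed to process.nextTick run before the event loop moves on to pending I/O, so a long chain of them can starve other callbacks; setImmediate is the call that puts the continuation behind pending I/O callbacks, which is why the sketch uses it. `sumInSlices`, `sliceSize` and `done` are hypothetical names chosen for the illustration.

```javascript
// Sum a large array in slices, yielding to the event loop between slices.
function sumInSlices(numbers, sliceSize, done) {
  let index = 0;
  let total = 0;

  function runSlice() {
    const end = Math.min(index + sliceSize, numbers.length);
    for (; index < end; index++) {
      total += numbers[index];
    }
    if (index < numbers.length) {
      // "Go back to the end of the queue, you have used your share."
      setImmediate(runSlice);
    } else {
      done(total);
    }
  }

  runSlice();
}
```

The slice size controls the trade-off: larger slices finish sooner overall, smaller slices let other queued calls run more often.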