Problem description
My client asked me to build a real-time application that can handle chat, images, and videos, all in real time. He asked me to come up with my own technology stack, so I did a lot of research and concluded that the easiest one to build on would be the following:
1) Node.js and cluster - language/runtime, with cluster used to max out the CPU cores on one server instance
2) Socket.io - real-time framework
3) Redis - pub/sub across multiple server instances
4) Nginx - reverse proxy and load balancing across multiple servers
5) Amazon EC2 - to run the servers
6) Amazon S3 and CloudFront - to store the images/videos and deliver them
Correct me if I'm wrong about the above stack. My real question is: can this tech stack scale to 1,000,000 messages per second (text, images, videos)?
Could anyone who has experience with node.js and socket.io give me some insight into this stack, or suggest alternatives?
Regards,
SinusGob
Recommended answer
Sure it can, with the right design and enough hardware. The question your client should really be asking is not whether it can be made to go that big, but at what cost and how practically it can be done, and whether these are the best choices.
Let's look at each piece you've mentioned:
node.js - For an I/O-centric app, it's an excellent choice for high scale, and it can scale by deploying many CPUs in a cluster (both multiple processes per server and multiple servers). How practical this type of scale is depends a lot on what kind of shared data all these server processes need access to. Usually, the data store ends up being the harder bottleneck in scaling, because it's easy to throw more servers at request processing but not so easy to throw more hardware at a centralized data store. There are ways to do that, but how you do it and how hard it is depend a lot on the demands of the app.
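To make the "multi-process per server" part concrete, here is a minimal sketch using Node's built-in cluster module. The helper names `runClustered` and `startHttpWorker` and the port 3000 are my own placeholders, not anything from your project, and a real deployment would need more (graceful shutdown, logging, health checks).

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Hypothetical helper: in the primary process, fork one worker per CPU core;
// in a worker process, run the supplied startWorker function.
function runClustered(startWorker, workerCount = os.cpus().length) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount; i++) {
      cluster.fork();
    }
    // Replace any worker that dies so capacity stays roughly constant.
    cluster.on('exit', (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} exited (${signal || code}); forking a replacement`);
      cluster.fork();
    });
  } else {
    startWorker();
  }
}

// Hypothetical worker body: a trivial HTTP server.
// Workers created by cluster share the same listening port.
function startHttpWorker(port = 3000) {
  return http
    .createServer((req, res) => {
      res.end(`handled by pid ${process.pid}\n`);
    })
    .listen(port);
}
```

You would call `runClustered(() => startHttpWorker(3000))` at the bottom of your entry file. Note that this only covers plain HTTP; running socket.io across several workers typically also needs sticky sessions and a shared adapter between the processes, which this sketch leaves out.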
socket.io - If you need efficient server push of smallish messages, then socket.io is probably the way to go, because it keeps a connection open to each client and can push with little overhead. It is not great at transporting every type of data, though. For example, I wouldn't move large images or video around through socket.io, as there are more purpose-built ways to do that. So, the use of socket.io depends a lot on what exactly the app wants to use it for. If you wanted to push a video to a client, you could instead push just a URL and have the client turn around and request the video via a regular http URL, using well-known high-scale technology.
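As a rough illustration of the "push a URL, not the video" idea, the sketch below builds a small message that only references the media. Everything specific in it is an assumption of mine: the `cdn.example.com` base URL, the message shape, the `'chat message'` event name, and the helper names. A plain EventEmitter stands in for the real-time connection so the sketch runs without any dependencies; with socket.io you would pass a socket instead.

```javascript
const { EventEmitter } = require('node:events');

// Assumption: the media file is already uploaded to object storage and
// served over HTTP from this hypothetical CDN base URL.
const MEDIA_BASE_URL = 'https://cdn.example.com/media/';

// Build a small, JSON-serializable chat message that references the media
// by URL instead of embedding the bytes.
function buildMediaMessage(senderId, mediaKey, mediaType) {
  return {
    type: 'media',
    senderId,
    mediaType, // e.g. 'image' or 'video'
    url: MEDIA_BASE_URL + encodeURIComponent(mediaKey),
    sentAt: Date.now(),
  };
}

// Push the reference over any object that has an emit(event, payload) method.
function pushMedia(connection, senderId, mediaKey, mediaType) {
  const message = buildMediaMessage(senderId, mediaKey, mediaType);
  connection.emit('chat message', message);
  return message;
}

// Stand-in for a connected client.
const fakeConnection = new EventEmitter();
fakeConnection.on('chat message', (msg) => {
  console.log(`client would now fetch ${msg.url} over plain HTTP`);
});

const sent = pushMedia(fakeConnection, 'user-1', 'holiday-01.mp4', 'video');
```

The pushed message stays at a hundred or so bytes no matter how large the video is, and the heavy transfer goes through ordinary HTTP infrastructure.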
Redis - Again, great for some things, not great at everything. So, it really depends upon what you're trying to do. As I explained above, the design of your data store and the number of transactions through it is probably where your real scale problems lie. If I were starting this job, I'd start with an understanding of the data storage needs for a server, transactions per second of various types, caching strategy, redundancy, fail-over, data persistence, etc., and design the high-scale access to data first. I wouldn't be entirely sure redis was the preferred choice. I'd probably suggest you bring in a high-scale database expert as a consultant early in the project.
Nginx - Lots of high-scale sites use nginx, so it's certainly a good tool. Whether it's exactly the right tool for you depends upon your design. I'd probably work on this part last, because it seems less central to the design; once the rest of the system is laid out, you can consider what you need here.
Amazon EC2 - One of several possible choices. These choices are hard to compare directly in an apples-to-apples comparison. Large-scale systems have been built on EC2, so there is proof of concept there, and the general architecture seems an appropriate match. If you wanted to know where the real gremlins are, you'd need a consultant who had done high-scale work on EC2.
Amazon S3 - I personally know of some very high storage and bandwidth sites using S3 for both video and images. It works for that.
So ... these are generally likely to be good tools if they are used in the right way. Redis would be a question mark, depending upon the storage needs of the actual application (you've provided zero requirements, and a database can't be selected with zero requirements). A more reasoned answer would be based on putting together a high-level set of requirements that analyze what the system needs to be able to do to serve 1,000,000 of whatever you are counting. Those requirements could be compared with the known capabilities of some of these pieces to get a ballpark estimate for scaling the system. Then, you'd have to put together some benchmarks to test certain pieces of the system. Success or failure would depend as much on how the app was built and how the tools were used as on which tools were selected. You can likely scale successfully with many different types of tools. Heck, Facebook runs on PHP (well, a highly modified, customized PHP that is not really typical PHP at all at runtime).
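A ballpark of that kind can start as simple arithmetic. The sketch below is only that; the function name `estimateCapacity` is made up, and every input number is a placeholder to be replaced with results from your own benchmarks.

```javascript
// Back-of-envelope sizing. All inputs are placeholders until they are
// replaced with measured benchmark results.
function estimateCapacity({
  targetMessagesPerSec,
  measuredPerProcessPerSec,
  processesPerServer,
  targetUtilization,
}) {
  // Leave spare capacity: plan to use only a fraction (0..1) of what one
  // process handled in the benchmark.
  const usablePerProcess = measuredPerProcessPerSec * targetUtilization;
  const processes = Math.ceil(targetMessagesPerSec / usablePerProcess);
  const servers = Math.ceil(processes / processesPerServer);
  return { processes, servers };
}

const estimate = estimateCapacity({
  targetMessagesPerSec: 1_000_000,
  measuredPerProcessPerSec: 20_000, // hypothetical benchmark result
  processesPerServer: 8,            // hypothetical: one process per core
  targetUtilization: 0.5,           // hypothetical: plan for 50% load
});

console.log(estimate); // { processes: 100, servers: 13 }
```

This only sizes the request-handling tier. The data store and the bandwidth for images and video need their own estimates, and those are usually the harder ones.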