Problem description
We have a Rails app that has been working fine for months. Today we discovered some inconsistencies with leader election. Primarily:
su - "leader_only bundle exec rake db:migrate" webapp
After many hours of trial and error (and dozens of deployments), none of the instances in our dev application run this migration. /usr/bin/leader_only looks for an environment variable that is never set on any instance (the dev app has only one instance).
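To make the gating behaviour concrete, here is a minimal sketch of what a wrapper like /usr/bin/leader_only might do. It is not the script shipped on the platform: the function name leader_only_sketch is hypothetical, and the variable name EB_IS_COMMAND_LEADER is an assumption, since the question does not name the variable.

```shell
#!/bin/bash
# Hypothetical stand-in for /usr/bin/leader_only; not the platform's actual script.
# Assumption: the leader flag is an environment variable named EB_IS_COMMAND_LEADER.
leader_only_sketch() {
  if [ "${EB_IS_COMMAND_LEADER:-}" = "true" ]; then
    # Leader: run the wrapped command with its arguments.
    "$@"
  else
    # Not the leader, or the variable is unset: skip silently and report success.
    return 0
  fi
}

# Variable unset: the wrapped command is skipped and nothing is printed.
unset EB_IS_COMMAND_LEADER
leader_only_sketch echo "running db:migrate"

# Variable set: the wrapped command runs and prints its message.
EB_IS_COMMAND_LEADER=true
leader_only_sketch echo "running db:migrate"
```

If the variable is never set on any instance, a wrapper of this shape skips the command everywhere without reporting an error, which would match the symptom described above.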
Setting the application deployment to 1 instance at a time and providing the value that /usr/bin/leader_only expects as an env var works, but not the way it has worked until now and should. (Now all instances are leaders, so they will all fruitlessly run db:migrate, one at a time; with many instances this will slow us down.)
We thought it might be due to some issue with the code and/or the app, so we rebuilt it. No change.
I even cloned our test application's RDS server, created a new application from a saved configuration, and deployed a new git hash; it never ran db:migrate either. It attempts to, and shows the leader_only line, but it never runs. That rules out code, configuration, and artifacts.
Also, for what it's worth, it never says it is skipping migrations due to RAILS_SKIP_MIGRATIONS, which has a value of false. This means it is in fact trying to run db:migrate but doesn't, because the instance is not designated as the leader.
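The reasoning above can be sketched as a two-gate decision. This is an illustrative reconstruction, not the real deploy hook: the function name migration_hook_sketch is hypothetical, and EB_IS_COMMAND_LEADER is an assumed name for the leader flag.

```shell
#!/bin/bash
# Hypothetical reconstruction of the deploy hook's decision flow; names are illustrative.
migration_hook_sketch() {
  # First gate: an explicit opt-out, which announces itself in the log.
  if [ "${RAILS_SKIP_MIGRATIONS:-false}" = "true" ]; then
    echo "skipping migrations due to RAILS_SKIP_MIGRATIONS"
    return 0
  fi
  # Second gate: only a leader runs the migration (flag name is an assumption).
  if [ "${EB_IS_COMMAND_LEADER:-}" = "true" ]; then
    echo "would run: bundle exec rake db:migrate"
  fi
  # A non-leader falls through silently: no skip message and no migration.
  return 0
}

# The situation in the question: skip flag is false and no leader flag is set.
RAILS_SKIP_MIGRATIONS=false
unset EB_IS_COMMAND_LEADER
migration_hook_sketch
```

Under these assumptions the call prints nothing at all: the skip message is absent because the first gate is passed, and the migration is absent because the second gate is not.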
Recommended answer
We have been in talks with the AWS support team. It seems that EB leader election is very fragile. Per the tech:
What happened is that we lost all instances. The leader is elected once, and is passed through instance rotation. If you do not lose all instances, everything is fine.
I had not mentioned one detail. We have many non-production environments, and through the Elastic Beanstalk autoscaling settings we use timed scaling to set the instance count to 0 at night and back up to the expected 1-2 during the day. We do this for our dev, test, and UAT environments so that they don't run at full capacity 24/7. Because of this, we lost the leader and never got it back.
Per the tech's follow-up: