Problem Description
I'm working on a video app and storing the files on AWS S3. Using the default URL, like https://***.amazonaws.com/***, works fine, but I decided to use CloudFront, which is faster for content delivery.
Using CloudFront, I keep getting a 403 (Forbidden) with this URL: https://***.cloudfront.net/***. Did I miss anything?
Everything works fine until I decide to load the contents from CloudFront which points to my bucket.
Any solution please?
When restricting access to S3 content using a bucket policy that inspects the incoming Referer: header, you need to do a little bit of custom configuration to "outsmart" CloudFront.
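For context, a Referer-restricting bucket policy looks roughly like the following sketch; the bucket name and referring domain are hypothetical placeholders, not values from the question. Building it in Python just makes it easy to inspect:

```python
import json

# A hypothetical S3 bucket policy that allows GetObject only when the
# request carries a Referer: header matching the expected site.
# The bucket name and domain below are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowGetFromMySite",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-video-bucket/*",
            "Condition": {
                "StringLike": {"aws:Referer": "https://www.example.com/*"}
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

A policy shaped like this is exactly the kind whose behavior changes behind CloudFront, because by default the Referer: header never reaches S3.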
It's important to understand that CloudFront is designed to be a well-behaved cache. By "well-behaved," I mean that CloudFront is designed to never return a response that differs from what the origin server would have returned. I'm sure you can see that is an important factor.
Let's say I have a web server (not S3) behind CloudFront, and my web site is designed so that it returns different content based on an inspection of the Referer: header... or any other HTTP request header, like User-Agent: for example. Depending on your browser, I might return different content. How would CloudFront know this, so that it would avoid serving a user the wrong version of a certain page?
The answer is, it wouldn't be able to tell -- it can't know this. So, CloudFront's solution is not to forward most request headers to my server at all. What my web server can't see, it can't react to, so the content I return cannot vary based on headers I don't receive, which prevents CloudFront from caching and returning the wrong response, based on those headers. Web caches have an obligation to avoid returning the wrong cached content for a given page.
"But wait," you object. "My site depends on the value from a certain header in order to determine how to respond." Right, that makes sense... so we have to tell CloudFront this:
Instead of caching my pages based on just the requested path, I need you to also forward the Referer: or User-Agent: or one of several other headers as sent by the browser, and cache the response for use on other requests that include not only the same path, but also the same values for the extra header(s) that you forward to me.
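The effect on the cache key can be sketched in a few lines of Python. This is a toy model of the idea, not CloudFront's actual implementation:

```python
# Toy model of a cache key: the path alone, versus the path plus any
# whitelisted headers. Values are illustrative only.
def cache_key(path, headers, whitelisted=()):
    extra = tuple((h, headers.get(h, "")) for h in sorted(whitelisted))
    return (path, extra)

# Without whitelisting, two requests for the same path share one cache entry:
a = cache_key("/video.mp4", {"Referer": "https://example.com/page1"})
b = cache_key("/video.mp4", {"Referer": "https://example.com/page2"})
assert a == b

# Whitelisting Referer: splits them into two cached variants:
a = cache_key("/video.mp4", {"Referer": "https://example.com/page1"}, ("Referer",))
b = cache_key("/video.mp4", {"Referer": "https://example.com/page2"}, ("Referer",))
assert a != b
```

This is also why forwarding extra headers lowers the hit ratio, as discussed below: each distinct header value becomes a separate cache entry.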
However, when the origin server is S3, CloudFront doesn't support forwarding most request headers, on the assumption that since static content is unlikely to vary, these headers would just cause it to cache multiple identical responses unnecessarily.
Your solution is not to tell CloudFront that you're using S3 as the origin. Instead, configure your distribution to use a "custom" origin, and give it the hostname of the bucket to use as the origin server hostname.
Then, you can configure CloudFront to forward the Referer: header to the origin, and your S3 bucket policy that denies/allows requests based on that header will work as expected.
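The relevant pieces of that distribution configuration can be sketched as data structures modeled on the CloudFront API. The hostname is a placeholder, and the field names should be verified against the current CloudFront documentation before use:

```python
# Sketch of the two relevant pieces of a CloudFront distribution config:
# a *custom* origin pointing at the bucket's hostname (placeholder below),
# and a cache behavior whitelisting only the Referer: header.
origin = {
    "Id": "s3-as-custom-origin",
    "DomainName": "example-video-bucket.s3.amazonaws.com",  # placeholder
    "CustomOriginConfig": {
        "HTTPPort": 80,
        "HTTPSPort": 443,
        "OriginProtocolPolicy": "https-only",
    },
}

default_cache_behavior = {
    "TargetOriginId": "s3-as-custom-origin",
    "ViewerProtocolPolicy": "redirect-to-https",
    "ForwardedValues": {
        "QueryString": False,
        "Cookies": {"Forward": "none"},
        # Forward only what the bucket policy needs -- nothing more.
        "Headers": {"Quantity": 1, "Items": ["Referer"]},
    },
}
```

The key point is the CustomOriginConfig block: it is what makes CloudFront treat the bucket as a generic web server, which unlocks header forwarding.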
Well, almost as expected. This will lower your cache hit ratio somewhat, since now the cached pages will be cached based on path + referring page. If an S3 object is referenced by more than one of your site's pages, CloudFront will cache a copy for each unique request. It sounds like a limitation, but really, it's only an artifact of proper cache behavior -- whatever gets forwarded to the back-end (almost all of it) must be used to determine whether that particular response is usable for servicing future requests.
See http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesForwardHeaders for configuring CloudFront to whitelist specific headers to send to your origin server.
Important: don't forward any headers you don't need, since every variant request reduces your hit rate further. Particularly when using S3 as the back-end for a custom origin, do not forward the Host: header, because that is probably not going to do what you expect. Select the Referer: header here, and test. S3 should begin to see the header and react accordingly.
Note that when you removed your bucket policy for testing, CloudFront would have continued to serve the cached error page unless you flushed your cache by sending an invalidation request, which causes CloudFront to purge all cached pages matching the path pattern you specify, over the course of about 15 minutes. The easiest thing to do when experimenting is to just create a new CloudFront distribution with the new configuration, since there is no charge for the distributions themselves.
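Submitting an invalidation can be sketched with boto3. The client call is commented out because it needs real credentials and a distribution ID (the one below is a placeholder); the shape of the invalidation batch is the point:

```python
import time

# The invalidation batch CloudFront expects: a list of path patterns plus
# a unique CallerReference. "/*" flushes everything; narrower patterns
# flush less (and each path counts toward the invalidation quota).
invalidation_batch = {
    "Paths": {"Quantity": 1, "Items": ["/*"]},
    "CallerReference": f"flush-{int(time.time())}",
}

# With real credentials, the submission looks roughly like this:
# import boto3
# cloudfront = boto3.client("cloudfront")
# cloudfront.create_invalidation(
#     DistributionId="E1234567890ABC",  # placeholder
#     InvalidationBatch=invalidation_batch,
# )
```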
When viewing the response headers from CloudFront, note the X-Cache: (hit/miss) and Age: (how long ago this particular page was cached) responses. These are also useful in troubleshooting.
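A small helper for reading those two headers off a response makes the troubleshooting loop explicit (the header names come from CloudFront's responses; the logic here is just string inspection):

```python
def describe_cache_status(headers):
    """Summarize CloudFront's X-Cache: and Age: response headers."""
    x_cache = headers.get("X-Cache", "")
    age = headers.get("Age")
    hit = "Hit" in x_cache
    return {
        "hit": hit,
        # Age: is seconds since the object was cached; only meaningful on hits.
        "cached_seconds_ago": int(age) if hit and age is not None else None,
    }

assert describe_cache_status({"X-Cache": "Hit from cloudfront", "Age": "120"}) == {
    "hit": True,
    "cached_seconds_ago": 120,
}
assert describe_cache_status({"X-Cache": "Miss from cloudfront"})["hit"] is False
```

A stale cached 403 will show up as a hit with a large Age:, which is exactly the symptom described above after removing the bucket policy.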
Update: @alexjs has made an important observation: instead of doing this using the bucket policy and forwarding the Referer: header to S3 for analysis -- which will hurt your cache ratio to an extent that varies with the spread of resources over referring pages -- you can use the new AWS Web Application Firewall service, which allows you to impose filtering rules against incoming requests to CloudFront, to allow or block requests based on string matching in request headers.
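The shape of such a rule can be sketched as data in the style of the current WAFv2 API; the domain is a placeholder, and the exact field names should be checked against the WAF documentation rather than taken from this sketch:

```python
# Sketch of a WAFv2-style rule statement that matches the Referer:
# header against your site's domain (placeholder below). Field names
# follow the WAFv2 API shape; verify against current docs before use.
referer_match_statement = {
    "ByteMatchStatement": {
        "SearchString": "www.example.com",  # placeholder domain
        "FieldToMatch": {"SingleHeader": {"Name": "referer"}},
        "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
        "PositionalConstraint": "CONTAINS",
    }
}
```

Because WAF filters at the edge before the cache, the Referer: header never needs to be forwarded to the origin, so the cache key (and hit ratio) is unaffected.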
For this, you'd need to connect the distribution to S3 as an S3 origin (the normal configuration, contrary to what I proposed in the solution above with a "custom" origin) and use the built-in capability of CloudFront to authenticate back-end requests to S3 (so the bucket contents aren't directly accessible if requested from S3 directly by a malicious actor).
See https://www.alexjs.eu/preventing-hotlinking-using-cloudfront-waf-and-referer-checking/ for more on this option.