This post covers how to remove the .html extension in NGINX.

Question

So, I found an answer for removing the .html extension on my pages, and it works fine with this code:

server {
    listen 80;
    server_name _;
    root /var/www/html/;
    index index.html;

    if (!-f "${request_filename}index.html") {
        rewrite ^/(.*)/$ /$1 permanent;
    }

    if ($request_uri ~* "/index.html") {
        rewrite (?i)^(.*)index.html$ $1 permanent;
    }

    if ($request_uri ~* ".html") {
        rewrite (?i)^(.*)/(.*).html $1/$2 permanent;
    }

    location / {
        try_files $uri.html $uri $uri/ /index.html;
    }
}

But if I open mypage.com, it redirects me to mypage.com/index.
Wouldn't this be fixed by declaring index.html as the index? Any help is appreciated.

Recommended answer

The "Holy Grail" Solution for Removing ".html" in NGINX:

UPDATED ANSWER: This question piqued my curiosity, and I went on another, more in-depth search for a "holy grail" solution for .html redirects in NGINX. Here is the link to the answer I found, since I didn't come up with it myself: https://stackoverflow.com/a/32966347/4175718

However, I'll give an example and explain how it works. Here is the code:

location / {
    if ($request_uri ~ ^/(.*)\.html) {
        return 302 /$1;
    }
    try_files $uri $uri.html $uri/ =404;
}

What's happening here is a pretty ingenious use of the if directive. NGINX runs a regex against the $request_uri of each incoming request. The regex checks whether the URI has an .html extension and, if so, stores the extension-less portion of the URI in the built-in variable $1.

From the docs, since it took me a while to figure out where the $1 came from:

Regular expressions can contain captures that are made available for later reuse in the $1..$9 variables.

The regex both checks for the existence of unwanted .html requests and effectively sanitizes the URI so that it does not include the extension. Then, using a simple return statement, the request is redirected to the sanitized URI that is now stored in $1.

The best part about this, as original author cnst explains, is that

Since $request_uri is always constant per request, and is not affected by other rewrites, it won't, in fact, form any infinite loops.

Unlike the rewrites, which operate on any .html request (including the invisible internal redirect to /index.html), this solution only operates on external URIs that are visible to the user.

You will still need the try_files directive, as otherwise NGINX will have no idea what to do with the newly sanitized extension-less URIs. The try_files directive shown above will first try the new URL by itself, then try it with the ".html" extension, then try it as a directory name.
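To see the pieces working together, here is a minimal sketch of a complete server block. The listen, server_name, root, and index values are carried over from the question; the /about page in the comments is a hypothetical example, not something from the original post.

```nginx
server {
    listen 80;
    server_name _;
    root /var/www/html;      # assumed from the question
    index index.html;

    location / {
        # $request_uri holds the URI exactly as the client sent it and does
        # not change on internal redirects, so this check cannot loop.
        if ($request_uri ~ ^/(.*)\.html) {
            return 302 /$1;
        }

        # Tried in order: the exact file, the file with ".html" appended,
        # the directory (served through "index"), then a 404.
        try_files $uri $uri.html $uri/ =404;
    }
}

# Expected behaviour under these assumptions:
#   GET /about.html  -> 302 redirect to /about
#   GET /about       -> serves /var/www/html/about.html, URL stays /about
#   GET /            -> "index" internally redirects to /index.html;
#                       $request_uri is still "/", so no external redirect
```

After editing the configuration, running nginx -t checks the syntax before you reload.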

The NGINX docs also explain how the default try_files directive works. The default try_files directive is ordered differently than the example above, so the explanation below does not perfectly line up:

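NGINX will first append .html to the end of the URI and try to serve it. If it finds an appropriate .html file, it will return that file and maintain the extension-less URI. If it cannot find an appropriate .html file, it will try the URI without any extension, then the URI as a directory, and finally return a 404 error.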

UPDATE: What does the regex do?

The above answer touches on the use of regular expressions, but here is a more specific explanation for those who are still curious. The following regular expression (regex) is used:

^/(.*)\.html

This breaks down as:

^: indicates the beginning of the line.

/: match the character "/" literally. Forward slashes do NOT need to be escaped in NGINX.

(.*): capturing group: match any character an unlimited number of times

\.: match the character "." literally. This must be escaped with a backslash.

html: match the string "html" literally.

The capturing group (.*) is what contains the non-".html" portion of the URL. This can later be referenced with the variable $1. NGINX is then configured to redirect the client to the extension-less URI (return 302 /$1;), and when the client requests that URI, the try_files directive internally re-appends the ".html" extension so the file can be located.
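The pattern above has no end anchor, so it also matches a URI that merely contains ".html" somewhere in the middle. A stricter variant, offered only as a sketch and not as part of the original answer, limits the capture to the path and requires ".html" to be followed by either a query string or the end of the URI:

```nginx
location / {
    # [^?]*    capture path characters only, stopping before any "?"
    # \.html   a literal ".html"
    # (\?|$)   followed by the start of a query string or the end of the URI
    if ($request_uri ~ ^/([^?]*)\.html(\?|$)) {
        return 302 /$1;
    }
    try_files $uri $uri.html $uri/ =404;
}
```

With this variant a request such as /notes.html still redirects to /notes, while a hypothetical /notes.htmlx is left alone and handled by try_files.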

To retain query strings and arguments passed to a .html page, the return statement can be changed to:

return 302 /$1$is_args$args;

This should allow requests such as /index.html?test to redirect to /index?test instead of just /index.
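In context, the query-preserving version of the location block might look like the sketch below. The /about page in the comments is a hypothetical example.

```nginx
location / {
    # $is_args is "?" when the request has a query string and empty
    # otherwise; $args is the query string itself.
    if ($request_uri ~ ^/(.*)\.html) {
        return 302 /$1$is_args$args;
    }
    try_files $uri $uri.html $uri/ =404;
}

# Expected behaviour:
#   GET /index.html?test  -> 302 redirect to /index?test
#   GET /about.html       -> 302 redirect to /about (no "?" appended)
```

Because $is_args is empty when there is no query string, the same return line covers both cases without a second rule.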

From the NGINX page If Is Evil:

The only 100% safe things which may be done inside if in a location context are:

return ...;

rewrite ... last;


Also, note that you may swap out the '302' redirect for a '301'.

A 301 redirect is permanent, and is cached by web browsers and search engines. If your goal is to permanently remove the .html extension from pages that are already indexed by a search engine, you will want to use a 301 redirect. However, if you are testing on a live site, it is best practice to start with a 302 and only move to a 301 when you are absolutely confident your configuration is working correctly.


08-14 19:15