Using XSLT to mark up text matching a regular expression?

Problem description

I am trying to use XSLT 2.0 (Saxon-PE 9.6) on an HTML document to create tags that surround all contiguous runs of characters from a specified non-Latin Unicode block (spaces allowed). I need to apply this process to every text() node in the document. I have made some progress with two approaches, one using <xsl:analyze-string> and one using fn:replace(), but I have not been able to arrive at a satisfactory and complete solution.

For example, here is some text containing Hindi:

Input: चाय का कप means ‘cup of tea’ in हिन्दि.

Desired Output: <span xml:lang="hi-Deva">चाय का कप</span> means ‘cup of tea’ in <span xml:lang="hi-Deva">हिन्दि</span>.

How can this process be implemented in XSLT 2.0?

Here is my attempt using <xsl:analyze-string>:

(Note: the Hindi language uses the Devanagari code block U+0900 to U+097F.)

<xsl:template match="text()">
  <xsl:variable name="textValue" select="."/>

  <xsl:analyze-string select="$textValue" regex="(\s*.*?)([&#x0900;-&#x097f;]+)((\s+[&#x0900;-&#x097f;]+)*)(\s*.*)">

    <xsl:matching-substring>
      <xsl:value-of select="regex-group(1)"/>
      <span xml:lang="hi-Deva"><xsl:value-of select="regex-group(2)"/><xsl:value-of select="regex-group(3)"/></span>
      <xsl:value-of select="regex-group(5)"/>
    </xsl:matching-substring>

    <xsl:non-matching-substring>
      <xsl:value-of select="$textValue"/>
    </xsl:non-matching-substring>

  </xsl:analyze-string>
</xsl:template>

On the test input, this produces: <span xml:lang="hi-Deva">चाय का कप</span> means ‘cup of tea’ in हिन्दि. This approach misses the second region of Hindi text (हिन्दि). I need an approach that will find and tag all occurrences matched by the regex.

My second approach used fn:replace():

<xsl:template match="text()">
  <xsl:value-of select='fn:replace(., "[&#x0900;-&#x097f;]+(\s+[&#x0900;-&#x097f;]+)*", "xxx$0xxx")'/>
</xsl:template>

On the test input this produces: xxxचाय का कपxxx means ‘cup of tea’ in xxxहिन्दिxxx. This is clearly incorrect, since the Hindi is wrapped in xxx’s, not span tags, but on the positive side, each region of Hindi is in fact discovered and processed. I cannot replace the xxx code with span tags because that is invalid XSLT.

Recommended answer

I came up with http://xsltransform.net/jyH9rMo, which just does:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="html" doctype-public="XSLT-compat" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

    <xsl:template match="/">
      <html>
        <head>
          <title>New Version!</title>
        </head>
        <xsl:apply-templates/>
      </html>
    </xsl:template>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

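    <!-- priority="1" makes this rule win over the identity template for text nodes;
         without it both patterns have the same default priority. -->
    <xsl:template match="text()" priority="1">
        <!-- analyze-string visits every match in turn, so each Devanagari run is wrapped. -->
        <xsl:analyze-string select="." regex="([&#x0900;-&#x097f;]+)((\s+[&#x0900;-&#x097f;]+)*)">
            <xsl:matching-substring>
                <span xml:lang="hi-Deva"><xsl:value-of select="."/></span>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>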
</xsl:transform>
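
The key difference from the first attempt in the question is that the regex here describes only the Devanagari run itself. The original pattern began with (\s*.*?) and ended with (\s*.*), so a single match swallowed the whole text node and <xsl:analyze-string> never got to look for a second run. As a possible variation (a sketch, not part of the linked fiddle), the character range can be written with the named block escape \p{IsDevanagari}. Because the regex attribute is an attribute value template, the curly braces have to be doubled. The explicit priority on the text() template is also an addition made here for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Identity template: copy every node and attribute not handled elsewhere. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap each run of Devanagari words (separated only by whitespace) in a span.
       The braces of \p{IsDevanagari} are doubled because regex is an
       attribute value template; single braces would be read as an XPath expression. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="."
        regex="\p{{IsDevanagari}}+(\s+\p{{IsDevanagari}}+)*">
      <xsl:matching-substring>
        <span xml:lang="hi-Deva"><xsl:value-of select="."/></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text outside the block is passed through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

To target a different script, swapping the block name and the xml:lang value should be enough, assuming the processor recognizes that block name.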
