Problem description
I have a sample XML file like this:
<doc>
<p>text1 text2 </p>
<p>text1 text2 </p>
<p>text1 text2 </p>
</doc>
In this sample XML, the first <p> ends with a space character (U+0020), the second <p> ends with a tab character (U+0009), and the third <p> ends with a non-breaking space character (U+00A0).
I need to remove any whitespace appearing just before the closing tag.
So the expected output should be:
<doc>
<p>text1 text2</p>
<p>text1 text2</p>
<p>text1 text2</p>
</doc>
Using XSLT's normalize-space() I can remove the unnecessary space and tab characters, but not the non-breaking space characters.
<xsl:template match="p/text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
Any suggestions on how I can normalize all whitespace, including non-breaking spaces, in XSLT?
Recommended answer
You could do:
<xsl:value-of select="normalize-space(translate(., '&#160;', ' '))"/>
This will work in XSLT 1.0 and 2.0 alike.
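For context, here is a minimal sketch of how that expression might sit in a complete XSLT 1.0 stylesheet. The identity template is an assumption added so that the rest of the document is copied through unchanged; the question only shows the p/text() template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed, not in the question): copies every
       node and attribute that no other template matches. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text directly inside <p>: turn each NBSP (U+00A0) into a plain
       space, then let normalize-space() trim both ends and collapse
       internal runs of whitespace to a single space. -->
  <xsl:template match="p/text()">
    <xsl:value-of select="normalize-space(translate(., '&#160;', ' '))"/>
  </xsl:template>

</xsl:stylesheet>
```

Keep in mind that normalize-space() does more than trim the end: it also strips leading whitespace and collapses internal runs, and after the translate() call any non-breaking space inside the text becomes an ordinary space. If a <p> can contain child elements, the spaces next to those elements would be trimmed as well.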
In XSLT 2.0, you could also use a regex, for example:
<xsl:value-of select="replace(., '[\t\p{Zs}]', '')"/>
will remove the horizontal tab character as well as any character in the Unicode Space_Separator (Zs) category, which includes not only the space and non-breaking space characters but also other space characters. Note that, as written, this removes every such character in the string, including the ones between words, not just those before the closing tag. Documentation is hard to find, but I believe this is currently the complete list (extracted from http://www.unicode.org/Public/UNIDATA/UnicodeData.txt):
U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
𐲰 OLD HUNGARIAN CAPITAL LETTER EZS
𐳰 OLD HUNGARIAN SMALL LETTER EZS
𖼶 MIAO LETTER ZSHA
𖼼 MIAO LETTER ZSA
𖼾 MIAO LETTER ZZSA
𖽁 MIAO LETTER ZZSYA
However, testing with Saxon 9.5 shows that the last 6 characters are not recognized: http://xsltransform.net/ncntCSo. This is expected, since those six are letters rather than space separators; they presumably ended up in the list only because their names contain "ZS".
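To strip only the whitespace just before the closing tag, which is what the question asks for, the regex can be anchored to the end of the string. The following is a minimal XSLT 2.0 sketch, not a definitive solution. The identity template, the use of \s alongside \p{Zs}, and the restriction to the last text node of each <p> are assumptions added for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed): copies everything not matched below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only the text node that sits immediately before </p>.
       [\s\p{Zs}]+$ matches a trailing run of XML whitespace (\s: space,
       tab, newline, carriage return) and Unicode space separators
       (\p{Zs}, which includes NBSP); "+" ensures the pattern never
       matches an empty string, which replace() would reject. -->
  <xsl:template match="p/text()[not(following-sibling::node())]">
    <xsl:value-of select="replace(., '[\s\p{Zs}]+$', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Unlike normalize-space(), this leaves leading and internal whitespace untouched, so the space between text1 and text2 is preserved.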