Problem description
I am looking for a way to highlight the differences between 2 strings. The idea is to show, in a terminal, what characters were changed by iconv. Both strings are already processed to remove leading and trailing spaces, but internal spaces must be handled.
RED="$(tput setaf 1)" ## Short variables for the tput ->
CYA="$(tput setaf 6)" ## -> commands to make output strings ->
CLS="$(tput sgr0)" ## -> easier to read
str1="[String nâmè™]" # String prior to iconv
str2="[String name[tm]]" # String after iconv -f utf-8 -t ascii//translit
Ultimately I want to automate the formatting of the differences so they are surrounded by tput color codes that I can echo to the terminal.
${str1} = Highlight in red the characters not common to both strings
${str2} = Highlight in cyan the characters not common to both strings
Desired output:
output1="[String n${RED}â${CLS}m${RED}è™${CLS}]"
output2="[String n${CYA}a${CLS}m${CYA}e[tm]${CLS}]"
Most diff utilities I looked at work on the line or word level. I was thinking of parsing the output of cmp for the byte# of the first diff, but I would have to re-parse for multiple differences it seems.
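As an illustration of the cmp idea, here is a minimal sketch. The name listByteDiffs is hypothetical, and the temporary files are only there so that the snippet does not depend on process substitution. With -l, cmp lists every differing byte in one pass, so no re-parsing is needed. The comparison is strictly positional, though: once a multi-byte character such as â is replaced by a shorter or longer ASCII sequence, the following bytes no longer line up and are typically all reported as different, which limits how useful this route is for iconv output.

```shell
# Hypothetical helper: list every byte position where two strings differ.
# `cmp -l` prints one line per differing byte: the byte offset (decimal),
# followed by the two differing byte values (octal).
listByteDiffs() (
  tmp=$(mktemp -d) || exit 1
  printf '%s' "$1" > "$tmp/a"
  printf '%s' "$2" > "$tmp/b"
  # cmp exits non-zero when the inputs differ; that is expected here.
  # Its "EOF on ..." message for inputs of unequal length is discarded.
  cmp -l "$tmp/a" "$tmp/b" 2>/dev/null || true
  rm -rf "$tmp"
)

# Reports offsets 2 and 4 (a vs. b, e vs. f).
listByteDiffs 'name' 'nbmf'
```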
Any way I think about it, it seems like it is going to be an involved process, so I just want to make sure I'm not missing an obvious solution or tool.
Right now I'm thinking the easiest way would be to format each string to put a single byte on a new line and then my options open up.
nstr1="$(fold -w1 <<< "$(echo "${str1}")")"
nstr2="$(fold -w1 <<< "$(echo "${str2}")")"
diff <(echo -e "${nstr1}") <(echo -e "${nstr2}")
This is as far as I got, and I didn't want to go further unless I was on the right track. I'm certain there are a zillion ways to do this, but is there a more efficient way to go here?
Recommended answer
To put it all together:
#!/usr/bin/env bash
# Using stdin input, outputs each char. on its own line, with actual newlines
# in the input represented as literal '\n'.
toSingleCharLines() {
  sed 's/\(.\)/\1\'$'\n''/g; s/\n$/\'$'\n''\\n/'
}
# Using stdin input, reassembles a string split into 1-character-per-line output
# by toSingleCharLines().
fromSingleCharLines() {
  awk '$0=="\\n" { printf "\n"; next} { printf "%s", $0 }'
}
# Prints a colored string read from stdin by interpreting embedded color references such
# as '${RED}'.
printColored() {
  local str=$(</dev/stdin)
  local RED="$(tput setaf 1)" CYA="$(tput setaf 6)" RST="$(tput sgr0)"
  str=${str//'${RED}'/${RED}}
  str=${str//'${CYA}'/${CYA}}
  str=${str//'${RST}'/${RST}}
  printf '%s\n' "$str"
}
# The non-ASCII input string.
strOrg='[String nâmè™]'
# Create its ASCII-chars.-only transliteration.
strTransLit=$(iconv -f utf-8 -t ascii//translit <<<"$strOrg")
# Print the ORIGINAL string with the characters that NEED transliteration
# highlighted in RED.
diff --changed-group-format='${RED}%=${RST}' \
  <(toSingleCharLines <<<"$strOrg") <(toSingleCharLines <<<"$strTransLit") |
  fromSingleCharLines | printColored
# Print the TRANSLITERATED string with the characters that RESULT FROM
# transliteration highlighted in CYAN.
diff --changed-group-format='${CYA}%=${RST}' \
  <(toSingleCharLines <<<"$strTransLit") <(toSingleCharLines <<<"$strOrg") |
  fromSingleCharLines | printColored
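This prints the original string with the characters that needed transliteration in red, followed by the transliterated string with the replacement characters in cyan.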
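The same one-character-per-line technique can also be wrapped in a single reusable function. The following is only a sketch under several assumptions, not the exact method of the script above: markDiff and its marker arguments are hypothetical names, GNU diff is required for the group-format options, the strings contain no newlines, the markers contain no % characters, and grep -o . splits by character only in a locale that matches the input encoding (for example a UTF-8 locale). It uses temporary files instead of process substitution.

```shell
# Hypothetical helper: print $1 with the characters that differ from $2
# wrapped in the literal markers $3 (opening) and $4 (closing).
markDiff() (
  tmp=$(mktemp -d) || exit 1
  # One character per line; `.` also matches spaces, so internal spaces survive.
  printf '%s\n' "$1" | grep -o . > "$tmp/a" || true
  printf '%s\n' "$2" | grep -o . > "$tmp/b" || true
  # %< expands to the first file's lines in a differing group.
  # Groups that exist only in the second file are dropped.
  # diff exits 1 when the inputs differ; the pipeline status is that of tr.
  diff --unchanged-group-format='%=' \
       --changed-group-format="$3%<$4" \
       --old-group-format="$3%<$4" \
       --new-group-format='' \
       "$tmp/a" "$tmp/b" | tr -d '\n'
  echo
  rm -rf "$tmp"
)

markDiff 'String name' 'String nXme' '<' '>'   # prints: String n<a>me
# With terminal colors instead of plain markers:
# markDiff "$str1" "$str2" "$(tput setaf 1)" "$(tput sgr0)"
```

Passing the markers as arguments keeps the comparison separate from the coloring, so the same function can serve both the red and the cyan case by swapping the two strings.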