Problem description
I have a file that looks something like this:
{1:F195}{2:O5350646}{3:{1028:076}}{4:
:16R:GL
:16R:ADD
:19A::P//U9,1
:16S:AFO
-}{5:{MAC:00}{CHK:1C}}{S:{SAC:}{COP:S}{MAN:P2}}${1:33339}{2:O53}{4:
:16S:G
:16R:A
:19A::H0,
:19A::H0,
:16S:ADDINFO
-}{5:{MAC:0}{CHK:4}}{S:{SAC:}{COP:S}{MAN:GP2}}
Now I want to split this single file into two files based on the delimiter $ and then remove the delimiter as well. Any help would be greatly appreciated :)
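I used the following logic:
- First, insert a newline at each occurrence of $.
- I can create multiple files, but these files still contain the delimiter.
Code: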
FILE=test.dat
sed 's/\$/\n&/g' $FILE > Inter_$FILE
FILE=Inter_$FILE
cat $FILE | while read line
do
    sleep 1
    FormattedDate=`date +%Y%m%d%H%M%S`
    Final_FILE=New_${FormattedDate}_$FILE
    echo "line --- $line"
    echo "FormattedDate --- $FormattedDate"
    Line_Check=`echo $line | tr '$' '@' | cut -c1`
    ##Line_Check=`sed -e 's/\$/@/g' $line | cut -c1`
    echo "Line_Check --- $Line_Check"
    echo "Final_FILE --- $Final_FILE"
    if [ "$Line_Check" = "@" ]
    then
        Final_FILE=New_$FormattedDate_$FILE
        FILE=$Final_FILE
        echo "FOUND In --- $line"
        echo "FILE --->>> $FILE"
    else
        FILE=$Final_FILE
        echo "FILE --->>> $FILE"
        ###`echo $line | cut -c2-` >>
        ###cat $line` >> $FILE
        ###Filter_Line=`echo $line`
        ###echo "Filter_Line --- $Filter_Line"
    fi
    echo $line >> $FILE
    ###sed 's/^@//' $FILE > 3_$FILE
done
sed 's/^\$//' $FILE >> Final_$FILE;
Recommended answer
I think you may be trying to reinvent the wheel. awk is a great tool for splitting files on delimiters and for other text processing. You may like to try the following:
awk '{ for(i=1;i<=NF;i++) print $i > ("file_" i ".txt") }' RS= FS='\\$' file
Results:
Contents of file_1.txt:
{1:F195}{2:O5350646}{3:{1028:076}}{4:
:16R:GL
:16R:ADD
:19A::P//U9,1
:16S:AFO
-}{5:{MAC:00}{CHK:1C}}{S:{SAC:}{COP:S}{MAN:P2}}
Contents of file_2.txt:
{1:33339}{2:O53}{4:
:16S:G
:16R:A
:19A::H0,
:19A::H0,
:16S:ADDINFO
-}{5:{MAC:0}{CHK:4}}{S:{SAC:}{COP:S}{MAN:GP2}}
Explanation:
Setting the record separator (RS) to an empty string puts awk in 'paragraph mode', where records are separated by blank lines (by default RS is set to "\n", which enables line-by-line processing). Since your file doesn't look like it contains paragraphs, this essentially treats the whole file as a single record. The field separator (FS) is then set to a dollar sign (escaped here so that it is matched literally). So for each record (and there should be only one) we loop over each field (NF is short for Number of Fields) and print it to a file named after the loop index. It's worth noting that you will get strange results if your input contains multiple paragraphs: the field numbering restarts with every record, so pieces from different paragraphs end up in the same files. In comparison with Glenn's answer above/below, his solution won't have this problem, but the last file it processes will contain a trailing newline. HTH.
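If the input might contain blank lines, one option is to skip paragraph mode and make the dollar sign the record separator instead. The following is a minimal sketch under that assumption, not a replacement for the command above; sample.dat and the part_N.txt names are made up for illustration. A single-character RS is taken literally, so each chunk becomes one record and NR numbers the output files.

```shell
#!/bin/sh
# Hypothetical sample input: two chunks joined by a single "$".
printf 'A1\nA2$B1\nB2\n' > sample.dat

# RS='$' makes each "$"-separated chunk one record (NR = 1, 2, ...).
awk -v RS='$' '
{
    sub(/\n$/, "")              # drop a newline at the very end of the chunk
    out = "part_" NR ".txt"     # hypothetical output name
    print > out                 # print adds exactly one trailing newline
    close(out)                  # keep the number of open files small
}' sample.dat
```

To try this on the file from the question, replace sample.dat with its path (test.dat in the question). Closing each output file as soon as it is written matters mainly when the input holds many chunks.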