Finding and extracting the values after a particular string from a file using a bash shell script

Question

I have a file, file.txt, with the following contents:

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `dv.par_kst`( |
|   `col1` string,                                   |
|   `col2` string,                                   |
|   `col3` int,                                      |
|   `col4` int,                                      |
|   `col5` string,                                   |
|   `col6` float,                                    |
|   `col7` int,                                      |
|   `col8` string,                                   |
|   `col9` string,                                   |
|   `col10` int,                                     |
|   `col11` int,                                     |
|   `col12` string,                                  |
|   `col13` float,                                   |
|   `col14` string,                                  |
|   `col15` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://nameservicets1/dv/hdfsdata/par_kst' |
| TBLPROPERTIES (                                    |
|   'spark.sql.create.version'='2.2 or prior',       |
|   'spark.sql.sources.schema.numPartCols'='2',      |
|   'spark.sql.sources.schema.numParts'='1',         |
|   'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"col1","type":"string","nullable":true,"metadata":{}},{"name":"col2","type":"string","nullable":true,"metadata":{}},{"name":"col3","type":"integer","nullable":true,"metadata":{}},{"name":"col4","type":"integer","nullable":true,"metadata":{}},{"name":"col5","type":"string","nullable":true,"metadata":{}},{"name":"col6","type":"float","nullable":true,"metadata":{}},{"name":"col7","type":"integer","nullable":true,"metadata":{}},{"name":"col8","type":"string","nullable":true,"metadata":{}},{"name":"col9","type":"string","nullable":true,"metadata":{}},{"name":"col10","type":"integer","nullable":true,"metadata":{}},{"name":"col11","type":"integer","nullable":true,"metadata":{}},{"name":"col12","type":"string","nullable":true,"metadata":{}},{"name":"col13","type":"float","nullable":true,"metadata":{}},{"name":"col14","type":"string","nullable":true,"metadata":{}},{"name":"col15","type":"string","nullable":true,"metadata":{}},{"name":"part_col1","type":"integer","nullable":true,"metadata":{}},{"name":"part_col2","type":"integer","nullable":true,"metadata":{}}]}',  |
|   'spark.sql.sources.schema.partCol.0'='part_col1',  |
|   'spark.sql.sources.schema.partCol.1'='part_col2',  |
|   'transient_lastDdlTime'='1587487456')            |
+----------------------------------------------------+

From this file I want to extract the PARTITIONED BY details.

Desired output:

part_col1 , part_col2

The number of PARTITIONED BY columns is not fixed: another file might contain 3 or more, so I want to extract all of them.

That is, all the values between PARTITIONED BY and ROW FORMAT SERDE, with the spaces, the backquotes (`) and the data types removed.

Could you please help me with this?

Recommended answer

If the layout of the result doesn't matter, you can have sed consider only the lines between a start and an end marker, and print a line only when it contains a field between two backquotes.

sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' file.txt
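
To see what the range address and the substitution do, here is a small self-contained sketch. The temporary sample file and its shortened contents are assumptions made for illustration, and `-E` is used in place of `-r`; both select extended regular expressions in GNU sed, and `-E` is also accepted by BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical trimmed-down copy of the file from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
|   `col15` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
EOF

# /start/,/end/ restricts the substitution to the lines from
# PARTITIONED BY through ROW FORMAT. With -n, only lines where the
# substitution succeeded (a name between two backquotes) are printed.
cols=$(sed -En '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' "$sample")
printf '%s\n' "$cols"
rm -f "$sample"
```

The `col15` line lies outside the range, so only the two partition column names should be printed, one per line.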

To combine the results on one line as desired, use:

printf "%s , " $(sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1 /p' file.txt) |
   sed 's/ , $/\n/'
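
The unquoted command substitution above relies on shell word splitting to hand each name to printf as a separate argument. As an alternative sketch, assuming a standard `paste` utility is available, the lines can be joined with `paste -sd,` and the separator widened afterwards. The function name `partition_cols` and the three-column sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the partition column names found in the
# given file on one line, separated by " , ".
partition_cols() {
  # 1. sed extracts one backquoted name per line from the range.
  # 2. paste -s joins all lines into one, using "," as the delimiter.
  # 3. The final sed widens "," to " , " to match the desired output.
  sed -En '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' "$1" |
    paste -sd, - |
    sed 's/,/ , /g'
}

# Demo on a shortened sample with three partition columns (made-up data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int,                                 |
|   `part_col3` string)                              |
| ROW FORMAT SERDE                                   |
EOF
joined=$(partition_cols "$sample")
printf '%s\n' "$joined"
rm -f "$sample"
```

Because the join happens in a pipeline rather than through word splitting, this variant needs no trailing-separator cleanup and works the same for any number of partition columns.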

