Problem description
I have defined a table as follows:
create external table PageViews (Userid string, Page_View string)
partitioned by (ds string)
row format delimited fields terminated by ','
stored as textfile location '/user/data';
I do not want all the files in the /user/data directory to be used as part of the table. Is it possible for me to do the following?
location '/user/data/*.csv'
Recommended answer
I came across this thread when I had a similar problem to solve. I was able to solve it by using a custom SerDe, to which I added SerDe properties that specify which regex to apply to file names for a given table.
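A minimal sketch of the file-name matching described above, written in plain Java so it runs without Hadoop or Hive on the classpath. The class name FileNameRegexFilter is hypothetical. It assumes the custom SerDe reads the "input.regex" table property, compiles it as a java.util.regex pattern, and requires the whole file name to match. How the table properties actually reach your class depends on the Hive interface you implement and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical helper: in a real custom SerDe the table properties would be
// supplied by Hive; here they are a plain java.util.Properties object.
public class FileNameRegexFilter {
    // Property name taken from the WITH SERDEPROPERTIES clause in the answer.
    static final String REGEX_KEY = "input.regex";

    private final Pattern pattern;

    public FileNameRegexFilter(Properties tableProps) {
        // Default ".*" accepts every file when no regex is configured.
        // Pattern.compile throws PatternSyntaxException for an invalid regex.
        String regex = tableProps.getProperty(REGEX_KEY, ".*");
        this.pattern = Pattern.compile(regex);
    }

    // matches() requires the entire file name to match, not just a substring.
    public boolean accept(String fileName) {
        return pattern.matcher(fileName).matches();
    }

    // Keeps only the file names accepted by the configured regex, in order.
    public List<String> filter(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (accept(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(REGEX_KEY, ".*\\.csv"); // regex .*\.csv
        FileNameRegexFilter filter = new FileNameRegexFilter(props);
        List<String> files = List.of("part-0001.csv", "part-0002.csv", "README.txt", "notes.csv.bak");
        System.out.println(filter.filter(files)); // [part-0001.csv, part-0002.csv]
    }
}
```

Using matches() rather than find() means a name such as notes.csv.bak is rejected, because the whole name has to end in .csv.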
A custom SerDe might seem like overkill if you are only dealing with standard CSV files; I had a more complex file format to deal with. Still, this is a very viable solution if you don't shy away from writing some Java. It is particularly useful when you cannot restructure the data in your storage location and need to pick out a very specific file pattern from a disproportionately large set of files.
> CREATE EXTERNAL TABLE PageViews (Userid string, Page_View string)
> ROW FORMAT SERDE 'com.something.MySimpleSerDe'
> WITH SERDEPROPERTIES ("input.regex" = ".*\\.csv")
> LOCATION '/user/data';
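The "input.regex" value has to be a regular expression rather than a shell glob. Assuming the custom SerDe compiles it with java.util.regex.Pattern, the small check below shows the difference: a leading '*' has nothing to repeat, so "*.csv" does not compile, while ".*\\.csv" does. The class name RegexVsGlob is hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexVsGlob {
    // Returns true if the candidate string compiles as a java.util.regex pattern.
    static boolean isValidRegex(String candidate) {
        try {
            Pattern.compile(candidate);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shell glob: the leading '*' is a dangling quantifier, so compilation fails.
        System.out.println(isValidRegex("*.csv"));                // false
        // In Java source ".*\\.csv" is the regex .*\.csv: anything, then a literal ".csv".
        System.out.println(isValidRegex(".*\\.csv"));             // true
        System.out.println("2024-01-01.csv".matches(".*\\.csv")); // true
        System.out.println("2024-01-01_csv".matches(".*\\.csv")); // false
    }
}
```

The doubled backslash in the HiveQL string literal plays the same role as in Java source: it yields a single backslash, so the regex sees a literal dot.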