This article explains how to fix the error "File could only be replicated to 0 nodes instead of minReplication (=1)". The approach described below should be a useful reference for anyone hitting the same problem.

Problem description


I asked a similar question a while ago, and thought I solved this problem, but it turned out that it went away simply because I was working on a smaller dataset.

Numerous people have asked this question and I have gone through every single internet post that I could find and still didn't make any progress.

What I'm trying to do is this: I have an external table browserdata in hive that refers to about 1 gigabyte of data. I try to stick that data into a partitioned table partbrowserdata, whose definition goes like this:

CREATE EXTERNAL TABLE IF NOT EXISTS partbrowserdata (
    BidID string,
    Timestamp_ string,
    iPinYouID string,
    UserAgent string,
    IP string,
    RegionID int,
    AdExchange int,
    Domain string,
    URL string,
    AnonymousURL string,
    AdSlotID string,
    AdSlotWidth int,
    AdSlotHeight int,
    AdSlotVisibility string,
    AdSlotFormat string,
    AdSlotFloorPrice decimal,
    CreativeID string,
    BiddingPrice decimal,
    AdvertiserID string,
    UserProfileIDs array<string>
)
PARTITIONED BY (CityID int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data2';

with this query:

insert into table partbrowserdata partition(cityid)
select BidID, Timestamp_, iPinYouID, UserAgent, IP, RegionID, AdExchange, Domain, URL, AnonymousURL, AdSlotID, AdSlotWidth, AdSlotHeight, AdSlotVisibility, AdSlotFormat, AdSlotFloorPrice, CreativeID, BiddingPrice, AdvertiserID, UserProfileIDs, CityID
from browserdata;

And every time, on every platform, be it hortonworks or cloudera, I get this message:

Caused by:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/maria_dev/data2/.hive-staging_hive_2019-02-06_18-58-39_333_7627883726303986643-1/_task_tmp.-ext-10000/cityid=219/_tmp.000000_3 could only be replicated to 0 nodes instead of minReplication (=1).  There are 4 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1720)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3389)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

        at org.apache.hadoop.ipc.Client.call(Client.java:1504)
        at org.apache.hadoop.ipc.Client.call(Client.java:1441)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:413)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1814)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1610)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:773)

What do I do? I can't understand why this is happening. It does seem like a memory issue though, because I am able to insert a few rows, but not all of them for some reason. Note that I have plenty of space on HDFS, so 1 gig of extra data is pennies on the dollar; it's probably a RAM issue?

Here's my dfs report output:

I have tried this on all execution engines: spark, tez, mr.

Please do not suggest solutions that say that I need to format the namenode, because they do not work, and they are not solutions in any way.

update:

After looking at logs for namenode I noticed this, if it helps:

Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

These logs suggest this:

For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology

How do I do that?

I also noticed a similar unresolved post on here:

HDP 2.2@Linux/CentOS@OracleVM (Hortonworks) fails on remote submission from Eclipse@Windows

update 2:

I just tried partitioning this with spark, and it works! So, this must be a hive bug...

update 3:

Just tested this on MapR and it worked, but MapR doesn't use HDFS. This is definitely some sort of HDFS + Hive combination bug.

Proof:

Solution

I ended up reaching out to cloudera forums and they answered my question in a matter of minutes: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Why-can-t-I-partition-a-1-gigabyte-dataset-into-300/m-p/86554#M3981 I tried what Harsh J suggests and it worked perfectly!

Here's what he said:

If you are processing unordered partitioning from a data source, you can end up creating a lot of files in parallel as the partitioning is attempted.

In HDFS, when a file (or, more specifically, its block) is opened, the DataNode performs a logical reservation of its target block size. So if your configured block size is 128 MiB, every concurrently open block deducts that value (logically) from the available remaining space the DataNode publishes to the NameNode.

This reservation is done to help manage space and to guarantee a full block write to the client, so that a client that has begun writing its file never runs into an out-of-space exception mid-way.

Note: when the file is closed, only the actual length is persisted, and the reservation calculation is adjusted to reflect the real used and available space. While a file block stays open, however, it is always considered to hold the full block size.

The NameNode will further only select a DataNode for a write if it can guarantee the full target block size. It will ignore any DataNodes it deems (based on their reported values and metrics) unfit for the parameters of the requested write. Your error shows that the NameNode has stopped considering your only active DataNode when trying to allocate a new block request.

As an example, 70 GiB of available space will prove insufficient if there are more than 560 concurrent, open files (70 GiB divided into 128 MiB block sizes). So the DataNode will appear "full" at around 560 open files and will no longer serve as a valid target for further file requests.

Per your description of the insert, this is likely the case, since each of the 300 chunks of the dataset may still carry varied IDs, resulting in each parallel task requesting a lot of open files for inserts into several different partitions.

You could "hack" your way around this by reducing the requested block size within the query (for example, setting dfs.blocksize to 8 MiB), which influences the reservation calculation. However, this may not be a good idea for larger datasets as you scale, because it will drive up the file:block count and increase memory costs on the NameNode.

A better way to approach this is to do a pre-partitioned insert (sort first by partition and then insert in a partitioned manner). Hive, for example, provides this as an option: hive.optimize.sort.dynamic.partition, and if you use plain Spark or MapReduce then their default partitioning strategy does exactly this.
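
To make the block-size workaround he mentions concrete, here is a minimal sketch of what it could look like as a per-session override in Hive. This is my own illustration rather than part of his reply; it assumes dfs.blocksize can be set from the Hive session on your distribution (i.e. it is not listed in hive.conf.restricted.list), and 8388608 is simply 8 MiB expressed in bytes.

-- Per-session override: shrink the logical reservation each open block claims on a DataNode.
-- With 8 MiB blocks, 70 GiB of free space covers roughly 8960 concurrently open blocks
-- instead of ~560, at the price of a much higher file:block count for the NameNode to track.
SET dfs.blocksize=8388608;

The same insert into partbrowserdata query shown above would then be run unchanged in that session.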

So, at the end of the day I did set hive.optimize.sort.dynamic.partition=true; and everything started working. But I also did another thing.

Here's one of my posts from earlier as I was investigating this issue: Why do I get "File could only be replicated to 0 nodes" when writing to a partitioned table? I was running into a problem where hive couldn't partition my dataset, because hive.exec.max.dynamic.partitions was set to 100, so, I googled this issue and somewhere on hortonworks forums I saw an answer, saying that I should just do this:

SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

This turned out to be another problem: hive may try to open as many concurrent connections as whatever you set hive.exec.max.dynamic.partitions to, so my insert query didn't start working until I decreased these values to 500.
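
Putting the pieces together, the session that finally worked looked roughly like the sketch below. Treat it as an illustration rather than a recipe: 500 is simply the cap that happened to work for this dataset on this cluster, not a general recommendation.

-- Sort rows by partition key before insertion so each task writes to few partitions at a time.
SET hive.optimize.sort.dynamic.partition=true;
-- Keep the dynamic-partition caps high enough for the data (about 300 city partitions)
-- but low enough not to flood the DataNodes with concurrently open files.
SET hive.exec.max.dynamic.partitions=500;
SET hive.exec.max.dynamic.partitions.pernode=500;

insert into table partbrowserdata partition(cityid)
select BidID, Timestamp_, iPinYouID, UserAgent, IP, RegionID, AdExchange, Domain, URL, AnonymousURL, AdSlotID, AdSlotWidth, AdSlotHeight, AdSlotVisibility, AdSlotFormat, AdSlotFloorPrice, CreativeID, BiddingPrice, AdvertiserID, UserProfileIDs, CityID
from browserdata;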

That concludes this article on how to fix "File could only be replicated to 0 nodes instead of minReplication (=1)". We hope the answer recommended here is helpful, and thank you for your support!
