Hi PK,
My problem: when I try to access HDFS in the cloud from IDEA, I get the following error:
20/06/05 16:59:14 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7978 bytes)
20/06/05 16:59:14 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/06/05 16:59:14 INFO rdd.WholeTextFileRDD: Input split: Paths:/test/test-access.log:0+2715
20/06/05 17:00:14 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.69.xxx:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3590)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
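After searching online, I found the cause: the VM and the host running IDEA talk to each other over the public IP, but when the client tries to fetch data from an HDFS DataNode, the NameNode hands the IDEA host the DataNode's internal IP, which cannot be reached from there. What I have tried:
Following some posts, I changed the configuration in hdfs-site.xml: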
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
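As I understand it, this makes HDFS on the VM hand a hostname rather than an IP to IDEA on the host. The host also has the mapping configured:
18.207.78.xxx hadoop000 // (public IP, reachable from IDEA)
Mapping configured on the VM:
172.31.69.xxx hadoop000 // (internal IP)
Since this is a pseudo-distributed setup that follows your video, the DataNode's hostname is also hadoop000.
The run still fails with the same error: IDEA still tries to connect to the internal IP and times out. I hope you can take a look.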
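To narrow this down, a small standalone check like the one below could be run on the IDEA host. It is only a sketch using the plain JDK, not anything from Hadoop: it prints which IP hadoop000 resolves to on this machine and whether a TCP connection to the DataNode port 50010 (the port from the log above) succeeds. The class name DataNodeReachability, the method names, and the 5-second timeout are my own choices for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class DataNodeReachability {

    /** Returns the IP address the local resolver (hosts file, then DNS) gives for the host. */
    static String resolve(String host) throws UnknownHostException {
        return InetAddress.getByName(host).getHostAddress();
    }

    /** Attempts a plain TCP connect; returns true if it succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults are taken from the post: hostname hadoop000, DataNode port 50010.
        String host = args.length > 0 ? args[0] : "hadoop000";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;

        System.out.println(host + " resolves to " + resolve(host));
        System.out.println("TCP connect to " + host + ":" + port
                + " ok: " + canConnect(host, port, 5000));
    }
}
```

If hadoop000 resolves to the public IP and the connect succeeds, name resolution and the network path from the IDEA host look fine, and the remaining question would be why the client still dials 172.31.69.xxx instead of the hostname.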