IDEA cannot fetch data from the datanode on a remote VM

Hi PK,
Here's my problem: when I try to access HDFS in the cloud from IDEA, I get the following error:

20/06/05 16:59:14 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7978 bytes)
20/06/05 16:59:14 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
20/06/05 16:59:14 INFO rdd.WholeTextFileRDD: Input split: Paths:/test/test-access.log:0+2715
20/06/05 17:00:14 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.69.xxx:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3590)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)

After searching online, I found the cause: my machine (where IDEA runs) and the VM communicate over the public IP, but when the client tries to fetch block data from a datanode, the namenode hands the IDEA machine the datanode's internal IP, which it cannot reach. What I tried: following some posts, I made the corresponding change in hdfs-site.xml:

<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>

<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
</property>

The intent is that HDFS on the VM then hands out hostnames rather than IPs to IDEA on my machine. I also set up the mapping in my machine's hosts file:
18.207.78.xxx hadoop000   // public IP, reachable from IDEA

And the mapping in the VM's hosts file:
172.31.69.xxx hadoop000   // internal IP

Since this is a pseudo-distributed setup, following your videos, the datanode's hostname is also hadoop000.

After these changes it still fails with the same error: the client in IDEA still tries the internal IP and times out. Hope you can help take a look.
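One quick sanity check from the IDEA machine is what hadoop000 actually resolves to locally; a minimal sketch (the object name is hypothetical), which should print the public IP if the local hosts mapping is being picked up:

    import java.net.InetAddress

    object ResolveCheck {
      def main(args: Array[String]): Unit = {
        // Prints the address the JVM resolves for hadoop000; with the hosts
        // mapping above in effect this should be 18.207.78.xxx, not 172.31.x.x.
        println(InetAddress.getByName("hadoop000").getHostAddress)
      }
    }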


2 Answers

Asker weixin_慕妹8043461 2020-06-07 22:40:23

Updated error:


20/06/07 10:33:14 INFO datasources.FileScanRDD: Reading File path: hdfs://hadoop000:8020/test/drivers.csv, range: 0-1997, partition values: [empty row]
20/06/07 10:33:14 INFO codegen.CodeGenerator: Code generated in 7.95912 ms
20/06/07 10:34:15 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.69.197:50010] (it still receives the internal IP and tries to connect to it)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3590)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:849)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:764)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:666)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:904)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:963)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:62)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)

(too long, some lines omitted...)

20/06/07 10:34:15 WARN hdfs.DFSClient: Failed to connect to /172.31.69.197:50010 for block BP-536449180-172.31.69.197-1591536838863:blk_1073741825_1001, add to deadNodes and continue.

(some lines omitted...)

20/06/07 10:34:15 INFO hdfs.DFSClient: Could not obtain BP-536449180-172.31.69.197-1591536838863:blk_1073741825_1001 from any node:  No live nodes contain current block Block locations: DatanodeInfoWithStorage[172.31.69.197:50010,DS-84a95ba2-e53a-41b2-8224-0d38ad969b24,DISK] Dead nodes:  DatanodeInfoWithStorage[172.31.69.197:50010,DS-84a95ba2-e53a-41b2-8224-0d38ad969b24,DISK]. Will get new block locations from namenode and retry...
20/06/07 10:34:15 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 892.3526112545212 msec.




  • The cloud host is configured internally with its internal IP; from outside, it is reached via the public IP.
    2020-06-08 10:00:40
Michael_PK 2020-06-06 12:43:40

You're on a cloud host, so your code needs to set dfs.client.use.datanode.hostname to true in the Configuration.
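A minimal sketch of one way to wire that property into Spark (the object name is hypothetical; the path comes from the log above): set it through the SparkSession so it lands in the Hadoop Configuration that Spark's HDFS client actually reads.

    import org.apache.spark.sql.SparkSession

    object HostnameFixSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HostnameFixSketch")
          .master("local[2]")
          // Spark forwards every "spark.hadoop.*" property into the Hadoop
          // Configuration it uses for HDFS access.
          .config("spark.hadoop.dfs.client.use.datanode.hostname", "true")
          .getOrCreate()

        // Equivalent alternative once the session exists:
        // spark.sparkContext.hadoopConfiguration
        //   .set("dfs.client.use.datanode.hostname", "true")

        spark.read.format("csv")
          .load("hdfs://hadoop000:8020/test/drivers.csv")
          .show()

        spark.stop()
      }
    }

Note that creating a standalone new Configuration() and setting the property there has no effect unless that object is actually handed to whatever does the reading.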

  • Asker weixin_慕妹8043461 #1
    I added the parameter, but the error shows it still connects to the cloud host's internal IP. I really don't know what else to change.
    2020-06-07 21:03:38
  • Michael_PK replying to weixin_慕妹8043461 #2
    Paste the code where you set the parameter and I'll take a look.
    2020-06-07 22:09:26
  • Asker weixin_慕妹8043461 replying to Michael_PK #3
    I wrote a test that reads a CSV file I uploaded to HDFS. Code below:
    
    package com.imooc.bigdata.spark

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.sql.SparkSession

    object Testapp {

      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.set("dfs.client.use.datanode.hostname", "true")
        conf.set("HADOOP_USER_NAME", "yanzhao")
        val input = "hdfs://hadoop000:8020/test/*"
        val spark = SparkSession.builder().appName("Testapp").master("local[2]").getOrCreate()
        val logdf = spark.read.format("csv")
          .option("path", input)
          .load()

        logdf.show()
        spark.stop()
      }
    }
    2020-06-07 22:30:35
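Worth noting in the snippet above: the Configuration object conf is created and populated but never handed to Spark, so its settings never reach the HDFS client. One way to apply the property in this program would be, right after creating the session, spark.sparkContext.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true"), or the spark.hadoop.* builder form sketched in the answer above. Likewise, HADOOP_USER_NAME is read from the environment or a JVM system property, so setting it as a Hadoop conf key has no effect.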