Integrating ChunJun with CDH

Published 2022-11-22


Build

Environment

  • Hadoop 3.0.0-cdh6.3.1
  • Hive 2.1.1-cdh6.3.1
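Both can be confirmed on a cluster node:

# print the installed CDH build versions
hadoop version
hive --version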

Recommendation

Build and install with mvn clean install -DskipTests. Installing the artifacts locally makes it easy to modify the code later and repackage only the modules you touched.
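
For example (a sketch; the -pl module selector is illustrative and should name the module you actually changed):

# first pass: build and install everything, skipping tests
mvn clean install -DskipTests
# after editing one connector, rebuild just that module plus its upstream modules
mvn clean install -DskipTests -pl :chunjun-connector-hive -am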

Common issues

FileNotFoundException: File file:/root/.flink/application_xxxxxxxxxxx/

Cause: CDH's Hadoop keeps its configuration files in a different place than open-source Hadoop (on CDH hosts the client configuration lives under /etc/hadoop/conf), while ChunJun's launch script submit.sh resolves the Hadoop configuration directory as $HADOOP_HOME/etc/hadoop:

# if HADOOP_HOME is not set or not a directory, ignore hadoopConfDir parameter
if [ ! -z $HADOOP_HOME ] && [ -d $HADOOP_HOME ];
  then
    echo "HADOOP_HOME is $HADOOP_HOME"
    PARAMS="$PARAMS -hadoopConfDir $HADOOP_HOME/etc/hadoop"
  else
    echo "HADOOP_HOME is empty!"
fi

Fix: either append -hadoopConfDir $HADOOP_CONF_DIR to the launch command, or patch the ChunJun launch script directly, as sketched below.
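
A minimal sketch of the script patch, assuming $HADOOP_CONF_DIR is exported on the host (on CDH it typically points at /etc/hadoop/conf):

# prefer an explicitly set HADOOP_CONF_DIR over the open-source layout
if [ -n "$HADOOP_CONF_DIR" ] && [ -d "$HADOOP_CONF_DIR" ];
  then
    echo "HADOOP_CONF_DIR is $HADOOP_CONF_DIR"
    PARAMS="$PARAMS -hadoopConfDir $HADOOP_CONF_DIR"
elif [ -n "$HADOOP_HOME" ] && [ -d "$HADOOP_HOME" ];
  then
    echo "HADOOP_HOME is $HADOOP_HOME"
    PARAMS="$PARAMS -hadoopConfDir $HADOOP_HOME/etc/hadoop"
  else
    echo "Neither HADOOP_CONF_DIR nor HADOOP_HOME is set!"
fi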

No FileSystem for scheme "hdfs"

Fix: add a flink-shaded-hadoop-x uber JAR to the $FLINK_HOME/lib/ directory; prebuilt flink-shaded-hadoop uber JARs are published on Maven Central.
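
For example (version illustrative; pick the build that matches your Hadoop line):

# fetch a flink-shaded-hadoop uber JAR from Maven Central and drop it into Flink's lib/
curl -O https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
mv flink-shaded-hadoop-2-uber-2.8.3-10.0.jar $FLINK_HOME/lib/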

Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})

Cause: a Hive version dependency conflict; the Hive JDBC client on the classpath speaks a newer Thrift protocol than CDH's HiveServer2.

Fix: first, add the CDH repository to the top-level pom.xml:

<!-- Add the CDH Maven repository -->
<repositories>
	<repository>
		<id>cloudera-repo-releases</id>
		<url>https://repository.cloudera.com/artifactory/public/</url>
		<releases>
			<enabled>true</enabled>
		</releases>
		<snapshots>
			<enabled>false</enabled>
		</snapshots>
	</repository>
</repositories>

Then change the Hive versions in the chunjun-connector-hive module's pom.xml:

<!-- Pin the Hive dependencies to the CDH version -->
<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-jdbc</artifactId>
	<version>2.1.1-cdh6.3.1</version>
</dependency>

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-serde</artifactId>
	<version>2.1.1-cdh6.3.1</version>
</dependency>

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-exec</artifactId>
	<version>2.1.1-cdh6.3.1</version>
</dependency>

And change the Hive version property in the chunjun-connector-hdfs module's pom.xml:

<hive.version>2.1.1-cdh6.3.1</hive.version>
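
For context, this line sits in that module's <properties> block (sketch):

<properties>
	<hive.version>2.1.1-cdh6.3.1</hive.version>
</properties>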

cannot be cast to com.google.protobuf.Message

Cause: a Hadoop version dependency conflict.

Fix: exclude the transitive hadoop-common and hadoop-client dependencies, as detailed below.
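
To see which modules pull in the conflicting Hadoop artifacts, Maven's dependency report helps (module selector illustrative):

# list every org.apache.hadoop artifact on the module's dependency tree
mvn dependency:tree -Dincludes=org.apache.hadoop -pl :chunjun-connector-hive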

Also comment out the following blocks in the top-level pom.xml (otherwise the license and Spotless format checks reject the modified code and the build cannot be packaged):

<!--<licenses>-->
<!--	<license>-->
<!--		<name>The Apache Software License, Version 2.0</name>-->
<!--		<url>https://www.apache.org/licenses/LICENSE-2.0.txt</url>-->
<!--		<distribution>repo</distribution>-->
<!--	</license>-->
<!--</licenses>-->

<!--<plugin>-->
<!--	<groupId>com.diffplug.spotless</groupId>-->
<!--	<artifactId>spotless-maven-plugin</artifactId>-->
<!--</plugin>-->

<!--<plugin>-->
<!--	<groupId>com.diffplug.spotless</groupId>-->
<!--	<artifactId>spotless-maven-plugin</artifactId>-->
<!--	<version>2.4.2</version>-->
<!--	<configuration>-->
<!--		<java>-->
<!--			<googleJavaFormat>-->
<!--				<version>1.7</version>-->
<!--				<style>AOSP</style>-->
<!--			</googleJavaFormat>-->

<!--			&lt;!&ndash; \# refers to the static imports &ndash;&gt;-->
<!--			<importOrder>-->
<!--				<order>com.dtstack,org.apache.flink,org.apache.flink.shaded,,javax,java,scala,\#</order>-->
<!--			</importOrder>-->

<!--			<removeUnusedImports/>-->
<!--		</java>-->
<!--	</configuration>-->
<!--	<executions>-->
<!--		<execution>-->
<!--			<id>spotless-check</id>-->
<!--			<phase>validate</phase>-->
<!--			<goals>-->
<!--				<goal>check</goal>-->
<!--			</goals>-->
<!--		</execution>-->
<!--	</executions>-->
<!--</plugin>-->

In the chunjun-connector-hive module's pom.xml, exclude the Hadoop dependencies:

<dependency>
	<groupId>com.dtstack.chunjun</groupId>
	<artifactId>chunjun-connector-hdfs</artifactId>
	<version>${project.version}</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hive</groupId>
			<artifactId>hive-serde</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hive</groupId>
			<artifactId>hive-exec</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs-client</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-jdbc</artifactId>
	<version>2.1.1-cdh6.3.1</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs-client</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-yarn-registry</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-serde</artifactId>
	<version>2.1.1-cdh6.3.1</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs-client</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-exec</artifactId>
	<version>2.1.1-cdh6.3.1</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs-client</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-yarn-server-resourcemanager</artifactId>
		</exclusion>
	</exclusions>
</dependency>

In the chunjun-connector-hdfs module's pom.xml, exclude hive-common (which transitively pulls in the conflicting Hadoop artifacts):

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-exec</artifactId>
	<version>${hive.version}</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hive</groupId>
			<artifactId>hive-common</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-serde</artifactId>
	<version>${hive.version}</version>
	<exclusions>
		<exclusion>
			<groupId>org.apache.hive</groupId>
			<artifactId>hive-common</artifactId>
		</exclusion>
	</exclusions>
</dependency>

Invalid signature file digest for Manifest main attributes

Cause: signature files under META-INF are not stripped cleanly when shading, so JAR signature verification fails.

Fix: change the maven-shade-plugin configuration in the chunjun-connector-hdfs module's pom.xml to the following:

<plugin>
	<groupId>org.apache.maven.plugins</groupId>
	<artifactId>maven-shade-plugin</artifactId>
	<executions>
		<execution>
			<phase>package</phase>
			<goals>
				<goal>shade</goal>
			</goals>
			<configuration>
				<artifactSet>
					<excludes>
						<exclude>org.slf4j:slf4j-api</exclude>
						<exclude>log4j:log4j</exclude>
						<exclude>ch.qos.logback:*</exclude>
					</excludes>
				</artifactSet>
				<filters>
					<filter>
						<artifact>*:*</artifact>
						<excludes>
							<exclude>META-INF/*.SF</exclude>
							<exclude>META-INF/*.DSA</exclude>
							<exclude>META-INF/*.RSA</exclude>
						</excludes>
					</filter>
				</filters>
			</configuration>
		</execution>
	</executions>
</plugin>
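
To verify that the signature entries are gone from the shaded JAR (path illustrative):

# should print nothing if the signature files were stripped
unzip -l chunjun-connector-hdfs/target/chunjun-connector-hdfs-*.jar | grep -Ei 'META-INF/.*\.(SF|DSA|RSA)'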