Compiling Hive 3.1.3
1. Why recompile
   1.1 Guava dependency conflict
   1.2 StatsTask errors after enabling the MetaStore
   1.3 Spark version too low
2. Environment setup
   2.1 Install the JDK
   2.2 Set up Maven
   2.3 Install a graphical desktop
   2.4 Install Git
   2.5 Install IDEA
3. Pull the Hive source
4. Compile the Hive source
   4.1 Environment test
       Problem 1: pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar cannot be downloaded
       Problem 2: the Aliyun mirror is not being used
       Problem 3: JDK version conflict from a careless <2. Environment setup>
   4.2 Resolve the Guava version conflict
       Problem 1: Futures.addCallback() takes 3 arguments in 27.0-jre, 2 in 19.0
       Problem 2: Iterators.emptyIterator() is deprecated
   4.3 StatsTask errors after enabling the MetaStore
       Problem 1: cherry-pick fails
   4.4 Spark compatibility
       Problem 1: obsolete API in SparkCounter needs replacing
       Problem 2: obsolete API in ShuffleWriteMetrics needs replacing
       Problem 3: obsolete API in TestStatsUtils needs replacing
   4.5 Successful build
Component versions:
hadoop-3.3.2
hive-3.1.3
spark-3.3.4, Scala 2.12.15 (spark-3.3.4 depends on hadoop-3.3.2)
1. Why Recompile
1.1 Guava dependency conflict
Dump the tail of the Hive log to inspect the failure:
tail -200 /tmp/root/hive.log > /home/log/hive-200.log
Hive on GitHub:
https://github.com/apache/hive
Its Guava dependency:
https://github.com/apache/hive/blob/rel/release-3.1.3/pom.xml
Hadoop on GitHub:
https://github.com/apache/hadoop
Its Guava dependency:
https://github.com/apache/hadoop/blob/rel/release-3.3.2/hadoop-project/pom.xml
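Comparing the guava.version property in the two poms shows the mismatch; the values below restate what those releases declare (verify against the links above):
# hive rel/release-3.1.3 pom.xml
<guava.version>19.0</guava.version>
# hadoop rel/release-3.3.2 hadoop-project/pom.xml
<guava.version>27.0-jre</guava.version>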
1.2 StatsTask errors after enabling the MetaStore
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
MapReduce Jobs Launched:
1.3 Spark version too low
Hive 3.1.3 targets Spark 2.3.0 by default. That version is old enough to miss many newer, more efficient APIs, so it is replaced with spark-3.3.4, the highest Spark version compatible with Hadoop 3.3.2.
2. Environment Setup
2.1 Install the JDK
JDK 1.8 is already installed:
(base) [root@bigdata01 opt]# java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
2.2 Set up Maven
Download the 3.6.3 binary package from https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/
(base) [root@bigdata01 ~]# cd /opt
(base) [root@bigdata01 opt]# tar -zxvf apache-maven-3.6.3-bin.tar.gz
(base) [root@bigdata01 opt]# mv apache-maven-3.6.3 maven
(base) [root@bigdata01 opt]# vim /etc/profile
# add MAVEN_HOME
export MAVEN_HOME=/opt/maven
export PATH=$MAVEN_HOME/bin:$PATH
(base) [root@bigdata01 opt]# source /etc/profile
Check that Maven installed successfully:
(base) [root@bigdata01 opt]# mvn -version
Apache Maven 3.6.3 ()
Maven home: /opt/maven
Java version: 1.8.0_301, vendor: Oracle Corporation, runtime: /opt/jdk/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "4.18.0-365.el8.x86_64", arch: "amd64", family: "unix"
Configure a repository mirror pointing at the Aliyun public repository:
vim /opt/maven/conf/settings.xml
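A minimal mirror entry for the <mirrors> section, assuming the standard Aliyun public repository address:
<!-- inside <mirrors> in /opt/maven/conf/settings.xml -->
<mirror>
  <id>aliyunmaven</id>
  <mirrorOf>central</mirrorOf>
  <name>Aliyun public repository</name>
  <url>https://maven.aliyun.com/repository/public</url>
</mirror>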
2.3 Install a graphical desktop
Use a CentOS build with a graphical desktop, and uninstall the redundant JDKs to avoid version conflicts. This step is important: skipping it produces baffling errors such as "Fatal error compiling: invalid target release: 1.11", which complains about a Java 1.11 that does not exist and persists even after upgrading to Java 11.
Find the redundant JDK packages:
(base) [root@bigdata01 ~]# yum list installed |grep jdk
copy-jdk-configs.noarch 4.0-2.el8 @appstream
java-1.8.0-openjdk.x86_64 1:1.8.0.362.b08-3.el8 @appstream
java-1.8.0-openjdk-devel.x86_64 1:1.8.0.362.b08-3.el8 @appstream
java-1.8.0-openjdk-headless.x86_64 1:1.8.0.362.b08-3.el8 @appstream
Uninstall them:
(base) [root@bigdata01 ~]# yum remove -y copy-jdk-configs.noarch java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64 java-1.8.0-openjdk-headless.x86_64
Verify the remaining JDK:
(base) [root@bigdata01 ~]# java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
2.4 Install Git
Install the third-party repositories:
(base) [root@bigdata01 opt]# yum install https://repo.ius.io/ius-release-el7.rpm https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Install Git:
(base) [root@bigdata01 opt]# yum install -y git
Check the Git version:
(base) [root@bigdata01 ~]# git -v
git version 2.43.0
2.5 Install IDEA
Download the Linux build from https://download.jetbrains.com.cn/idea/ideaIU-2021.1.3.tar.gz
(base) [root@bigdata01 opt]# tar -zxvf ideaIU-2021.1.3.tar.gz
Start IDEA; the graphical interface has to be launched from inside VMware:
cd /opt/idea-IU-211.7628.21
./bin/idea.sh
Choose the 30-day trial here.
Configure Maven in IDEA; settings.xml already carries the Aliyun public repository address.
Create a desktop shortcut for IDEA (bluetooth-sendto.desktop was copied at random; any .desktop file works):
(base) [root@bigdata01 bin]# cd /usr/share/applications
(base) [root@bigdata01 applications]# cp bluetooth-sendto.desktop idea.desktop
(base) [root@bigdata01 applications]# vim idea.desktop
# delete the original contents and add the following
[Desktop Entry]
Name=idea
Exec=sh /opt/idea-IU-211.7628.21/bin/idea.sh
Terminal=false
Type=Application
Icon=/opt/idea-IU-211.7628.21/bin/idea.png
Comment=idea
Categories=Application;
3. Pull the Hive Source
Pull the Hive source with Get from VCS; the full clone takes about an hour.
Set the URL to https://gitee.com/apache/hive.git and choose a local directory.
After trusting the project, take care with the configuration (set the importer VM options to -Xmx2048m); a misconfigured JDK here easily causes version errors.
Load hive-3.1.3 and create a branch slash-hive-3.1.3, as sketched below.
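The same steps from the command line, as a sketch (assuming the 3.1.3 release tag rel/release-3.1.3 carried by the apache/hive repository):
git clone https://gitee.com/apache/hive.git
cd hive
git checkout -b slash-hive-3.1.3 rel/release-3.1.3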
4. Compiling the Hive Source
4.1 Environment test
1. Test method: compile
Open the Getting Started Guide: https://hive.apache.org/development/gettingstarted/
Then Building Hive from Source: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource
That page gives the build command; run it in the Terminal. A successful build takes about 7 minutes:
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
2. Problems and solutions
Problem 1: pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar cannot be downloaded
[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3
Downloading from conjars: http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom
pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar never downloads. This is a known issue: Pentaho's Maven repository server has been shut down permanently, so the dependency can no longer be fetched from that repository.
Workaround: first modify /opt/maven/conf/settings.xml as sketched below; once the /org/pentaho/ artifacts have been downloaded, change it back!
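A sketch of the temporary change, assuming Aliyun's spring-plugin repository, which is widely reported to still serve the pentaho artifacts (verify before relying on it):
<!-- temporary entry in <mirrors>; remove after the pentaho artifacts are cached locally -->
<mirror>
  <id>aliyun-spring-plugin</id>
  <mirrorOf>conjars</mirrorOf>
  <name>Aliyun spring-plugin repository</name>
  <url>https://maven.aliyun.com/repository/spring-plugin</url>
</mirror>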
Problem 2: the Aliyun mirror is not being used
[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3
Downloading from conjars: https://maven.glassfish.org/content/groups/glassfish/asm/asm/3.1/asm-3.1.jar
asm-3.1.jar cannot be downloaded: the Aliyun mirror configured earlier is not being used. Overwrite the whole /opt/maven/conf/settings.xml with a configuration that routes every request through the Aliyun mirror. After restarting, the Aliyun mirror is picked up.
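A minimal settings.xml sketch, assuming the standard Aliyun public mirror address; the mirrorOf pattern * forces all repositories through the mirror:
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>aliyunmaven</id>
      <mirrorOf>*</mirrorOf>
      <name>Aliyun public repository</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>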
Problem 3: JDK version conflict; <2. Environment setup> was done carelessly
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project hive-upgrade-acid: Fatal error compiling: invalid target release: 1.11 -> [Help 1]
This error means the JDK versions conflict. Even though java -version reports 1.8 on Linux, a desktop install plus IDEA can leave several JDKs on the system if nothing is cleaned up; the fix is to redo <2. Environment setup> and <3. Pull the Hive source> properly.
4.2 Resolve the Guava Version Conflict
1. Changes
Change guava.version in pom.xml to 27.0-jre:
# before
<guava.version>19.0</guava.version>
# after
<guava.version>27.0-jre</guava.version>
After the change, rebuild with mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
The artifact lands at /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz
2. Problems and solutions
Problem 1: Futures.addCallback() takes 3 arguments in 27.0-jre but only 2 in 19.0
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-llap-common: Compilation failure: Compilation failure:
[ERROR] /home/slash/hive/llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java:[173,16] method addCallback in class com.google.common.util.concurrent.Futures cannot be applied to given types;
[ERROR] required: com.google.common.util.concurrent.ListenableFuture
[ERROR] found: com.google.common.util.concurrent.ListenableFuture, org.apache.hadoop.hive.llap.AsyncPbRpcProxy.ResponseCallback
[ERROR] reason: cannot infer type-variable(s) V
[ERROR] (actual and formal argument lists differ in length)
Fix each Futures.addCallback() call by adding MoreExecutors.directExecutor() as a third argument; roughly 15 call sites need the same change.
// before (call site in AsyncPbRpcProxy, shown schematically; generic type parameters omitted)
@VisibleForTesting
void submitToExecutor(CallableRequest request, LlapNodeId nodeId) {
    ListenableFuture future = executor.submit(request);
    Futures.addCallback(future, new ResponseCallback(
        request.getCallback(), nodeId, this));
}
// after: MoreExecutors.directExecutor() added as the third argument
@VisibleForTesting
void submitToExecutor(CallableRequest request, LlapNodeId nodeId) {
    ListenableFuture future = executor.submit(request);
    Futures.addCallback(future, new ResponseCallback(
        request.getCallback(), nodeId, this), MoreExecutors.directExecutor());
}
If a "cannot find symbol: MoreExecutors" error appears along the way, import the class manually; the import line can be copied from another file that already uses it.
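For reference, the class sits in Guava's standard concurrency package:
import com.google.common.util.concurrent.MoreExecutors;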
Problem 2: Iterators.emptyIterator() is deprecated
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-druid-handler: Compilation failure
[ERROR] /home/slash/hive/druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidScanQueryRecordReader.java:[46,61]
Replace the Iterators.emptyIterator() call:
# org.apache.hadoop.hive.druid.serde.DruidScanQueryRecordReader
# before
private Iterator<List<Object>> compactedValues = Iterators.emptyIterator();
# after
private Iterator<List<Object>> compactedValues = ImmutableSet.<List<Object>>of().iterator();
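An equivalent JDK-only form would also compile here (an alternative, not what the post used):
private Iterator<List<Object>> compactedValues = java.util.Collections.emptyIterator();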
4.3 StatsTask Errors After Enabling the MetaStore
1. Changes
# error
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
MapReduce Jobs Launched:
# error log in /tmp/root/hive.log
exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
The analysis is in https://issues.apache.org/jira/browse/HIVE-19316
In IDEA, click Cherry-pick to merge the "StatsTask fails due to ClassCastException" patch into the current branch.
After the change, rebuild with mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
The artifact lands at /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz
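The same cherry-pick from the command line, assuming the commit hash that appears in the failure message under Problem 1 below:
git cherry-pick 3d21bc38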
2. Problems and solutions
Problem 1: cherry-pick fails
Cherry-pick failed
3d21bc38 HIVE-19316: StatsTask fails due to ClassCastException (Jaume Marhuenda, reviewed by Jesus Camacho Rodriguez)
Committer identity unknown
*** Please tell me who you are.
Run
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
to set your account's default identity. Omit --global to set the identity only in this repository.
unable to auto-detect email address (got 'root@bigdata01.(none)')
Git needs a committer identity before it can create the cherry-pick commit.
Cherry-pick failed
3d21bc38 HIVE-19316: StatsTask fails due to ClassCastException (Jaume Marhuenda, reviewed by Jesus Camacho Rodriguez)
your local changes would be overwritten by cherry-pick.
hint: commit your changes or stash them to proceed.
cherry-pick failed
The working tree also has uncommitted changes, and git refuses to cherry-pick until they are committed or stashed.
# set the committer identity
git config --global user.email "you@example.com"
git config --global user.name "slash"
# stage and commit the changes
git add .
git commit -m "resolve conflict guava"
4.4 Spark Compatibility
1. Changes
Update spark.version, scala.version, and hadoop.version in pom.xml (see the sketch below).
Spark excludes part of its Hadoop dependencies. Hive 3.1.3 declares Hadoop 3.1.0, which differs from the Hadoop 3.3.2 that spark-3.3.4 is built against, but the Hadoop dependency in Hive's own pom does not need to change.
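A sketch of the changed properties, assuming Hive's root pom.xml follows the usual spark.version / scala.version / scala.binary.version naming:
<spark.version>3.3.4</spark.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.15</scala.version>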
2. Problems and solutions
Problem 1: SparkCounter uses classes that no longer exist and needs reworking
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-spark-client: Compilation failure: Compilation failure:
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[22,24] cannot find symbol
[ERROR] symbol: class Accumulator
[ERROR] location: package org.apache.spark
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[23,24] cannot find symbol
[ERROR] symbol: class AccumulatorParam
[ERROR] location: package org.apache.spark
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[30,11] cannot find symbol
[ERROR] symbol: class Accumulator
[ERROR] location: class org.apache.hive.spark.counter.SparkCounter
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[91,41] cannot find symbol
[ERROR] symbol: class AccumulatorParam
[ERROR] location: class org.apache.hive.spark.counter.SparkCounter
Remove the obsolete code and rework the class around Spark's LongAccumulator; the final version:
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hive.spark.counter;
import java.io.Serializable;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;
public class SparkCounter implements Serializable {
private String name;
private String displayName;
private LongAccumulator accumulator;
// Values of accumulators can only be read on the SparkContext side. This field is used when
// creating a snapshot to be sent to the RSC client.
private long accumValue;
public SparkCounter() {
// For serialization.
}
private SparkCounter(
String name,
String displayName,
long value) {
this.name = name;
this.displayName = displayName;
this.accumValue = value;
}
public SparkCounter(
String name,
String displayName,
String groupName,
long initValue,
JavaSparkContext sparkContext) {
this.name = name;
this.displayName = displayName;
String accumulatorName = groupName + "_" + name;
this.accumulator = sparkContext.sc().longAccumulator(accumulatorName);
this.accumulator.setValue(initValue);
}
public long getValue() {
if (accumulator != null) {
return accumulator.value();
} else {
return accumValue;
}
}
public void increment(long incr) {
accumulator.add(incr);
}
public String getName() {
return name;
}
public String getDisplayName() {
return displayName;
}
public void setDisplayName(String displayName) {
this.displayName = displayName;
}
SparkCounter snapshot() {
return new SparkCounter(name, displayName, accumulator.value());
}
}
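A standalone sketch of the LongAccumulator API the rewrite relies on (a hypothetical demo, not part of the patch):
// LongAccumulatorDemo.java -- run with spark-core on the classpath
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;
public class LongAccumulatorDemo {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[*]", "accumulator-demo");
    LongAccumulator acc = jsc.sc().longAccumulator("demo_counter"); // same call the new SparkCounter makes
    acc.setValue(10L);   // initialize, as the SparkCounter constructor does
    acc.add(5L);         // what SparkCounter.increment() does
    System.out.println(acc.value()); // prints 15
    jsc.stop();
  }
}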
Problem 2: ShuffleWriteMetrics uses methods that no longer exist
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-spark-client: Compilation failure: Compilation failure:
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java:[50,39] cannot find symbol
[ERROR] symbol: method shuffleBytesWritten()
[ERROR] location: class org.apache.spark.executor.ShuffleWriteMetrics
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java:[51,36] cannot find symbol
[ERROR] symbol: method shuffleWriteTime()
[ERROR] location: class org.apache.spark.executor.ShuffleWriteMetrics
Update the calls:
// before
public ShuffleWriteMetrics(TaskMetrics metrics) {
this(metrics.shuffleWriteMetrics().shuffleBytesWritten(),
metrics.shuffleWriteMetrics().shuffleWriteTime());
}
// after
public ShuffleWriteMetrics(TaskMetrics metrics) {
this(metrics.shuffleWriteMetrics().bytesWritten(),
metrics.shuffleWriteMetrics().writeTime());
}
Problem 3: TestStatsUtils imports a package that no longer exists
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:testCompile (default-testCompile) on project hive-exec: Compilation failure
[ERROR] /home/slash/hive/ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUtils.java:[34,39] package org.spark_project.guava.collect does not exist
Update the import (Spark 3 renamed its shaded-Guava prefix from org.spark_project to org.sparkproject):
// before
import org.spark_project.guava.collect.Sets;
// after
import org.sparkproject.guava.collect.Sets;
4.5 Successful Build
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
Hive 3.1.3 now builds successfully against spark-3.3.4 and hadoop-3.3.2; the artifact lands at /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz