Compiling Hive 3.1.3
1. Why recompile
   1.1 Guava dependency conflict
   1.2 StatsTask errors after enabling the MetaStore
   1.3 Spark version too low
2. Environment setup
   2.1 Install the JDK
   2.2 Set up Maven
   2.3 Install a graphical desktop
   2.4 Install Git
   2.5 Install IDEA
3. Pull the Hive source
4. Compile the Hive source
   4.1 Environment test
       Problem 1: pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar cannot be downloaded
       Problem 2: the Aliyun mirror is not being used
       Problem 3: JDK version conflict from a careless <2. Environment setup>
   4.2 Resolve the Guava version conflict
       Problem 1: Futures.addCallback() takes 3 arguments in 27.0-jre, 2 in 19.0
       Problem 2: Iterators.emptyIterator() is deprecated
   4.3 StatsTask errors after enabling the MetaStore
       Problem 1: cherry-pick fails
   4.4 Spark compatibility
       Problem 1: obsolete API in SparkCounter needs replacing
       Problem 2: obsolete API in ShuffleWriteMetrics needs replacing
       Problem 3: obsolete API in TestStatsUtils needs replacing
   4.5 Successful build
Component versions:
hadoop-3.3.2
hive-3.1.3
spark-3.3.4, Scala 2.12.15 (spark-3.3.4 depends on hadoop-3.3.2)
1. Why Recompile
1.1 Guava dependency conflict
Dump the tail of the Hive log to inspect the failure:
tail -200 /tmp/root/hive.log > /home/log/hive-200.log
Hive on GitHub:
https://github.com/apache/hive
Its Guava dependency:
https://github.com/apache/hive/blob/rel/release-3.1.3/pom.xml
Hadoop on GitHub:
https://github.com/apache/hadoop
Its Guava dependency:
https://github.com/apache/hadoop/blob/rel/release-3.3.2/hadoop-project/pom.xml
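Comparing the guava.version property in the two poms shows the mismatch; the values below restate what those releases declare (verify against the links above):
# hive rel/release-3.1.3 pom.xml
<guava.version>19.0</guava.version>
# hadoop rel/release-3.3.2 hadoop-project/pom.xml
<guava.version>27.0-jre</guava.version>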
1.2 StatsTask errors after enabling the MetaStore
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
MapReduce Jobs Launched:
1.3 Spark version too low
Hive 3.1.3 targets Spark 2.3.0 by default. That version is old enough to miss many newer, more efficient APIs, so it is replaced with spark-3.3.4, the highest Spark version compatible with Hadoop 3.3.2.
2. Environment Setup
2.1 Install the JDK
JDK 1.8 is already installed:
(base) [root@bigdata01 opt]# java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
2.2 Set up Maven
Download the 3.6.3 binary package from https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/
(base) [root@bigdata01 ~]# cd /opt
(base) [root@bigdata01 opt]# tar -zxvf apache-maven-3.6.3-bin.tar.gz
(base) [root@bigdata01 opt]# mv apache-maven-3.6.3 maven
(base) [root@bigdata01 opt]# vim /etc/profile
# add MAVEN_HOME
export MAVEN_HOME=/opt/maven
export PATH=$MAVEN_HOME/bin:$PATH
(base) [root@bigdata01 opt]# source /etc/profile
Check that Maven installed successfully:
(base) [root@bigdata01 opt]# mvn -version
Apache Maven 3.6.3 ()
Maven home: /opt/maven
Java version: 1.8.0_301, vendor: Oracle Corporation, runtime: /opt/jdk/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "4.18.0-365.el8.x86_64", arch: "amd64", family: "unix"
Configure a repository mirror pointing at the Aliyun public repository:
vim /opt/maven/conf/settings.xml
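A minimal mirror entry for the <mirrors> section, assuming the standard Aliyun public repository address:
<!-- inside <mirrors> in /opt/maven/conf/settings.xml -->
<mirror>
  <id>aliyunmaven</id>
  <mirrorOf>central</mirrorOf>
  <name>Aliyun public repository</name>
  <url>https://maven.aliyun.com/repository/public</url>
</mirror>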
2.3 Install a graphical desktop
Use a CentOS build with a graphical desktop, and uninstall the redundant JDKs to avoid version conflicts. This step is important: skipping it produces baffling errors such as "Fatal error compiling: invalid target release: 1.11", which complains about a Java 1.11 that does not exist and persists even after upgrading to Java 11.
Find the redundant JDK packages:
(base) [root@bigdata01 ~]# yum list installed |grep jdk
copy-jdk-configs.noarch 4.0-2.el8 @appstream
java-1.8.0-openjdk.x86_64 1:1.8.0.362.b08-3.el8 @appstream
java-1.8.0-openjdk-devel.x86_64 1:1.8.0.362.b08-3.el8 @appstream
java-1.8.0-openjdk-headless.x86_64 1:1.8.0.362.b08-3.el8 @appstream
Uninstall them:
(base) [root@bigdata01 ~]# yum remove -y copy-jdk-configs.noarch java-1.8.0-openjdk.x86_64 java-1.8.0-openjdk-devel.x86_64 java-1.8.0-openjdk-headless.x86_64
Verify the remaining JDK:
(base) [root@bigdata01 ~]# java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
2.4 Install Git
Install the third-party repositories:
(base) [root@bigdata01 opt]# yum install https://repo.ius.io/ius-release-el7.rpm https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Install Git:
(base) [root@bigdata01 opt]# yum install -y git
Check the Git version:
(base) [root@bigdata01 ~]# git -v
git version 2.43.0
2.5 Install IDEA
Download the Linux build from https://download.jetbrains.com.cn/idea/ideaIU-2021.1.3.tar.gz
(base) [root@bigdata01 opt]# tar -zxvf ideaIU-2021.1.3.tar.gz
Start IDEA; the graphical interface has to be launched from inside VMware:
cd /opt/idea-IU-211.7628.21
./bin/idea.sh
Choose the 30-day trial here.
Configure Maven in IDEA; settings.xml already carries the Aliyun public repository address.
Create a desktop shortcut for IDEA (bluetooth-sendto.desktop was copied at random; any .desktop file works):
(base) [root@bigdata01 bin]# cd /usr/share/applications
(base) [root@bigdata01 applications]# cp bluetooth-sendto.desktop idea.desktop
(base) [root@bigdata01 applications]# vim idea.desktop
# delete the original contents and add the following
[Desktop Entry]
Name=idea
Exec=sh /opt/idea-IU-211.7628.21/bin/idea.sh
Terminal=false
Type=Application
Icon=/opt/idea-IU-211.7628.21/bin/idea.png
Comment=idea
Categories=Application;
3. Pull the Hive Source
Pull the Hive source with Get from VCS; the full clone takes about an hour.
Set the URL to https://gitee.com/apache/hive.git and choose a local directory.
After trusting the project, take care with the configuration (set the importer VM options to -Xmx2048m); a misconfigured JDK here easily causes version errors.
Load hive-3.1.3 and create a branch slash-hive-3.1.3, as sketched below.
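The same steps from the command line, as a sketch (assuming the 3.1.3 release tag rel/release-3.1.3 carried by the apache/hive repository):
git clone https://gitee.com/apache/hive.git
cd hive
git checkout -b slash-hive-3.1.3 rel/release-3.1.3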
4. Compiling the Hive Source
4.1 Environment test
1. Test method: compile
Open the Getting Started Guide: https://hive.apache.org/development/gettingstarted/
Then Building Hive from Source: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-BuildingHivefromSource
That page gives the build command; run it in the Terminal. A successful build takes about 7 minutes:
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
2. Problems and solutions
Problem 1: pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar cannot be downloaded
[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3
Downloading from conjars: http://conjars.org/repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom
pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar never downloads. This is a known issue: Pentaho's Maven repository server has been shut down permanently, so the dependency can no longer be fetched from that repository.
Workaround: first modify /opt/maven/conf/settings.xml as sketched below; once the /org/pentaho/ artifacts have been downloaded, change it back!
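A sketch of the temporary change, assuming Aliyun's spring-plugin repository, which is widely reported to still serve the pentaho artifacts (verify before relying on it):
<!-- temporary entry in <mirrors>; remove after the pentaho artifacts are cached locally -->
<mirror>
  <id>aliyun-spring-plugin</id>
  <mirrorOf>conjars</mirrorOf>
  <name>Aliyun spring-plugin repository</name>
  <url>https://maven.aliyun.com/repository/spring-plugin</url>
</mirror>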
Problem 2: the Aliyun mirror is not being used
[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3
Downloading from conjars: https://maven.glassfish.org/content/groups/glassfish/asm/asm/3.1/asm-3.1.jar
asm-3.1.jar cannot be downloaded: the Aliyun mirror configured earlier is not being used. Overwrite the whole /opt/maven/conf/settings.xml with a configuration that routes every request through the Aliyun mirror. After restarting, the Aliyun mirror is picked up.
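A minimal settings.xml sketch, assuming the standard Aliyun public mirror address; the mirrorOf pattern * forces all repositories through the mirror:
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <id>aliyunmaven</id>
      <mirrorOf>*</mirrorOf>
      <name>Aliyun public repository</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>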
Problem 3: JDK version conflict; <2. Environment setup> was done carelessly
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.5.1:compile (default-compile) on project hive-upgrade-acid: Fatal error compiling: invalid target release: 1.11 -> [Help 1]
This error means the JDK versions conflict. Even though java -version reports 1.8 on Linux, a desktop install plus IDEA can leave several JDKs on the system if nothing is cleaned up; the fix is to redo <2. Environment setup> and <3. Pull the Hive source> properly.
4.2 Resolve the Guava Version Conflict
1. Changes
Change guava.version in pom.xml to 27.0-jre:
# before
<guava.version>19.0</guava.version>
# after
<guava.version>27.0-jre</guava.version>
After the change, rebuild with mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
The artifact lands at /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz
2. Problems and solutions
Problem 1: Futures.addCallback() takes 3 arguments in 27.0-jre but only 2 in 19.0
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-llap-common: Compilation failure: Compilation failure:
[ERROR] /home/slash/hive/llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java:[173,16] method addCallback in class com.google.common.util.concurrent.Futures cannot be applied to given types;
[ERROR] required: com.google.common.util.concurrent.ListenableFuture
[ERROR] found: com.google.common.util.concurrent.ListenableFuture, org.apache.hadoop.hive.llap.AsyncPbRpcProxy.ResponseCallback
[ERROR] reason: cannot infer type-variable(s) V
[ERROR] (actual and formal argument lists differ in length)
Fix each Futures.addCallback() call by adding MoreExecutors.directExecutor() as a third argument; roughly 15 call sites need the same change.
// before (call site in AsyncPbRpcProxy, shown schematically; generic type parameters omitted)
@VisibleForTesting
void submitToExecutor(CallableRequest request, LlapNodeId nodeId) {
    ListenableFuture future = executor.submit(request);
    Futures.addCallback(future, new ResponseCallback(
        request.getCallback(), nodeId, this));
}
// after: MoreExecutors.directExecutor() added as the third argument
@VisibleForTesting
void submitToExecutor(CallableRequest request, LlapNodeId nodeId) {
    ListenableFuture future = executor.submit(request);
    Futures.addCallback(future, new ResponseCallback(
        request.getCallback(), nodeId, this), MoreExecutors.directExecutor());
}
If a "cannot find symbol: MoreExecutors" error appears along the way, import the class manually; the import line can be copied from another file that already uses it.
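For reference, the class sits in Guava's standard concurrency package:
import com.google.common.util.concurrent.MoreExecutors;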
Problem 2: Iterators.emptyIterator() is deprecated
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-druid-handler: Compilation failure
[ERROR] /home/slash/hive/druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidScanQueryRecordReader.java:[46,61]
Replace the Iterators.emptyIterator() call:
# org.apache.hadoop.hive.druid.serde.DruidScanQueryRecordReader
# before
private Iterator<List<Object>> compactedValues = Iterators.emptyIterator();
# after
private Iterator<List<Object>> compactedValues = ImmutableSet.<List<Object>>of().iterator();
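An equivalent JDK-only form would also compile here (an alternative, not what the post used):
private Iterator<List<Object>> compactedValues = java.util.Collections.emptyIterator();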
4.3 StatsTask Errors After Enabling the MetaStore
1. Changes
# error
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
MapReduce Jobs Launched:
# error log in /tmp/root/hive.log
exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
The analysis is in https://issues.apache.org/jira/browse/HIVE-19316
In IDEA, click Cherry-pick to merge the "StatsTask fails due to ClassCastException" patch into the current branch.
After the change, rebuild with mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
The artifact lands at /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz
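The same cherry-pick from the command line, assuming the commit hash that appears in the failure message under Problem 1 below:
git cherry-pick 3d21bc38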
2. Problems and solutions
Problem 1: cherry-pick fails
Cherry-pick failed
3d21bc38 HIVE-19316: StatsTask fails due to ClassCastException (Jaume Marhuenda, reviewed by Jesus Camacho Rodriguez)
Committer identity unknown
*** Please tell me who you are.
Run
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
to set your account's default identity. Omit --global to set the identity only in this repository.
unable to auto-detect email address (got 'root@bigdata01.(none)')
Git needs a committer identity before it can create the cherry-pick commit.
Cherry-pick failed
3d21bc38 HIVE-19316: StatsTask fails due to ClassCastException (Jaume Marhuenda, reviewed by Jesus Camacho Rodriguez)
your local changes would be overwritten by cherry-pick.
hint: commit your changes or stash them to proceed.
cherry-pick failed
The working tree also has uncommitted changes, and git refuses to cherry-pick until they are committed or stashed.
# set the committer identity
git config --global user.email "you@example.com"
git config --global user.name "slash"
# stage and commit the changes
git add .
git commit -m "resolve conflict guava"
4.4 Spark Compatibility
1. Changes
Update spark.version, scala.version, and hadoop.version in pom.xml (see the sketch below).
Spark excludes part of its Hadoop dependencies. Hive 3.1.3 declares Hadoop 3.1.0, which differs from the Hadoop 3.3.2 that spark-3.3.4 is built against, but the Hadoop dependency in Hive's own pom does not need to change.
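A sketch of the changed properties, assuming Hive's root pom.xml follows the usual spark.version / scala.version / scala.binary.version naming:
<spark.version>3.3.4</spark.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.15</scala.version>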
2. Problems and solutions
Problem 1: SparkCounter uses classes that no longer exist and needs reworking
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-spark-client: Compilation failure: Compilation failure:
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[22,24] cannot find symbol
[ERROR] symbol: class Accumulator
[ERROR] location: package org.apache.spark
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[23,24] cannot find symbol
[ERROR] symbol: class AccumulatorParam
[ERROR] location: package org.apache.spark
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[30,11] cannot find symbol
[ERROR] symbol: class Accumulator
[ERROR] location: class org.apache.hive.spark.counter.SparkCounter
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java:[91,41] cannot find symbol
[ERROR] symbol: class AccumulatorParam
[ERROR] location: class org.apache.hive.spark.counter.SparkCounter
Remove the obsolete code and rework the class around Spark's LongAccumulator; the final version:
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hive.spark.counter;
import java.io.Serializable;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;
public class SparkCounter implements Serializable {
private String name;
private String displayName;
private LongAccumulator accumulator;
// Values of accumulators can only be read on the SparkContext side. This field is used when
// creating a snapshot to be sent to the RSC client.
private long accumValue;
public SparkCounter() {
// For serialization.
}
private SparkCounter(
String name,
String displayName,
long value) {
this.name = name;
this.displayName = displayName;
this.accumValue = value;
}
public SparkCounter(
String name,
String displayName,
String groupName,
long initValue,
JavaSparkContext sparkContext) {
this.name = name;
this.displayName = displayName;
String accumulatorName = groupName + "_" + name;
this.accumulator = sparkContext.sc().longAccumulator(accumulatorName);
this.accumulator.setValue(initValue);
}
public long getValue() {
if (accumulator != null) {
return accumulator.value();
} else {
return accumValue;
}
}
public void increment(long incr) {
accumulator.add(incr);
}
public String getName() {
return name;
}
public String getDisplayName() {
return displayName;
}
public void setDisplayName(String displayName) {
this.displayName = displayName;
}
SparkCounter snapshot() {
return new SparkCounter(name, displayName, accumulator.value());
}
}
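A standalone sketch of the LongAccumulator API the rewrite relies on (a hypothetical demo, not part of the patch):
// LongAccumulatorDemo.java -- run with spark-core on the classpath
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;
public class LongAccumulatorDemo {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext("local[*]", "accumulator-demo");
    LongAccumulator acc = jsc.sc().longAccumulator("demo_counter"); // same call the new SparkCounter makes
    acc.setValue(10L);   // initialize, as the SparkCounter constructor does
    acc.add(5L);         // what SparkCounter.increment() does
    System.out.println(acc.value()); // prints 15
    jsc.stop();
  }
}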
Problem 2: ShuffleWriteMetrics uses methods that no longer exist
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-spark-client: Compilation failure: Compilation failure:
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java:[50,39] cannot find symbol
[ERROR] symbol: method shuffleBytesWritten()
[ERROR] location: class org.apache.spark.executor.ShuffleWriteMetrics
[ERROR] /home/slash/hive/spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java:[51,36] cannot find symbol
[ERROR] symbol: method shuffleWriteTime()
[ERROR] location: class org.apache.spark.executor.ShuffleWriteMetrics
Update the calls:
// before
public ShuffleWriteMetrics(TaskMetrics metrics) {
this(metrics.shuffleWriteMetrics().shuffleBytesWritten(),
metrics.shuffleWriteMetrics().shuffleWriteTime());
}
// after
public ShuffleWriteMetrics(TaskMetrics metrics) {
this(metrics.shuffleWriteMetrics().bytesWritten(),
metrics.shuffleWriteMetrics().writeTime());
}
Problem 3: TestStatsUtils imports a package that no longer exists
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:testCompile (default-testCompile) on project hive-exec: Compilation failure
[ERROR] /home/slash/hive/ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUtils.java:[34,39] package org.spark_project.guava.collect does not exist
Update the import (Spark 3 renamed its shaded-Guava prefix from org.spark_project to org.sparkproject):
// before
import org.spark_project.guava.collect.Sets;
// after
import org.sparkproject.guava.collect.Sets;
4.5 Successful Build
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true
Hive 3.1.3 now builds successfully against spark-3.3.4 and hadoop-3.3.2; the artifact lands at /home/slash/hive/packaging/target/apache-hive-3.1.3-bin.tar.gz