欧美free性护士vide0shd,老熟女,一区二区三区,久久久久夜夜夜精品国产,久久久久久综合网天天,欧美成人护士h版

首頁綜合 正文
目錄

柚子快報(bào)激活碼778899分享:數(shù)據(jù)倉庫哈哈

柚子快報(bào)激活碼778899分享:數(shù)據(jù)倉庫哈哈

http://yzkb.51969.com/

數(shù)據(jù)倉庫

基本概念數(shù)據(jù)庫(database)和數(shù)據(jù)倉庫(Data Warehouse)的異同

整體架構(gòu)分層架構(gòu)方法論ER模型(建模理論)維度模型

何為分層第一層:數(shù)據(jù)源(ODS ER模型)設(shè)計(jì)要點(diǎn)日志表業(yè)務(wù)表1活動(dòng)信息表(全量表)2活動(dòng)規(guī)則表(全量表)3一級(jí)品類表(全量表)4二級(jí)品類表(全量表)5三級(jí)品類表(全量表)6編碼字典表(全量表)7省份表(全量表)8地區(qū)表(全量表)9品牌表(全量表)10購(gòu)物車表(全量表)11優(yōu)惠券信息表(全量表)12商品平臺(tái)屬性表(全量表)13商品表(全量表)14商品銷售屬性值表(全量表)15SPU表(全量表)16營(yíng)銷坑位表(全量表)17營(yíng)銷渠道表(全量表)18購(gòu)物車表(增量表)19評(píng)論表(增量表)20優(yōu)惠券領(lǐng)用表(增量表)21收藏表(增量表)22訂單明細(xì)表(增量表)23訂單明細(xì)活動(dòng)關(guān)聯(lián)表(增量表)24訂單明細(xì)優(yōu)惠券關(guān)聯(lián)表(增量表)25訂單表(增量表)26退單表(增量表)27訂單狀態(tài)流水表(增量表)28支付表(增量表)29退款表(增量表)30用戶表(增量表)31數(shù)據(jù)裝載腳本

第二層:數(shù)據(jù)加工(DWD data warehouse detail)事實(shí)表設(shè)計(jì)(事務(wù)型事實(shí)表)

事務(wù)的原子性事實(shí)表設(shè)計(jì)(周期型快照事實(shí)表)

從當(dāng)前表中取數(shù)據(jù)后再放回去需考慮去重問題,增加retry的容錯(cuò)性事實(shí)表設(shè)計(jì)(累積型快照事實(shí)表)分區(qū)策略

第三層:數(shù)據(jù)統(tǒng)計(jì)(DWS data warehouse summary 提高性能的關(guān)鍵層)第四層:數(shù)據(jù)分析(ADS application data service)表的設(shè)計(jì)(要點(diǎn))優(yōu)化(假)1流量主題1.1各渠道流量統(tǒng)計(jì)

第五層:共通層(DIM dimension)設(shè)計(jì)要點(diǎn)是否創(chuàng)建表維度表設(shè)計(jì)1商品維度表2優(yōu)惠券維度表3活動(dòng)(規(guī)則)維度表4地區(qū)維度表5營(yíng)銷坑位維度表6營(yíng)銷渠道維度表7日期維度表8用戶維度表(拉鏈(壓縮)表)

CTE : 共通表表達(dá)式拉鏈表設(shè)計(jì)任務(wù)調(diào)度器

基本概念

本質(zhì)是對(duì)數(shù)據(jù)進(jìn)行加工處理后對(duì)外提供數(shù)據(jù)服務(wù)

數(shù)據(jù)庫(database)和數(shù)據(jù)倉庫(Data Warehouse)的異同

數(shù)據(jù)庫用于存儲(chǔ)企業(yè)基礎(chǔ),核心的業(yè)務(wù)數(shù)據(jù)從數(shù)據(jù)來源進(jìn)行區(qū)分

數(shù)據(jù)庫:企業(yè)的業(yè)務(wù)系統(tǒng)數(shù)據(jù)倉庫:數(shù)據(jù)庫(后臺(tái)的后臺(tái)) 從數(shù)據(jù)存儲(chǔ)進(jìn)行區(qū)分

數(shù)據(jù)庫:存儲(chǔ)的目的為了可以快速進(jìn)行數(shù)據(jù)查詢操作 索引 : SQL 存儲(chǔ)方式:行式存儲(chǔ) 數(shù)據(jù)量:不能存儲(chǔ)海量數(shù)據(jù)數(shù)據(jù)倉庫:存儲(chǔ)的目的為了可以快速進(jìn)行統(tǒng)計(jì)分析 索引 : 沒有索引(k-v) 存儲(chǔ)方式:列式存儲(chǔ) 數(shù)據(jù)量:必須存儲(chǔ)海量數(shù)據(jù) 從數(shù)據(jù)價(jià)值進(jìn)行區(qū)分

數(shù)據(jù)庫 :保障企業(yè)業(yè)務(wù)系統(tǒng)的執(zhí)行 事務(wù)(回滾)數(shù)據(jù)倉庫 :統(tǒng)計(jì)分析的結(jié)果可以為企業(yè)的經(jīng)營(yíng)決策提供數(shù)據(jù)依據(jù) 沒有事務(wù) 數(shù)據(jù)倉庫不是數(shù)據(jù)流轉(zhuǎn)的終點(diǎn) :可視化才是數(shù)據(jù)的終點(diǎn)

整體架構(gòu)

Spark : 數(shù)據(jù)的統(tǒng)計(jì)分析 數(shù)據(jù)倉庫:數(shù)據(jù)的統(tǒng)計(jì)分析 數(shù)據(jù)倉庫不能直接對(duì)接MySQL數(shù)據(jù)庫作為數(shù)據(jù)源!

數(shù)據(jù)庫不是為了數(shù)據(jù)倉庫服務(wù)的。數(shù)據(jù)倉庫如果直接對(duì)象數(shù)據(jù)庫,會(huì)導(dǎo)致數(shù)據(jù)庫的性能降低數(shù)據(jù)庫不能存儲(chǔ)海量數(shù)據(jù)。數(shù)據(jù)倉庫必須獲取海量數(shù)據(jù)數(shù)據(jù)庫采用行式存儲(chǔ)。數(shù)據(jù)倉庫為了提高統(tǒng)計(jì)分析效率,所以需要列式存儲(chǔ)

數(shù)據(jù)倉庫應(yīng)該增加自己的數(shù)據(jù)源

數(shù)據(jù)倉庫的數(shù)據(jù)源中的數(shù)據(jù)應(yīng)該和MySQL數(shù)據(jù)庫中的數(shù)據(jù)保持一致 數(shù)據(jù)倉庫的數(shù)據(jù)源應(yīng)該不斷融合(匯總)MySQL數(shù)據(jù)庫中的數(shù)據(jù) 將數(shù)據(jù)庫的數(shù)據(jù)匯總的到數(shù)據(jù)倉庫數(shù)據(jù)源的過程,一般稱之為數(shù)據(jù)同步,也稱之為數(shù)據(jù)采集

分層架構(gòu)

數(shù)據(jù)倉庫計(jì)算周期為1天:1天統(tǒng)計(jì)一回?cái)?shù)據(jù)結(jié)果

方法論

ER模型(建模理論)

ER(Entity Relationship)(實(shí)體關(guān)系)模型 采用面向?qū)ο蟮姆绞皆O(shè)計(jì)表(和Java一樣)

將對(duì)象理解為表將對(duì)象之間的關(guān)系理解為表之間的關(guān)系

超詳細(xì)內(nèi)容(帶圖)看這里

維度模型

事實(shí) :行為所產(chǎn)生的事情(數(shù)據(jù)) 維度:分析數(shù)據(jù)的角度(狀態(tài))

超詳細(xì)內(nèi)容(帶圖)看這里

何為分層

Spark中的方法可能會(huì)含有shuffle功能, shuffle操作會(huì)將完整的計(jì)算流程一分為二,會(huì)分為2個(gè)階段(Stage),前面一個(gè)階段稱之為Map階段,后面的階段稱之為Reduce階段, shuffle中前一個(gè)階段的任務(wù)不執(zhí)行完,后面的階段的任務(wù)不允許執(zhí)行的, Task Pool(任務(wù)池) - 任務(wù)調(diào)度(FIFO, FAIR)。

數(shù)據(jù)倉庫也存在同樣的問題,將整個(gè)計(jì)算流程分為了4段, 在數(shù)據(jù)倉庫中不稱之為段,一般稱之為層,每一層有特殊的含義和特殊的功能 前面一層的數(shù)據(jù)沒有處理完,后面一層的數(shù)據(jù)沒有辦法處理

第一層:數(shù)據(jù)源(ODS ER模型)

功能:

為整個(gè)數(shù)據(jù)倉庫作為數(shù)據(jù)來源 不斷匯總業(yè)務(wù)數(shù)據(jù)和日志數(shù)據(jù) 數(shù)據(jù)量非常大:海量數(shù)據(jù) -> 考慮資源問題:使用最少的資源存儲(chǔ)最多的資源(考慮使用壓縮算法gzip、lzo、snappy);考慮網(wǎng)絡(luò)資源:考慮傳輸方式,數(shù)據(jù)盡可能不變(格式、壓縮方式、存儲(chǔ)方式) 統(tǒng)計(jì)本質(zhì)上就是對(duì)行為數(shù)據(jù)進(jìn)行統(tǒng)計(jì) 分析本質(zhì)上就是站在什么角度對(duì)統(tǒng)計(jì)結(jié)果進(jìn)行分析

-- ODS

-- 1. ODS層表建模方式:ER模型

-- 2. 數(shù)據(jù)格式不變,數(shù)據(jù)壓縮方式 gzip

-- 3. 表名

-- 分層標(biāo)記(ods_) + 同步數(shù)據(jù)的表名 + 增量/全量(inc/full)

-- 增量,全量

設(shè)計(jì)要點(diǎn)

(1)ODS層的表結(jié)構(gòu)設(shè)計(jì)依托于從業(yè)務(wù)系統(tǒng)同步過來的數(shù)據(jù)結(jié)構(gòu)。 (2)ODS層要保存全部歷史數(shù)據(jù),故其壓縮格式應(yīng)選擇壓縮比較高的,此處選擇gzip。 (3)ODS層表名的命名規(guī)范為:ods_表名_單分區(qū)增量全量標(biāo)識(shí)(inc/full)。

日志表

1)建表語句

DROP TABLE IF EXISTS ods_log_inc;

CREATE EXTERNAL TABLE ods_log_inc

(

`common` STRUCT

ba :STRING,

ch :STRING,

is_new :STRING,

md :STRING,

mid :STRING,

os :STRING,

sid :STRING,

uid :STRING,

vc :STRING> COMMENT '公共信息',

`page` STRUCT

item :STRING,

item_type :STRING,

last_page_id :STRING,

page_id :STRING,

from_pos_id :STRING,

from_pos_seq :STRING,

refer_id :STRING> COMMENT '頁面信息',

`actions` ARRAY

item:STRING,

item_type:STRING,

ts:BIGINT>> COMMENT '動(dòng)作信息',

`displays` ARRAY

item :STRING,

item_type :STRING,

`pos_seq` :STRING,

pos_id :STRING>> COMMENT '曝光信息',

`start` STRUCT

first_open :BIGINT,

loading_time :BIGINT,

open_ad_id :BIGINT,

open_ad_ms :BIGINT,

open_ad_skip_ms :BIGINT> COMMENT '啟動(dòng)信息',

`err` STRUCT

msg:STRING> COMMENT '錯(cuò)誤信息',

`ts` BIGINT COMMENT '時(shí)間戳'

) COMMENT '活動(dòng)信息表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_log_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

2)數(shù)據(jù)裝載

load data inpath '/origin_data/gmall/log/topic_log/2022-06-08' into table ods_log_inc partition(dt='2022-06-08');

3)每日數(shù)據(jù)裝載腳本 (1)在hadoop102的/home/atguigu/bin目錄下創(chuàng)建hdfs_to_ods_log.sh

vim hdfs_to_ods_log.sh

(2)編寫如下內(nèi)容

#!/bin/bash

# 定義變量方便修改

APP=gmall

# 如果是輸入的日期按照取輸入日期;如果沒輸入日期取當(dāng)前時(shí)間的前一天

if [ -n "$1" ] ;then

do_date=$1

else

do_date=`date -d "-1 day" +%F`

fi

echo ================== 日志日期為 $do_date ==================

sql="

load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log_inc partition(dt='$do_date');

"

hive -e "$sql"

(3)增加腳本執(zhí)行權(quán)限

chmod +x hdfs_to_ods_log.sh

(4)腳本用法

hdfs_to_ods_log.sh 2022-06-08

業(yè)務(wù)表

1活動(dòng)信息表(全量表)

DROP TABLE IF EXISTS ods_activity_info_full;

CREATE EXTERNAL TABLE ods_activity_info_full

(

`id` STRING COMMENT '活動(dòng)id',

`activity_name` STRING COMMENT '活動(dòng)名稱',

`activity_type` STRING COMMENT '活動(dòng)類型',

`activity_desc` STRING COMMENT '活動(dòng)描述',

`start_time` STRING COMMENT '開始時(shí)間',

`end_time` STRING COMMENT '結(jié)束時(shí)間',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '活動(dòng)信息表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_activity_info_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

2活動(dòng)規(guī)則表(全量表)

DROP TABLE IF EXISTS ods_activity_rule_full;

CREATE EXTERNAL TABLE ods_activity_rule_full

(

`id` STRING COMMENT '編號(hào)',

`activity_id` STRING COMMENT '活動(dòng)ID',

`activity_type` STRING COMMENT '活動(dòng)類型',

`condition_amount` DECIMAL(16, 2) COMMENT '滿減金額',

`condition_num` BIGINT COMMENT '滿減件數(shù)',

`benefit_amount` DECIMAL(16, 2) COMMENT '優(yōu)惠金額',

`benefit_discount` DECIMAL(16, 2) COMMENT '優(yōu)惠折扣',

`benefit_level` STRING COMMENT '優(yōu)惠級(jí)別',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '活動(dòng)規(guī)則表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_activity_rule_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

3一級(jí)品類表(全量表)

DROP TABLE IF EXISTS ods_base_category1_full;

CREATE EXTERNAL TABLE ods_base_category1_full

(

`id` STRING COMMENT '編號(hào)',

`name` STRING COMMENT '分類名稱',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '一級(jí)品類表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_category1_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

4二級(jí)品類表(全量表)

DROP TABLE IF EXISTS ods_base_category2_full;

CREATE EXTERNAL TABLE ods_base_category2_full

(

`id` STRING COMMENT '編號(hào)',

`name` STRING COMMENT '二級(jí)分類名稱',

`category1_id` STRING COMMENT '一級(jí)分類編號(hào)',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '二級(jí)品類表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_category2_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

5三級(jí)品類表(全量表)

DROP TABLE IF EXISTS ods_base_category3_full;

CREATE EXTERNAL TABLE ods_base_category3_full

(

`id` STRING COMMENT '編號(hào)',

`name` STRING COMMENT '三級(jí)分類名稱',

`category2_id` STRING COMMENT '二級(jí)分類編號(hào)',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '三級(jí)品類表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_category3_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

6編碼字典表(全量表)

DROP TABLE IF EXISTS ods_base_dic_full;

CREATE EXTERNAL TABLE ods_base_dic_full

(

`dic_code` STRING COMMENT '編號(hào)',

`dic_name` STRING COMMENT '編碼名稱',

`parent_code` STRING COMMENT '父編號(hào)',

`create_time` STRING COMMENT '創(chuàng)建日期',

`operate_time` STRING COMMENT '修改日期'

) COMMENT '編碼字典表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_dic_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

7省份表(全量表)

DROP TABLE IF EXISTS ods_base_province_full;

CREATE EXTERNAL TABLE ods_base_province_full

(

`id` STRING COMMENT '編號(hào)',

`name` STRING COMMENT '省份名稱',

`region_id` STRING COMMENT '地區(qū)ID',

`area_code` STRING COMMENT '地區(qū)編碼',

`iso_code` STRING COMMENT '舊版國(guó)際標(biāo)準(zhǔn)地區(qū)編碼,供可視化使用',

`iso_3166_2` STRING COMMENT '新版國(guó)際標(biāo)準(zhǔn)地區(qū)編碼,供可視化使用',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '省份表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_province_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

8地區(qū)表(全量表)

DROP TABLE IF EXISTS ods_base_region_full;

CREATE EXTERNAL TABLE ods_base_region_full

(

`id` STRING COMMENT '地區(qū)ID',

`region_name` STRING COMMENT '地區(qū)名稱',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '地區(qū)表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_region_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

9品牌表(全量表)

DROP TABLE IF EXISTS ods_base_trademark_full;

CREATE EXTERNAL TABLE ods_base_trademark_full

(

`id` STRING COMMENT '編號(hào)',

`tm_name` STRING COMMENT '品牌名稱',

`logo_url` STRING COMMENT '品牌LOGO的圖片路徑',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '品牌表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_base_trademark_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

10購(gòu)物車表(全量表)

DROP TABLE IF EXISTS ods_cart_info_full;

CREATE EXTERNAL TABLE ods_cart_info_full

(

`id` STRING COMMENT '編號(hào)',

`user_id` STRING COMMENT '用戶ID',

`sku_id` STRING COMMENT 'SKU_ID',

`cart_price` DECIMAL(16, 2) COMMENT '放入購(gòu)物車時(shí)價(jià)格',

`sku_num` BIGINT COMMENT '數(shù)量',

`img_url` BIGINT COMMENT '商品圖片地址',

`sku_name` STRING COMMENT 'SKU名稱 (冗余)',

`is_checked` STRING COMMENT '是否被選中',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間',

`is_ordered` STRING COMMENT '是否已經(jīng)下單',

`order_time` STRING COMMENT '下單時(shí)間'

) COMMENT '購(gòu)物車全量表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_cart_info_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

11優(yōu)惠券信息表(全量表)

DROP TABLE IF EXISTS ods_coupon_info_full;

CREATE EXTERNAL TABLE ods_coupon_info_full

(

`id` STRING COMMENT '購(gòu)物券編號(hào)',

`coupon_name` STRING COMMENT '購(gòu)物券名稱',

`coupon_type` STRING COMMENT '購(gòu)物券類型 1 現(xiàn)金券 2 折扣券 3 滿減券 4 滿件打折券',

`condition_amount` DECIMAL(16, 2) COMMENT '滿額數(shù)',

`condition_num` BIGINT COMMENT '滿件數(shù)',

`activity_id` STRING COMMENT '活動(dòng)編號(hào)',

`benefit_amount` DECIMAL(16, 2) COMMENT '減免金額',

`benefit_discount` DECIMAL(16, 2) COMMENT '折扣',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`range_type` STRING COMMENT '范圍類型 1、商品(SPUID) 2、品類(三級(jí)品類id) 3、品牌',

`limit_num` BIGINT COMMENT '最多領(lǐng)用次數(shù)',

`taken_count` BIGINT COMMENT '已領(lǐng)用次數(shù)',

`start_time` STRING COMMENT '可以領(lǐng)取的開始時(shí)間',

`end_time` STRING COMMENT '可以領(lǐng)取的結(jié)束時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間',

`expire_time` STRING COMMENT '過期時(shí)間',

`range_desc` STRING COMMENT '范圍描述'

) COMMENT '優(yōu)惠券信息表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_coupon_info_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

12商品平臺(tái)屬性表(全量表)

DROP TABLE IF EXISTS ods_sku_attr_value_full;

CREATE EXTERNAL TABLE ods_sku_attr_value_full

(

`id` STRING COMMENT '編號(hào)',

`attr_id` STRING COMMENT '平臺(tái)屬性ID',

`value_id` STRING COMMENT '平臺(tái)屬性值ID',

`sku_id` STRING COMMENT 'SKU_ID',

`attr_name` STRING COMMENT '平臺(tái)屬性名稱',

`value_name` STRING COMMENT '平臺(tái)屬性值名稱',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '商品平臺(tái)屬性表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_sku_attr_value_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

13商品表(全量表)

DROP TABLE IF EXISTS ods_sku_info_full;

CREATE EXTERNAL TABLE ods_sku_info_full

(

`id` STRING COMMENT 'SKU_ID',

`spu_id` STRING COMMENT 'SPU_ID',

`price` DECIMAL(16, 2) COMMENT '價(jià)格',

`sku_name` STRING COMMENT 'SKU名稱',

`sku_desc` STRING COMMENT 'SKU規(guī)格描述',

`weight` DECIMAL(16, 2) COMMENT '重量',

`tm_id` STRING COMMENT '品牌ID',

`category3_id` STRING COMMENT '三級(jí)品類ID',

`sku_default_img` STRING COMMENT '默認(rèn)顯示圖片地址',

`is_sale` STRING COMMENT '是否在售',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '商品表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_sku_info_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

14商品銷售屬性值表(全量表)

DROP TABLE IF EXISTS ods_sku_sale_attr_value_full;

CREATE EXTERNAL TABLE ods_sku_sale_attr_value_full

(

`id` STRING COMMENT '編號(hào)',

`sku_id` STRING COMMENT 'SKU_ID',

`spu_id` STRING COMMENT 'SPU_ID',

`sale_attr_value_id` STRING COMMENT '銷售屬性值ID',

`sale_attr_id` STRING COMMENT '銷售屬性ID',

`sale_attr_name` STRING COMMENT '銷售屬性名稱',

`sale_attr_value_name` STRING COMMENT '銷售屬性值名稱',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '商品銷售屬性值表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

15SPU表(全量表)

DROP TABLE IF EXISTS ods_spu_info_full;

CREATE EXTERNAL TABLE ods_spu_info_full

(

`id` STRING COMMENT 'SPU_ID',

`spu_name` STRING COMMENT 'SPU名稱',

`description` STRING COMMENT '描述信息',

`category3_id` STRING COMMENT '三級(jí)品類ID',

`tm_id` STRING COMMENT '品牌ID',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT 'SPU表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_spu_info_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

16營(yíng)銷坑位表(全量表)

DROP TABLE IF EXISTS ods_promotion_pos_full;

CREATE EXTERNAL TABLE ods_promotion_pos_full

(

`id` STRING COMMENT '營(yíng)銷坑位ID',

`pos_location` STRING COMMENT '營(yíng)銷坑位位置',

`pos_type` STRING COMMENT '營(yíng)銷坑位類型:banner,宮格,列表,瀑布',

`promotion_type` STRING COMMENT '營(yíng)銷類型:算法、固定、搜索',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '營(yíng)銷坑位表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_promotion_pos_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

17營(yíng)銷渠道表(全量表)

DROP TABLE IF EXISTS ods_promotion_refer_full;

CREATE EXTERNAL TABLE ods_promotion_refer_full

(

`id` STRING COMMENT '外部營(yíng)銷渠道ID',

`refer_name` STRING COMMENT '外部營(yíng)銷渠道名稱',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '營(yíng)銷渠道表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

NULL DEFINED AS ''

LOCATION '/warehouse/gmall/ods/ods_promotion_refer_full/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

18購(gòu)物車表(增量表)

DROP TABLE IF EXISTS ods_cart_info_inc;

CREATE EXTERNAL TABLE ods_cart_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

user_id :STRING,

sku_id :STRING,

cart_price :DECIMAL(16, 2),

sku_num :BIGINT,

img_url :STRING,

sku_name :STRING,

is_checked :STRING,

create_time :STRING,

operate_time :STRING,

is_ordered :STRING,

order_time:STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '購(gòu)物車增量表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_cart_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

19評(píng)論表(增量表)

DROP TABLE IF EXISTS ods_comment_info_inc;

CREATE EXTERNAL TABLE ods_comment_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

user_id :STRING,

nick_name :STRING,

head_img :STRING,

sku_id :STRING,

spu_id :STRING,

order_id :STRING,

appraise :STRING,

comment_txt :STRING,

create_time :STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '評(píng)論表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_comment_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

20優(yōu)惠券領(lǐng)用表(增量表)

DROP TABLE IF EXISTS ods_coupon_use_inc;

CREATE EXTERNAL TABLE ods_coupon_use_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

coupon_id :STRING,

user_id :STRING,

order_id :STRING,

coupon_status :STRING,

get_time :STRING,

using_time:STRING,

used_time :STRING,expire_time :STRING,

create_time :STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '優(yōu)惠券領(lǐng)用表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_coupon_use_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

21收藏表(增量表)

DROP TABLE IF EXISTS ods_favor_info_inc;

CREATE EXTERNAL TABLE ods_favor_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

user_id :STRING,

sku_id :STRING,

spu_id :STRING,

is_cancel :STRING,

create_time :STRING,

operate_time:STRING> COMMENT '數(shù)據(jù)',

`old` MAP

STRING> COMMENT '舊值'

) COMMENT '收藏表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_favor_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

22訂單明細(xì)表(增量表)

DROP TABLE IF EXISTS ods_order_detail_inc;

CREATE EXTERNAL TABLE ods_order_detail_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

order_id :STRING,

sku_id :STRING,

sku_name :STRING,

img_url :STRING,

order_price:DECIMAL(16, 2),

sku_num :BIGINT,

create_time :STRING,

source_type :STRING,

source_id :STRING,

split_total_amount:DECIMAL(16, 2),

split_activity_amount :DECIMAL(16, 2),

split_coupon_amount:DECIMAL(16, 2),

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP

STRING> COMMENT '舊值'

) COMMENT '訂單明細(xì)表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_order_detail_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

23訂單明細(xì)活動(dòng)關(guān)聯(lián)表(增量表)

DROP TABLE IF EXISTS ods_order_detail_activity_inc;

CREATE EXTERNAL TABLE ods_order_detail_activity_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

order_id :STRING,

order_detail_id :STRING,

activity_id :STRING,

activity_rule_id :STRING,

sku_id:STRING,

create_time :STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '訂單明細(xì)活動(dòng)關(guān)聯(lián)表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_order_detail_activity_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

24訂單明細(xì)優(yōu)惠券關(guān)聯(lián)表(增量表)

DROP TABLE IF EXISTS ods_order_detail_coupon_inc;

CREATE EXTERNAL TABLE ods_order_detail_coupon_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

order_id :STRING,

order_detail_id :STRING,

coupon_id :STRING,

coupon_use_id :STRING,

sku_id:STRING,

create_time :STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '訂單明細(xì)優(yōu)惠券關(guān)聯(lián)表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

25訂單表(增量表)

DROP TABLE IF EXISTS ods_order_info_inc;

CREATE EXTERNAL TABLE ods_order_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

consignee :STRING,

consignee_tel :STRING,

total_amount :DECIMAL(16, 2),

order_status :STRING,

user_id:STRING,

payment_way :STRING,

delivery_address :STRING,

order_comment :STRING,

out_trade_no :STRING,

trade_body:STRING,

create_time :STRING,

operate_time :STRING,

expire_time :STRING,

process_status :STRING,

tracking_no:STRING,

parent_order_id :STRING,

img_url :STRING,

province_id :STRING,

activity_reduce_amount:DECIMAL(16, 2),

coupon_reduce_amount :DECIMAL(16, 2),

original_total_amount :DECIMAL(16, 2),

freight_fee:DECIMAL(16, 2),

freight_fee_reduce :DECIMAL(16, 2),

refundable_time :DECIMAL(16, 2)> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '訂單表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_order_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

26退單表(增量表)

DROP TABLE IF EXISTS ods_order_refund_info_inc;

CREATE EXTERNAL TABLE ods_order_refund_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

user_id :STRING,

order_id :STRING,

sku_id :STRING,

refund_type :STRING,

refund_num :BIGINT,

refund_amount:DECIMAL(16, 2),

refund_reason_type :STRING,

refund_reason_txt :STRING,

refund_status :STRING,

create_time:STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '退單表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_order_refund_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

27訂單狀態(tài)流水表(增量表)

DROP TABLE IF EXISTS ods_order_status_log_inc;

CREATE EXTERNAL TABLE ods_order_status_log_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

order_id :STRING,

order_status :STRING,

create_time :STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '訂單狀態(tài)流水表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_order_status_log_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

28支付表(增量表)

DROP TABLE IF EXISTS ods_payment_info_inc;

CREATE EXTERNAL TABLE ods_payment_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

out_trade_no :STRING,

order_id :STRING,

user_id :STRING,

payment_type :STRING,

trade_no:STRING,

total_amount :DECIMAL(16, 2),

subject :STRING,

payment_status :STRING,

create_time :STRING,

callback_time:STRING,

callback_content :STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '支付表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_payment_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

29退款表(增量表)

DROP TABLE IF EXISTS ods_refund_payment_inc;

CREATE EXTERNAL TABLE ods_refund_payment_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

out_trade_no :STRING,

order_id :STRING,

sku_id :STRING,

payment_type :STRING,

trade_no :STRING,

total_amount:DECIMAL(16, 2),

subject :STRING,

refund_status :STRING,

create_time :STRING,

callback_time :STRING,

callback_content:STRING,

operate_time :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '退款表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_refund_payment_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

30用戶表(增量表)

DROP TABLE IF EXISTS ods_user_info_inc;

CREATE EXTERNAL TABLE ods_user_info_inc

(

`type` STRING COMMENT '變動(dòng)類型',

`ts` BIGINT COMMENT '變動(dòng)時(shí)間',

`data` STRUCT

login_name :STRING,

nick_name :STRING,

passwd :STRING,

name :STRING,

phone_num :STRING,

email:STRING,

head_img :STRING,

user_level :STRING,

birthday :STRING,

gender :STRING,

create_time :STRING,

operate_time:STRING,

status :STRING> COMMENT '數(shù)據(jù)',

`old` MAP COMMENT '舊值'

) COMMENT '用戶表'

PARTITIONED BY (`dt` STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

LOCATION '/warehouse/gmall/ods/ods_user_info_inc/'

TBLPROPERTIES ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

31數(shù)據(jù)裝載腳本

(1)在hadoop102的/home/atguigu/bin目錄下創(chuàng)建hdfs_to_ods_db.sh

vim hdfs_to_ods_db.sh

(2)編寫如下內(nèi)容

#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then

do_date=$2

else

do_date=`date -d '-1 day' +%F`

fi

load_data(){

sql=""

for i in $*; do

#判斷路徑是否存在

hadoop fs -test -e /origin_data/$APP/db/${i:4}/$do_date

#路徑存在方可裝載數(shù)據(jù)

if [[ $? = 0 ]]; then

sql=$sql"load data inpath '/origin_data/$APP/db/${i:4}/$do_date' OVERWRITE into table ${APP}.$i partition(dt='$do_date');"

fi

done

hive -e "$sql"

}

case $1 in

"ods_activity_info_full")

load_data "ods_activity_info_full"

;;

"ods_activity_rule_full")

load_data "ods_activity_rule_full"

;;

"ods_base_category1_full")

load_data "ods_base_category1_full"

;;

"ods_base_category2_full")

load_data "ods_base_category2_full"

;;

"ods_base_category3_full")

load_data "ods_base_category3_full"

;;

"ods_base_dic_full")

load_data "ods_base_dic_full"

;;

"ods_base_province_full")

load_data "ods_base_province_full"

;;

"ods_base_region_full")

load_data "ods_base_region_full"

;;

"ods_base_trademark_full")

load_data "ods_base_trademark_full"

;;

"ods_cart_info_full")

load_data "ods_cart_info_full"

;;

"ods_coupon_info_full")

load_data "ods_coupon_info_full"

;;

"ods_sku_attr_value_full")

load_data "ods_sku_attr_value_full"

;;

"ods_sku_info_full")

load_data "ods_sku_info_full"

;;

"ods_sku_sale_attr_value_full")

load_data "ods_sku_sale_attr_value_full"

;;

"ods_spu_info_full")

load_data "ods_spu_info_full"

;;

"ods_promotion_pos_full")

load_data "ods_promotion_pos_full"

;;

"ods_promotion_refer_full")

load_data "ods_promotion_refer_full"

;;

"ods_cart_info_inc")

load_data "ods_cart_info_inc"

;;

"ods_comment_info_inc")

load_data "ods_comment_info_inc"

;;

"ods_coupon_use_inc")

load_data "ods_coupon_use_inc"

;;

"ods_favor_info_inc")

load_data "ods_favor_info_inc"

;;

"ods_order_detail_inc")

load_data "ods_order_detail_inc"

;;

"ods_order_detail_activity_inc")

load_data "ods_order_detail_activity_inc"

;;

"ods_order_detail_coupon_inc")

load_data "ods_order_detail_coupon_inc"

;;

"ods_order_info_inc")

load_data "ods_order_info_inc"

;;

"ods_order_refund_info_inc")

load_data "ods_order_refund_info_inc"

;;

"ods_order_status_log_inc")

load_data "ods_order_status_log_inc"

;;

"ods_payment_info_inc")

load_data "ods_payment_info_inc"

;;

"ods_refund_payment_inc")

load_data "ods_refund_payment_inc"

;;

"ods_user_info_inc")

load_data "ods_user_info_inc"

;;

"all")

load_data "ods_activity_info_full" "ods_activity_rule_full" "ods_base_category1_full" "ods_base_category2_full" "ods_base_category3_full" "ods_base_dic_full" "ods_base_province_full" "ods_base_region_full" "ods_base_trademark_full" "ods_cart_info_full" "ods_coupon_info_full" "ods_sku_attr_value_full" "ods_sku_info_full" "ods_sku_sale_attr_value_full" "ods_spu_info_full" "ods_promotion_pos_full" "ods_promotion_refer_full" "ods_cart_info_inc" "ods_comment_info_inc" "ods_coupon_use_inc" "ods_favor_info_inc" "ods_order_detail_inc" "ods_order_detail_activity_inc" "ods_order_detail_coupon_inc" "ods_order_info_inc" "ods_order_refund_info_inc" "ods_order_status_log_inc" "ods_payment_info_inc" "ods_refund_payment_inc" "ods_user_info_inc"

;;

esac

(3)增加腳本執(zhí)行權(quán)限

chmod +x hdfs_to_ods_db.sh

(4)腳本用法

hdfs_to_ods_db.sh all 2022-06-08

第二層:數(shù)據(jù)加工(DWD data warehouse detail)

功能:將數(shù)據(jù)源中的數(shù)據(jù)進(jìn)行加工處理(判空、無效) 為了后續(xù)統(tǒng)計(jì)分析做數(shù)據(jù)準(zhǔn)備 數(shù)據(jù)量非常大,所以分離出了DIM層將數(shù)據(jù)整合 壓縮方式:snappy

事實(shí)表設(shè)計(jì)(事務(wù)型事實(shí)表)

-- DWD

-- Data Warehouse Detail

-- detail : 詳細(xì),明細(xì)

-- DWD層表主要設(shè)計(jì)的目的為了統(tǒng)計(jì)分析做準(zhǔn)備

-- 表中主要保存的是行為數(shù)據(jù)

-- 多個(gè)行為數(shù)據(jù)中如果存在共通性的內(nèi)容,那么可以提煉出來形成DIM層維度表的數(shù)據(jù)

-- 表的設(shè)計(jì)要點(diǎn)

-- 表的設(shè)計(jì)要依據(jù)維度建模理論中的事實(shí)表

-- 表設(shè)計(jì)時(shí)需要orc列式存儲(chǔ)以及snappy壓縮

-- 命名規(guī)范:

-- 分層標(biāo)記(dwd_) + 數(shù)據(jù)域(分類) + 原子性行為名稱 + 增量/全量(inc/full)

-- 絕大多數(shù)的行為數(shù)據(jù)都是增量數(shù)據(jù)采集

-- 特殊情況例外,可以采用全量方式實(shí)現(xiàn)。

-- dwd_user_login_success_inc

-- 事實(shí)表

-- 維度引用 + 度量值(行為產(chǎn)生時(shí)可以用于統(tǒng)計(jì)分析的數(shù)值:金額,數(shù)量,個(gè)數(shù))

-- 事實(shí)表會(huì)根據(jù)場(chǎng)景分為3大類:

-- 1. 事務(wù)型事實(shí)表

-- 行為是原子性

-- 用戶登錄(非原子)

-- 用戶登錄成功(原子)

-- 用戶登錄失?。ㄔ樱?/p>

-- 粒度:描述一行數(shù)據(jù)的詳細(xì)程度

-- 描述的越詳細(xì)(維度越多),粒度越細(xì)

-- 描述的越簡(jiǎn)單(維度越多),粒度越粗

-- 設(shè)計(jì)步驟:

-- 1. 選擇業(yè)務(wù)過程 :確定表

-- 2. 聲明粒度:確定行

-- 3. 確認(rèn)維度:確定列

-- 4. 確認(rèn)事實(shí):確定度量值

-- 2. 周期快照事實(shí)表

-- 3. 累積快照事實(shí)表

-- 交易域加購(gòu)事務(wù)事實(shí)表

-- 交易域 : trade

-- 加購(gòu) : 行為

-- 將商品加入到購(gòu)物車中的行為

-- 購(gòu)物車中沒有這個(gè)商品,往購(gòu)物車中增加商品

-- 購(gòu)物車中有這個(gè)商品,繼續(xù)往購(gòu)物車中增加該商品

-- 事務(wù)事實(shí)表

-- 原子性

-- 時(shí)間(行為時(shí)間) + 用戶 + 商品 + 數(shù)量

-- 表的字段結(jié)構(gòu):必要的維度屬性 + 度量值 + 可選的維度屬性

-- 建表語句

-- 分區(qū)策略:哪一天的行為數(shù)據(jù)存放到哪一天分區(qū)

事務(wù)的原子性

登錄成功(OK) 登錄失敗(OK) 下單成功(OK) 下單失敗(非正常業(yè)務(wù)行為,不需要再創(chuàng)建一張表) 支付成功(OK) 支付失?。∣K)

事實(shí)表設(shè)計(jì)(周期型快照事實(shí)表)

全量

-- 事務(wù)性事實(shí)表局限性

-- 事實(shí)表只針對(duì)于當(dāng)前行為進(jìn)行的統(tǒng)計(jì)分析時(shí),性能可以得到保障。

-- 當(dāng)前行為事實(shí)表和其他行為數(shù)據(jù)進(jìn)行關(guān)聯(lián)時(shí),數(shù)據(jù)量會(huì)幾何爆炸性增長(zhǎng),性能會(huì)急劇下降。

-- 存量性統(tǒng)計(jì)指標(biāo)使用事務(wù)性事實(shí)表效率太低,所以一般會(huì)采用其他事實(shí)表的設(shè)計(jì)方式

-- 2. 周期型快照事實(shí)表

-- 交易域購(gòu)物車周期快照事實(shí)表

-- 交易域

-- 購(gòu)物車 : cart_info

-- 周期快照事實(shí)表

從當(dāng)前表中取數(shù)據(jù)后再放回去需考慮去重問題,增加retry的容錯(cuò)性

事實(shí)表設(shè)計(jì)(累積型快照事實(shí)表)

-- 多行為統(tǒng)計(jì)指標(biāo)使用事務(wù)性事實(shí)表效率太低,所以一般會(huì)采用其他事實(shí)表的設(shè)計(jì)方式

-- 3. 累積型快照事實(shí)表

-- 使用一張表保存多個(gè)行為的狀態(tài)數(shù)據(jù)

-- 交易域交易流程累積快照事實(shí)表

-- 交易域

-- 交易流程 : 以訂單為基礎(chǔ)的交易流程

-- 累積快照事實(shí)表

分區(qū)策略

-- 事務(wù)性事實(shí)表:哪一天的行為數(shù)據(jù)存放到哪一天的分區(qū)

-- 周期性事實(shí)表:每一天存儲(chǔ)一份數(shù)據(jù)

-- 累積快照事實(shí)表:從業(yè)務(wù)流程中獲取最后一個(gè)業(yè)務(wù)行為時(shí)間作為分區(qū)字段

-- 下單時(shí)間 (X)

-- 支付時(shí)間 (X)

-- 收貨時(shí)間 (OK)

第三層:數(shù)據(jù)統(tǒng)計(jì)(DWS data warehouse summary 提高性能的關(guān)鍵層)

功能:將加工后的數(shù)據(jù)進(jìn)行統(tǒng)計(jì) 數(shù)據(jù)量非常大 壓縮方式:snappy

第四層:數(shù)據(jù)分析(ADS application data service)

功能:將統(tǒng)計(jì)結(jié)果進(jìn)行分析,為用戶提供經(jīng)營(yíng)決策 壓縮方式:gzip 數(shù)據(jù)格式:tsv

表的設(shè)計(jì)(要點(diǎn))

-- ADS層中存儲(chǔ)的是統(tǒng)計(jì)分析的最終結(jié)果

-- 數(shù)據(jù)量不多

-- 表不需要分區(qū)

-- 無需做進(jìn)一步聚合

-- 無需orc列式存儲(chǔ)和snappy壓縮

-- 行式存儲(chǔ) + gzip

-- 結(jié)果還需要向后流轉(zhuǎn)(可視化)

-- tsv

-- 表的結(jié)構(gòu)不能太復(fù)雜(滿足客戶的需求即可)

優(yōu)化(假)

Spark:

reduceByKey(函數(shù)內(nèi)部combine減少落盤數(shù)據(jù)量)和groupByKeycache、persist和checkpointDWS

1流量主題

1.1各渠道流量統(tǒng)計(jì)

1)建表語句

DROP TABLE IF EXISTS ads_traffic_stats_by_channel;

CREATE EXTERNAL TABLE ads_traffic_stats_by_channel

(

`dt` STRING COMMENT '統(tǒng)計(jì)日期',

`recent_days` BIGINT COMMENT '最近天數(shù),1:最近1天,7:最近7天,30:最近30天',

`channel` STRING COMMENT '渠道',

`uv_count` BIGINT COMMENT '訪客人數(shù)',

`avg_duration_sec` BIGINT COMMENT '會(huì)話平均停留時(shí)長(zhǎng),單位為秒',

`avg_page_count` BIGINT COMMENT '會(huì)話平均瀏覽頁面數(shù)',

`sv_count` BIGINT COMMENT '會(huì)話數(shù)',

`bounce_rate` DECIMAL(16, 2) COMMENT '跳出率'

) COMMENT '各渠道流量統(tǒng)計(jì)'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

LOCATION '/warehouse/gmall/ads/ads_traffic_stats_by_channel/';

2)數(shù)據(jù)裝載

第五層:共通層(DIM dimension)

功能:將共同的數(shù)據(jù)放在共通的表中,可在多個(gè)統(tǒng)計(jì)需求中使用 dimension:維度,分析數(shù)據(jù)的角度 該層不需要一開始就設(shè)計(jì),可以等DWD層設(shè)計(jì)的差不多了,或是寫著寫著發(fā)現(xiàn)DWD中有好多表都用到了共通的字段,有大量冗余數(shù)據(jù),那么就可以將這部分共通的數(shù)據(jù)提取成一個(gè)表

設(shè)計(jì)要點(diǎn)

(1)DIM層的設(shè)計(jì)依據(jù)是維度建模理論,該層存儲(chǔ)維度模型的維度表。 (2)DIM層的數(shù)據(jù)存儲(chǔ)格式為orc列式存儲(chǔ)+snappy壓縮。 (3)DIM層表名的命名規(guī)范為dim_表名_全量表或者拉鏈表標(biāo)識(shí)(full/zip)。 絕大多數(shù)的維度表都是全量表

是否創(chuàng)建表

數(shù)據(jù)量少,應(yīng)用面窄 :無需創(chuàng)建表(放用得到的表里即可,即維度退化) 數(shù)據(jù)量少,應(yīng)用面廣 :無需創(chuàng)建獨(dú)立表,一般和其他的數(shù)放置在一張表中(數(shù)據(jù)字典表(編碼表->是樹形表)) 樹形(有上下級(jí))數(shù)據(jù)保存時(shí)一般會(huì)采用 parent - child節(jié)點(diǎn)的設(shè)計(jì)方式

一般情況下,會(huì)采用一張表來設(shè)計(jì)上下級(jí)結(jié)構(gòu):部門(depart) – 表中的列:下級(jí)部門(主鍵)(N), 上級(jí)部門(外鍵)(1)字典表也是樹形表

維度表設(shè)計(jì)

確定維度表:確定維度的表是否該創(chuàng)建

原則上來講,每一個(gè)分析數(shù)據(jù)的角度(維度)都應(yīng)該創(chuàng)建一張表

案例:統(tǒng)計(jì)各個(gè)省份,各個(gè)品牌的訂單總銷量 – 訂單屬于事實(shí)(行為)表,省份和品牌就是維度表案例:統(tǒng)計(jì)各個(gè)性別不同年齡段的訂單總銷量 – 訂單屬于事實(shí)(行為)表, 性別和年齡就是維度表 如果多個(gè)維度存在關(guān)聯(lián),那么一般就會(huì)只創(chuàng)建一張表,表中包含了多個(gè)關(guān)聯(lián)的維度如果分析數(shù)據(jù)的角度應(yīng)用場(chǎng)景少,而且數(shù)據(jù)量小,不需要?jiǎng)?chuàng)建專門的維度表

案例:支付方式(微信支付,支付寶支付) 確定主維表和相關(guān)維表(用于分析維度表的列)

確定表中的列

案例:省份維度表 – 列:名稱數(shù)據(jù)倉庫的數(shù)據(jù)都來自于MySQL業(yè)務(wù)數(shù)據(jù), – 維度表的列的聲明可以參考業(yè)務(wù)數(shù)據(jù)庫表的字段MySQL業(yè)務(wù)數(shù)據(jù)庫中具有唯一性字段的那個(gè)業(yè)務(wù)表稱之為主維表 – 其他的表稱之為相關(guān)維表。 確定表中的列

盡可能豐富(多)編碼和文字共存沉淀通用屬性 :tel, xxx – 計(jì)算或轉(zhuǎn)換

1商品維度表

-- 商品維度表 :dim_sku_full

-- 確定維度表

-- 主維表和相關(guān)維表

-- 主維表和相關(guān)維表都是MySQL業(yè)務(wù)表

-- 主要用于分析列的表稱之主維表(主鍵)

-- sku_info

-- 其他用于分析列的表稱之相關(guān)維表

-- sku_attr_value

-- sku_sale_attr_value

-- 確定表的列

-- 建表語句

DROP TABLE IF EXISTS dim_sku_full;

CREATE EXTERNAL TABLE dim_sku_full

(

`id` STRING COMMENT 'SKU_ID',

`price` DECIMAL(16, 2) COMMENT '商品價(jià)格',

`sku_name` STRING COMMENT '商品名稱',

`sku_desc` STRING COMMENT '商品描述',

`weight` DECIMAL(16, 2) COMMENT '重量',

`is_sale` BOOLEAN COMMENT '是否在售',

`spu_id` STRING COMMENT 'SPU編號(hào)',

`spu_name` STRING COMMENT 'SPU名稱',

`category3_id` STRING COMMENT '三級(jí)品類ID',

`category3_name` STRING COMMENT '三級(jí)品類名稱',

`category2_id` STRING COMMENT '二級(jí)品類id',

`category2_name` STRING COMMENT '二級(jí)品類名稱',

`category1_id` STRING COMMENT '一級(jí)品類ID',

`category1_name` STRING COMMENT '一級(jí)品類名稱',

`tm_id` STRING COMMENT '品牌ID',

`tm_name` STRING COMMENT '品牌名稱',

`sku_attr_values` ARRAY

value_id :STRING,

attr_name :STRING,

value_name:STRING>> COMMENT '平臺(tái)屬性',

`sku_sale_attr_values` ARRAY

sale_attr_value_id :STRING,

sale_attr_name :STRING,

sale_attr_value_name:STRING>> COMMENT '銷售屬性',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間'

) COMMENT '商品維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_sku_full/'

TBLPROPERTIES ('orc.compress' = 'snappy');

-- 數(shù)據(jù)裝載

-- load

-- 數(shù)據(jù)源一定是ODS層

-- save

-- 分區(qū)字段其實(shí)也是表的字段,但是我們一般稱之為虛擬字段

-- 數(shù)據(jù)字段(列):存儲(chǔ)在數(shù)據(jù)文件中

-- 分區(qū)字段:存儲(chǔ)在路徑中

-- 分區(qū)

-- dt : date(日期)

-- 策略:將每天采集的數(shù)據(jù)存放到ODS層的每天分區(qū)中

-- 將ODS層每天的數(shù)據(jù)關(guān)聯(lián)后存放到DIM層的每天分區(qū)中

-- 分區(qū)存儲(chǔ)應(yīng)該采用overwrite而不是into

set hive.vectorized.execution.enabled=false;

insert overwrite table dim_sku_full partition (dt='2022-06-08')

select

sku.`id` ,--STRING COMMENT 'SKU_ID',

`price` ,--DECIMAL(16, 2) COMMENT '商品價(jià)格',

`sku_name` ,--STRING COMMENT '商品名稱',

`sku_desc` ,--STRING COMMENT '商品描述',

`weight` ,--DECIMAL(16, 2) COMMENT '重量',

`is_sale` ,--BOOLEAN COMMENT '是否在售',

`spu_id` ,--STRING COMMENT 'SPU編號(hào)',

`spu_name` ,--STRING COMMENT 'SPU名稱',

`category3_id` ,--STRING COMMENT '三級(jí)品類ID',

`category3_name` ,--STRING COMMENT '三級(jí)品類名稱',

`category2_id` ,--STRING COMMENT '二級(jí)品類id',

`category2_name` ,--STRING COMMENT '二級(jí)品類名稱',

`category1_id` ,--STRING COMMENT '一級(jí)品類ID',

`category1_name` ,--STRING COMMENT '一級(jí)品類名稱',

`tm_id` ,-- STRING COMMENT '品牌ID',

`tm_name` ,-- STRING COMMENT '品牌名稱',

sku_attr_values,

sku_sale_attr_values,

`create_time` --STRING COMMENT '創(chuàng)建時(shí)間'

from (

select

`id` ,--STRING COMMENT 'SKU_ID',

`price` ,--DECIMAL(16, 2) COMMENT '商品價(jià)格',

`sku_name` ,--STRING COMMENT '商品名稱',

`sku_desc` ,--STRING COMMENT '商品描述',

`weight` ,--DECIMAL(16, 2) COMMENT '重量',

`is_sale` ,--BOOLEAN COMMENT '是否在售',

`spu_id` ,--STRING COMMENT 'SPU編號(hào)',

`category3_id` ,--STRING COMMENT '三級(jí)品類ID',

`tm_id` ,-- STRING COMMENT '品牌ID',

create_time

from ods_sku_info_full

where dt = '2022-06-08'

) sku

left join (

select

id,

spu_name

from ods_spu_info_full

where dt = '2022-06-08'

) spu on sku.spu_id = spu.id

left join (

select

id,

tm_name

from ods_base_trademark_full

where dt = '2022-06-08'

) tm on sku.tm_id = tm.id

left join (

select

id,

name category3_name,

category2_id

from ods_base_category3_full

where dt = '2022-06-08'

) c3 on sku.category3_id = c3.id

left join (

select

id,

name category2_name,

category1_id

from ods_base_category2_full

where dt = '2022-06-08'

) c2 on c3.category2_id = c2.id

left join (

select

id,

name category1_name

from ods_base_category1_full

where dt = '2022-06-08'

) c1 on c2.category1_id = c1.id

left join (

-- 將查詢結(jié)果轉(zhuǎn)換為結(jié)構(gòu)體后,形成Array

-- 將多個(gè)結(jié)構(gòu)體的數(shù)據(jù)聚合成一個(gè)數(shù)組類型的數(shù)據(jù)(聚合操作)

select

sku_id,

collect_list(named_struct("attr_id", attr_id, "value_id", value_id, "attr_name", attr_name, "value_name", value_name)) sku_attr_values

from ods_sku_attr_value_full

where dt = '2022-06-08'

group by sku_id

) sav on sku.id = sav.sku_id

left join (

select

sku_id,

collect_list(named_struct("sale_attr_id", sale_attr_id, "sale_attr_value_id", sale_attr_value_id, "sale_attr_name", sale_attr_name, "sale_attr_value_name", sale_attr_value_name)) sku_sale_attr_values

from ods_sku_sale_attr_value_full

where dt = '2022-06-08'

group by sku_id

) ssav on sku.id = ssav.sku_id;

-- join & left join

-- join 要求2張表的數(shù)據(jù)同時(shí)滿足條件才能作為結(jié)果返回

-- left join 要求2張表的數(shù)據(jù)左邊的表不滿足條件也能作為結(jié)果返回

-- 使用left join 替換 join,必須保證,替換后不影響結(jié)果

2優(yōu)惠券維度表

1)建表語句

-- 創(chuàng)建表

-- 分析表中的列

-- 主維表

-- coupon_info

-- base_dic

-- 相關(guān)維表

-- 建表語句

-- 表名

DROP TABLE IF EXISTS dim_coupon_full;

CREATE EXTERNAL TABLE dim_coupon_full

(

`id` STRING COMMENT '優(yōu)惠券編號(hào)',

`coupon_name` STRING COMMENT '優(yōu)惠券名稱',

`coupon_type_code` STRING COMMENT '優(yōu)惠券類型編碼',

`coupon_type_name` STRING COMMENT '優(yōu)惠券類型名稱',

`condition_amount` DECIMAL(16, 2) COMMENT '滿額數(shù)',

`condition_num` BIGINT COMMENT '滿件數(shù)',

`activity_id` STRING COMMENT '活動(dòng)編號(hào)',

`benefit_amount` DECIMAL(16, 2) COMMENT '減免金額',

`benefit_discount` DECIMAL(16, 2) COMMENT '折扣',

`benefit_rule` STRING COMMENT '優(yōu)惠規(guī)則:滿元*減*元,滿*件打*折',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`range_type_code` STRING COMMENT '優(yōu)惠范圍類型編碼',

`range_type_name` STRING COMMENT '優(yōu)惠范圍類型名稱',

`limit_num` BIGINT COMMENT '最多領(lǐng)取次數(shù)',

`taken_count` BIGINT COMMENT '已領(lǐng)取次數(shù)',

`start_time` STRING COMMENT '可以領(lǐng)取的開始時(shí)間',

`end_time` STRING COMMENT '可以領(lǐng)取的結(jié)束時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間',

`expire_time` STRING COMMENT '過期時(shí)間'

) COMMENT '優(yōu)惠券維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_coupon_full/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載

insert overwrite table dim_coupon_full partition(dt='2022-06-08')

select

id,

coupon_name,

coupon_type,

coupon_dic.dic_name,

condition_amount,

condition_num,

activity_id,

benefit_amount,

benefit_discount,

case coupon_type

when '3201' then concat('滿',condition_amount,'元減',benefit_amount,'元')

when '3202' then concat('滿',condition_num,'件打', benefit_discount,' 折')

when '3203' then concat('減',benefit_amount,'元')

end benefit_rule,

create_time,

range_type,

range_dic.dic_name,

limit_num,

taken_count,

start_time,

end_time,

operate_time,

expire_time

from

(

select

id,

coupon_name,

coupon_type,

condition_amount,

condition_num,

activity_id,

benefit_amount,

benefit_discount,

create_time,

range_type,

limit_num,

taken_count,

start_time,

end_time,

operate_time,

expire_time

from ods_coupon_info_full

where dt='2022-06-08'

)ci

left join

(

select

dic_code,

dic_name

from ods_base_dic_full

where dt='2022-06-08'

and parent_code='32'

)coupon_dic

on ci.coupon_type=coupon_dic.dic_code

left join

(

select

dic_code,

dic_name

from ods_base_dic_full

where dt='2022-06-08'

and parent_code='33'

)range_dic

on ci.range_type=range_dic.dic_code;

3活動(dòng)(規(guī)則)維度表

1)建表語句

-- 活動(dòng)(規(guī)則)維度表

-- activity_info

-- activity_rule

DROP TABLE IF EXISTS dim_activity_full;

CREATE EXTERNAL TABLE dim_activity_full

(

`activity_rule_id` STRING COMMENT '活動(dòng)規(guī)則ID',

`activity_id` STRING COMMENT '活動(dòng)ID',

`activity_name` STRING COMMENT '活動(dòng)名稱',

`activity_type_code` STRING COMMENT '活動(dòng)類型編碼',

`activity_type_name` STRING COMMENT '活動(dòng)類型名稱',

`activity_desc` STRING COMMENT '活動(dòng)描述',

`start_time` STRING COMMENT '開始時(shí)間',

`end_time` STRING COMMENT '結(jié)束時(shí)間',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`condition_amount` DECIMAL(16, 2) COMMENT '滿減金額',

`condition_num` BIGINT COMMENT '滿減件數(shù)',

`benefit_amount` DECIMAL(16, 2) COMMENT '優(yōu)惠金額',

`benefit_discount` DECIMAL(16, 2) COMMENT '優(yōu)惠折扣',

`benefit_rule` STRING COMMENT '優(yōu)惠規(guī)則',

`benefit_level` STRING COMMENT '優(yōu)惠級(jí)別'

) COMMENT '活動(dòng)維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_activity_full/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載

insert overwrite table dim_activity_full partition(dt='2022-06-08')

select

rule.id,

info.id,

activity_name,

rule.activity_type,

dic.dic_name,

activity_desc,

start_time,

end_time,

create_time,

condition_amount,

condition_num,

benefit_amount,

benefit_discount,

case rule.activity_type

when '3101' then concat('滿',condition_amount,'元減',benefit_amount,'元')

when '3102' then concat('滿',condition_num,'件打', benefit_discount,' 折')

when '3103' then concat('打', benefit_discount,'折')

end benefit_rule,

benefit_level

from

(

select

id,

activity_id,

activity_type,

condition_amount,

condition_num,

benefit_amount,

benefit_discount,

benefit_level

from ods_activity_rule_full

where dt='2022-06-08'

)rule

left join

(

select

id,

activity_name,

activity_type,

activity_desc,

start_time,

end_time,

create_time

from ods_activity_info_full

where dt='2022-06-08'

)info

on rule.activity_id=info.id

left join

(

select

dic_code,

dic_name

from ods_base_dic_full

where dt='2022-06-08'

and parent_code='31'

)dic

on rule.activity_type=dic.dic_code;

4地區(qū)維度表

1)建表語句

DROP TABLE IF EXISTS dim_province_full;

CREATE EXTERNAL TABLE dim_province_full

(

`id` STRING COMMENT '省份ID',

`province_name` STRING COMMENT '省份名稱',

`area_code` STRING COMMENT '地區(qū)編碼',

`iso_code` STRING COMMENT '舊版國(guó)際標(biāo)準(zhǔn)地區(qū)編碼,供可視化使用',

`iso_3166_2` STRING COMMENT '新版國(guó)際標(biāo)準(zhǔn)地區(qū)編碼,供可視化使用',

`region_id` STRING COMMENT '地區(qū)ID',

`region_name` STRING COMMENT '地區(qū)名稱'

) COMMENT '地區(qū)維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_province_full/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載

insert overwrite table dim_province_full partition(dt='2022-06-08')

select

province.id,

province.name,

province.area_code,

province.iso_code,

province.iso_3166_2,

region_id,

region_name

from

(

select

id,

name,

region_id,

area_code,

iso_code,

iso_3166_2

from ods_base_province_full

where dt='2022-06-08'

)province

left join

(

select

id,

region_name

from ods_base_region_full

where dt='2022-06-08'

)region

on province.region_id=region.id;

5營(yíng)銷坑位維度表

1)建表語句

DROP TABLE IF EXISTS dim_promotion_pos_full;

CREATE EXTERNAL TABLE dim_promotion_pos_full

(

`id` STRING COMMENT '營(yíng)銷坑位ID',

`pos_location` STRING COMMENT '營(yíng)銷坑位位置',

`pos_type` STRING COMMENT '營(yíng)銷坑位類型 ',

`promotion_type` STRING COMMENT '營(yíng)銷類型',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '營(yíng)銷坑位維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_promotion_pos_full/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載

insert overwrite table dim_promotion_pos_full partition(dt='2022-06-08')

select

`id`,

`pos_location`,

`pos_type`,

`promotion_type`,

`create_time`,

`operate_time`

from ods_promotion_pos_full

where dt='2022-06-08';

6營(yíng)銷渠道維度表

1)建表語句

DROP TABLE IF EXISTS dim_promotion_refer_full;

CREATE EXTERNAL TABLE dim_promotion_refer_full

(

`id` STRING COMMENT '營(yíng)銷渠道ID',

`refer_name` STRING COMMENT '營(yíng)銷渠道名稱',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '修改時(shí)間'

) COMMENT '營(yíng)銷渠道維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_promotion_refer_full/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載

insert overwrite table dim_promotion_refer_full partition(dt='2022-06-08')

select

`id`,

`refer_name`,

`create_time`,

`operate_time`

from ods_promotion_refer_full

where dt='2022-06-08';

7日期維度表

日期維度表不需要從MySQL中導(dǎo),而是從文件中另行導(dǎo)入,也不需要每天導(dǎo)入,每年導(dǎo)入一次即可 1)建表語句

DROP TABLE IF EXISTS dim_date;

CREATE EXTERNAL TABLE dim_date

(

`date_id` STRING COMMENT '日期ID',

`week_id` STRING COMMENT '周ID,一年中的第幾周',

`week_day` STRING COMMENT '周幾',

`day` STRING COMMENT '每月的第幾天',

`month` STRING COMMENT '一年中的第幾月',

`quarter` STRING COMMENT '一年中的第幾季度',

`year` STRING COMMENT '年份',

`is_workday` STRING COMMENT '是否是工作日',

`holiday_id` STRING COMMENT '節(jié)假日'

) COMMENT '日期維度表'

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_date/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載

(1)創(chuàng)建臨時(shí)表

DROP TABLE IF EXISTS tmp_dim_date_info;

CREATE EXTERNAL TABLE tmp_dim_date_info (

`date_id` STRING COMMENT '日',

`week_id` STRING COMMENT '周ID',

`week_day` STRING COMMENT '周幾',

`day` STRING COMMENT '每月的第幾天',

`month` STRING COMMENT '第幾月',

`quarter` STRING COMMENT '第幾季度',

`year` STRING COMMENT '年',

`is_workday` STRING COMMENT '是否是工作日',

`holiday_id` STRING COMMENT '節(jié)假日'

) COMMENT '時(shí)間維度表'

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';

(2)將數(shù)據(jù)文件上傳到HFDS上臨時(shí)表路徑/warehouse/gmall/tmp/tmp_dim_date_info TSV格式的數(shù)據(jù)

-- 日期數(shù)據(jù)不是由我們自己提供的

-- TSV -> ORC

-- ODS(全量) -> DIM(ORC)

insert overwrite table dim_date select * from tmp_dim_date_info;

8用戶維度表(拉鏈(壓縮)表)

將大量數(shù)據(jù)的表進(jìn)行特殊的設(shè)計(jì)進(jìn)行改善,讓數(shù)據(jù)減少,并且不影響業(yè)務(wù)邏輯 – 將數(shù)據(jù)狀態(tài)進(jìn)行時(shí)間標(biāo)記:開始 + 結(jié)束設(shè)計(jì)拉鏈表時(shí),需要在基本表的設(shè)計(jì)基礎(chǔ)上,增加2個(gè)額外字段,用于表示狀態(tài)的范圍(開始,結(jié)束)拉鏈表的數(shù)據(jù),每一個(gè)狀態(tài)的變化會(huì)保存一條數(shù)據(jù),如果狀態(tài)沒有任何的變化,那么數(shù)據(jù)只有一條 1)建表語句

-- 表名:dim_user_full

-- 表中列

-- 主維表 : user_info

-- 相關(guān)維表 : user_address

DROP TABLE IF EXISTS dim_user_zip;

CREATE EXTERNAL TABLE dim_user_zip

(

`id` STRING COMMENT '用戶ID',

`name` STRING COMMENT '用戶姓名',

`phone_num` STRING COMMENT '手機(jī)號(hào)碼',

`email` STRING COMMENT '郵箱',

`user_level` STRING COMMENT '用戶等級(jí)',

`birthday` STRING COMMENT '生日',

`gender` STRING COMMENT '性別',

`create_time` STRING COMMENT '創(chuàng)建時(shí)間',

`operate_time` STRING COMMENT '操作時(shí)間',

`start_date` STRING COMMENT '開始日期',

`end_date` STRING COMMENT '結(jié)束日期'

) COMMENT '用戶維度表'

PARTITIONED BY (`dt` STRING)

STORED AS ORC

LOCATION '/warehouse/gmall/dim/dim_user_zip/'

TBLPROPERTIES ('orc.compress' = 'snappy');

2)數(shù)據(jù)裝載 ① 首日裝載

-- 全量表

-- DataX

-- TSV

-- 增量表

-- Maxwell

-- JSON

-- 首日(全量-select) : bootstrap (3種類型)

-- 每日(增量-binlog) : insert, update, delete

insert overwrite table dim_user_zip partition (dt = '9999-12-31')

select data.id,

concat(substr(data.name, 1, 1), '*') name,

if(data.phone_num regexp '^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',

concat(substr(data.phone_num, 1, 3), '*'), null) phone_num,

if(data.email regexp '^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\\.[a-zA-Z0-9_-]+)+$',

concat('*@', split(data.email, '@')[1]), null) email,

data.user_level,

data.birthday,

data.gender,

data.create_time,

data.operate_time,

'2022-06-08' start_date,

'9999-12-31' end_date

from ods_user_info_inc

where dt = '2022-06-08'

and type = 'bootstrap-insert';

② 每日裝載

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dim_user_zip partition (dt)

select id,

name,

phone_num,

email,

user_level,

birthday,

gender,

create_time,

operate_time,

start_date,

if(rn = 2, date_sub('2022-06-09', 1), end_date) end_date,

if(rn = 1, '9999-12-31', date_sub('2022-06-09', 1)) dt

from (

select id,

name,

phone_num,

email,

user_level,

birthday,

gender,

create_time,

operate_time,

start_date,

end_date,

row_number() over (partition by id order by start_date desc) rn

from (

select id,

name,

phone_num,

email,

user_level,

birthday,

gender,

create_time,

operate_time,

start_date,

end_date

from dim_user_zip

where dt = '9999-12-31'

union

select id,

concat(substr(name, 1, 1), '*') name,

if(phone_num regexp

'^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',

concat(substr(phone_num, 1, 3), '*'), null) phone_num,

if(email regexp '^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\\.[a-zA-Z0-9_-]+)+$',

concat('*@', split(email, '@')[1]), null) email,

user_level,

birthday,

gender,

create_time,

operate_time,

'2022-06-09' start_date,

'9999-12-31' end_date

from (

select data.id,

data.name,

data.phone_num,

data.email,

data.user_level,

data.birthday,

data.gender,

data.create_time,

data.operate_time,

row_number() over (partition by data.id order by ts desc) rn

from ods_user_info_inc

where dt = '2022-06-09'

) t1

where rn = 1

) t2

) t3;

CTE : 共通表表達(dá)式

with

aa as ( select * from test_table_part ),

ab as (select * from aa)

select

*

from aa

join ab;

select

*

from (

select * from test_table_part

) aa

join (

select * from test_table_part

) bb;

select

*

from dim_sku_full

拉鏈表設(shè)計(jì)

-- 數(shù)據(jù)裝載

-- load

-- save

-- 增量表得數(shù)據(jù)操作一般都會(huì)寫2個(gè)

-- 首日數(shù)據(jù)裝載

-- 每日數(shù)據(jù)裝載

-- 首日數(shù)據(jù)裝載

-- 同步方式:maxwell - 全量 - bootstrap - select * from user_info

-- MySQL不保存行為數(shù)據(jù),也就意味著不保存歷史行為數(shù)據(jù)

-- 拉鏈表會(huì)在當(dāng)前表得字段得基礎(chǔ)上,額外添加兩個(gè)字段(start, end),用于標(biāo)記狀態(tài)得有效范圍

-- start : 無法判斷開始范圍

-- end : 無法判斷

-- 折中地考慮

-- 從當(dāng)天開始,結(jié)束時(shí)間取時(shí)間極大值(避免數(shù)據(jù)頻繁修改)

-- 分區(qū)策略

-- 絕大多數(shù)得維度表得分區(qū)策略都是以天為單位

-- 分區(qū)不能采用開始日期作為分區(qū)字段

-- 無法判斷數(shù)據(jù)是否為歷史狀態(tài)還是最新狀態(tài)

-- 好得方式是使用結(jié)束時(shí)間為分區(qū)字段

任務(wù)調(diào)度器

保證每一層的SQL跑完再跑下一層

柚子快報(bào)激活碼778899分享:數(shù)據(jù)倉庫哈哈

http://yzkb.51969.com/

相關(guān)鏈接

評(píng)論可見,查看隱藏內(nèi)容

本文內(nèi)容根據(jù)網(wǎng)絡(luò)資料整理,出于傳遞更多信息之目的,不代表金鑰匙跨境贊同其觀點(diǎn)和立場(chǎng)。

轉(zhuǎn)載請(qǐng)注明,如有侵權(quán),聯(lián)系刪除。

本文鏈接:http://gantiao.com.cn/post/19214819.html

發(fā)布評(píng)論

您暫未設(shè)置收款碼

請(qǐng)?jiān)谥黝}配置——文章設(shè)置里上傳

掃描二維碼手機(jī)訪問

文章目錄